% libxenctrl (libxc) Domain Image Format
% David Vrabel <<david.vrabel@citrix.com>>
  Andrew Cooper <<andrew.cooper3@citrix.com>>
  Wen Congyang <<wency@cn.fujitsu.com>>
  Yang Hongyang <<hongyang.yang@easystack.cn>>
% Revision 3

Introduction
============

Purpose
-------

The _domain save image_ is the context of a running domain used for
snapshots of a domain or for transferring domains between hosts during
migration.

There are a number of problems with the format of the domain save
image used in Xen 4.4 and earlier (the _legacy format_).

* Dependent on toolstack word size.  A number of fields within the
  image are native types such as `unsigned long` which have different
  sizes between 32-bit and 64-bit toolstacks.  This prevents domains
  from being migrated between hosts running 32-bit and 64-bit
  toolstacks.

* There is no header identifying the image.

* The image has no version information.

A new format that addresses the above is required.

ARM does not yet have a domain save image format specified, and
the format described in this specification should be suitable.

Not Yet Included
----------------

The following features are not yet fully specified and will be
included in a future draft.

* Page data compression.

* ARM


Overview
========

The image format consists of two main sections:

* _Headers_
* _Records_

Headers
-------

There are two headers: the _image header_, and the _domain header_.
The image header describes the format of the image (version etc.).
The _domain header_ contains general information about the domain
(architecture, type etc.).

Records
-------

The main part of the format is a sequence of different _records_.
Each record type contains information about the domain context.  At a
minimum there is an END record marking the end of the records section.


Fields
------

All the fields within the headers and records have a fixed width.

Fields are always aligned to their size.

Padding and reserved fields are set to zero on save and must be
ignored during restore.

Integer (numeric) fields in the image header are always in big-endian
byte order.

Integer fields in the domain header and in the records are in the
endianness described in the image header (which will typically be the
native ordering).

\clearpage

Headers
=======

Image Header
------------

The image header identifies an image as a Xen domain save image.  It
includes the version of this specification that the image complies
with.

Tools supporting version _V_ of the specification shall always save
images using version _V_.  Tools shall support restoring from version
_V_.  If the previous Xen release produced version _V_ - 1 images,
tools shall support restoring from these.  Tools may additionally
support restoring from earlier versions.

The marker field can be used to distinguish between legacy images and
those corresponding to this specification.  Legacy images will have
one or more zero bits within the first 8 octets of the image.

Fields within the image header are always in _big-endian_ byte order,
regardless of the setting of the endianness bit.

     0     1     2     3     4     5     6     7 octet
    +-------------------------------------------------+
    | marker                                          |
    +-----------------------+-------------------------+
    | id                    | version                 |
    +-----------+-----------+-------------------------+
    | options   | (reserved)                          |
    +-----------+-------------------------------------+


--------------------------------------------------------------------
Field       Description
----------- --------------------------------------------------------
marker      0xFFFFFFFFFFFFFFFF.

id          0x58454E46 ("XENF" in ASCII).

version     0x00000003.  The version of this specification.

options     bit 0: Endianness.  0 = little-endian, 1 = big-endian.

            bit 1-15: Reserved.
--------------------------------------------------------------------

The endianness shall be 0 (little-endian) for images generated on an
i386, x86_64, or arm host.

\clearpage

Domain Header
-------------

The domain header includes general properties of the domain.

     0     1     2     3     4     5     6     7 octet
    +-----------------------+-----------+-------------+
    | type                  | page_shift| (reserved)  |
    +-----------------------+-----------+-------------+
    | xen_major             | xen_minor               |
    +-----------------------+-------------------------+

--------------------------------------------------------------------
Field       Description
----------- --------------------------------------------------------
type        0x0000: Reserved.

            0x0001: x86 PV.

            0x0002: x86 HVM.

            0x0003 - 0xFFFFFFFF: Reserved.

page_shift  Size of a guest page as a power of two.

            i.e., page size = 2 ^page_shift^.

xen_major   The Xen major version when this image was saved.

xen_minor   The Xen minor version when this image was saved.
--------------------------------------------------------------------

The legacy stream conversion tool writes a `xen_major` version of 0, and sets
`xen_minor` to the version of itself.

\clearpage

Records
=======

A record has a record header, type-specific data (the body) and
trailing padding.  If `body_length` is not a multiple of 8, the body is
padded with zeroes to align the end of the record on an 8 octet boundary.

     0     1     2     3     4     5     6     7 octet
    +-----------------------+-------------------------+
    | type                  | body_length             |
    +-----------+-----------+-------------------------+
    | body...                                         |
    ...
    |           | padding (0 to 7 octets)             |
    +-----------+-------------------------------------+

--------------------------------------------------------------------
Field       Description
----------- -------------------------------------------------------
type        0x00000000: END

            0x00000001: PAGE_DATA

            0x00000002: X86_PV_INFO

            0x00000003: X86_PV_P2M_FRAMES

            0x00000004: X86_PV_VCPU_BASIC

            0x00000005: X86_PV_VCPU_EXTENDED

            0x00000006: X86_PV_VCPU_XSAVE

            0x00000007: SHARED_INFO

            0x00000008: X86_TSC_INFO

            0x00000009: HVM_CONTEXT

            0x0000000A: HVM_PARAMS

            0x0000000B: TOOLSTACK (deprecated)

            0x0000000C: X86_PV_VCPU_MSRS

            0x0000000D: VERIFY

            0x0000000E: CHECKPOINT

            0x0000000F: CHECKPOINT_DIRTY_PFN_LIST (Secondary -> Primary)

            0x00000010: STATIC_DATA_END

            0x00000011: X86_CPUID_POLICY

            0x00000012: X86_MSR_POLICY

            0x00000013 - 0x7FFFFFFF: Reserved for future _mandatory_
            records.

            0x80000000 - 0xFFFFFFFF: Reserved for future _optional_
            records.

body_length Length in octets of the record body.

body        Content of the record.

padding     0 to 7 octets of zeros to pad the whole record to a multiple
            of 8 octets.
--------------------------------------------------------------------

Records may be _mandatory_ or _optional_.  Optional records have bit
31 set in their type.  Restoring an image that has unrecognised or
unsupported mandatory records must fail.  The contents of optional
records may be ignored during a restore.

The following sub-sections specify the record body format for each of
the record types.

\clearpage

END
----

An end record marks the end of the image, and shall be the final record
in the stream.

     0     1     2     3     4     5     6     7 octet
    +-------------------------------------------------+

The end record contains no fields; its body_length is 0.
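
As an illustration of the record framing described at the start of the Records section, the following C sketch walks an in-memory sequence of records until it reaches END. It is a minimal sketch under stated assumptions, not libxc code: the records are taken to be little-endian (as the image header's endianness bit will typically indicate), the image and domain headers are assumed to have been consumed already, and `walk_records()`, `is_known_type()` and the other helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REC_TYPE_END      0x00000000u
#define REC_TYPE_OPTIONAL 0x80000000u  /* bit 31 set: optional record */
#define REC_TYPE_LAST     0x00000012u  /* X86_MSR_POLICY, highest defined */

/* Decode a little-endian 32-bit field without relying on host byte order. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Bodies are padded to a multiple of 8 octets (64-bit math avoids overflow). */
static uint64_t padded_len(uint32_t body_length)
{
    return ((uint64_t)body_length + 7) & ~(uint64_t)7;
}

/* Hypothetical: this sketch "understands" every type defined so far. */
static int is_known_type(uint32_t type)
{
    return type <= REC_TYPE_LAST;
}

/*
 * Walk the records in buf[0..len).  Returns the number of records seen,
 * including the final END record, or -1 if a record is truncated, an
 * unknown mandatory record is found, or the data ends before END.
 * Padding octets are skipped unchecked, since padding is ignored on restore.
 */
static int walk_records(const uint8_t *buf, size_t len)
{
    size_t off = 0;
    int count = 0;

    assert(buf != NULL || len == 0);

    while (len - off >= 8) {
        uint32_t type = get_le32(buf + off);
        uint32_t body_length = get_le32(buf + off + 4);
        size_t body_off = off + 8;

        if (padded_len(body_length) > len - body_off)
            return -1;                  /* record runs past the buffer */

        count++;
        if (type == REC_TYPE_END)
            return count;               /* END has an empty body */

        if (!is_known_type(type) && !(type & REC_TYPE_OPTIONAL))
            return -1;                  /* unknown mandatory: must fail */

        /* A real restorer would dispatch on type here; optional records
           it does not understand are simply skipped. */
        off = body_off + (size_t)padded_len(body_length);
    }
    return -1;                          /* no END record */
}
```

The sketch only shows how `body_length`, the padding to 8 octets, the END record and the mandatory/optional distinction interact; type-specific handling of each body is left out.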

\clearpage

PAGE_DATA
---------

The bulk of an image consists of many PAGE_DATA records containing the
memory contents.

     0     1     2     3     4     5     6     7 octet
    +-----------------------+-------------------------+
    | count (C)             | (reserved)              |
    +-----------------------+-------------------------+
    | pfn[0]                                          |
    +-------------------------------------------------+
    ...
    +-------------------------------------------------+
    | pfn[C-1]                                        |
    +-------------------------------------------------+
    | page_data[0]...                                 |
    ...
    +-------------------------------------------------+
    | page_data[N-1]...                               |
    ...
    +-------------------------------------------------+

--------------------------------------------------------------------
Field       Description
----------- --------------------------------------------------------
count       Number of pages described in this record.

pfn         An array of count PFNs and their types.

            Bit 63-60: XEN_DOMCTL_PFINFO_* type (from
            `public/domctl.h` but shifted by 32 bits)

            Bit 59-52: Reserved.

            Bit 51-0: PFN.

page_data   page_size octets of uncompressed page contents for each
            page set as present in the pfn array.
--------------------------------------------------------------------

Note: count is strictly greater than 0.  N is at most C, and it is
possible for there to be no page_data in the record if none of the pfns
are of a type that carries page data.

--------------------------------------------------------------------
PFINFO type   Value     Description
------------- --------- ------------------------------------------
NOTAB         0x0       Normal page.

L1TAB         0x1       L1 page table page.

L2TAB         0x2       L2 page table page.

L3TAB         0x3       L3 page table page.

L4TAB         0x4       L4 page table page.

              0x5-0x8   Reserved.

L1TAB_PIN     0x9       L1 page table page (pinned).

L2TAB_PIN     0xA       L2 page table page (pinned).

L3TAB_PIN     0xB       L3 page table page (pinned).

L4TAB_PIN     0xC       L4 page table page (pinned).

BROKEN        0xD       Broken page.

XALLOC        0xE       Allocate only.

XTAB          0xF       Invalid page.
--------------------------------------------------------------------

Table: XEN_DOMCTL_PFINFO_* Page Types.

PFNs with type `BROKEN`, `XALLOC`, or `XTAB` do not have any
corresponding `page_data`.

The saver uses the `XTAB` type for PFNs that become invalid in the
guest's P2M table during a live migration[^2].

Restoring an image with unrecognised page types shall fail.

[^2]: In the legacy format, this is the list of unmapped PFNs in the
tail.

\clearpage

X86_PV_INFO
-----------

     0     1     2     3     4     5     6     7 octet
    +-----+-----+-----------+-------------------------+
    | w   | ptl | (reserved)                          |
    +-----+-----+-----------+-------------------------+

--------------------------------------------------------------------
Field            Description
---------------- ---------------------------------------------------
guest_width (w)  Guest width in octets (either 4 or 8).

pt_levels (ptl)  Number of page table levels (either 3 or 4).
--------------------------------------------------------------------

\clearpage

X86_PV_P2M_FRAMES
-----------------

     0     1     2     3     4     5     6     7 octet
    +-----+-----+-----+-----+-------------------------+
    | p2m_start_pfn (S)     | p2m_end_pfn (E)         |
    +-----+-----+-----+-----+-------------------------+
    | p2m_pfn[p2m frame containing pfn S]             |
    +-------------------------------------------------+
    ...
    +-------------------------------------------------+
    | p2m_pfn[p2m frame containing pfn E]             |
    +-------------------------------------------------+

--------------------------------------------------------------------
Field         Description
------------- ---------------------------------------------------
p2m_start_pfn First pfn index in the p2m_pfn array.

p2m_end_pfn   Last pfn index in the p2m_pfn array.

p2m_pfn       Array of PFNs containing the guest's P2M table, for
              the PFN frames containing the PFN range S to E
              (inclusive).
--------------------------------------------------------------------

\clearpage

X86_PV_VCPU_BASIC, EXTENDED, XSAVE, MSRS
----------------------------------------

The format of these records is identical.  They are all binary blobs
of data which are accessed using specific pairs of domctl hypercalls.

     0     1     2     3     4     5     6     7 octet
    +-----------------------+-------------------------+
    | vcpu_id               | (reserved)              |
    +-----------------------+-------------------------+
    | context...                                      |
    ...
    +-------------------------------------------------+

--------------------------------------------------------------------
Field       Description
----------- ----------------------------------------------------
vcpu_id     The VCPU ID.

context     Binary data for this VCPU.
--------------------------------------------------------------------

--------------------------------------------------------------------
Record type             Accessor hypercalls
----------------------- ----------------------------------------
X86_PV_VCPU_BASIC       XEN_DOMCTL_{get,set}vcpucontext

X86_PV_VCPU_EXTENDED    XEN_DOMCTL_{get,set}\_ext_vcpucontext

X86_PV_VCPU_XSAVE       XEN_DOMCTL_{get,set}vcpuextstate

X86_PV_VCPU_MSRS        XEN_DOMCTL_{get,set}\_vcpu_msrs
--------------------------------------------------------------------

\clearpage

SHARED_INFO
-----------

The content of the Shared Info page.

     0     1     2     3     4     5     6     7 octet
    +-------------------------------------------------+
    | shared_info                                     |
    ...
    +-------------------------------------------------+

--------------------------------------------------------------------
Field       Description
----------- ---------------------------------------------------
shared_info Contents of the shared info page.  This record
            should be exactly 1 page long.
--------------------------------------------------------------------

\clearpage

X86_TSC_INFO
------------

Domain TSC information, as accessed by the
XEN_DOMCTL_{get,set}tscinfo hypercall sub-ops.

     0     1     2     3     4     5     6     7 octet
    +------------------------+------------------------+
    | mode                   | khz                    |
    +------------------------+------------------------+
    | nsec                                            |
    +------------------------+------------------------+
    | incarnation            | (reserved)             |
    +------------------------+------------------------+

--------------------------------------------------------------------
Field       Description
----------- ---------------------------------------------------
mode        TSC mode, TSC_MODE_* constant.

khz         TSC frequency, in kHz.

nsec        Elapsed time, in nanoseconds.

incarnation Incarnation.
--------------------------------------------------------------------

\clearpage

HVM_CONTEXT
-----------

HVM Domain context, as accessed by the
XEN_DOMCTL_{get,set}hvmcontext hypercall sub-ops.

     0     1     2     3     4     5     6     7 octet
    +-------------------------------------------------+
    | hvm_ctx                                         |
    ...
    +-------------------------------------------------+

--------------------------------------------------------------------
Field       Description
----------- ---------------------------------------------------
hvm_ctx     The HVM Context blob from Xen.
--------------------------------------------------------------------

\clearpage

HVM_PARAMS
----------

HVM Domain parameters, as accessed by the
HVMOP_{get,set}\_param hypercall sub-ops.

     0     1     2     3     4     5     6     7 octet
    +------------------------+------------------------+
    | count (C)              | (reserved)             |
    +------------------------+------------------------+
    | param[0].index                                  |
    +-------------------------------------------------+
    | param[0].value                                  |
    +-------------------------------------------------+
    ...
    +-------------------------------------------------+
    | param[C-1].index                                |
    +-------------------------------------------------+
    | param[C-1].value                                |
    +-------------------------------------------------+

--------------------------------------------------------------------
Field       Description
----------- ---------------------------------------------------
count       The number of parameters contained in this record.
            Each parameter in the record contains an index and
            value.

param index Parameter index.

param value Parameter value.
--------------------------------------------------------------------

\clearpage

TOOLSTACK (deprecated)
----------------------

> *This record was only present for transitionary purposes during
> development.  It should not be used.*

An opaque blob provided by and supplied to the higher layers of the
toolstack (e.g., libxl) during save and restore.

     0     1     2     3     4     5     6     7 octet
    +-------------------------------------------------+
    | data                                            |
    ...
    +-------------------------------------------------+

--------------------------------------------------------------------
Field       Description
----------- ---------------------------------------------------
data        Blob of toolstack-specific data.
--------------------------------------------------------------------

\clearpage

VERIFY
------

A verify record indicates that, while all memory has now been sent, the sender
shall send further memory records for debugging purposes.

     0     1     2     3     4     5     6     7 octet
    +-------------------------------------------------+

The verify record contains no fields; its body_length is 0.

\clearpage

CHECKPOINT
----------

A checkpoint record indicates that all the preceding records in the stream
represent a consistent view of VM state.

     0     1     2     3     4     5     6     7 octet
    +-------------------------------------------------+

The checkpoint record contains no fields; its body_length is 0.

If the stream is embedded in a higher level toolstack stream, the
CHECKPOINT record marks the end of the libxc portion of the stream
and the stream is handed back to the higher level for further
processing.

The higher level stream may then hand the stream back to libxc to
process another set of records for the next consistent VM state
snapshot.  This next set of records may be terminated by another
CHECKPOINT record or an END record.

\clearpage

CHECKPOINT_DIRTY_PFN_LIST
-------------------------

A checkpoint dirty pfn list record is used to convey information about
dirty memory in the VM.  It is an unordered list of PFNs.  It is
currently only applicable in the backchannel of a checkpointed stream,
and is only used by COLO; for more detail, please refer to README.colo.

     0     1     2     3     4     5     6     7 octet
    +-------------------------------------------------+
    | pfn[0]                                          |
    +-------------------------------------------------+
    ...
    +-------------------------------------------------+
    | pfn[C-1]                                        |
    +-------------------------------------------------+

The count of pfns (C) is `record->length / sizeof(uint64_t)`.

\clearpage

STATIC_DATA_END
---------------

A static data end record marks the end of the static state, i.e. state
which does not change during guest execution.

     0     1     2     3     4     5     6     7 octet
    +-------------------------------------------------+

The static data end record contains no fields; its body_length is 0.

\clearpage

X86_CPUID_POLICY
----------------

CPUID policy content, as accessed by the XEN_DOMCTL_{get,set}\_cpu_policy
hypercall sub-ops.

     0     1     2     3     4     5     6     7 octet
    +-------------------------------------------------+
    | CPUID_policy                                    |
    ...
    +-------------------------------------------------+

--------------------------------------------------------------------
Field        Description
------------ ---------------------------------------------------
CPUID_policy Array of xen_cpuid_leaf_t entries.
--------------------------------------------------------------------

\clearpage

X86_MSR_POLICY
--------------

MSR policy content, as accessed by the XEN_DOMCTL_{get,set}\_cpu_policy
hypercall sub-ops.

     0     1     2     3     4     5     6     7 octet
    +-------------------------------------------------+
    | MSR_policy                                      |
    ...
    +-------------------------------------------------+

--------------------------------------------------------------------
Field      Description
---------- ---------------------------------------------------
MSR_policy Array of xen_msr_entry_t entries.
--------------------------------------------------------------------

\clearpage


Layout
======

The set of valid records depends on the guest architecture and type.  No
assumptions should be made about the ordering or interleaving of
independent records.  Record dependencies are noted below.

Some records are used for signalling, and explicitly have zero length.  All
other records contain data relevant to the migration.  Data records with no
content should be elided on the source side, as their presence serves no
purpose but results in extra work for the restore side.
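
The Layout rules above split records into zero-length signalling records and data records. The C sketch below expresses both sides of that rule. It assumes the signalling records are exactly the ones this specification defines with no fields (END, VERIFY, CHECKPOINT and STATIC_DATA_END); `is_signalling()`, `should_send()` and `signalling_length_ok()` are hypothetical helper names, not libxc functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Record type values from the record header table. */
enum {
    REC_END             = 0x00000000,
    REC_VERIFY          = 0x0000000D,
    REC_CHECKPOINT      = 0x0000000E,
    REC_STATIC_DATA_END = 0x00000010,
};

/* Signalling records carry no fields; their body_length is always 0. */
static bool is_signalling(uint32_t type)
{
    return type == REC_END || type == REC_VERIFY ||
           type == REC_CHECKPOINT || type == REC_STATIC_DATA_END;
}

/*
 * Sender side: should a record of this type and length be written?
 * Signalling records are always sent; data records with no content are
 * elided, since they only create extra work for the restore side.
 */
static bool should_send(uint32_t type, uint32_t body_length)
{
    return is_signalling(type) || body_length > 0;
}

/* Receiver side: a signalling record with a non-empty body is malformed. */
static bool signalling_length_ok(uint32_t type, uint32_t body_length)
{
    return !is_signalling(type) || body_length == 0;
}
```

For example, a sender would still emit STATIC_DATA_END with `body_length` 0, but would skip an HVM_PARAMS record it had built with no content.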

x86 PV Guest
------------

A typical save image for an x86 PV guest would look like:

* Image header
* Domain header
* Static data records:
    * X86_PV_INFO record
    * X86_{CPUID,MSR}_POLICY
    * STATIC_DATA_END
* X86_PV_P2M_FRAMES record
* Many PAGE_DATA records
* X86_TSC_INFO
* SHARED_INFO record
* VCPU context records for each online VCPU
    * X86_PV_VCPU_BASIC record
    * X86_PV_VCPU_EXTENDED record
    * X86_PV_VCPU_XSAVE record
    * X86_PV_VCPU_MSRS record
* END record

There are some strict ordering requirements.  The following records must
be present in the following order, as each of them depends on information
present in the preceding ones.

* X86_PV_INFO record
* X86_PV_P2M_FRAMES record
* PAGE_DATA records
* VCPU records

x86 HVM Guest
-------------

A typical save image for an x86 HVM guest would look like:

* Image header
* Domain header
* Static data records:
    * X86_{CPUID,MSR}_POLICY
    * STATIC_DATA_END
* Many PAGE_DATA records
* X86_TSC_INFO
* HVM_PARAMS
* HVM_CONTEXT
* END record

HVM_PARAMS must precede HVM_CONTEXT, as certain parameters can affect
the validity of architectural state in the context.

Compatibility with older versions
=================================

v3 compat with v2
-----------------

A v3 stream is compatible with a v2 stream, but mandates the presence of a
STATIC_DATA_END record ahead of any memory/register content.  This is to ease
the introduction of new static configuration records over time.

A v3-compatible receiver interpreting a v2 stream should infer the position of
STATIC_DATA_END based on finding the first X86_PV_P2M_FRAMES record (for PV
guests) or PAGE_DATA record (for HVM guests), and behave as if STATIC_DATA_END
had been sent.
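
The v2 inference rule above amounts to a small state machine driven by the record types a receiver sees. The following C sketch is one way to express it, assuming the image version and the domain header `type` have already been read; `struct static_tracker` and `static_end_reached()` are hypothetical names, not part of libxc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Record and domain type values, as defined earlier in this document. */
#define REC_PAGE_DATA          0x00000001u
#define REC_X86_PV_P2M_FRAMES  0x00000003u
#define REC_STATIC_DATA_END    0x00000010u
#define DOMAIN_TYPE_X86_PV     0x00000001u

/* Hypothetical receiver-side state for tracking the static data section. */
struct static_tracker {
    uint32_t version;      /* image header version: 2 or 3 */
    uint32_t domain_type;  /* domain header type */
    bool static_done;      /* STATIC_DATA_END seen or inferred */
};

/*
 * Feed each record type to the tracker in stream order.  Returns true
 * exactly once: when the end of the static data is reached, either
 * explicitly (a STATIC_DATA_END record) or, for a v2 stream, implicitly
 * on the first X86_PV_P2M_FRAMES (PV) or PAGE_DATA (HVM) record.
 */
static bool static_end_reached(struct static_tracker *t, uint32_t rec_type)
{
    assert(t->version == 2 || t->version == 3);

    if (t->static_done)
        return false;

    if (rec_type == REC_STATIC_DATA_END) {
        t->static_done = true;
        return true;
    }

    if (t->version == 2) {
        uint32_t first_dynamic = (t->domain_type == DOMAIN_TYPE_X86_PV)
                                 ? REC_X86_PV_P2M_FRAMES : REC_PAGE_DATA;

        if (rec_type == first_dynamic) {
            t->static_done = true;  /* behave as if STATIC_DATA_END was sent */
            return true;
        }
    }
    return false;
}
```

When `static_end_reached()` returns true for a v2 stream, the triggering X86_PV_P2M_FRAMES or PAGE_DATA record has not yet been processed, so a receiver would finish its end-of-static-data handling first and then handle that record normally.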

Legacy Images (x86 only)
------------------------

Restoring legacy images from older tools shall be handled by
translating the legacy format image into this new format.

It shall not be possible to save in the legacy format.

There are two different legacy images depending on whether they were
generated by a 32-bit or a 64-bit toolstack.  These shall be
distinguished by inspecting octets 4-7 in the image.  If these are
zero then it is a 64-bit image.

Toolstack  Field                             Value
---------  --------------------------------  ---------------------------
64-bit     Bits 32-63 of the p2m_size field  0 (since p2m_size < 2^32^)
32-bit     extended-info chunk ID (PV)       0xFFFFFFFF
32-bit     Chunk type (HVM)                  < 0
32-bit     Page count (HVM)                  > 0

Table: Possible values for octets 4-7 in legacy images

This assumes the presence of the extended-info chunk which was
introduced in Xen 3.0.


Future Extensions
=================

All changes to this specification should bump the revision number in
the title block.

All changes to the image or domain headers require the image version
to be increased.

The format may be extended by adding additional record types.

Extending an existing record type must be done by adding a new record
type.  This allows old images with the old record to still be
restored.

The image header may only be extended by _appending_ additional
fields.  In particular, the `marker`, `id` and `version` fields must
never change size or location.


Errata
======

1. For compatibility with older code, the receiving side of a stream should
   tolerate and ignore variable sized records with zero content.  Xen releases
   between 4.6 and 4.8 could end up generating valid HVM_PARAMS or
   X86_PV_VCPU_{EXTENDED,XSAVE,MSRS} records with zero-length content.
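
The checks described under Legacy Images (x86 only) can be combined with the image header `marker` test into a single classification of the first 8 octets of a stream. The C sketch below is illustrative only: it assumes the whole marker must be all ones for an image following this specification, treats any zero bit in octets 0-7 as a legacy image, and relies on octets 4-7 being zero only for 64-bit legacy images; `classify_image()` and `enum image_kind` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum image_kind {
    IMAGE_TOO_SHORT,  /* fewer than 8 octets available */
    IMAGE_CURRENT,    /* marker is 0xFFFFFFFFFFFFFFFF: this specification */
    IMAGE_LEGACY_64,  /* legacy image from a 64-bit toolstack */
    IMAGE_LEGACY_32,  /* legacy image from a 32-bit toolstack */
};

static enum image_kind classify_image(const uint8_t *buf, size_t len)
{
    int all_ones = 1;
    int high_zero = 1;

    assert(buf != NULL || len == 0);

    if (len < 8)
        return IMAGE_TOO_SHORT;

    /* Any zero bit in octets 0-7 means this is not the all-ones marker. */
    for (size_t i = 0; i < 8; i++)
        if (buf[i] != 0xFF)
            all_ones = 0;
    if (all_ones)
        return IMAGE_CURRENT;

    /*
     * 64-bit legacy images start with a 64-bit p2m_size < 2^32, so octets
     * 4-7 are zero.  32-bit legacy images have a non-zero value there: the
     * extended-info chunk ID (PV), a negative chunk type or a positive
     * page count (HVM).
     */
    for (size_t i = 4; i < 8; i++)
        if (buf[i] != 0)
            high_zero = 0;

    return high_zero ? IMAGE_LEGACY_64 : IMAGE_LEGACY_32;
}
```

Checking all eight marker octets first keeps the test independent of byte order, since an all-ones value reads the same in either endianness.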