# -*- coding: utf-8 -*-

"""
Legacy migration stream information.

Documentation and record structures for legacy migration, for both libxc
and libxl.
"""

"""
Libxc:

SAVE/RESTORE/MIGRATE PROTOCOL
=============================

The general form of a stream of chunks is a header followed by a
body consisting of a variable number of chunks (terminated by a
chunk with type 0) followed by a trailer.

For a rolling/checkpoint (e.g. Remus) migration, the body and
trailer phases can be repeated until an external event
(e.g. failure) causes the process to terminate and commit to the
most recent complete checkpoint.

HEADER
------

unsigned long        : p2m_size

extended-info (PV-only, optional):

  If the first unsigned long == ~0UL then extended info is present,
  otherwise that unsigned long is part of the p2m. Note that p2m_size
  above does not include the length of the extended info.

  extended-info:

    unsigned long    : signature == ~0UL
    uint32_t         : number of bytes remaining in extended-info

    1 or more extended-info blocks of form:
    char[4]          : block identifier
    uint32_t         : block data size
    bytes            : block data

    defined extended-info blocks:
    "vcpu"           : VCPU context info containing vcpu_guest_context_t.
                       The precise variant of the context structure
                       (e.g. 32 vs 64 bit) is distinguished by
                       the block size.
    "extv"           : Presence indicates use of extended VCPU context in
                       tail, data size is 0.

p2m (PV-only):

  consists of p2m_size bytes comprising an array of xen_pfn_t sized entries.
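
A minimal Python sketch of reading the HEADER above. It assumes a
little-endian stream and an 8-byte unsigned long (a 64-bit toolstack);
parse_header and long_size are hypothetical names, not part of libxc or
of this module. It returns p2m_size, the extended-info blocks as
(identifier, data) pairs, and the offset at which the p2m begins.

```python
import struct


def parse_header(buf, long_size=8):
    # Hypothetical helper (not a libxc API): parse p2m_size and the
    # optional PV extended-info. Assumes a little-endian stream; long_size
    # stands in for sizeof(unsigned long), 8 for a 64-bit toolstack.
    fmt_long = "<Q" if long_size == 8 else "<I"
    all_ones = (1 << (8 * long_size)) - 1  # ~0UL

    (p2m_size,) = struct.unpack_from(fmt_long, buf, 0)
    pos = long_size
    blocks = []

    # Peek at the next unsigned long: ~0UL marks extended-info; any other
    # value is already the first p2m entry and is left unconsumed.
    if len(buf) >= pos + long_size:
        (first,) = struct.unpack_from(fmt_long, buf, pos)
        if first == all_ones:
            pos += long_size
            (remaining,) = struct.unpack_from("<I", buf, pos)
            pos += 4
            end = pos + remaining
            while pos < end:
                # char[4] identifier and uint32_t size, then the data.
                ident, size = struct.unpack_from("<4sI", buf, pos)
                pos += 8
                if pos + size > end:
                    raise ValueError("block overruns extended-info")
                data = bytes(buf[pos:pos + size])
                blocks.append((ident.decode("ascii"), data))
                pos += size

    # pos is now the offset at which the p2m begins.
    return p2m_size, blocks, pos
```

Without extended info, blocks is empty and the returned offset equals
long_size, because the second unsigned long is then the first p2m entry.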

BODY PHASE - Format A (for live migration or Remus without compression)
----------

A series of chunks with a common header:
  int              : chunk type

If the chunk type is +ve then the chunk contains guest memory data, and
the type contains the number of pages in the batch:

    unsigned long[]  : PFN array, length == number of pages in batch
                       Each entry consists of XEN_DOMCTL_PFINFO_*
                       in bits 31-28 and the PFN number in bits 27-0.
    page data        : PAGE_SIZE bytes for each page marked present in PFN
                       array

If the chunk type is -ve then the chunk consists of one of a number of
metadata types. See the CHUNK_* definitions below, which correspond to
libxc's XC_SAVE_ID_* values.

If the chunk type is 0 then the body phase is complete.


BODY PHASE - Format B (for Remus with compression)
----------

A series of chunks with a common header:
  int              : chunk type

If the chunk type is +ve then the chunk contains an array of PFNs
corresponding to guest memory, and the type contains the number of PFNs
in the batch:

    unsigned long[]  : PFN array, length == number of pages in batch
                       Each entry consists of XEN_DOMCTL_PFINFO_*
                       in bits 31-28 and the PFN number in bits 27-0.

If the chunk type is -ve then the chunk consists of one of a number of
metadata types. See the CHUNK_* definitions below, which correspond to
libxc's XC_SAVE_ID_* values.

If the chunk type is -ve and equals XC_SAVE_ID_COMPRESSED_DATA, then the
chunk consists of compressed page data, in the following format:

    unsigned long    : Size of the compressed chunk to follow
    compressed data  : Variable length data of the size indicated above.
                       This chunk consists of compressed page data.
                       The number of pages in one chunk depends on
                       the amount of space available in the sender's
                       output buffer.
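
The chunk header and PFN array layout shared by Format A and Format B can
be sketched in Python as follows. This is an illustration only:
split_pfn_entry and read_pfn_batch are hypothetical helpers, and the
stream is assumed to be little-endian with a 4-byte int and an 8-byte
unsigned long. The sketch stops after the PFN array; it does not read
Format A page data or the payload of -ve chunks.

```python
import struct

# Bit layout of a PFN-array entry, from the description above.
PFINFO_SHIFT = 28            # XEN_DOMCTL_PFINFO_* type in bits 31-28
PFN_MASK = (1 << 28) - 1     # PFN number in bits 27-0


def split_pfn_entry(entry):
    # Return (XEN_DOMCTL_PFINFO_* type bits, PFN) for one array entry.
    return (entry >> PFINFO_SHIFT) & 0xF, entry & PFN_MASK


def read_pfn_batch(buf, pos, long_size=8):
    # Hypothetical helper: read the int chunk type at pos and, for a +ve
    # chunk, the PFN array that follows it. Assumes little-endian data,
    # a 4-byte int, and an unsigned long of long_size bytes.
    # Returns (chunk_type, [(type_bits, pfn), ...], offset after array).
    (chunk_type,) = struct.unpack_from("<i", buf, pos)
    pos += 4
    entries = []
    if chunk_type > 0:
        code = "Q" if long_size == 8 else "I"
        raw = struct.unpack_from("<%d%s" % (chunk_type, code), buf, pos)
        pos += chunk_type * long_size
        entries = [split_pfn_entry(e) for e in raw]
    # For chunk_type == 0 (end of body) or a -ve XC_SAVE_ID_* type,
    # nothing beyond the header is consumed here.
    return chunk_type, entries, pos
```

For example, split_pfn_entry(0xF0000001) gives (0xF, 1).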

Format of compressed data:
  compressed_data = <deltas>*
  delta           = <marker, run*>
  marker          = (RUNFLAG|SKIPFLAG) bitwise-or RUNLEN [1 byte marker]
  RUNFLAG         = 0
  SKIPFLAG        = 1 << 7
  RUNLEN          = 7-bit unsigned value indicating number of WORDS in the run
  run             = string of bytes of length sizeof(WORD) * RUNLEN

  If the marker contains RUNFLAG (bit 7 clear), then the RUNLEN * sizeof(WORD)
  bytes of data following the marker are copied into the target page at the
  offset indicated by offset_ptr.
  If the marker contains SKIPFLAG (bit 7 set), then offset_ptr is advanced
  by RUNLEN * sizeof(WORD).

If the chunk type is 0 then the body phase is complete.

There can be one or more chunks with type XC_SAVE_ID_COMPRESSED_DATA,
containing compressed pages. The compressed chunks are collated to form
one single compressed chunk for the entire iteration. The number of pages
present in this final compressed chunk will be equal to the total number
of valid PFNs specified by the +ve chunks.

At the sender side, compressed pages are inserted into the output stream
in the same order as they would have been if the compression logic were
absent.

Until the last iteration, the BODY is sent in Format A, to maintain live
migration compatibility with receivers of older Xen versions.
At the last iteration, if Remus compression was enabled, the sender sends
a trigger, XC_SAVE_ID_ENABLE_COMPRESSION, to tell the receiver to parse the
BODY in Format B from the next iteration onwards.
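
A Python sketch of applying one page's worth of compressed deltas,
following the marker rules above. It assumes sizeof(WORD) is supplied by
the caller (the default of 8 is an assumption) and that a page's deltas
end once offset_ptr reaches the page size; the format description does
not spell out page boundaries. apply_page_deltas is a hypothetical name.
Skipped words keep whatever the target page already contains, so the
caller passes in the page's existing contents.

```python
SKIPFLAG = 1 << 7     # marker bit 7 set: skip; clear (RUNFLAG = 0): run
RUNLEN_MASK = 0x7F    # low 7 bits: RUNLEN, counted in WORDs


def apply_page_deltas(page, data, pos, word_size=8):
    # Hypothetical helper: apply markers from data[pos:] to the bytearray
    # page until offset_ptr reaches len(page). word_size stands in for
    # sizeof(WORD); 8 is an assumption. Returns the position just after
    # the last marker or run consumed.
    offset_ptr = 0
    while offset_ptr < len(page):
        if pos >= len(data):
            raise ValueError("compressed data ends mid-page")
        marker = data[pos]
        pos += 1
        nbytes = (marker & RUNLEN_MASK) * word_size
        if offset_ptr + nbytes > len(page):
            raise ValueError("delta runs past the end of the page")
        if (marker & SKIPFLAG) == 0:
            # RUNFLAG: copy the run that follows the marker into the page.
            run = data[pos:pos + nbytes]
            if len(run) != nbytes:
                raise ValueError("truncated run")
            page[offset_ptr:offset_ptr + nbytes] = run
            pos += nbytes
        # Both cases advance offset_ptr; skipped words keep their old value.
        offset_ptr += nbytes
    return pos
```

For example, with word_size=4, a skip marker of RUNLEN 1, a run marker of
RUNLEN 2 followed by eight data bytes, and another skip marker of RUNLEN 1
leave the first and last words of a 16-byte page unchanged and overwrite
the middle two words.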

An example sequence of chunks received in Format B:
    +16                              +ve chunk
    unsigned long[16]                PFN array
    +100                             +ve chunk
    unsigned long[100]               PFN array
    +50                              +ve chunk
    unsigned long[50]                PFN array

    XC_SAVE_ID_COMPRESSED_DATA       TAG
      N                              Length of compressed data
      N bytes of DATA                Decompresses to 166 pages

    XC_SAVE_ID_*                     other xc save chunks
    0                                END BODY TAG

Corner case with checkpoint compression:
  At the sender side, after pausing the domain, dirty pages are usually
  copied out to a temporary buffer. After the domain is resumed,
  compression is done and the compressed chunk(s) are sent, followed by
  other XC_SAVE_ID_* chunks.

  If the temporary buffer gets full while scanning for dirty pages,
  the sender stops buffering dirty pages, compresses the temporary
  buffer and sends the compressed data with XC_SAVE_ID_COMPRESSED_DATA.
  The sender then resumes buffering dirty pages and continues
  scanning for them.

  For example, assume that the temporary buffer can hold 4096 pages and
  there are 4100 dirty pages. The following is the sequence of chunks
  that the receiver will see:

    +1024                       +ve chunk
    unsigned long[1024]         PFN array
    +1024                       +ve chunk
    unsigned long[1024]         PFN array
    +1024                       +ve chunk
    unsigned long[1024]         PFN array
    +1024                       +ve chunk
    unsigned long[1024]         PFN array

    XC_SAVE_ID_COMPRESSED_DATA  TAG
      N                         Length of compressed data
      N bytes of DATA           Decompresses to 4096 pages

    +4                          +ve chunk
    unsigned long[4]            PFN array

    XC_SAVE_ID_COMPRESSED_DATA  TAG
      M                         Length of compressed data
      M bytes of DATA           Decompresses to 4 pages

    XC_SAVE_ID_*                other xc save chunks
    0                           END BODY TAG

  In other words, XC_SAVE_ID_COMPRESSED_DATA can be interleaved with
  +ve chunks arbitrarily. But at the receiver end, the following condition
  always holds true until the end of BODY PHASE:
    num(PFN entries in +ve chunks) >= num(pages received in compressed form)

TAIL PHASE
----------

Content differs for PV and HVM guests.

HVM TAIL:

 "Magic" pages:
    uint64_t         : I/O req PFN
    uint64_t         : Buffered I/O req PFN
    uint64_t         : Store PFN
 Xen HVM Context:
    uint32_t         : Length of context in bytes
    bytes            : Context data
 Qemu context:
    char[21]         : Signature:
      "QemuDeviceModelRecord" : Read Qemu save data until EOF
      "DeviceModelRecord0002" : uint32_t length field followed by that many
                                bytes of Qemu save data
      "RemusDeviceModelState" : Currently the same as "DeviceModelRecord0002".

PV TAIL:

 Unmapped PFN list   : list of all the PFNs that were not in map at the close
    unsigned int     : Number of unmapped pages
    unsigned long[]  : PFNs of unmapped pages

 VCPU context data   : A series of VCPU records, one per present VCPU
                       Maximum and present map supplied in XC_SAVE_ID_VCPUINFO
    bytes            : VCPU context structure. Size is determined by the size
                       provided in the extended-info header
    bytes[128]       : Extended VCPU context (present IFF "extv" block
                       present in extended-info header)

 Shared Info Page    : 4096 bytes of shared info page
"""

CHUNK_end                       = 0
CHUNK_enable_verify_mode        = -1
CHUNK_vcpu_info                 = -2
CHUNK_hvm_ident_pt              = -3
CHUNK_hvm_vm86_tss              = -4
CHUNK_tmem                      = -5
CHUNK_tmem_extra                = -6
CHUNK_tsc_info                  = -7
CHUNK_hvm_console_pfn           = -8
CHUNK_last_checkpoint           = -9
CHUNK_hvm_acpi_ioports_location = -10
CHUNK_hvm_viridian              = -11
CHUNK_compressed_data           = -12
CHUNK_enable_compression        = -13
CHUNK_hvm_generation_id_addr    = -14
CHUNK_hvm_paging_ring_pfn       = -15
CHUNK_hvm_monitor_ring_pfn      = -16
CHUNK_hvm_sharing_ring_pfn      = -17
CHUNK_toolstack                 = -18
CHUNK_hvm_ioreq_server_pfn      = -19
CHUNK_hvm_nr_ioreq_server_pages = -20

chunk_type_to_str = {
    CHUNK_end                       : "end",
    CHUNK_enable_verify_mode        : "enable_verify_mode",
    CHUNK_vcpu_info                 : "vcpu_info",
    CHUNK_hvm_ident_pt              : "hvm_ident_pt",
    CHUNK_hvm_vm86_tss              : "hvm_vm86_tss",
    CHUNK_tmem                      : "tmem",
    CHUNK_tmem_extra                : "tmem_extra",
    CHUNK_tsc_info                  : "tsc_info",
    CHUNK_hvm_console_pfn           : "hvm_console_pfn",
    CHUNK_last_checkpoint           : "last_checkpoint",
    CHUNK_hvm_acpi_ioports_location : "hvm_acpi_ioports_location",
    CHUNK_hvm_viridian              : "hvm_viridian",
    CHUNK_compressed_data           : "compressed_data",
    CHUNK_enable_compression        : "enable_compression",
    CHUNK_hvm_generation_id_addr    : "hvm_generation_id_addr",
    CHUNK_hvm_paging_ring_pfn       : "hvm_paging_ring_pfn",
    CHUNK_hvm_monitor_ring_pfn      : "hvm_monitor_ring_pfn",
    CHUNK_hvm_sharing_ring_pfn      : "hvm_sharing_ring_pfn",
    CHUNK_toolstack                 : "toolstack",
    CHUNK_hvm_ioreq_server_pfn      : "hvm_ioreq_server_pfn",
    CHUNK_hvm_nr_ioreq_server_pages : "hvm_nr_ioreq_server_pages",
}

# Up to 1024 pages (4MB) at a time
MAX_BATCH = 1024

# Maximum #VCPUs currently supported for save/restore
MAX_VCPU_ID = 4095


"""
Libxl:

Legacy "toolstack" record layout:

Version 1:
  uint32_t version
  QEMU physmap data:
    uint32_t count
    libxl__physmap_info * count

The problem is that libxl__physmap_info was declared as:

struct libxl__physmap_info {
    uint64_t phys_offset;
    uint64_t start_addr;
    uint64_t size;
    uint32_t namelen;
    char name[];
};

This has 4 bytes of padding at the end in a 64-bit build, so the layout is
not the same between 32-bit and 64-bit builds.

Because of the pointer arithmetic used to construct the record, the 'name'
was shifted up to start at the padding, leaving the erroneous 4 bytes at the
end of the name string, after the NUL terminator.

Instead, the information described here has been changed to fit in a new
EMULATOR_XENSTORE_DATA record made of NUL-terminated strings.
"""