# -*- coding: utf-8 -*-

"""
Legacy migration stream information.

Documentation and record structures for legacy migration, for both libxc
and libxl.
"""

"""
Libxc:

SAVE/RESTORE/MIGRATE PROTOCOL
=============================

The general form of a stream of chunks is a header followed by a
body consisting of a variable number of chunks (terminated by a
chunk with type 0) followed by a trailer.

For a rolling/checkpoint (e.g. Remus) migration, the body and
trailer phases can be repeated until an external event
(e.g. failure) causes the process to terminate and commit to the
most recent complete checkpoint.

HEADER
------

unsigned long        : p2m_size

extended-info (PV-only, optional):

  If the first unsigned long == ~0UL then extended info is present,
  otherwise the unsigned long is part of the p2m. Note that p2m_size
  above does not include the length of the extended info.

  extended-info:

    unsigned long    : signature == ~0UL
    uint32_t         : number of bytes remaining in extended-info

    1 or more extended-info blocks of form:
    char[4]          : block identifier
    uint32_t         : block data size
    bytes            : block data

    defined extended-info blocks:
    "vcpu"           : VCPU context info containing vcpu_guest_context_t.
                       The precise variant of the context structure
                       (e.g. 32 vs 64 bit) is distinguished by
                       the block size.
    "extv"           : Presence indicates use of extended VCPU context in
                       tail, data size is 0.

p2m (PV-only):

  consists of p2m_size bytes comprising an array of xen_pfn_t sized entries.
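
As an illustration of the extended-info layout above, the following
minimal Python sketch walks the blocks in an in-memory buffer. It
assumes a little-endian stream written by a 64-bit toolstack (8-byte
unsigned long). parse_extended_info and its ulong_fmt parameter are
hypothetical names, not part of libxc, and the "vcpu" payload is a toy
value rather than a real vcpu_guest_context_t.

```python
import struct


def parse_extended_info(buf, ulong_fmt="<Q"):
    # Returns (blocks, consumed). blocks is None when no extended-info is
    # present, in which case the first word already belongs to the p2m.
    ulong_size = struct.calcsize(ulong_fmt)
    all_ones = (1 << (8 * ulong_size)) - 1            # ~0UL
    (signature,) = struct.unpack_from(ulong_fmt, buf, 0)
    if signature != all_ones:
        return None, 0

    # uint32_t: number of bytes remaining in extended-info
    (remaining,) = struct.unpack_from("<I", buf, ulong_size)
    pos = ulong_size + 4
    end = pos + remaining

    blocks = []
    while pos < end:
        # char[4] block identifier, uint32_t block data size
        ident, size = struct.unpack_from("<4sI", buf, pos)
        pos += 8
        blocks.append((ident, bytes(buf[pos:pos + size])))
        pos += size
    return blocks, end


# Toy stream: a 4-byte "vcpu" payload followed by an empty "extv" block.
example = (struct.pack("<QI", 2**64 - 1, 20)
           + struct.pack("<4sI", b"vcpu", 4) + b"abcd"
           + struct.pack("<4sI", b"extv", 0))
blocks, consumed = parse_extended_info(example)
```

A real parser would use the size of the "vcpu" block to pick the 32-bit
or 64-bit context variant, as described above.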

BODY PHASE - Format A (for live migration or Remus without compression)
----------

A series of chunks with a common header:
  int              : chunk type

If the chunk type is +ve then chunk contains guest memory data, and the
type contains the number of pages in the batch:

    unsigned long[]  : PFN array, length == number of pages in batch
                       Each entry consists of XEN_DOMCTL_PFINFO_*
                       in bits 31-28 and the PFN number in bits 27-0.
    page data        : PAGE_SIZE bytes for each page marked present in PFN
                       array

If the chunk type is -ve then chunk consists of one of a number of
metadata types.  See definitions of XC_SAVE_ID_* below (named CHUNK_*
in this file).

If chunk type is 0 then body phase is complete.
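
The chunk header rules and the PFN array encoding described above can
be sketched in Python as follows. classify_chunk and split_pfn_entry
are hypothetical helper names; only the sign convention and the bit
positions come from the text.

```python
def classify_chunk(chunk_type):
    # Positive: a batch of that many pages; negative: a metadata chunk
    # (XC_SAVE_ID_*); zero: the body phase is complete.
    if chunk_type > 0:
        return "pages"
    if chunk_type < 0:
        return "metadata"
    return "end"


def split_pfn_entry(entry):
    # XEN_DOMCTL_PFINFO_* type in bits 31-28, PFN number in bits 27-0.
    pfinfo = (entry >> 28) & 0xF
    pfn = entry & 0x0FFFFFFF
    return pfinfo, pfn


kind = classify_chunk(16)
pfinfo, pfn = split_pfn_entry(0xF0000123)
```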


BODY PHASE - Format B (for Remus with compression)
----------

A series of chunks with a common header:
  int              : chunk type

If the chunk type is +ve then chunk contains array of PFNs corresponding
to guest memory and type contains the number of PFNs in the batch:

    unsigned long[]  : PFN array, length == number of pages in batch
                       Each entry consists of XEN_DOMCTL_PFINFO_*
                       in bits 31-28 and the PFN number in bits 27-0.

If the chunk type is -ve then chunk consists of one of a number of
metadata types.  See definitions of XC_SAVE_ID_* below (named CHUNK_*
in this file).

If the chunk type is -ve and equals XC_SAVE_ID_COMPRESSED_DATA, then the
chunk consists of compressed page data, in the following format:

    unsigned long    : Size of the compressed chunk to follow
    compressed data  : Variable length data of size indicated above.
                       This chunk consists of compressed page data.
                       The number of pages in one chunk depends on
                       the amount of space available in the sender's
                       output buffer.

Format of compressed data:
  compressed_data = <deltas>*
  delta           = <marker, run*>
  marker          = (RUNFLAG|SKIPFLAG) bitwise-or RUNLEN [1 byte marker]
  RUNFLAG         = 0
  SKIPFLAG        = 1 << 7
  RUNLEN          = 7-bit unsigned value indicating number of WORDS in the run
  run             = string of bytes of length sizeof(WORD) * RUNLEN

  If marker contains RUNFLAG, then the RUNLEN * sizeof(WORD) bytes of data
  following the marker are copied into the target page at the offset
  indicated by offset_ptr.
  If marker contains SKIPFLAG, then offset_ptr is advanced by
  RUNLEN * sizeof(WORD).
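
A minimal Python sketch of applying the deltas for one target page,
following the marker rules above. It assumes sizeof(WORD) is 4 bytes,
that offset_ptr also advances past each copied run, and that a page is
complete once offset_ptr reaches its end; none of these are stated in
the text. apply_deltas and RUNLEN_MASK are hypothetical names, and
rejecting a zero-length run is a choice made for this sketch only.

```python
RUNFLAG = 0
SKIPFLAG = 1 << 7
RUNLEN_MASK = 0x7F


def apply_deltas(page, data, word_size=4):
    # page: bytearray with the previous page contents, updated in place.
    # data: compressed bytes. Returns the number of bytes consumed.
    offset = 0          # plays the role of offset_ptr
    pos = 0
    while offset < len(page):
        marker = data[pos]
        pos += 1
        nbytes = (marker & RUNLEN_MASK) * word_size
        if nbytes == 0 or offset + nbytes > len(page):
            raise ValueError("bad run length in marker")
        if marker & SKIPFLAG:
            offset += nbytes                       # words left unchanged
        else:
            run = data[pos:pos + nbytes]
            if len(run) != nbytes:
                raise ValueError("truncated run")
            page[offset:offset + nbytes] = run     # words replaced
            pos += nbytes
            offset += nbytes
    return pos


# Toy 16-byte "page": skip one word, copy two words, skip one word.
page = bytearray(b"A" * 16)
data = (bytes([SKIPFLAG | 1, RUNFLAG | 2]) + b"12345678"
        + bytes([SKIPFLAG | 1]))
consumed = apply_deltas(page, data)
```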

If chunk type is 0 then body phase is complete.

There can be one or more chunks with type XC_SAVE_ID_COMPRESSED_DATA,
containing compressed pages. The compressed chunks are collated to form
one single compressed chunk for the entire iteration. The number of pages
present in this final compressed chunk will be equal to the total number
of valid PFNs specified by the +ve chunks.

At the sender side, compressed pages are inserted into the output stream
in the same order as they would have been if compression logic were absent.

Until the last iteration, the BODY is sent in Format A, to maintain live
migration compatibility with receivers of older Xen versions.
At the last iteration, if Remus compression was enabled, the sender sends
a trigger, XC_SAVE_ID_ENABLE_COMPRESSION, to tell the receiver to parse the
BODY in Format B from the next iteration onwards.

An example sequence of chunks received in Format B:
    +16                              +ve chunk
    unsigned long[16]                PFN array
    +100                             +ve chunk
    unsigned long[100]               PFN array
    +50                              +ve chunk
    unsigned long[50]                PFN array

    XC_SAVE_ID_COMPRESSED_DATA       TAG
      N                              Length of compressed data
      N bytes of DATA                Decompresses to 166 pages

    XC_SAVE_ID_*                     other xc save chunks
    0                                END BODY TAG

Corner case with checkpoint compression:
    At the sender side, after pausing the domain, dirty pages are usually
  copied out to a temporary buffer. After the domain is resumed,
  compression is done and the compressed chunk(s) are sent, followed by
  other XC_SAVE_ID_* chunks.
    If the temporary buffer gets full while scanning for dirty pages,
  the sender stops buffering dirty pages, compresses the temporary
  buffer and sends the compressed data with XC_SAVE_ID_COMPRESSED_DATA.
  The sender then resumes buffering dirty pages and continues
  scanning for dirty pages.
    For example, assume that the temporary buffer can hold 4096 pages and
  there are 4100 dirty pages. The following is the sequence of chunks
  that the receiver will see:

    +1024                       +ve chunk
    unsigned long[1024]         PFN array
    +1024                       +ve chunk
    unsigned long[1024]         PFN array
    +1024                       +ve chunk
    unsigned long[1024]         PFN array
    +1024                       +ve chunk
    unsigned long[1024]         PFN array

    XC_SAVE_ID_COMPRESSED_DATA  TAG
     N                          Length of compressed data
     N bytes of DATA            Decompresses to 4096 pages

    +4                          +ve chunk
    unsigned long[4]            PFN array

    XC_SAVE_ID_COMPRESSED_DATA  TAG
     M                          Length of compressed data
     M bytes of DATA            Decompresses to 4 pages

    XC_SAVE_ID_*                other xc save chunks
    0                           END BODY TAG

    In other words, XC_SAVE_ID_COMPRESSED_DATA can be interleaved with
  +ve chunks arbitrarily. But at the receiver end, the following condition
  always holds true until the end of BODY PHASE:
   num(PFN entries in +ve chunks) >= num(pages received in compressed form)
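
The receiver-side condition above can be checked with a small Python
sketch that replays a simplified event list. The (kind, count) tuples
and the name invariant_holds are illustrative assumptions, not a real
stream parser, and every PFN entry is treated as a valid page.

```python
def invariant_holds(events):
    # events: sequence of ("pfns", n) for a +ve chunk carrying n PFN
    # entries, or ("compressed", n) for compressed data that
    # decompresses to n pages.
    pfn_entries = 0
    compressed_pages = 0
    for kind, count in events:
        if kind == "pfns":
            pfn_entries += count
        elif kind == "compressed":
            compressed_pages += count
        else:
            raise ValueError("unknown event kind")
        if pfn_entries < compressed_pages:
            return False
    return True


# The corner-case sequence: four full batches, a flush of the 4096-page
# buffer, then the remaining 4 pages.
corner_case = ([("pfns", 1024)] * 4
               + [("compressed", 4096), ("pfns", 4), ("compressed", 4)])
ok = invariant_holds(corner_case)
```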

TAIL PHASE
----------

Content differs for PV and HVM guests.

HVM TAIL:

 "Magic" pages:
    uint64_t         : I/O req PFN
    uint64_t         : Buffered I/O req PFN
    uint64_t         : Store PFN
 Xen HVM Context:
    uint32_t         : Length of context in bytes
    bytes            : Context data
 Qemu context:
    char[21]         : Signature:
      "QemuDeviceModelRecord" : Read Qemu save data until EOF
      "DeviceModelRecord0002" : uint32_t length field followed by that many
                                bytes of Qemu save data
      "RemusDeviceModelState" : Currently the same as "DeviceModelRecord0002".
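
A minimal Python sketch of dispatching on the Qemu signature for a tail
held in memory. It assumes the uint32_t length is little-endian and
treats the end of the buffer as EOF; split_qemu_record is a
hypothetical name.

```python
import struct

SIG_LEN = 21


def split_qemu_record(buf):
    # Returns (signature, qemu_save_data) for the Qemu context in buf.
    sig = bytes(buf[:SIG_LEN])
    if sig == b"QemuDeviceModelRecord":
        # Oldest form: everything up to EOF is Qemu save data.
        return sig, bytes(buf[SIG_LEN:])
    if sig in (b"DeviceModelRecord0002", b"RemusDeviceModelState"):
        # uint32_t length followed by that many bytes of save data.
        (length,) = struct.unpack_from("<I", buf, SIG_LEN)
        start = SIG_LEN + 4
        return sig, bytes(buf[start:start + length])
    raise ValueError("unrecognised Qemu signature")


record = b"DeviceModelRecord0002" + struct.pack("<I", 3) + b"xyz" + b"junk"
sig, payload = split_qemu_record(record)
```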

PV TAIL:

 Unmapped PFN list   : list of all the PFNs that were not in map at the close
    unsigned int     : Number of unmapped pages
    unsigned long[]  : PFNs of unmapped pages

 VCPU context data   : A series of VCPU records, one per present VCPU
                       Maximum and present map supplied in XC_SAVE_ID_VCPUINFO
    bytes            : VCPU context structure. Size is determined by size
                       provided in extended-info header
    bytes[128]       : Extended VCPU context (present IFF "extv" block
                       present in extended-info header)

 Shared Info Page    : 4096 bytes of shared info page
"""

CHUNK_end                       = 0
CHUNK_enable_verify_mode        = -1
CHUNK_vcpu_info                 = -2
CHUNK_hvm_ident_pt              = -3
CHUNK_hvm_vm86_tss              = -4
CHUNK_tmem                      = -5
CHUNK_tmem_extra                = -6
CHUNK_tsc_info                  = -7
CHUNK_hvm_console_pfn           = -8
CHUNK_last_checkpoint           = -9
CHUNK_hvm_acpi_ioports_location = -10
CHUNK_hvm_viridian              = -11
CHUNK_compressed_data           = -12
CHUNK_enable_compression        = -13
CHUNK_hvm_generation_id_addr    = -14
CHUNK_hvm_paging_ring_pfn       = -15
CHUNK_hvm_monitor_ring_pfn      = -16
CHUNK_hvm_sharing_ring_pfn      = -17
CHUNK_toolstack                 = -18
CHUNK_hvm_ioreq_server_pfn      = -19
CHUNK_hvm_nr_ioreq_server_pages = -20

chunk_type_to_str = {
    CHUNK_end                       : "end",
    CHUNK_enable_verify_mode        : "enable_verify_mode",
    CHUNK_vcpu_info                 : "vcpu_info",
    CHUNK_hvm_ident_pt              : "hvm_ident_pt",
    CHUNK_hvm_vm86_tss              : "hvm_vm86_tss",
    CHUNK_tmem                      : "tmem",
    CHUNK_tmem_extra                : "tmem_extra",
    CHUNK_tsc_info                  : "tsc_info",
    CHUNK_hvm_console_pfn           : "hvm_console_pfn",
    CHUNK_last_checkpoint           : "last_checkpoint",
    CHUNK_hvm_acpi_ioports_location : "hvm_acpi_ioports_location",
    CHUNK_hvm_viridian              : "hvm_viridian",
    CHUNK_compressed_data           : "compressed_data",
    CHUNK_enable_compression        : "enable_compression",
    CHUNK_hvm_generation_id_addr    : "hvm_generation_id_addr",
    CHUNK_hvm_paging_ring_pfn       : "hvm_paging_ring_pfn",
    CHUNK_hvm_monitor_ring_pfn      : "hvm_monitor_ring_pfn",
    CHUNK_hvm_sharing_ring_pfn      : "hvm_sharing_ring_pfn",
    CHUNK_toolstack                 : "toolstack",
    CHUNK_hvm_ioreq_server_pfn      : "hvm_ioreq_server_pfn",
    CHUNK_hvm_nr_ioreq_server_pages : "hvm_nr_ioreq_server_pages",
}

# Up to 1024 pages (4MB) at a time
MAX_BATCH = 1024

# Maximum #VCPUs currently supported for save/restore
MAX_VCPU_ID = 4095


"""
Libxl:

Legacy "toolstack" record layout:

Version 1:
  uint32_t version
  QEMU physmap data:
    uint32_t count
    libxl__physmap_info * count

The problem is that libxl__physmap_info was declared as:

struct libxl__physmap_info {
    uint64_t phys_offset;
    uint64_t start_addr;
    uint64_t size;
    uint32_t namelen;
    char name[];
};

This structure has 4 bytes of padding at the end in a 64-bit build, so its
size is not the same between 32-bit and 64-bit builds.

Because of the pointer arithmetic used to construct the record, the 'name'
was shifted up to start at the padding, leaving the erroneous 4 bytes at the
end of the name string, after the NUL terminator.
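
The size difference can be reproduced with a short Python sketch that
applies C layout rules by hand. It assumes the usual x86 ABIs, where a
uint64_t is 8-byte aligned in a 64-bit build but only 4-byte aligned in
a 32-bit build; layout is a hypothetical helper, not libxl code.

```python
def round_up(value, align):
    return (value + align - 1) // align * align


def layout(field_sizes, max_align):
    # Returns (end_of_last_field, sizeof_struct) for scalar fields laid
    # out in declaration order. Each field aligns to
    # min(its size, max_align); the struct aligns to its strictest field.
    offset = 0
    struct_align = 1
    for size in field_sizes:
        align = min(size, max_align)
        struct_align = max(struct_align, align)
        offset = round_up(offset, align) + size
    return offset, round_up(offset, struct_align)


# phys_offset, start_addr, size (uint64_t each) and namelen (uint32_t).
# name[] is a char array, so it starts right after namelen.
fields = [8, 8, 8, 4]
name_offset_64, sizeof_64 = layout(fields, 8)
name_offset_32, sizeof_32 = layout(fields, 4)
padding_64 = sizeof_64 - name_offset_64
```

Under these assumptions 'name' starts at offset 28 in both builds, while
sizeof() is 32 in the 64-bit build and 28 in the 32-bit build. Sizing a
record as sizeof() plus namelen therefore leaves 4 stray bytes after the
name in the 64-bit case.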

Instead, the information described here has been changed to fit in a new
EMULATOR_XENSTORE_DATA record made of NUL terminated strings.
"""
