# Xenstore Migration
2
## Background
4
The design for *Non-Cooperative Migration of Guests*[1] explains that extra
save records are required in the migration stream to allow a guest running PV
drivers to be migrated without its co-operation. Moreover, the save records must
include details of registered xenstore watches as well as content. This
information cannot currently be recovered from `xenstored`, and hence some
extension to the xenstored implementations will also be required.
11
A similar set of data is needed for transferring xenstore data from one
instance to another when live updating xenstored, so this document proposes an
image format for a 'migration stream' suitable for both purposes.
15
## Proposal
17
The image format consists of a _header_ followed by one or more _records_. Each
record consists of a type field and a length field, followed by any data
mandated by the record type. At minimum there will be a single record of type
`END` (defined below).
22
### Header
24
25The header identifies the stream as a `xenstore` stream, including the version
26of the specification that it complies with.
27
28All fields in this header must be in _big-endian_ byte order, regardless of
29the setting of the endianness bit.
30
31
32```
33    0       1       2       3       4       5       6       7    octet
34+-------+-------+-------+-------+-------+-------+-------+-------+
35| ident                                                         |
+-------------------------------+-------------------------------+
37| version                       | flags                         |
38+-------------------------------+-------------------------------+
39```
40
41
42| Field     | Description                                       |
43|-----------|---------------------------------------------------|
44| `ident`   | 0x78656e73746f7265 ('xenstore' in ASCII)          |
45|           |                                                   |
46| `version` | The version of the specification, defined values: |
47|           | 0x00000001: all fields and records without any    |
48|           |             explicitly mentioned version          |
49|           |             dependency are valid.                 |
50|           | 0x00000002: all fields and records valid for      |
51|           |             version 1 plus fields and records     |
52|           |             explicitly stated to be supported in  |
53|           |             version 2 are valid.                  |
54|           |                                                   |
55| `flags`   | 0 (LSB): Endianness: 0 = little, 1 = big          |
56|           |                                                   |
57|           | 1-31: Reserved (must be zero)                     |
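
As an illustration of the header layout, the following Python sketch builds and
validates the 16-octet header. The function and constant names are
hypothetical, and rejecting a stream with reserved flag bits set is a design
choice of this sketch rather than something the format mandates for readers.

```python
import struct

XS_IDENT = b"xenstore"            # 0x78656e73746f7265
HEADER = struct.Struct(">8sII")   # ident, version, flags: always big-endian
FLAG_BIG_ENDIAN = 0x1             # bit 0 of `flags`


def pack_header(version, big_endian=False):
    """Build the 16-octet stream header."""
    flags = FLAG_BIG_ENDIAN if big_endian else 0
    return HEADER.pack(XS_IDENT, version, flags)


def parse_header(buf):
    """Return (version, struct byte-order prefix) or raise ValueError."""
    ident, version, flags = HEADER.unpack_from(buf, 0)
    if ident != XS_IDENT:
        raise ValueError("not a xenstore stream")
    if version not in (1, 2):
        raise ValueError("unsupported version %d" % version)
    if flags & ~FLAG_BIG_ENDIAN:
        # Assumption of this sketch: treat non-zero reserved bits as an error.
        raise ValueError("reserved flag bits set")
    return version, (">" if flags & FLAG_BIG_ENDIAN else "<")
```

The returned prefix (`"<"` or `">"`) can be handed to `struct` when decoding
the rest of the stream, on the assumption that the endianness bit governs all
multi-octet fields outside the header.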
58
### Records
60
61Records immediately follow the header and have the following format:
62
63
64```
65    0       1       2       3       4       5       6       7    octet
66+-------+-------+-------+-------+-------+-------+-------+-------+
67| type                          | len                           |
68+-------------------------------+-------------------------------+
69| body
70...
71|       | padding (0 to 7 octets)                               |
72+-------+-------------------------------------------------------+
73```
74
NOTE: Here and in all subsequent format specifications, padding octets and
      fields not valid in the version in use must be written as zero and
      should be ignored when the stream is read.
78
79
80| Field  | Description                                          |
81|--------|------------------------------------------------------|
82| `type` | 0x00000000: END                                      |
83|        | 0x00000001: GLOBAL_DATA                              |
84|        | 0x00000002: CONNECTION_DATA                          |
85|        | 0x00000003: WATCH_DATA                               |
86|        | 0x00000004: TRANSACTION_DATA                         |
87|        | 0x00000005: NODE_DATA                                |
88|        | 0x00000006: GLOBAL_QUOTA_DATA                        |
89|        | 0x00000007: DOMAIN_DATA                              |
90|        | 0x00000008: WATCH_DATA_EXTENDED (version 2 and up)   |
91|        | 0x00000009 - 0xFFFFFFFF: reserved for future use     |
92|        |                                                      |
93| `len`  | The length (in octets) of `body`                     |
94|        |                                                      |
95| `body` | The type-specific record data                        |
96
97Some records will depend on other records in the migration stream. Records
98upon which other records depend must always appear earlier in the stream.
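
The record framing can be sketched in Python as below. This is a minimal
illustration, not the xenstored implementation: the helper names are
hypothetical, and it assumes that `type` and `len` use the byte order selected
by the header's endianness flag and that padding brings each record up to the
next 8-octet boundary.

```python
import struct

REC_END = 0x00000000


def pad8(n):
    """Number of padding octets needed to reach the next 8-octet boundary."""
    return -n % 8


def pack_record(rec_type, body, endian="<"):
    """Emit type, len, body and zero padding."""
    head = struct.pack(endian + "II", rec_type, len(body))
    return head + body + bytes(pad8(len(body)))


def iter_records(buf, offset, endian="<"):
    """Yield (type, body) pairs starting at `offset`, stopping after END."""
    while True:
        rec_type, length = struct.unpack_from(endian + "II", buf, offset)
        offset += 8
        body = buf[offset:offset + length]
        if len(body) != length:
            raise ValueError("truncated record")
        yield rec_type, body
        if rec_type == REC_END:
            return
        # `len` counts only the body, so skip the padding separately.
        offset += length + pad8(length)
```

A reader would typically call `iter_records(stream, 16, endian)` so that
parsing starts directly after the 16-octet header.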
99
100The various formats of the type-specific data are described in the following
101sections:
102
103\pagebreak
104
### END
106
107The end record marks the end of the image, and is the final record
108in the stream.
109
110```
111    0       1       2       3       4       5       6       7    octet
112+-------+-------+-------+-------+-------+-------+-------+-------+
113```
114
115
116The end record contains no fields; its body length is 0.
117
118\pagebreak
119
### GLOBAL_DATA
121
122This record is only relevant for live update. It contains details of global
123xenstored state that needs to be restored.
124
125```
126    0       1       2       3    octet
127+-------+-------+-------+-------+
128| rw-socket-fd                  |
129+-------------------------------+
130| evtchn-fd                     |
131+-------------------------------+
132```
133
134
135| Field          | Description                                  |
136|----------------|----------------------------------------------|
137| `rw-socket-fd` | The file descriptor of the socket accepting  |
138|                | read-write connections                       |
139|                |                                              |
140| `evtchn-fd`    | The file descriptor used to communicate with |
141|                | the event channel driver                     |
142
143xenstored will resume in the original process context. Hence `rw-socket-fd`
144simply specifies the file descriptor of the socket. Sockets are not always
145used, however, and so -1 will be used to denote an unused socket.
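
A minimal decoding sketch in Python follows. It assumes both fields are signed
32-bit integers (so that -1 is representable) in the stream's byte order; the
function name and the use of `None` for an unused socket are choices of this
example only.

```python
import struct


def parse_global_data(body, endian="<"):
    """Decode a GLOBAL_DATA body into a dict of file descriptors."""
    rw_socket_fd, evtchn_fd = struct.unpack_from(endian + "ii", body, 0)
    return {
        # -1 denotes "no socket in use".
        "rw_socket_fd": None if rw_socket_fd == -1 else rw_socket_fd,
        "evtchn_fd": evtchn_fd,
    }
```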
146
147\pagebreak
148
### CONNECTION_DATA
150
151For live update the image format will contain a `CONNECTION_DATA` record for
152each connection to xenstore. For migration it will only contain a record for
153the domain being migrated.
154
155
156```
157    0       1       2       3       4       5       6       7    octet
158+-------+-------+-------+-------+-------+-------+-------+-------+
159| conn-id                       | conn-type     | fields        |
160+-------------------------------+---------------+---------------+
161| conn-spec
162...
163+---------------+---------------+-------------------------------+
164| in-data-len   | out-resp-len  | out-data-len                  |
165+---------------+---------------+-------------------------------+
166| data
167...
168+---------------------------------------------------------------+
169| unique-id                                                     |
170+---------------------------------------------------------------+
171```
172
173
174| Field          | Description                                  |
175|----------------|----------------------------------------------|
176| `conn-id`      | A non-zero number used to identify this      |
177|                | connection in subsequent connection-specific |
178|                | records                                      |
179|                |                                              |
180| `conn-type`    | 0x0000: shared ring                          |
181|                | 0x0001: socket                               |
182|                | 0x0002 - 0xFFFF: reserved for future use     |
183|                |                                              |
184| `fields`       | A collection of flags indicating presence    |
185|                | of additional fields after the variable      |
186|                | length `data` part. The additional fields    |
187|                | will start after a possible padding for      |
|                | aligning to an 8 octet boundary.             |
|                | Defined flag values (to be or-ed):           |
|                | 0x0001: `unique-id` present (only needed for |
191|                |         `shared ring` connection in live     |
192|                |         update streams).                     |
193|                |                                              |
194| `conn-spec`    | See below                                    |
195|                |                                              |
196| `in-data-len`  | The length (in octets) of any data read      |
197|                | from the connection not yet processed        |
198|                |                                              |
199| `out-resp-len` | The length (in octets) of a partial response |
200|                | not yet written to the connection            |
201|                |                                              |
202| `out-data-len` | The length (in octets) of any pending data   |
203|                | not yet written to the connection, including |
204|                | a partial response (see `out-resp-len`)      |
205|                |                                              |
206| `data`         | Pending data: first in-data-len octets of    |
207|                | read data, then out-data-len octets of       |
|                | written data (either or both may be empty)   |
209|                |                                              |
210| `unique-id`    | Unique identifier of a domain                |
211|                |                                              |
212
In the case of live update, the connection record for the connection via which
the live update command was issued will contain the response to that command
in its pending, not yet written data.
216
217\pagebreak
218
219The format of `conn-spec` is dependent upon `conn-type`.
220
221For `shared ring` connections it is as follows:
222
223
224```
225    0       1       2       3       4       5       6       7    octet
226+---------------+---------------+---------------+---------------+
227| domid         | tdomid        | evtchn                        |
228+-------------------------------+-------------------------------+
229```
230
231
232| Field     | Description                                       |
233|-----------|---------------------------------------------------|
234| `domid`   | The domain-id that owns the shared page           |
235|           |                                                   |
236| `tdomid`  | The domain-id that `domid` acts on behalf of if   |
|           | it has been subject to a `SET_TARGET`             |
|           | operation [2], or `DOMID_INVALID` [3] otherwise   |
239|           |                                                   |
240| `evtchn`  | The port number of the interdomain channel used   |
241|           | by xenstored to communicate with `domid`          |
242|           |                                                   |
243
244The GFN of the shared page is not preserved because the ABI reserves
245entry 1 in `domid`'s grant table to point to the xenstore shared page.
246Note there is no guarantee the page will still be valid at the time of
247the restore because a domain can revoke the permission.
248
249For `socket` connections it is as follows:
250
251
252```
253+---------------+---------------+---------------+---------------+
254| socket-fd                     | pad                           |
255+-------------------------------+-------------------------------+
256```
257
258
259| Field       | Description                                     |
260|-------------|-------------------------------------------------|
261| `socket-fd` | The file descriptor of the connected socket     |
262
This type of connection is only relevant for live update, where xenstored
resumes in the original process context. Hence `socket-fd` simply specifies
the file descriptor of the socket connection.
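
Putting the `CONNECTION_DATA` layout and both `conn-spec` variants together, a
reader could decode a record body roughly as follows. This is a sketch under
stated assumptions: field widths are read off the diagrams above (with
`unique-id` taken to be an unsigned 64-bit value and `socket-fd` a signed
32-bit one), the byte order follows the header's endianness flag, and all
Python names are hypothetical.

```python
import struct

CONN_SHARED_RING = 0x0000
CONN_SOCKET = 0x0001
FIELD_UNIQUE_ID = 0x0001


def parse_connection_data(body, endian="<"):
    """Decode one CONNECTION_DATA body into a dict."""
    conn_id, conn_type, fields = struct.unpack_from(endian + "IHH", body, 0)
    if conn_id == 0:
        raise ValueError("conn-id must be non-zero")

    # conn-spec occupies 8 octets for both defined connection types.
    if conn_type == CONN_SHARED_RING:
        domid, tdomid, evtchn = struct.unpack_from(endian + "HHI", body, 8)
        spec = {"domid": domid, "tdomid": tdomid, "evtchn": evtchn}
    elif conn_type == CONN_SOCKET:
        (socket_fd,) = struct.unpack_from(endian + "i", body, 8)  # pad ignored
        spec = {"socket_fd": socket_fd}
    else:
        raise ValueError("reserved conn-type 0x%04x" % conn_type)

    in_len, resp_len, out_len = struct.unpack_from(endian + "HHI", body, 16)
    end = 24 + in_len + out_len
    if end > len(body):
        raise ValueError("truncated data")
    conn = {
        "conn_id": conn_id,
        "conn_type": conn_type,
        "spec": spec,
        "in_data": body[24:24 + in_len],    # read but not yet processed
        "out_data": body[24 + in_len:end],  # pending output
        "out_resp_len": resp_len,           # partial response within out_data
        "unique_id": None,
    }
    if fields & FIELD_UNIQUE_ID:
        end += -end % 8  # optional fields start on an 8-octet boundary
        (conn["unique_id"],) = struct.unpack_from(endian + "Q", body, end)
    return conn
```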
266
267\pagebreak
268
### WATCH_DATA
270
The image format will contain either a `WATCH_DATA` or a `WATCH_DATA_EXTENDED`
record for each watch registered by a connection for which a
`CONNECTION_DATA` record appears earlier in the stream.
274
275```
276    0       1       2       3    octet
277+-------+-------+-------+-------+
278| conn-id                       |
279+---------------+---------------+
280| wpath-len     | token-len     |
281+---------------+---------------+
282| wpath
283...
284| token
285...
286```
287
288
289| Field       | Description                                     |
290|-------------|-------------------------------------------------|
291| `conn-id`   | The connection that issued the `WATCH`          |
292|             | operation [2]                                   |
293|             |                                                 |
294| `wpath-len` | The length (in octets) of `wpath` including the |
295|             | NUL terminator                                  |
296|             |                                                 |
297| `token-len` | The length (in octets) of `token` including the |
298|             | NUL terminator                                  |
299|             |                                                 |
300| `wpath`     | The watch path, as specified in the `WATCH`     |
301|             | operation                                       |
302|             |                                                 |
303| `token`     | The watch identifier token, as specified in the |
304|             | `WATCH` operation                               |
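
The following Python sketch shows one way to decode a `WATCH_DATA` body. It
assumes the stream's byte order is passed in as a `struct` prefix; the function
name is hypothetical. Because both lengths include the NUL terminator, the
sketch checks for it and strips it from the returned values.

```python
import struct


def parse_watch_data(body, endian="<"):
    """Return (conn_id, wpath, token) with NUL terminators removed."""
    conn_id, wpath_len, token_len = struct.unpack_from(endian + "IHH", body, 0)
    wpath = body[8:8 + wpath_len]
    token = body[8 + wpath_len:8 + wpath_len + token_len]
    if len(wpath) != wpath_len or len(token) != token_len:
        raise ValueError("truncated WATCH_DATA record")
    if not wpath.endswith(b"\0") or not token.endswith(b"\0"):
        raise ValueError("missing NUL terminator")
    return conn_id, wpath[:-1], token[:-1]
```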
305
306\pagebreak
307
### WATCH_DATA_EXTENDED
309
The image format will contain either a `WATCH_DATA` or a `WATCH_DATA_EXTENDED`
record for each watch registered by a connection for which a
`CONNECTION_DATA` record appears earlier in the stream. The
`WATCH_DATA_EXTENDED` record type is valid only in version 2 and later.
314
315```
316    0       1       2       3    octet
317+-------+-------+-------+-------+
318| conn-id                       |
319+---------------+---------------+
320| wpath-len     | token-len     |
321+---------------+---------------+
322| depth         | pad           |
323+---------------+---------------+
324| wpath
325...
326| token
327...
328```
329
330
331| Field       | Description                                     |
332|-------------|-------------------------------------------------|
333| `conn-id`   | The connection that issued the `WATCH`          |
334|             | operation [2]                                   |
335|             |                                                 |
336| `wpath-len` | The length (in octets) of `wpath` including the |
337|             | NUL terminator                                  |
338|             |                                                 |
339| `token-len` | The length (in octets) of `token` including the |
340|             | NUL terminator                                  |
341|             |                                                 |
342| `depth`     | The number of directory levels below the        |
343|             | watched path to consider for a match.           |
344|             | A value of 0xffff is used for unlimited depth.  |
345|             |                                                 |
346| `wpath`     | The watch path, as specified in the `WATCH`     |
347|             | operation                                       |
348|             |                                                 |
349| `token`     | The watch identifier token, as specified in the |
350|             | `WATCH` operation                               |
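
For the writing side, a body for this record could be assembled as in the
sketch below. The names are hypothetical, and the byte order is assumed to
match the header's endianness flag. A writer would only emit this record when
the header announces version 2 or later.

```python
import struct

DEPTH_UNLIMITED = 0xFFFF


def pack_watch_data_extended(conn_id, wpath, token,
                             depth=DEPTH_UNLIMITED, endian="<"):
    """Build a WATCH_DATA_EXTENDED body from unterminated byte strings."""
    wpath_z = wpath + b"\0"  # lengths in the record include the terminator
    token_z = token + b"\0"
    head = struct.pack(endian + "IHHHH", conn_id,
                       len(wpath_z), len(token_z),
                       depth, 0)  # trailing 0 is the `pad` field
    return head + wpath_z + token_z
```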
351
352\pagebreak
353
### TRANSACTION_DATA
355
The image format will contain a `TRANSACTION_DATA` record for each transaction
that is pending on a connection for which a `CONNECTION_DATA` record appears
earlier in the stream.
359
360
361```
362    0       1       2       3    octet
363+-------+-------+-------+-------+
364| conn-id                       |
365+-------------------------------+
366| tx-id                         |
367+-------------------------------+
368```
369
370
371| Field          | Description                                  |
372|----------------|----------------------------------------------|
373| `conn-id`      | The connection that issued the               |
374|                | `TRANSACTION_START` operation [2]            |
375|                |                                              |
376| `tx-id`        | The transaction id passed back to the domain |
377|                | by the `TRANSACTION_START` operation         |
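
Since this record depends on an earlier `CONNECTION_DATA` record, a reader can
enforce the ordering while decoding. The sketch below assumes the caller keeps
a set of `conn-id` values seen so far; that bookkeeping and the function name
are illustrative only.

```python
import struct


def parse_transaction_data(body, known_conn_ids, endian="<"):
    """Return (conn_id, tx_id), checking the connection was seen earlier."""
    conn_id, tx_id = struct.unpack_from(endian + "II", body, 0)
    if conn_id not in known_conn_ids:
        raise ValueError("TRANSACTION_DATA for unknown conn-id %d" % conn_id)
    return conn_id, tx_id
```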
378
379\pagebreak
380
### NODE_DATA
382
For live update the image format will contain a `NODE_DATA` record for each
node in xenstore. For migration it will only contain a record for the nodes
relating to the domain being migrated. A `NODE_DATA` record may relate to
a _committed_ node (globally visible in xenstored) or a _pending_ node (created
or modified by a transaction for which a `TRANSACTION_DATA` record appears
earlier in the stream).
389
Each _committed_ node in the stream is required to have an already known parent
node. A parent node is known if it was either in the node database before
processing of the stream started, or if a `NODE_DATA` record for that parent
node has already been processed in the stream.
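
The "known parent" rule can be checked with a small amount of state, as in the
Python sketch below. It assumes absolute, `/`-separated paths with `/` as the
root; the helper names and the `preexisting` parameter (the paths already in
the node database) are hypothetical.

```python
def parent_path(path):
    """Return the parent of an absolute path, or None for the root."""
    if path == "/":
        return None
    head, _, _ = path.rpartition("/")
    return head or "/"


def check_committed_order(paths, preexisting=()):
    """Verify each committed node's parent is known before the node itself."""
    known = set(preexisting)
    for path in paths:
        parent = parent_path(path)
        if parent is not None and parent not in known:
            raise ValueError("parent of %s not yet known" % path)
        known.add(path)
    return known
```

For a migration stream, `preexisting` would hold the nodes already present on
the receiving side, whereas a live update stream would be expected to supply
every node itself, starting from the root.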
394
395
396```
397    0       1       2       3    octet
398+-------+-------+-------+-------+
399| conn-id                       |
400+-------------------------------+
401| tx-id                         |
402+---------------+---------------+
403| path-len      | value-len     |
404+---------------+---------------+
405| access        | perm-count    |
406+---------------+---------------+
407| perm1                         |
408+-------------------------------+
409...
410+-------------------------------+
411| permN                         |
412+---------------+---------------+
413| path
414...
415| value
416...
417```
418
419
420| Field        | Description                                    |
421|--------------|------------------------------------------------|
422| `conn-id`    | If this value is non-zero then this record     |
|              | relates to a pending transaction               |
424|              |                                                |
425| `tx-id`      | This value should be ignored if `conn-id` is   |
426|              | zero. Otherwise it specifies the id of the     |
427|              | pending transaction                            |
428|              |                                                |
429| `path-len`   | The length (in octets) of `path` including the |
430|              | NUL terminator                                 |
431|              |                                                |
432| `value-len`  | The length (in octets) of `value` (which will  |
433|              | be zero for a deleted node)                    |
434|              |                                                |
435| `access`     | This value should be ignored if this record    |
436|              | does not relate to a pending transaction,      |
437|              | otherwise it specifies the accesses made to    |
438|              | the node and hence is a bitwise OR of:         |
439|              |                                                |
440|              | 0x0001: read                                   |
441|              | 0x0002: written                                |
442|              |                                                |
443|              | The value will be zero for a deleted node      |
444|              |                                                |
445| `perm-count` | The number (N) of node permission specifiers   |
446|              | (which will be 0 for a node deleted in a       |
447|              | pending transaction)                           |
448|              |                                                |
449| `perm1..N`   | A list of zero or more node permission         |
450|              | specifiers (see below)                         |
451|              |                                                |
452| `path`       | The absolute path of the node                  |
453|              |                                                |
454| `value`      | The node value (which may be empty or contain  |
455|              | NUL octets)                                    |
456
457
458A node permission specifier has the following format:
459
460
461```
462    0       1       2       3    octet
463+-------+-------+-------+-------+
464| perm  | flags | domid         |
465+-------+-------+---------------+
466```
467
468| Field   | Description                                         |
469|---------|-----------------------------------------------------|
470| `perm`  | One of the ASCII values `w`, `r`, `b` or `n` as     |
471|         | specified for the `SET_PERMS` operation [2]         |
472|         |                                                     |
473| `flags` | A bit-wise OR of:                                   |
474|         | 0x01: stale permission, ignore when checking        |
475|         |       permissions                                   |
476|         |                                                     |
477| `domid` | The domain-id to which the permission relates       |
478
479Note that perm1 defines the domain owning the node. See [4] for more
480explanation of node permissions.
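
A reader might decode a `NODE_DATA` body, including its permission specifiers,
along the lines of the following Python sketch. Field widths are taken from the
diagrams above, the byte order is assumed to follow the header's endianness
flag, and the dictionary keys are invented for this example.

```python
import struct

ACCESS_READ = 0x0001
ACCESS_WRITTEN = 0x0002
PERM_STALE = 0x01


def parse_node_data(body, endian="<"):
    """Decode one NODE_DATA body into a dict."""
    conn_id, tx_id, path_len, value_len, access, perm_count = \
        struct.unpack_from(endian + "IIHHHH", body, 0)
    offset = 16
    perms = []
    for _ in range(perm_count):
        perm, flags, domid = struct.unpack_from(endian + "cBH", body, offset)
        perms.append((perm.decode("ascii"), bool(flags & PERM_STALE), domid))
        offset += 4
    path = body[offset:offset + path_len]
    value = body[offset + path_len:offset + path_len + value_len]
    if len(path) != path_len or len(value) != value_len:
        raise ValueError("truncated NODE_DATA record")
    if not path.endswith(b"\0"):
        raise ValueError("path is not NUL terminated")
    pending = conn_id != 0
    return {
        "pending": pending,
        "conn_id": conn_id if pending else None,
        "tx_id": tx_id if pending else None,    # ignored for committed nodes
        "access": access if pending else None,  # likewise
        "perms": perms,                         # (perm, stale, domid) tuples
        "owner": perms[0][2] if perms else None,
        "path": path[:-1],
        "value": value,                         # may contain NUL octets
    }
```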
481
482\pagebreak
483
### GLOBAL_QUOTA_DATA
485
486This record is only relevant for live update. It contains the global settings
487of xenstored quota.
488
489```
490    0       1       2       3    octet
491+-------+-------+-------+-------+
492| n-dom-quota   | n-glob-quota  |
493+---------------+---------------+
494| quota-val 1                   |
495+-------------------------------+
496...
497+-------------------------------+
498| quota-val N                   |
499+-------------------------------+
500| quota-names
501...
502```
503
504
505| Field          | Description                                  |
506|----------------|----------------------------------------------|
507| `n-dom-quota`  | Number of quota values which apply per       |
|                | domain by default.                           |
509|                |                                              |
510| `n-glob-quota` | Number of quota values which apply globally  |
511|                | only.                                        |
512|                |                                              |
513| `quota-val`    | Quota values, first the ones applying per    |
514|                | domain, then the ones applying globally. A   |
515|                | value of 0 has the semantics of "unlimited". |
516|                |                                              |
| `quota-names`  | NUL-delimited strings of the quota names in  |
518|                | the same sequence as the `quota-val` values. |
519
520
521Allowed quota names are those explicitly named in [2] for the `GET_QUOTA`
522and `SET_QUOTA` commands, plus implementation specific ones. Quota names not
523recognized by the receiving side should not have any effect on behavior for
524the receiving side (they can be ignored or preserved for inclusion in
525future live migration/update streams).
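
As a sketch of how the values and names pair up, the Python function below
splits a `GLOBAL_QUOTA_DATA` body into per-domain defaults and global-only
quota. It assumes 32-bit unsigned quota values in the stream's byte order and
tolerates a trailing NUL after the last name; the function name is
hypothetical.

```python
import struct


def parse_global_quota_data(body, endian="<"):
    """Return (per_domain_defaults, global_only) as name -> value dicts."""
    n_dom, n_glob = struct.unpack_from(endian + "HH", body, 0)
    total = n_dom + n_glob
    values = struct.unpack_from(endian + "%dI" % total, body, 4)
    names = body[4 + 4 * total:].split(b"\0")[:total]
    if len(names) != total or b"" in names:
        raise ValueError("quota-names does not match quota-val count")
    names = [n.decode("ascii") for n in names]
    # Per-domain values come first, then the global-only ones.
    per_domain = dict(zip(names[:n_dom], values[:n_dom]))
    global_only = dict(zip(names[n_dom:], values[n_dom:]))
    return per_domain, global_only
```

A receiver can then look up only the names it recognizes and leave the rest
untouched, in line with the rule for unrecognized quota names.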
526
527\pagebreak
528
### DOMAIN_DATA
530
531This record is optional and can be present once for each domain.
532
533
534```
    0       1       2       3    octet
536+-------+-------+-------+-------+
537| domain-id     | n-quota       |
538+---------------+---------------+
539| features                      |
540+-------------------------------+
541| quota-val 1                   |
542+-------------------------------+
543...
544+-------------------------------+
545| quota-val N                   |
546+-------------------------------+
547| quota-names
548...
549```
550
551
552| Field          | Description                                  |
553|----------------|----------------------------------------------|
554| `domain-id`    | The domain-id of the domain this record      |
555|                | belongs to.                                  |
556|                |                                              |
557| `n-quota`      | Number of quota values.                      |
558|                |                                              |
| `features`     | Value of the feature field visible to the    |
560|                | guest at offset 2064 of the ring page.       |
561|                | Only valid for version 2 and later.          |
562|                |                                              |
| `quota-val`    | Quota values. A value of 0 has the semantics |
|                | of "unlimited".                              |
565|                |                                              |
| `quota-names`  | NUL-delimited strings of the quota names in  |
567|                | the same sequence as the `quota-val` values. |
568
569Allowed quota names are those explicitly named in [2] for the `GET_QUOTA`
570and `SET_QUOTA` commands, plus implementation specific ones. Quota names not
571recognized by the receiving side should not have any effect on behavior for
572the receiving side (they can be ignored or preserved for inclusion in
573future live migration/update streams).
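
A `DOMAIN_DATA` body could be decoded as sketched below. The sketch assumes
32-bit unsigned quota values in the stream's byte order and, following the
rule for fields not valid in the version in use, ignores `features` when the
stream version is below 2. The function name and return shape are illustrative.

```python
import struct


def parse_domain_data(body, version, endian="<"):
    """Return (domain_id, features, quota) for one DOMAIN_DATA body."""
    domid, n_quota, features = struct.unpack_from(endian + "HHI", body, 0)
    if version < 2:
        features = 0  # field only valid from version 2 on: ignore its content
    values = struct.unpack_from(endian + "%dI" % n_quota, body, 8)
    names = body[8 + 4 * n_quota:].split(b"\0")[:n_quota]
    if len(names) != n_quota or b"" in names:
        raise ValueError("quota-names does not match n-quota")
    quota = dict(zip((n.decode("ascii") for n in names), values))
    return domid, features, quota
```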
574
575\pagebreak
576
577
578* * *
579
580[1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md
581
582[2] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore.txt
583
584[3] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/xen.h;hb=HEAD#l612
585
[4] See https://wiki.xen.org/wiki/XenBus
587