# Xenstore Migration

## Background

The design for *Non-Cooperative Migration of Guests*[1] explains that extra
save records are required in the migration stream to allow a guest running PV
drivers to be migrated without its co-operation. Moreover the save records must
include details of registered xenstore watches as well as content; information
that cannot currently be recovered from `xenstored`, and hence some extension
to the xenstored implementations will also be required.

As a similar set of data is needed for transferring xenstore data from one
instance to another when live updating xenstored this document proposes an
image format for a 'migration stream' suitable for both purposes.

## Proposal

The image format consists of a _header_ followed by 1 or more _records_. Each
record consists of a type and length field, followed by any data mandated by
the record type. At minimum there will be a single record of type `END`
(defined below).

### Header

The header identifies the stream as a `xenstore` stream, including the version
of the specification that it complies with.

All fields in this header must be in _big-endian_ byte order, regardless of
the setting of the endianness bit.


```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
| ident                                                         |
+-------------------------------+-------------------------------+
| version                       | flags                         |
+-------------------------------+-------------------------------+
```


| Field     | Description                                       |
|-----------|---------------------------------------------------|
| `ident`   | 0x78656e73746f7265 ('xenstore' in ASCII)          |
|           |                                                   |
| `version` | The version of the specification, defined values: |
|           | 0x00000001: all fields and records without any    |
|           |             explicitly mentioned version          |
|           |             dependency are valid.                 |
|           | 0x00000002: all fields and records valid for      |
|           |             version 1 plus fields and records     |
|           |             explicitly stated to be supported in  |
|           |             version 2 are valid.                  |
|           |                                                   |
| `flags`   | 0 (LSB): Endianness: 0 = little, 1 = big          |
|           |                                                   |
|           | 1-31: Reserved (must be zero)                     |

### Records

Records immediately follow the header and have the following format:


```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
| type                          | len                           |
+-------------------------------+-------------------------------+
| body
...
|       | padding (0 to 7 octets)                               |
+-------+-------------------------------------------------------+
```

NOTE: padding octets or fields not valid in the used version here and in all
      subsequent format specifications must be written as zero and should be
      ignored when the stream is read.


| Field  | Description                                          |
|--------|------------------------------------------------------|
| `type` | 0x00000000: END                                      |
|        | 0x00000001: GLOBAL_DATA                              |
|        | 0x00000002: CONNECTION_DATA                          |
|        | 0x00000003: WATCH_DATA                               |
|        | 0x00000004: TRANSACTION_DATA                         |
|        | 0x00000005: NODE_DATA                                |
|        | 0x00000006: GLOBAL_QUOTA_DATA                        |
|        | 0x00000007: DOMAIN_DATA                              |
|        | 0x00000008: WATCH_DATA_EXTENDED (version 2 and up)   |
|        | 0x00000009 - 0xFFFFFFFF: reserved for future use     |
|        |                                                      |
| `len`  | The length (in octets) of `body`                     |
|        |                                                      |
| `body` | The type-specific record data                        |

Some records will depend on other records in the migration stream. Records
upon which other records depend must always appear earlier in the stream.

The various formats of the type-specific data are described in the following
sections:

\pagebreak

### END

The end record marks the end of the image, and is the final record
in the stream.
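
The header and record framing described above, together with the `END` record, suggest a simple reader loop. The following Python sketch is illustrative only and rests on two assumptions that this document does not state outright: that record fields use the byte order selected by the endianness flag (only the header is said to be always big-endian), and that the padding rounds each body up to a multiple of 8 octets. The names `read_header` and `iter_records` are hypothetical.

```python
import io
import struct

IDENT = b"xenstore"      # 0x78656e73746f7265
REC_END = 0x00000000     # record type END


def read_header(stream):
    """Read the 16-octet header and return (version, flags).

    The header is always big-endian, whatever the endianness flag says.
    """
    raw = stream.read(16)
    if len(raw) != 16:
        raise ValueError("truncated header")
    ident, version, flags = struct.unpack(">8sII", raw)
    if ident != IDENT:
        raise ValueError("not a xenstore stream")
    if flags & ~1:
        raise ValueError("reserved flag bits must be zero")
    return version, flags


def iter_records(stream, flags):
    """Yield (type, body) for every record up to and including END.

    Assumption: record fields follow the endianness flag, and the body is
    padded with 0 to 7 octets up to the next 8-octet boundary.
    """
    order = ">" if flags & 1 else "<"
    while True:
        raw = stream.read(8)
        if len(raw) != 8:
            raise ValueError("stream ended without an END record")
        rtype, length = struct.unpack(order + "II", raw)
        body = stream.read(length)
        pad = -length % 8
        padding = stream.read(pad)   # padding content is ignored on read
        if len(body) != length or len(padding) != pad:
            raise ValueError("truncated record")
        yield rtype, body
        if rtype == REC_END:
            return
```

A caller would first call `read_header`, then dispatch on the record type returned by `iter_records`, stopping once the `END` record has been seen.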

```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
```


The end record contains no fields; its body length is 0.

\pagebreak

### GLOBAL_DATA

This record is only relevant for live update. It contains details of global
xenstored state that needs to be restored.

```
    0       1       2       3    octet
+-------+-------+-------+-------+
| rw-socket-fd                  |
+-------------------------------+
| evtchn-fd                     |
+-------------------------------+
```


| Field          | Description                                  |
|----------------|----------------------------------------------|
| `rw-socket-fd` | The file descriptor of the socket accepting  |
|                | read-write connections                       |
|                |                                              |
| `evtchn-fd`    | The file descriptor used to communicate with |
|                | the event channel driver                     |

xenstored will resume in the original process context. Hence `rw-socket-fd`
simply specifies the file descriptor of the socket. Sockets are not always
used, however, and so -1 will be used to denote an unused socket.

\pagebreak

### CONNECTION_DATA

For live update the image format will contain a `CONNECTION_DATA` record for
each connection to xenstore. For migration it will only contain a record for
the domain being migrated.


```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
| conn-id                       | conn-type     | fields        |
+-------------------------------+---------------+---------------+
| conn-spec
...
+---------------+---------------+-------------------------------+
| in-data-len   | out-resp-len  | out-data-len                  |
+---------------+---------------+-------------------------------+
| data
...
+---------------------------------------------------------------+
| unique-id                                                     |
+---------------------------------------------------------------+
```


| Field          | Description                                  |
|----------------|----------------------------------------------|
| `conn-id`      | A non-zero number used to identify this      |
|                | connection in subsequent connection-specific |
|                | records                                      |
|                |                                              |
| `conn-type`    | 0x0000: shared ring                          |
|                | 0x0001: socket                               |
|                | 0x0002 - 0xFFFF: reserved for future use     |
|                |                                              |
| `fields`       | A collection of flags indicating presence    |
|                | of additional fields after the variable      |
|                | length `data` part. The additional fields    |
|                | will start after a possible padding for      |
|                | aligning to an 8 octet boundary.             |
|                | Defined flag values (to be or-ed):           |
|                | 0x0001: `unique-id` present (only needed for |
|                |         `shared ring` connection in live     |
|                |         update streams).                     |
|                |                                              |
| `conn-spec`    | See below                                    |
|                |                                              |
| `in-data-len`  | The length (in octets) of any data read      |
|                | from the connection not yet processed        |
|                |                                              |
| `out-resp-len` | The length (in octets) of a partial response |
|                | not yet written to the connection            |
|                |                                              |
| `out-data-len` | The length (in octets) of any pending data   |
|                | not yet written to the connection, including |
|                | a partial response (see `out-resp-len`)      |
|                |                                              |
| `data`         | Pending data: first in-data-len octets of    |
|                | read data, then out-data-len octets of       |
|                | written data (either or both may be empty)   |
|                |                                              |
| `unique-id`    | Unique identifier of a domain                |
|                |                                              |

In case of live update the connection record for the connection via which
the live update command was issued will contain the response for the live
update command in the pending not yet written data.

\pagebreak

The format of `conn-spec` is dependent upon `conn-type`.

For `shared ring` connections it is as follows:


```
    0       1       2       3       4       5       6       7    octet
+---------------+---------------+---------------+---------------+
| domid         | tdomid        | evtchn                        |
+-------------------------------+-------------------------------+
```


| Field     | Description                                       |
|-----------|---------------------------------------------------|
| `domid`   | The domain-id that owns the shared page           |
|           |                                                   |
| `tdomid`  | The domain-id that `domid` acts on behalf of if   |
|           | it has been subject to a SET_TARGET               |
|           | operation [2] or DOMID_INVALID [3] otherwise      |
|           |                                                   |
| `evtchn`  | The port number of the interdomain channel used   |
|           | by xenstored to communicate with `domid`          |
|           |                                                   |

The GFN of the shared page is not preserved because the ABI reserves
entry 1 in `domid`'s grant table to point to the xenstore shared page.
Note there is no guarantee the page will still be valid at the time of
the restore because a domain can revoke the permission.

For `socket` connections it is as follows:


```
+---------------+---------------+---------------+---------------+
| socket-fd                     | pad                           |
+-------------------------------+-------------------------------+
```


| Field       | Description                                     |
|-------------|-------------------------------------------------|
| `socket-fd` | The file descriptor of the connected socket     |

This type of connection is only relevant for live update, where xenstored
resumes in the original process context. Hence `socket-fd` simply specifies
the file descriptor of the socket connection.

\pagebreak

### WATCH_DATA

The image format will contain either a `WATCH_DATA` or a `WATCH_DATA_EXTENDED`
record for each watch registered by a connection for which there is a
`CONNECTION_DATA` record previously present.

```
    0       1       2       3    octet
+-------+-------+-------+-------+
| conn-id                       |
+---------------+---------------+
| wpath-len     | token-len     |
+---------------+---------------+
| wpath
...
| token
...
```


| Field       | Description                                     |
|-------------|-------------------------------------------------|
| `conn-id`   | The connection that issued the `WATCH`          |
|             | operation [2]                                   |
|             |                                                 |
| `wpath-len` | The length (in octets) of `wpath` including the |
|             | NUL terminator                                  |
|             |                                                 |
| `token-len` | The length (in octets) of `token` including the |
|             | NUL terminator                                  |
|             |                                                 |
| `wpath`     | The watch path, as specified in the `WATCH`     |
|             | operation                                       |
|             |                                                 |
| `token`     | The watch identifier token, as specified in the |
|             | `WATCH` operation                               |

\pagebreak

### WATCH_DATA_EXTENDED

The image format will contain either a `WATCH_DATA` or a `WATCH_DATA_EXTENDED`
record for each watch registered by a connection for which there is a
`CONNECTION_DATA` record previously present. The `WATCH_DATA_EXTENDED` record
type is valid only in version 2 and later.

```
    0       1       2       3    octet
+-------+-------+-------+-------+
| conn-id                       |
+---------------+---------------+
| wpath-len     | token-len     |
+---------------+---------------+
| depth         | pad           |
+---------------+---------------+
| wpath
...
| token
...
```


| Field       | Description                                     |
|-------------|-------------------------------------------------|
| `conn-id`   | The connection that issued the `WATCH`          |
|             | operation [2]                                   |
|             |                                                 |
| `wpath-len` | The length (in octets) of `wpath` including the |
|             | NUL terminator                                  |
|             |                                                 |
| `token-len` | The length (in octets) of `token` including the |
|             | NUL terminator                                  |
|             |                                                 |
| `depth`     | The number of directory levels below the        |
|             | watched path to consider for a match.           |
|             | A value of 0xffff is used for unlimited depth.  |
|             |                                                 |
| `wpath`     | The watch path, as specified in the `WATCH`     |
|             | operation                                       |
|             |                                                 |
| `token`     | The watch identifier token, as specified in the |
|             | `WATCH` operation                               |

\pagebreak

### TRANSACTION_DATA

The image format will contain a `TRANSACTION_DATA` record for each transaction
that is pending on a connection for which there is a `CONNECTION_DATA` record
previously present.


```
    0       1       2       3    octet
+-------+-------+-------+-------+
| conn-id                       |
+-------------------------------+
| tx-id                         |
+-------------------------------+
```


| Field          | Description                                  |
|----------------|----------------------------------------------|
| `conn-id`      | The connection that issued the               |
|                | `TRANSACTION_START` operation [2]            |
|                |                                              |
| `tx-id`        | The transaction id passed back to the domain |
|                | by the `TRANSACTION_START` operation         |

\pagebreak

### NODE_DATA

For live update the image format will contain a `NODE_DATA` record for each
node in xenstore. For migration it will only contain a record for the nodes
relating to the domain being migrated. The `NODE_DATA` may be related to
a _committed_ node (globally visible in xenstored) or a _pending_ node (created
or modified by a transaction for which there is also a `TRANSACTION_DATA`
record previously present).

Each _committed_ node in the stream is required to have an already known parent
node. A parent node is known if it was either in the node database before
processing of the stream started, or if a `NODE_DATA` record for that parent
node has already been processed in the stream.
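
The parent ordering rule for committed nodes can be checked with a small amount of bookkeeping. The sketch below is only an illustration: `parent_path` and `check_committed_order` are hypothetical helper names, and it assumes that paths are absolute, use `/` as the separator, and carry no trailing slash.

```python
def parent_path(path):
    """Return the parent of an absolute xenstore path, or None for the root."""
    if path == "/":
        return None
    head, _, _ = path.rpartition("/")
    return head or "/"


def check_committed_order(existing, stream_paths):
    """Return the first stream path whose parent is not yet known, else None.

    existing:     paths already in the node database before the stream is read
    stream_paths: paths of committed NODE_DATA records, in stream order
    """
    known = set(existing)
    for path in stream_paths:
        parent = parent_path(path)
        if parent is not None and parent not in known:
            return path
        known.add(path)
    return None
```

For instance, with only `/` already present, the order `/local`, `/local/domain` passes the check, while `/local/domain` followed by `/local` is reported as a violation at `/local/domain`.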


```
    0       1       2       3    octet
+-------+-------+-------+-------+
| conn-id                       |
+-------------------------------+
| tx-id                         |
+---------------+---------------+
| path-len      | value-len     |
+---------------+---------------+
| access        | perm-count    |
+---------------+---------------+
| perm1                         |
+-------------------------------+
...
+-------------------------------+
| permN                         |
+---------------+---------------+
| path
...
| value
...
```


| Field        | Description                                    |
|--------------|------------------------------------------------|
| `conn-id`    | If this value is non-zero then this record     |
|              | relates to a pending transaction               |
|              |                                                |
| `tx-id`      | This value should be ignored if `conn-id` is   |
|              | zero. Otherwise it specifies the id of the     |
|              | pending transaction                            |
|              |                                                |
| `path-len`   | The length (in octets) of `path` including the |
|              | NUL terminator                                 |
|              |                                                |
| `value-len`  | The length (in octets) of `value` (which will  |
|              | be zero for a deleted node)                    |
|              |                                                |
| `access`     | This value should be ignored if this record    |
|              | does not relate to a pending transaction,      |
|              | otherwise it specifies the accesses made to    |
|              | the node and hence is a bitwise OR of:         |
|              |                                                |
|              | 0x0001: read                                   |
|              | 0x0002: written                                |
|              |                                                |
|              | The value will be zero for a deleted node      |
|              |                                                |
| `perm-count` | The number (N) of node permission specifiers   |
|              | (which will be 0 for a node deleted in a       |
|              | pending transaction)                           |
|              |                                                |
| `perm1..N`   | A list of zero or more node permission         |
|              | specifiers (see below)                         |
|              |                                                |
| `path`       | The absolute path of the node                  |
|              |                                                |
| `value`      | The node value (which may be empty or contain  |
|              | NUL octets)                                    |


A node permission specifier has the following format:


```
    0       1       2       3    octet
+-------+-------+-------+-------+
| perm  | flags | domid         |
+-------+-------+---------------+
```

| Field   | Description                                         |
|---------|-----------------------------------------------------|
| `perm`  | One of the ASCII values `w`, `r`, `b` or `n` as     |
|         | specified for the `SET_PERMS` operation [2]         |
|         |                                                     |
| `flags` | A bit-wise OR of:                                   |
|         | 0x01: stale permission, ignore when checking        |
|         |       permissions                                   |
|         |                                                     |
| `domid` | The domain-id to which the permission relates       |

Note that perm1 defines the domain owning the node. See [4] for more
explanation of node permissions.

\pagebreak

### GLOBAL_QUOTA_DATA

This record is only relevant for live update. It contains the global settings
of xenstored quota.

```
    0       1       2       3    octet
+-------+-------+-------+-------+
| n-dom-quota   | n-glob-quota  |
+---------------+---------------+
| quota-val 1                   |
+-------------------------------+
...
+-------------------------------+
| quota-val N                   |
+-------------------------------+
| quota-names
...
```


| Field          | Description                                  |
|----------------|----------------------------------------------|
| `n-dom-quota`  | Number of quota values which apply per       |
|                | domain by default.                           |
|                |                                              |
| `n-glob-quota` | Number of quota values which apply globally  |
|                | only.                                        |
|                |                                              |
| `quota-val`    | Quota values, first the ones applying per    |
|                | domain, then the ones applying globally. A   |
|                | value of 0 has the semantics of "unlimited". |
|                |                                              |
| `quota-names`  | 0 delimited strings of the quota names in    |
|                | the same sequence as the `quota-val` values. |


Allowed quota names are those explicitly named in [2] for the `GET_QUOTA`
and `SET_QUOTA` commands, plus implementation specific ones. Quota names not
recognized by the receiving side should not have any effect on behavior for
the receiving side (they can be ignored or preserved for inclusion in
future live migration/update streams).
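
As an illustration of the layout above, the following Python sketch splits a `GLOBAL_QUOTA_DATA` body into per-domain and global quota tables. It is not taken from any xenstored implementation. The function name `parse_global_quota` is hypothetical, the field widths (two 16-bit counts followed by 32-bit values) are read off the diagram, and the sketch assumes that every quota name is terminated by a single NUL octet and that the caller passes the byte order of the stream.

```python
import struct


def parse_global_quota(body, order="<"):
    """Split a GLOBAL_QUOTA_DATA body into (per_domain, global_only) dicts.

    order is the struct byte-order prefix ("<" or ">") for the stream.
    A quota value of 0 means "unlimited".
    """
    n_dom, n_glob = struct.unpack_from(order + "HH", body, 0)
    total = n_dom + n_glob
    values = struct.unpack_from(order + f"{total}I", body, 4)

    # Assumption: names are NUL-terminated and listed in quota-val order.
    names_raw = body[4 + 4 * total:]
    names = [n.decode("ascii") for n in names_raw.split(b"\0")[:total]]
    if len(names) != total or any(not n for n in names):
        raise ValueError("fewer quota names than quota values")

    pairs = list(zip(names, values))
    return dict(pairs[:n_dom]), dict(pairs[n_dom:])
```

A receiver following the rule above would keep or drop entries whose names it does not recognize, without letting them change its behavior.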

\pagebreak

### DOMAIN_DATA

This record is optional and can be present once for each domain.


```
    0       1       2       3    octet
+-------+-------+-------+-------+
| domain-id     | n-quota       |
+---------------+---------------+
| features                      |
+-------------------------------+
| quota-val 1                   |
+-------------------------------+
...
+-------------------------------+
| quota-val N                   |
+-------------------------------+
| quota-names
...
```


| Field          | Description                                  |
|----------------|----------------------------------------------|
| `domain-id`    | The domain-id of the domain this record      |
|                | belongs to.                                  |
|                |                                              |
| `n-quota`      | Number of quota values.                      |
|                |                                              |
| `features`     | Value of the feature field visible to the    |
|                | guest at offset 2064 of the ring page.       |
|                | Only valid for version 2 and later.          |
|                |                                              |
| `quota-val`    | Quota values, a value of 0 has the semantics |
|                | "unlimited".                                 |
|                |                                              |
| `quota-names`  | 0 delimited strings of the quota names in    |
|                | the same sequence as the `quota-val` values. |

Allowed quota names are those explicitly named in [2] for the `GET_QUOTA`
and `SET_QUOTA` commands, plus implementation specific ones. Quota names not
recognized by the receiving side should not have any effect on behavior for
the receiving side (they can be ignored or preserved for inclusion in
future live migration/update streams).

\pagebreak


* * *

[1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md

[2] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore.txt

[3] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/xen.h;hb=HEAD#l612

[4] See https://wiki.xen.org/wiki/XenBus