# Xenstore Migration

## Background

The design for *Non-Cooperative Migration of Guests*[1] explains that extra
save records are required in the migration stream to allow a guest running PV
drivers to be migrated without its co-operation. Moreover, the save records must
include details of registered xenstore watches as well as content; information
that cannot currently be recovered from `xenstored`, and hence some extension
to the xenstored implementations will also be required.

As a similar set of data is needed for transferring xenstore data from one
instance to another when live updating xenstored, this document proposes an
image format for a 'migration stream' suitable for both purposes.

## Proposal

The image format consists of a _header_ followed by 1 or more _records_. Each
record consists of a type and length field, followed by any data mandated by
the record type. At minimum there will be a single record of type `END`
(defined below).

### Header

The header identifies the stream as a `xenstore` stream, including the version
of the specification that it complies with.

All fields in this header must be in _big-endian_ byte order, regardless of
the setting of the endianness bit.

```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
| ident                                                         |
+-------------------------------+-------------------------------+
| version                       | flags                         |
+-------------------------------+-------------------------------+
```

| Field     | Description                                       |
|-----------|---------------------------------------------------|
| `ident`   | 0x78656e73746f7265 ('xenstore' in ASCII)          |
|           |                                                   |
| `version` | The version of the specification, defined values: |
|           | 0x00000001: all fields and records without any    |
|           |             explicitly mentioned version          |
|           |             dependency are valid.                 |
|           | 0x00000002: all fields and records valid for      |
|           |             version 1 plus fields and records     |
|           |             explicitly stated to be supported in  |
|           |             version 2 are valid.                  |
|           |                                                   |
| `flags`   | 0 (LSB): Endianness: 0 = little, 1 = big          |
|           |                                                   |
|           | 1-31: Reserved (must be zero)                     |

### Records

Records immediately follow the header and have the following format:

```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
| type                          | len                           |
+-------------------------------+-------------------------------+
| body
...
|       | padding (0 to 7 octets)                               |
+-------+-------------------------------------------------------+
```

NOTE: Here and in all subsequent format specifications, padding octets and
      fields that are not valid in the version in use must be written as zero
      and should be ignored when the stream is read.

| Field  | Description                                          |
|--------|------------------------------------------------------|
| `type` | 0x00000000: END                                      |
|        | 0x00000001: GLOBAL_DATA                              |
|        | 0x00000002: CONNECTION_DATA                          |
|        | 0x00000003: WATCH_DATA                               |
|        | 0x00000004: TRANSACTION_DATA                         |
|        | 0x00000005: NODE_DATA                                |
|        | 0x00000006: GLOBAL_QUOTA_DATA                        |
|        | 0x00000007: DOMAIN_DATA                              |
|        | 0x00000008: WATCH_DATA_EXTENDED (version 2 and up)   |
|        | 0x00000009 - 0xFFFFFFFF: reserved for future use     |
|        |                                                      |
| `len`  | The length (in octets) of `body`                     |
|        |                                                      |
| `body` | The type-specific record data                        |

Some records will depend on other records in the migration stream. Records
upon which other records depend must always appear earlier in the stream.

The various formats of the type-specific data are described in the following
sections:

\pagebreak

### END

The end record marks the end of the image, and is the final record
in the stream.
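
Because `END` terminates the stream, a reader can loop over records until it
sees that type. The following Python sketch illustrates the header and record
framing described above; it is not taken from any xenstored implementation.
The name `read_stream` is hypothetical. The sketch assumes that the `type`
and `len` fields use the byte order selected by the endianness flag, and that
the padding rounds each body up to a multiple of 8 octets (inferred from the
"0 to 7 octets" in the record diagram).

```python
import io
import struct

IDENT = b"xenstore"   # 0x78656e73746f7265
REC_END = 0x00000000


def read_stream(stream):
    """Return (version, [(type, body), ...]) read from a binary stream."""
    # Header fields are always big-endian, whatever the endianness flag says.
    ident, version, flags = struct.unpack(">8sII", stream.read(16))
    if ident != IDENT:
        raise ValueError("not a xenstore stream")
    # Assumption: flag bit 0 selects the byte order of the record fields.
    order = ">" if flags & 1 else "<"
    records = []
    while True:
        rec_type, length = struct.unpack(order + "II", stream.read(8))
        body = stream.read(length)
        if len(body) != length:
            raise ValueError("truncated record body")
        # Assumption: the body is padded up to the next 8-octet boundary.
        stream.read(-length % 8)
        records.append((rec_type, body))
        if rec_type == REC_END:
            return version, records
```

A caller would pass any binary file-like object, for example
`read_stream(io.BytesIO(image))`, and then dispatch on each record type.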

```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
```

The end record contains no fields; its body length is 0.

\pagebreak

### GLOBAL_DATA

This record is only relevant for live update. It contains details of global
xenstored state that needs to be restored.

```
    0       1       2       3    octet
+-------+-------+-------+-------+
| rw-socket-fd                  |
+-------------------------------+
| evtchn-fd                     |
+-------------------------------+
```

| Field          | Description                                  |
|----------------|----------------------------------------------|
| `rw-socket-fd` | The file descriptor of the socket accepting  |
|                | read-write connections                       |
|                |                                              |
| `evtchn-fd`    | The file descriptor used to communicate with |
|                | the event channel driver                     |

xenstored will resume in the original process context. Hence `rw-socket-fd`
simply specifies the file descriptor of the socket. Sockets are not always
used, however, and so -1 will be used to denote an unused socket.

\pagebreak

### CONNECTION_DATA

For live update the image format will contain a `CONNECTION_DATA` record for
each connection to xenstore. For migration it will only contain a record for
the domain being migrated.

```
    0       1       2       3       4       5       6       7    octet
+-------+-------+-------+-------+-------+-------+-------+-------+
| conn-id                       | conn-type     |               |
+-------------------------------+---------------+---------------+
| conn-spec
...
+---------------+---------------+-------------------------------+
| in-data-len   | out-resp-len  | out-data-len                  |
+---------------+---------------+-------------------------------+
| data
...
```

| Field          | Description                                  |
|----------------|----------------------------------------------|
| `conn-id`      | A non-zero number used to identify this      |
|                | connection in subsequent connection-specific |
|                | records                                      |
|                |                                              |
| `conn-type`    | 0x0000: shared ring                          |
|                | 0x0001: socket                               |
|                | 0x0002 - 0xFFFF: reserved for future use     |
|                |                                              |
| `conn-spec`    | See below                                    |
|                |                                              |
| `in-data-len`  | The length (in octets) of any data read      |
|                | from the connection not yet processed        |
|                |                                              |
| `out-resp-len` | The length (in octets) of a partial response |
|                | not yet written to the connection            |
|                |                                              |
| `out-data-len` | The length (in octets) of any pending data   |
|                | not yet written to the connection, including |
|                | a partial response (see `out-resp-len`)      |
|                |                                              |
| `data`         | Pending data: first in-data-len octets of    |
|                | read data, then out-data-len octets of       |
|                | written data (either or both may be empty)   |

In the case of live update, the connection record for the connection via which
the live update command was issued will contain the response to that command
in its pending, not yet written data.

\pagebreak

The format of `conn-spec` is dependent upon `conn-type`.

For `shared ring` connections it is as follows:

```
    0       1       2       3       4       5       6       7    octet
+---------------+---------------+---------------+---------------+
| domid         | tdomid        | evtchn                        |
+-------------------------------+-------------------------------+
```

| Field     | Description                                       |
|-----------|---------------------------------------------------|
| `domid`   | The domain-id that owns the shared page           |
|           |                                                   |
| `tdomid`  | The domain-id that `domid` acts on behalf of if   |
|           | it has been subject to a SET_TARGET               |
|           | operation [2] or DOMID_INVALID [3] otherwise      |
|           |                                                   |
| `evtchn`  | The port number of the interdomain channel used   |
|           | by xenstored to communicate with `domid`          |

The GFN of the shared page is not preserved because the ABI reserves
entry 1 in `domid`'s grant table to point to the xenstore shared page.
Note that there is no guarantee the page will still be valid at the time of
the restore, because a domain can revoke the permission.

For `socket` connections it is as follows:

```
+---------------+---------------+---------------+---------------+
| socket-fd                     | pad                           |
+-------------------------------+-------------------------------+
```

| Field       | Description                                     |
|-------------|-------------------------------------------------|
| `socket-fd` | The file descriptor of the connected socket     |

This type of connection is only relevant for live update, where xenstored
resumes in the original process context. Hence `socket-fd` simply specifies
the file descriptor of the socket connection.

\pagebreak

### WATCH_DATA

The image format will contain either a `WATCH_DATA` or a `WATCH_DATA_EXTENDED`
record for each watch registered by a connection for which there is a
`CONNECTION_DATA` record previously present.

```
    0       1       2       3    octet
+-------+-------+-------+-------+
| conn-id                       |
+---------------+---------------+
| wpath-len     | token-len     |
+---------------+---------------+
| wpath
...
| token
...
```

| Field       | Description                                     |
|-------------|-------------------------------------------------|
| `conn-id`   | The connection that issued the `WATCH`          |
|             | operation [2]                                   |
|             |                                                 |
| `wpath-len` | The length (in octets) of `wpath` including the |
|             | NUL terminator                                  |
|             |                                                 |
| `token-len` | The length (in octets) of `token` including the |
|             | NUL terminator                                  |
|             |                                                 |
| `wpath`     | The watch path, as specified in the `WATCH`     |
|             | operation                                       |
|             |                                                 |
| `token`     | The watch identifier token, as specified in the |
|             | `WATCH` operation                               |

\pagebreak

### WATCH_DATA_EXTENDED

The image format will contain either a `WATCH_DATA` or a `WATCH_DATA_EXTENDED`
record for each watch registered by a connection for which there is a
`CONNECTION_DATA` record previously present. The `WATCH_DATA_EXTENDED` record
type is valid only in version 2 and later.

```
    0       1       2       3    octet
+-------+-------+-------+-------+
| conn-id                       |
+---------------+---------------+
| wpath-len     | token-len     |
+---------------+---------------+
| depth         | pad           |
+---------------+---------------+
| wpath
...
| token
...
```

| Field       | Description                                     |
|-------------|-------------------------------------------------|
| `conn-id`   | The connection that issued the `WATCH`          |
|             | operation [2]                                   |
|             |                                                 |
| `wpath-len` | The length (in octets) of `wpath` including the |
|             | NUL terminator                                  |
|             |                                                 |
| `token-len` | The length (in octets) of `token` including the |
|             | NUL terminator                                  |
|             |                                                 |
| `depth`     | The number of directory levels below the        |
|             | watched path to consider for a match.           |
|             | A value of 0xffff is used for unlimited depth.  |
|             |                                                 |
| `wpath`     | The watch path, as specified in the `WATCH`     |
|             | operation                                       |
|             |                                                 |
| `token`     | The watch identifier token, as specified in the |
|             | `WATCH` operation                               |

\pagebreak

### TRANSACTION_DATA

The image format will contain a `TRANSACTION_DATA` record for each transaction
that is pending on a connection for which there is a `CONNECTION_DATA` record
previously present.

```
    0       1       2       3    octet
+-------+-------+-------+-------+
| conn-id                       |
+-------------------------------+
| tx-id                         |
+-------------------------------+
```

| Field          | Description                                  |
|----------------|----------------------------------------------|
| `conn-id`      | The connection that issued the               |
|                | `TRANSACTION_START` operation [2]            |
|                |                                              |
| `tx-id`        | The transaction id passed back to the domain |
|                | by the `TRANSACTION_START` operation         |

\pagebreak

### NODE_DATA

For live update the image format will contain a `NODE_DATA` record for each
node in xenstore. For migration it will only contain a record for the nodes
relating to the domain being migrated. The `NODE_DATA` may be related to
a _committed_ node (globally visible in xenstored) or a _pending_ node (created
or modified by a transaction for which there is also a `TRANSACTION_DATA`
record previously present).

Each _committed_ node in the stream is required to have an already known parent
node. A parent node is known if it was either in the node database before
processing of the stream started, or if a `NODE_DATA` record for that parent
node has already been processed in the stream.
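
The parent rule for committed nodes can be checked with a sketch along the
following lines. It is illustrative only: `parent_of` and
`check_committed_order` are hypothetical names, and the sketch assumes that
every path is an absolute, `/`-separated path with `/` as the root.

```python
def parent_of(path):
    """Return the parent of an absolute path, or None for the root."""
    if path == "/":
        return None
    head = path.rsplit("/", 1)[0]
    return head or "/"


def check_committed_order(existing, stream_paths):
    """Raise ValueError if a committed node precedes its parent.

    `existing` holds the paths already in the node database and
    `stream_paths` lists the committed NODE_DATA paths in stream order.
    """
    known = set(existing)
    for path in stream_paths:
        parent = parent_of(path)
        if parent is not None and parent not in known:
            raise ValueError(f"parent of {path!r} is not yet known")
        known.add(path)
    return known
```

A writer could run the same check over its output order before emitting the
records, so that a receiver never sees a committed node ahead of its parent.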

```
    0       1       2       3    octet
+-------+-------+-------+-------+
| conn-id                       |
+-------------------------------+
| tx-id                         |
+---------------+---------------+
| path-len      | value-len     |
+---------------+---------------+
| access        | perm-count    |
+---------------+---------------+
| perm1                         |
+-------------------------------+
...
+-------------------------------+
| permN                         |
+---------------+---------------+
| path
...
| value
...
```

| Field        | Description                                    |
|--------------|------------------------------------------------|
| `conn-id`    | If this value is non-zero then this record     |
|              | relates to a pending transaction               |
|              |                                                |
| `tx-id`      | This value should be ignored if `conn-id` is   |
|              | zero. Otherwise it specifies the id of the     |
|              | pending transaction                            |
|              |                                                |
| `path-len`   | The length (in octets) of `path` including the |
|              | NUL terminator                                 |
|              |                                                |
| `value-len`  | The length (in octets) of `value` (which will  |
|              | be zero for a deleted node)                    |
|              |                                                |
| `access`     | This value should be ignored if this record    |
|              | does not relate to a pending transaction,      |
|              | otherwise it specifies the accesses made to    |
|              | the node and hence is a bitwise OR of:         |
|              |                                                |
|              | 0x0001: read                                   |
|              | 0x0002: written                                |
|              |                                                |
|              | The value will be zero for a deleted node      |
|              |                                                |
| `perm-count` | The number (N) of node permission specifiers   |
|              | (which will be 0 for a node deleted in a       |
|              | pending transaction)                           |
|              |                                                |
| `perm1..N`   | A list of zero or more node permission         |
|              | specifiers (see below)                         |
|              |                                                |
| `path`       | The absolute path of the node                  |
|              |                                                |
| `value`      | The node value (which may be empty or contain  |
|              | NUL octets)                                    |

A node permission specifier has the following format:

```
    0       1       2       3    octet
+-------+-------+-------+-------+
| perm  | flags | domid         |
+-------+-------+---------------+
```

| Field   | Description                                         |
|---------|-----------------------------------------------------|
| `perm`  | One of the ASCII values `w`, `r`, `b` or `n` as     |
|         | specified for the `SET_PERMS` operation [2]         |
|         |                                                     |
| `flags` | A bit-wise OR of:                                   |
|         | 0x01: stale permission, ignore when checking        |
|         |       permissions                                   |
|         |                                                     |
| `domid` | The domain-id to which the permission relates       |

Note that perm1 defines the domain owning the node. See [4] for more
explanation of node permissions.

\pagebreak

### GLOBAL_QUOTA_DATA

This record is only relevant for live update. It contains the global settings
of xenstored quota.

```
    0       1       2       3    octet
+-------+-------+-------+-------+
| n-dom-quota   | n-glob-quota  |
+---------------+---------------+
| quota-val 1                   |
+-------------------------------+
...
+-------------------------------+
| quota-val N                   |
+-------------------------------+
| quota-names
...
```

| Field          | Description                                  |
|----------------|----------------------------------------------|
| `n-dom-quota`  | Number of quota values which apply per       |
|                | domain by default.                           |
|                |                                              |
| `n-glob-quota` | Number of quota values which apply globally  |
|                | only.                                        |
|                |                                              |
| `quota-val`    | Quota values, first the ones applying per    |
|                | domain, then the ones applying globally. A   |
|                | value of 0 has the semantics of "unlimited". |
|                |                                              |
| `quota-names`  | NUL-delimited strings of the quota names in  |
|                | the same sequence as the `quota-val` values. |

Allowed quota names are those explicitly named in [2] for the `GET_QUOTA`
and `SET_QUOTA` commands, plus implementation specific ones. Quota names not
recognized by the receiving side should not have any effect on behavior for
the receiving side (they can be ignored or preserved for inclusion in
future live migration/update streams).
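
To make the layout of this record body concrete, here is a minimal Python
sketch of a decoder. It is a hedged illustration, not the xenstored code:
`parse_global_quota` is a hypothetical name, the field widths (two 16-bit
counts followed by 32-bit values) are read off the diagram above, the byte
order is passed in by the caller on the assumption that it follows the
header's endianness flag, and each quota name is assumed to be an ASCII
string followed by a single NUL octet.

```python
import struct


def parse_global_quota(body, order="<"):
    """Split a GLOBAL_QUOTA_DATA body into (per_domain, global_only) dicts.

    A stored value of 0 means "unlimited" and is returned as None.
    """
    n_dom, n_glob = struct.unpack_from(order + "HH", body, 0)
    total = n_dom + n_glob
    values = struct.unpack_from(f"{order}{total}I", body, 4)
    # Assumption: names are NUL-terminated and listed in value order.
    raw_names = body[4 + 4 * total:].split(b"\0")[:total]
    if len(raw_names) != total:
        raise ValueError("fewer quota names than quota values")
    names = [name.decode("ascii") for name in raw_names]
    quotas = [(n, None if v == 0 else v) for n, v in zip(names, values)]
    return dict(quotas[:n_dom]), dict(quotas[n_dom:])
```

Names that the receiver does not recognize simply end up as extra dictionary
keys, which it can ignore or carry forward into a later stream.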

\pagebreak

### DOMAIN_DATA

This record is optional and can be present once for each domain.

```
    0       1       2       3    octet
+-------+-------+-------+-------+
| domain-id     | n-quota       |
+---------------+---------------+
| features                      |
+-------------------------------+
| quota-val 1                   |
+-------------------------------+
...
+-------------------------------+
| quota-val N                   |
+-------------------------------+
| quota-names
...
```

| Field          | Description                                  |
|----------------|----------------------------------------------|
| `domain-id`    | The domain-id of the domain this record      |
|                | belongs to.                                  |
|                |                                              |
| `n-quota`      | Number of quota values.                      |
|                |                                              |
| `features`     | Value of the feature field visible to the    |
|                | guest at offset 2064 of the ring page.       |
|                | Only valid for version 2 and later.          |
|                |                                              |
| `quota-val`    | Quota values, a value of 0 has the semantics |
|                | "unlimited".                                 |
|                |                                              |
| `quota-names`  | NUL-delimited strings of the quota names in  |
|                | the same sequence as the `quota-val` values. |

Allowed quota names are those explicitly named in [2] for the `GET_QUOTA`
and `SET_QUOTA` commands, plus implementation specific ones. Quota names not
recognized by the receiving side should not have any effect on behavior for
the receiving side (they can be ignored or preserved for inclusion in
future live migration/update streams).

\pagebreak

* * *

[1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md

[2] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore.txt

[3] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/xen.h;hb=HEAD#l612

[4] See https://wiki.xen.org/wiki/XenBus