Xenstore protocol specification
-------------------------------

Xenstore implements a database which maps filename-like pathnames
(also known as `keys') to values.  Clients may read and write values,
watch for changes, and set permissions to allow or deny access.  There
is a rudimentary transaction system.

While xenstore and most tools and APIs are capable of dealing with
arbitrary binary data as values, this should generally be avoided.
Data should generally be human-readable for ease of management and
debugging; xenstore is not a high-performance facility and should be
used only for small amounts of control plane data.  Therefore xenstore
values should normally be 7-bit ASCII text strings containing bytes
0x20..0x7f only, and should not contain a trailing nul byte.  (The
APIs used for accessing xenstore generally add a nul when reading, for
the caller's convenience.)

A separate specification will detail the keys and values which are
used in the Xen system and what their meanings are.  (Sadly that
specification currently exists only in multiple out-of-date versions.)


Paths are /-separated and start with a /, just like Unix filenames.

We say that a path <child> is a child of a path <parent> if they are
identical, if <parent> is /, or if <parent>/ is an initial substring
of <child>.  (So every path is a child of itself.)
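
The child relation can be written as a short predicate.  This is a
minimal Python sketch, assuming both arguments are absolute paths that
already satisfy the syntax rules; the name `is_child` is illustrative
only and not part of any xenstore library.

```python
def is_child(child: str, parent: str) -> bool:
    """True if path `child` is a child of path `parent` as defined above.

    Both arguments are assumed to be absolute, already validated paths.
    """
    if child == parent or parent == "/":
        return True
    # "<parent>/" must be an initial substring of <child>; requiring the
    # slash stops /local/dom from counting as a parent of /local/domain.
    return child.startswith(parent + "/")
```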

If a particular path exists, all of its parents do too.  Every
existing path maps to a possibly empty value, and may also have zero
or more immediate children.  There is thus no particular distinction
between directories and leaf nodes.  However, it is conventional not
to store nonempty values at nodes which also have children.

The permitted character set for paths is ASCII alphanumerics plus
the four punctuation characters -/_@ (hyphen slash underscore atsign).
@ should be avoided except to specify special watches (see below).
Doubled slashes and trailing slashes (except to specify the root) are
forbidden.  The empty path is also forbidden.  Paths longer than 3072
bytes are forbidden; clients specifying relative paths should keep
them to within 2048 bytes.  (See XENSTORE_*_PATH_MAX in xs_wire.h.)
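
A client might check these syntax rules before sending a request.  The
sketch below uses a hypothetical helper (`is_valid_path` is not part of
any xenstore library).  It treats the advisory 2048-byte limit for
relative paths as a hard limit.  Because only ASCII characters are
accepted, it counts characters as bytes.

```python
import string

# Limits quoted in the text (XENSTORE_ABS_PATH_MAX / XENSTORE_REL_PATH_MAX).
ABS_PATH_MAX = 3072
REL_PATH_MAX = 2048  # advisory in the text; treated as hard here
ALLOWED = frozenset(string.ascii_letters + string.digits + "-/_@")


def is_valid_path(path: str) -> bool:
    """Check a path against the syntax rules described above."""
    if not path:
        return False  # the empty path is forbidden
    if any(c not in ALLOWED for c in path):
        return False  # only alphanumerics and -/_@ are permitted
    if "//" in path:
        return False  # doubled slashes are forbidden
    if path != "/" and path.endswith("/"):
        return False  # trailing slash allowed only for the root itself
    limit = ABS_PATH_MAX if path.startswith("/") else REL_PATH_MAX
    return len(path) <= limit
```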


Communication with xenstore is via either sockets, or event channel
and shared memory, as specified in io/xs_wire.h: each message in
either direction is a header formatted as a struct xsd_sockmsg
followed by xsd_sockmsg.len bytes of payload.
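
As a rough illustration of the framing, the Python sketch below packs
and unpacks the 16-byte header with the standard `struct` module.  It
assumes the header consists of the four uint32_t fields of struct
xsd_sockmsg (type, req_id, tx_id, len) in host byte order.  It leaves
the numeric type codes to the caller, who should take them from
io/xs_wire.h.  It also refuses payloads larger than the 4096-byte
XENSTORE_PAYLOAD_MAX limit, because exceeding that limit gets the
connection killed.  The function names are hypothetical.

```python
import struct

# struct xsd_sockmsg: uint32_t type, req_id, tx_id, len (native byte order).
HEADER = struct.Struct("=IIII")
XENSTORE_PAYLOAD_MAX = 4096


def encode_message(msg_type: int, req_id: int, tx_id: int, payload: bytes) -> bytes:
    """Frame one message: header followed by len bytes of payload."""
    if len(payload) > XENSTORE_PAYLOAD_MAX:
        # xenstored would kill the connection, so refuse locally instead.
        raise ValueError("payload exceeds XENSTORE_PAYLOAD_MAX")
    return HEADER.pack(msg_type, req_id, tx_id, len(payload)) + payload


def decode_message(buf: bytes):
    """Split one complete message off the front of buf.

    Returns ((type, req_id, tx_id), payload, rest), or None if buf does
    not yet hold a whole message.
    """
    if len(buf) < HEADER.size:
        return None
    msg_type, req_id, tx_id, length = HEADER.unpack_from(buf)
    end = HEADER.size + length
    if len(buf) < end:
        return None
    return (msg_type, req_id, tx_id), buf[HEADER.size:end], buf[end:]
```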

The payload syntax varies according to the type field.  Generally
requests each generate a reply with an identical type, req_id and
tx_id.  However, if an error occurs, a reply will be returned with
type ERROR, and only req_id and tx_id copied from the request.

A caller who sends several requests may receive the replies in any
order and must use req_id (and tx_id, if applicable) to match up
replies to requests.  (The current implementation always replies to
requests in the order received but this should not be relied on.)
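
Replies may arrive out of order, so a client can keep its outstanding
requests in a table keyed by (req_id, tx_id).  This is a minimal
sketch: the class and its method names are hypothetical, and the
caller is responsible for choosing unique req_id values.  ERROR replies
still carry the request's req_id and tx_id, so they are matched the
same way.

```python
class PendingRequests:
    """Match replies to outstanding requests by (req_id, tx_id)."""

    def __init__(self):
        self._pending = {}

    def sent(self, req_id: int, tx_id: int, what: str) -> None:
        # Remember what each outstanding request was for.
        self._pending[(req_id, tx_id)] = what

    def received(self, req_id: int, tx_id: int):
        # Look replies up by key rather than assuming FIFO order.
        # Returns None for a reply that matches nothing outstanding.
        return self._pending.pop((req_id, tx_id), None)
```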

The payload length (len field of the header) is limited to 4096
(XENSTORE_PAYLOAD_MAX) in both directions.  If a client exceeds the
limit, its xenstored connection will be immediately killed by
xenstored, which is usually catastrophic from the client's point of
view.  Clients (particularly domains, which cannot just reconnect)
should avoid this.

Existing clients do not always contain defences against overly long
payloads.  Increasing xenstored's limit is therefore difficult; it
would require negotiation with the client, and obviously would make
parts of xenstore inaccessible to some clients.  In any case passing
bulk data through xenstore is not recommended as the performance
properties are poor.


---------- Xenstore protocol details - introduction ----------

The payload syntax and semantics of the requests and replies are
described below.  In the payload syntax specifications we use the
following notations:

 |		A nul (zero) byte.
 <foo>		A string guaranteed not to contain any nul bytes.
 <foo|>		Binary data (which may contain zero or more nul bytes)
 <foo>|*	Zero or more strings each followed by a trailing nul
 <foo>|+	One or more strings each followed by a trailing nul
 ?		Reserved value (may not contain nuls)
 ??		Reserved value (may contain nuls)

Except as otherwise noted, reserved values are believed to be sent as
empty strings by all current clients.  Clients should not send
nonempty strings for reserved values; those parts of the protocol may
be used for extension in the future.


Error replies are as follows:

ERROR						E<something>|
	Where E<something> is the name of an errno value
	listed in io/xs_wire.h.  Note that the string name
	is transmitted, not a numeric value.
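
A client will usually want to map the transmitted name back to a local
errno number.  This is a minimal sketch that assumes the names listed in
io/xs_wire.h coincide with those known to Python's `errno` module.  The
mapping goes by name, so the numeric values on the two ends need not
agree.  The function name is hypothetical.

```python
import errno


def parse_error_reply(payload: bytes) -> int:
    """Map an ERROR payload (an errno name plus a nul) to a local errno number."""
    name = payload.rstrip(b"\x00").decode("ascii")
    code = getattr(errno, name, None)
    # Only accept E-prefixed names that resolve to an integer constant.
    if not name.startswith("E") or not isinstance(code, int):
        raise ValueError(f"unknown xenstore error name {name!r}")
    return code
```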


Where no reply payload format is specified below, success responses
have the following payload:
						OK|

Values commonly included in payloads include:

    <path>
	Specifies a path in the hierarchical key structure.
	If <path> starts with a / it simply represents that path.

	<path> is allowed not to start with /, in which case the
	caller must be a domain (rather than connected via a socket)
	and the path is taken to be relative to /local/domain/<domid>
	(eg, `x/y' sent by domain 3 would mean `/local/domain/3/x/y').
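
The resolution rule for relative paths fits in a few lines of code.
This sketch assumes the caller already knows the sending domain's domid.
The function name is hypothetical.

```python
def resolve_path(path: str, domid: int) -> str:
    """Absolute form of a request <path> sent by domain <domid>."""
    if path.startswith("/"):
        return path  # absolute paths are used unchanged
    # Relative paths are taken relative to /local/domain/<domid>.
    return f"/local/domain/{domid}/{path}"
```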

    <domid>
	Integer domid, represented as decimal number 0..65535.
	Parsing errors and values out of range generally go
	undetected.  The special DOMID_... values (see xen.h) are
	represented as integers; unless otherwise specified it
	is an error not to specify a real domain id.



The following are the actual type values, including the request and
reply payloads as applicable:


---------- Database read, write and permissions operations ----------

READ			<path>|			<value|>
WRITE			<path>|<value|>
	Store and read the octet string <value> at <path>.
	WRITE creates any missing parent paths, with empty values.
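
The two payloads differ mainly in that <value|> is raw binary with no
terminating nul.  Its extent is given by the header's len field.  The
sketch below builds both request payloads and shows how a receiver
could split a WRITE payload.  The helper names are hypothetical, and
paths are assumed to be ASCII, as the path rules require.

```python
def read_request(path: str) -> bytes:
    # READ  <path>|
    return path.encode("ascii") + b"\x00"


def write_request(path: str, value: bytes) -> bytes:
    # WRITE <path>|<value|>: the value is raw octets with no trailing nul.
    return path.encode("ascii") + b"\x00" + value


def parse_write_request(payload: bytes):
    # Split at the first nul only, because <value|> may itself contain nuls.
    path, _, value = payload.partition(b"\x00")
    return path.decode("ascii"), value
```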

MKDIR			<path>|
	Ensures that <path> exists, if necessary by creating
	it and any missing parents with empty values.  If <path>
	or any parent already exists, its value is left unchanged.

RM			<path>|
	Ensures that <path> does not exist, by deleting
	it and all of its children.  It is not an error if <path> does
	not exist, but it _is_ an error if <path>'s immediate parent
	does not exist either.

DIRECTORY		<path>|			<child-leaf-name>|*
	Gives a list of the immediate children of <path>, as only the
	leafnames.  The resulting children are each named
	<path>/<child-leaf-name>.
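
A sketch of turning a DIRECTORY reply into full child paths.  Every leaf
name is nul-terminated, so splitting on nul leaves exactly one empty
trailing element to discard.  The function name is hypothetical.

```python
def parse_directory_reply(path: str, payload: bytes) -> list:
    """Full paths of the children listed in a DIRECTORY reply (<name>|*)."""
    names = payload.split(b"\x00")[:-1]  # drop the empty tail after the last nul
    base = path.rstrip("/")  # avoid "//" when listing the root
    return [f"{base}/{n.decode('ascii')}" for n in names]
```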

GET_PERMS		<path>|			<perm-as-string>|+
SET_PERMS		<path>|<perm-as-string>|+?
	<perm-as-string> is one of the following
		w<domid>	write only
		r<domid>	read only
		b<domid>	both read and write
		n<domid>	no access
	See http://wiki.xen.org/wiki/XenBus section
	`Permissions' for details of the permissions system.
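
A sketch of decoding a GET_PERMS reply and encoding a SET_PERMS
request.  The helper names are hypothetical.  The reserved trailing ?
is sent as an empty string, so nothing follows the last nul.  Any
meaning of an entry beyond the letter table above, such as which
domain owns the node, is left to the permissions documentation.

```python
PERM_NAMES = {"w": "write", "r": "read", "b": "both", "n": "none"}


def parse_perms(payload: bytes) -> list:
    """Decode <perm-as-string>|+ into (access, domid) pairs."""
    perms = []
    for item in payload.split(b"\x00")[:-1]:
        text = item.decode("ascii")
        letter, rest = text[:1], text[1:]
        if letter not in PERM_NAMES or not rest.isdigit():
            raise ValueError(f"bad permission {text!r}")
        perms.append((PERM_NAMES[letter], int(rest)))
    return perms


def encode_perms(path: str, perms) -> bytes:
    """Build a SET_PERMS payload; perms is a list of strings such as "n0"."""
    return b"".join(s.encode("ascii") + b"\x00" for s in [path, *perms])
```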

---------- Watches ----------

WATCH			<wpath>|<token>|?
	Adds a watch.

	When a <path> is modified (including path creation, removal,
	contents change or permissions change) this generates an event
	on the changed <path>.  Changes made in transactions cause an
	event only if and when committed.  Each occurring event is
	matched against all the watches currently set up, and each
	matching watch results in a WATCH_EVENT message (see below).

	The event's path matches the watch's <wpath> if it is a child
	of <wpath>.

	<wpath> can be a <path> to watch or @<wspecial>.  In the
	latter case <wspecial> may have any syntax but it matches
	(according to the rules above) only the following special
	events which are invented by xenstored:
	    @introduceDomain	occurs on INTRODUCE
	    @releaseDomain	occurs on any domain crash or
				shutdown, and also on RELEASE
				and domain destruction
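
The matching rule can be sketched as follows.  This is illustrative
only.  It assumes that @<wspecial> watches are compared with special
event names by the same child rule.  It also assumes that ordinary path
watches, even one on /, never fire for special events; this text does
not spell that case out.  The function names are hypothetical.

```python
def is_child(child: str, parent: str) -> bool:
    # The child rule from the path section above.
    return child == parent or parent == "/" or child.startswith(parent + "/")


def watch_matches(wpath: str, event_path: str) -> bool:
    """Does an event on event_path fire a watch registered on wpath?"""
    if wpath.startswith("@") != event_path.startswith("@"):
        # Assumption: special watches see only special events, and
        # ordinary path watches never see special events.
        return False
    return is_child(event_path, wpath)
```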

	When a watch is first set up it is triggered once straight
	away, with <path> equal to <wpath>.  Watches may be triggered
	spuriously.  The tx_id in a WATCH request is ignored.

	Watches are supposed to be restricted by the permissions
	system but in practice the implementation is imperfect.
	Applications should not rely on being sent a notification for
	paths that they cannot read; however, an application may rely
	on being sent a notification when a path which it _is_ able to
	read is deleted even if that leaves only a nonexistent
	unreadable parent.  A notification may be omitted if a node's
	permissions are changed so as to make it unreadable, in which
	case future notifications may be suppressed (and if the node is
	later made readable, some notifications may have been lost).

WATCH_EVENT					<epath>|<token>|
	Unsolicited `reply' generated for matching modification events
	as described above.  req_id and tx_id are both 0.

	<epath> is the event's path, ie the actual path that was
	modified; however if the event was the recursive removal of a
	parent of <wpath>, <epath> is just <wpath> (rather than the
	actual path which was removed).  So <epath> is a child of
	<wpath>, regardless.

	Iff <wpath> for the watch was specified as a relative pathname,
	the <epath> path will also be relative (with the same base,
	obviously).
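
A client can use <token> to find which of its watches fired.  The
sketch below splits the payload and dispatches on the token.  The
handler table is a hypothetical client-side structure.  The token is
kept as bytes because its syntax is up to the client.

```python
def parse_watch_event(payload: bytes):
    """Split a WATCH_EVENT payload <epath>|<token>| into (epath, token)."""
    fields = payload.split(b"\x00")
    if len(fields) < 3:
        raise ValueError("truncated WATCH_EVENT payload")
    return fields[0].decode("ascii"), fields[1]


def dispatch(payload: bytes, handlers: dict) -> None:
    """Call the handler registered for the event's token, if any."""
    epath, token = parse_watch_event(payload)
    handler = handlers.get(token)
    if handler is not None:
        handler(epath)
```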

UNWATCH			<wpath>|<token>|?

RESET_WATCHES		|
	Reset all watches and transactions of the caller.

---------- Transactions ----------

TRANSACTION_START	|			<transid>|
	<transid> is an opaque uint32_t allocated by xenstored,
	represented as unsigned decimal.  After this, the transaction
	may be referenced by using <transid> (as 32-bit binary) in the
	tx_id request header field.  When a transaction is started the
	whole db is copied; reads and writes happen on the copy.
	It is not legal to send non-0 tx_id in TRANSACTION_START.
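
Note the change of representation here.  The reply carries <transid>
as decimal text, while later requests carry it as a binary uint32_t in
tx_id.  This sketch assumes host-byte-order headers laid out as struct
xsd_sockmsg.  The function names are hypothetical.

```python
import struct

HEADER = struct.Struct("=IIII")  # struct xsd_sockmsg: type, req_id, tx_id, len


def parse_transaction_start_reply(payload: bytes) -> int:
    """<transid>| is unsigned decimal text; return it as an int."""
    transid = int(payload.rstrip(b"\x00").decode("ascii"))
    if not 0 <= transid <= 0xFFFFFFFF:
        raise ValueError("transaction id does not fit in uint32_t")
    return transid


def header_in_transaction(msg_type: int, req_id: int, transid: int, payload_len: int) -> bytes:
    # The decimal reply, converted to binary, goes in the tx_id field.
    return HEADER.pack(msg_type, req_id, transid, payload_len)
```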

TRANSACTION_END		T|
TRANSACTION_END		F|
	tx_id must refer to an existing transaction.  After this
	request the tx_id is no longer valid and may be reused by
	xenstore.  If F, the transaction is discarded.  If T,
	it is committed: if there were any other intervening writes
	then our END gets EAGAIN.

	The plan is that in the future only intervening `conflicting'
	writes cause EAGAIN, meaning only writes or other commits
	which changed paths which were read or written in the
	transaction at hand.
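
A commit can fail with EAGAIN, so callers normally retry the whole
transaction.  The sketch below shows such a loop.  `start`, `end` and
`body` are hypothetical callables that wrap the actual requests, and
`XenstoreError` is a hypothetical exception.  The retry limit of 5 is
arbitrary.  No TRANSACTION_END F is sent after EAGAIN, because tx_id
is no longer valid once END has been processed.

```python
import errno


class XenstoreError(Exception):
    """Hypothetical exception carrying the errno from an ERROR reply."""

    def __init__(self, code: int):
        super().__init__(errno.errorcode.get(code, str(code)))
        self.code = code


def run_transaction(start, end, body, max_attempts=5):
    """Retry body() until TRANSACTION_END T| succeeds.

    end() is assumed to raise XenstoreError(errno.EAGAIN) when the
    commit loses a race with another writer.
    """
    for _ in range(max_attempts):
        transid = start()  # TRANSACTION_START
        try:
            result = body(transid)
        except Exception:
            end(transid, commit=False)  # TRANSACTION_END F|
            raise
        try:
            end(transid, commit=True)  # TRANSACTION_END T|
            return result
        except XenstoreError as e:
            if e.code != errno.EAGAIN:
                raise
            # An intervening write caused EAGAIN; start over.
    raise XenstoreError(errno.EAGAIN)
```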

---------- Domain management and xenstored communications ----------

INTRODUCE		<domid>|<mfn>|<evtchn>|?
	Notifies xenstored to communicate with this domain.

	INTRODUCE is currently only used by xend (during domain
	startup and various forms of restore and resume), and
	xenstored prevents its use other than by dom0.

	<domid> must be a real domain id (not 0 and not a special
	DOMID_... value).  <mfn> must be a machine page in that domain
	represented in signed decimal (!).  <evtchn> must be an
	unbound event channel in <domid> (likewise in decimal), on
	which xenstored will call bind_interdomain.
	Violations of these rules may result in undefined behaviour;
	for example passing a high-bit-set 32-bit mfn as an unsigned
	decimal will attempt to use 0x7fffffff instead (!).
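
The signed-decimal requirement is easy to get wrong.  The sketch below
covers only the 32-bit case this text describes.  A value with the high
bit set is rendered as its negative two's-complement equivalent, and
the reserved trailing ? is sent as an empty string.  The function names
are hypothetical.  How wider mfns are handled is not covered here.

```python
def signed_decimal_u32(value: int) -> str:
    """Render a 32-bit value as signed decimal, as INTRODUCE expects for <mfn>."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit value")
    # Reinterpret the top bit as a sign bit (two's complement).
    return str(value - (1 << 32) if value & 0x80000000 else value)


def introduce_request(domid: int, mfn: int, evtchn: int) -> bytes:
    # INTRODUCE <domid>|<mfn>|<evtchn>|?  (reserved ? sent as empty string)
    fields = [str(domid), signed_decimal_u32(mfn), str(evtchn)]
    return b"".join(f.encode("ascii") + b"\x00" for f in fields)
```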

RELEASE			<domid>|
	Manually requests that xenstored disconnect from the domain.
	The event channel is unbound at the xenstored end and the page
	unmapped.  If the domain is still running it won't be able to
	communicate with xenstored.  NB that xenstored will in any
	case detect domain destruction and disconnect by itself.
	xenstored prevents the use of RELEASE other than by dom0.

GET_DOMAIN_PATH		<domid>|		<path>|
	Returns the domain's base path, as is used for relative
	paths: ie, /local/domain/<domid> (with <domid>
	normalised).  The answer will be useless unless <domid> is a
	real domain id.

IS_DOMAIN_INTRODUCED	<domid>|		T| or F|
	Returns T if xenstored is in communication with the domain:
	ie, if INTRODUCE for the domain has not yet been followed by
	domain destruction or explicit RELEASE.

RESUME			<domid>|

	Arranges that @releaseDomain events will once more be
	generated when the domain becomes shut down.  This might have
	to be used if a domain were to be shut down (generating one
	@releaseDomain) and then subsequently restarted, since the
	state-sensitive algorithm in xenstored will not otherwise send
	further watch event notifications if the domain were to be
	shut down again.

	It is not clear whether this is possible since one would
	normally expect a domain not to be restarted after being shut
	down without being destroyed in the meantime.  There are
	currently no users of this request in xen-unstable.

	xenstored prevents the use of RESUME other than by dom0.

SET_TARGET		<domid>|<tdomid>|
	Notifies xenstored that domain <domid> is targeting domain
	<tdomid>. This grants domain <domid> full access to paths
	owned by <tdomid>. Domain <domid> also inherits all
	permissions granted to <tdomid> on all other paths. This
	allows <domid> to behave as if it were dom0 when modifying
	paths related to <tdomid>.

	xenstored prevents the use of SET_TARGET other than by dom0.

---------- Miscellaneous ----------

DEBUG			print|<string>|??	    sends <string> to debug log
DEBUG			print|<thing-with-no-nul>   EINVAL
DEBUG			check|??		    checks xenstored innards
DEBUG			<anything-else|>	    no-op (future extension)

	These requests should not generally be used and may be
	withdrawn in the future.