.. SPDX-License-Identifier: CC-BY-4.0

=============================================
C Dialect and Translation Assumptions for Xen
=============================================

This document specifies the C language dialect used by Xen and
the assumptions Xen makes about the translation toolchain.
It covers, in particular:

1. the language extensions used;
2. the translation limits that the translation toolchains must be able
   to accommodate;
3. the implementation-defined behaviors upon which Xen may depend.

All points are of course relevant for portability.  In addition,
programming in C is impossible without a detailed knowledge of the
implementation-defined behaviors.  For this reason, it is recommended
that Xen developers be familiar with this document and the
documentation referenced therein.

This document needs maintenance and adaptation in the following
circumstances:

- whenever the compiler is changed or updated;
- whenever the use of a certain language extension is added or removed;
- whenever code modifications cause the stated translation limits to be
  exceeded.


Applicable C Language Standard
______________________________

Xen is written in C99 with extensions.  The relevant ISO standard is

    *ISO/IEC 9899:1999/Cor 3:2007*: Programming Languages - C,
    Technical Corrigendum 3.
    ISO/IEC, Geneva, Switzerland, 2007.


Reference Documentation
_______________________

The following documents are referred to in the sequel:

GCC_MANUAL:
  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/gcc.pdf
CPP_MANUAL:
  https://gcc.gnu.org/onlinedocs/gcc-12.1.0/cpp.pdf
ARM64_ABI_MANUAL:
  https://github.com/ARM-software/abi-aa/blob/60a8eb8c55e999d74dac5e368fc9d7e36e38dda4/aapcs64/aapcs64.rst
X86_64_ABI_MANUAL:
  https://gitlab.com/x86-psABIs/x86-64-ABI/-/jobs/artifacts/master/raw/x86-64-ABI/abi.pdf?job=build


C Language Extensions
_____________________


The following table lists the extensions currently used in Xen.
The table columns are as follows:

   Extension
      a terse description of the extension;
   Architectures
      a set of Xen architectures making use of the extension;
   References
      when available, references to the documentation explaining
      the syntax and semantics of (each instance of) the extension.


.. list-table::
   :widths: 30 15 55
   :header-rows: 1

   * - Extension
     - Architectures
     - References

   * - Non-standard tokens
     - ARM64, X86_64
     - _Static_assert:
          see Section "2.1 C Language" of GCC_MANUAL.
       asm, __asm__:
          see Sections "6.48 Alternate Keywords" and "6.47 How to Use Inline Assembly Language in C Code" of GCC_MANUAL.
       __volatile__:
          see Sections "6.48 Alternate Keywords" and "6.47.2.1 Volatile" of GCC_MANUAL.
       __const__:
          see Section "6.48 Alternate Keywords" of GCC_MANUAL.
       __inline, __inline__:
          see Section "6.48 Alternate Keywords" of GCC_MANUAL.
       typeof, __typeof__:
          see Section "6.7 Referring to a Type with typeof" of GCC_MANUAL.
       __alignof__, __alignof:
          see Sections "6.48 Alternate Keywords" and "6.44 Determining the Alignment of Functions, Types or Variables" of GCC_MANUAL.
       __attribute__:
          see Section "6.39 Attribute Syntax" of GCC_MANUAL.
       __builtin_types_compatible_p:
          see Section "6.59 Other Built-in Functions Provided by GCC" of GCC_MANUAL.
       __builtin_va_arg:
          non-documented GCC extension.
       __builtin_offsetof:
          see Section "6.53 Support for offsetof" of GCC_MANUAL.

   * - Empty initialization list
     - ARM64, X86_64
     - Non-documented GCC extension.

   * - Arithmetic operator on pointer to void
     - ARM64, X86_64
     - See Section "6.24 Arithmetic on void- and Function-Pointers" of GCC_MANUAL.

   * - Statements and declarations in expressions
     - ARM64, X86_64
     - See Section "6.1 Statements and Declarations in Expressions" of GCC_MANUAL.

   * - Structure or union definition with no members
     - ARM64, X86_64
     - See Section "6.19 Structures with No Members" of GCC_MANUAL.

   * - Zero size array type
     - ARM64, X86_64
     - See Section "6.18 Arrays of Length Zero" of GCC_MANUAL.

   * - Binary conditional expression
     - ARM64, X86_64
     - See Section "6.8 Conditionals with Omitted Operands" of GCC_MANUAL.

   * - 'Case' label with upper/lower values
     - ARM64, X86_64
     - See Section "6.30 Case Ranges" of GCC_MANUAL.

   * - Unnamed field that is not a bit-field
     - ARM64, X86_64
     - See Section "6.63 Unnamed Structure and Union Fields" of GCC_MANUAL.

   * - Empty declaration
     - ARM64, X86_64
     - Non-documented GCC extension.
       Note: an empty declaration is caused by a semicolon at file scope
       with nothing before it (not to be confused with an empty statement).

   * - Incomplete enum declaration
     - ARM64
     - See Section "6.49 Incomplete enum Types" of GCC_MANUAL.

   * - Implicit conversion from a pointer to an incompatible pointer
     - ARM64, X86_64
     - Non-documented GCC extension.  The documentation for option
       -Wincompatible-pointer-types in Section
       "3.8 Options to Request or Suppress Warnings" of GCC_MANUAL
       is possibly relevant.

   * - Pointer to a function is converted to a pointer to an object or a pointer to an object is converted to a pointer to a function
     - X86_64
     - Non-documented GCC extension.  The information provided in
       https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83584
       is possibly relevant.

   * - Token pasting of ',' and __VA_ARGS__
     - ARM64, X86_64
     - See Section "6.21 Macros with a Variable Number of Arguments" of GCC_MANUAL.

   * - Named variadic macro arguments
     - ARM64, X86_64
     - See Section "6.21 Macros with a Variable Number of Arguments" of GCC_MANUAL.

   * - No arguments for '...' parameter of variadic macro
     - ARM64, X86_64
     - See Section "6.21 Macros with a Variable Number of Arguments" of GCC_MANUAL.

   * - void function returning void expression
     - ARM64, X86_64
     - See the documentation for -Wreturn-type in Section "3.8 Options to Request or Suppress Warnings" of GCC_MANUAL.

   * - GNU statement expressions from macro expansion
     - ARM64, X86_64
     - See Section "6.1 Statements and Declarations in Expressions" of GCC_MANUAL.

   * - Invalid application of sizeof to a void type
     - ARM64, X86_64
     - See Section "6.24 Arithmetic on void- and Function-Pointers" of GCC_MANUAL.

   * - Redeclaration of already-defined enum
     - ARM64, X86_64
     - See Section "6.49 Incomplete enum Types" of GCC_MANUAL.

   * - struct with flexible array member nested in a struct
     - ARM64, X86_64
     - See Section "6.18 Arrays of Length Zero" of GCC_MANUAL.

   * - struct with flexible array member used as an array element
     - ARM64, X86_64
     - See Section "6.18 Arrays of Length Zero" of GCC_MANUAL.

   * - enumerator value outside the range of int
     - ARM64, X86_64
     - Non-documented GCC extension.

   * - Extended integer types
     - X86_64
     - See Section "6.9 128-bit Integers" of GCC_MANUAL.

   * - Designated initializer for a range of elements
     - ARM64, X86_64
     - See Section "6.29 Designated Initializers" of GCC_MANUAL.

   * - Signed << compiler-defined behavior
     - All architectures
     - See Section "4.5 Integers" of GCC_MANUAL. As an extension to the
       C language, GCC does not use the latitude given in C99 and C11
       only to treat certain aspects of signed << as undefined.

   * - Signed >> acts on negative numbers by sign extension
     - All architectures
     - See Section "4.5 Integers" of GCC_MANUAL.

   * - Taking the address of a label
     - All architectures
     - See Section "6.3 Labels as Values" of GCC_MANUAL.
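
As an illustration of a few of the extensions listed above, the following
minimal sketch exercises statement expressions with __typeof__, case ranges,
the binary conditional operator, arithmetic on pointers to void, and a
designated initializer for a range of elements.  It assumes GCC (or a
compatible compiler) in its default GNU mode; all identifiers are
hypothetical and are not taken from the Xen sources.

```c
#include <stddef.h>

/*
 * Statement expression plus __typeof__: each argument is evaluated
 * exactly once.  MAX_OF is a hypothetical name, not a Xen macro.
 */
#define MAX_OF(a, b) ({          \
    __typeof__(a) a_ = (a);      \
    __typeof__(b) b_ = (b);      \
    a_ > b_ ? a_ : b_;           \
})

/* Case ranges: 'low ... high' matches every value in the closed range. */
int is_lower_hex_digit(char c)
{
    switch ( c )
    {
    case '0' ... '9':
    case 'a' ... 'f':
        return 1;

    default:
        return 0;
    }
}

/* Binary conditional: yields 'value' if it is nonzero, else 'fallback'. */
int value_or_default(int value, int fallback)
{
    return value ?: fallback;
}

/* Arithmetic on void *: GCC performs it as if sizeof(void) were 1. */
void *advance_bytes(void *base, size_t nr_bytes)
{
    return base + nr_bytes;
}

/* Designated initializer for a range of elements. */
const int range_table[8] = { [0 ... 3] = 1, [4 ... 7] = 2 };
```

A compiler running in a strictly conforming mode (for instance with
-pedantic-errors) is expected to reject most of these constructs, which is
why the toolchain assumptions need to be documented.
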

Translation Limits
__________________

The following table lists the translation limits that a toolchain has
to satisfy in order to translate Xen.  The numbers given are a
compromise: on the one hand, many modern compilers have very generous
limits (in several cases, the only limitation is the amount of
available memory); on the other hand, we prefer setting limits that are
not too high, because compilers are under no obligation to diagnose
when a limit has been exceeded, and not too low, so as to avoid
frequently updating this document.  In the table, only the
limits that go beyond the minima specified by the relevant C Standard
are listed.

The table columns are as follows:

   Limit
      a terse description of the translation limit;
   Architectures
      a set of relevant Xen architectures;
   Threshold
      a value that the Xen project does not wish to exceed for that limit
      (this is typically below, often much below, what the translation
      toolchain supports);
   References
      when available, references to the documentation providing evidence
      that the translation toolchain honors the threshold (and more).

.. list-table::
   :widths: 30 15 10 45
   :header-rows: 1

   * - Limit
     - Architectures
     - Threshold
     - References

   * - Size of an object
     - ARM64, X86_64
     - 8388608
     - The maximum size of an object is defined by the MAX_SIZE macro and is 8MB on a 32-bit architecture.
       The maximum size of an array is defined by the PTRDIFF_MAX macro and is 2^30-1 on a 32-bit architecture.
       See occurrences of these macros in GCC_MANUAL.

   * - Characters in one logical source line
     - ARM64
     - 5000
     - See Section "11.2 Implementation limits" of CPP_MANUAL.

   * - Characters in one logical source line
     - X86_64
     - 12000
     - See Section "11.2 Implementation limits" of CPP_MANUAL.

   * - Nesting levels for #include files
     - ARM64
     - 24
     - See Section "11.2 Implementation limits" of CPP_MANUAL.

   * - Nesting levels for #include files
     - X86_64
     - 32
     - See Section "11.2 Implementation limits" of CPP_MANUAL.

   * - case labels for a switch statement (excluding those for any nested switch statements)
     - X86_64
     - 1500
     - See Section "4.12 Statements" of GCC_MANUAL.

   * - Number of significant initial characters in an external identifier
     - ARM64, X86_64
     - 63
     - See Section "4.3 Identifiers" of GCC_MANUAL.
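
Since a compiler is not obliged to diagnose an exceeded limit, a threshold
can also be checked explicitly at build time.  The following is a minimal
sketch for the "Size of an object" threshold, using _Static_assert; the macro
and type names are hypothetical and are not taken from the Xen sources.

```c
#include <stddef.h>

/*
 * Hypothetical name.  The value is the "Size of an object" threshold
 * from the table above (8388608 bytes).
 */
#define MAX_OBJECT_SIZE ((size_t)8388608)

/* Hypothetical helper: compilation fails if 'type' is larger than allowed. */
#define ASSERT_OBJECT_SIZE_OK(type)                                   \
    _Static_assert(sizeof(type) <= MAX_OBJECT_SIZE,                   \
                   #type " exceeds the documented object size threshold")

/* Example type, used only to demonstrate the check. */
struct example_buffer {
    unsigned char data[4096];
};

ASSERT_OBJECT_SIZE_OK(struct example_buffer);
```

A check of this kind turns a silent violation of the threshold into a
compilation error, at the cost of one line per type to be guarded.
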


Implementation-Defined Behaviors
________________________________

The following table lists the C language implementation-defined behaviors
relevant for MISRA C:2012 Dir 1.1 upon which Xen may possibly depend.

The table columns are as follows:

   I.-D.B.
      a terse description of the implementation-defined behavior;
   Architectures
      a set of relevant Xen architectures;
   Value(s)
      for i.-d.b.'s with values, the values allowed;
   References
      when available, references to the documentation providing details
      about how the i.-d.b. is resolved by the translation toolchain.

.. list-table::
   :widths: 30 15 10 45
   :header-rows: 1

   * - I.-D.B.
     - Architectures
     - Value(s)
     - References

   * - Allowable bit-field types other than _Bool, signed int, and unsigned int
     - ARM64, X86_64
     - All explicitly signed integer types, all unsigned integer types,
       and enumerations.
     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL.

   * - #pragma preprocessing directive that is documented as causing translation failure or some other form of undefined behavior is encountered
     - ARM64, X86_64
     - pack, GCC visibility
     - #pragma pack:
          see Section "6.62.11 Structure-Layout Pragmas" of GCC_MANUAL.
       #pragma GCC visibility:
          see Section "6.62.14 Visibility Pragmas" of GCC_MANUAL.

   * - The number of bits in a byte
     - ARM64
     - 8
     - See Section "4.4 Characters" of GCC_MANUAL and Section "8.1 Data types" of ARM64_ABI_MANUAL.

   * - The number of bits in a byte
     - X86_64
     - 8
     - See Section "4.4 Characters" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.

   * - Whether signed integer types are represented using sign and magnitude, two's complement, or one's complement, and whether the extraordinary value is a trap representation or an ordinary value
     - ARM64, X86_64
     - Two's complement
     - See Section "4.5 Integers" of GCC_MANUAL.

   * - Any extended integer types that exist in the implementation
     - X86_64
     - __uint128_t
     - See Section "6.9 128-bit Integers" of GCC_MANUAL.

   * - The number, order, and encoding of bytes in any object
     - ARM64
     -
     - See Section "4.15 Architecture" of GCC_MANUAL and Chapter 5 "Data types and alignment" of ARM64_ABI_MANUAL.

   * - The number, order, and encoding of bytes in any object
     - X86_64
     -
     - See Section "4.15 Architecture" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.

   * - Whether a bit-field can straddle a storage-unit boundary
     - ARM64
     -
     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL and Section "8.1.8 Bit-fields" of ARM64_ABI_MANUAL.

   * - Whether a bit-field can straddle a storage-unit boundary
     - X86_64
     -
     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.

   * - The order of allocation of bit-fields within a unit
     - ARM64
     -
     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL and Section "8.1.8 Bit-fields" of ARM64_ABI_MANUAL.

   * - The order of allocation of bit-fields within a unit
     - X86_64
     -
     - See Section "4.9 Structures, Unions, Enumerations, and Bit-Fields" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.

   * - What constitutes an access to an object that has volatile-qualified type
     - ARM64, X86_64
     -
     - See Section "4.10 Qualifiers" of GCC_MANUAL.

   * - The values or expressions assigned to the macros specified in the headers <float.h>, <limits.h>, and <stdint.h>
     - ARM64
     -
     - See Section "4.15 Architecture" of GCC_MANUAL and Chapter 5 "Data types and alignment" of ARM64_ABI_MANUAL.

   * - The values or expressions assigned to the macros specified in the headers <float.h>, <limits.h>, and <stdint.h>
     - X86_64
     -
     - See Section "4.15 Architecture" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.

   * - Character not in the basic source character set is encountered in a source file, except in an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token
     - ARM64
     - UTF-8
     - See Section "1.1 Character sets" of CPP_MANUAL.
       We assume the locale does not prevent any UTF-8 character from being part of the source character set.

   * - The value of a char object into which has been stored any character other than a member of the basic execution character set
     - ARM64
     -
     - See Section "4.4 Characters" of GCC_MANUAL and Section "8.1 Data types" of ARM64_ABI_MANUAL.

   * - The value of a char object into which has been stored any character other than a member of the basic execution character set
     - X86_64
     -
     - See Section "4.4 Characters" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.

   * - The value of an integer character constant containing more than one character or containing a character or escape sequence that does not map to a single-byte execution character
     - ARM64
     -
     - See Section "4.4 Characters" of GCC_MANUAL and Section "8.1 Data types" of ARM64_ABI_MANUAL.

   * - The value of an integer character constant containing more than one character or containing a character or escape sequence that does not map to a single-byte execution character
     - X86_64
     -
     - See Section "4.4 Characters" of GCC_MANUAL and Section "3.1.2 Data Representation" of X86_64_ABI_MANUAL.

   * - The mapping of members of the source character set
     - ARM64, X86_64
     -
     - See Section "4.4 Characters" of GCC_MANUAL and the documentation for -finput-charset=charset in the same manual.

   * - The members of the source and execution character sets, except as explicitly specified in the Standard
     - ARM64, X86_64
     - UTF-8
     - See Section "4.4 Characters" of GCC_MANUAL.

   * - The values of the members of the execution character set
     - ARM64, X86_64
     -
     - See Section "4.4 Characters" of GCC_MANUAL and the documentation for -fexec-charset=charset in the same manual.

   * - How a diagnostic is identified
     - ARM64, X86_64
     -
     - See Section "4.1 Translation" of GCC_MANUAL.

   * - The places that are searched for an included < > delimited header, and how the places are specified or the header is identified
     - ARM64, X86_64
     -
     - See Chapter "2 Header Files" of CPP_MANUAL.

   * - How the named source file is searched for in an included " " delimited header
     - ARM64, X86_64
     -
     - See Chapter "2 Header Files" of CPP_MANUAL.

   * - How sequences in both forms of header names are mapped to headers or external source file names
     - ARM64, X86_64
     -
     - See Chapter "2 Header Files" of CPP_MANUAL.

   * - Whether the # operator inserts a \ character before the \ character that begins a universal character name in a character constant or string literal
     - ARM64, X86_64
     -
     - See Section "3.4 Stringizing" of CPP_MANUAL.

   * - The current locale used to convert a wide string literal into corresponding wide character codes
     - ARM64, X86_64
     -
     - See Section "4.4 Characters" of GCC_MANUAL and Section "11.1 Implementation-defined behavior" of CPP_MANUAL.

   * - The value of a string literal containing a multibyte character or escape sequence not represented in the execution character set
     - X86_64
     -
     - See Section "4.4 Characters" of GCC_MANUAL and Section "11.1 Implementation-defined behavior" of CPP_MANUAL.

   * - The behavior on each recognized #pragma directive
     - ARM64, X86_64
     - pack, GCC visibility
     - See Section "4.13 Preprocessing Directives" of GCC_MANUAL and Section "7 Pragmas" of CPP_MANUAL.

   * - The method by which preprocessing tokens (possibly resulting from macro expansion) in a #include directive are combined into a header name
     - X86_64
     -
     - See Section "4.13 Preprocessing Directives" of GCC_MANUAL and Section "11.1 Implementation-defined behavior" of CPP_MANUAL.
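
Some of the values in the table above can be turned into compile-time checks.
The following minimal sketch covers the number of bits in a byte, the two's
complement representation, the effect of #pragma pack and the use of
bit-field types other than _Bool, signed int and unsigned int.  It assumes
GCC (or a compatible compiler) and a 32-bit int; the type names are
hypothetical and are not taken from the Xen sources.

```c
#include <limits.h>

/* The number of bits in a byte is 8. */
_Static_assert(CHAR_BIT == 8, "bytes are expected to have 8 bits");

/* Two's complement: INT_MIN has no positive counterpart, -1 is all ones. */
_Static_assert(INT_MIN == -INT_MAX - 1, "two's complement expected");
_Static_assert((-1 & 3) == 3, "two's complement expected");

/* #pragma pack: members are laid out without padding between them. */
#pragma pack(push, 1)
struct packed_pair {
    unsigned char tag;
    unsigned int value;
};
#pragma pack(pop)

_Static_assert(sizeof(struct packed_pair) == 5, "no padding expected");

/* Bit-fields of types other than _Bool, signed int and unsigned int. */
struct small_fields {
    unsigned char kind:3;
    unsigned short id:12;
};
```

Checks of this kind fail the build, rather than misbehave at run time, when
the code is moved to a toolchain that resolves these behaviors differently.
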


Sizes of Integer Types
______________________

Xen expects the System V ABI on x86_64:
  https://gitlab.com/x86-psABIs/x86-64-ABI

Xen expects AAPCS32 on ARMv8-A AArch32 and ARMv7-A:
  https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst

Xen expects AAPCS64 LP64 on ARMv8-A AArch64:
  https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst

A summary table of data types, sizes and alignment is below:

.. list-table::
   :widths: 10 10 10 45
   :header-rows: 1

   * - Type
     - Size
     - Alignment
     - Architectures

   * - char
     - 8 bits
     - 8 bits
     - x86_32, ARMv8-A AArch32, ARMv8-R AArch32, ARMv7-A, x86_64,
       ARMv8-A AArch64, RV64, PPC64

   * - short
     - 16 bits
     - 16 bits
     - x86_32, ARMv8-A AArch32, ARMv8-R AArch32, ARMv7-A, x86_64,
       ARMv8-A AArch64, RV64, PPC64

   * - int
     - 32 bits
     - 32 bits
     - x86_32, ARMv8-A AArch32, ARMv8-R AArch32, ARMv7-A, x86_64,
       ARMv8-A AArch64, RV64, PPC64

   * - long
     - 32 bits
     - 32 bits
     - x86_32, ARMv8-A AArch32, ARMv8-R AArch32, ARMv7-A

   * - long
     - 64 bits
     - 64 bits
     - x86_64, ARMv8-A AArch64, RV64, PPC64

   * - long long
     - 64 bits
     - 32 bits
     - x86_32

   * - long long
     - 64 bits
     - 64 bits
     - x86_64, ARMv8-A AArch64, RV64, PPC64, ARMv8-A AArch32, ARMv8-R
       AArch32, ARMv7-A

   * - pointer
     - 32 bits
     - 32 bits
     - x86_32, ARMv8-A AArch32, ARMv8-R AArch32, ARMv7-A

   * - pointer
     - 64 bits
     - 64 bits
     - x86_64, ARMv8-A AArch64, RV64, PPC64
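
The sizes and alignments in the table above can be verified at compile time.
The following is a minimal sketch, assuming 8-bit bytes and GCC (or a
compatible compiler), which provides __alignof__ and predefines __LP64__ on
targets where long and pointers are 64 bits wide.  Targets that are neither
LP64 nor ILP32 are outside the scope of the sketch.

```c
/* Common to every architecture in the table (sizes are in bytes). */
_Static_assert(sizeof(char) == 1 && __alignof__(char) == 1,
               "char: 8 bits");
_Static_assert(sizeof(short) == 2 && __alignof__(short) == 2,
               "short: 16 bits");
_Static_assert(sizeof(int) == 4 && __alignof__(int) == 4,
               "int: 32 bits");
_Static_assert(sizeof(long long) == 8,
               "long long: 64 bits");

#ifdef __LP64__
/* 64-bit rows of the table: long and pointers are 64 bits wide. */
_Static_assert(sizeof(long) == 8 && __alignof__(long) == 8,
               "long: 64 bits");
_Static_assert(sizeof(void *) == 8 && __alignof__(void *) == 8,
               "pointer: 64 bits");
#else
/* 32-bit rows of the table: long and pointers are 32 bits wide. */
_Static_assert(sizeof(long) == 4 && __alignof__(long) == 4,
               "long: 32 bits");
_Static_assert(sizeof(void *) == 4 && __alignof__(void *) == 4,
               "pointer: 32 bits");
#endif
```

The alignment of long long is deliberately not checked here, because the
table gives a different value for x86_32 than for the other architectures.
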


END OF DOCUMENT.