#	@(#)TOUR	8.1 (Berkeley) 5/31/93

NOTE -- This is the original TOUR paper distributed with ash and
does not represent the current state of the shell.  It is included
anyway because it gives helpful information about how the shell is
structured, but be warned that things have changed -- the current
shell is still under development.

================================================================

                       A Tour through Ash

               Copyright 1989 by Kenneth Almquist.


DIRECTORIES:  The subdirectory bltin contains commands which can
be compiled stand-alone.  The rest of the source is in the main
ash directory.

SOURCE CODE GENERATORS:  Files whose names begin with "mk" are
programs that generate source code.  A complete list of these
programs is:

        program         input files         generates
        -------         -----------         ---------
        mkbuiltins      builtins            builtins.h builtins.c
        mkinit          *.c                 init.c
        mknodes         nodetypes           nodes.h nodes.c
        mksignames          -               signames.h signames.c
        mksyntax            -               syntax.h syntax.c
        mktokens            -               token.h
        bltin/mkexpr    unary_op binary_op  operators.h operators.c

There are undoubtedly too many of these.  Mkinit searches all the
C source files for entries looking like:

        INIT {
              x = 1;    /* executed during initialization */
        }

        RESET {
              x = 2;    /* executed when the shell does a longjmp
                           back to the main command loop */
        }

It pulls this code out into routines which are called when
particular events occur.  The intent is to improve modularity by
isolating the information about which modules need to be
explicitly initialized/reset within the modules themselves.

Mkinit recognizes several constructs for placing declarations in
the init.c file.
        INCLUDE "file.h"
includes a file.  The storage class MKINIT makes a declaration
available in the init.c file, for example:
        MKINIT int funcnest;    /* depth of function calls */
MKINIT alone on a line introduces a structure or union
declaration:
        MKINIT
        struct redirtab {
              short renamed[10];
        };
Preprocessor #define statements are copied to init.c without any
special action to request this.

INDENTATION:  The ash source is indented in multiples of six
spaces.  The only study that I have heard of on the subject
concluded that the optimal amount to indent is in the range of
four to six spaces.  I use six spaces since it is not too big a
jump from the widely used eight spaces.  If you really hate
six-space indentation, use the adjind program (source included)
to change it to something else.

EXCEPTIONS:  Code for dealing with exceptions appears in
exceptions.c.  The C language doesn't include exception handling,
so I implement it using setjmp and longjmp.  The global variable
exception contains the type of exception.  EXERROR is raised by
calling error.  EXINT is an interrupt.
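The setjmp/longjmp mechanism can be sketched roughly as follows.
This is an illustration under stated assumptions, not the code in
ash:  the handler pointer, the helpers raise_exception and
run_protected, and the numeric values of EXINT and EXERROR are
invented for the example, and error takes no message argument
here.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

#define EXINT   0               /* interrupt (value assumed) */
#define EXERROR 1               /* raised by error (value assumed) */

static jmp_buf *handler;        /* innermost active handler (assumed name) */
static int exception;           /* type of the exception being raised */

/* Record the exception type and unwind to the innermost handler. */
static void raise_exception(int e) {
      assert(handler != NULL);
      exception = e;
      longjmp(*handler, 1);
}

/* Simplified stand-in for ash's error routine. */
static void error(void) {
      raise_exception(EXERROR);
}

/* Call fn with a handler installed.  Returns -1 if fn returned
   normally, otherwise the type of the exception it raised. */
static int run_protected(void (*fn)(void)) {
      jmp_buf jmploc;
      jmp_buf *saved = handler;

      if (setjmp(jmploc) != 0) {
            handler = saved;    /* pop our handler before reporting */
            return exception;
      }
      handler = &jmploc;
      fn();
      handler = saved;
      return -1;
}
```

In this sketch run_protected(error) returns EXERROR, and the
previous handler is back in place afterwards, so protected
regions can be nested.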

INTERRUPTS:  In an interactive shell, an interrupt will cause an
EXINT exception to return to the main command loop.  (Exception:
EXINT is not raised if the user traps interrupts using the trap
command.)  The INTOFF and INTON macros (defined in exception.h)
provide uninterruptible critical sections.  Between the execution
of INTOFF and the execution of INTON, interrupt signals will be
held for later delivery.  INTOFF and INTON can be nested.
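One way to get nestable critical sections is a depth counter plus
a pending flag, as in the sketch below.  The variable names, the
helper interrupt_arrived, and the counter delivered are
assumptions made for the example; in particular onint here only
records the delivery, where the shell would raise EXINT.

```c
#include <assert.h>
#include <signal.h>

static volatile sig_atomic_t suppressint;   /* nesting depth of INTOFF */
static volatile sig_atomic_t intpending;    /* interrupt arrived while held */
static int delivered;                       /* deliveries so far (sketch only) */

/* Stand-in for raising EXINT; it only records the delivery. */
static void onint(void) {
      intpending = 0;
      delivered++;
}

/* What the interrupt handler would do: hold the interrupt or
   deliver it at once. */
static void interrupt_arrived(void) {
      if (suppressint > 0)
            intpending = 1;
      else
            onint();
}

#define INTOFF  (suppressint++)
#define INTON \
      do { \
            assert(suppressint > 0); \
            if (--suppressint == 0 && intpending) \
                  onint(); \
      } while (0)
```

Because only the outermost INTON lets a held interrupt through,
a routine can bracket its own critical section without knowing
whether its caller has already done so.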

MEMALLOC.C:  Memalloc.c defines versions of malloc and realloc
which call error when there is no memory left.  It also defines a
stack oriented memory allocation scheme.  Allocating off a stack
is probably more efficient than allocation using malloc, but the
big advantage is that when an exception occurs all we have to do
to free up the memory in use at the time of the exception is to
restore the stack pointer.  The stack is implemented using a
linked list of blocks.
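A stack built from a linked list of blocks might look like the
sketch below.  It is not the allocator in memalloc.c:  the block
size, the struct layouts, and the helper names are assumptions
for illustration, and it returns NULL where the shell would call
error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCKSIZE 512           /* payload bytes per block (assumed) */

struct stack_block {
      struct stack_block *prev; /* block below this one */
      char space[BLOCKSIZE];
};

struct stackmark {              /* a saved stack position */
      struct stack_block *block;
      size_t used;
};

static struct stack_block *stackp;      /* top block, or NULL */
static size_t stackused;                /* bytes used in the top block */

/* Allocate n bytes from the stack.  Returns NULL on failure or if
   n does not fit in one block. */
static void *stalloc(size_t n) {
      void *p;

      if (n > BLOCKSIZE)
            return NULL;
      /* round up so later allocations stay pointer-aligned */
      n = (n + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
      if (stackp == NULL || stackused + n > BLOCKSIZE) {
            struct stack_block *b = malloc(sizeof *b);
            if (b == NULL)
                  return NULL;
            b->prev = stackp;
            stackp = b;
            stackused = 0;
      }
      p = stackp->space + stackused;
      stackused += n;
      return p;
}

/* Remember the current top of the stack. */
static void setstackmark(struct stackmark *m) {
      m->block = stackp;
      m->used = stackused;
}

/* Free everything allocated since the mark was set. */
static void popstackmark(const struct stackmark *m) {
      while (stackp != m->block) {
            struct stack_block *b = stackp;
            assert(b != NULL);
            stackp = b->prev;
            free(b);
      }
      stackused = m->used;
}
```

An exception handler only needs the mark taken before the
protected code ran:  popping it releases every allocation made
since, however many there were.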

STPUTC:  If the stack were contiguous, it would be easy to store
strings on the stack without knowing in advance how long the
string was going to be:
        p = stackptr;
        *p++ = c;       /* repeated as many times as needed */
        stackptr = p;
The following three macros (defined in memalloc.h) perform these
operations, but grow the stack if you run off the end:
        STARTSTACKSTR(p);
        STPUTC(c, p);   /* repeated as many times as needed */
        grabstackstr(p);
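The idea behind the three macros can be imitated with a single
growable buffer, as sketched below.  These are simplified
stand-ins rather than the definitions in memalloc.h:  they use
realloc instead of the block stack, strbuf, strcap and growstr
are invented names, and grabstackstr here merely returns the
start of the string, which stays valid only until the next
string is started.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static char *strbuf;            /* start of the string being built */
static size_t strcap;           /* bytes allocated for strbuf */

/* Enlarge the buffer and return p relocated into the new copy. */
static char *growstr(char *p) {
      size_t len = (strbuf != NULL) ? (size_t)(p - strbuf) : 0;
      size_t newcap = (strcap != 0) ? strcap * 2 : 16;
      char *q;

      assert(len <= strcap);
      q = realloc(strbuf, newcap);
      if (q == NULL)
            abort();            /* ash would call error here */
      strbuf = q;
      strcap = newcap;
      return q + len;
}

#define STARTSTACKSTR(p) \
      ((p) = (strcap != 0) ? strbuf : growstr(strbuf))
#define STPUTC(c, p) \
      do { \
            if ((p) == strbuf + strcap) \
                  (p) = growstr(p); \
            *(p)++ = (char)(c); \
      } while (0)
#define grabstackstr(p)  ((void)(p), strbuf)
```

The caller keeps using the plain pointer p throughout; the only
visible difference from the contiguous case is that STPUTC may
move p when the buffer has to grow.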

We now start a top-down look at the code:

MAIN.C:  The main routine performs some initialization, executes
the user's profile if necessary, and calls cmdloop.  Cmdloop
repeatedly parses and executes commands.

OPTIONS.C:  This file contains the option processing code.  It is
called from main to parse the shell arguments when the shell is
invoked, and it also contains the set builtin.  The -i and -j
options (the latter turns on job control) require changes in
signal handling.  The routines setjobctl (in jobs.c) and
setinteractive (in trap.c) are called to handle changes to these
options.

PARSING:  The parser code is all in parser.c.  A recursive
descent parser is used.  Syntax tables (generated by mksyntax)
are used to classify characters during lexical analysis.  There
are three tables:  one for normal use, one for use when inside
single quotes, and one for use when inside double quotes.  The
tables are machine dependent because they are indexed by
character variables and the range of a char varies from machine
to machine.

PARSE OUTPUT:  The output of the parser consists of a tree of
nodes.  The various types of nodes are defined in the file
nodetypes.

Nodes of type NARG are used to represent both words and the
contents of here documents.  An early version of ash kept the
contents of here documents in temporary files, but keeping here
documents in memory typically results in significantly better
performance.  It would have been nice to make it an option to use
temporary files for here documents, for the benefit of small
machines, but the code to keep track of when to delete the
temporary files was complex and I never fixed all the bugs in it.
(AT&T has been maintaining the Bourne shell for more than ten
years, and to the best of my knowledge they still haven't gotten
it to handle temporary files correctly in obscure cases.)

The text field of a NARG structure points to the text of the
word.  The text consists of ordinary characters and a number of
special codes defined in parser.h.  The special codes are:

        CTLVAR              Variable substitution
        CTLENDVAR           End of variable substitution
        CTLBACKQ            Command substitution
        CTLESC              Escape next character

A variable substitution contains the following elements:

        CTLVAR type name '=' [ alternative-text CTLENDVAR ]

The type field is a single character specifying the type of
substitution.  The possible types are:

        VSNORMAL            $var
        VSMINUS             ${var-text}
        VSMINUS|VSNUL       ${var:-text}
        VSPLUS              ${var+text}
        VSPLUS|VSNUL        ${var:+text}
        VSQUESTION          ${var?text}
        VSQUESTION|VSNUL    ${var:?text}
        VSASSIGN            ${var=text}
        VSASSIGN|VSNUL      ${var:=text}

The name of the variable comes next, terminated by an equals
sign.  If the type is not VSNORMAL, then the text field in the
substitution follows, terminated by a CTLENDVAR byte.
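To make the byte layout concrete, the sketch below builds the
encoding of a single substitution.  The numeric values given to
CTLVAR, CTLENDVAR and the VS constants are assumptions (the real
ones are defined in parser.h), and encode_varsub is a
hypothetical helper, not a routine in the parser.

```c
#include <assert.h>
#include <string.h>

/* Values assumed for this sketch only. */
#define CTLVAR     '\201'
#define CTLENDVAR  '\202'
#define VSNORMAL   1
#define VSMINUS    2
#define VSNUL      0x10

/* Write the encoding of one substitution into out, which must be
   large enough, and return the number of bytes written. */
static size_t encode_varsub(char *out, int type,
                            const char *name, const char *text) {
      char *p = out;

      *p++ = CTLVAR;
      *p++ = (char)type;
      memcpy(p, name, strlen(name));
      p += strlen(name);
      *p++ = '=';               /* terminates the name */
      if (type != VSNORMAL) {
            assert(text != NULL);
            memcpy(p, text, strlen(text));
            p += strlen(text);
            *p++ = CTLENDVAR;   /* terminates the text */
      }
      return (size_t)(p - out);
}
```

With these definitions ${HOME:-/tmp} becomes the 12 bytes
CTLVAR, VSMINUS|VSNUL, "HOME", '=', "/tmp", CTLENDVAR, while
$x becomes just CTLVAR, VSNORMAL, "x", '='.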

Commands in back quotes are parsed and stored in a linked list.
The locations of these commands in the string are indicated by
the CTLBACKQ character.

The character CTLESC escapes the next character, so that in case
any of the CTL characters mentioned above appear in the input,
they can be passed through transparently.  CTLESC is also used to
escape '*', '?', '[', and '!' characters which were quoted by the
user and thus should not be used for file name generation.

CTLESC characters have proved to be particularly tricky to get
right.  In the case of here documents which are not subject to
variable and command substitution, the parser doesn't insert any
CTLESC characters to begin with (so the contents of the text
field can be written without any processing).  Other here
documents, and words which are not subject to splitting and file
name generation, have the CTLESC characters removed during the
variable and command substitution phase.  Words which are subject
to splitting and file name generation have the CTLESC characters
removed as part of the file name generation phase.

EXECUTION:  Command execution is handled by the following files:
        eval.c     The top level routines.
        redir.c    Code to handle redirection of input and output.
        jobs.c     Code to handle forking, waiting, and job control.
        exec.c     Code to do path searches and the actual exec sys call.
        expand.c   Code to evaluate arguments.
        var.c      Maintains the variable symbol table.  Called from expand.c.

EVAL.C:  Evaltree recursively executes a parse tree.  The exit
status is returned in the global variable exitstatus.  The
alternative entry evalbackcmd is called to evaluate commands in
back quotes.  It saves the result in memory if the command is a
builtin; otherwise it forks off a child to execute the command
and connects the standard output of the child to a pipe.

JOBS.C:  To create a process, you call makejob to return a job
structure, and then call forkshell (passing the job structure as
an argument) to create the process.  Waitforjob waits for a job
to complete.  These routines take care of process groups if job
control is defined.

REDIR.C:  Ash allows file descriptors to be redirected and then
restored without forking off a child process.  This is
accomplished by duplicating the original file descriptors.  The
redirtab structure records where the file descriptors have been
duplicated to.
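The save-and-restore technique can be sketched with the POSIX
calls fcntl, dup2 and close.  The helpers redirect_fd and
restore_fd are hypothetical, and parking the saved copy at
descriptor 10 or above is an assumption suggested by the
ten-element renamed array shown earlier; the code in redir.c may
differ.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Make fd refer to the same file as newfd.  Returns a duplicate
   of the original fd for restore_fd, or -1 on failure. */
static int redirect_fd(int fd, int newfd) {
      int saved = fcntl(fd, F_DUPFD, 10);  /* copy lands at 10 or above */

      if (saved < 0)
            return -1;
      if (dup2(newfd, fd) < 0) {
            close(saved);
            return -1;
      }
      return saved;
}

/* Undo redirect_fd:  put the saved copy back and discard it. */
static int restore_fd(int fd, int saved) {
      int r;

      assert(saved >= 0);
      r = dup2(saved, fd);
      close(saved);
      return r;
}
```

Since the original descriptor is only renamed, never closed, a
builtin can run with redirected output in the shell process
itself and the shell can carry on afterwards as if nothing had
happened.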

EXEC.C:  The routine find_command locates a command, and enters
the command in the hash table if it is not already there.  The
third argument specifies whether it is to print an error message
if the command is not found.  (When a pipeline is set up,
find_command is called for all the commands in the pipeline
before any forking is done, so as to get the commands into the
hash table of the parent process.  But to make command hashing as
transparent as possible, we silently ignore errors at that point
and only print error messages if the command cannot be found
later.)

The routine shellexec is the interface to the exec system call.

EXPAND.C:  Arguments are processed in three passes.  The first
(performed by the routine argstr) performs variable and command
substitution.  The second (ifsbreakup) performs word splitting
and the third (expandmeta) performs file name generation.  If the
"/u" directory is simulated, then when "/u/username" is replaced
by the user's home directory, the flag "didudir" is set.  This
tells the cd command that it should print out the directory name,
just as it would if the "/u" directory were implemented using
symbolic links.

VAR.C:  Variables are stored in a hash table.  Probably we should
switch to extensible hashing.  The variable name is stored in the
same string as the value (using the format "name=value") so that
no string copying is needed to create the environment of a
command.  Variables which the shell references internally are
preallocated so that the shell can reference the values of these
variables without doing a lookup.
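The benefit of the "name=value" format shows up in a small
sketch.  The table size, the hash function, the struct layout and
the helper names are all assumptions for illustration, and addvar
does not handle replacing an existing variable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VTABSIZE 39             /* number of hash buckets (assumed) */

struct var {
      struct var *next;         /* next entry in the same bucket */
      const char *text;         /* "name=value" */
};

static struct var *vartab[VTABSIZE];

/* Hash the name part of s, stopping at '=' or end of string. */
static unsigned hashname(const char *s) {
      unsigned h = 0;

      while (*s != '\0' && *s != '=')
            h = h * 31 + (unsigned char)*s++;
      return h % VTABSIZE;
}

/* Enter v, whose text must be "name=value", into the table. */
static void addvar(struct var *v) {
      struct var **bucket = &vartab[hashname(v->text)];

      assert(strchr(v->text, '=') != NULL);
      v->next = *bucket;
      *bucket = v;
}

/* Return the value of name, or NULL if it is not set. */
static const char *lookupvar(const char *name) {
      size_t len = strcspn(name, "=");
      struct var *v;

      for (v = vartab[hashname(name)]; v != NULL; v = v->next) {
            if (strncmp(v->text, name, len) == 0 && v->text[len] == '=')
                  return v->text + len + 1;
      }
      return NULL;
}

/* Fill env with pointers to the stored strings, NULL terminated.
   Nothing is copied.  Returns the number of entries stored. */
static size_t environment(const char **env, size_t max) {
      size_t n = 0;
      int i;
      struct var *v;

      for (i = 0; i < VTABSIZE; i++) {
            for (v = vartab[i]; v != NULL; v = v->next) {
                  if (n + 1 < max)
                        env[n++] = v->text;
            }
      }
      if (max > 0)
            env[n] = NULL;
      return n;
}
```

The array filled in by environment has the shape that the exec
system call expects for its environment argument, and building
it costs one pointer store per variable.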

When a program is run, the code in eval.c sticks any environment
variables which precede the command (as in "PATH=xxx command") in
the variable table as the simplest way to strip duplicates, and
then calls "environment" to get the value of the environment.
There are two consequences of this.  First, if an assignment to
PATH precedes the command, the value of PATH before the
assignment must be remembered and passed to shellexec.  Second,
if the program turns out to be a shell procedure, the strings
from the environment variables which preceded the command must be
pulled out of the table and replaced with strings obtained from
malloc, since the former will automatically be freed when the
stack (see the entry on memalloc.c) is emptied.

BUILTIN COMMANDS:  The procedures for handling these are
scattered throughout the code, depending on which location
appears most appropriate.  They can be recognized because their
names always end in "cmd".  The mapping from names to procedures
is specified in the file builtins, which is processed by the
mkbuiltins command.

A builtin command is invoked with argc and argv set up like a
normal program.  A builtin command is allowed to overwrite its
arguments.  Builtin routines can call nextopt to do option
parsing.  This is kind of like getopt, but you don't pass argc
and argv to it.  Builtin routines can also call error.  This
routine normally terminates the shell (or returns to the main
command loop if the shell is interactive), but when called from a
builtin command it causes the builtin command to terminate with
an exit status of 2.

The directory bltin contains commands which can be compiled
independently but can also be built into the shell for efficiency
reasons.  The makefile in this directory compiles these programs
in the normal fashion (so that they can be run regardless of
whether the invoker is ash), but also creates a library named
bltinlib.a which can be linked with ash.  The header file bltin.h
takes care of most of the differences between the ash and the
stand-alone environment.  The user should call the main routine
"main", and #define main to be the name of the routine to use
when the program is linked into ash.  This #define should appear
before bltin.h is included; bltin.h will #undef main if the
program is to be compiled stand-alone.

CD.C:  This file defines the cd and pwd builtins.  The pwd
command runs /bin/pwd the first time it is invoked (unless the
user has already done a cd to an absolute pathname), but then
remembers the current directory and updates it when the cd
command is run, so subsequent pwd commands run very fast.  The
main complication in the cd command is in the docd routine, which
resolves symbolic links into actual names and tells the user
where they ended up after crossing a symbolic link.

SIGNALS:  Trap.c implements the trap command.  The routine
setsignal figures out what action should be taken when a signal
is received and invokes the signal system call to set the signal
action appropriately.  When a signal that a user has set a trap
for is caught, the routine "onsig" sets a flag.  The routine
dotrap is called at appropriate points to actually handle the
signal.  When an interrupt is caught and no trap has been set for
that signal, the routine "onint" in error.c is called.
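The split between noting a signal and acting on it can be
sketched as follows.  Only the names onsig and dotrap come from
the text above; the tables, the MAXSIG limit, the settrap helper,
and the counter that stands in for running the trap action are
assumptions made for the example.

```c
#include <assert.h>
#include <signal.h>

#define MAXSIG 32               /* table size (assumed) */

static volatile sig_atomic_t gotsig[MAXSIG];    /* per-signal flags */
static volatile sig_atomic_t pendingsigs;       /* is any flag set? */
static int trapcount[MAXSIG];   /* stands in for running the trap action */

/* Signal handler:  only note that the signal arrived. */
static void onsig(int signo) {
      if (signo > 0 && signo < MAXSIG) {
            gotsig[signo] = 1;
            pendingsigs = 1;
      }
}

/* Arrange for signo to be caught, roughly what setsignal would do
   for a signal that has a trap set. */
static void settrap(int signo) {
      assert(signo > 0 && signo < MAXSIG);
      signal(signo, onsig);
}

/* Called at safe points:  act on every signal noted so far. */
static void dotrap(void) {
      int i;

      while (pendingsigs) {
            pendingsigs = 0;
            for (i = 1; i < MAXSIG; i++) {
                  if (gotsig[i]) {
                        gotsig[i] = 0;
                        trapcount[i]++;
                  }
            }
      }
}
```

Keeping the handler down to setting flags means the trap action
itself runs in ordinary context, at a point where the shell's
data structures are known to be consistent.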

OUTPUT:  Ash uses its own output routines.  There are three
output structures allocated.  "Output" represents the standard
output, "errout" the standard error, and "memout" contains output
which is to be stored in memory.  This last is used when a
builtin command appears in backquotes, to allow its output to be
collected without doing any I/O through the UNIX operating
system.  The variables out1 and out2 normally point to output and
errout, respectively, but they are set to point to memout when
appropriate inside backquotes.

INPUT:  The basic input routine is pgetc, which reads from the
current input file.  There is a stack of input files; the current
input file is the top file on this stack.  The code allows the
input to come from a string rather than a file.  (This is for the
-c option and the "." and eval builtin commands.)  The global
variable plinno is saved and restored when files are pushed and
popped from the stack.  The parser routines store the number of
the current line in this variable.

DEBUGGING:  If DEBUG is defined in shell.h, then the shell will
write debugging information to the file $HOME/trace.  Most of
this is done using the TRACE macro, which takes a set of printf
arguments inside two sets of parentheses.  Example:
"TRACE(("n=%d\n", n))".  The double parentheses are necessary
because the preprocessor can't handle functions with a variable
number of arguments.  Defining DEBUG also causes the shell to
generate a core dump if it is sent a quit signal.  The tracing
code is in show.c.
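The double-parenthesis trick works because the inner parentheses
turn the whole printf argument list into a single macro argument.
A minimal sketch follows; the routine name trace, the tracefile
variable, and defining DEBUG directly in the example are
assumptions, and the code in show.c may be organized differently.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define DEBUG 1                 /* normally set in shell.h */

static FILE *tracefile;         /* where trace output goes (assumed name) */

/* printf-style routine that does the work behind the macro. */
static void trace(const char *fmt, ...) {
      va_list ap;

      assert(fmt != NULL);
      if (tracefile == NULL)
            return;
      va_start(ap, fmt);
      vfprintf(tracefile, fmt, ap);
      va_end(ap);
      fflush(tracefile);
}

#ifdef DEBUG
#define TRACE(args)  trace args         /* TRACE(("x")) -> trace ("x") */
#else
#define TRACE(args)  ((void)0)          /* compiles away entirely */
#endif
```

When DEBUG is not defined the argument expressions are dropped
along with the call, so leaving TRACE lines in the source costs
nothing in a production build.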