Project 1 FAQ§
cush Meta FAQ§
Is there an example implementation we can use to test with?§
We provide an example shell at ~cs3214/bin/cush-gback that you may use.
What's a good way to get started on this project?§
A good way to get started is to listen to and read the provided lectures,
then read the provided code. For an additional perspective, read Chapter
8 in the textbook. A reasonable first milestone could be the execution of
a single program (along with command line arguments) in the foreground.
That is, get your shell to run commands such as ls and return to the prompt when they have finished.
Implement job control and background processes later.
Why is the use of -Werror mandatory?§
Because we have too often seen students waste their time, and ours, trying to debug obviously wrong code.
When should we add error checking for our system calls?§
When you code them. An all-too-frequent mistake is for students to waste hours or
even days debugging their code's logic when they could have found the error immediately
or in short time simply by checking which system calls fail, and why.
Use the provided utils_error() function, which relies on the errno variable. perror(3) provides similar functionality, but unlike utils_error() it does not support printf-style format strings.
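As a minimal sketch of the habit being recommended, here is a helper that checks a system call's result the moment it is made. The function name open_or_report is invented for illustration; in cush you would report the failure via utils_error() instead of strerror(3).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Check each system call's return value immediately.  This stands in
 * for utils_error(); the helper's name is made up for illustration. */
static int open_or_report(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        fprintf(stderr, "open(%s): %s\n", path, strerror(errno));
    return fd;
}
```

A failing call is then diagnosed at its source, not hours later when the bogus file descriptor is used.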
Where can I learn more about the rationale for POSIX job control?§
We highly recommend that you read pages 18-21 of the POSIX 1003.1 Rationale. The entire document is available here (SLO Login required).
Are we required to use the provided functions in signal_support.c and termstate_management.c?§
These functions provide convenient wrappers for the signal masking and handling functions you need,
as well as functions to save and restore the state of the terminal. They use the POSIX API as discussed
in the book in Section 8.5.5. There is no need to use old-style, pre-POSIX signal handlers
(e.g., using signal() or the book's Signal() function, as used in the examples in 8.5.1-8.5.4).
Where can I learn more about regular expressions?§
Stackoverflow has a good page that introduces basic concepts and provides pointers to books and other resources. The specific syntax of Python 2 regular expressions (which we use) is described here. We may upgrade the driver to Python 3 at some point.
What's the difference between blocking a process and blocking a signal?§
The verb "to block" is used in at least two different senses in
operating systems. Used as an intransitive verb, a process may "block"
when it encounters a situation in which it cannot make progress.
For instance, a process may wait for some event (such as a read(2) operation to complete, or a child process to exit, or simply for some time to pass when it calls sleep(2)). In this case, we say the
process is blocked. Blocked processes do not use CPU time. They are
unblocked when what they're blocked on finally happens. At that point,
they become ready and the scheduler will eventually resume them as soon
as a CPU/core becomes available.
Another meaning of "to block" uses block as a transitive verb, as in "blocking a signal." Blocking here means to delay the delivery of a pending signal until some later point in time. This is accomplished using an appropriate system call (sigprocmask(2), which supersedes the older sigblock(2)). It is needed when there is a potential for race conditions between the signal handler and the main control flow of a program.
Are stopped processes BLOCKED in the sense of the simplified process state diagram?§
No, or not necessarily.
The ability to stop a process via either user- or OS-triggered signals and then resume it is a traditional Unix facility that is not captured in the simplified process state diagram. (It's a simplified diagram, after all.) That diagram refers to states that reflect a process's behavior, along with external events such as preemption by an interrupt timer or arrival of an I/O notification, scheduler choices, etc.
Stopping/resuming processes could be modeled by adding two additional states to this diagram: BLOCKED+STOPPED and READY+STOPPED. If a BLOCKED process is stopped, it moves into the BLOCKED+STOPPED state. If a RUNNING or READY process is stopped, it moves into the READY+STOPPED state. If the event a BLOCKED+STOPPED process is waiting for arrives - the event that would otherwise have moved it to READY - it instead moves to the READY+STOPPED state. Once a process is continued, it is moved from READY+STOPPED to READY or from BLOCKED+STOPPED to BLOCKED. See the lecture slides.
How do I make sure that my submission passes the tests when grading?§
In most cases, if you reliably pass the tests when you run them, they will also pass when we run them.
However, since this assignment is done in C, and C is a type-unsafe language that makes it
easy to write code that invokes undefined behavior, you should use proper development
and debugging techniques to rule those out to the extent possible. This includes, for one, using a high level of compiler warnings (-Wall -Werror, as enforced by the Makefile - though you could turn on additional warnings such as -Wextra). Second, we urge you to run your shell under valgrind and exercise its functionality (use job control, etc.). This will flag, for instance, if your shell passes uninitialized data to system calls, which can make you fail tests under certain runtime conditions.
In fact, if you ask why your shell passes when you run it but fails under the autograder, showing that valgrind doesn't flag any undefined behavior is the first thing we will ask you to do.
I'm trying to compile your code on my machine at home, but it doesn't work.§
Building the project on any environment other than our lab machines is not officially supported. That said, the code should build on most recent Linux distributions (including Debian and Redhat-based ones, and even Ubuntu running in WSL-2.) Feel free to share any problems you encounter on the forum!
cush General Implementation FAQ§
Are we required to use posix_spawn instead of fork/exec?§
No, you may use either. We strongly recommend posix_spawn, however.
Note that our implementation of posix_spawn is library-based, meaning it uses fork() and exec() under the hood. It is not a system call itself. In a system call trace (strace) you will see the underlying system calls. Keep this fact in mind when the rest of this FAQ talks about fork() as opposed to posix_spawn().
What's the relationship between a job, a process group, and a pipeline?§
These terms are often used synonymously, but they describe different aspects.
A pipeline is a sequence of one or more user commands separated by the pipe (|) symbol, which will result in the execution of one or more processes. Each pipeline forms a job. A new process group is created for each job. By convention, the first process of each pipeline becomes the leader of the job's process group; the other processes join the leader's process group. So, put briefly: each pipeline is a job, and each job has its own process group.
Does our shell need to fork for 'built-in' commands?§
No. Built-in commands are executed by the shell without forking a new process.
Moreover, some built-in commands, notably cd, but also any command related to environment variables (such as bash's export command), must be executed in the shell's process, or else their effect would vanish after the shell's child process exited.
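To make the point concrete, a cd built-in boils down to calling chdir(2) in the shell's own process; the function name builtin_cd below is invented for this sketch.

```c
#include <stdio.h>
#include <unistd.h>

/* A minimal 'cd' built-in: it runs in the shell process itself,
 * so the working-directory change persists after the command. */
static int builtin_cd(const char *dir)
{
    if (chdir(dir) != 0) {
        perror("cd");
        return -1;
    }
    return 0;
}
```

Had this run in a forked child, only the child's working directory would change, and the change would vanish when the child exited.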
Which version of exec* should we use?§
Read the manual (man 3p exec) to learn about the different versions. You may use any of them; a recommended version is execvp().
Like any shell, your shell should respect the PATH variable and make sure that users can run commands from any of PATH's directories without having to specify the path name of the directory in which the command's executable is located. If you use posix_spawn, you probably want to use posix_spawnp for the same effect.
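With fork()/execvp(), a PATH-respecting foreground launch looks roughly like this (a sketch; a real shell would also set the process group and record the pid):

```c
#include <sys/wait.h>
#include <unistd.h>

/* Run a command in the foreground; execvp() searches the PATH. */
static int run_foreground(char *argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);  /* only returns on failure */
        _exit(127);             /* conventional 'command not found' status */
    }
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note the child uses _exit(), not exit(), so it does not flush stdio buffers inherited from the shell.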
Why can I not pass a 'struct list' as a parameter to a function?§
The provided list implementation uses head and tail sentinels that are stored in each struct list instance. The last element in the list contains a next pointer to the tail sentinel, and the first element contains a prev pointer to the head sentinel. If you copied the struct list instance, those pointers would not be updated, rendering the copied struct list instance useless. Always pass pointers to a struct list to functions, as in struct list *.
void myfunction(struct list *l) { // ok
}
void dontdeclareafunctionlikethis(struct list l) { // will not work
}
To avoid other list-related pitfalls, particularly pitfalls related to deleting elements and iterating over lists, you should read list.c.
My shell crashes, but I don't see any messages such as 'Segmentation Fault'§
Keep in mind that such messages are printed only if a (working) shell detects that one of its child processes terminated with SIGSEGV.
Your shell will be a child process of your login shell (typically bash); so if the segmentation fault occurs in your shell process itself, you'll likely see the error message printed by bash. However, if the crash occurs in the child on the code path between fork() and the exec() of a command, then it would be your shell's responsibility to print this error message after waiting for the child. If it does not do that yet, you won't see anything. valgrind is of great help here: it follows both the parent and the child process and reports segmentation faults in either.
How should we use the data structures allocated by the parser?§
The data structures (prefixed with ast_ in the provided code) represent the parsed structure of the command line. You should treat them as read-only and make sure they are retained throughout the life of a job. You must move/remove the call to the ast_command_line_free function in the provided code to avoid these structures being freed prematurely.
ast_command_line_free(cline); // remove me
Instead, you can call ast_command_line_free once you have stored a reference to each ast_pipeline object elsewhere, such as in the pipe field of the job struct. (Make sure to remove the references contained in the link elements of the list embedded in the ast_command_line object, for instance by removing these ast_pipeline objects from this list.) These ast_pipeline objects can be freed once the last process in a pipeline has terminated and the information has been reported to the user.
My shell works when I test it on the command line but fails with the Python/pexpect test harness.§
This may have several reasons. A common one is that the output of your shell does not match what the test script expects, and/or that you have changed the output_spec.py file describing the output format of your shell. Among other things, the test harness expects that the name of a job can be extracted via a regular expression from the job list. In addition, it expects that the shell echoes the command line of a job when it executes the 'fg' command, like bash does.
Example:
$ sleep 20
^Z
[3]+ Stopped sleep 20
$ fg 3
sleep 20
That last line, sleep 20, must be output by your shell.
Another common failure mode is forgetting to ensure that the regular expression given by the variable job_status_regex in your output_spec.py matches the output your shell produces when a job is stopped (or, in general, when its status is printed). The default we provide assumes your shell outputs job messages where the job is listed inside parentheses, such as in:
[1] Stopped (sleep 20)
We get a segfault in a loop where we go through a list of objects - valgrind says 'Invalid read of size 4'§
An object can only be in one list at a time. If you're placing an object into a new list without first removing it from the old list, you're corrupting the old list's next/prev pointers. This will lead to undefined results if you're doing this while iterating over the old list. Read list.c for techniques to avoid this. If you need to keep an object in multiple lists simultaneously, you must provide separate list_elem fields.
We update our job list and the job status before calling execvp(), but the update is ignored/does not work.§
(applies only if you use fork(), not when using posix_spawn)
A common mistake is to misunderstand the effect of fork(). When the shell calls fork(), a new child process is created. The child process will eventually call exec() to load the program the user desired while the parent process will continue to implement the shell functionality - wait for foreground jobs, implement job control etc. The child process obtains a copy of all data from the parent at the time fork() is called. "All data" here means all heap objects, all global, and all local variables. This copy will have the same values that it had before the fork(), but any modifications affect only the child's copy. When the child execs, all of its data is wiped and replaced with the data of the new program. So, code like this
/* WRONG CODE */
if (fork() == 0) {
/* in child */
cmd->pipeline->status = FOREGROUND; /* update child's status */
/* Above statement is wrong and pointless - it updates the
child's copy of the 'pipeline' object only.
All of the child's data, including everything on its heap, is about to
be discarded by the subsequent execvp call we are about to execute. */
execvp(cmd->argv[0], cmd->argv);
}
is completely wrong. Make sure that anything related to the management of job lists, job status, etc. is done before the fork(), or after the fork() in the context of the parent (shell) process. Be aware that while you can use local/global variables and heap objects to pass information from the parent to the child when it is forked, you cannot use those variables to pass information back from the child to the parent, or from the parent to the child after the child has been forked.
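The corrected pattern keeps all bookkeeping in the parent. The struct job below is a deliberately simplified stand-in, not cush's actual job struct:

```c
#include <sys/wait.h>
#include <unistd.h>

enum job_status { FOREGROUND, STOPPED };          /* simplified stand-in */
struct job { pid_t pid; enum job_status status; };

/* CORRECT: all job-list updates happen in the parent's copy of the data. */
static int launch_job(struct job *job, char *argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);            /* exec failed */
    }
    job->pid = pid;            /* parent's copy: this is what the shell sees */
    job->status = FOREGROUND;
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the updates run in the parent, they survive the child's exec and remain visible to the rest of the shell.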
How do I use waitpid()?§
...Or: What do I pass to waitpid() as the first parameter?
...Or: Is it normal for waitpid() to return -1?
The strategy for using waitpid() correctly is a bit tricky. The base code provides a useful skeleton which we recommend you keep. waitpid() must be called on three code paths: when waiting for a process started as a foreground process to terminate, when implementing the 'fg' command, and also from inside the SIGCHLD handler. You should call waitpid() in all of these situations with -1 as the first argument and with the WUNTRACED option; when calling it in the SIGCHLD handler, also specify WNOHANG using the bitwise | operator.
To avoid race conditions, keep SIGCHLD blocked when calling waitpid() while waiting for a foreground job. (Note that SIGCHLD is blocked inside the SIGCHLD signal handler already.) But do not keep SIGCHLD blocked all the time - when the shell sits at the command prompt, waiting for user input, SIGCHLD must be unblocked so that the shell can reap terminating children.
Since you are passing -1, it may happen that while waiting for a foreground job, waitpid() returns to tell you about a background job that terminated. In that case, you must record this information in your job list (since the OS will tell you about it only once!), and then call waitpid() again.
Note also that even though the waitpid() call issued by the shell when waiting for a foreground job will reap any children that have exited or changed status, a SIGCHLD signal, once pending, will still be delivered as soon as you unblock SIGCHLD. In that case, the OS has nothing to report when you call waitpid(). That's why we use the WNOHANG flag in the SIGCHLD handler, and that's why the code ignores the case where waitpid() returns -1 there.
The provided base code implements this approach in the wait_for_job and sigchld_handler functions, whose use we recommend. You will need to implement handle_child_status to properly update the status of the job to which the waited-for process belongs.
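To illustrate the repeated-call pattern, here is a minimal blocking reap loop (a sketch: the handler variant would add WNOHANG, and the commented-out handle_child_status call is where your bookkeeping goes):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reap children one notification at a time until none are left.
 * Returns the number of children reaped.  waitpid() returning -1
 * (with errno == ECHILD) simply means there is nothing left to wait for. */
static int reap_all(void)
{
    int n = 0, status;
    pid_t pid;
    while ((pid = waitpid(-1, &status, WUNTRACED)) > 0) {
        /* a real shell would call handle_child_status(pid, status) here */
        n++;
    }
    return n;
}
```

Each iteration reports exactly one child, which is why a single waitpid() call is never enough for a multi-process pipeline.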
Do we need to implement the shell initialization process shown in the GNU libc manual, section 27.6.2?§
No. You don't need to implement the initialization process in this project. You may assume that your shell is started in the foreground (i.e., has ownership of its terminal) and is already in its own process group. This will be the case if you test your shell by starting it from bash (type "./cush"), and it will also be the case if the shell is started from the Python test harness. A more complete shell (suitable as a login shell) should do it, however.
Do we need to use the getopt() function to parse a child's command line arguments?§
No - the child process will do that itself if required for its functionality. Note that the provided parser code already prepares each command line in a form that can be passed to execvp().
The only use for it in your shell that I could see is if you were implementing sophisticated built-in commands with many switches/options.
Do we need to change the provided grammar, for instance, to add built-in commands?§
No. Although theoretically possible, this is not a recommended approach. That said, if you're familiar with bison/flex and wish to add additional features, you're welcome to extend it.
cush Signal FAQ§
I'm confused about which signals we need to catch.§
The only signal that you absolutely must catch is SIGCHLD. When SIGCHLD arrives, you need to reap any children whose status may have changed and update the shell's data structures accordingly.
Optionally, you may catch SIGINT (^C). Catching SIGINT could be used, for instance, to allow a user to abort a partially completed command line, like bash does. You would need to use sigsetjmp()/siglongjmp() to that end. Note that when the user types ^C, SIGINT will reach the shell only when it is in the foreground process group, which is the case only while no other foreground job is running - namely, while the shell waits for user input.
When a foreground job is running, it will have ownership of the terminal. This means that the keys ^Z and ^C cause the OS to send SIGTSTP and SIGINT directly to the processes in the foreground process group, and not to the shell. So, for SIGTSTP and SIGINT to stop/terminate your foreground process group, you don't have to do any signal handling!
A second signal you may optionally catch or ignore is SIGTSTP. Like ^C, ^Z will reach the shell only while waiting for user input. You could ignore SIGTSTP so that the shell isn't suspended itself when the user types ^Z instead of typing a command. This is what most shells do, as you can observe by typing ^Z on a bash prompt.
I googled sa_sigaction_t but Google thinks it's a typo.§
sa_sigaction_t is a convenience function pointer type that I declared in signal_support.h like so:
/* Signal handler prototype */
typedef void (*sa_sigaction_t)(int, siginfo_t *, void *);
Any void-returning function with a signature of (int, siginfo_t *, void *) can be substituted where a variable of type sa_sigaction_t is expected. This means you should declare your signal handling routine to have this signature; then you can pass it to signal_set_handler, which provides a convenient wrapper for the underlying sigaction(2) system call that installs this signal handler.
I recommend using this function instead of the outdated signal(2)
function, because it allows more fine-grained control over the signal
processing semantics. For instance, you can specify that system calls that
are interrupted by a signal be automatically restarted; you can specify
that additional information be sent along to the signal handler routine
(such as in the case of SIGSEGV, where a program faulted), and you can
specify exactly which additional signals, if any, should be blocked
during the execution of the signal handler.
In my opinion, there's rarely a reason to use signal(2) anymore; however, others have pointed out that signal(2) is part of ISO C and may be preferred for portability in general. Note, though, that as shown in this table, SIGCHLD is part of POSIX only and not part of ISO C, given its Unix-specific semantics. This differs from signals such as SIGINT or SIGSEGV, which can be meaningfully implemented in any C language environment.
Do we need to perform any signal-related setup for the programs our shell executes?§
Recall that a process can dictate, for each signal with the exception of SIGKILL, what it wants to do if that signal is raised (handle, ignore, or take the default action which may be termination). In addition, for signals that it handles, it can temporarily delay their delivery by blocking them.
If a process sets up any special handling for signals and/or blocks signals, any children of the process will inherit such actions and they will also inherit the parent's signal mask (a bitmask describing which signals are blocked, if any). So if your parent shell blocks SIGCHLD when calling fork(), SIGCHLD will be blocked in the child. (Note, however, that the child is its own separate process for the purpose of signal delivery - the reasons for why it may receive SIGCHLD are completely separate from why the parent process may receive SIGCHLD, or any of the other signals. On fork(), pending signals that have not yet been delivered to the parent do not become pending in the child so that the child doesn't accidentally receive signals not intended for it.)
If a process executes a new program, the signal handling is also largely unaffected, with one notable exception: any signal that has a handler will be reset to the default action - this makes sense since the signal handler itself belongs to the old program being replaced by the exec() call. However, exec() does not affect whether signals are ignored or blocked - a signal that is ignored or blocked will remain ignored or blocked.
Your shell must make sure that new programs are started with reasonable
signal settings for each signal. That means that all signals have
their default actions, and no signal is blocked. Consequently, if your
shell blocked any signal (such as SIGCHLD) in the parent, the signal
should be unblocked in the child before the exec() call. Similarly,
if you decide to ignore any signals (such as SIGINT, perhaps),
reinstall the default behavior before calling exec(). You may call signal_set_handler(SIGINT, (sa_sigaction_t) SIG_DFL); to that end. Otherwise, the child process would ignore SIGINT, and you couldn't terminate it with ^C. You do not need to reset signals you handle, such as SIGCHLD, to their default action - exec() will do that for you.
siginfo->si_pid contains the pid of the child process that caused SIGCHLD. Why can I not use it?§
When POSIX signal handling is used, the OS will pass along additional information to the signal handler in arguments #2 and #3. Argument #2 is of type siginfo_t. See sigaction(2).
However, since traditional Unix signals are not queued, si_pid for a SIGCHLD will contain the pid of the first child that caused SIGCHLD to become pending. Therefore, it cannot be used to learn which children need to be reaped. You must use waitpid(-1, ...).
Can we use signalfd(2) instead of using signal handlers?§
(No one has really asked that question yet, and I don't know if anyone has considered that, either.)
I think so, although you'll be trading one set of complexities for another and you'll need to learn some additional system calls.
Signal handlers have the drawback that they introduce asynchronous control flow in your program: while SIGCHLD is not blocked/masked, it may arrive at any point in time and cause your SIGCHLD handler to be executed. If you use signalfd(), you can instead learn when a signal has become pending, then execute a read() system call to consume it without ever invoking the signal handler, thus avoiding asynchronous control flow.
Learning when the signal has become pending requires learning when that file descriptor has become readable (e.g., it is possible to call read() without causing the process to be blocked inside the read() system call waiting for a signal). You need to use select(2) or a similar call for that. The problem you are facing then is that you need to also read user input after the shell has output its prompt.
Fortunately, the readline() function cush uses has an alternate API designed for this purpose in the rl_callback_* family of functions. Read this example on stackoverflow to get the general idea. Thus, your main function must set up a signal file descriptor, then execute a select() loop in which it selects both on stdin and the signal fd. If stdin becomes readable, it calls rl_callback_read_char(); if the signal fd becomes readable, it needs to reap all children that can be reaped at this point in time (again using WNOHANG in a loop). (As before, SIGCHLD is not queued.) The shell performs command line processing when rl_callback_read_char() calls the handler installed with rl_callback_handler_install().
cush Pipe FAQ§
Is there a way to learn if all processes that are part of a pipeline have terminated?§
No. 😞 You will receive separate notifications for each process that's part of a pipeline/job. They may exit in any order. Your shell must manage this: for instance, if a pipeline/job that consists of multiple processes is running in the foreground, the shell should not prompt for new input until after all processes that are part of the job have been reaped (and your shell received confirmation from the OS that they've terminated.)
Note that one call to waitpid() yields only one notification, even if you've specified waitpid(-1, ...) to accept notifications for any of your children. This is so that the status of each child can be reported separately. You will thus need to call waitpid() repeatedly.
In addition, you will need to maintain some kind of data structure to map the pid of the child process waitpid() returns to the job to which it belongs. The provided base code does not do that for you; it is yours to implement. For the purposes of this project, a data structure with O(n) lookup time is acceptable.
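One possible shape for that lookup is sketched below. All names here (MAXJOBS, job_from_pid, the simplified struct job) are invented for illustration; cush's actual job struct differs.

```c
#include <stddef.h>
#include <sys/types.h>

#define MAXJOBS 32
#define MAXPIDS 8

struct job {
    pid_t pids[MAXPIDS];   /* pids of the processes in this job */
    int   num_pids;
};

static struct job *jobs[MAXJOBS];   /* NULL = unused slot */

/* O(n) scan from pid to owning job - fine for a shell's job counts. */
static struct job *job_from_pid(pid_t pid)
{
    for (int j = 0; j < MAXJOBS; j++) {
        struct job *job = jobs[j];
        if (job == NULL)
            continue;
        for (int i = 0; i < job->num_pids; i++)
            if (job->pids[i] == pid)
                return job;
    }
    return NULL;
}
```

handle_child_status would call such a lookup with the pid waitpid() returned, then update that job's state.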
Why do my pipes not work?§
... strace shows the pipe() and dup2() calls I'd expect. It seems the child process never exits, so my shell is stuck in waitpid()?
Check that the shell closes its copies of the pipe file descriptor(s) after the fork(). If it doesn't, then since there is still an open file descriptor referring to the write end, the pipe itself will not be closed, and programs that read until they exhaust their standard input (such as cat, grep, and so on) will be stuck attempting to read from their stdin. Use ls -l /proc/NNNNN/fd, where NNNNN is the pid of your shell, to examine the state of the shell's file descriptors.
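A self-contained demonstration of the EOF rule (a sketch, all in one process): the read end reports EOF only after every descriptor referring to the write end has been closed.

```c
#include <unistd.h>

/* Write two bytes into a pipe, close the write end, then drain it.
 * Returns the number of bytes read before EOF.  If the write end were
 * left open, the final read() would block forever instead of returning 0. */
static ssize_t pipe_bytes_until_eof(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    write(fds[1], "hi", 2);
    close(fds[1]);                    /* last write end: reader can see EOF */
    char buf[16];
    ssize_t total = 0, n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);
    return total;                     /* n == 0 signalled EOF */
}
```

In the shell, the leaked write end is the copy the parent keeps after fork(), which is why closing it there is essential.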
My pipes don't work even though I used pipe() (or pipe2()) and dup2() correctly.§
strace -ff shows that writing to file descriptor 1/reading from file descriptor 0 by the child processes results in -1 EBADF.
You probably confused the read and write ends of the pipe. I recommend creating symbolic constants for them.
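For example, such constants and their use with dup2() might look like this (a sketch; the helper name is ours):

```c
#include <unistd.h>

enum { READ_END = 0, WRITE_END = 1 };   /* pipe() fills fds[] in this order */

/* Make fds[READ_END] the process's stdin, closing both originals. */
static int wire_stdin_to_pipe(int fds[2])
{
    if (dup2(fds[READ_END], STDIN_FILENO) == -1)
        return -1;
    close(fds[READ_END]);
    close(fds[WRITE_END]);
    return 0;
}
```

With named constants, swapping the two ends becomes visible at a glance instead of hiding behind a bare 0 or 1.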
When I add the second process of a pipeline to the pipeline's process group, setpgid() fails with EPERM 'Operation not permitted'§
This occurs when the process group to which you are trying to add the process no longer exists, for instance, because all the processes in it have already terminated. This can occur if you (erroneously) wait for the first child in a pipe to finish before forking the second child.
Waiting for the first child also breaks a pipe's built-in flow control mechanism. If the first child outputs more than the pipe's bounded buffer capacity, it will block forever since you never start the second child that would have otherwise drained the pipe and consumed the data produced by the first child. In current Linux kernels, a pipe's capacity is only 64K. In addition, you're giving up parallelism.
The Linux man page for setpgid() is incomplete in that it does not list the case where pgrp no longer exists. Even though EPERM seems unusual here (ESRCH would perhaps have been clearer to the user), POSIX 1003.1-2013 actually calls for EPERM here:
[EPERM]
The value of the pgid argument is valid but does not match the process ID
of the process indicated by the pid argument and there is no process with
a process group ID that matches the value of the pgid argument in the same
session as the calling process.
cush Job Control/Terminal FAQ§
How does our shell continue a stopped job?§
By sending the SIGCONT signal to the job's process group. If the job is also made the foreground job (i.e., the user typed fg instead of bg to continue the stopped job), SIGCONT must be sent after ownership of the terminal has been transferred.
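Sending the signal itself is a one-liner with killpg(2) (sketch; the helper name is ours):

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Resume every process in the job's process group.  For 'fg', call
 * this only after giving the terminal to the job's process group. */
static int continue_job(pid_t pgid)
{
    return killpg(pgid, SIGCONT);
}
```

Signalling the whole group (rather than one pid) matters for pipelines, since every process in the job must resume together.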
When I start a process in the foreground, should I call termstate_give_terminal_to()
in the parent (before waiting for the process) or the child (before exec'ing the command)?§
(This question applies only when using fork(). If you use posix_spawn, use POSIX_SPAWN_TCSETPGROUP instead.)
Think about what could happen in each case. You need to avoid a situation where the launched program performs some action that requires ownership of the terminal, but termstate_give_terminal_to(), which calls tcsetpgrp(), hasn't been called. When you call tcsetpgrp(), the process group you specify must have already been created.
See also pages 18-21 of the POSIX rationale document.
When a foreground job is stopped and later resumed, do we need to make sure that the state of the terminal is preserved?§
Yes. Some processes, such as vim, place the terminal in a special, so-called 'raw' state. In the 'raw' state, the program receives characters as they are typed. By contrast, in the 'cooked' state, programs normally read lines of input from the terminal when they call read(2). If a foreground job is suspended and later resumed, you need to save and restore whatever state it placed the terminal in. Use the provided functions in termstate_management.c for this purpose, which wrap the tcsetattr()/tcgetattr() functions. Note that you do not need to understand what exact state the job put the terminal in - it's ok to treat it opaquely.
On a related note, if a process that places the terminal in the raw state is stopped, the shell should restore the original state of the terminal before issuing the next prompt. To that end, the functions in termstate_management save the initial state of the terminal when the shell is first started and restore this state when termstate_give_terminal_back_to_shell() is called, which you must call whenever the shell outputs its prompt and waits for user input.
The rationale for this approach is that a job control shell must allow the execution of programs that are themselves oblivious to the fact that they are run under a job control shell, i.e., programs that assume they have exclusive control of the terminal.
Side note: this is actually only a recommendation in POSIX. We require that you implement it, but not all POSIX shells do. bash and zsh, for instance, do not; ksh does.
Why is our shell suspended when I hit ^Z or ^C to stop/terminate a foreground job?§
When a foreground process exits, the shell needs to reacquire the terminal (by making its own process group the foreground process group of its controlling terminal). This is done via tcsetpgrp(). Since the process issuing the tcsetpgrp() call is not in the foreground process group, the call will result in the delivery of SIGTTOU. (The same rules that apply to all processes apply to your shell, too.) This creates a Catch-22: the shell itself needs to be in the foreground process group to call readline() to receive the next input from the user, but the call needed to make the shell's process group the terminal's foreground process group requires that the calling process already be part of the foreground process group. POSIX resolves this dilemma by stipulating that the shell should block SIGTTOU during the call to tcsetpgrp(), and that if it does, SIGTTOU will not be delivered.
You can achieve this via the provided termstate_give_terminal_to() and termstate_give_terminal_back_to_shell() functions in termstate_management.c. The latter function will return ownership to the shell's process group, which is recorded during initialization. Note that the shell's own process group is different from the process groups the shell creates for the jobs it spawns.
If you observe that your shell is suspended with SIGTTOU when calling tcsetattr from readline (which is called by your main REPL loop), then you likely forgot to call termstate_give_terminal_back_to_shell().
Why is our shell suspended when I try to reassert ownership of the terminal via tcsetpgrp()
?§
... Or when I restore its terminal state with tcsetattr()?
See previous question. This problem should not occur if you use the provided functions that block SIGTTOU.
My shell gets stuck on the command line after I type one letter!§
For example, when I start a process in the background like so:
$ ./cush
cush> sleep 30 &
[1] 1818
cush> j # <-- Here is where the shell is stuck. I can't type anything anymore.
In addition, when I log onto the same machine on another terminal, I see my cush process consuming 100% of a CPU (using top).
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1805 cs3214 20 0 107m 1212 948 R 71.8 0.1 26:01.35 cush
When I use strace to see what it's doing, I see constantly repeating output like this:
$ strace -p 1805
.... lots of output like this ....
ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = ? ERESTARTSYS (To be restarted)
--- SIGTTOU (Stopped (tty output)) @ 0 (0) ---
When I attach gdb, I see that it's calling tcsetattr()
from readline()
, which is called
in my main() function:
(gdb) attach 1805
(gdb) bt
#0 0x0000003b0f0d8f78 in tcsetattr () from /lib64/libc.so.6
#1 0x0000003b1041979f in ?? () from /lib64/libreadline.so.6
#2 0x0000003b10419b60 in rl_deprep_terminal () from /lib64/libreadline.so.6
#3 0x0000003b10426ada in rl_cleanup_after_signal () from /lib64/libreadline.so.6
#4 0x0000003b10426f89 in ?? () from /lib64/libreadline.so.6
#5 0x0000003b1042919e in rl_getc () from /lib64/libreadline.so.6
#6 0x0000003b104296c0 in rl_read_key () from /lib64/libreadline.so.6
#7 0x0000003b1041550f in readline_internal_char () from /lib64/libreadline.so.6
#8 0x0000003b10415a55 in readline () from /lib64/libreadline.so.6
#9 0x0000000000406552 in main (ac=1, av=0x7fff7ef59348) at cush.c:495
Answer: You do not have ownership of the terminal when you call readline(). readline() assumes it has ownership so that it can safely call functions such as tcsetattr(), which require such ownership. If that is not the case, readline() will act in undefined ways (apparently retrying blindly over and over after SIGTTOU is sent).
In the example shown above, the most likely reason is that you wrongly gave ownership
of the terminal to the sleep 30 &
background job the user started. If the user starts
a job in the background, that job must not be given ownership of the terminal.
In order to not have to separately handle the case where a foreground job (which must
be given ownership of the terminal!) terminates or exits, it is useful to simply
also reacquire ownership before calling readline() (or even better, before printing
anything, including the prompt). Use termstate_give_terminal_back_to_shell()
to that end.
Note that you still must avoid giving terminal ownership to the background job, to avoid the race condition in which the background job (in the child process, before the exec()) snatches ownership away from the shell after it has already returned to the prompt.
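A sketch of the spawning side under these rules. spawn_in_own_group is an illustrative helper, not part of the starter code; in cush, a foreground job would additionally receive the terminal via termstate_give_terminal_to(), while a background job never does:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Spawn argv in its own process group, as a job-control shell must.
 * Only a foreground job would then be handed the terminal. */
pid_t spawn_in_own_group(char *const argv[]) {
    pid_t pid = fork();
    if (pid == 0) {
        setpgid(0, 0);          /* child: become leader of a new group */
        execvp(argv[0], argv);
        _exit(127);             /* exec failed */
    }
    if (pid > 0)
        setpgid(pid, pid);      /* parent sets it too: closes the race window */
    return pid;
}
```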
Why does output_spec.py expect a carriage-return line-feed (CRLF, \r\n) instead of just a LF (\n)?§
... I thought the Unix line termination character was LF (\n)?
Or: ... How come ^Z shows up if I turn on logging in output_spec.py? I thought ^Z results in SIGTSTP being sent?
The regular expressions in output_spec.py
refer to the information that is output
to the so-called master side of the pseudo terminal, and the logfile logs everything that is
input into the master side of the pseudo terminal. Your shell, on the other hand, is
connected to the pseudo terminal's so-called slave end.
The pty driver performs the same operations a regular
terminal driver would. It can also be customized in the same way.
For instance, if the flag ONLCR
is set, it will transform a NL into a
CRLF (See termios(3); you can run stty -a
to see which flags are enabled
by default for the pseudo terminal used by ssh). Similarly, if ^Z is sent to the
master end of the terminal, SIGTSTP will be sent to the terminal's foreground process group.
pty(7) summarizes the functionality provided by pseudo terminals:
A pseudo-terminal is a pair of virtual character devices that provide a bidirectional
communication channel. One end of the channel is called the master; the other end is
called the slave. The slave end of the pseudo-terminal provides an interface that
behaves exactly like a classical terminal. A process that expects to be connected to a
terminal, can open the slave end of a pseudo-terminal and then be driven by a program
that has opened the master end. Anything that is written on the master end is provided
to the process on the slave end as though it was input typed on a terminal. For example,
writing the interrupt character (usually control-C) to the master device would
cause an interrupt signal (SIGINT) to be generated for the foreground process group
that is connected to the slave. Conversely, anything that is written to the slave end
of the pseudo-terminal can be read by the process that is connected to the master end.
Pseudo-terminals are used by applications such as network login services (ssh(1),
rlogin(1), telnet(1)), terminal emulators, script(1), screen(1), and expect(1).
(Comment:) The potentially offensive historical connotations of these terms
aside, I find them confusing and not even very helpful in remembering a pty's function.
In particular, it's misleading to think of the program on the
so-called "slave" end to be somehow driven by the "master" end -
any control would only go as far as the program reacts to any
input it chooses to read from its controlling terminal.
(In a typical shell session, standard input will be connected
to the controlling terminal.)
Perhaps it's better to think of an "operator" side and a "terminal" side.
Writing to the "operator" side is like typing into the terminal.
Reading from the "operator" side obtains what the terminal would
otherwise output for the operator to see if it were an actual device.
The pty performs certain substitutions when copying data from
the operator to the terminal side and vice versa, controlled by
how it's currently programmed (stty -a
).
The "terminal" side is like a regular terminal so that programs
do not need to be coded differently when using a pty.
Reading from the "terminal" side obtains user input; writing to
the "terminal" side outputs to the user.
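To watch this rewriting in isolation, here is a hedged sketch (a glibc system is assumed; pty_roundtrip is an illustrative name, not part of any provided code). It opens a pty pair, enables ONLCR on the terminal side, writes a NL there, and reads the CRLF back on the operator side:

```c
#define _XOPEN_SOURCE 600   /* for posix_openpt(), grantpt(), unlockpt(), ptsname() */
#include <fcntl.h>
#include <stdlib.h>
#include <termios.h>
#include <unistd.h>

/* Write s to the "terminal" (slave) side of a fresh pty pair and return
 * what the "operator" (master) side reads into buf.  With ONLCR set, the
 * pty driver rewrites NL as CRLF, which is what output_spec.py observes. */
ssize_t pty_roundtrip(const char *s, size_t slen, char *buf, size_t buflen) {
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0 || grantpt(master) < 0 || unlockpt(master) < 0)
        return -1;
    int slave = open(ptsname(master), O_RDWR | O_NOCTTY);
    if (slave < 0) { close(master); return -1; }

    struct termios tio;
    tcgetattr(slave, &tio);
    tio.c_oflag |= OPOST | ONLCR;        /* map NL -> CRLF on output */
    tcsetattr(slave, TCSANOW, &tio);

    write(slave, s, slen);               /* program "output" on the terminal side */
    ssize_t n = 0;
    while (n < (ssize_t)buflen) {        /* read until the end of the line */
        ssize_t r = read(master, buf + n, buflen - n);
        if (r <= 0) break;
        n += r;
        if (buf[n - 1] == '\n') break;
    }
    close(slave);
    close(master);
    return n;
}
```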
In 2020, the Austin group (which maintains POSIX) and the Linux man page project independently chose to change the nomenclature in future versions. I prefer Linux's approach, which now refers to pseudo terminals as pseudo terminal device pairs, and the "slave end" is now simply called a terminal. We'll update this entry when the new version of the man page makes it to the rlogin cluster.
expect(1) is a program that performs similar functionality as the pexpect library Patrick Boyd (a former CS 3214 UTA) extended for our testing framework.
I'm trying to use setpgrp() or getpgid(), but it doesn't compile.§
Read the fine print in the man page for these functions:
To get the prototypes under glibc, define both _XOPEN_SOURCE and
_XOPEN_SOURCE_EXTENDED, or use "#define _XOPEN_SOURCE n" for some integer
n larger than or equal to 500.
Hence, use:
#define _XOPEN_SOURCE 500
#include <unistd.h>
The reasons for this are historical. If you look at the
Unix family tree
you'll see two major branches: systems deriving from
System V on the right, and systems in the BSD line in the center. Linux,
on the left, now attempts to provide support for both lines; sometimes,
there are functions with the same name, but different signatures
(for instance, setpgrp()
). Then you have to tell it which one you want by
defining certain pre-processor constants. In this assignment, you should not need to use any such functions. Use setpgid() instead of setpgrp(), and the only place where you would need to learn a process's process group is to identify the shell's own process group, as is done in termstate_management.c. Use getpgrp() for this purpose, as suggested in setpgid(2).
Hint: there is no need to ever call getpgid()
on a process pid
other than your own (that is, the shell's).
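As a minimal sketch of the two variants (illustrative helper names; on modern glibc the getpgid() prototype is visible by default, so the macro mainly documents the historical requirement):

```c
#define _XOPEN_SOURCE 500   /* historically needed for getpgid()'s prototype */
#include <unistd.h>

/* getpgrp() takes no arguments, needs no feature-test macro, and returns
 * the caller's own process group -- use it to record the shell's group. */
pid_t shell_pgrp(void) {
    return getpgrp();
}

/* getpgid(0) is equivalent for the calling process, but is the variant
 * that required the feature-test macro above. */
pid_t shell_pgrp_via_getpgid(void) {
    return getpgid(0);
}
```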
Why can I not use getpgid()
to identify which process group an exited process belonged to?§
Because once you reap the last child process of a process group, the OS erases all records of that process group; thus, getpgid(), when invoked with the pid of a reaped child, will return an error (ESRCH).
Thus, your shell will need to find a different way, namely its own data structures, to map pids to jobs.
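A hedged sketch of that bookkeeping; struct job and the array below are simplified stand-ins for illustration, not the starter code's actual job list layout:

```c
#include <stddef.h>
#include <sys/types.h>

#define MAXJOBS  32
#define MAXPROCS  8

/* Simplified job record: the shell remembers every pid it forked. */
struct job {
    int   jid;                  /* job id shown to the user, e.g. [1]   */
    pid_t pgid;                 /* process group of the job             */
    pid_t pids[MAXPROCS];       /* pids of the job's processes          */
    int   npids;
    int   num_processes_alive;  /* decremented as children are reaped   */
};

static struct job *jobs[MAXJOBS];

/* Map a pid back to its job by scanning the shell's own records; this
 * keeps working even after the child has been reaped, unlike getpgid(). */
struct job *job_from_pid(pid_t pid) {
    for (int j = 0; j < MAXJOBS; j++) {
        if (jobs[j] == NULL)
            continue;
        for (int i = 0; i < jobs[j]->npids; i++)
            if (jobs[j]->pids[i] == pid)
                return jobs[j];
    }
    return NULL;
}
```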