Project 3 esh FAQ

Project 3 FAQ

This FAQ answers questions students have had in past semesters related to project 3.

Does our shell need to fork for 'built-in' commands?
No. Built-in commands are executed by the shell without forking a new process. Moreover, some built-in commands, notably cd, but also any command related to environment variables (such as bash's export command), must be executed in the shell's process or else their effect would vanish after the shell's child process exited.
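As a minimal sketch of this idea, a built-in such as cd might be dispatched before any fork(). The helper name try_builtin() and its return convention are assumptions made for illustration and are not part of the provided code; the sketch also assumes the command is available as a NULL-terminated argv array.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of the provided code.
 * Returns 1 if argv named a built-in that was run inside the shell's own
 * process, 0 if the caller should fork() and exec the command instead. */
int try_builtin(char **argv)
{
    if (argv[0] == NULL || strcmp(argv[0], "cd") != 0)
        return 0;

    const char *dir = argv[1] != NULL ? argv[1] : getenv("HOME");
    if (dir == NULL)
        fprintf(stderr, "cd: HOME not set\n");
    else if (chdir(dir) != 0)   /* changes the shell's own working directory */
        perror("cd");
    return 1;
}
```

Had chdir() been called in a forked child instead, only the child's working directory would have changed, and the change would have been lost when the child exited.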
Do we need to implement the shell initialization process shown in the GNU libc manual, section 27.6.2?
No. You may assume that your shell is started in the foreground (i.e., has ownership of its terminal) and is already in its own process group. This will be the case if you test your shell by starting it from bash (type "./esh"), and it will also be the case if the shell is started from the Python test harness.
Which version of exec* should we use?
Read the manual (man 3p exec) to learn about the different versions. You may use any of them; a recommended function is execvp().
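The following sketch shows one way execvp() could be combined with fork() and waitpid(), with each call checked for errors. The function name run_and_wait() is hypothetical, and the sketch deliberately leaves out process groups and terminal handling, which a job control shell also needs.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with arguments argv and wait for it.
 * Returns the program's exit status, or -1 if it did not exit normally. */
int run_and_wait(char **argv)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        execvp(argv[0], argv);   /* searches PATH; returns only on failure */
        perror(argv[0]);
        _exit(127);              /* conventional "command not found" status */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```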
Why is our shell suspended when I hit ^Z or ^C to stop/terminate a foreground job?
Or: Why is our shell suspended when I try to reassert ownership of the terminal via tcsetpgrp()?
When a foreground process exits or stops, the shell needs to reacquire the terminal (by making its own process group the foreground process group of the controlling terminal). This is done via tcsetpgrp(). Since the process issuing the tcsetpgrp() is not in the foreground process group at that point, the call will result in the delivery of SIGTTOU, whose default action is to stop the process. Simply block SIGTTOU around the call to tcsetpgrp() via the provided esh_signal_block()/esh_signal_unblock() functions.
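A sketch of the same pattern written directly against the POSIX API, assuming the provided wrappers serve the same purpose as sigprocmask(). The function name give_terminal_to() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make pgrp the foreground process group of tty_fd
 * without being stopped by SIGTTOU. Returns 0 on success, -1 on error. */
int give_terminal_to(int tty_fd, pid_t pgrp)
{
    sigset_t block, saved;
    sigemptyset(&block);
    sigaddset(&block, SIGTTOU);
    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    int rc = tcsetpgrp(tty_fd, pgrp);   /* no SIGTTOU is sent while it is blocked */
    int saved_errno = errno;

    sigprocmask(SIG_SETMASK, &saved, NULL);   /* restore the previous mask */
    errno = saved_errno;
    return rc;
}
```

A shell would call this with its own process group, for example give_terminal_to(STDIN_FILENO, getpgrp()), once the foreground job has exited or stopped.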
Where can I learn more about the rationale for POSIX job control?
We highly recommend you read pages 18-21 of the POSIX 1003.1 Rationale. The entire document is available here. (SLO Login required).
Are we required to use the functions in esh-sys-utils.c?
These functions provide convenient wrappers for the signal masking and handling functions you need, as well as functions to save/restore the state of the terminal. They use the POSIX API as discussed in the book in Section 8.5.5. There is no need to use old-style, pre-POSIX signal handlers (e.g., using signal() or the book's Signal() function, as used in the examples in 8.5.1-8.5.4.) The functions in esh-sys-utils.c are used in my sample solution.
How does our shell continue a stopped job?
By sending SIGCONT to the job's process group. If the job is also made the foreground job, SIGCONT must be sent after ownership of the terminal has been transferred.
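The ordering can be sketched as follows. The function name resume_job() and the convention that a negative tty_fd means "keep the job in the background" are assumptions for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: continue the stopped job whose process group is pgid.
 * If tty_fd >= 0 the job also becomes the foreground job of that terminal;
 * if tty_fd < 0 it keeps running in the background. */
int resume_job(pid_t pgid, int tty_fd)
{
    if (tty_fd >= 0) {
        /* Transfer ownership of the terminal first ... */
        if (tcsetpgrp(tty_fd, pgid) != 0)
            return -1;
    }
    /* ... and only then wake the job. A negative pid addresses the
     * whole process group rather than a single process. */
    return kill(-pgid, SIGCONT);
}
```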
Why do my pipes not work? strace shows the pipe() and dup2() calls I'd expect. It seems the child process never exits, so my shell is stuck in waitpid()
Check that the shell closes its copy of the pipe file descriptor(s) after the fork(). If it doesn't close its copy of the write end, the pipe itself will not be closed, so the reader never sees end-of-file, and programs that read until they exhaust their standard input (such as cat, grep, and so on) will be stuck attempting to read from their stdin. Use ls -l /proc/NNNNN/fd, where NNNNN is the pid of your shell, to examine the state of the shell's file descriptors.
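A minimal two-process sketch of the close() discipline, assuming each command is given as an argv array. The function name run_pipe_pair() is hypothetical. Note that fds[0] is the read end and fds[1] is the write end.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "left | right" and return right's exit status. */
int run_pipe_pair(char **left, char **right)
{
    int fds[2];                       /* fds[0]: read end, fds[1]: write end */
    if (pipe(fds) != 0) {
        perror("pipe");
        return -1;
    }

    pid_t lpid = fork();
    if (lpid == 0) {
        dup2(fds[1], STDOUT_FILENO);  /* left writes into the pipe */
        close(fds[0]);
        close(fds[1]);
        execvp(left[0], left);
        perror(left[0]);
        _exit(127);
    }

    pid_t rpid = fork();
    if (rpid == 0) {
        dup2(fds[0], STDIN_FILENO);   /* right reads from the pipe */
        close(fds[0]);
        close(fds[1]);
        execvp(right[0], right);
        perror(right[0]);
        _exit(127);
    }

    /* The parent must close BOTH ends. While its copy of the write end
     * stays open, the reader never sees end-of-file. */
    close(fds[0]);
    close(fds[1]);

    int status, result = -1;
    if (lpid > 0)
        waitpid(lpid, &status, 0);
    if (rpid > 0 && waitpid(rpid, &status, 0) == rpid && WIFEXITED(status))
        result = WEXITSTATUS(status);
    return result;
}
```

Removing the two close() calls in the parent reproduces the symptom described above: the right-hand process blocks in read() and the shell never returns from waitpid().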
Why can I not pass a struct list as a parameter to a function?
The provided list class uses head and tail sentinels that are stored in each struct list instance. The last element in the list contains a 'next' pointer to the tail sentinel, and the first element contains a 'prev' pointer to the head sentinel. If you copied the struct list instance, those pointers would not be updated, rendering the copied struct list instance useless. Always pass pointers to a struct list to functions, as in struct list *.
```
void myfunction(struct list *l) { // ok
}

void dontdeclareafunctionlikethis(struct list l) { // will not work
}
```
To avoid other list-related pitfalls, particularly pitfalls related to deleting elements and iterating over lists, you must read list.c.
Do we need to use the getopt() function?
The only use for it that I could see is if you're implementing sophisticated built-in commands with many switches/options. Otherwise, probably not. Note that the provided parser code prepares each command line already in a form that can be passed to execvp().
Do we need to change the provided grammar, for instance, to add built-in commands?
No. Although theoretically possible, this is not a recommended approach. But if you're familiar with yacc/lex and wish to add additional features (say redirecting stdout/stderr separately), you're welcome to extend it.
When a foreground job is stopped and later resumed, do I need to make sure that the state of the terminal is preserved?
Yes. Some processes place the terminal in a special, so-called 'raw' state. For instance, vi does that. In 'raw' state, the program receives characters as they are typed. (By contrast, in the 'cooked' state, programs normally read lines of input.) If a foreground job is suspended and later resumed, you need to save and restore whatever state it placed the terminal in. Use the provided functions in esh-sys-utils.c for this purpose, which wrap the tcsetattr()/tcgetattr() functions. Note that you do not need to understand what exact state the job put the terminal in. On a related note, if a process that places the terminal in the raw state is stopped, the shell must restore the original state of the terminal before issuing the next prompt. To that end, save the initial state of the terminal when the shell is first started and restore this state whenever the shell outputs its prompt and waits for user input.

The rationale for this approach is that a job control shell must allow the execution of programs that are themselves job control unaware, i.e., programs that assume they have exclusive control of the terminal. See here.
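What the provided wrappers do can be approximated with tcgetattr()/tcsetattr() directly. This is only a sketch: the function names are hypothetical, and the per-job struct termios is assumed to live wherever your shell keeps its job state.

```c
#define _POSIX_C_SOURCE 200809L
#include <termios.h>
#include <unistd.h>

/* State the shell wants at its own prompt; saved once at startup. */
static struct termios shell_tty;

int remember_shell_tty(int fd)
{
    return tcgetattr(fd, &shell_tty);
}

/* When a foreground job stops: record whatever state the job left the
 * terminal in, then put back the shell's state. Call this after the shell
 * has reacquired the terminal, since tcsetattr() from a background
 * process group triggers SIGTTOU. */
int tty_on_job_stop(int fd, struct termios *job_tty)
{
    if (tcgetattr(fd, job_tty) != 0)
        return -1;
    return tcsetattr(fd, TCSADRAIN, &shell_tty);
}

/* Before continuing that job in the foreground: reinstate its state. */
int tty_on_job_resume(int fd, const struct termios *job_tty)
{
    return tcsetattr(fd, TCSADRAIN, job_tty);
}
```

The shell never inspects the saved structure; it only stores it and hands it back, which is why it does not need to know what state the job chose.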
I'm confused about which signals we need to catch.
The only signal that you must catch is SIGCHLD. When SIGCHLD arrives, you need to reap any children whose status may have changed and update the shell's data structures accordingly.

Optionally, you may catch SIGINT (^C). Catching SIGINT could be used, for instance, to allow a user to abort a partially completed command line, like bash does. You would need to use sigsetjmp()/siglongjmp() to that end, as shown on a lecture slide. Note that when the user types ^C, SIGINT will reach the shell only when it is in the foreground process group, which is the case only while no other foreground job is running - namely, while the shell waits for user input.

When a foreground job is running, it will have ownership of the terminal. This means that the keys ^Z and ^C cause the OS to send SIGTSTP and SIGINT directly to the processes in the foreground process group, and *not* to the shell. So, for SIGTSTP and SIGINT to stop/terminate your foreground process, you don't have to do any signal handling!

A second signal you may optionally catch or ignore is SIGTSTP. Like ^C, ^Z will reach the shell only while waiting for user input. You could ignore SIGTSTP so that the shell isn't suspended itself when the user types ^Z instead of typing a command. This is what most shells do, as you can observe by typing ^Z on a bash prompt.

This paragraph is to be removed in future semesters once I have had a chance to update the assignment. The current version of the assignment allows for what is mistakenly called a "reduced version" of the assignment that is based on a draft by the textbook authors. I no longer believe that this version really 'reduces' anything, and I meant to remove this option. Should you choose this option to meet minimum requirements, which I do not recommend, you would not use tcsetpgrp() to direct SIGTSTP and SIGINT to the foreground job's process group; instead, your shell would catch these signals and relay them to the foreground job via killpg().
Why is the use of -Werror mandatory?
Because we've seen students too many times wasting their time and ours trying to debug obviously wrong code.
I'm trying to compile your code on my Ubuntu machine at home, but the files generated by flex contain warnings that break the compile process.
Building the project on any environment other than our lab machines is not officially supported. That said, some distributions of Linux (notably, Ubuntu systems) ship with a version of flex that contains a bug and emits code that generates compiler warnings. You can try applying this patch for a work-around.
What's the relationship between a job, a process group, and a pipeline?
These terms are often used synonymously, but they describe different aspects. A pipeline is a sequence of one or more user commands separated by the pipe (|) symbol, which will result in the execution of one or more processes. Each pipeline forms a job. A new process group is created for each job. By convention, the first process of each pipeline becomes the leader of the job's process group, and the other processes join the leader's process group.

So, put briefly, each pipeline is a job, and each job has its own process group.
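One common way to set this up is to call setpgid() in both the parent and the child, so that the process is in the right group no matter which of the two runs first. The sketch below assumes that approach; the function name fork_into_job() and the convention that *pgid is 0 for the first process of a job are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid(), for callers that reap the processes */
#include <unistd.h>

/* Hypothetical helper: fork one process of a pipeline and place it in the
 * job's process group. *pgid must be 0 for the first process of the job;
 * that process becomes the group leader and *pgid is set to its pid. */
pid_t fork_into_job(pid_t *pgid, char **argv)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        setpgid(0, *pgid);        /* a pgid of 0 means "use my own pid" */
        execvp(argv[0], argv);
        perror(argv[0]);
        _exit(127);
    }

    if (*pgid == 0)
        *pgid = pid;              /* first process leads the group */
    /* Same call in the parent. It may fail with EACCES if the child has
     * already exec'd, which is harmless because the child made the call too. */
    setpgid(pid, *pgid);
    return pid;
}
```

With the group id in hand, the whole job can be addressed at once, for example with kill(-pgid, SIGCONT).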
When I start a process in the foreground, should I call tcsetpgrp() in the parent (before waiting for the process) or the child (before exec'ing the command)?
Think about what could happen in each case. You need to avoid a situation where the launched program performs some action that requires ownership of the terminal, but tcsetpgrp() hasn't been called.
Is there a way to learn if all processes that are part of a pipeline have terminated?
No. You will receive notifications for each process that's part of a pipeline/job. They may exit in any order. Your shell must manage this: for instance, if a pipeline/job that consists of multiple processes is running in the foreground, the shell should not prompt for new input until after all processes that are part of the job have been reaped (and your shell received confirmation from the OS that they've terminated.)
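One way to manage this is to keep a per-job count of processes that have not yet terminated. The struct below is a stand-in invented for this sketch; it is not the layout of the provided esh_pipeline type.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative stand-in for a job record. */
struct job_sketch {
    int  alive;     /* processes of this job that have not terminated yet */
    bool stopped;   /* set when a process of the job was stopped */
};

/* Record one status reported by waitpid() for a process of this job.
 * Returns true once every process of the job has terminated. */
bool job_note_status(struct job_sketch *job, int status)
{
    if (WIFEXITED(status) || WIFSIGNALED(status))
        job->alive--;
    else if (WIFSTOPPED(status))
        job->stopped = true;
    return job->alive == 0;
}
```

A foreground wait loop would keep waiting while job_note_status() returns false and the job has not been stopped.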
My pipes don't work even though I used pipe() and dup2() correctly. strace -ff shows that writing to file descriptor 1/reading from file descriptor 0 by the child processes results in -1 EBADF.
You probably confused the read and write ends of the pipe.
My shell works when I test it on the command line but fails with the Python/pexpect test harness.
This may have several reasons. A common one is that the output of your shell does not match what the test script expects, and/or that you haven't adapted the file describing the output format of your shell accordingly. Among others, the test harness expects that the name of a job can be extracted via a regular expression from the job list. In addition, it expects that the shell echoes the command line of a job when it executes the 'fg' command. (Like bash does.)
After we place esh_pipeline objects into our jobs list we get a segfault in the loop where we go through the list of pipelines in the esh_command_line list - valgrind says "Invalid read of size 4"
An object can only be in one list at a time. If you're placing an esh_pipeline object into a new list without first removing it from the old list, you're corrupting the old list's next/prev pointers. This will lead to undefined results if you're doing this while iterating over the old list. Read list.c for techniques of how to avoid this. If you need to keep an object in multiple lists simultaneously, you must provide separate list_elem fields.
We update our job list and the job status before calling execvp(), but the update is ignored/does not work.
A common mistake is to misunderstand the effect of fork(). When the shell calls fork(), a new child process is created. The child process will eventually call exec() to load the program the user desired while the parent process will continue to implement the shell functionality - wait for foreground jobs, implement job control etc. The child process obtains a copy of all data from the parent at the time fork() is called. "All data" here means all heap objects, all global, and all local variables. This copy will have the same values that it had before the fork(), but any modifications affect only the child's copy. When the child execs, all of its data is wiped and replaced with the data of the new program. So, code like this
```
    /* WRONG CODE */
    if (fork() == 0) {
        /* in child */
        cmd->pipeline->status = FOREGROUND;   /* update child's status */
        /* Above statement is wrong and pointless - it updates the 
            child's copy of the 'pipeline' object only.
            All of the child's data, including everything on its heap, is about to
            be discarded by the subsequent execvp call we are about to execute. */
        
        execvp(cmd->argv[0], cmd->argv);
    }
```
is utterly wrong. Make sure that anything related to the management of job lists, job status, etc. is done before the fork(), or after the fork() in the context of the parent (shell) process. Be aware that while you can use local/global variables and heap objects to pass information from the parent to the child when it is forked, you cannot use those variables to pass information back from the child to the parent, or from the parent to the child after the child has been forked.
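The copy semantics can be observed with a few lines of code. In this sketch the variable job_status merely stands in for any shell data structure, and the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int job_status = 0;   /* stands in for any shell data structure */

/* Returns the parent's view of job_status after the child has written
 * to its own copy, or -1 if fork() failed. */
int parent_view_after_child_write(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        job_status = 42;     /* modifies the child's private copy only */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return job_status;       /* the parent's copy was never touched */
}
```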
When should we add error checking for our system calls?
When you code them. An all-too-frequent mistake is for students to waste hours or even days debugging their code's logic when they could have found the error immediately or in short time simply by checking which system calls fail, and why.
Is it normal for waitpid() to return -1?
If your shell is waiting for a foreground job (either because the user started a job without the ampersand or because the user typed the fg command), it will call waitpid(), at which point it'll block until a child exits or is stopped. When that happens, waitpid() will return and report which child exited or stopped and its status. However, a SIGCHLD will become pending at the same time. Despite the fact that waitpid() already reaped the child, SIGCHLD will still be delivered, but the waitpid() call inside the SIGCHLD handler has nothing to report. That's why you must use the WNOHANG flag in the SIGCHLD handler, and why you should ignore a return value of -1 (no children left to wait for) or 0 (children exist, but none has changed state) from waitpid() there.

As an aside, note that this sequence of events is Linux-specific and not guaranteed. It is also possible for the SIGCHLD handler to interrupt the main shell's waitpid() call, then for the signal handler's waitpid() to complete, and for the main shell's call to return -1, perhaps with errno set to EINTR. Your code should handle either scenario - which you can accomplish by processing the return value and status of whichever waitpid() call succeeds, and ignoring the waitpid() call that fails.

To make matters worse, the SIGCHLD signal may arrive after you've unblocked it, but before you call waitpid() in the main control flow. (Note that you cannot leave SIGCHLD blocked while calling waitpid()). In that case, the child will be reaped in the signal handler, and the call to waitpid() will return -1 with ECHILD.
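A sketch of a SIGCHLD handler that tolerates these cases, written against sigaction() directly rather than the provided wrappers. The counter status_changes is illustrative only; a real shell would look up the job for each pid and record its status. The use of SA_RESTART is a choice made for this sketch, not a requirement.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t status_changes;   /* illustrative counter */

static void on_sigchld(int sig)
{
    (void)sig;
    int saved_errno = errno;    /* waitpid() may overwrite errno */
    int status;
    pid_t pid;

    /* One SIGCHLD may stand for several children, so keep asking until
     * nothing is left to report. WNOHANG keeps the handler from blocking. */
    while ((pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0)
        status_changes++;       /* here: find pid's job and record status */

    /* pid == 0: children exist, but none has changed state.
     * pid == -1 with ECHILD: no children left. Both are expected here. */
    errno = saved_errno;
}

int install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* restart interrupted calls such as read() */
    return sigaction(SIGCHLD, &sa, NULL);
}
```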
What's the difference between blocking a process and blocking a signal?
The verb "to block" is used in at least two different meanings in operating systems. A process may "block" when it encounters a situation in which it cannot make progress. For instance, a process may wait for some event (such as a read() operation to complete, or a child process to exit, or simply for some time to pass). In this case, we say the process is blocked. Blocked processes do not use CPU time. They are unblocked when what they're blocked on finally happens. At that point, they become ready and the scheduler will eventually resume them as soon as a CPU/core becomes available.

Another meaning of "to block" refers to "blocking a signal." Blocking here means to delay the delivery of a pending signal until some later point in time. This is accomplished using an appropriate system call: sigprocmask(2) in the POSIX API (sigblock(2) is its obsolete BSD predecessor). It is needed when there is a potential for race conditions between the signal handler and the main control flow of a program.
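The effect of blocking a signal can be observed with POSIX sigprocmask(). The sketch below uses SIGUSR1 purely for demonstration; in the shell, the signal of interest would be SIGCHLD. The function and variable names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_usr1;

static void on_usr1(int sig)
{
    (void)sig;
    got_usr1 = 1;
}

/* Returns 1 if the signal stayed pending while blocked and was delivered
 * when unblocked, 0 if not, and -1 if the handler could not be installed. */
int demo_block_signal(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigset_t block, saved, pending;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &saved);    /* start of critical section */

    raise(SIGUSR1);                            /* becomes pending, not delivered */
    int ran_while_blocked = got_usr1;
    sigpending(&pending);
    int was_pending = sigismember(&pending, SIGUSR1);

    sigprocmask(SIG_SETMASK, &saved, NULL);    /* handler runs during this call */
    return !ran_while_blocked && was_pending == 1 && got_usr1 == 1;
}
```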

Why does eshoutput.py expect a CRLF (\r\n) - I thought the Unix line termination character was LF (\n)?
Or: How come ^Z shows up if I turn on logging in eshoutput.py - I thought ^Z results in SIGTSTP being sent?

The regular expressions in eshoutput.py refer to the information that is output on the master side of the pseudo terminal, and the logfile logs everything that is input into the master side of the pseudo terminal. Your shell, on the other hand, is connected to the pseudo terminal's slave end. The pty driver performs the same operations a regular terminal driver would. It can also be customized in the same way. For instance, if the flag ONLCR is set, it will transform a NL into a CRLF. (See termios(3); you can run stty -a to see which flags are enabled by default for the pseudo terminal used by ssh.) Similarly, if ^Z is sent to the master end of the terminal, SIGTSTP will be sent to the terminal's foreground process group.

pty(7) summarizes the functionality provided by pseudo terminals:

   A  pseudo-terminal  is a pair of virtual character devices that provide a bidirectional
   communication channel.  One end of the channel is called the master; the other  end  is
   called  the  slave.   The  slave  end of the pseudo-terminal provides an interface that
   behaves exactly like a classical terminal.  A process that expects to be connected to a
   terminal,  can  open the slave end of a pseudo-terminal and then be driven by a program
   that has opened the master end.  Anything that is written on the master end is provided
   to  the process on the slave end as though it was input typed on a terminal.  For exam-
   ple, writing the interrupt character (usually control-C) to  the  master  device  would
   cause  an  interrupt  signal  (SIGINT) to be generated for the foreground process group
   that is connected to the slave.  Conversely, anything that is written to the slave  end
   of  the pseudo-terminal can be read by the process that is connected to the master end.
   Pseudo-terminals are used by applications  such  as  network  login  services  (ssh(1),
   rlogin(1), telnet(1)), terminal emulators, script(1), screen(1), and expect(1).

('expect(1)' is a program that provides functionality similar to that of the pexpect library Patrick extended for our testing framework.)

I'm trying to use setpgrp() or getpgid(), but it doesn't compile.
You need to read the fine print in the man page for these functions:
```
       To get the  prototypes  under  glibc,  define  both  _XOPEN_SOURCE  and
       _XOPEN_SOURCE_EXTENDED, or use "#define _XOPEN_SOURCE n" for some integer 
       n larger than or equal to 500.
```
So, use:
```
#define _XOPEN_SOURCE 500
#include <unistd.h>
```
The reasons for this are historical. If you look at the Unix family tree you'll see two major branches: systems deriving from System V on the left, and systems in the BSD line in the center. Linux, on the right, now attempts to provide support for both lines; sometimes, there are functions with the same name, but different signatures (for instance, setpgrp()). Then you have to tell which one you want by defining certain pre-processor constants. In this assignment, you should not need to use any such functions (use setpgid() instead of setpgrp(), and there should be no need to ever call getpgid()).
Linux's man page for sigaction(2) says that siginfo->si_pid contains the pid of the child process that caused SIGCHLD. Why can I then not use it?
When POSIX signal handling is used, the OS will pass along additional information to the signal handler in arguments #2 and #3. Argument #2 is of type siginfo_t. See sigaction(2). However, since traditional Unix signals are not queued, si_pid for a SIGCHLD will contain the pid of the first child that caused SIGCHLD to become pending. Therefore, it cannot be used to learn which children need to be reaped. You must use waitpid(-1, ...). Study sigchlddoesnotqueue.c for an example.