How do environment variables work in Unix, and how does PATH work in particular?

In Unix, each process maintains a set of environment variables, which are simple (name, value) pairs of strings. The set typically includes variables such as HOME, USER, and OLDPWD. The env command shows the current settings.
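A quick way to look at these pairs from a shell is sketched below. HOME is used only because it is almost always set; any other variable name works the same way.

```shell
# List the whole environment: one NAME=value pair per line.
env

# Pick out a single variable from that list (here HOME).
env | grep '^HOME='
```

The second command prints a single line such as HOME=/home/alice, with your own home directory in place of the example path.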

A process may read, set, and unset its environment variables through an API. In C, these are the getenv(), setenv(), and unsetenv() functions; most other languages provide equivalent APIs.

When you type a command on the command line, the shell uses the fork + exec system calls to start a new program in a new process. Therefore, we need to understand the effect of these system calls on the environment.

When a process creates a new process via fork(), the child process inherits a copy of the parent's environment as it is at that point in time.
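The "copy at this point in time" behavior can be observed from a shell, because running a command in the background with & forks right away. This is a minimal sketch; COLOR is an arbitrary, illustrative variable name.

```shell
export COLOR=red

# Start a child in the background; the fork copies the environment now.
sh -c 'sleep 1; echo "child sees COLOR=$COLOR"' &

# Change the variable in the parent after the child was created.
COLOR=blue
echo "parent has COLOR=$COLOR"     # parent has COLOR=blue

# Wait for the child: it prints "child sees COLOR=red",
# because it works on the copy it received at fork time.
wait
```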

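When a process starts a new program via exec(), the caller can specify the new program's environment explicitly (see the envp argument of execve(), execle(), and the GNU extension execvpe()). The other variants (execl(), execlp(), execv(), execvp()) pass the calling process's current environment on unchanged, so by default a new program simply keeps the environment the process had before the exec.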
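From the shell, a convenient way to see both cases is the env utility with its -i option, which runs a command with an empty environment plus only the NAME=value pairs listed before the command. The sketch below contrasts that with an ordinary command that inherits the caller's environment; COLOR and GREETING are illustrative names.

```shell
export COLOR=red
unset GREETING

# Default: the new program inherits the caller's environment unchanged.
/bin/sh -c 'echo "COLOR=${COLOR-unset} GREETING=${GREETING-unset}"'
# prints: COLOR=red GREETING=unset

# env -i: the new program receives only the pairs given on the command line.
env -i GREETING=hello /bin/sh -c 'echo "COLOR=${COLOR-unset} GREETING=${GREETING-unset}"'
# prints: COLOR=unset GREETING=hello
```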

An important aspect is that the environment is a per-process property. For instance, if a process changes its environment variables and then exits, no other process will see or be affected by this change. If a process sets an environment variable, the variable is not also set in already existing parent or sibling processes, and it will not be set "automatically" in future processes that are not descendants of this process.
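The sketch below shows this isolation with an illustrative variable COLOR: a child changes and exports its own copy, then exits, and neither the parent nor a later sibling sees the change.

```shell
export COLOR=red

# The child modifies (and even exports) its own copy, then exits.
sh -c 'COLOR=green; export COLOR; echo "child set COLOR=$COLOR"'
# prints: child set COLOR=green

# The parent's environment is untouched ...
echo "parent still has COLOR=$COLOR"       # parent still has COLOR=red

# ... and a sibling started afterwards inherits the parent's value.
sh -c 'echo "sibling sees COLOR=$COLOR"'   # sibling sees COLOR=red
```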

The PATH environment variable

The PATH environment variable is consulted by execvp() (and the other exec variants whose names contain a p) when it executes a new program. For instance, if the user types a command on a shell's command line, the shell forks a new process, and this new process uses execvp() to run the command the user entered. If the command is ls and PATH is /opt/bin:/bin:/usr/bin, then execvp() tries /opt/bin/ls, /bin/ls, and /usr/bin/ls, in this order. The first attempt that succeeds runs the program, and the search stops. If none succeeds, the call fails. (If the command name contains a slash, as in ./ls, PATH is not consulted at all.)
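The search order can be observed by putting two different programs with the same name into two scratch directories and changing which directory comes first. This is a sketch under stated assumptions: the program name hello and the directories first, second, and empty are made up for the demonstration, and the shell is asked to perform the lookup, which follows the same left-to-right rule described above.

```shell
# Scratch directories, removed again when the script exits.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/first" "$tmp/second" "$tmp/empty"

# Two different programs, both named "hello".
printf '#!/bin/sh\necho "hello from first"\n'  > "$tmp/first/hello"
printf '#!/bin/sh\necho "hello from second"\n' > "$tmp/second/hello"
chmod +x "$tmp/first/hello" "$tmp/second/hello"

# The search stops at the first directory that contains the program.
PATH="$tmp/first:$tmp/second" /bin/sh -c hello    # hello from first
PATH="$tmp/second:$tmp/first" /bin/sh -c hello    # hello from second

# No directory in PATH contains it: the lookup fails with status 127.
PATH="$tmp/empty" /bin/sh -c hello 2>/dev/null || echo "failed with status $?"
```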

Because PATH is part of the environment, it must either be set explicitly or be correctly passed down from an ancestor process. In Unix, processes started by users usually have a shell as an ancestor; this shell is started when the user logs into the machine. Shells provide commands to manipulate environment variables. These commands update the shell's own environment, which, as explained above, is passed on to the processes the shell starts.

In bash, for instance, you access the value of PATH with the expression $PATH. (This differs from most ordinary programming languages, where you do not prefix a variable with $ to use its value on the right-hand side.)
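A small illustration of the $ prefix, plus one way to print the colon-separated entries of PATH one per line, in the order in which they are searched:

```shell
echo PATH        # without $: prints the literal word PATH
echo "$PATH"     # with $: prints the value, e.g. /usr/local/bin:/usr/bin:/bin

# One directory per line, in search order.
printf '%s\n' "$PATH" | tr ':' '\n'
```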

Internal Shell Variables

Making matters slightly more complicated, shells support a second type of variable: internal (shell) variables. For instance, A=1 sets the internal variable A to 1. To add an internal variable to the environment, use export, as in export A; the export can be combined with an assignment, as in export A=42.

However, if a variable is already in the environment, simply assigning to it also updates the environment variable of the same name. This is true at least for bash; it may not hold for all shells.
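The three cases can be checked in bash by starting a child shell after each step and asking it what it sees. A is the variable name used in the text above; the initial unset only makes sure it does not already come from the environment.

```shell
unset A

A=1                                    # internal (shell) variable only
sh -c 'echo "child: A=${A-unset}"'     # child: A=unset

export A                               # now A is part of the environment
sh -c 'echo "child: A=${A-unset}"'     # child: A=1

A=2                                    # already exported: the environment is updated too
sh -c 'echo "child: A=${A-unset}"'     # child: A=2
```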

To prepend and/or append something to the PATH, you would use

PATH=/prepend/this/directory:$PATH:/append/this/directory

or, equivalently in bash (the export is optional here, because PATH is normally already in the environment),

export PATH=/prepend/this/directory:$PATH:/append/this/directory
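One way to confirm where the new directories ended up is to look at the first and last entries afterward with the ${VAR%%pattern} and ${VAR##pattern} expansions. This sketch uses the placeholder directories from above, which do not need to exist, and saves and restores PATH around the experiment (old_path is an illustrative name).

```shell
old_path=$PATH
PATH=/prepend/this/directory:$PATH:/append/this/directory

echo "${PATH%%:*}"    # first entry: /prepend/this/directory
echo "${PATH##*:}"    # last entry:  /append/this/directory

PATH=$old_path        # restore, since the demo directories are placeholders
```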

Making Sure Your Environment Is Set Up Correctly Every Time

Since environment variable settings are lost when processes exit, the variable must be set anew every time a user logs into the machine, so that it exists in the login shell's environment and can be passed on to new processes. That is why the command above belongs in a file that new login shells read and interpret, typically ~/.bash_profile; from there, the setting is inherited by any descendant commands the user starts.

Tilde expansion

When evaluating an expression, bash interprets the tilde ~ as the current user's home directory, and ~user as user's home directory. This expansion is performed by bash itself while it processes the command line (and it is suppressed inside double quotes!); it does not take place inside execvp(). Thus, to add a directory whose path is expressed relative to a user's home directory, you would use

export PATH=$PATH:~cs3214/bin

without double quotes. If you wrote export PATH="$PATH:~cs3214/bin", the tilde would be preserved, and execvp() would look in a directory literally named ~cs3214/bin, which does not exist.
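The difference can be seen without modifying PATH by assigning both forms to scratch variables (expanded and literal are illustrative names). The sketch uses plain ~, the current user's home, because a user named cs3214 may not exist on your machine, and it uses plain assignments rather than export so that it behaves the same in bash and in a POSIX sh.

```shell
# Unquoted assignment: the shell expands a ~ that follows a colon.
expanded=$PATH:~/bin
# Inside double quotes, the ~ stays a literal character.
literal="$PATH:~/bin"

echo "${expanded##*:}"   # your home directory followed by /bin, e.g. /home/alice/bin
echo "${literal##*:}"    # ~/bin (a PATH lookup would treat this as a relative directory name)
```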