How do environment variables work in Unix, and how does PATH
work in particular?
In Unix, each process maintains a set of environment variables, which are simple
(name, value) pairs of strings. Typically, it includes variables such as HOME,
USER, and OLDPWD. The command env can be used to show the current settings.
A process may read, set, and unset its environment variables using an API.
In C, these are the getenv, setenv, and unsetenv functions; equivalent APIs
exist in most other languages.
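A minimal sketch of these three calls, assuming a POSIX system. The variable name DEMO_VAR and the helper names show and env_api_demo are made up for illustration. Note that getenv returns NULL for a variable that is not set, and that the third argument of setenv decides whether an existing value is overwritten.

```c
#define _POSIX_C_SOURCE 200809L   /* expose setenv() and unsetenv() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print NAME=value, or note that the variable is not set. */
static void show(const char *name)
{
    const char *value = getenv(name);   /* NULL if the variable is not set */
    printf("%s=%s\n", name, value != NULL ? value : "(unset)");
}

/* Read HOME, then set, try to overwrite, and remove the hypothetical DEMO_VAR. */
void env_api_demo(void)
{
    show("HOME");

    setenv("DEMO_VAR", "hello", 1);     /* 1: overwrite any existing value */
    assert(strcmp(getenv("DEMO_VAR"), "hello") == 0);

    setenv("DEMO_VAR", "ignored", 0);   /* 0: keep the existing value */
    show("DEMO_VAR");                   /* prints DEMO_VAR=hello */

    unsetenv("DEMO_VAR");
    show("DEMO_VAR");                   /* prints DEMO_VAR=(unset) */
}
```

Calling env_api_demo() from a main function prints the value of HOME, then DEMO_VAR=hello, then DEMO_VAR=(unset). These calls change only the calling process's own environment.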
When you type a command on the command line, the shell uses the fork + exec
system calls to start a new program in a new process. Therefore, we need to
understand the effect of these system calls on the environment.
When a process creates a new process via fork(), the child process inherits
a copy of the parent's environment as it is at that point in time.
When a process starts a new program via exec(), the new program's environment
can be set explicitly (look for the envp argument of variants such as execve,
execle, and execvpe). When the other variants are used, which is the common
default, the environment is not changed when a new program is executed: the new
program receives the calling process's current environment.
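Both rules can be seen in a small sketch, assuming a POSIX system with /bin/sh. The helper run_sh and the variable DEMO_VAR are hypothetical; the child program is simply sh -c running a test command, so the exit status tells us what environment the new program received. Passing NULL hands over the inherited environment (what the exec variants without envp do); passing an array replaces the environment entirely.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* the calling process's current environment */

/*
 * Hypothetical helper: fork a child that runs /bin/sh -c SCRIPT and return
 * the child's exit status (or -1 on error). If envp is NULL, the new program
 * gets the environment the child inherited at fork(); otherwise it gets
 * exactly the NAME=value strings in envp.
 */
int run_sh(const char *script, char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: holds a copy of the parent's environment as of fork(). */
        char *const argv[] = { "sh", "-c", (char *) script, NULL };
        execve("/bin/sh", argv, envp != NULL ? envp : environ);
        _exit(127);   /* only reached if execve() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, after setenv("DEMO_VAR", "hello", 1), run_sh("test \"$DEMO_VAR\" = hello", NULL) returns 0, while passing an envp array without DEMO_VAR makes the variable disappear in the new program.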
An important aspect is that the environment is a per-process property. For instance, if a process changes its environment variables and then exits, no other process will see or be affected by this change. If a process sets an environment variable, it will not also be set in already existing parent or sibling processes, nor will it be set "automatically" in future processes that are not descendants of this process.
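The following sketch (assuming POSIX; child_change_is_local and DEMO_VAR are hypothetical names) shows the parent's side of this: a child changes its copy of a variable and exits, and the parent's value is untouched.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: the parent sets DEMO_VAR, a forked child changes its own
 * copy and exits, and the parent checks that its value is unchanged.
 * Returns 1 if the parent still sees "parent", 0 if not, -1 on error.
 */
int child_change_is_local(void)
{
    if (setenv("DEMO_VAR", "parent", 1) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: this modifies only the child's copy of the environment. */
        setenv("DEMO_VAR", "child", 1);
        _exit(0);
    }

    if (waitpid(pid, NULL, 0) < 0)
        return -1;

    const char *value = getenv("DEMO_VAR");
    return value != NULL && strcmp(value, "parent") == 0;
}
```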
The PATH environment variable
The PATH environment variable is consulted by the execvp() function (and the
other exec variants whose names contain a p, such as execlp()) when it executes
a new program. For instance, if the user types a command on a shell's command
line, the shell will fork a new process, and this new process will use execvp
(or an equivalent search of its own) to run the command the user entered. If the
command is ls and PATH contains /opt/bin:/bin:/usr/bin, in this order, then it
will check for /opt/bin/ls, /bin/ls, and /usr/bin/ls, in this order. The first
one that exists and can be executed is run (and the checking stops). If none of
them works, the call fails and execvp returns -1.
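The search order can be sketched in C as below. This is a simplified model, not glibc's actual implementation: find_in_path is a hypothetical helper that uses access() to test for an executable file, whereas execvp simply attempts execve on each candidate in turn. It does follow two rules of the real function: a command name containing a slash is used as-is without consulting PATH, and an empty PATH entry stands for the current directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: search the directories listed in PATH, in order, for
 * an executable file named cmd. On success, writes the full path into out
 * and returns 0; returns -1 if no directory contains an executable cmd.
 */
int find_in_path(const char *cmd, char *out, size_t outlen)
{
    /* Names containing a slash are used as-is; PATH is not searched. */
    if (strchr(cmd, '/') != NULL) {
        int n = snprintf(out, outlen, "%s", cmd);
        return (n >= 0 && (size_t) n < outlen) ? 0 : -1;
    }

    const char *path = getenv("PATH");
    if (path == NULL)
        return -1;   /* simplification: real execvp() uses a default list */

    const char *dir = path;
    for (;;) {
        const char *end = strchr(dir, ':');
        size_t len = end != NULL ? (size_t) (end - dir) : strlen(dir);

        /* An empty entry (as in "::" or a trailing ':') means the current directory. */
        int n = len == 0
            ? snprintf(out, outlen, "./%s", cmd)
            : snprintf(out, outlen, "%.*s/%s", (int) len, dir, cmd);

        if (n >= 0 && (size_t) n < outlen && access(out, X_OK) == 0)
            return 0;    /* first match wins; the search stops here */

        if (end == NULL)
            return -1;   /* no more entries */
        dir = end + 1;
    }
}
```

With PATH set to /nonexistent-dir:/bin, find_in_path("sh", ...) skips the first entry and finds /bin/sh.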
Because PATH is part of the environment, it must either be set by the process
itself or be passed down correctly from an ancestor process. In Unix, processes
started by users usually have a shell as an ancestor; this shell is started when
the user logs into the machine. Shells provide commands to manipulate environment
variables. They update their own environment, which, as explained above, is
passed on to the processes they start.
In bash, for instance, you may access the value of PATH using the expression
$PATH. (This is a little different from most ordinary programming languages,
where you don't have to add a $ to access a variable on the right-hand side.)
Internal Shell Variables
Making matters slightly more complicated, shells support a second type of
variable: internal variables. For instance, you may say A=1 to set the internal
variable A to 1.
You would use export to add internal variables to the environment, as in
export A
which can be combined with an assignment, as in export A=42.
However, if a variable is already in the set of environment variables, simply assigning to it will also update the environment variable of the same name. That is true at least for bash; it may not be true for all shells.
To prepend and/or append something to PATH, you would use
PATH=/prepend/this/directory:$PATH:/append/this/directory
or, equivalently in bash (where the export is optional, because PATH is already an environment variable),
export PATH=/prepend/this/directory:$PATH:/append/this/directory
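The assignment above is plain string concatenation on a colon-separated list. As a sketch, a C program could make the same change to its own environment with getenv and setenv. The name wrap_path is hypothetical, and, as with the shell, the change affects only the calling process and the children it starts afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: the C counterpart of
 *   PATH=front:$PATH:back
 * Returns 0 on success, -1 on error.
 */
int wrap_path(const char *front, const char *back)
{
    const char *old = getenv("PATH");
    size_t len = strlen(front) + (old ? strlen(old) : 0) + strlen(back) + 3; /* 2 ':' + NUL */

    char *new_path = malloc(len);
    if (new_path == NULL)
        return -1;

    if (old == NULL || old[0] == '\0')
        /* Avoid "front::back": an empty entry would mean the current directory. */
        snprintf(new_path, len, "%s:%s", front, back);
    else
        snprintf(new_path, len, "%s:%s:%s", front, old, back);

    int rc = setenv("PATH", new_path, 1);   /* setenv() copies the string */
    free(new_path);
    return rc;
}
```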
Making Sure Your Environment Is Set Up Correctly Every Time
Since environment variable settings are lost when the processes holding them
exit, the variables must be set anew every time a user logs into the machine (so
that they exist in the login shell's environment and can be passed on to new
processes). That is why the command that sets PATH must appear in a file that is
read and executed by new login shells, typically ~/.bash_profile; from there, the
setting is passed on to and inherited by any descendant commands the user starts.
Tilde expansion
When expanding the words of a command, bash interprets an unquoted tilde ~ at
the start of a word as the current user's home directory, and ~user as user's
home directory. In variable assignments, a tilde that follows = or : is expanded
as well. This expansion is performed only by the shell (and it is suppressed if
the tilde appears in double quotes!).
The execvp function does not perform tilde expansion. Thus, to add a directory
whose path is expressed relative to a user's home directory, you'd use
export PATH=$PATH:~cs3214/bin
without double quotes.
If you wrote export PATH="$PATH:~cs3214/bin" instead, the tilde would be
preserved, and execvp would look in a directory literally named ~cs3214/bin
(relative to the current working directory), which almost certainly does not exist.
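To make the division of labor concrete: tilde expansion is a lookup the shell performs on words before it runs a command. The C function below is a simplified sketch of that lookup for a single word, assuming POSIX getpwnam() for the password database. expand_tilde is a hypothetical name, and real bash handles more cases (such as ~+, ~-, and tildes after : in assignments). execvp performs no such step, which is why a literal ~cs3214/bin entry in PATH is treated as an ordinary relative directory name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical, simplified tilde expansion for one word: "~/x" uses $HOME,
 * "~user/x" uses user's home directory from the password database. Other
 * words are copied unchanged. Writes the result into out.
 */
void expand_tilde(const char *word, char *out, size_t outlen)
{
    if (word[0] != '~') {
        snprintf(out, outlen, "%s", word);
        return;
    }

    const char *rest = strchr(word, '/');   /* part after the user name */
    size_t name_len = rest != NULL ? (size_t) (rest - word - 1) : strlen(word) - 1;
    if (rest == NULL)
        rest = "";

    const char *home = NULL;
    if (name_len == 0) {
        home = getenv("HOME");               /* plain "~" */
    } else {
        char name[256];
        if (name_len < sizeof name) {
            memcpy(name, word + 1, name_len);
            name[name_len] = '\0';
            struct passwd *pw = getpwnam(name);   /* "~user" */
            if (pw != NULL)
                home = pw->pw_dir;
        }
    }

    if (home == NULL)
        snprintf(out, outlen, "%s", word);   /* unknown user: keep the word, as bash does */
    else
        snprintf(out, outlen, "%s%s", home, rest);
}
```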