- The following gawk script just
cats a file (run it with
gawk -f <gawk script name>):
{
print
}
First, gawk is really just GNU awk;
the two are the same apart from minor details.
The curly braces contain commands.
Since there is no pattern before the {,
these commands are applied to every line.
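To see what "nothing before {" means, here is a small sketch that runs the same print action with and without a pattern in front of the brace. The three-line input and the regular expression /an/ are made up for illustration, and plain awk is used so the sketch runs where gawk is not installed (gawk can be substituted).

```shell
# Hypothetical three-line input.
input='apple
banana
cherry'

# Nothing before the brace: the action runs for every line.
all=$(printf '%s\n' "$input" | awk '{ print }')

# Pattern /an/ before the brace: the action runs only for matching lines.
some=$(printf '%s\n' "$input" | awk '/an/ { print }')

echo "$all"
echo "$some"
```

The first command echoes the input unchanged; the second prints only banana.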
- gawk has two special patterns, BEGIN
and END where you can put commands that must be
done before any line is read, and after all lines are
read (respectively). Here's an example:
BEGIN {
print "I am going to start reading a file. Whoopie!"
}
{
print
}
END {
print "I have finished reading the file. Sigh."
}
- When gawk reads a line, it automatically splits
the line into fields and puts them into predefined variables
such as $1 (first field), $2 (second field), etc. By
default, fields are separated by whitespace (any run of spaces
or tabs). So, the gawk script
{
print $1
}
will just print the first field of every line, i.e., the names.
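A quick way to check the field splitting is to feed a couple of lines to the script by hand. The sketch below assumes a made-up data layout (a name followed by two scores) and deliberately mixes tabs and spaces between the fields; it uses plain awk, which splits fields the same way as gawk by default.

```shell
# Hypothetical data: name, then two scores, separated by a mix of tabs and spaces.
data=$(printf 'alice\t90  85\nbob   70\t95\n')

# $1 is the first field of each line, whatever whitespace follows it.
first=$(printf '%s\n' "$data" | awk '{ print $1 }')

# $2 is the second field (the first column of numbers).
second=$(printf '%s\n' "$data" | awk '{ print $2 }')

echo "$first"
echo "$second"
```

The names come out one per line, then the first score of each line, regardless of whether a tab or several spaces separated them.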
- We can also create variables and tinker with them, just like
we would in a C program. Here's how to calculate the
average of the scores in the first column of numbers (which
is actually the second column of the file, hence $2).
BEGIN {
total = 0
lc = 0
}
{
total = total + $2
++lc
}
END {
avg = total/lc
print total, avg
}
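One thing to watch in the averaging script: if the input is empty, lc is still 0 in the END part and the division fails. The sketch below is one possible variation, not the version from class, that checks lc first. The sample data is made up, the variable name avg_script is just a shell convenience, and the BEGIN part is left out because awk variables start out as 0 anyway.

```shell
# The awk program, kept in a shell variable so it can be run twice.
avg_script='
{
    total = total + $2
    ++lc
}
END {
    if (lc > 0)
        print total, total / lc
    else
        print "no lines read"
}'

# Hypothetical data: three names with one score each.
result=$(printf 'alice 90\nbob 70\ncarol 80\n' | awk "$avg_script")

# Empty input: the END part still runs, but lc is 0.
empty=$(printf '' | awk "$avg_script")

echo "$result"
echo "$empty"
```

With the three sample lines the total is 240 and the average is 80; with no input the script prints the message instead of dividing by zero.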
- Can tinker with some gawk system variables to
modify the output, e.g., OFS stands for "output
field separator". We can set it in the BEGIN part
by:
BEGIN {
total = 0
lc = 0
OFS = "---"
}
This will affect all subsequent output produced by
the print command: between two items listed
in comma-separated form, gawk inserts the
output field separator. Similarly, FS
is the input field separator. Setting it tells gawk
to split each line on something other than the default whitespace. In
class we also mentioned NR, which counts the number
of lines read so far. Once the whole input has been read (i.e., in the
END part), NR equals the total number of lines.
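Here is a small sketch that uses FS, OFS and NR together. The colon-separated input is made up for illustration, and plain awk is used so it runs even where gawk is not installed.

```shell
# Hypothetical input: name and score separated by a colon.
out=$(printf 'alice:90\nbob:70\n' | awk '
BEGIN {
    FS = ":"      # split each input line on colons
    OFS = "---"   # inserted between comma-separated print items
}
{
    print NR, $1, $2   # NR is the number of the current line
}
END {
    print "lines read", NR   # here NR is the total line count
}')

echo "$out"
```

Each output line has its pieces joined by "---", including the last one, since OFS applies to every print that lists items with commas.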
- It is good practice to put one gawk command
on each line. If you put multiple commands on one line, you will need
to use a ";" to separate them.
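As a sketch of the two layouts, the same counting program is written below once with one command per line and once squeezed onto a single line with ";". The two-line input is made up, and plain awk is used in place of gawk so the example runs anywhere.

```shell
# One command per line: no separators needed.
one_per_line=$(printf 'a 1\nb 2\n' | awk '
{
    total = total + $2
    ++lc
}
END {
    print total, lc
}')

# Same program on one line: commands inside a brace are separated by ";".
same_line=$(printf 'a 1\nb 2\n' | awk '{ total = total + $2; ++lc } END { print total, lc }')

echo "$one_per_line"
echo "$same_line"
```

Both versions print the same total and line count; the one-line form is handy for quick commands typed at the shell prompt.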
To sum up, there are (at least) three ways to run an awk (gawk)
command.