UNIX Lab. Basic Awk scripting

Your name here (please print):

Your student ID number here:

Construct gawk commands that operate on a protein structure file and produce the results specified below. Write your gawk commands below each question. You may also use egrep if you think it gives a shorter solution. To begin, download the above file and examine its contents using vim. Read the "REMARK" section carefully. Note that awk is most likely aliased to gawk on your machine, so it does not matter whether you type gawk or awk.
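
The solutions in this lab refer to columns by number ($3 for the atom name, $9 for the charge). A minimal sketch for checking which field is which, assuming the whitespace-separated PQR layout "ATOM, serial, atom name, residue name, residue number, x, y, z, charge, radius". The file sample.pqr and its values are hypothetical illustrations, not the lab file, and awk is used in place of gawk for portability:

```shell
# Hypothetical two-atom sample in the assumed PQR layout (not the lab file).
cat > sample.pqr <<'EOF'
REMARK   1 Sample data for illustration only
ATOM      1  N   ALA     1      -0.677  -1.230  -0.491 -0.4157 1.8240
ATOM      2  CA  ALA     1      -0.001   0.064  -0.491  0.0337 1.9080
EOF

# Print every field of the first ATOM record next to its awk field number,
# then stop reading the file.
fields=$(awk '$1 == "ATOM" { for (i = 1; i <= NF; i++) printf "$%d=%s\n", i, $i; exit }' sample.pqr)
echo "$fields"
```

Running the same one-liner on the real file (with protein.pqr in place of sample.pqr) is a quick way to confirm that the charge really sits in field 9 before summing it.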

  1. (2 points) Compute the total charge of the protein, that is, the sum of the charges of all atoms. (Note: the correct answer is < 100.)
    gawk 'BEGIN { s = 0 } $1 == "ATOM" { s += $9 } END { print s }' protein.pqr
    The total charge is 2.
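  2. (3 points) Would the total charge change if you were to get rid of every amino acid named "LEU"? (You need to answer the question without making any modifications to the file. The answer should be "Yes" or "No", followed by the gawk script you used and some explanation if needed.)
    gawk 'BEGIN { total = 0 } $1 == "ATOM" && $4 == "LEU" { total += $9 } END { print total }' protein.pqr
    Testing the residue-name field ($4) on ATOM lines, rather than matching /LEU/ anywhere in the line, keeps REMARK lines that happen to mention "LEU" out of the sum. The result, which is the total charge of all "LEU" residues, is 0, so the answer is "No": removing them would leave the total charge unchanged.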
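  3. (2 points) Now compute the total number of distinct amino acids. Note that you cannot assume that they are numbered sequentially, but you can be sure that each contains a single atom named "CA".
    grep -c "CA" protein.pqr
    This reports 32 lines containing "CA", that is, 32 amino acids. Note that grep -c counts every line that contains "CA" anywhere, including any REMARK line or other name in which "CA" happens to appear. A stricter alternative that checks only the atom-name field is: gawk '$1 == "ATOM" && $3 == "CA" { n++ } END { print n }' protein.pqr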
  4. (1 point) Re-order the lines in the input file in ascending order with respect to charge (that is, the first line becomes the atom with the smallest charge).
    gawk '$1 == "ATOM"' protein.pqr | sort -n -k 9,9
  5. (2 points) Find the total charge of all hydrogen atoms.
    gawk 'BEGIN { s = 0 } $1 == "ATOM" && $3 ~ /^H/ { s += $9 } END { print s }' protein.pqr
    Note that you can't simply search for an occurrence of "H" anywhere in the line, as it can appear outside of the atom name. Matching field 3 against /^H/ anchors the test to the start of the atom name, so a name that merely contains an "H", such as "CHH", is not picked up.
  6. (1 point) [Using gawk] Change all "ASP" into "ASH".
    gawk '{ gsub(/ASP/, "ASH"); print }' protein.pqr
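
One way to sanity-check commands like the ones above is to run them on a tiny file whose answers can be worked out by hand. The sketch below does this with field-checked variants of the commands; sample.pqr and every value in it are hypothetical (chosen so the sums are exact), the column layout is assumed to match the lab file, and awk stands in for gawk:

```shell
# Hypothetical six-atom sample (assumed PQR layout); not the lab file.
# The REMARK line deliberately contains the letters "CA".
cat > sample.pqr <<'EOF'
REMARK   1 CALCIUM-FREE HYPOTHETICAL SAMPLE
ATOM      1  N   LEU     1       0.000   0.000   0.000 -0.50 1.80
ATOM      2  CA  LEU     1       1.000   0.000   0.000  0.25 1.90
ATOM      3  H   LEU     1       2.000   0.000   0.000  0.25 1.00
ATOM      4  N   ASP     7       3.000   0.000   0.000 -0.75 1.80
ATOM      5  CA  ASP     7       4.000   0.000   0.000  0.25 1.90
ATOM      6  HA  ASP     7       5.000   0.000   0.000 -0.50 1.00
EOF

# Q1: total charge over ATOM records only.
total=$(awk '$1 == "ATOM" { s += $9 } END { print s + 0 }' sample.pqr)

# Q2: charge carried by LEU residues (field 4 is the residue name).
leu=$(awk '$1 == "ATOM" && $4 == "LEU" { s += $9 } END { print s + 0 }' sample.pqr)

# Q3: one CA atom per residue, matched on the atom-name field.
residues=$(awk '$1 == "ATOM" && $3 == "CA" { n++ } END { print n + 0 }' sample.pqr)

# For comparison: a plain substring count also hits the REMARK line.
loose=$(grep -c "CA" sample.pqr)

# Q4: serial number of the atom with the smallest charge.
# LC_ALL=C is set so that "." is read as the decimal point.
first=$(awk '$1 == "ATOM"' sample.pqr | LC_ALL=C sort -n -k 9,9 | head -n 1 | awk '{ print $2 }')

# Q5: charge of atoms whose name starts with H.
hcharge=$(awk '$1 == "ATOM" && $3 ~ /^H/ { s += $9 } END { print s + 0 }' sample.pqr)

# Q6: number of lines mentioning ASH after the substitution.
ash=$(awk '{ gsub(/ASP/, "ASH"); print }' sample.pqr | grep -c "ASH")

echo "total=$total leu=$leu residues=$residues loose=$loose first=$first hcharge=$hcharge ash=$ash"
```

On this sample the script prints total=-1 leu=0 residues=2 loose=3 first=4 hcharge=-0.25 ash=3. The gap between residues=2 and loose=3 shows why matching on a specific field is safer than searching the whole line.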