UNIX Lab. Basic Awk scripting

Your name here (please print):

Construct awk commands that operate on a protein structure file and produce the results specified below. Cut and paste your awk commands below each question and submit the page with your answers when done. To begin, download the above file and examine its contents using vim. Read the "REMARK" section carefully. Note that awk is most likely aliased to gawk on your machine, so it does not matter whether you type gawk or awk.

  1. (3 points) Compute the total charge of the protein, that is, the sum of the charges of all atoms. (Note: the correct answer is < 100.)
  2. (3 points) Would the total charge change if you were to remove every amino-acid named "LEU"? (Answer the question without making any modifications to the file. The answer should be "Yes" or "No", followed by the awk script you used and some explanation if needed.)
  3. (3 points) Now compute the total number of distinct amino-acids. Note that you cannot assume that they are numbered sequentially, but you can be sure that each contains exactly one atom named "CA".
  4. (3 points) Use sort to re-order the lines of the input file in ascending order of charge (that is, the first line becomes the atom with the smallest charge). Make sure you filter out all lines that do not contain charge information (that is, lines that do not begin with "ATOM"). You can use unix pipes | to connect various commands.
  5. (2 points) Find the total charge of all hydrogen atoms.
  6. (1 point) [Using awk] Change all "ASP" into "ASH".
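
The questions above rely on a handful of awk idioms: selecting lines by a pattern, accumulating a sum or a count, piping into sort, and substituting text. The sketch below illustrates those idioms on a small made-up file. The file name toy.txt, its contents, and its column layout (record type, atom name, residue name, residue number, charge) are assumptions for illustration only; the real file's layout is described in its "REMARK" section and will probably differ, so the field numbers must be adjusted accordingly.

```shell
# Toy input. The layout is hypothetical and is NOT the lab file's layout.
# Assumed fields: record type, atom name, residue name, residue number, charge.
cat > toy.txt <<'EOF'
REMARK toy data for illustration only
ATOM N   LEU 1 -0.50
ATOM CA  LEU 1  0.25
ATOM H   LEU 1  0.25
ATOM CA  ASP 7 -0.75
ATOM HB  ASP 7  0.25
EOF

# Sum the last field ($NF) over the lines whose first field is "ATOM".
total=$(awk '$1 == "ATOM" { sum += $NF } END { printf "%.2f", sum }' toy.txt)

# Count the lines that satisfy a condition (here: the atom name is "CA").
# "n + 0" prints 0 rather than an empty string when nothing matches.
n_ca=$(awk '$1 == "ATOM" && $2 == "CA" { n++ } END { print n + 0 }' toy.txt)

# Keep only the ATOM lines, then sort them numerically on field 5 via a pipe.
first=$(awk '$1 == "ATOM"' toy.txt | sort -k5,5n | head -n 1)

# Substitute text on every line; gsub leaves the original spacing untouched.
renamed=$(awk '{ gsub(/ASP/, "ASH"); print }' toy.txt)

echo "total charge: $total"
echo "CA atoms:     $n_ca"
echo "smallest:     $first"
echo "$renamed"
```

A pattern with no action, as in awk '$1 == "ATOM"', prints every matching line unchanged, which is why it works as a filter in front of sort. Assigning to a field (for example $3 = "ASH") would also rename a residue, but awk then rebuilds the line with single spaces between fields, which can matter for files with fixed-width columns.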