UNIX Lab. Basic Awk scripting
Your name here (please print):
Construct awk commands that operate on a protein structure file
and produce the results specified below. Copy and paste your awk commands
below each question and submit the page with your answers when done.
To begin, download the above
file and examine its contents using vim. Read the "REMARK" section carefully. Note that awk is most likely aliased to gawk
on your machine, so it does not matter whether you type gawk or awk.
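If you want to confirm which awk your machine runs, a quick check along the following lines may help. This is only a sketch: the variable names awk_cmd and result are made up for illustration, and the version line printed depends on the awk implementation installed.

```shell
# Show what the shell runs when you type awk (a path, or an alias
# definition if the current interactive shell defines one).
awk_cmd=$(command -v awk)
echo "awk resolves to: $awk_cmd"

# GNU awk identifies itself on the first line of its --version output.
# Other awk implementations may reject this option, so errors are hidden
# and stdin is closed to avoid waiting for input.
awk --version 2>/dev/null </dev/null | head -n 1

# A one-line program that needs no input file: it should print 3.
result=$(awk 'BEGIN { print 1 + 2 }')
echo "$result"
```

If the version line mentions GNU Awk, then typing awk or gawk runs the same program.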
- (3 points) Compute the total charge of the protein, that
is, the sum of the charges of all atoms. (Note: the correct answer is less than 100.)
- (3 points) Would the total charge change if you were to remove
every amino acid named "LEU"? (You need to answer the question without
modifying the file. The answer should be "Yes" or "No", followed by the awk script you used and a brief explanation if needed.)
- (3 points) Now compute the total number of distinct amino acids.
Note that you cannot assume that they are numbered sequentially, but
you can be sure that each contains exactly one atom named "CA".
- (3 points) Use sort to
re-order the lines in the input file in ascending
order with respect to charge (that is, the first line becomes the
atom with the smallest charge). Make sure you filter out all lines
that do not contain charge information (that is, lines that do not
begin with "ATOM"). You can use UNIX pipes (|)
to connect the commands.
- (2 points) Find the total charge of all hydrogen atoms.
- (1 point) Using awk, change every "ASP" into "ASH".
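
The questions above rely on a few general awk idioms: accumulating a sum across lines, restricting an action to lines that match a condition, counting distinct values, and piping awk output into sort. The following is a minimal sketch of those idioms on a made-up toy file, not on the lab file. The file name toy.txt, its three-column layout, the ITEM/NOTE record names, and the column positions are all assumptions chosen only for illustration; consult the "REMARK" section of the real file for its actual columns.

```shell
# Use the C locale so that "." is treated as the decimal point everywhere.
export LC_ALL=C

# Hypothetical toy file: record type, label, numeric value.
printf '%s\n' \
  'NOTE this line has no value' \
  'ITEM a 0.50' \
  'ITEM b -0.25' \
  'ITEM a 0.25' > toy.txt

# Sum column 3 over the lines whose first field is ITEM.
total=$(awk '$1 == "ITEM" { sum += $3 } END { printf "%.2f\n", sum }' toy.txt)
echo "total: $total"

# Count distinct labels in column 2 with an associative array:
# seen[$2]++ is zero (false) only the first time a label appears.
distinct=$(awk '$1 == "ITEM" && !seen[$2]++ { n++ } END { print n }' toy.txt)
echo "distinct labels: $distinct"

# Keep only ITEM lines, then sort them numerically (ascending) on column 3.
sorted=$(awk '$1 == "ITEM"' toy.txt | sort -k3,3n)
echo "$sorted"
```

The same structure (a condition, an action in braces, and an optional END block) carries over to the lab file once you know which field holds each piece of information.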