CS 2204 Lab 6
Your name here (please print):
Your student ID number here:
Construct gawk commands that operate on a protein structure file
and produce the results specified below. Write your gawk commands
below each question. To begin, download the above
file and examine its contents using vim. Read the "REMARK" section carefully. Note that awk is most likely aliased to gawk
on your machine, so it does not matter whether you type gawk or awk.
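
As a warm-up, here is a minimal sketch of the gawk pattern-action form, run on a made-up three-line input rather than the lab file. The column layout (record tag in field 1, a number in field 3) is an assumption made purely for illustration; the real column positions are described in the file's "REMARK" section.

```shell
# Toy input, NOT the lab file: field 1 is a record tag, field 3 is a number.
# The pattern ($1 == "ATOM") selects lines; the action adds field 3 to a running sum.
# The END block runs once after the last line has been read.
result=$(printf 'REMARK demo 9\nATOM N 0.5\nATOM H 0.25\n' |
  awk '$1 == "ATOM" { sum += $3 } END { print sum }')
echo "$result"   # prints 0.75
```

The "REMARK" line is skipped because its first field does not match the pattern, so only the two "ATOM" lines contribute to the sum.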
- (2 points) Compute the total charge of the protein, that
is, the sum of the charges of all atoms. (Note: the correct answer is < 100.)
- (3 points) Would the total charge change if you were to remove
every amino-acid named "LEU"? (You need to answer the question without
modifying the file. The answer should be "Yes" or "No", followed by the gawk script you used and some explanation if needed.)
- (2 points) Now compute the total number of distinct amino-acids.
Note that you cannot assume that they are numbered sequentially, but
you can be sure that each contains exactly one "CA" atom.
- (2 points) Use sort to
re-order the lines in the input file in ascending
order with respect to charge (that is, the first line becomes the
atom with the smallest charge). Make sure you filter out all lines
that do not contain charge information (that is, lines that do not
begin with "ATOM"). You can use Unix pipes |
to connect various commands.
- (2 points) Find the total charge of all hydrogen atoms.
- (1 point) [Using gawk] Change all "ASP" into "ASH".