THEME I. PROTEIN FOLDING.
To answer some of these questions you will have to read the
Plaxco et al. paper; in particular you will find Eq. 1 of this work
useful. A link to the paper is available from the class' site.
For every protein you can define its residue-residue
``contact map" which is simply a matrix (A) such that a[i][j] = 1 if residues
(amino acids) i and j touch each other and zero otherwise.
Naturally, you assume that amino-acids are labeled sequentially in the
polypeptide chain.
Q1
You have two proteins, X and Y, each made of exactly 100 amino-acids. Protein-X has (A) such that a[i][j] = 1 if |i - j| < 6; a[i][j] = 0 otherwise.
Protein-Y is such that a[i][j] = 1 if |i - j| < 5 OR |i - j | = 95; a[i][j] = 0 otherwise. According to Plaxco et al., which protein will
have faster rate of folding and why?
Q2
Is it likely for a functional enzyme protein in its native state to have
(A) such that a[i][j] = 1 only if |i - j| = 0 or 1? Explain.
Q3
Assume that the theory of Plaxco et al. applies to polypeptides
as well (these can be considered as very small proteins). Based on the theory, order the following four polypeptides by their folding times: a) 24 residue alpha helix, "a24"; b) 48 residue alpha helix, "a48"; c) 24 residue beta hairpin "b24"; and d) 48 residue beta hairpin "b48". Explain.
If you want to know more about contact maps and are curious what a contact map of a real protein from the PDB site might look like, you may download the Macromolecular Contact tool and play with it. It is java-based, and should be pretty much
plug-and-play.
THEME II. RNA SECONDARY STRUCTURE PREDICTION.
Exact solutions of idealized models are, in most cases,
useless in practical applications, but
allow one to test fundamental concepts and more practical, but approximate
solutions. In this problem you will develop and implement (choose any suitable language) an algorithm that is guaranteed to find ALL exact lowest energy states
for very short RNA sequences (length < 20).
Energy function: Total energy = Sum_all_possible_pairs ( individual pair energy)
individual pair energy = -3 for a GC base pair, -2 for an AU base pair, 0 for
any unpaired base, and +3 for any mismatched pair, such as GG.
No other contributions to the total energy should be considered. For example,
here is no extra penalty for very small loops or sharp "kinks", even though
these may look ``unrealistic".
Q4 Develop an algorithm that satisfies the criteria above. Present a flow chart of your algorithm. What is its time complexity?
Q5 Apply your algorithm to the following sequence of total length 15: GGGGGGGCCCCCCCC. How many different
structures correspond to the lowest energy state? What would one see in
an experiment aimed at determining this structure?
Q6 Propose a single mutation in the above structure that will qualitatively change the result in Q5.