THEME I. PROTEIN FOLDING. To answer some of these questions you will have to read the Plaxco et al. paper; in particular you will find Eq. 1 of this work useful. A link to the paper is available from the class' site.


For every protein you can define its residue-residue ``contact map" which is simply a matrix (A) such that a[i][j] = 1 if residues (amino acids) i and j touch each other and zero otherwise. Naturally, you assume that amino-acids are labeled sequentially in the polypeptide chain.

Q1 You have two proteins, X and Y, each made of exactly 100 amino-acids. Protein-X has (A) such that a[i][j] = 1 if |i - j| < 6; a[i][j] = 0 otherwise. Protein-Y is such that a[i][j] = 1 if |i - j| < 5 OR |i - j | = 95; a[i][j] = 0 otherwise. According to Plaxco et al., which protein will have faster rate of folding and why?

Q2 Is it likely for a functional enzyme protein in its native state to have (A) such that a[i][j] = 1 only if |i - j| = 0 or 1? Explain.

Q3 Assume that the theory of Plaxco et al. applies to polypeptides as well (these can be considered as very small proteins). Based on the theory, order the following four polypeptides by their folding times: a) 24 residue alpha helix, "a24"; b) 48 residue alpha helix, "a48"; c) 24 residue beta hairpin "b24"; and d) 48 residue beta hairpin "b48". Explain.

If you want to know more about contact maps and are curious what a contact map of a real protein from the PDB site might look like, you may download the Macromolecular Contact tool and play with it. It is java-based, and should be pretty much plug-and-play.

THEME II. RNA SECONDARY STRUCTURE PREDICTION.


Exact solutions of idealized models are, in most cases, useless in practical applications, but allow one to test fundamental concepts and more practical, but approximate solutions. In this problem you will develop and implement (choose any suitable language) an algorithm that is guaranteed to find ALL exact lowest energy states for very short RNA sequences (length < 20).

Energy function: Total energy = Sum_all_possible_pairs ( individual pair energy)

individual pair energy = -3 for a GC base pair, -2 for an AU base pair, 0 for any unpaired base, and +3 for any mismatched pair, such as GG. No other contributions to the total energy should be considered. For example, here is no extra penalty for very small loops or sharp "kinks", even though these may look ``unrealistic".

Q4 Develop an algorithm that satisfies the criteria above. Present a flow chart of your algorithm. What is its time complexity?

Q5 Apply your algorithm to the following sequence of total length 15: GGGGGGGCCCCCCCC. How many different structures correspond to the lowest energy state? What would one see in an experiment aimed at determining this structure?

Q6 Propose a single mutation in the above structure that will qualitatively change the result in Q5.