CS 5984 Fall 2000 Homework Assignment 10

50 Points
Due: 12/6/00 at 5:00PM

The point value of each problem is shown in square brackets [ ]. Your solutions must be prepared with LaTeX or another word processing system and submitted as a stapled printout in the box outside the instructor's office (McBryde 638). This homework is due at 5:00PM on December 6, 2000. No late homework will be accepted. Be certain to write your solutions in COMPLETE SENTENCES.

  1. [25] Here is a set of four amino acid (protein) sequences:
    S1 = MEGKEENMR
    S2 = MVGKERR
    S3 = NVGRIINMV
    S4 = NVRINMV
    

    Let the scoring function s(x,y) be
    s(x,y)=0 if x matches (equals) y;
    s(x,y)=1 if x does not match (equal) y;
    s(x,y)=1 if exactly one of x and y is a space (-).
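
The scoring function above can be sketched in Python as follows. This is only an illustration, not part of the assignment: the names `GAP`, `s`, and `pairwise_distance` are hypothetical, and two spaces are treated as a match (score 0) under the first rule. The dynamic program shown is the standard global alignment recurrence, used here to compute the optimal pairwise distance D(a,b) under s.

```python
GAP = "-"  # hypothetical name for the space character


def s(x, y):
    """Score 0 for equal characters, 1 for a mismatch or a character against a space."""
    return 0 if x == y else 1


def pairwise_distance(a, b):
    """Minimum total score of a global alignment of strings a and b under s."""
    rows, cols = len(a) + 1, len(b) + 1
    D = [[0] * cols for _ in range(rows)]
    # First column: every character of a is aligned against a space.
    for i in range(1, rows):
        D[i][0] = D[i - 1][0] + s(a[i - 1], GAP)
    # First row: every character of b is aligned against a space.
    for j in range(1, cols):
        D[0][j] = D[0][j - 1] + s(GAP, b[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            D[i][j] = min(
                D[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                D[i - 1][j] + s(a[i - 1], GAP),           # a[i-1] against a space
                D[i][j - 1] + s(GAP, b[j - 1]),           # b[j-1] against a space
            )
    return D[-1][-1]
```

A call such as `pairwise_distance(S1, S2)` would then give one entry of the pairwise distance table that the later steps rely on.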

    Use the center star method to find an approximately optimal SP alignment for the four strings given. Show all your computations.
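
As a rough guide to the bookkeeping involved, the sketch below shows two pieces of the center star method: choosing the center string (the one whose summed pairwise distance to all other strings is smallest) and computing the SP score of a finished multiple alignment. It is a sketch under stated assumptions, not a full solution: the function names are hypothetical, ties for the center are broken by taking the first string, and the step that merges the pairwise alignments with the center into one multiple alignment is not shown.

```python
from itertools import combinations


def edit_distance(a, b):
    """Pairwise alignment distance with match 0, mismatch 1, space 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def choose_center(strings):
    """Return (index, total) for the string minimizing the sum of distances to the others."""
    totals = [
        sum(edit_distance(c, t) for k, t in enumerate(strings) if k != i)
        for i, c in enumerate(strings)
    ]
    best = min(range(len(strings)), key=lambda i: totals[i])  # first minimum wins ties
    return best, totals[best]


def sp_score(rows):
    """Sum-of-pairs score of a multiple alignment given as equal-length rows."""
    assert len({len(r) for r in rows}) == 1, "all rows must have the same length"
    return sum(
        sum(x != y for x, y in zip(r1, r2))  # two spaces in a column score 0
        for r1, r2 in combinations(rows, 2)
    )
```

For a toy input (not the assignment's strings), `choose_center(["AAA", "AAB", "ABA", "BAA"])` picks index 0 with total distance 3, and `sp_score(["A-C", "ABC", "AB-"])` evaluates to 4.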

  2. [25] Let S1, S2, S3, and S4 be the same as in problem 1. Let T be the following phylogenetic tree for those strings:

    Using the approximation approach in Gusfield 14.8.1, compute an optimal lifted alignment for T. What is the distance of your alignment?
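
For orientation, a lifted alignment labels every internal node of the tree with the string labeling one of its children, so every label is one of the leaf strings. The sketch below computes the minimum total edge distance over all such labelings by dynamic programming over the tree. It is an illustration under assumptions, not the textbook's exact procedure: the function names are hypothetical, the tree is encoded as nested tuples with strings at the leaves, and the toy tree in the usage note is made up for the example and is not the tree T of this problem.

```python
def edit_distance(a, b):
    """Pairwise alignment distance with match 0, mismatch 1, space 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def lifted_costs(tree):
    """Map each leaf string to the best lifted cost of this subtree when its root has that label.

    `tree` is either a string (a leaf) or a tuple of subtrees (an internal node).
    """
    if isinstance(tree, str):
        return {tree: 0}
    child_tables = [lifted_costs(child) for child in tree]
    table = {}
    for k, own in enumerate(child_tables):
        for label, cost in own.items():
            # Child k keeps the same label as this node, so that edge costs 0.
            total = cost
            # Every other child takes whichever of its own labels is cheapest overall.
            for m, other in enumerate(child_tables):
                if m != k:
                    total += min(edit_distance(label, t) + c for t, c in other.items())
            table[label] = min(total, table.get(label, total))
    return table
```

On the made-up tree `(("AAA", "AAB"), ("ABA", "BAA"))`, `min(lifted_costs(tree).values())` is 4. The distance of a lifted alignment is this total over all tree edges.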

    Additional Challenge for the Bored (Not required; no extra credit offered)
    Find a phylogenetic tree whose optimal phylogenetic alignment (Gusfield 14.8) for S1, S2, S3, and S4 has minimum distance. Note that the best phylogenetic tree may have no structural similarity to T.

