Computational Biology
Scribe Notes for Class 12
June 7, 2000
Scribe: N. Allen
Today's Topics
- Multiple Sequence Alignment
- Typically uses the amino acids of protein sequences as the alphabet
- Start with: sequences s1, s2, ..., sk
- Goal: produce an alignment such as those on SM pages 69 and 71
- One scoring system is the sum-of-pairs (SP) score:
- Each column is scored separately
- A column's score is the sum of the pairwise scores of all pairs of symbols in that column
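- Exact dynamic programming for k sequences of length n needs O(n^k) space and O(2^k k^2 n^k) time (each of the n^k cells checks 2^k - 1 predecessors and scores a k-symbol column), so it is practical only for small k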
- Alignment heuristic (progressive alignment):
- Align two of the strings with a dynamic programming algorithm
- Add each remaining string to this aggregate alignment, again with a dynamic programming algorithm
- Alternative scoring system: Tree Alignment (SM page 79)
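The sum-of-pairs score and the exact dynamic programming algorithm described above can be sketched in Python as follows. This is a minimal, illustrative sketch: the pair scores (match +1, mismatch -1, symbol against gap -2, gap against gap 0) and all function names are assumptions, not the scoring used in the lecture or in SM. Because it visits every cell of the (n+1)^k prefix lattice and tries 2^k - 1 moves per cell, it is only usable for a handful of short strings.

```python
from itertools import combinations, product

GAP = "-"

def pair_score(x, y):
    """Hypothetical pair scores: match +1, mismatch -1, symbol/gap -2, gap/gap 0."""
    if x == GAP and y == GAP:
        return 0
    if x == GAP or y == GAP:
        return -2
    return 1 if x == y else -1

def sp_column(column):
    """Sum-of-pairs score of one column: every pair of rows contributes once."""
    return sum(pair_score(x, y) for x, y in combinations(column, 2))

def sp_score(alignment):
    """Columns are scored separately and the column scores are added."""
    return sum(sp_column(col) for col in zip(*alignment))

def msa_dp(seqs):
    """Exact SP-optimal alignment by DP over the lattice of prefix lengths."""
    k = len(seqs)
    origin = (0,) * k
    # A move says which sequences contribute a symbol (1) or a gap (0) to the
    # next column; the all-gap move is excluded, leaving 2^k - 1 moves.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]
    best = {origin: (0, None)}  # cell -> (best score, move that reached it)
    # product() yields cells in lexicographic order, so every predecessor
    # (componentwise smaller) is finished before the cell that needs it.
    for cell in product(*(range(len(s) + 1) for s in seqs)):
        if cell == origin:
            continue
        options = []
        for m in moves:
            prev = tuple(c - d for c, d in zip(cell, m))
            if min(prev) < 0:
                continue
            col = [seqs[i][cell[i] - 1] if m[i] else GAP for i in range(k)]
            options.append((best[prev][0] + sp_column(col), m))
        best[cell] = max(options)
    # Trace back from the far corner to rebuild the aligned rows.
    end = tuple(len(s) for s in seqs)
    rows = [[] for _ in range(k)]
    cell = end
    while cell != origin:
        m = best[cell][1]
        for i in range(k):
            rows[i].append(seqs[i][cell[i] - 1] if m[i] else GAP)
        cell = tuple(c - d for c, d in zip(cell, m))
    return best[end][0], ["".join(reversed(r)) for r in rows]

score, aln = msa_dp(["ACG", "AG", "ACG"])
print(score, aln)  # 3 ['ACG', 'A-G', 'ACG']
```

Under this scoring, the returned alignment's SP score equals the sum of the three induced pairwise scores (3 + 0 + 0), which is the best each pair can do on its own, so no other alignment of these strings scores higher.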
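The progressive heuristic can be sketched in Python as follows. In this hypothetical version, strings are added in input order, the columns of the aggregate alignment are kept fixed ("once a gap, always a gap"), and each new string is aligned to those columns by a dynamic program that scores every resulting column with sum-of-pairs. The pair scores and function names are illustrative assumptions; real tools choose the order and scoring differently.

```python
from itertools import combinations

GAP = "-"

def pair_score(x, y):
    """Hypothetical pair scores: match +1, mismatch -1, symbol/gap -2, gap/gap 0."""
    if x == GAP and y == GAP:
        return 0
    if x == GAP or y == GAP:
        return -2
    return 1 if x == y else -1

def sp_column(column):
    return sum(pair_score(x, y) for x, y in combinations(column, 2))

def align_to_profile(profile, s):
    """Add string s to an aggregate alignment (equal-length rows) by DP."""
    r = len(profile)
    cols = list(zip(*profile))
    gaps = (GAP,) * r
    L, n = len(cols), len(s)

    def diag(j, i):  # existing column j-1 receives symbol s[i-1]
        return cols[j - 1] + (s[i - 1],)

    def up(j):  # existing column j-1 receives a gap in the new row
        return cols[j - 1] + (GAP,)

    def left(i):  # new column: gaps in the old rows, s[i-1] in the new row
        return gaps + (s[i - 1],)

    NEG = float("-inf")
    D = [[NEG] * (n + 1) for _ in range(L + 1)]
    D[0][0] = 0
    for j in range(L + 1):
        for i in range(n + 1):
            if j and i:
                D[j][i] = max(D[j][i], D[j - 1][i - 1] + sp_column(diag(j, i)))
            if j:
                D[j][i] = max(D[j][i], D[j - 1][i] + sp_column(up(j)))
            if i:
                D[j][i] = max(D[j][i], D[j][i - 1] + sp_column(left(i)))
    # Trace back, rebuilding the columns of the enlarged alignment.
    out, j, i = [], L, n
    while j or i:
        if j and i and D[j][i] == D[j - 1][i - 1] + sp_column(diag(j, i)):
            out.append(diag(j, i))
            j, i = j - 1, i - 1
        elif j and D[j][i] == D[j - 1][i] + sp_column(up(j)):
            out.append(up(j))
            j -= 1
        else:
            out.append(left(i))
            i -= 1
    out.reverse()
    return ["".join(row) for row in zip(*out)] if out else [""] * (r + 1)

def progressive_align(seqs):
    """Align the first two strings, then add the rest one at a time."""
    alignment = [seqs[0]]
    for s in seqs[1:]:
        alignment = align_to_profile(alignment, s)
    return alignment

result = progressive_align(["ACG", "AG", "ACG"])
print(result)  # ['ACG', 'A-G', 'ACG']
```

Because columns already placed are never revised, the result can score lower than the exact optimum in general; in exchange, the cost grows polynomially in k instead of exponentially.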
- Database Searching
- Goal: quickly approximate the global alignment of a query sequence against a large database
- One approach uses PAM (percent accepted mutations, also expanded as point accepted mutations) matrices
- Distances are measured in accepted mutations per 100 amino acids (1 PAM = 1 accepted mutation per 100 residues)
- Only counts mutations accepted by natural selection, i.e. mutations the organism survives
- Treats mutations as undirected events: an observed A->Y change is counted the same as Y->A
- Computing a PAM matrix
- Let p_a be the frequency of occurrence of amino acid a
- Let f_ab be the number of observed a<->b mutations (so f_ab = f_ba)
- Let f_a be the sum of f_ab over all b != a
- Let f be the sum of f_a over all a
- The relative mutability of a is m_a = f_a / (100 f p_a)
- The diagonal entries are then M_aa = 1 - m_a = 1 - f_a / (100 f p_a)
- The off-diagonal entries are then M_ab = (f_ab / f_a) * m_a = f_ab / (100 f p_a)
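The PAM computation above can be sketched in Python as follows. This is a minimal sketch assuming the observed counts are supplied once per unordered pair and every p_a is positive; the function name `pam_matrix` and the three-letter toy alphabet with its frequencies and counts are hypothetical values chosen only to exercise the formulas, not real amino acid data.

```python
from collections import defaultdict

def pam_matrix(p, counts):
    """Build M from frequencies p[a] and undirected mutation counts.

    counts maps a pair (a, b), a != b, to the number of observed a<->b
    mutations; each unordered pair should appear only once.
    """
    alphabet = sorted(p)
    f_ab = defaultdict(float)
    for (a, b), c in counts.items():
        # Undirected events: the same count is used for a->b and b->a.
        f_ab[a, b] += c
        f_ab[b, a] += c
    f_a = {a: sum(f_ab[a, b] for b in alphabet if b != a) for a in alphabet}
    f = sum(f_a.values())
    M = {}
    for a in alphabet:
        m_a = f_a[a] / (100 * f * p[a])  # relative mutability of a
        M[a, a] = 1 - m_a
        for b in alphabet:
            if b != a:
                # (f_ab / f_a) * m_a simplifies to f_ab / (100 * f * p_a),
                # which also avoids dividing by f_a when f_a is zero.
                M[a, b] = f_ab[a, b] / (100 * f * p[a])
    return M

# Hypothetical toy input, for illustration only.
p = {"A": 0.5, "B": 0.3, "C": 0.2}
counts = {("A", "B"): 10, ("A", "C"): 4, ("B", "C"): 6}
M = pam_matrix(p, counts)
```

Each row of M sums to 1, and the sum over a of p_a (1 - M_aa) equals the sum of f_a / (100 f), which is 1/100: on average one accepted mutation per 100 amino acids, the defining property of a PAM 1 matrix.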
Please report any problems found in these pages to:
CS6104 Account (cs6104@courses.cs.vt.edu)