Computational Biology
Scribe Notes for Class 11
June 6, 2000
Scribe: S. Oak
Today's Handouts and Announcements
-
Searching Protein Sequence Libraries: Comparison
of the Sensitivity and Selectivity of the Smith-Waterman and FASTA Algorithms
- William R. Pearson.
-
An O(ND) Difference Algorithm and its Variations
- Eugene W. Myers.
Today's Topics
The topic of Sequence Comparison
was discussed, along with a set of computational challenges.
-
The similarity of 2 sequences is a numerical measure
of how alike they are; here it is the score of their best alignment.
-
An alignment of 2 sequences is a way of placing
one sequence above the other in order to make clear the correspondence between
similar characters or substrings of the two sequences.
-
Methods for comparing sequences:
-
Global comparison in which we are interested
in alignments involving the entire sequences.
-
Local comparison in which we are interested
in alignments involving substrings of the sequences.
-
There is also a third type of comparison, called semi-global comparison,
in which we are interested in aligning prefixes and suffixes of the
given sequences.
-
All the problems can be solved by dynamic programming.
-
The basic algorithm assigns a score to every column of an alignment (match,
mismatch or space) and then adds up the column scores.
-
The scoring system is such that matches are
rewarded (positive score) and mismatches and spaces are penalised (negative
score).
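The column scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the class: the function name alignment_score and the use of "-" for a space are assumptions, and the values +1, -1 and -2 are the ones used in the recurrences later in these notes.

```python
# Column scores taken from the notes: match +1, mismatch -1, space -2.
MATCH, MISMATCH, SPACE = 1, -1, -2


def alignment_score(row1, row2, space_char="-"):
    """Add up the column scores of an alignment given as two equal-length rows."""
    if len(row1) != len(row2):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for x, y in zip(row1, row2):
        if x == space_char and y == space_char:
            raise ValueError("a column may not contain two spaces")
        if x == space_char or y == space_char:
            total += SPACE       # one character aligned with a space
        elif x == y:
            total += MATCH       # identical characters
        else:
            total += MISMATCH    # different characters
    return total
```

For example, alignment_score("GA-CGGATTAG", "GATCGGAATAG") adds nine matches, one mismatch and one space: 9 - 1 - 2 = 6.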
-
Dynamic programming fills an (m+1) x (n+1) matrix,
where m = total number of bases in seq 1
and n = total number of bases in seq 2.
-
Each cell in the matrix, cell(i,j), holds the best score for aligning the
prefix of seq 1 of length i with the prefix of seq 2 of length j.
-
Scoring for Global Alignment:
-
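The score a[i,j] in cell(i,j) = max (
a[i,j-1] - 2 -- space in seq 1
a[i-1,j] - 2 -- space in seq 2
a[i-1,j-1] - 1 -- mismatch (only when character i of seq 1 differs from character j of seq 2)
a[i-1,j-1] + 1 -- match (only when the two characters are equal)
)
with a[i,0] = -2i and a[0,j] = -2j in the first column and first row.
The score of the best global alignment is a[m,n].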
-
Asymptotic time complexity = O(mn)
-
Asymptotic space complexity = O(mn)
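The global alignment recurrence can be sketched in Python as follows. This is an illustrative sketch under the scoring values used in these notes; the function name global_alignment, the "-" space character and the tie-breaking order in the traceback are assumptions, so when several alignments are optimal it returns just one of them.

```python
MATCH, MISMATCH, SPACE = 1, -1, -2


def global_alignment(s, t):
    """Return (score, aligned_s, aligned_t) for one optimal global alignment."""
    m, n = len(s), len(t)
    # a[i][j] = best score for the prefix of s of length i and the prefix of t of length j
    a = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        a[i][0] = i * SPACE
    for j in range(1, n + 1):
        a[0][j] = j * SPACE
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            p = MATCH if s[i - 1] == t[j - 1] else MISMATCH
            a[i][j] = max(a[i][j - 1] + SPACE,    # space in s
                          a[i - 1][j] + SPACE,    # space in t
                          a[i - 1][j - 1] + p)    # match or mismatch
    # Trace back from cell (m, n) to cell (0, 0) to recover one optimal alignment.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        p = MATCH if i > 0 and j > 0 and s[i - 1] == t[j - 1] else MISMATCH
        if i > 0 and j > 0 and a[i][j] == a[i - 1][j - 1] + p:
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and a[i][j] == a[i - 1][j] + SPACE:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return a[m][n], "".join(reversed(top)), "".join(reversed(bottom))
```

Both the fill and the table itself take O(mn), matching the complexities stated above; the traceback adds only O(m+n) steps.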
-
Scoring for Local Alignment:
-
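The score a[i,j] in cell(i,j) = max (
a[i,j-1] - 2 -- space in seq 1
a[i-1,j] - 2 -- space in seq 2
a[i-1,j-1] - 1 -- mismatch (only when character i of seq 1 differs from character j of seq 2)
a[i-1,j-1] + 1 -- match (only when the two characters are equal)
0 -- start a new (empty) alignment
)
with a[i,0] = 0 and a[0,j] = 0 in the first column and first row.
The score of the best local alignment is the largest entry anywhere in the matrix.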
-
Asymptotic time complexity = O(mn)
-
Asymptotic space complexity = O(mn)
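A minimal Python sketch of the local alignment recurrence follows. It is an illustration only: the function name local_alignment and the traceback convention (start at the first cell holding the maximum score, stop at the first cell holding 0) are assumptions.

```python
MATCH, MISMATCH, SPACE = 1, -1, -2


def local_alignment(s, t):
    """Return (score, aligned_s, aligned_t) for one best-scoring local alignment."""
    m, n = len(s), len(t)
    a = [[0] * (n + 1) for _ in range(m + 1)]  # first row and column stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            p = MATCH if s[i - 1] == t[j - 1] else MISMATCH
            a[i][j] = max(0,                      # start a new alignment here
                          a[i][j - 1] + SPACE,    # space in s
                          a[i - 1][j] + SPACE,    # space in t
                          a[i - 1][j - 1] + p)    # match or mismatch
            if a[i][j] > best:
                best, best_i, best_j = a[i][j], i, j
    # Trace back from the best cell until a cell with score 0 is reached.
    top, bottom = [], []
    i, j = best_i, best_j
    while a[i][j] > 0:
        p = MATCH if s[i - 1] == t[j - 1] else MISMATCH
        if a[i][j] == a[i - 1][j - 1] + p:
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif a[i][j] == a[i - 1][j] + SPACE:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

For example, local_alignment("TTTACGTAAA", "GGGACGTCCC") returns (4, "ACGT", "ACGT"): the only region the two sequences share is the substring ACGT.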
-
The space can be reduced from O(mn) to linear, O(m+n), by a divide and conquer technique, while the time remains O(mn).
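One common way to realise the divide and conquer idea for global alignment is sketched below. The notes do not say which variant was presented in class, so this is an assumption: split seq 1 in the middle, use two linear-space passes (one forward, one on the reversed strings) to find where the optimal alignment crosses that middle, and recurse on the two halves. The names last_row and linear_space_alignment are hypothetical. The strings are sliced for brevity; an index-based version would avoid those copies.

```python
MATCH, MISMATCH, SPACE = 1, -1, -2


def last_row(s, t):
    """Scores of aligning all of s with every prefix of t, keeping only two rows."""
    prev = [j * SPACE for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * SPACE] + [0] * len(t)
        for j in range(1, len(t) + 1):
            p = MATCH if s[i - 1] == t[j - 1] else MISMATCH
            cur[j] = max(cur[j - 1] + SPACE, prev[j] + SPACE, prev[j - 1] + p)
        prev = cur
    return prev


def linear_space_alignment(s, t):
    """Return (aligned_s, aligned_t) for one optimal global alignment."""
    if len(s) == 0:
        return "-" * len(t), t
    if len(t) == 0:
        return s, "-" * len(s)
    if len(s) == 1:
        # Base case: pair the single character of s with its best partner in t,
        # unless leaving it unpaired (two extra spaces) would score higher.
        pair = [MATCH if c == s else MISMATCH for c in t]
        j = max(range(len(t)), key=lambda k: pair[k])
        if pair[j] >= 2 * SPACE:
            return "-" * j + s + "-" * (len(t) - j - 1), t
        return s + "-" * len(t), "-" + t
    mid = len(s) // 2
    n = len(t)
    left = last_row(s[:mid], t)                   # first half of s vs prefixes of t
    right = last_row(s[mid:][::-1], t[::-1])      # second half of s vs suffixes of t
    # Choose the split of t that maximises the sum of the two half scores.
    split = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    top1, bottom1 = linear_space_alignment(s[:mid], t[:split])
    top2, bottom2 = linear_space_alignment(s[mid:], t[split:])
    return top1 + top2, bottom1 + bottom2
```

Each level of the recursion does work proportional to the area of the sub-problems it handles, and that area roughly halves from one level to the next, so the total time stays O(mn).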
-
General gap penalty functions reflect the observation that, when
mutations are involved, a single gap of k consecutive spaces is more probable
than k isolated spaces, so a gap of length k should be penalised less than k separate spaces.
-
We use the same dynamic programming algorithm, but add a third dimension
to each cell that records the length of the gap leading to that particular
cell.
-
Asymptotic time complexity = O((m+n)mn)
-
Asymptotic space complexity = O((m+n)mn)
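A sketch of alignment scoring with a general gap penalty function is given below. Note that it is a different formulation from the one described above: instead of an explicit gap-length dimension it keeps three (m+1) x (n+1) tables, one per kind of final column, and tries every gap length k inside each cell. The time is still O((m+n)mn), but this variant needs only O(mn) space. The function name gap_alignment_score and the parameter w (a function giving the penalty of a gap of k spaces) are assumptions made for illustration.

```python
NEG_INF = float("-inf")


def gap_alignment_score(s, t, w, match=1, mismatch=-1):
    """Best global alignment score when a gap of k spaces costs w(k)."""
    m, n = len(s), len(t)
    # a: alignments ending with s[i] paired with t[j]
    # b: alignments ending with a space in s (a gap placed under the end of t)
    # c: alignments ending with a space in t (a gap placed under the end of s)
    a = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    b = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    c = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    a[0][0] = 0
    for j in range(1, n + 1):
        b[0][j] = -w(j)
    for i in range(1, m + 1):
        c[i][0] = -w(i)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            p = match if s[i - 1] == t[j - 1] else mismatch
            a[i][j] = p + max(a[i - 1][j - 1], b[i - 1][j - 1], c[i - 1][j - 1])
            # A gap of length k in s must follow a column that is not a space in s.
            b[i][j] = max(max(a[i][j - k], c[i][j - k]) - w(k)
                          for k in range(1, j + 1))
            # A gap of length k in t must follow a column that is not a space in t.
            c[i][j] = max(max(a[i - k][j], b[i - k][j]) - w(k)
                          for k in range(1, i + 1))
    return max(a[m][n], b[m][n], c[m][n])
```

With the linear penalty w(k) = 2k this reduces to the basic global scoring, so gap_alignment_score("GACGGATTAG", "GATCGGAATAG", lambda k: 2 * k) gives 6. With a penalty such as w(k) = 3 + k, one gap of two spaces costs 5 while two isolated spaces cost 8, which is the preference for grouped spaces that the gap penalty function is meant to express.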
Today's Sources