Computational Biology
Scribe Notes for Class 11
June 6, 2000
Scribe: S. Oak
Today's Handouts and Announcements
Searching Protein Sequence Libraries: Comparison of the Sensitivity and
Selectivity of the Smith-Waterman and FASTA Algorithms
- William R. Pearson.
An O(ND) Difference Algorithm and Its Variations
- Eugene W. Myers.
Today's Topics
The topic of Sequence Comparison
was discussed, along with a set of computational challenges.
Similarity of 2 sequences is a score that measures
how alike the sequences are.
Alignment of 2 sequences is a way of placing
one sequence above the other in order to make clear the correspondence between
similar characters or substrings from the sequences.
Methods for comparing sequences:
Global comparison in which we are interested
in alignments involving the entire sequences.
Local comparison in which we are interested
in alignments involving the substrings of the sequences.
There is also a third type of comparison, in which we are interested in
aligning the prefixes and suffixes of the given sequences. This is
called semi-global comparison.
All the problems can be solved by dynamic programming.
The basic algorithm scores every column of an alignment (match, mismatch
or space) and then adds up the column scores.
The scoring system is such that matches are
rewarded (positive score) and mismatches and spaces are penalised (negative
score).
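The column-sum scoring described above can be sketched as follows. This is a minimal illustration, assuming the +1 / -1 / -2 values used in the recurrences in these notes and assuming a space is written as "-"; the function name alignment_score is ours and not from the lecture.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # assumed values, taken from the recurrences in these notes


def alignment_score(top: str, bottom: str) -> int:
    """Add up the column scores of an alignment given as two equal-length rows."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += GAP        # column contains a space
        elif x == y:
            total += MATCH      # identical characters
        else:
            total += MISMATCH   # different characters
    return total
```

For example, the alignment ACG-T over ACGTT has four matching columns and one space, so its score is 4 - 2 = 2.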
Dynamic programming fills an (m+1) x (n+1) matrix
where m = total number of bases in seq 1 ;
and n = total number of bases in seq 2.
Each cell in the matrix, cell(i,j), holds the best score for aligning the
prefix of seq 1 of length i with the prefix of seq 2 of length j.
Scoring for Global Alignment:
The score a[i,j] in cell[i,j] = max (
a[i,j-1] - 2 -- gap
a[i-1,j] - 2 -- gap
a[i-1,j-1] - 1 -- mismatch (characters i and j differ)
a[i-1,j-1] + 1 -- match (characters i and j are equal)
)
Asymptotic time complexity = O(mn)
Asymptotic space complexity = O(mn)
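A minimal sketch of the global recurrence, assuming the +1 / -1 / -2 scores above and assuming the first row and column are initialised with accumulated gap penalties (the notes do not state the boundary conditions). The function name and parameter names are illustrative only.

```python
def global_alignment_score(s: str, t: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Fill the (m+1) x (n+1) matrix and return the global similarity a[m][n]."""
    m, n = len(s), len(t)
    a = [[0] * (n + 1) for _ in range(m + 1)]
    # Assumed boundary: a prefix aligned against the empty prefix is all spaces.
    for i in range(1, m + 1):
        a[i][0] = i * gap
    for j in range(1, n + 1):
        a[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = a[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            a[i][j] = max(a[i][j - 1] + gap,   # space in s
                          a[i - 1][j] + gap,   # space in t
                          diag)                # match or mismatch
    return a[m][n]
```

The two nested loops visit each of the mn inner cells once, which is where the O(mn) time and space bounds come from.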
Scoring for Local Alignment:
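The score a[i,j] in cell[i,j] = max (
0 -- start a new alignment (empty substrings)
a[i,j-1] - 2 -- gap
a[i-1,j] - 2 -- gap
a[i-1,j-1] - 1 -- mismatch (characters i and j differ)
a[i-1,j-1] + 1 -- match (characters i and j are equal)
)
The first row and column are set to 0, and the local similarity is the
largest value found anywhere in the matrix.
Asymptotic time complexity = O(mn)
Asymptotic space complexity = O(mn)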
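A minimal sketch of local alignment scoring, assuming the standard local recurrence, which differs from the global one by an extra 0 option, a zero first row and column, and taking the maximum over the whole matrix. Names and default scores are illustrative.

```python
def local_alignment_score(s: str, t: str, match: int = 1,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Return the best score over all pairs of substrings of s and t."""
    m, n = len(s), len(t)
    a = [[0] * (n + 1) for _ in range(m + 1)]  # first row and column stay 0
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = a[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            a[i][j] = max(0,                   # restart with empty substrings
                          a[i][j - 1] + gap,   # space in s
                          a[i - 1][j] + gap,   # space in t
                          diag)                # match or mismatch
            best = max(best, a[i][j])
    return best
```

For TTACGTT and GGACGGG the only shared region is ACG, so the local score is 3 even though a global alignment of the two would score poorly.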
Space can be saved by using a divide and conquer technique.
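The full divide and conquer method is not spelled out in these notes. The sketch below shows only its usual building block, under our assumptions: the global score can be computed while keeping just two rows of the matrix, since each row depends only on the previous one. The function name is illustrative.

```python
def global_score_linear_space(s: str, t: str, match: int = 1,
                              mismatch: int = -1, gap: int = -2) -> int:
    """Global similarity using O(n) space: only the previous row is kept."""
    prev = [j * gap for j in range(len(t) + 1)]   # row 0 of the matrix
    for i in range(1, len(s) + 1):
        curr = [i * gap] + [0] * len(t)           # column 0 of row i
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            curr[j] = max(curr[j - 1] + gap, prev[j] + gap, diag)
        prev = curr                               # discard the older row
    return prev[-1]
```

This returns the score only. Recovering the alignment itself in linear space is what the divide and conquer step adds: the sequences are split at a point the optimal alignment passes through and each half is solved recursively.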
Gap penalty functions reflect the observation that, when
mutations are involved, a single gap of 'k' consecutive spaces is more probable
than 'k' isolated spaces.
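As an illustration, one common choice (an assumption here, the notes do not fix a particular function) is an affine penalty w(k) = gap_open + gap_extend * k. The parameter values below are hypothetical.

```python
def affine_gap_penalty(k: int, gap_open: int = 3, gap_extend: int = 1) -> int:
    """Hypothetical penalty w(k) for one gap of k consecutive spaces."""
    if k <= 0:
        return 0
    return gap_open + gap_extend * k


one_long_gap = affine_gap_penalty(4)          # one gap of 4 spaces
four_short_gaps = 4 * affine_gap_penalty(1)   # 4 isolated spaces
```

With these values a single gap of 4 spaces costs 7 while 4 isolated spaces cost 16, so the scoring prefers the single gap.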
We use the same dynamic programming algorithm, but add a third dimension
to each cell that maintains the length of the gaps leading to that particular
cell.
Asymptotic time complexity = O((m+n)mn)
Asymptotic space complexity = O((m+n)mn)
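A sketch of global alignment with a general gap penalty function w(k). Note that this is not the third-dimension formulation described above: it is a simpler variant that, for each cell, scans back over every possible gap length, and it assumes w is subadditive (w(j+k) <= w(j) + w(k)), so splitting one gap in two never helps. It takes O((m+n)mn) time but stores only an (m+1) x (n+1) matrix. All names are illustrative.

```python
from typing import Callable


def general_gap_global_score(s: str, t: str, w: Callable[[int], int],
                             match: int = 1, mismatch: int = -1) -> int:
    """Global similarity where a gap of k spaces costs w(k) (assumed subadditive)."""
    m, n = len(s), len(t)
    a = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        a[i][0] = -w(i)                  # s[:i] against one gap of length i
    for j in range(1, n + 1):
        a[0][j] = -w(j)                  # t[:j] against one gap of length j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = a[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            for k in range(1, j + 1):    # alignment ends with a gap of k spaces in s
                best = max(best, a[i][j - k] - w(k))
            for k in range(1, i + 1):    # alignment ends with a gap of k spaces in t
                best = max(best, a[i - k][j] - w(k))
            a[i][j] = best
    return a[m][n]
```

With the linear function w(k) = 2k this reduces to the basic global recurrence, which gives a quick sanity check.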
Today's Sources