Computational Biology
Scribe Notes for Class 20
June 19, 2000
Scribe: J. Zwolak
Today's Handouts and Announcements
Today's Topics
The Shortest Common Superstring (SCS) problem can be used to model fragment assembly in biology. However, several problems arise when a simple, straightforward SCS algorithm is applied directly to real fragment data.
GREEDY --- Take the two strings with the greatest overlap, merge them into one string, and repeat until only one string is left. This simple algorithm approximates the SCS.
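MGREEDY --- Create an overlap graph and partition it into cycles (a cycle cover with maximum total overlap). Open each cycle into a single string that contains all of the fragments on that cycle, then concatenate these strings (this was not clear to me, but this is how I understand it).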
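TGREEDY --- Run MGREEDY, then run GREEDY on the strings it produces (one per cycle) instead of simply concatenating them. Of the three algorithms, TGREEDY has the best proven approximation bound in Blum et al.: at most 3 times the optimal length, versus 4 times for GREEDY and MGREEDY.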
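The three algorithms above can be sketched in Python as follows. This is a minimal, unoptimized illustration under stated assumptions, not the implementation from the lecture or the paper: the function names (overlap, remove_substrings, greedy, mgreedy_cycles, mgreedy, tgreedy) are made up for this example, ties between equal overlaps are broken by scan order, and MGREEDY is written in its greedy form, where a string whose best overlap is with itself closes a cycle and is set aside.

```python
def overlap(a, b):
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def remove_substrings(frags):
    """Drop duplicates and fragments contained in another fragment."""
    uniq = list(dict.fromkeys(frags))
    return [s for s in uniq if not any(s != t and s in t for t in uniq)]


def best_pair(strings, allow_self):
    """Return (k, i, j) for the pair with the greatest overlap k."""
    best = (-1, 0, 0)
    for i, a in enumerate(strings):
        for j, b in enumerate(strings):
            if i == j and not allow_self:
                continue
            k = overlap(a, b)
            if k > best[0]:
                best = (k, i, j)
    return best


def merge(strings, k, i, j):
    """Replace strings[i] and strings[j] by their merge with overlap k."""
    merged = strings[i] + strings[j][k:]
    rest = [s for idx, s in enumerate(strings) if idx not in (i, j)]
    return rest + [merged]


def greedy(frags):
    """GREEDY: repeatedly merge the two strings with the greatest overlap."""
    strings = remove_substrings(frags)
    while len(strings) > 1:
        k, i, j = best_pair(strings, allow_self=False)
        strings = merge(strings, k, i, j)
    return strings[0] if strings else ""


def mgreedy_cycles(frags):
    """MGREEDY, first phase: return one opened string per cycle."""
    strings = remove_substrings(frags)
    cycles = []
    while strings:
        k, i, j = best_pair(strings, allow_self=True)
        if i == j:
            # Best overlap is a self-overlap: this string closes a cycle.
            cycles.append(strings.pop(i))
        else:
            strings = merge(strings, k, i, j)
    return cycles


def mgreedy(frags):
    """MGREEDY: concatenate the per-cycle strings."""
    return "".join(mgreedy_cycles(frags))


def tgreedy(frags):
    """TGREEDY: run GREEDY on the per-cycle strings from MGREEDY."""
    return greedy(mgreedy_cycles(frags))


if __name__ == "__main__":
    fragments = ["ACGT", "GTCA", "CATT"]
    print(greedy(fragments))
    print(mgreedy(fragments))
    print(tgreedy(fragments))
```

Each call to best_pair compares all pairs of strings, so this sketch is far too slow for real fragment sets; Kosaraju and Delcher use suffix trees to handle large inputs.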
To address some of the problems with the input fragments (i.e., reverse complements and errors), the reverse complement of each fragment can be added to the list of fragments, and/or errors can be permitted in the overlaps.
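Both adjustments can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names reverse_complement, with_reverse_complements, and approx_overlap, and the parameters max_mismatches and min_len, are hypothetical and not from the lecture. Errors are modeled here as substitutions only (Hamming distance); insertions and deletions would need an alignment-based overlap instead.

```python
# Map each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s):
    """Complement every base, then reverse the string."""
    return s.translate(COMPLEMENT)[::-1]


def with_reverse_complements(frags):
    """Return the fragments plus their reverse complements, without duplicates."""
    out = []
    for f in frags:
        for s in (f, reverse_complement(f)):
            if s not in out:
                out.append(s)
    return out


def approx_overlap(a, b, max_mismatches=1, min_len=3):
    """Longest k >= min_len such that the last k characters of a and the
    first k characters of b differ in at most max_mismatches positions."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-k:], b[:k]))
        if mismatches <= max_mismatches:
            return k
    return 0


if __name__ == "__main__":
    print(with_reverse_complements(["ACGT", "AACG"]))
    print(approx_overlap("ACGTTA", "TTCGG", max_mismatches=1))
```

The min_len threshold is there because very short overlaps match by chance once mismatches are tolerated. Note also that adding reverse complements doubles the input, so an assembler built this way still has to settle on one orientation per fragment.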
Today's Sources
Setubal and Meidanis:
Chapter 4.
Salzberg, Searls, and Kasif:
Chapter 9.
Avrim Blum, Tao Jiang, Ming Li, John Tromp, and Mihalis Yannakakis
Linear Approximation of Shortest Superstrings
Journal of the ACM 41, July 1994, 630-647.
S. Rao Kosaraju and Arthur L. Delcher
Large-Scale Assembly of DNA Strings
and Space-Efficient Construction of Suffix Trees
ACM Symposium on Theory of Computing, 1995, 169-177.