Computational Biology
Scribe Notes for Class 20
June 19, 2000
Scribe: J. Zwolak
Today's Handouts and Announcements
Today's Topics
-
The Shortest Common Superstring (SCS) problem can be used to model fragment assembly in biology. However, some problems arise when a simple, straightforward SCS algorithm is applied directly.
-
GREEDY --- Take the two strings with the greatest overlap. Combine them and repeat the process until one string is left. This algorithm is a simple SCS approximation.
-
MGREEDY --- Create an overlap graph. Partition the graph into cycles. Open each cycle into a superstring of the fragments on it, and concatenate those superstrings (this was not clear to me, but this is how I understand it).
-
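TGREEDY --- Run MGREEDY, then run GREEDY on the strings it produces (one per cycle) instead of simply concatenating them. Of the three algorithms, this one has the best proven approximation guarantee for SCS.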
-
To address some of the problems with the input fragments (i.e., reverse complements and errors), the reverse complements of the fragments can be included in the list of fragments, and/or errors can be permitted in the overlaps.
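The GREEDY procedure described above can be sketched in a few lines of Python. This is only an illustration under some assumptions, not the version analyzed in the sources: the names overlap and greedy_scs are made up here, ties between equally good pairs are broken by scan order, and fragments contained in other fragments are dropped before merging.

```python
from typing import List


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_scs(fragments: List[str]) -> str:
    """Greedy superstring approximation: repeatedly merge the best-overlapping pair."""
    # Assumption: remove duplicates and fragments that are substrings of others.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        # Find the ordered pair (i, j) with the greatest overlap.
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the pair along its overlap and put the result back in the set.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)

    return frags[0] if frags else ""


if __name__ == "__main__":
    print(greedy_scs(["ACGTAC", "TACGGA", "GGATT"]))
```

The pairwise scan is quadratic in the number of fragments on every round, which is fine for a small example but would need an index (for instance a suffix tree, as in the Kosaraju and Delcher reference) for large inputs.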
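The two fixes in the last topic (adding reverse complements and tolerating errors in the overlap) might look roughly like the following. The function names, the max_mismatches and min_len parameters, and the choice to count only substitution errors (no insertions or deletions) are all assumptions made for this sketch.

```python
from typing import List

# Translation table for the four DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def with_reverse_complements(fragments: List[str]) -> List[str]:
    """Return the fragments plus their reverse complements, without duplicates."""
    out: List[str] = []
    for f in fragments:
        for s in (f, reverse_complement(f)):
            if s not in out:
                out.append(s)
    return out


def overlap_with_errors(a: str, b: str, max_mismatches: int = 1, min_len: int = 3) -> int:
    """Longest k >= min_len such that the last k characters of a and the
    first k characters of b differ in at most max_mismatches positions.
    Assumption: min_len >= 1, and only substitutions are counted as errors."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-k:], b[:k]))
        if mismatches <= max_mismatches:
            return k
    return 0


if __name__ == "__main__":
    print(reverse_complement("ACGGT"))
    print(overlap_with_errors("ACGTAC", "TTCGGA", max_mismatches=1))
```

The min_len threshold is there because, once mismatches are allowed, very short overlaps match almost anything and would produce spurious merges.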
Today's Sources
-
Setubal and Meidanis:
Chapter 4.
-
Salzberg, Searls, and Kasif:
Chapter 9.
-
Avrim Blum, Tao Jiang, Ming Li, John Tromp, and Mihalis Yannakakis
Linear Approximation of Shortest Superstrings
Journal of the ACM 41, July 1994, 630-647.
-
S. Rao Kosaraju and Arthur L. Delcher
Large-Scale Assembly of DNA Strings
and Space-Efficient Construction of Suffix Trees
ACM Symposium on Theory of Computing, 1995, 169-177.
Please report any problems found in these pages to:
CS6104 Account (cs6104@courses.cs.vt.edu)