Computational Biology
Scribe Notes for Class 16
June 13, 2000
Scribe: S. Oak
Today's Handouts and Announcements
Today's Topics
The topic of Markov Chains was discussed, along with a set
of computational challenges.
- A finite automaton can be created that accepts the given sequences. (The example shown in class is not reproduced here.)
- Such an automaton expects the input strings exactly, so it is very precise.
- To make it less precise, each transition is labeled with the probability of occurrence of a particular base in the given sequences.
- This gives a Markov chain. (Again, the example is not reproduced here.)
- The Markov chain is defined over the symbols of a given alphabet plus a gap character (lambda).
- The Markov chain can be used to score all the sequences, as in the sketch below.
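The following is a minimal sketch, not the lecture's code, of how a first-order Markov chain could be estimated from a set of DNA sequences and then used to score a new sequence; the function names and the toy training strings are assumptions made for illustration.

    # Estimate transition probabilities from training sequences and score a
    # query sequence under the resulting first-order Markov chain.
    from collections import defaultdict
    import math

    def train_markov_chain(sequences):
        """Estimate P(next base | current base) from the training sequences."""
        counts = defaultdict(lambda: defaultdict(int))
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                counts[a][b] += 1
        probs = {}
        for a, nexts in counts.items():
            total = sum(nexts.values())
            probs[a] = {b: n / total for b, n in nexts.items()}
        return probs

    def log_prob(seq, probs, initial=0.25):
        """Log probability of a sequence, assuming a uniform start distribution."""
        lp = math.log(initial)
        for a, b in zip(seq, seq[1:]):
            p = probs.get(a, {}).get(b, 0.0)
            if p == 0.0:
                return float("-inf")   # transition never seen in training
            lp += math.log(p)
        return lp

    training = ["ACGTACGT", "ACGTTCGT", "ACGAACGT"]   # hypothetical sequences
    chain = train_markov_chain(training)
    print(log_prob("ACGTACGT", chain))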
- Hidden Markov models (HMMs) use the positions in an alignment as the states in the graph. (Refer to Fig. 2 on page 49 of Salzberg, Searls, and Kasif.)
- In an HMM, regions with gaps are identified and modeled by a separate state. Sequences that align properly score high, while very long sequences extended on either side may get penalized due to the traversal through the gapped state.
- Thus in an HMM the probability depends very strongly on the length of the sequence.
- Hence log-odds scores are used. The log-odds score is the logarithm of the probability of the sequence divided by its probability under the null model.
- The null model treats the sequences as random strings of nucleotides, so the probability of a sequence of length L is 0.25^L.
- Log-odds for sequence S = log P(S) - log(0.25^L).
- Therefore, log-odds for sequence S = log P(S) - L * log 0.25.
- When the sequence fits the motif, the log-odds score is high; if it fits the null model, the score is negative. A worked example follows below.
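As a small worked example, the log-odds formula above can be computed directly; the numeric value of log P(S) used here is an assumption, not data from the lecture.

    import math

    def log_odds(log_p_seq, length):
        """Log-odds = log P(S) - L * log 0.25 under the uniform nucleotide null model."""
        return log_p_seq - length * math.log(0.25)

    # A hypothetical length-8 sequence with log P(S) = -6.0 under the model:
    print(log_odds(-6.0, 8))   # -6.0 - 8 * log(0.25), approximately 5.09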
- The profile HMM is a particular type of HMM whose structure allows position-dependent gap penalties in a natural way.
- It contains:
  - Main states, represented by boxes.
  - Insertion states, represented by diamonds.
  - Delete (null) states, represented by circles.
- It is dangerous to estimate a probability distribution from just a few observed amino acids.
- If only 2 sequences are given, both with leucine at a certain position, then the estimated probability of leucine there is 1 while that of every other amino acid is 0. If any other amino acid is then substituted for leucine, the probability of the sequence becomes 0.
- Pseudocounts are used to overcome this problem. The simplest scheme is to add 1 to all counts, as in the sketch below.
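Here is a minimal sketch of the add-one pseudocount scheme for a single alignment column; the observed counts and function name are assumptions for illustration.

    # Smooth amino-acid counts at one alignment column by adding 1 to every count.
    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids

    def emission_probs(observed, pseudocount=1):
        """Turn raw counts at one column into smoothed emission probabilities."""
        counts = {aa: observed.get(aa, 0) + pseudocount for aa in AMINO_ACIDS}
        total = sum(counts.values())
        return {aa: c / total for aa, c in counts.items()}

    # Two sequences, both with leucine (L) at this column:
    probs = emission_probs({"L": 2})
    print(probs["L"])   # (2 + 1) / 22, about 0.14 rather than 1.0
    print(probs["A"])   # 1 / 22, about 0.05 rather than 0.0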
- The Viterbi algorithm is used to find the best alignment: the one with the largest probability of the sequence, or the largest log-odds score.
- It is a dynamic programming technique in which we fill an L x (3k - 3) matrix, where L is the length of the sequence and k is the number of main states in the profile HMM. A simplified sketch appears below.
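The following is a simplified Viterbi sketch for a generic HMM in log space, not the full profile-HMM recurrence from the lecture; the two-state toy model (match/insert) and all parameter values are assumptions.

    import math

    def viterbi(obs, states, log_start, log_trans, log_emit):
        """Return the most probable state path for obs and its log probability."""
        # V[t][s] = best log probability of any path ending in state s at time t
        V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
        back = [{}]
        for t in range(1, len(obs)):
            V.append({})
            back.append({})
            for s in states:
                best_prev, best_lp = max(
                    ((p, V[t - 1][p] + log_trans[p][s]) for p in states),
                    key=lambda x: x[1],
                )
                V[t][s] = best_lp + log_emit[s][obs[t]]
                back[t][s] = best_prev
        # Trace back from the best final state
        last = max(states, key=lambda s: V[-1][s])
        path = [last]
        for t in range(len(obs) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path)), V[-1][last]

    log = math.log
    states = ["match", "insert"]
    log_start = {"match": log(0.9), "insert": log(0.1)}
    log_trans = {"match": {"match": log(0.8), "insert": log(0.2)},
                 "insert": {"match": log(0.4), "insert": log(0.6)}}
    log_emit = {"match": {"A": log(0.7), "C": log(0.1), "G": log(0.1), "T": log(0.1)},
                "insert": {"A": log(0.25), "C": log(0.25), "G": log(0.25), "T": log(0.25)}}
    path, lp = viterbi("AACGA", states, log_start, log_trans, log_emit)
    print(path, lp)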
- If the last 2 letters are remembered by the HMM, it is called a first-order HMM.
- If the last 6 bases are remembered, they span 2 codons and can be used to capture the correlation between adjacent amino acids.
- Such 6-base words are called hexamers. A counting sketch appears below.
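As an illustration of working with hexamers, here is a minimal sketch, not the lecture's method, of counting hexamers in training sequences and using their frequencies to score a query window; all names and the toy data are assumptions.

    # Count overlapping 6-base words (hexamers) and score how familiar a window looks.
    from collections import Counter
    import math

    def hexamer_counts(sequences):
        """Count all overlapping hexamers in the training sequences."""
        counts = Counter()
        for seq in sequences:
            for i in range(len(seq) - 5):
                counts[seq[i:i + 6]] += 1
        return counts

    def hexamer_log_score(window, counts, pseudocount=1):
        """Sum of log frequencies of the window's hexamers (higher = more familiar)."""
        total = sum(counts.values()) + pseudocount * (4 ** 6)
        score = 0.0
        for i in range(len(window) - 5):
            hexamer = window[i:i + 6]
            score += math.log((counts[hexamer] + pseudocount) / total)
        return score

    coding_training = ["ATGGCTGCTAAAGAAGGT", "ATGGCTGCCAAAGAAGGC"]   # toy examples
    counts = hexamer_counts(coding_training)
    print(hexamer_log_score("ATGGCTGCTAAA", counts))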
Today's Sources
Please report any problems found in these pages to:
CS6104 Account (cs6104@courses.cs.vt.edu)