Computational Biology
Scribe Notes for Class 16
June 13, 2000
Scribe: S. Oak
Today's Handouts and Announcements
Today's Topics
The topic of Markov Chains was discussed, along with a set
of computational challenges.
- A finite automaton can be created that accepts the given sequences. (The example shown in class is not reproduced here.)
- Such an automaton expects the input strings exactly, so it is very precise.
- To make it less precise, each transition is labeled with the probability of occurrence of a particular base in the given sequences.
- This gives a Markov chain. (Again, the example is not reproduced here.)
- The Markov chain is defined over the symbols of a given alphabet plus a gap character (lambda).
- The Markov chain can be used to score all the sequences, as in the sketch below.
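The following is a minimal sketch, not the lecture's code, of how a first-order Markov chain could be estimated from a set of DNA sequences and then used to score a new sequence; the function names and the toy training strings are assumptions made for illustration.

    # Estimate transition probabilities from training sequences and score a
    # query sequence under the resulting first-order Markov chain.
    from collections import defaultdict
    import math

    def train_markov_chain(sequences):
        """Estimate P(next base | current base) from the training sequences."""
        counts = defaultdict(lambda: defaultdict(int))
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                counts[a][b] += 1
        probs = {}
        for a, nexts in counts.items():
            total = sum(nexts.values())
            probs[a] = {b: n / total for b, n in nexts.items()}
        return probs

    def log_prob(seq, probs, initial=0.25):
        """Log probability of a sequence, assuming a uniform start distribution."""
        lp = math.log(initial)
        for a, b in zip(seq, seq[1:]):
            p = probs.get(a, {}).get(b, 0.0)
            if p == 0.0:
                return float("-inf")   # transition never seen in training
            lp += math.log(p)
        return lp

    training = ["ACGTACGT", "ACGTTCGT", "ACGAACGT"]   # hypothetical sequences
    chain = train_markov_chain(training)
    print(log_prob("ACGTACGT", chain))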
- Hidden Markov models (HMMs) use the positions in an alignment as the states in the graph. (Refer to Fig. 2 on page 49 of Salzberg, Searls, and Kasif.)
- In an HMM, regions with gaps are identified and modeled by a separate state. Sequences that align properly score high, while very long sequences extended on either side may get penalized due to the traversal through the gapped state.
- Thus in an HMM the probability depends very strongly on the length of the sequence.
- Hence log-odds scores are used. The log-odds score is the logarithm of the probability of the sequence divided by its probability under the null model.
- The null model treats the sequences as random strings of nucleotides, so the probability of a sequence of length L is 0.25^L.
- Log-odds for sequence S = log P(S) - log(0.25^L).
- Therefore, log-odds for sequence S = log P(S) - L * log 0.25.
- When the sequence fits the motif, the log-odds score is high; if it fits the null model, the score is negative. A worked example follows below.
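As a small worked example, the log-odds formula above can be computed directly; the numeric value of log P(S) used here is an assumption, not data from the lecture.

    import math

    def log_odds(log_p_seq, length):
        """Log-odds = log P(S) - L * log 0.25 under the uniform nucleotide null model."""
        return log_p_seq - length * math.log(0.25)

    # A hypothetical length-8 sequence with log P(S) = -6.0 under the model:
    print(log_odds(-6.0, 8))   # -6.0 - 8 * log(0.25), approximately 5.09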
- The profile HMM is a particular type of HMM whose structure allows position-dependent gap penalties in a natural way.
- It contains:
  - Main states, represented by boxes.
  - Insertion states, represented by diamonds.
  - Delete (null) states, represented by circles.
- It is dangerous to estimate a probability distribution from just a few observed amino acids.
- If only 2 sequences are given, both with leucine at a certain position, then the estimated probability of leucine there is 1 while that of every other amino acid is 0. If any other amino acid is then substituted for leucine, the probability of the sequence becomes 0.
- Pseudocounts are used to overcome this problem. The simplest scheme is to add 1 to all counts, as in the sketch below.
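Here is a minimal sketch of the add-one pseudocount scheme for a single alignment column; the observed counts and function name are assumptions for illustration.

    # Smooth amino-acid counts at one alignment column by adding 1 to every count.
    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids

    def emission_probs(observed, pseudocount=1):
        """Turn raw counts at one column into smoothed emission probabilities."""
        counts = {aa: observed.get(aa, 0) + pseudocount for aa in AMINO_ACIDS}
        total = sum(counts.values())
        return {aa: c / total for aa, c in counts.items()}

    # Two sequences, both with leucine (L) at this column:
    probs = emission_probs({"L": 2})
    print(probs["L"])   # (2 + 1) / 22, about 0.14 rather than 1.0
    print(probs["A"])   # 1 / 22, about 0.05 rather than 0.0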
- The Viterbi algorithm is used to find the best alignment: the one with the largest probability of the sequence, or the largest log-odds score.
- It is a dynamic programming technique in which we fill an L x (3k - 3) matrix, where L is the length of the sequence and k is the number of main states in the profile HMM. A simplified sketch appears below.
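The following is a simplified Viterbi sketch for a generic HMM in log space, not the full profile-HMM recurrence from the lecture; the two-state toy model (match/insert) and all parameter values are assumptions.

    import math

    def viterbi(obs, states, log_start, log_trans, log_emit):
        """Return the most probable state path for obs and its log probability."""
        # V[t][s] = best log probability of any path ending in state s at time t
        V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
        back = [{}]
        for t in range(1, len(obs)):
            V.append({})
            back.append({})
            for s in states:
                best_prev, best_lp = max(
                    ((p, V[t - 1][p] + log_trans[p][s]) for p in states),
                    key=lambda x: x[1],
                )
                V[t][s] = best_lp + log_emit[s][obs[t]]
                back[t][s] = best_prev
        # Trace back from the best final state
        last = max(states, key=lambda s: V[-1][s])
        path = [last]
        for t in range(len(obs) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path)), V[-1][last]

    log = math.log
    states = ["match", "insert"]
    log_start = {"match": log(0.9), "insert": log(0.1)}
    log_trans = {"match": {"match": log(0.8), "insert": log(0.2)},
                 "insert": {"match": log(0.4), "insert": log(0.6)}}
    log_emit = {"match": {"A": log(0.7), "C": log(0.1), "G": log(0.1), "T": log(0.1)},
                "insert": {"A": log(0.25), "C": log(0.25), "G": log(0.25), "T": log(0.25)}}
    path, lp = viterbi("AACGA", states, log_start, log_trans, log_emit)
    print(path, lp)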
- If the last 2 letters are remembered by the HMM, it is called a first-order HMM.
- If the last 6 bases are remembered, they span 2 codons and can be used to capture the correlation between adjacent amino acids.
- Such 6-base words are called hexamers. A counting sketch appears below.
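As an illustration of working with hexamers, here is a minimal sketch, not the lecture's method, of counting hexamers in training sequences and using their frequencies to score a query window; all names and the toy data are assumptions.

    # Count overlapping 6-base words (hexamers) and score how familiar a window looks.
    from collections import Counter
    import math

    def hexamer_counts(sequences):
        """Count all overlapping hexamers in the training sequences."""
        counts = Counter()
        for seq in sequences:
            for i in range(len(seq) - 5):
                counts[seq[i:i + 6]] += 1
        return counts

    def hexamer_log_score(window, counts, pseudocount=1):
        """Sum of log frequencies of the window's hexamers (higher = more familiar)."""
        total = sum(counts.values()) + pseudocount * (4 ** 6)
        score = 0.0
        for i in range(len(window) - 5):
            hexamer = window[i:i + 6]
            score += math.log((counts[hexamer] + pseudocount) / total)
        return score

    coding_training = ["ATGGCTGCTAAAGAAGGT", "ATGGCTGCCAAAGAAGGC"]   # toy examples
    counts = hexamer_counts(coding_training)
    print(hexamer_log_score("ATGGCTGCTAAA", counts))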
Today's Sources
Please report any problems found in these pages to:
CS6104 Account (cs6104@courses.cs.vt.edu)