D. R. Powell, L. Allison, T. I. Dix and D. L. Dowe,
Australian Computer Science Communications, Computing Theory '98,
Proceedings of the Fourth Australasian Theory Symposium (CATS '98),
Perth, Australia, 2-3 February 1998,
Springer-Verlag, Singapore, ISBN:98103083-92-1, 20:3: pp215-229
Alignment of two random sequences over a fixed alphabet can
be shown to be performed optimally by a dynamic programming algorithm (DPA).
The usual assumption is that the sequences are random and incompressible
and that one sequence is a mutation of the other.
However, DNA and many other sequences are not always random and unstructured,
and the question arises of how best to align compressible sequences.
Assuming that our sequences are non-random and arise from
mutations of a first-order Markov model, we note that aligning
high-information regions is more important than aligning low-information
regions. We arrive at a new alignment method for low-information sequences
that performs better than the standard DPA on data generated from
mutations of a first-order Markov model.
Keywords: Sequence Alignment, DNA, Biology, Information Theory.
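
To make the notion of high- and low-information regions concrete, the following minimal sketch computes a per-symbol information content, in bits, under a first-order Markov model. It is an illustration only and not the alignment method of the paper: the function name `markov1_information`, the choice to fit the model to the sequence itself, the uniform cost for the first symbol, and the pseudocount smoothing are all assumptions made for this example.

```python
import math
from collections import defaultdict


def markov1_information(seq, alphabet="ACGT", pseudocount=1.0):
    """Per-symbol information (bits) under a first-order Markov model.

    Assumption for illustration: the transition probabilities are
    estimated from `seq` itself, with additive (pseudocount) smoothing.
    """
    # Count transitions prev -> cur over the whole sequence.
    counts = defaultdict(lambda: defaultdict(float))
    for prev, cur in zip(seq, seq[1:]):
        counts[prev][cur] += 1

    # The first symbol has no context; charge it a uniform code length.
    info = [math.log2(len(alphabet))]
    for prev, cur in zip(seq, seq[1:]):
        total = sum(counts[prev].values()) + pseudocount * len(alphabet)
        p = (counts[prev][cur] + pseudocount) / total
        info.append(-math.log2(p))
    return info


if __name__ == "__main__":
    # Hypothetical input: a repetitive run followed by a less predictable tail.
    seq = "AT" * 10 + "GCCA"
    for symbol, bits in zip(seq, markov1_information(seq)):
        print(f"{symbol} {bits:.2f}")
```

In this sketch, symbols inside the repetitive `AT` run cost well under one bit each, while the symbols that break the pattern cost more than two bits. That difference is the sense in which some regions carry more information than others.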