Alignment of low information sequences

D. R. Powell, L. Allison, T. I. Dix and D. L. Dowe,
Australian Computer Science Communications, Computing Theory '98,
Proceedings of the Fourth Australasian Theory Symposium (CATS '98),
Perth, Australia, 2-3 February 1998,
Springer-Verlag, Singapore, ISBN:98103083-92-1, 20:3: pp215-229

Abstract: Alignment of two random sequences over a fixed alphabet can be shown to be optimally done by a Dynamic Programming Algorithm (DPA). It is normally assumed that the sequences are random and incompressible and that one sequence is a mutation of the other. However, DNA and many other sequences are not always random and unstructured, and the issue arises as to how best to align compressible sequences.
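For readers unfamiliar with the DPA, the following is a minimal sketch of the classic dynamic programming recurrence, assuming unit costs for mismatch, insertion and deletion. The function name `edit_distance` and the unit-cost scheme are illustrative assumptions; the paper's own DPA may use different (for example, probabilistic) costs.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between a and b (illustrative cost scheme)."""
    # prev[j] holds the cost of aligning the current prefix of a with b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # aligning a[:i] with the empty string costs i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + (ca != cb),  # match (0) or mismatch (1)
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
            ))
        prev = curr
    return prev[-1]
```

Only two rows of the table are kept at a time, so this version returns the optimal cost but not the alignment itself; recovering the alignment would need the full table or a traceback.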

Assuming that our sequences are non-random and emanate from mutations of a first-order Markov model, we note that aligning high-information regions is more important than aligning low-information regions. This leads to a new alignment method for low-information sequences, which performs better than the standard DPA on data generated from mutations of a first-order Markov model.
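To make "high information" and "low information" concrete, here is a small sketch that estimates the information content of each character, in bits, under an adaptive first-order Markov model. The function name `markov_information`, the default DNA alphabet, and the add-one initial counts are assumptions made for illustration; this is not the paper's alignment method, only one way to see which positions of a sequence are predictable from their predecessor.

```python
import math
from typing import List


def markov_information(seq: str, alphabet: str = "ACGT") -> List[float]:
    """Per-character information (bits) under an adaptive first-order model.

    Assumes every character of seq is in alphabet. Transition counts start
    at 1 (add-one prior) and are updated after each character is scored.
    """
    counts = {p: {c: 1 for c in alphabet} for p in alphabet}
    info = []
    for i, c in enumerate(seq):
        if i == 0:
            # No context for the first character: assume a uniform distribution.
            info.append(math.log2(len(alphabet)))
            continue
        context = counts[seq[i - 1]]
        prob = context[c] / sum(context.values())
        info.append(-math.log2(prob))
        context[c] += 1  # learn from the character just seen
    return info
```

On a repetitive sequence such as `"AAAAAAAAAT"`, the cost of each successive `A` falls as the model learns the pattern, while the final `T` is expensive because it is unexpected in that context.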

Keywords: Sequence Alignment, DNA, Biology, Information Theory.

© L. Allison   http://www.allisons.org/ll/   (or as otherwise indicated)