Alignment


Also see
DPA

To align two or more sequences is to pad out each sequence with null "characters", '-', until they all have the same length, in such a way as to optimise some criterion: to maximise a score or to minimise a cost. The sequences can then be written out one above the other, e.g.,

  S1: ACGTA-GTACGT
      || || ||| ||
  S2: AC-TACGTAGGT
with their characters lining up in columns.

If you care to think of S1 as the ancestor and of S2 as the descendant, then a '-' in S2 indicates a deletion and a '-' in S1 indicates an insertion; insertions and deletions are collectively called indels. However, you can just as well consider S2 to be the ancestor and S1 the descendant. If S1 and S2 are present-day sequences related by evolutionary history, then they are both descendants of some unknown hypothetical ancestor.
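The column-by-column reading described above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_columns` and the label strings are assumptions made here, and S1 is arbitrarily treated as the ancestor.

```python
def classify_columns(s1: str, s2: str, gap: str = "-") -> list[str]:
    """Label each column of a pairwise alignment, reading s1 as the
    ancestor and s2 as the descendant (an arbitrary choice)."""
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have the same length")
    labels = []
    for a, b in zip(s1, s2):
        if a == gap and b == gap:
            raise ValueError("a column holding two gaps carries no information")
        if b == gap:
            labels.append("deletion")    # character lost in the descendant
        elif a == gap:
            labels.append("insertion")   # character gained in the descendant
        elif a == b:
            labels.append("match")
        else:
            labels.append("mismatch")
    return labels


labels = classify_columns("ACGTA-GTACGT", "AC-TACGTAGGT")
indels = sum(1 for x in labels if x in ("insertion", "deletion"))
```

Swapping the two arguments swaps the "insertion" and "deletion" labels and leaves everything else unchanged, which is the symmetry noted above.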

Generally, a match (identical characters) is "good" and has a high score (low cost), while a mismatch, insertion or deletion is "bad" and has a low score (high cost).
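As a concrete, hedged example of such a criterion, the sketch below totals the cost of a given alignment. The particular costs (match 0, mismatch 1, indel 1) are an assumption chosen for simplicity, not values taken from this page, and `alignment_cost` is a hypothetical name.

```python
def alignment_cost(s1: str, s2: str, mismatch: int = 1, indel: int = 1,
                   gap: str = "-") -> int:
    """Sum per-column costs of a pairwise alignment.
    Assumed costs: match 0, mismatch `mismatch`, insertion/deletion `indel`."""
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(s1, s2):
        if a == gap and b == gap:
            raise ValueError("a column holding two gaps carries no information")
        if a == gap or b == gap:
            total += indel       # insertion or deletion
        elif a != b:
            total += mismatch    # two different characters
        # identical characters cost nothing
    return total


cost = alignment_cost("ACGTA-GTACGT", "AC-TACGTAGGT")
```

Under these assumed costs, the example alignment of S1 and S2 has two indels and one mismatch, giving a total cost of 3.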

An alignment that optimises the criterion is optimal. An optimal alignment can be found with a dynamic programming algorithm [DPA]. In general, there may be more than one optimal alignment.
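A minimal sketch of one such dynamic programming algorithm follows, assuming simple costs (match 0, mismatch 1, indel 1) that this page does not itself specify. The name `optimal_alignment` is hypothetical. The traceback returns just one optimal alignment; which one depends on the order in which ties are broken.

```python
def optimal_alignment(a: str, b: str, mismatch: int = 1, indel: int = 1,
                      gap: str = "-") -> tuple[int, str, str]:
    """Return (cost, padded_a, padded_b) for one minimum-cost global alignment."""
    n, m = len(a), len(b)
    # cost[i][j] = minimum cost of aligning a[:i] with b[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * indel
    for j in range(1, m + 1):
        cost[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            cost[i][j] = min(cost[i - 1][j - 1] + sub,   # match or mismatch
                             cost[i - 1][j] + indel,     # a[i-1] against a gap
                             cost[i][j - 1] + indel)     # b[j-1] against a gap
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = 0 if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + indel:
            out_a.append(a[i - 1]); out_b.append(gap); i -= 1
        else:
            out_a.append(gap); out_b.append(b[j - 1]); j -= 1
    return cost[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, row1, row2 = optimal_alignment("ACGTAGTACGT", "ACTACGTAGGT")
print(best)
print(row1)
print(row2)
```

The table has (n+1) x (m+1) entries, so this sketch takes time and space proportional to the product of the two sequence lengths.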

If there are more than two sequences to align, the process is called multiple alignment. There are various ways of defining a score function or a cost function on multiple alignments. Alignment of three sequences is a useful special case because each internal node of an unrooted binary phylogenetic tree has three neighbours.
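One widely used way of defining a cost on a multiple alignment is the sum-of-pairs cost; it is offered here only as an example, since this page does not single out any particular definition. The sketch below assumes the same simple costs as for two sequences (match 0, mismatch 1, indel 1) and charges nothing when both members of a pair have a gap in a column. The name `sum_of_pairs_cost` is hypothetical.

```python
from itertools import combinations


def sum_of_pairs_cost(rows: list[str], mismatch: int = 1, indel: int = 1,
                      gap: str = "-") -> int:
    """Sum, over every pair of rows and every column, the pairwise column cost.
    Assumed costs: match 0, mismatch `mismatch`, character-vs-gap `indel`,
    gap-vs-gap 0."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    total = 0
    for r1, r2 in combinations(rows, 2):
        for a, b in zip(r1, r2):
            if a == gap and b == gap:
                continue             # neither sequence has a character here
            if a == gap or b == gap:
                total += indel
            elif a != b:
                total += mismatch
    return total


cost3 = sum_of_pairs_cost(["AC-T", "ACGT", "A-GT"])
```

With exactly two rows this reduces to the ordinary pairwise alignment cost under the same assumed costs.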

© L. Allison   http://www.allisons.org/ll/   (or as otherwise indicated),
Created with "vi (Linux + Solaris)",  charset=iso-8859-1,  fetched Saturday, 11-Oct-2008 09:56:28 EST.