Minh Duc Cao, Trevor I. Dix, and Lloyd Allison
Technical Report 233,
Faculty of Information Technology (Clayton Campus),
Traditional genome alignment methods based on dynamic programming are often
a. computationally expensive,
b. unable to compare the genomes of distant species and
c. unable to deal with low information regions.
We are presenting an information-theoretic approach for
pairwise genome local alignment. Our method, the expert model aligner,
the XMAligner, relies on the expert model compression algorithm.
To align two sequences, XMAligner first compresses one sequence to
measure the information content at each position in the sequence.
Then the sequence is compressed again but this time with the
background knowledge from the other sequence to obtain the conditional
information content. The information content and the conditional information
content from the two compressions are examined.
Similar regions in the compressed sequence should have the conditional
information content lower than the individual information content.
The method is applied to align the genomes of Plasmodium falciparum and
Plasmodium knowlesi against other three Plasmodium genomes with
different levels of diversity.
Despite the differences in nucleotide composition of the reference sequences,
the conserved regions found by XMAligner in three alignments are relatively
consistent. A strong correlation was found between the similar regions
detected by the XMAligner and the hypothetical annotation of
The alignment results can be integrated into the DNAPlatform for
(Technical Reports can be obtained from the
Faculty of Information Tech. (Clayton),
Monash University, Australia 3800.)