M. D. Cao, L. Allison, T. I. Dix
Springer Verlag, LNCS 5866 (AI09), pp. 71-80.
Phylogenetic analyses of species based on single genes or
parts of the genomes are often inconsistent because of
factors such as variable rates of evolution and
horizontal gene transfer. The growing number of
sequenced genomes allows phylogenies to be constructed from
complete genomes, an approach that is less sensitive to such inconsistencies.
For such long sequences, construction methods like
maximum parsimony and maximum likelihood are often
infeasible because of their heavy computational requirements.
Another class of tree construction methods,
namely distance-based methods, requires a measure of
the distance between every pair of genomes.
Some measures, such as the evolutionary edit distance of
gene order and gene content, are computationally expensive or
do not perform well when the gene content of the organisms is similar.
This study presents an information-theoretic measure of
the genetic distance between genomes based on the
expert model (XM), a biological compression algorithm.
We demonstrate that our distance measure can be
applied to reconstruct the consensus phylogenetic tree
of a number of Plasmodium parasites from their genomes,
whose statistical bias would mislead conventional methods.
We also use our approach to construct a
plausible evolutionary tree for the γ-Proteobacteria group,
whose genomes are known to contain many horizontally transferred genes.
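The abstract does not spell out how the expert model works or the exact distance formula. As a rough illustration of the general idea only, the sketch below swaps XM for a much simpler adaptive order-k Markov model (an assumption, not the authors' compressor) and compares the bits needed to encode one genome with and without first training on the other. The names `code_length` and `compression_distance`, the order `k = 6`, and the averaged-ratio formula are hypothetical choices made for this example, not the paper's definitions.

```python
import math
import random
from collections import defaultdict

ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def code_length(seq, k=6, prime=""):
    """Bits to encode `seq` with an adaptive order-k Markov model.

    Hypothetical stand-in for XM. If `prime` is given, the model is first
    trained on it at no cost, so the result approximates the conditional
    information content of `seq` given `prime`.
    """
    # Add-one (Laplace) pseudocounts for every context.
    counts = defaultdict(lambda: [1, 1, 1, 1])

    def process(text, charge):
        # The first k symbols have no full context: charge 2 bits each.
        bits = 2.0 * min(k, len(text)) if charge else 0.0
        for i in range(k, len(text)):
            c = counts[text[i - k:i]]
            sym = INDEX[text[i]]
            if charge:
                bits -= math.log2(c[sym] / sum(c))
            c[sym] += 1  # update after coding, as a decoder would
        return bits

    process(prime, charge=False)
    return process(seq, charge=True)


def compression_distance(x, y, k=6):
    """Hypothetical symmetric dissimilarity: near 1 when x and y share
    little information, smaller when one helps compress the other."""
    rx = code_length(x, k, prime=y) / code_length(x, k)
    ry = code_length(y, k, prime=x) / code_length(y, k)
    return (rx + ry) / 2


if __name__ == "__main__":
    rng = random.Random(0)
    x = "".join(rng.choice(ALPHABET) for _ in range(5000))
    # Copy of x where about 5% of positions get a freshly drawn random base.
    related = "".join(rng.choice(ALPHABET) if rng.random() < 0.05 else b for b in x)
    unrelated = "".join(rng.choice(ALPHABET) for _ in range(5000))
    print(compression_distance(x, related), compression_distance(x, unrelated))
```

Because the model learns shared subsequences from the other genome before coding, a related genome becomes cheaper to encode and its score drops below that of an unrelated one. The score is not a true metric (the distance of a genome to itself is not 0), so some correction would likely be needed before passing a matrix of such scores to a distance-based tree construction method.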