
L.Allison, C.S.Wallace & C.N.Yee,
J. Molec. Evol., 35(1), pp.77-89, 1992
Abstract:
Minimum message length encoding is a technique of
inductive inference with theoretical and practical advantages.
It allows the posterior odds-ratio of two theories or hypotheses
to be calculated.
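In MML terms, a message that states hypothesis H and then the data D given H has length -log2 P(H) - log2 P(D|H) bits, so the difference between the message lengths of two hypotheses is log2 of their posterior odds-ratio. A minimal sketch of that arithmetic follows; the function names are illustrative and not taken from the paper.

```python
import math

def msg_len_bits(p_h, p_d_given_h):
    """Two-part message length: state the hypothesis, then the data given it."""
    return -math.log2(p_h) - math.log2(p_d_given_h)

def posterior_odds(len1_bits, len2_bits):
    """P(H1|D) / P(H2|D) = 2**(len2 - len1).

    The shorter message corresponds to the more probable hypothesis.
    """
    return 2.0 ** (len2_bits - len1_bits)

# Example: a hypothesis whose message is 3 bits shorter is 2**3 = 8 times
# more probable a posteriori.
odds = posterior_odds(10.0, 13.0)
```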
Here it is applied to problems of aligning
or relating two strings, in particular
two biological macromolecules.
We compare the r-theory, that the strings are related,
with the null-theory, that they are not related.
If they are related, the probabilities of the various alignments
can be calculated.
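The comparison can be sketched for a simple one-state model: the r-theory's probability of two DNA strings is the sum over all alignments, computed by a forward dynamic programme, while the null-theory codes each string on its own at 2 bits per base. The parameter values, the equiprobable-base assumption and the omission of any length coding (on both sides) are hypothetical simplifications for illustration, not the paper's exact model.

```python
import math

# Hypothetical one-state parameters: match, change, insert, delete (sum to 1).
P_MATCH, P_CHANGE, P_INS, P_DEL = 0.7, 0.1, 0.1, 0.1
ALPHABET = 4  # DNA; bases assumed equiprobable

def r_theory_bits(a, b):
    """-log2 P(a, b | related), summed over all alignments (forward DP)."""
    n, m = len(a), len(b)
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:
                if a[i - 1] == b[j - 1]:
                    pair = P_MATCH / ALPHABET  # same base in both strings
                else:
                    pair = P_CHANGE / (ALPHABET * (ALPHABET - 1))  # differing pair
                total += f[i - 1][j - 1] * pair
            if i > 0:
                total += f[i - 1][j] * P_DEL / ALPHABET  # base present in a only
            if j > 0:
                total += f[i][j - 1] * P_INS / ALPHABET  # base present in b only
            f[i][j] = total
    return -math.log2(f[n][m])

def null_theory_bits(a, b):
    """Strings coded independently: log2(4) = 2 bits per base."""
    return (len(a) + len(b)) * math.log2(ALPHABET)
```

If `r_theory_bits(a, b)` is smaller than `null_theory_bits(a, b)`, the r-theory gives the shorter message, and with equal prior odds the difference in bits is log2 of the posterior odds-ratio. Dividing the probability of one alignment path by the total held in the table gives that alignment's posterior probability. For long strings the products underflow, so a practical version would work with logarithms.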
This is done for one, three and five-state models of relation or mutation.
These correspond to linear and piecewise linear cost functions
on runs of insertions and deletions.
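The correspondence between state models and cost functions can be illustrated by the cost in bits of a run of k insertions (or deletions) along a single path. In this sketch the one-state model pays the same price for every indel, the three-state model pays once to enter an indel state and then a cheaper price per extension, and the five-state model has two indel states (short and long runs) so the cheaper of two lines applies. The probabilities are hypothetical and the cost of leaving an indel state is ignored.

```python
import math

def bits(p):
    """Code length in bits of an event with probability p."""
    return -math.log2(p)

def run_cost_one_state(k, p_indel):
    # Each indel is an independent event: cost k * c, linear in k.
    return k * bits(p_indel)

def run_cost_three_state(k, p_open, p_extend):
    # Enter the indel state once, then stay in it k - 1 times (k >= 1):
    # cost a + b * (k - 1), linear in k with a fixed opening cost.
    return bits(p_open) + (k - 1) * bits(p_extend)

def run_cost_five_state(k, short, long):
    # Two indel states, each (p_open, p_extend); the best path uses the
    # cheaper one, giving a piecewise linear cost in k.
    return min(run_cost_three_state(k, *short), run_cost_three_state(k, *long))
```

With, say, `short=(0.25, 0.25)` and `long=(0.0625, 0.8)`, short runs are cheapest in the first state and long runs in the second, which is where the bend in the piecewise linear cost comes from.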
We describe how to estimate parameters of a model.
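The abstract does not spell out the estimation procedure. One simple possibility, shown only as an illustration and not as the paper's method, is to count the operations in an alignment and convert them into smoothed frequencies, adding 0.5 to each count so that no probability is zero. The `mcid` operation encoding is a hypothetical convention; summing over all alignments would instead weight each count by the alignment's probability.

```python
from collections import Counter

OPS = "mcid"  # match, change, insert, delete (hypothetical encoding)

def estimate_params(ops, prior=0.5):
    """Smoothed frequency estimates of one-state parameters from an edit string."""
    counts = Counter(ops)
    total = len(ops) + prior * len(OPS)
    return {op: (counts[op] + prior) / total for op in OPS}

# "mmmmcid": 4 matches, 1 change, 1 insert, 1 delete.
params = estimate_params("mmmmcid")
```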
The validity of a model is itself an hypothesis and can
be objectively tested.
This is done on real DNA strings and on artificial data.
The tests on artificial data indicate limits on what can be
inferred in various situations.
The tests on real DNA support either the three or five-state models
over the one-state model.
Finally, a fast, approximate, minimum message length string
comparison algorithm is described.
[doi:10.1007/BF00160262],
or
[paper.ps] and
[software].
See also
[trees & multiple alignment].

