Minimum message length (MML) inference was devised
by Chris Wallace and David Boulton c1968 and
developed by Chris Wallace and many colleagues.
MML is a Bayesian method of inference:
- Bayes's theorem:
- pr(H&D) = pr(H).pr(D|H)
- pr(H|D) = pr(H&D) / pr(D)
- msgLen(E) = I(E) = - log2(pr(E)) bits
- msgLen(H&D) = msgLen(H) +msgLen(D|H)
= msgLen(D) +msgLen(H|D)
for hypothesis H, data D, event E.
MML is a practical realisation of
Some have assumed that MML is the same as maximum aposterior inference (MAP)
but in general it is not.
And unlike the later minimum description length (MDL) principle,
MML favours explicit priors and fully parameterized models.
Key points are that every continuous (real, floating point) variable
has some limited measurement accuracy and
that every continuous parameter has some optimal limited precision to
which it should be inferred and stated.
A consequence is that even continuous data and continuous parameters
have non-zero probabilities (and hence finite message lengths),
not just probability densities, and
therefore Bayes's theorem still applies as is.
Interestingly, there are many cases where even a discrete parameter must
be estimated to less precision than its discreteness would seem to allow.
Some statistical models that have been MML-ed include:
- Binomial, 2-state.
- Multinomial, k-state.
- Normal (Gaussian).
- Linear regression.
- von Mises - Fisher (vMF) and von Mises distributions.
- Student's t-Distribution.
- Mixture models (clustering, classification).
- (Hidden) Markov models, PFSA.
- Classification- (decision-) trees and
- Regression- and Model-trees.
- Sequence segmentation.
- Megalithic stone circles!
- Bayesian networks.
- Supervised learning.
- Unsupervised learning.
- Trees and Graphs.
MML has theoretical support in the form of Kolmogorov complexity.
Strict MML (SMML) is a sort of MML "gold standard".
Unfortunately SMML is computationally intractible for all but simple problems
but, happily, accurate and feasible approximations to SMML exist.