## On MML


The reasons why minimum message length (MML) inference works are quite elementary and were long hidden in plain sight(a), so it is surprising that it was not in use before 1968, and more surprising that there is any debate about it.

• Even a continuous attribute (variable) of a datum can only be measured to some limited accuracy, ±ε/2, ε > 0, so
• every datum that is possible under a model has a probability that is strictly > 0, not just a probability density (pdf).
• Any continuous parameter of a model (theory, hypothesis, ...) can only be inferred (estimated) to some limited precision(b), ±δ/2, δ > 0, so
• every parameter estimate that is possible under a prior has a probability that is strictly > 0, not just a probability density.
• So, Bayes's theorem can always be used(c), as is,
pr(H&D) = pr(H).pr(D|H) = pr(D).pr(H|D),  for hypothesis H and datum D.
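The first two points above can be illustrated numerically: a measurement x recorded to accuracy ±ε/2 really denotes an interval, and the probability of that interval is strictly positive and, for small ε, well approximated by pdf(x)·ε. A minimal sketch (the standard normal model and the particular values of x and ε are illustrative choices only):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # density of the Normal(mu, sigma) distribution at x
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    # cumulative distribution, via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def datum_probability(x, eps, mu=0.0, sigma=1.0):
    # exact probability of the measurement interval [x - eps/2, x + eps/2]
    return normal_cdf(x + eps / 2, mu, sigma) - normal_cdf(x - eps / 2, mu, sigma)

eps = 0.01                     # measurement accuracy +-eps/2
x = 1.3                        # the recorded datum
p_exact = datum_probability(x, eps)
p_approx = normal_pdf(x) * eps # the usual pdf(x).eps approximation
# p_exact and p_approx are both strictly > 0, and agree closely for small eps
```

So a "continuous" datum always has a genuine, non-zero probability under the model, which is what lets Bayes's theorem be applied directly.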

However, that is not to say that making MML work in useful applications is then easy; in fact it can be quite difficult. After the self-evident observations above, a lot of hard work on efficient encodings, search algorithms, code books, invariance, Fisher information, fast approximations, robust heuristics, adaptations to specific problems, and all the rest, remained to be done. Fortunately, MML has been made to work in many general and useful applications, including areas such as bioinformatics, say.

BTW, given Bayes,
pr(H&D) = pr(H).pr(D|H) = pr(D).pr(H|D), so
pr(H|D) ~ pr(H).pr(D|H),
it is sometimes claimed that MML inference is just MAP inference but, in general, this is not the case. MML requires that one set not just the "height", pdf(H), but also choose the set of distinguishable(d) hypotheses {H1, H2, ...} and the optimal precision ("width") of each one. (It could be argued that MML is MAP done properly.)
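The role of the "width" can be seen in a small two-part message-length calculation. A minimal sketch for estimating a Gaussian mean with known σ, assuming (for illustration only) a uniform prior of width W over the mean: stating the estimate to precision ±δ/2 costs -log(δ/W), while rounding the estimate adds an expected F·δ²/24 to the data part, where F is the Fisher information. Minimising their sum gives the classical MML precision δ = sqrt(12/F); MAP, by contrast, has nothing to say about δ at all.

```python
import math

def expected_length(delta, n, sigma=1.0, prior_width=10.0):
    # two-part message length as a function of the estimate's precision delta,
    # omitting terms that do not depend on delta (e.g. the data code length
    # at the exact maximum-likelihood estimate):
    #   -log(delta / W)   ... cost of stating the estimate to +-delta/2
    #   F * delta^2 / 24  ... expected extra data cost due to rounding
    F = n / sigma ** 2            # Fisher information for a Gaussian mean
    return -math.log(delta / prior_width) + F * delta ** 2 / 24.0

n = 100                           # illustrative sample size
F = n / 1.0 ** 2
deltas = [0.001 * i for i in range(1, 2001)]           # grid over (0, 2]
best = min(deltas, key=lambda d: expected_length(d, n))
mml_delta = math.sqrt(12.0 / F)   # analytic optimum, sqrt(12/F)
# the grid minimiser agrees with sqrt(12/F); MAP inference never asks this question
```

Note that δ shrinks as n (hence F) grows: more data justifies stating the estimate more precisely, which is exactly the trade-off MAP ignores.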

-- L.A., 9/2011.