MML Glossary 

Bayes (1702-1761), as in Bayes's theorem: P(H&D) = P(H).P(D|H) = P(D).P(H|D).

Bayesian: Styles of inference (machine learning, statistics, etc.) that rely on Bayes's theorem and the use of priors.

Classification: See supervised and unsupervised classification.

Conditional Probability: P(B|A), the probability of B given A.

Conjugate prior: A family of prior distributions is conjugate for f(x|θ) if the posterior distribution is in the family whenever the prior is in the family.

Consistent: An estimator is consistent if it converges to the correct estimate (assuming that the model class includes the true model) as more and more data are made available.

Data Mining: Machine learning plus some aspects of databases, with the emphasis on (very) large data sets and efficient and robust (and sometimes ad hoc) methods. (If you know of a better, short definition, tell me.)

Data Space = Sample Space: Set of values from which data are drawn.

Estimate: θ̂ (theta-hat), a value of a parameter, θ, inferred from (i.e., fitted to) data.

Estimator: A function (mapping) from the data space to the space of parameter values.

Expected Future Data: Weighted (by posterior probability) average over all hypotheses (models, parameter estimates).

Fisher, R. A. (1890-1962).

Independent: A and B are independent if P(A&B) = P(A).P(B).

Invariant: An estimator is invariant if f'(e(D)) = e'(f(D)), where f is a monotonic transformation on the data space, and f' is the corresponding transformation on the parameter space.

Joint Probability: E.g., P(A&B), the joint probability of A and B. See conditional and independent.

Kullback-Leibler distance: Between two probability distributions, KL({p_{i}},{q_{i}}) = Σ_{i} p_{i}.log(p_{i}/q_{i}).

Likelihood: P(D|H), where D is the data set (training data), and H is a hypothesis (parameter estimate, model, theory).

MAP: Maximum a posteriori estimation. MML is equivalent to MAP only in the simplest cases; this is not true in general.

Markov Model of order k: Given a series x_{1}, x_{2}, x_{3}, ..., where P(x_{t}=e) can depend only on x_{t-k} to x_{t-1}.

MDL: Minimum Description Length, since J. Rissanen, Parameter Estimation by Shortest Description of Data, Proc. JACE Conf. RSME, pp.593, 1976. Also see MML below.

Message Length: The length, usually in bits, of a message in an optimal code encoding some event (or data D). Often as a two-part message, -log_{2}(P(H)) - log_{2}(P(D|H)). "Message" as in Shannon's mathematical theory of communication (1948).

Minimum EKL Estimator, MinEKL: The parameter estimate for a distribution (or model or hypothesis) that minimises the KL distance between the distribution and Expected Future Data, i.e., maximises the likelihood of Expected Future Data.

Mixture Model: The weighted average of two or more models, especially a mixture of probability distributions in unsupervised classification.

MML: Minimum Message Length, since C. S. Wallace & D. M. Boulton, An Information Measure for Classification, Computer Jrnl., 11(2), pp.185-194, 1968.

Multivariate: Data, distribution etc. having multiple attributes (variables).

Observation: A data item, e.g., from an experiment.

Ockham: As in Ockham's razor. Also Occam.

Odds ratio: Simply the ratio of two probabilities, P(A)/P(B). Also as in the posterior odds-ratio P(H_{1}|D)/P(H_{2}|D) = P(H_{1}).P(D|H_{1})/(P(H_{2}).P(D|H_{2})).

Prior: Before, particularly "before actual data are seen", as in prior probability distribution of parameters and/or models, P(H).

Posterior: After, particularly "after actual data are seen", as in posterior probability distribution of parameters and/or models, P(H|D) = P(H&D)/P(D) = P(H).P(D|H)/P(D).

Regression: To model, fit or infer, but particularly to fit a function (line, polynomial, etc.) through points {(x_{i},y_{i})} where y is dependent on x.

Sample Space: Space, set of values over which a random variable ranges.

Strict MML (SMML): See Farr and Wallace (2002).

Supervised Classification: To infer a function, c:S→T, a classification function, given examples (training data) drawn from S×T.
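
The Bayes's theorem, Posterior, Odds ratio and Message Length entries fit together: the difference between the two-part message lengths of two hypotheses is log_{2} of their posterior odds-ratio. A minimal Python sketch; the helper names and the coin-tossing numbers are hypothetical, chosen only for illustration:

```python
import math

def posterior(priors, likelihoods):
    """P(H|D) for each hypothesis H via Bayes's theorem:
    P(H|D) = P(H).P(D|H) / P(D), where P(D) = sum over H of P(H).P(D|H)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(H&D) for each H
    p_data = sum(joint)                                   # P(D)
    return [j / p_data for j in joint]

def two_part_length(prior, likelihood):
    """Two-part message length in bits: -log2 P(H) - log2 P(D|H)."""
    return -math.log2(prior) - math.log2(likelihood)

# Hypothetical example: H1 = fair coin, H2 = coin with P(heads) = 0.8,
# equal priors, D = one particular sequence of 8 heads and 2 tails.
priors = [0.5, 0.5]
likelihoods = [0.5 ** 10, 0.8 ** 8 * 0.2 ** 2]

post = posterior(priors, likelihoods)
odds = post[0] / post[1]  # posterior odds-ratio P(H1|D)/P(H2|D)

# Message length of H2 minus that of H1 is log2 of the odds-ratio.
diff = (two_part_length(priors[1], likelihoods[1])
        - two_part_length(priors[0], likelihoods[0]))
print(post, odds, 2 ** diff)
```

Because P(D) cancels in the odds-ratio, comparing two hypotheses by message length never requires computing P(D).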
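
A minimal sketch of the Kullback-Leibler formula for discrete distributions. It assumes log base 2, so the result is in bits (the natural log is also common), and that q_{i} > 0 wherever p_{i} > 0; the function name kl is only illustrative:

```python
import math

def kl(p, q):
    """KL({p_i},{q_i}) = sum_i p_i.log2(p_i/q_i), in bits.
    Terms with p_i = 0 contribute nothing; q_i must be > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]
print(kl(p, q), kl(q, p))  # the two directions give different values
```

Despite the name "distance", KL is not symmetric: KL({p_{i}},{q_{i}}) and KL({q_{i}},{p_{i}}) generally differ. It is zero when the two distributions are equal and positive otherwise.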
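
For the Markov Model of order k entry, the conditional probabilities P(x_{t}=e | x_{t-k} .. x_{t-1}) can be estimated by counting which symbol follows each context of k preceding symbols. This sketch uses plain relative frequencies (maximum likelihood, no prior) purely for brevity; an MML treatment would also charge for stating the parameters. fit_markov is a hypothetical name:

```python
from collections import Counter, defaultdict

def fit_markov(seq, k):
    """Estimate P(x_t = e | x_{t-k} .. x_{t-1}) by relative frequency."""
    counts = defaultdict(Counter)
    for t in range(k, len(seq)):
        context = tuple(seq[t - k:t])  # the k symbols preceding position t
        counts[context][seq[t]] += 1
    model = {}
    for ctx, c in counts.items():
        total = sum(c.values())
        model[ctx] = {e: n / total for e, n in c.items()}
    return model

model = fit_markov("abababab", k=1)
print(model)  # after 'a' always 'b', after 'b' always 'a'
```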
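
The Conjugate prior entry in its best-known instance: a Beta(α, β) prior on the probability of heads, combined with binomial data, gives a Beta posterior, so updating reduces to adding counts. A minimal sketch with an illustrative uniform Beta(1, 1) prior and made-up data; the function name is hypothetical:

```python
def beta_binomial_update(alpha, beta, heads, tails):
    """Beta(alpha, beta) prior on P(heads) plus binomial data gives a
    Beta(alpha + heads, beta + tails) posterior: it stays in the Beta family,
    which is what makes the Beta family conjugate for the binomial."""
    return alpha + heads, beta + tails

a, b = beta_binomial_update(1, 1, heads=8, tails=2)  # uniform prior Beta(1, 1)
posterior_mean = a / (a + b)
print(a, b, posterior_mean)  # prints: 9 3 0.75
```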

Univariate: Data, distribution etc. having one attribute.

Unsupervised Classification: To infer a mixture model from examples (data).

Variable (1): Random variable.

Variable (2): An attribute of an observation (thing), e.g., a column of a data set.

von Mises (-Fisher, vMF): Probability distributions on directions in R^{D}.

Wallace, C. S. (1933-2004).

Some sources


