MML Glossary
Bayes (1702-1761): As in Bayes's theorem, P(H&D) = P(H).P(D|H) = P(D).P(H|D).
Bayesian: Styles of inference (machine learning, statistics, etc.) that rely on Bayes's theorem and the use of priors.
Classification: See supervised and unsupervised classification.
Conditional Probability: P(B|A), the probability of B given A.
Conjugate prior: A family of prior distributions is conjugate for f(x|θ) if the posterior distribution is in the family whenever the prior is in the family, e.g., the Beta family is conjugate for the binomial distribution.
Consistent: An estimator is consistent if it converges to the correct estimate (assuming that the model class includes the true model) as more and more data are made available.
Data Mining: Machine learning + some aspects of databases, with the emphasis on (very) large data sets and efficient and robust (and sometimes ad hoc) methods. (If you know of a better, short definition, tell me.)
Data Space = Sample Space: Set of values from which data are drawn, e.g., {head, tail} for a sequence of coin tosses.
Estimate: Theta-hat, a value of a parameter, theta, inferred from (i.e., fitted to) data.
Estimator: A function (mapping) from the data space to the space of parameter values.
Expected Future Data: Weighted (by posterior probability) average over all hypotheses (models, parameter estimates).
Fisher, R. A. (1890-1962).
Independent: A and B are independent if P(A&B) = P(A).P(B).
Invariant: An estimator e is invariant if f'(e(D)) = e'(f(D)), where f is a monotonic transformation on the data space and f' is the corresponding transformation on the parameter space.
Joint Probability: E.g., P(A&B), the joint probability of A and B. See conditional and independent.
Kullback-Leibler distance: Between two probability distributions, KL({pi},{qi}) = Σi pi.log2(pi/qi); it is zero iff the two distributions are equal, and it is not symmetric (see the third sketch after the glossary).
Likelihood: P(D|H), where D is the data set (training data) and H is a hypothesis (parameter estimate, model, theory).
MAP: Maximum a posteriori estimation; MML is equivalent to MAP in the simplest cases only, not in general.
Markov Model of order k: A series x1, x2, x3, ... where P(xt=e) can depend on xt-k to xt-1 only.
MDL: Minimum Description Length, since J. Rissanen, Parameter Estimation by Shortest Description of Data, Proc JACE Conf. RSME, pp.593-, 1976. Also see MML below.
Message Length: The length, usually in bits, of a message in an optimal code encoding some event (or data D); often a two-part message, -log2(P(H)) - log2(P(D|H)) (see the first sketch after the glossary). "Message" is after Shannon's mathematical theory of communication (1948).
Minimum EKL Estimator, MinEKL: The parameter estimate for a distribution (or model or hypothesis) that minimises the KL distance between the distribution and expected future data, i.e., maximises the likelihood of expected future data.
Mixture Model: The weighted average of two or more models, especially a mixture of probability distributions in unsupervised classification.
MML: Minimum Message Length, since C. S. Wallace & D. M. Boulton, An Information Measure for Classification, Computer Journal, 11(2), pp.185-194, 1968.
Multivariate: Data, distribution, etc. having multiple attributes (variables).
Observation: A data item, e.g., from an experiment.
Ockham: As in Ockham's razor. Also Occam.
Odds ratio: Simply the ratio of two probabilities, P(A)/P(B); also as in the posterior odds ratio P(H1|D)/P(H2|D) = P(H1).P(D|H1)/(P(H2).P(D|H2)).
Prior: Before, particularly "before actual data are seen", as in the prior probability distribution of parameters and/or models, P(H).
Posterior: After, particularly "after actual data are seen", as in the posterior probability distribution of parameters and/or models, P(H|D) = P(H&D)/P(D) = P(H).P(D|H)/P(D) (see the second sketch after the glossary).
Regression: To model, fit or infer, but particularly to fit a function (line, polynomial, etc.) through points {(xi,yi)} where y is dependent on x.
Sample Space: Space, set of values, over which a random variable ranges.
Strict MML (SMML): See Farr and Wallace (2002).
Supervised Classification: To infer a function, c:S→T, a classification function, given examples (training data) drawn from S×T.
Univariate: Data, distribution, etc. having one attribute.
Unsupervised Classification: To infer a mixture model from examples (data).
Variable (1): Random variable.
Variable (2): An attribute of an observation (thing), e.g., a column of a data set.
von Mises(-Fisher, vMF): Probability distributions on directions in R^D.
Wallace, C. S. (1933-2004).
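To make the Message Length and MAP entries concrete, here is a minimal sketch in Python. The coin-toss data, the three candidate biases and the uniform prior are illustrative assumptions, not from any source cited above. It computes the two-part message length, -log2(P(H)) - log2(P(D|H)), over a small discrete hypothesis set; with such a set, the shortest message picks the same hypothesis as MAP, the "simplest case" equivalence noted in the glossary.

```python
import math

# Two-part message length in bits: -log2(P(H)) - log2(P(D|H)).
def message_length(prior, likelihood):
    return -math.log2(prior) - math.log2(likelihood)

# Illustrative data D: 8 heads in 10 coin tosses (an assumption for this sketch).
heads, n = 8, 10

# A small, discrete hypothesis set: three candidate biases, uniform prior.
priors = {0.2: 1 / 3, 0.5: 1 / 3, 0.8: 1 / 3}

def likelihood(p):
    # Binomial likelihood P(D | bias = p).
    return math.comb(n, heads) * p ** heads * (1 - p) ** (n - heads)

for p, prior in priors.items():
    print(f"bias={p}: {message_length(prior, likelihood(p)):.2f} bits")

# With a discrete hypothesis set, minimising the two-part message length
# maximises P(H).P(D|H), i.e., it coincides with MAP (the 'simplest case').
best = min(priors, key=lambda p: message_length(priors[p], likelihood(p)))
print("shortest message / MAP estimate: bias =", best)
```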
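Continuing the same illustrative set-up, a second sketch exercises the Posterior, Odds ratio and Expected Future Data entries: it computes P(H|D) = P(H).P(D|H)/P(D) by summing P(H).P(D|H) over the hypothesis set to obtain P(D), then a posterior odds ratio, then the posterior-weighted prediction of the next observation. Again, all numbers are assumptions chosen only to exercise the formulas.

```python
import math

# Same illustrative set-up as the previous sketch.
heads, n = 8, 10
priors = {0.2: 1 / 3, 0.5: 1 / 3, 0.8: 1 / 3}

def likelihood(p):
    # Binomial likelihood P(D | bias = p).
    return math.comb(n, heads) * p ** heads * (1 - p) ** (n - heads)

# Posterior: P(H|D) = P(H).P(D|H) / P(D), where P(D) = sum over H of P(H).P(D|H).
joint = {p: prior * likelihood(p) for p, prior in priors.items()}
p_d = sum(joint.values())
posterior = {p: j / p_d for p, j in joint.items()}

for p, post in posterior.items():
    print(f"P(bias={p} | D) = {post:.4f}")

# Posterior odds ratio P(H1|D)/P(H2|D) for two of the hypotheses.
print(f"posterior odds, 0.8 vs 0.5: {posterior[0.8] / posterior[0.5]:.2f}")

# Expected future data: the posterior-weighted average over all hypotheses,
# here the predictive probability that the next toss is a head.
print(f"P(next toss is a head) = {sum(post * p for p, post in posterior.items()):.4f}")
```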
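Finally, a third sketch of the Kullback-Leibler distance, KL({pi},{qi}) = Σi pi.log2(pi/qi), between two small made-up discrete distributions (the numbers are assumptions); it also shows that KL is not symmetric, which is why "distance" is used loosely.

```python
import math

# KL distance between discrete distributions {pi} and {qi}, in bits:
# KL({pi},{qi}) = sum_i pi.log2(pi/qi); zero iff the distributions are equal.
def kl(ps, qs):
    return sum(p * math.log2(p / q) for p, q in zip(ps, qs) if p > 0)

# Two small, made-up distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print(f"KL(p,q) = {kl(p, q):.4f} bits")
print(f"KL(q,p) = {kl(q, p):.4f} bits")  # differs from KL(p,q): not symmetric
```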
© L. Allison, www.allisons.org/ll/ (or as otherwise indicated).