
SNOB is written in Fortran 77, and more recently C,
and is available from:
[Snob][2003],
[Snob][2001].
(NB. So-called "vanilla" Snob implemented only
multi-state and normal distributions as of 2004.)

SNOB is
Chris Wallace's
computer program for unsupervised classification of multivariate data.
The classification problem is sometimes called
clustering, mixture modelling or numerical taxonomy.
SNOB uses the Minimum [Message | Description] Length [Encoding]
(MML | MDL)
principle to decide upon the best classification of the data.
MML encoding is a realisation of
Ockham's razor.
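
As a rough illustration of the principle (not Snob's own code): a two-part message first states a hypothesis H and then states the data D assuming H is true, and its length is -log2 P(H) - log2 P(D|H) bits. The hypothesis giving the shortest total message is preferred. In the Python sketch below, the function names and the example probabilities are invented for illustration only.

```python
import math

def message_length_bits(prior_h, likelihood_d_given_h):
    """Length in bits of a two-part message: -log2 P(H) - log2 P(D|H)."""
    return -math.log2(prior_h) - math.log2(likelihood_d_given_h)

def best_hypothesis(candidates):
    """candidates maps a name to (P(H), P(D|H)); return the name
    whose two-part message is shortest."""
    return min(candidates, key=lambda h: message_length_bits(*candidates[h]))

# Hypothetical numbers, chosen only to show the trade-off.
candidates = {
    "simple":  (0.5, 0.001),   # cheap to state, fits the data less well: about 10.97 bits
    "complex": (0.01, 0.02),   # fits the data better, costly to state:   about 12.29 bits
}

best = best_hypothesis(candidates)
```

Here the complex hypothesis fits the data better, but the extra cost of stating it outweighs the saving, so the simple one gives the shorter message. This is the sense in which the encoding acts as Ockham's razor.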
SNOB is very efficient and can classify many tens of thousands
of "things" quickly,
where each thing can have tens of "attributes" (variables).
An attribute can be
continuous
(real-valued) or
discrete
(multi-state).
A "class" is defined by distributions on one or more,
but not necessarily all, attributes.
The number of classes, the classes, their
defining attributes and distributions, and class memberships
are all inferred by SNOB.
Chris wrote a later, more powerful
version, 'Factor Snob',
which includes hierarchical classes and factor analysis.
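
To make the idea of classes defined by distributions, with inferred memberships, more concrete, here is a minimal Python sketch of mixture modelling for one continuous attribute. It is not Snob. It assumes a fixed number of classes (two), uses plain maximum-likelihood expectation-maximization rather than MML estimates, and does not infer the number of classes. All names and the data are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_two_normals(xs, iterations=50, min_sigma=1e-3):
    """Fit a two-class mixture of normals to the values xs by EM.
    Returns (weights, means, sigmas, memberships)."""
    # Crude initialisation: class means at the data extremes.
    mus = [min(xs), max(xs)]
    spread = (max(xs) - min(xs)) or 1.0
    sigmas = [spread / 4.0, spread / 4.0]
    weights = [0.5, 0.5]
    memberships = []
    for _ in range(iterations):
        # E-step: partial (probabilistic) membership of each thing in each class.
        memberships = []
        for x in xs:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            memberships.append([pk / total for pk in p])
        # M-step: re-estimate each class from its weighted members.
        for k in range(2):
            nk = sum(r[k] for r in memberships)
            if nk < 1e-12:
                continue  # empty class: leave its parameters unchanged
            weights[k] = nk / len(xs)
            mus[k] = sum(r[k] * x for r, x in zip(memberships, xs)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(memberships, xs)) / nk
            sigmas[k] = max(math.sqrt(var), min_sigma)  # keep sigma away from zero
    return weights, mus, sigmas, memberships

# Hypothetical data: two well-separated groups of measurements.
xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, mus, sigmas, memberships = em_two_normals(xs)
```

Each row of memberships gives the probability that a thing belongs to each class, so a thing is assigned partially to classes rather than to exactly one.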
Selected Bibliography:
 C. S. Wallace.
Statistical and Inductive Inference by Minimum Message Length.
Springer, 2005.
 C. S. Wallace. Vanilla Snob. 2002
[www]['02].
 C. S. Wallace.
Classification by Minimum-Message-Length Inference.
Advances in Computing and Information - ICCI '90.
Springer-Verlag, LNCS 468, pp.72-81, 1990.
 C. S. Wallace & P. R. Freeman.
Estimation and Inference by Compact Coding.
J. R. Stat. Soc. B 49 pp.240-265, 1987.
 D. M. Boulton & C. S. Wallace.
An Information Measure for Hierarchic Classification.
Computer Journal 16(3) pp.254-261, 1973.
 D. M. Boulton & C. S. Wallace.
A Program for Numerical Classification.
Computer Journal 13(1) pp.63-69, 1970.
 C. S. Wallace & D. M. Boulton.
An Information Measure for Classification.
Computer Journal 11 pp.185-195, 1968.
 L. Allison.
Models for machine learning and data mining in functional programming.
J. Functional Programming,
15(1), pp.15-32, Jan. 2005.
 Includes the source code of the
core expectation-maximization (EM) algorithm for clustering.
©
L.A. / 1994,
1999, 2000, 2001, 2002, 2003, 2005, 2011

