SNOB is written in Fortran 77 and, more recently, in C. It is available from:
Snob (2003), Snob (2001).
(NB. As of 2004, so-called "vanilla" Snob implements multi-state and normal distributions.)

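SNOB is Chris Wallace's computer program for unsupervised classification of multivariate data. The classification problem is sometimes called clustering, mixture modelling or numerical taxonomy. SNOB uses the Minimum Message Length (MML) principle, closely related to Minimum Description Length (MDL), to decide upon the best classification of the data: the best classification is the one giving the shortest total message that states first the classification and then the data given that classification. MML encoding is thus a realisation of Ockham's razor, because a more complex classification must pay for its extra detail with a longer first part of the message.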
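As a rough illustration of the two-part message idea, the following Python sketch asks whether a run of coin tosses is better explained by a fair coin (no parameter to state) or by a coin with a fitted bias (one parameter to state). It is a toy, not SNOB's calculation: the cost of stating the parameter uses the common asymptotic approximation of 0.5·log2(N) bits rather than Wallace's exact MML formulae, and the function names are hypothetical.

```python
import math


def bits_fair(n: int) -> float:
    """Message length (bits) for n binary outcomes under a fixed fair coin.
    No parameter needs stating, so only the data part is sent: 1 bit each."""
    return float(n)


def bits_biased(heads: int, n: int) -> float:
    """Approximate two-part message length (bits) under a fitted biased coin.

    Part 1 (hypothesis): about 0.5 * log2(n) bits to state the bias.
    Part 2 (data): -sum log2 P(x | p_hat), with p_hat = heads / n.
    ASSUMPTION: the 0.5 * log2(n) term is an asymptotic approximation,
    not Wallace's exact MML87 parameter cost.
    """
    tails = n - heads
    data_bits = 0.0
    for count in (heads, tails):
        if count > 0:  # a zero count contributes nothing (0 * log 0 = 0)
            data_bits -= count * math.log2(count / n)
    return 0.5 * math.log2(n) + data_bits


def prefer_biased(heads: int, n: int) -> bool:
    """Ockham's razor via message length: choose the shorter total message."""
    return bits_biased(heads, n) < bits_fair(n)


print(prefer_biased(50, 100))  # False: a balanced run does not pay for a parameter
print(prefer_biased(90, 100))  # True: a skewed run does
```

The extra parameter is accepted only when the saving in the data part outweighs the cost of stating it, which is the trade-off that makes MML a form of Ockham's razor.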

SNOB is very efficient and can classify many tens of thousands of "things" quickly, where each thing can have tens of "attributes" (variables). An attribute can be continuous (real-valued) or discrete (multi-state).
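To show how one class can score a "thing" whose attributes are of mixed types, here is a minimal Python sketch. It assumes, as a simplification, that a class models a continuous attribute with a normal distribution and a multi-state attribute with a table of state probabilities, and that attributes are independent given the class. The class and function names and the example numbers are hypothetical. In MML a real value is also stated to some measurement accuracy, which would scale the density; that detail is omitted here.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class NormalAttr:
    """Continuous (real-valued) attribute modelled as N(mu, sigma^2)."""
    mu: float
    sigma: float

    def logp(self, x: float) -> float:
        # Natural-log density of the normal distribution.
        z = (x - self.mu) / self.sigma
        return -0.5 * z * z - math.log(self.sigma) - 0.5 * math.log(2 * math.pi)


@dataclass
class MultiStateAttr:
    """Discrete (multi-state) attribute: probs[s] is P(state s)."""
    probs: Sequence[float]

    def logp(self, x: int) -> float:
        return math.log(self.probs[x])


AttrModel = Union[NormalAttr, MultiStateAttr]


def class_loglik(thing: Sequence[Union[float, int]],
                 attrs: Sequence[AttrModel]) -> float:
    """Log-likelihood of one thing under one class, assuming the
    attributes are independent given the class (a simplification)."""
    return sum(a.logp(x) for a, x in zip(attrs, thing))


# Hypothetical class: one real-valued attribute, one 3-state attribute.
cls = [NormalAttr(mu=170.0, sigma=10.0), MultiStateAttr(probs=[0.2, 0.5, 0.3])]
print(class_loglik([175.0, 1], cls))  # about -4.04 (natural log)
```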

A "class" is defined by distributions on one or more, but not necessarily all, attributes. The number of classes, the classes, their defining attributes and distributions, and class memberships are all inferred by SNOB.
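Inferred memberships need not be all-or-nothing. A minimal Python sketch, assuming each class already has a mixing weight (its relative abundance) and a log-likelihood for the thing in question, computes the posterior probability that the thing belongs to each class. The function name and arguments are hypothetical, and SNOB's own treatment of partial membership within its message-length calculation is not reproduced here.

```python
import math
from typing import List, Sequence


def memberships(log_liks: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Posterior probability that one thing belongs to each class.

    log_liks[k]: natural-log likelihood of the thing under class k
                 (assumed to be computed elsewhere).
    weights[k]:  mixing proportion of class k (positive, summing to 1).
    Uses the log-sum-exp trick so very small likelihoods do not underflow.
    """
    scores = [math.log(w) + ll for w, ll in zip(weights, log_liks)]
    top = max(scores)
    unnorm = [math.exp(s - top) for s in scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Two equally common classes; the thing fits class 0 slightly better.
print(memberships([-1000.0, -1001.0], [0.5, 0.5]))
```

Working in log space matters in practice: with tens of attributes the raw likelihoods easily fall below the smallest representable floating-point number.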

Chris wrote a later, more powerful version, 'Factor Snob', which includes hierarchical classes and factor analysis.

Selected Bibliography:

C. S. Wallace. Statistical and Inductive Inference by Minimum Message Length. Springer, 2005.
C. S. Wallace. Vanilla Snob. 2002.
C. S. Wallace. Classification by Minimum-Message-Length Inference. Advances in Computing and Information - ICCI '90. Springer Verlag LNCS 468 pp.72-81, 1990.
C. S. Wallace & P. R. Freeman. Estimation and Inference by Compact Encoding. J. R. Stat. Soc. B 49 pp.240-265, 1987.
D. M. Boulton & C. S. Wallace. An Information Measure for Hierarchic Classification. Computer Journal 16(3) pp.254-261, 1973.
D. M. Boulton & C. S. Wallace. A Program for Numerical Classification. Computer Journal 13(1) pp.63-69, 1970.
C. S. Wallace & D. M. Boulton. An Information Measure for Classification. Computer Journal 11 pp.185-195, 1968.
L. Allison. Models for machine learning and data mining in functional programming. J. Functional Programming, 15(1), pp.15-32, Jan. 2005.
Includes the source code of the core expectation-maximization (EM) algorithm for clustering.
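For readers who want the shape of the EM algorithm mentioned in the entry above, here is a minimal Python sketch of plain maximum-likelihood EM for a one-dimensional Gaussian mixture with a fixed number of classes. It is not a transcription of that paper's code and uses maximum-likelihood rather than MML estimates; the names `em_mixture_1d` and `normal_logpdf`, the unit starting standard deviations and the `min_sigma` floor are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def normal_logpdf(x: float, mu: float, sigma: float) -> float:
    """Natural-log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def em_mixture_1d(
    xs: Sequence[float],
    init_means: Sequence[float],
    iters: int = 50,
    min_sigma: float = 1e-3,
) -> Tuple[List[float], List[float], List[float]]:
    """Maximum-likelihood EM for a 1-D Gaussian mixture, K = len(init_means).

    Returns (weights, means, sigmas) after a fixed number of iterations.
    """
    k_count = len(init_means)
    n = len(xs)
    weights = [1.0 / k_count] * k_count
    means = list(init_means)
    sigmas = [1.0] * k_count  # assumed starting spread
    for _ in range(iters):
        # E-step: resp[i][k] = P(class k | x_i), via log-sum-exp.
        resp = []
        for x in xs:
            scores = [math.log(weights[k]) + normal_logpdf(x, means[k], sigmas[k])
                      for k in range(k_count)]
            top = max(scores)
            unnorm = [math.exp(s - top) for s in scores]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate each class from its fractional members.
        for k in range(k_count):
            nk = max(sum(r[k] for r in resp), 1e-12)  # guard against an empty class
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigmas[k] = max(math.sqrt(var), min_sigma)  # avoid zero spread
    return weights, means, sigmas


data = [0.0, 0.2, -0.1, 0.1, 10.0, 10.2, 9.9, 9.8]
w, m, s = em_mixture_1d(data, init_means=[1.0, 8.0])
print(w, m, s)
```

EM only fits a mixture with a given number of classes; choosing that number, and which attributes define each class, is where a message-length criterion such as MML comes in.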
© L.A. / 1994, 1999, 2000, 2001, 2002, 2003, 2005, 2011

© L. Allison (or as otherwise indicated).