
Curve fitting:
Given a sequence of `n' points,
[<x_{0},y_{0}>, ...,
<x_{n1},y_{n1}>],
what is the optimal piecewise linear curve
that can be fitted to the points?
We want "optimal" to mean "most probable" given the data,
i.e. having the largest posterior probability.
We must therefore formulate a model for the data
and the noise / measurement inaccuracy in the data.
Polygon Fitting:
The sequence of points is considered to be closed, circular,
i.e.
<x_{n},y_{n}> =
<x_{0},y_{0}>,
and we fit a polygon to them.
The data might, for example, come from
digitising the boundary of an object in an image.
Error can be introduced by original photography,
scanning, identification of the boundary, and
sampling of points on the boundary.
Noise or measurement error in the data
is assumed to be due to some random process, but it can be modelled.
e.g. We might assume
that if a point "really" comes from a certain line segment
then the error perpendicular to the line is modelled by a
normal distribution,
say.
Hypothesis complexity should clearly be
taken into account because as the number of line segments
increases towards n1 it is possible to fit the data with zero error
and, without care, this could lead to overfitting.
Minimum
message length
(MML) inference takes the hypothesis, H, into account
with a twopart message:
msgLen(H&D) = msgLen(H) + msgLen(D  H).
A complex hypothesis with many segments might fit D very well,
i.e. P(D  H) is high and msgLen(D  H) is low,
but it must also pay for its own cost, i.e. msgLen(H).
Related problems are:
 Use of basis functions other than straight lines, e.g.
 Fourier components, as in Patrick (1979), Patrick & Wallace (1982).
 Pointsets: we are given an unordered set of points
to be described by one or more straight lines etc..
 Function approximation:
here x_{i} < x_{i+1},
and we usually consider the x_{i} to be exact
and only the y_{i} to contain possible noise or error.
e.g.
 Segmentation: fitting piecewise constant functions
 Approximation by piecewise straight lines or
other basis curves etc.
MML / MDL:
Patrick (1979) used Fourier components to describe megalithic stone circles.
He used MML to come to the conclusion that
they really are rough circles rather than
more elaborate geometries that had been proposed,
and still are proposed regularly.
Georgeff and Wallace (1983, 1984)
described a minimum message length criterion
for fitting one or more (straight) line segments
through a set of data points.
Banerjee et al (1996) gave an MDL method for polygon fitting
and the Java applet below implements their algorithm.
They made the simplifying assumptions that
(i) the polygon's vertices (knots) are
a subset of the given data points,
(ii) the first data point is a vertex,
and thereby achieved an O(n^{2})time algorithm.
 L. Allison
Notes
 W. D. Fisher.
On Grouping for Maximum Homegeneity.
Jrnl. Am. Stat. Soc. 53 pp.789798, 1958
Gave an O(s.n^{2})time algorithm for
finding the optimum (minimum sum of squared errors) segmentations
of a sequence of n points into 1, 2, ..., s segments.
Did not have a criterion for stopping,
i.e. for choosing the best `s'.
 J. D. Patrick.
An Information Measure Comparative Analysis of Megalithic Geometries.
(i.e. stonecircles) Ph.D Thesis,
Monash University, 1979
It is a good read. See also ...
J. D. Patrick & C. S. Wallace.
Stone Circle Geometries: An Information Theory Approach.
In Archaeoastronomy in the Old World,
D. Heggie (ed), C.U.P., 1982
 C. S. Wallace & M. P. Georgeff.
A General Objective for Inductive Inference.
TR#32,
Dept. Computer Science, Monash University, Australia 3800,
March 1983.
[HTML]
Fuller version of Georgeff and Wallace (1984).
Includes PFSA's.
 M. P. Georgeff & C. S. Wallace.
A General Selection Criterion for Inductive Inference.
European Conf. on Artificial Intelligence, Pisa, pp.473482, Sept. 1984
Presented minimum message/description length
(MML  MDL) methods for the inference of
probabilistic finite state automata (PFSA),
fitting curves through points, and
fitting straight lines through points.
See also ...
 M. P. Georgeff & C. S. Wallace.
A General Selection Criterion for Inductive Inference.
TR#44, Dept. Computer Science (later
School of Comp. Sci. and Software Eng.),
Monash University, Australia 3800, June 1984
 S. Itoh.
An Algorithm for the Piecewise Linear Approximation of Planar Curves.
Proc. IEEE Symp. on Information Theory (ISIT), pp.62, 1990
 S. Banerjee, W. Niblack & M. Flickner.
A Minimum Description Length Polygon Approximation method.
IBM Research report RJ 10007 (89096) pp.19 Dec' 1996
The authors credit Itoh's work (1990) as
being "of particular relevance" to their paper.
 L. J. Fitzgibbon, L. Allison & D. L. Dowe.
Minimum message length grouping of ordered data.
Proc. 11th Int. Workshop on Algorithmic Learning Theory,
H. Arimura, S. Jain & A. Sharma (eds), pp.5670, Sydney,
Springer Verlag, LNCS, Dec' 2000.
Uses an MML cost criterion to give a stopping
condition to Fisher's (1958) algorithm and hence find an
optimal segmentation of multivariate data.
Means and variances can differ from segment to segment.
(NB. Needs Java on)
Draw shapes in the window (hold the left mouse button down).
The faster and slower buttons vary the sampling rate.
Closed or open curves can be drawn.
The MDL button fits a polygon through the last shape drawn
using the MDL fitting method.
© 1996, 1999

[example].

