|
Abstract.
This paper investigates the coding of change-points in
the information-theoretic Minimum Message Length
(MML) framework.
Change-point coding regions affect model selection and
parameter estimation in problems such as time series segmentation and
decision trees. The Minimum Message Length (MML) and
Minimum Description Length (MDL78) approaches to change-point problems
have been shown to perform well by several authors.
In this paper we compare some published MML and MDL78 methods
and introduce some new MML approximations called `MMLDc' and `MMLDF'.
These new approximations are empirically compared with Strict MML (SMML),
Fairly Strict MML (FSMML), MML68, the Minimum Expected Kullback-Leibler
Distance (MEKLD) loss function and MDL78 on a tractable binomial
change-point problem.
References
- Wallace, C.S., Boulton, D.M.:
An information measure for classification.
Computer Journal 11 (1968) pp185-194
- Wallace, C.S., Freeman, P.R.:
Estimation and inference by compact encoding (with discussion).
Journal of the Royal Statistical Society. Series B (Methodological)
49 (1987) pp240-265
- Wallace, C.S., Dowe, D.L.:
Minimum message length and Kolmogorov complexity.
Computer Journal 42 (1999) pp270-283
- Baxter, R.A., Oliver, J.J.:
The kindest cut: minimum message length segmentation.
In Arikawa, S., Sharma, A.K., eds.: Proceedings of the Seventh
International Workshop on Algorithmic Learning Theory. Volume 1160 of LNCS.,
Springer-Verlag Berlin (1996) pp83-90
- Oliver, J.J., Forbes, C.S.:
Bayesian approaches to segmenting a simple time series.
Technical Report 97/336, Department Computer Science, Monash
University, Australia 3168 (1997)
- Oliver, J.J., Baxter, R.A., Wallace, C.S.:
Minimum message length segmentation.
In Wu, X., Kotagiri, R., Korb, K., eds.: Research and Development in
Knowledge Discovery and Data Mining (PAKDD-98), Springer (1998) pp83-90
- Viswanathan, M., Wallace, C.S., Dowe, D.L., Korb, K.:
Finding cutpoints in noisy binary sequences - a revised empirical
evaluation.
In: Australian Joint Conference on Artificial Intelligence. (1999)
- Fitzgibbon, L.J., Allison, L., Dowe, D.L.:
Minimum message length grouping of ordered data.
In Arimura, H., Jain, S., eds.: Proceedings of the Eleventh
International Conference on Algorithmic Learning Theory (ALT2000). LNAI,
Springer-Verlag Berlin (2000) pp56-70
- Farr, G.E., Wallace, C.S.:
Algorithmic and combinatorial problems in strict minimum message
length inference.
In: Research on Combinatorial Algorithms. (1997) pp50-58
- Dowe, D.L., Baxter, R.A., Oliver, J.J., Wallace, C.S.:
Point estimation using the Kullback-Leibler loss function and
MML.
In: Pacific-Asia Conference on Knowledge Discovery and Data Mining
(PAKDD98). Volume LNAI of 1394., Springer-Verlag (1998) pp87-95
- Baxter, R.A.:
Minimum Message Length Inductive Inference: Theory and Applications.
PhD thesis, Department of Computer Science, Monash University (1996)
- Wallace, C.S.:
PAKDD-98 Tutorial: Data Mining.
Monash University, Australia (Book in preparation) (1998)
- Fisher, W.D.:
On grouping for maximum homogeneity.
Journal of the American Statistical Society 53 (1958)
pp789-798
- Kearns, M., Mansour, Y., Ng, A.Y., Ron, D.:
An experimental and theoretical comparison of model selection
methods.
Machine Learning 27 (1997) pp7-50
- Lam, E.:
Improved approximations in MML.
Honours thesis, Monash University, School of Computer Science and
Software Engineering, Monash University, Clayton, Australia (2000)
- Rissanen, J.J.:
Modeling by shortest data description.
Automatica 14 (1978) pp465-471
|
|