Ordered Discrete Data
If the frequency counts for the categories are quite large, the data can be modelled successfully by a multi-state distribution. If the frequency counts are small, i.e. there is little data for the number of categories, the cost of stating all the parameters of an unconstrained multi-state distribution may be prohibitive. This can cause a problem, e.g. in decision trees, where the amount of data arriving at a particular leaf can be small. It is best to retain and exploit the fact that the data are ordered (class Ord in Haskell) and discrete; this can be used to advantage, e.g. in partitioning such data-spaces.

Priors
The key question is
what prior should be placed on distributions for ordered discrete data,
i.e. what kind of distribution should be favoured?
It is clear that, given sufficiently convincing data, either

Unimodal
Given M categories, let the largest probability be T_i, 1 <= i <= M. Then T_1, ..., T_{i-1} must be non-decreasing, and T_{i+1}, ..., T_M must be non-increasing.
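
The unimodality condition above can be checked mechanically. The following is a minimal sketch in Python; the function name `is_unimodal` is an assumption made for illustration and does not come from the original text. It walks up the sequence while it is non-decreasing, then down while it is non-increasing, and accepts if the whole sequence has been consumed. Ties (plateaus) are allowed, and a non-empty list is assumed.

```python
def is_unimodal(probs):
    """Return True if probs rises (weakly) to a peak and then falls (weakly).

    Hypothetical helper for illustration; assumes probs is non-empty.
    """
    n = len(probs)
    i = 0
    # Climb while the sequence is non-decreasing.
    while i + 1 < n and probs[i + 1] >= probs[i]:
        i += 1
    # Descend while the sequence is non-increasing.
    while i + 1 < n and probs[i + 1] <= probs[i]:
        i += 1
    # Unimodal exactly when the two passes reach the last element.
    return i == n - 1


print(is_unimodal([0.1, 0.2, 0.4, 0.2, 0.1]))  # True
print(is_unimodal([0.3, 0.1, 0.3, 0.3]))       # False
```

Note that monotone and flat distributions count as unimodal under this check, with the mode at one end.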

Message length
A reasonable approach is to code the parameters of a unimodal distribution by the method used for the (unconstrained) multi-state distribution. There is some slight inefficiency if the probabilities of two adjacent categories are close in value relative to their uncertainties.

Estimator (search)
There is unlikely to be a closed form for the MML estimator for a unimodal distribution. However, a constrained search of the unimodal region should not be too difficult. A "smoothed" distribution derived from the obvious unconstrained M-state estimate may provide a good starting point.
-- LA, CSW, 26/3/'02

The `uni1' code is based on counting
frequencies of letters, from the left,
while forcing the frequency counts to remain unimodal at all times.
The `uni2' code is better, based on the unordered MML code,
smoothed to make it unimodal if necessary.
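
The text does not say how the smoothing is done, so the following Python sketch is only one plausible reading, not the author's `uni2` method. It assumes that "smoothing" means replacing the raw frequencies with the closest unimodal sequence: for each split point it pools adjacent violators (a standard isotonic-regression technique) to get a non-decreasing left part and a non-increasing right part, and it keeps the split with the highest log-likelihood. The names `pava_nondecreasing` and `unimodal_smooth` are hypothetical, and the result is a likelihood-based estimate that could serve as a starting point for a search, not an MML estimate.

```python
import math


def pava_nondecreasing(values):
    """Unweighted pool-adjacent-violators fit: closest non-decreasing sequence."""
    blocks = []  # each block is [sum, length]; its fitted value is sum / length
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    out = []
    for s, n in blocks:
        out.extend([s / n] * n)
    return out


def unimodal_smooth(counts):
    """Smooth category counts into a unimodal probability vector.

    Assumes at least one count is positive. Pooling preserves the total,
    so the returned probabilities sum to 1.
    """
    total = sum(counts)
    best, best_ll = None, -math.inf
    for k in range(len(counts) + 1):
        # Non-decreasing fit on the left, non-increasing fit on the right.
        left = pava_nondecreasing(counts[:k])
        right = pava_nondecreasing(counts[k:][::-1])[::-1]
        probs = [c / total for c in left + right]
        # Log-likelihood of the data; zero counts contribute nothing.
        ll = sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)
        if ll > best_ll:
            best, best_ll = probs, ll
    return best


# Counts 2, 5, 3, 6, 1 are not unimodal (5, 3, 6); the 5 and 3 get pooled.
print(unimodal_smooth([2, 5, 3, 6, 1]))
```

Counts that are already unimodal pass through unchanged apart from normalisation, which matches the "if necessary" in the description above.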
© L. Allison, www.allisons.org/ll/ (or as otherwise indicated).