Multistate and Multinomial Distributions


The MML estimator for an M-state distribution gives θi = (ni + 1/2)/(N + M/2), where ni is the number of observations of state i in a total of N observations, N = ∑i=1..M ni.
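As a quick sketch (Python; not part of the original page), the estimator can be computed directly from the observed counts:

```python
def mml_multistate(counts):
    """MML estimate theta_i = (n_i + 1/2) / (N + M/2) from observed counts."""
    N = sum(counts)
    M = len(counts)
    return [(n + 0.5) / (N + M / 2) for n in counts]

# e.g. a 3-state source observed N = 8 times with counts 4, 1, 3:
print(mml_multistate([4, 1, 3]))  # [0.4736..., 0.1578..., 0.3684...]
```

Note that the estimates always sum to one and are never exactly 0 or 1, unlike the maximum-likelihood estimates ni/N.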

In general, the volume of the uncertainty region for the MML estimate of k parameters θ = ⟨θ1, θ2, ..., θk⟩ is approximately √(12^k/F(θ)), where F(θ) is the [Fisher information]. Note that k = M-1 for the M-state distribution.
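For instance, in the 2-state case (k = 1), the Fisher information is F(θ) = N/(θ·(1-θ)), a special case of the M-state result given later on this page; the width of the uncertainty region can then be sketched as:

```python
import math

def uncertainty_width(theta, N):
    """Width sqrt(12/F) of the MML uncertainty region for a 2-state source,
    using F(theta) = N / (theta * (1 - theta))."""
    F = N / (theta * (1.0 - theta))
    return math.sqrt(12.0 / F)

print(uncertainty_width(0.5, 100))  # sqrt(0.03), roughly 0.17
```

The region shrinks like 1/√N, so more data pins the estimate down more precisely.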

3-States, M=3

A 3-state source has two parameters, θ1 and θ2, each in [0,1]; i.e., θ = ⟨θ1, θ2⟩. It is convenient to define θ3, also in [0,1], where θ3 = 1 - θ1 - θ2, but θ3 is not a third (free) parameter. We observe n1 occurrences of state 1, n2 of state 2 and n3 of state 3, where N = n1 + n2 + n3. The likelihood is LH = θ1^n1 · θ2^n2 · θ3^n3, so ...

-log LH = -n1·log θ1 - n2·log θ2 - n3·log(1 - θ1 - θ2)
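A minimal sketch of this negative log-likelihood (Python; the function name is made up for illustration):

```python
import math

def neg_log_lh(theta1, theta2, counts):
    """-log LH for a 3-state source; theta3 = 1 - theta1 - theta2."""
    n1, n2, n3 = counts
    theta3 = 1.0 - theta1 - theta2
    return -(n1 * math.log(theta1)
             + n2 * math.log(theta2)
             + n3 * math.log(theta3))

# -log LH is minimised at the maximum-likelihood estimate theta_i = n_i / N:
counts = (4, 3, 3)
print(neg_log_lh(0.4, 0.3, counts) <= neg_log_lh(0.35, 0.35, counts))  # True
```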

d/dθ1 {-log LH} = -n1/θ1 + n3/(1 - θ1 - θ2)
d/dθ2 {-log LH} = -n2/θ2 + n3/(1 - θ1 - θ2)

d²/dθ1² {-log LH} = n1/θ1² + n3/(1 - θ1 - θ2)²
d²/dθ2² {-log LH} = n2/θ2² + n3/(1 - θ1 - θ2)²
d²/dθ1 dθ2 {-log LH} = n3/(1 - θ1 - θ2)² = d²/dθ2 dθ1 {-log LH}

The expectation of n1 over the data space is N·θ1, and similarly for n2 and n3, so the Fisher information is ...

F(θ) = det | N/θ1 + N/θ3   N/θ3          |
           | N/θ3          N/θ2 + N/θ3   |

     = (N²/θ3²) · det | (1-θ2)/θ1   1            |
                      | 1           (1-θ1)/θ2    |

     = (N²/θ3²) · { (1-θ1)·(1-θ2)/(θ1·θ2) - 1 }

     = (N²/θ3²) · (1 - θ1 - θ2)/(θ1·θ2)

     = N²/(θ1·θ2·θ3)
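This result is easy to check numerically; the following sketch (Python with NumPy, assumed available) compares the determinant of the expected second-derivative matrix with N²/(θ1·θ2·θ3):

```python
import numpy as np

N = 100.0
t1, t2 = 0.2, 0.5
t3 = 1.0 - t1 - t2

# expected Fisher matrix for the 3-state distribution
F = np.array([[N / t1 + N / t3, N / t3],
              [N / t3,          N / t2 + N / t3]])

print(np.linalg.det(F))          # determinant of the matrix ...
print(N ** 2 / (t1 * t2 * t3))   # ... equals N^2/(t1 t2 t3)
```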

M-States

It can be shown that for M states, i.e., M-1 parameters, with probabilities θ1, θ2, ..., θM-1, and θM = 1 - θ1 - ... - θM-1, θ = ⟨θ1, ..., θM-1⟩, that F(θ) = N^(M-1) / (θ1·θ2·...·θM).
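The general formula can also be verified numerically for an arbitrary M. In this sketch (Python with NumPy; the Dirichlet draw is just a convenient way to get random probabilities summing to one), the expected Fisher matrix over the M-1 free parameters has entries F_ij = N/θM + δij·N/θi, generalising the 3-state matrix above:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 5, 50.0
theta = rng.dirichlet(np.ones(M))  # random theta_1 .. theta_M, summing to 1

# expected Fisher matrix over the M-1 free parameters:
# F_ij = N/theta_M + (N/theta_i if i == j else 0)
F = N / theta[-1] + np.diag(N / theta[:-1])

print(np.linalg.det(F))                # determinant ...
print(N ** (M - 1) / np.prod(theta))   # ... equals N^(M-1)/(theta_1...theta_M)
```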

Demonstration

Use the HTML FORM below to generate a data sample for specified probabilities and length N. The 'code' button calculates message lengths for various codes. Note that the approximations used may break down for very small values of N.

Requirements: 1 ≤ M ≤ 10, θi > 0 for each i ∈ [1, M] (the θi will be normalised), N ≥ 0, and sample ∈ [0, M-1]^N.
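The sample-generation step of the demonstration can be sketched like this (Python; the function name is made up, and random.choices normalises the weights just as the form normalises the θi):

```python
import random

def generate_sample(theta, N, seed=1):
    """Draw N observations of states 0 .. M-1; theta need not sum to 1
    because random.choices normalises the weights internally."""
    rng = random.Random(seed)
    return rng.choices(range(len(theta)), weights=theta, k=N)

sample = generate_sample([1, 1, 2], 10)
print(sample)  # 10 values, each in {0, 1, 2}
```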

Notes

• C. S. Wallace & D. M. Boulton. An Information Measure for Classification. Computer Journal 11(2) pp185-194, Aug 1968
(see the appendix) and ...
• D. M. Boulton & C. S. Wallace. The Information Content of a Multistate Distribution. J. Theor. Biol. 23 pp269-278, 1969.
When these papers were written there were different notions of the information content of a sequence in use in the literature. W&B showed that if the calculations were done correctly and if all information was truly taken into account then these notions gave essentially the same answer.
And a delightful piece of trivia about dice via Dean McKenzie [7/1999]:
'Several decades ago, the Harvard statistician Frederick Mosteller had an opportunity to test the [dice-tossing] model against the behavior of real dice tossed by a real person. A man named Willard H. Longcor, who had an obsession with throwing dice, came to him with an amazing offer to record the results of millions of tosses. Mosteller accepted, and some time later he received a large crate of big manila envelopes, each of which contained the results of twenty thousand tosses with a single die and a written summary showing how many runs of different kinds had occurred. "The only way to check the work was by checking the runs and then comparing the results with theory," Mosteller recalls. "It turned out [Longcor] was very accurate." Indeed, the results even highlighted some errors in the then-standard theory of the distribution of runs'.
- Peterson, I. (1998) The Jungles of Randomness. Penguin, London. pp7-8. (originally published 1998 by Wiley, New York).