
 The adaptive code for a multinomial:
Given a sequence of N data,
[x_{1}, ..., x_{N}],
each datum having one of k possible values,
{v_{1}, ..., v_{k}},
keep a running counter for each of the k possible values, e.g.,
two counters for {H,T},
four counters for
DNA, etc..
Initialise each counter to 1, not 0.
Process the sequence from left to right and
at each step, estimate the probability of the current datum's value
as the datum's value's counter divided by the sum of all the counters;
only then update the datum's value's counter,
not before^{[a]}.
A binomial (k=2) example:


data→  H   T   T   H   T   T   (N=6) 
counters 
1  
2  
2  
2  
3  
3    
1  
1  
2  
3  
3  
4  
 
sum 
2  
3  
4  
5  
6  
7  
 
pr 
 1 2 
×  1 3 
×  2 4 
×  2 5 
×  3 6 
×  4 7 
=  #H!.#T! (N+1)! 

 In general
 pr([x_{1}, ..., x_{N}])
= { ∏_{i=1..k} #v_{i}! }
/ { (N+k1)! / (k1)! },
 where #v_{i} is the number of v_{i} in the sequence.

 It can be seen that every permutation of the sequence has
the same probability, which is good.

 It can be shown that
the 1part adaptive and "combinatorial" codes give exactly the
same message length, and that
the 2part MML code's
message is longer,
but only by a fraction of a bit for each of
the multinomial's k1 parameters;
MML is also the only one of the three codes to state a model, and
there ain't no free lunch.

 A convenient programming trick for
calculating the length of the
first part (model complexity) of the 2part
MML message is
to work from the adaptive code's message length, plus
MML's small overall length increment, less
the length of the MML message's second (datamodel) part.

 ^{[a]}Note that both the
transmitter and the receiver can keep identical counters in
synchronisation and thus the receiver can decode the message.

