Integer Distributions

This section considers positive integers, n≥1, unless otherwise stated. (If you have non-negative integers, n≥0, just add one before encoding and subtract one after decoding. If you also have negative integers, interleave them: 0, +1, -1, +2, -2, ..., say.) Positive integers form the first, and perhaps the most fundamental, infinite data-space. A code over positive integers can be used to transmit data from any enumerable data-space.
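
For example, the interleaving of all integers onto the positive integers can be written as a pair of small functions; this is just a sketch in Python, and the names toPos and fromPos are illustrative only:

  # interleave 0, +1, -1, +2, -2, ...  onto  1, 2, 3, 4, 5, ...
  def toPos(i):
      return 2 * i if i > 0 else 1 - 2 * i

  def fromPos(n):   # the inverse mapping
      return n // 2 if n % 2 == 0 else (1 - n) // 2

  assert all(fromPos(toPos(i)) == i for i in range(-100, 101))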

The Geometric and Poisson distributions are examples of parameterised probability distributions; the probability distributions below have no parameters.

1/(n(n+1))

A parameter-less probability distribution for positive integers:

pr(n) = 1/(n(n+1)),   n > 0
This is a proper probability distribution:
Σn≥1 1/(n(n+1))  =  Σn≥1 {1/n - 1/(n+1)}  =  1 - 1/2 + 1/2 - 1/3 + 1/3 - ...  =  1
So a non-redundant code can be based on it:
msgLen(n) = log2(n) + log2(n+1),   for n > 0

The expectation of the distribution is infinite:

Σi≥1 i/(i(i+1))  =  Σi≥1 1/(i+1)  =  ∞

So perhaps it is not a good distribution to use if you anticipate mostly small integers.

n     probability
1     1/2
2     1/6
3     1/12
4     1/20
...   ...
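
A few lines of Python (just a sketch) confirm the table above and the convergence of the partial sums to one:

  from math import log2

  def pr(n):        # pr(n) = 1/(n(n+1))
      return 1.0 / (n * (n + 1))

  def msgLen(n):    # code length in bits, -log2 pr(n)
      return log2(n) + log2(n + 1)

  print([pr(n) for n in [1, 2, 3, 4]])          # 0.5, 0.1666..., 0.0833..., 0.05
  print(sum(pr(n) for n in range(1, 10**6)))    # 0.999999, tending to 1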

1/n

Note that   pr(n) ~ 1/n   cannot be a proper probability distribution because the harmonic series diverges:

Σn≥1 1/n = ∞

Therefore a code with message lengths ~ -log2(1/n) = log2(n), for all n≥1, is impossible. However, the log* distribution (and code) gets close to 1/n.

Elias omega, log2* and Relatives

There is more on Universal codes [here].

If you know that an integer, n, lies in the interval [1,N] (or in [0,N-1]) then it can be encoded in log2(N) bits, (and this is an optimal code if the probability distribution is uniform). What to do when there is no such bound N? Obviously transmit the length of the code-word for n first. But how to transmit the length? Transmit its length first, of course! A sound code can in fact be based on this intuitive idea; note that logk(n) decreases very rapidly as k increases.

The leading bit of n is necessarily '1' so there is no need to transmit it, except that it can be used as a flag to determine whether the current value is a length or the final value of n proper; lengths are thus given a leading '0'. Such a prefix code can be used to code integers of arbitrary size. Unfortunately, the length of a code-word as a function of n is neither convex nor smooth, although it is monotonic increasing+.

n    components    code-word
1    1             1
2    2,2           00 10
3    2,3           00 11
4    2,3,4         00 01 100,    e.g., ~ code'(3)++100
5    2,3,5         00 01 101
6    2,3,6         00 01 110
7    2,3,7         00 01 111
8    2,3,4,8       00 01 000 1000,   e.g., ~ code'(4)++1000
9    2,3,4,9       00 01 000 1001
10   2,3,4,10      00 01 000 1010
...
15   2,3,4,15      00 01 000 1111
16   2,3,5,16      00 01 001 10000,  e.g., ~ code'(5)++10000
...
Note, the spaces in the above code-words are only there to make the components stand out.
The code above is valid, but not at all efficient. In fact we can do better, that is, achieve a non-redundant code, by using lengths minus one:
n    components    code-word                                  prob
1    1             1                                          1/2
2    1,2           0 10                                       1/8
3    1,3           0 11
4    1,2,4         0 00 100,    e.g., ~ code'(2)++100         1/64
5    1,2,5         0 00 101
6    1,2,6         0 00 110
7    1,2,7         0 00 111
8    1,3,8         0 01 1000,   e.g., ~ code'(3)++1000        1/128
9    1,3,9         0 01 1001
10   1,3,10        0 01 1010
...
15   1,3,15        0 01 1111
16   1,2,4,16      0 00 000 10000,  e.g., ~ code'(4)++10000   1/2048
...
msgLen(n) = 1 + floor(log2 n) + msgLen(floor(log2 n)),   if n > 1
          = 1,   if n = 1
pr(n) = 2^(-msgLen(n))

The probability distribution pr(n) has an infinite expectation: for large n, the probability of n is greater than under the 1/(n(n+1)) distribution, which itself has an infinite expectation.

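To make the construction concrete, here is a sketch in Python of an encoder and decoder for the non-redundant code tabulated above; the names encode and decode are illustrative only, and msgLen(n) above is then just len(encode(n)):

  def encode(n):                   # code-word of positive integer n, as a string of bits
      assert n >= 1
      chain = []                   # e.g., n=16 gives the components 1, 2, 4, 16
      while True:
          chain.append(n)
          if n == 1:
              break
          n = n.bit_length() - 1   # floor(log2 n), a length minus one
      chain.reverse()
      word = ""
      for c in chain[:-1]:         # a length: flip its leading 1 to a 0 flag
          word += "0" + bin(c)[3:]
      return word + bin(chain[-1])[2:]   # n itself keeps its leading 1

  def decode(bits):                # inverse of encode; returns (n, #bits consumed)
      i, length = 0, 1             # the first component is one bit long
      while True:
          comp = bits[i : i + length]
          i += length
          value = int("1" + comp[1:], 2)   # restore the implicit leading 1
          if comp[0] == "1":       # a leading 1 flags the final component, n itself
              return value, i
          length = value + 1       # a length minus one: the next component's size

  assert encode(16) == "00000010000"   # cf. 0 00 000 10000 in the table
  assert all(decode(encode(n))[0] == n for n in range(1, 1000))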

Rissanen (1983) gives r(n),

r(n) = log2*(n) + log2(2.865)
where log2*(n) = log2 n + log2 log2 n + ...,  NB. positive terms only
pr(n) = 2^(-r(n))

as a continuous approximation to code-word lengths and advocates 2^(-r(n)) as a "universal prior" for integers. The "2.865" is a normalisation constant to make the distribution (of the approximation) sum to 1.0. Note that r(n) is continuous but not convex, being concave at n = 2, 4, 16, 2^16, ..., etc. This can cause problems for some optimisation algorithms (Allison & Yee 1990).
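
A direct transcription (a sketch only; the constant 2.865 is from Rissanen 1983):

  from math import log2

  def log2star(n):    # log2 n + log2 log2 n + ..., positive terms only
      total, x = 0.0, float(n)
      while log2(x) > 0:
          x = log2(x)
          total += x
      return total

  def r(n):           # Rissanen's continuous approximation to the code length
      return log2star(n) + log2(2.865)

  print(r(1), r(16))  # r(1) = log2(2.865) ≈ 1.52 bits; r(16) = 7 + 1.52 ≈ 8.52 bits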

Notes

P. Elias. Universal Codeword Sets and Representations for the Integers. IEEE Trans. Inform. Theory IT-21 pp.194-203, 1975.
Introduced this kind of code for the integers and defined the notion of a universal code (Farr 1999).
S. K. Leung-Yan-Cheong & T. M. Cover. Some Equivalences between Shannon Entropy and Kolmogorov Complexity. IEEE Trans. Inform. Theory IT-24 pp.331-338, 1978.
Investigated this kind of code for the integers (Farr 1999).
J. Rissanen. A Universal Prior for Integers and Estimation by Minimum Description Length. Annals of Statistics 11(2) pp.416-431, 1983.
Advocates the use of the log* distribution and code.
+ L. Allison & C. N. Yee. Minimum Message Length Encoding and the Comparison of Macromolecules. Bull. Math. Biol. 52(3) pp.431-453, 1990.
G. Farr. Information Theory and MML Inference. School of Computer Science and Software Engineering, Monash University, 1999.
An excellent source on "universal" codes (and other things).

Wallace Tree Code

The definition of (strict) binary trees is:

  1. A leaf is a binary tree.
  2. If 'left' and 'right' are binary trees then <left, right> is a binary tree.
A code for binary trees (Wallace & Patrick 1993) follows their definition:
  1. The code for a leaf is '0'.
  2. The code for a fork, <left,right>, is '1 code(left) code(right)'.

e.g.
Code       Tree

0          [leaf]


100                <..>
                  .    .
                .        .
           [leaf]        [leaf]


10100              <..>
 ^^               .    .
 ||             .        .
 |Right    [leaf]        <..>
 |                      .    .
 Left                 .        .
                 [leaf]        [leaf]

Note that a code-word always contains one more zero than it has ones, and this first happens at its last bit. This allows the end of a code-word to be detected, and the word to be decoded. Note, for example, that '1' and '10' are not code-words.
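
A sketch of the two rules in Python (representing a leaf by the string 'leaf' and a fork by a pair; the names are illustrative):

  def code(t):    # rule 1: a leaf is '0'; rule 2: a fork is '1' ++ code(left) ++ code(right)
      return "0" if t == "leaf" else "1" + code(t[0]) + code(t[1])

  def decode(bits, i=0):           # returns (tree, next unread position)
      if bits[i] == "0":
          return "leaf", i + 1
      left, i = decode(bits, i + 1)
      right, i = decode(bits, i)
      return (left, right), i

  assert code(("leaf", ("leaf", "leaf"))) == "10100"   # the third example above
  assert decode("10100")[0] == ("leaf", ("leaf", "leaf"))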

The code is efficient in the sense that the sum, over all such words w, of 2^(-|w|) is one. The code is equivalent to giving a tree with code w a probability of 2^(-|w|). This would be difficult to prove combinatorially, but consider an infinitely long random string over {0,1}. Now 0 (pr=0.5) and 1 (pr=0.5) can be taken as the steps in a random walk: 0 left and 1 right, say. It is well known that a one-dimensional random walk returns to the starting point with probability 1.0 (so it will also visit a point next to the start with probability 1.0). There will be a first point in the string at which there has been one more 0 than there have been 1s. Take the prefix up to and including that point as a code-word. Repeat. In this way code-words are generated, each such word w with probability 2^(-|w|). Every possible code-word appears eventually. The sum of the probabilities of all possible code-words is one.

This can be used as the basis of algorithms to encode and decode positive integers: enumerate code-words in order of increasing length and, within a given length, lexicographically say. Use the n-th code-word as the code-word for the integer n, as in the sketch below:

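A brute-force sketch in Python (fine for small n; a practical implementation would rank code-words arithmetically rather than enumerate them):

  from itertools import count, product

  def is_codeword(w):       # the zeros first outnumber the ones exactly at the end of w
      balance = 0           # #1s minus #0s so far
      for i, b in enumerate(w):
          balance += 1 if b == "1" else -1
          if balance < 0:                  # one more 0 than 1s ...
              return i == len(w) - 1       # ... allowed only at the last bit
      return False

  def codewords():          # all code-words, shortest first, lexicographic within a length
      for length in count(1):
          for bits in product("01", repeat=length):
              w = "".join(bits)
              if is_codeword(w):
                  yield w

  def encode(n):            # the n-th code-word stands for the integer n
      for i, w in enumerate(codewords(), start=1):
          if i == n:
              return w

  print([encode(n) for n in range(1, 6)])
  # ['0', '100', '10100', '11000', '1010100']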

The first code-word of length 2k+3, k≥0, is 1(01)^k 00 and the last is 1^(k+1) 0^(k+2).

We see that the code-word lengths increase in smaller and more regular jumps (of two bits) than is the case for the Elias and log* codes.

Notes

C. S. Wallace & J. D. Patrick. Coding Decision Trees. Machine Learning, 11, pp.7-22, 1993.
The tree-code is used to code the topology of classification trees (also known as [decision-trees]). This allows simple trees and complex trees to be compared fairly. Other information held in the nodes of the trees is also coded.
Also see integer codes in §2.1.14, §2.1.15 & §2.1.16 of CSW's book, 2005.
Also see L. Allison, Coding Ockham's Razor, Springer, 2018,
particularly chapter 3, Integers, doi:10.1007/978-3-319-76433-7_3
logk(n) = log(log(...log(n))), k times.