Integers

Integer Distributions

This section considers positive integers, n≥1, unless otherwise stated (if you have non-negative integers, n≥0, just add one before encoding and subtract one after decoding. If you also have negative integers interleave them 0, +1, -1, +2, -2, ... , say.) Positive integers form the first, and perhaps the most fundamental, infinite data-space. A code over positive integers can be used to transmit data from any enumerable data-space.

The Geometric and Poisson distributions are examples of parameterised probability distributions; the probability distributions below have no parameters.

1/(n(n+1))

A parameter-less probability distribution for positive integers:

pr(n) =

n(n+1)

n>0

This is a proper probability distribution:

∞

n=1

n(n+1)

∞

n=1

{

n+1

}

= 1 - 1/2 + 1/2 - 1/3 + 1/3 ...

= 1

So a non-redundant code can be based on it

msgLen(n) = log₂(n) + log₂(n+1), for n>0

The expectation of the distribution is infinite:

i≥1

{i/(i.(i+1))} =

i≥1

{1/(i+1)}

= ∞

So perhaps it is not a good distribution to use if you anticipate mostly small integers.

n	probability
1	1/2
2	1/6
3	1/12
4	1/20
...	...

1/n

Note that pr(n) ~ 1/n cannot be a proper probability distribution because

Σ_i>1 1/n = ∞

Therefore a code with message lengths ~ -log₂(1/n) = log₂(n) for all n≥1 is impossible. However, the log^* distribution (and code) gets close to 1/n.

Elias omega, log₂^* and Relatives

There is more on
Universal codes
[here].

If you know that an integer, n, lies in the interval [1,N] (or in [0,N-1]) then it can be encoded in log₂(N) bits, (and this is an optimal code if the probability distribution is uniform). What to do when there is no such bound N? Obviously transmit the length of the code-word for n first. But how to transmit the length? Transmit its length first, of course! A sound code can in fact be based on this intuitive idea; note that log^k(n) decreases very rapidly as k increases.

The leading bit of n is necessarily “1” so there is no need to transmit it, except that it can be used as a flag to determine whether the current value is a length or the final value of n proper; lengths are thus given a leading “0”. Such a prefix code can be used to code integers of arbitrary size. Unfortunately the length of a code-word as a function of n is neither convex nor smooth although it is monotonic increasing⁺.

n	components	code-word
1	1	1
2	2,2	00 10
3	2,3	00 11
4	2,3,4	00 01 100, e.g., ~ code'(3)++100
5	2,3,5	00 01 101
6	2,3,6	00 01 110
7	2,3,7	00 01 111
8	2,3,4,8	00 01 000 1000, e.g., ~ code'(4)++1000
9	2,3,4,9	00 01 000 1001
10	2,3,4,10	00 01 000 1010
...
15	2,3,4,15	00 01 000 1111
16	2,3,5,16	00 01 001 10000, e.g., ~ code'(5)++10000
...

Note, the spaces in the above code-words are only there to make the components stand out.
The code above is valid, but not at all efficient. In fact we can do better, that is, achieve a non-redundant code, by using lengths minus one:

n	components	code-word	prob
1	1	1	1/2
2	1,2	0 10	1/8
3	1,3	0 11	1/8
4	1,2,4	0 00 100, e.g., ~ code'(2)++100	1/64
5	1,2,5	0 00 101
6	1,2,6	0 00 110
7	1,2,7	0 00 111
8	1,3,8	0 01 1000, e.g., ~ code'(3)++1000	1/128
9	1,3,9	0 01 1001
10	1,3,10	0 01 1010
...
15	1,3,15	0 01 1111
16	1,2,4,16	0 00 000 10000, e.g., ~ code'(4)++100000	1/2048
...

msgLen(n) = 1 + floor(log₂ n) + msgLen(floor(log₂ n)), if n>1

= 1, if n=1

pr(n) = 2^-msgLen(n)

The probability distribution pr(n) has an infinite expectation: the probability of n is greater than under the 1/(n.(n+1)) distribution (which has an infinite expectation) for large n.

Use the HTML FORM below to encode an integer 'n':

Rissanen (1983) gives, r(n),

r(n) = log₂^*(n) + log₂(2.865)

where log₂^*(n) = log₂ n + log₂ log₂ n + ... NB. +ve terms only

pr(n) = 2^-r(n)

as a continuous approximation to code-word lengths and advocates 2^-r(n) as a "universal prior" for integers. The "2.865" is a normalisation constant to make the distribution (of the approximation) sum to 1.0. Note that r(n) is continuous but it (the area under the curve) is not convex being concave at n=2, 4, 16, 2¹⁶, ... etc. This can cause problems for some optimisation algorithms (Allison & Yee 1990).

Notes

P. Elias. Universal Codeword Sets and Representations for the Integers. IEEE Trans. Inform. Theory IT-21 pp.194-203, 1975: Introduced this kind of code for the integers and defined the notion of a universal code (– Farr 1999)
S. K. Leung-Yan-Cheong & T. M. Cover. Some Equivalences between Shannon Entropy and Kolmogorov Complexity. IEEE Trans. Inform. Theory IT-24 pp.331-338, 1978: Investigated this kind of code for the integers (– Farr 1999)
J. Rissanen. A Universal Prior for Integers and Estimation by Minimum Description Length. Annals of Statistics 11(2) pp.416-431, 1983.: Advocates the use of the log^* distribution and code.
⁺ L. Allison & C. N. Yee. Minimum Message Length Encoding and the Comparison of Macromolecules. Bull. Math. Biol. 52(3) pp.431-453, 1990.
G. Farr. Information Theory and MML Inference. School of Computer Science and Software Engineering, Monash University, 1999.: An excellent source on "universal" codes (and other things).

Wallace Tree Code

The definition of (strict) binary trees is:

A leaf is a binary tree.
If 'left' and 'right' are binary trees then <left, right> is a binary tree.

A code for binary trees (Wallace & Patrick 1993) follows their definition:

The code for a leaf is '0'.
The code for a fork, <left,right>, is '1 code(left) code(right)'.

e.g.

Code       Tree

0          [leaf]


100                <..>
                  .    .
                .        .
           [leaf]        [leaf]


10100              <..>
 ^^               .    .
 ||             .        .
 |Right    [leaf]        <..>
 |                      .    .
 Left                 .        .
                 [leaf]        [leaf]

Note, a code-word always contains one more zero than it has ones. This allows the end of a code-word to be detected. It also allows the word to be decoded. Note for example that '1' and '10' are not code-words.

The code is efficient in the sense that the sum over all asuch words, w, of 2^-|w| is one. The code is equivalent to giving a tree with code w a probability of 2^-|w|: This would be difficult to prove combinatorially, but consider an infinitely long random string over {0,1}. Now 0 (pr=0.5) and 1 (pr=0.5) can be taken as the steps in a random walk – 0 left and 1 right, say. It is well known that a one-dimensional random walk returns to the starting point with probability 1.0 (so it will also visit a point next to the start with probability 1.0). There will be a first point in the string at which there has been one more 0 than there have been 1s. Take the prefix up to and including that point as a code-word. Repeat. In this way code-words are generated, each such word w with probability 2^-|w|. Every possible code-word appears eventually. The sum of the probabilities of all possible code-words is one.

This can be used as the basis of algorthms to encode and decode positive integers: Enumerate code-words in order of increasing length and within that, for a given length, lexicographically say. Use the n-th code-word as the code-word for integer n:

The first code word of length 2k+3, k≥0, is 1(01)^k00 and the last is 1^k+10^k+2.

We see that the code-word lengths increase in smaller and more regular jumps (of two bits) than is the case for the Elias and log^* codes.

Notes

C. S. Wallace & J. D. Patrick. Coding Decision Trees. Machine Learning, 11, pp.7-22, 1993.: The tree-code is used to code the topology of classification trees (also known as [decision-trees]). This allows simple trees and complex trees to be compared fairly. Other information held in the nodes of the trees is also coded.
Also see integer codes in §2.1.14, §2.1.15 & §2.1.16 of CSW's book, 2005.
Also see L. Allison, Coding Ockham's Razor, Springer, 2018,: particularly chapter 3, Integers, doi:10.1007/978-3-319-76433-7_3
log^k(n) = log(log(...log(n))), k times.

n=	10	> > <<	word=	press the code button
	{0..9}⁺			{0,1}⁺

n=	10	> > <<	word=	press 'code'
	{0..9}⁺			{0,1}⁺

1/(n(n+1))

1/n

Elias omega, log2* and Relatives

Notes

Wallace Tree Code

Notes

Elias omega, log₂^* and Relatives