Examples K-L Distance

Code lengths for {A, C, G, T}

Jumping ahead a little, we would get these average code lengths,   -SUMi{ pi.log2(qi) },   in bits per base, for {A,C,G,T} for the following true probabilities, p, and assumed probabilities, q, of the bases:

                             assumed probabilities of {A, C, G, T}
true probabilities           q  = (1/4, 1/4, 1/4, 1/4)          q' = (1/2, 1/4, 1/8, 1/8)
p  = (1/4, 1/4, 1/4, 1/4)    2 = 1/4*2+1/4*2+1/4*2+1/4*2        2 1/4 = 1/4*1+1/4*2+1/4*3+1/4*3
p' = (1/2, 1/4, 1/8, 1/8)    2 = 1/2*2+1/4*2+1/8*2+1/8*2        1 3/4 = 1/2*1+1/4*2+1/8*3+1/8*3
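The four entries can be checked mechanically. The following is a minimal sketch, assuming base-2 logarithms (code lengths in bits); the function name average_code_length and the labels fair and biased are illustrative choices, not part of the original page.

```python
from math import log2

def average_code_length(p, q):
    """Expected bits per symbol when symbols occur with probabilities p
    but are coded with lengths -log2(q) suited to the assumed q."""
    return sum(pi * -log2(qi) for pi, qi in zip(p, q) if pi > 0)

# Probabilities of A, C, G, T (names are illustrative only).
fair   = (1/4, 1/4, 1/4, 1/4)
biased = (1/2, 1/4, 1/8, 1/8)

for true_name, p in (("fair", fair), ("biased", biased)):
    for assumed_name, q in (("fair", fair), ("biased", biased)):
        bits = average_code_length(p, q)
        print(f"true {true_name:6} assumed {assumed_name:6}: {bits} bits")
```

Each row of output corresponds to one cell: the true distribution picks the row and the assumed distribution picks the column.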

K-L Distance

The K-L distance is the extra average code length paid for assuming the wrong distribution. In the above example it happens to be the same in both directions: KL(fair->biased) = 2 1/4 - 2 = 1/4 bit, and KL(biased->fair) = 2 - 1 3/4 = 1/4 bit.
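As a sketch of that calculation, the K-L distance can be computed as the difference between the code length under the assumed distribution and the code length under the true one. The names code_length and kl_excess are hypothetical, and base-2 logarithms are assumed.

```python
from math import log2

def code_length(p, q):
    """Average bits per symbol: true probabilities p, code designed for q."""
    return -sum(pi * log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_excess(p, q):
    """Extra bits per symbol for assuming q when the truth is p."""
    return code_length(p, q) - code_length(p, p)

fair   = (1/4, 1/4, 1/4, 1/4)
biased = (1/2, 1/4, 1/8, 1/8)

print("KL(fair->biased) =", kl_excess(fair, biased))
print("KL(biased->fair) =", kl_excess(biased, fair))
```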

In general, and in the following example, the K-L distance is not symmetric:

                             assumed probabilities of {A, C, G, T}
true probabilities           q  = (1/2, 1/4, 1/8, 1/8)              q' = (1/4, 1/8, 1/8, 1/2)
p  = (1/2, 1/4, 1/8, 1/8)    1 3/4 = 1/2*1+1/4*2+1/8*3+1/8*3        2 1/4 = 1/2*2+1/4*3+1/8*3+1/8*1
p' = (1/4, 1/8, 1/8, 1/2)    2 3/8 = 1/4*1+1/8*2+1/8*3+1/2*3        1 3/4 = 1/4*2+1/8*3+1/8*3+1/2*1

KL(p->p') = 2 1/4 - 1 3/4 = 1/2 bit,  but  KL(p'->p) = 2 3/8 - 1 3/4 = 5/8 bit.
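The asymmetry can also be seen from the direct form of the distance, SUMi{ pi.log2(pi/qi) }. This is a minimal sketch under the assumption of base-2 logarithms; the function name kl and the variable p_dash (standing in for p') are illustrative.

```python
from math import log2

def kl(p, q):
    """K-L distance from true p to assumed q, in bits."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p      = (1/2, 1/4, 1/8, 1/8)   # probabilities of A, C, G, T
p_dash = (1/4, 1/8, 1/8, 1/2)   # the same values, rotated

print("KL(p->p') =", kl(p, p_dash))   # 0.5
print("KL(p'->p) =", kl(p_dash, p))   # 0.625
```

The two distributions contain the same values in a different order, yet the penalty depends on which one is the truth: the direction weights each mismatch by the true probability of the symbol.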


© L. Allison 2000.