## Normal, Gaussian


### KL-distance from N(μ1,σ1) to N(μ2,σ2)

(Also known as KL-divergence.)
The general form is

∫ pdf1(x).{ log(pdf1(x)) - log(pdf2(x)) } dx

we have two normals so pdf1(x) is Nμ1,σ1(x), etc.

= ∫ Nμ1,σ1(x).{ log(Nμ1,σ1(x)) - log(Nμ2,σ2(x)) } dx

= ∫ Nμ1,σ1(x).{ (1/2)( - ((x-μ1)/σ1)² + ((x-μ2)/σ2)² ) + ln(σ2/σ1) } dx

We can replace x with x+μ1 (a change of variable). The expected value of x² is then σ1². Terms that are odd in x, and otherwise symmetric about zero, cancel out over (-∞,∞), leaving the ...x² and ...constant terms.

= (1/2){ - (σ1/σ1)² + (σ1/σ2)² + ((μ1-μ2)/σ2)² } + ln(σ2/σ1)

= { (μ1 - μ2)² + σ1² - σ2² } / (2.σ2²) + ln(σ2/σ1)
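As a sanity check on this closed form, here is a short Python sketch (the function names are mine, not from this page); it compares the formula with direct numerical integration of the defining integral:

```python
import math

def kl_normal(mu1, s1, mu2, s2):
    # Closed form derived above: KL(N(mu1,s1) || N(mu2,s2)).
    return ((mu1 - mu2)**2 + s1**2 - s2**2) / (2 * s2**2) + math.log(s2 / s1)

def kl_numeric(mu1, s1, mu2, s2, n=20001):
    # Direct numerical integration of pdf1.(log pdf1 - log pdf2)
    # over a wide interval around mu1 (the integrand decays fast).
    def log_pdf(x, mu, s):
        return -0.5 * ((x - mu) / s)**2 - math.log(s * math.sqrt(2 * math.pi))
    lo, hi = mu1 - 12 * s1, mu1 + 12 * s1
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        p1 = math.exp(log_pdf(x, mu1, s1))
        total += p1 * (log_pdf(x, mu1, s1) - log_pdf(x, mu2, s2))
    return total * h

print(kl_normal(0.0, 1.0, 0.0, 1.0))   # zero when the two normals coincide
print(abs(kl_normal(1.0, 2.0, 0.5, 1.5) - kl_numeric(1.0, 2.0, 0.5, 1.5)))
```

The second printed value should be tiny, confirming the closed form against the integral definition.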

This is zero if μ1=μ2 and σ1=σ2. It obviously increases with |μ1-μ2| and has rather complex behaviour with σ1 and σ2 (and is consistent with P&R, and with J&S where σ1=σ2).
KL(N(μq,σq) || N(μp,σp)), p.18 of Penny & Roberts, PARG-00-12, 2000.
KL(N(μ1,σ), N(μ2,σ)) = (μ1-μ2)²/(2σ²), Johnson & Sinanovic, NB. a common σ [pdf].
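Setting σ1 = σ2 = σ in the closed form above makes the ln term and the σ1² - σ2² term vanish, recovering the Johnson & Sinanovic expression. A quick Python check (names are mine):

```python
import math

def kl_normal(mu1, s1, mu2, s2):
    # Closed form derived above: KL(N(mu1,s1) || N(mu2,s2)).
    return ((mu1 - mu2)**2 + s1**2 - s2**2) / (2 * s2**2) + math.log(s2 / s1)

# With a common sigma, only the (mu1 - mu2)^2 term survives,
# giving the Johnson & Sinanovic form (mu1 - mu2)^2 / (2 sigma^2).
for mu1, mu2, s in [(0.0, 3.0, 1.0), (-1.0, 2.0, 0.5)]:
    assert math.isclose(kl_normal(mu1, s, mu2, s), (mu1 - mu2)**2 / (2 * s**2))
print("common-sigma special case verified")
```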

Note that the distance is convenient to integrate over, say, a range of μ1 & σ1:
∫ μ1=μ1min..μ1max ∫ σ1=σ1min..σ1max
 { (μ1 - μ2)²/(2.σ2²) + ln σ2 - 1/2     (NB. no σ1 here ...)
 + σ1²/(2.σ2²) - ln σ1 }                (... & no μ1)
 dσ1 dμ1

let  f(μ1) = (μ1 - μ2)³/(6.σ2²) + μ1.(ln σ2 - 1/2)
and  g(σ1) = σ1³/(6.σ2²) - σ1.(ln σ1 - 1)

= (f(μ1max) - f(μ1min)) . (σ1max - σ1min) + (μ1max - μ1min) . (g(σ1max) - g(σ1min))
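The box integral above can be checked against brute-force double integration. A Python sketch (the names `kl_box`, `kl_box_numeric` and the grid resolution are my choices, not from this page):

```python
import math

def kl_normal(mu1, s1, mu2, s2):
    # Closed form: KL(N(mu1,s1) || N(mu2,s2)).
    return ((mu1 - mu2)**2 + s1**2 - s2**2) / (2 * s2**2) + math.log(s2 / s1)

def f(mu1, mu2, s2):
    # Antiderivative in mu1 of (mu1-mu2)^2/(2 s2^2) + ln s2 - 1/2.
    return (mu1 - mu2)**3 / (6 * s2**2) + mu1 * (math.log(s2) - 0.5)

def g(s1, s2):
    # Antiderivative in s1 of s1^2/(2 s2^2) - ln s1.
    return s1**3 / (6 * s2**2) - s1 * (math.log(s1) - 1.0)

def kl_box(m_lo, m_hi, s_lo, s_hi, mu2, s2):
    # Closed-form integral of KL over mu1 in [m_lo,m_hi], s1 in [s_lo,s_hi].
    return ((f(m_hi, mu2, s2) - f(m_lo, mu2, s2)) * (s_hi - s_lo)
            + (m_hi - m_lo) * (g(s_hi, s2) - g(s_lo, s2)))

def kl_box_numeric(m_lo, m_hi, s_lo, s_hi, mu2, s2, n=400):
    # Midpoint-rule double integral of the KL expression, for comparison.
    dm, ds = (m_hi - m_lo) / n, (s_hi - s_lo) / n
    total = 0.0
    for i in range(n):
        mu1 = m_lo + (i + 0.5) * dm
        for j in range(n):
            s1 = s_lo + (j + 0.5) * ds
            total += kl_normal(mu1, s1, mu2, s2)
    return total * dm * ds

print(kl_box(0.0, 2.0, 0.5, 1.5, 1.0, 1.0))
print(kl_box_numeric(0.0, 2.0, 0.5, 1.5, 1.0, 1.0))
```

The two printed values should agree to several decimal places, confirming that f and g are the right antiderivatives of the μ1-only and σ1-only parts of the integrand.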