## von Mises - Fisher (vMF)

The von Mises-Fisher (vMF) distribution is a probability distribution on directions in R^D. It is natural to think of it as a distribution on the (D-1)-sphere of unit radius, that is, on the surface of the D-ball of unit radius.

The von Mises-Fisher probability density function is
pdf(v | μ, κ) = C_D(κ) e^{κ μ·v},
where datum v is a normalised D-vector, equivalently a point on the (D-1)-sphere,
μ (mu) is the mean direction (a normalised D-vector), and
κ (kappa) ≥ 0 is the concentration parameter (a scalar).
The distribution's normalising constant is
C_D(κ) = κ^{D/2-1} / {(2π)^{D/2} I_{D/2-1}(κ)},
where I_ν(·) is the modified Bessel function of the first kind of order ν.
In the special case that D = 3,
C_3(κ) = κ / {2π (e^κ - e^{-κ})}.
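As a quick numerical check, the normalising constant can be evaluated with SciPy's modified Bessel function `iv`; a minimal sketch (the helper name `log_C` is illustrative only):

```python
# Sketch: evaluate log C_D(kappa) and check the D = 3 closed form.
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind

def log_C(D, kappa):
    """log C_D(kappa) = (D/2-1) log kappa - (D/2) log 2pi - log I_{D/2-1}(kappa)."""
    nu = D / 2 - 1
    return nu * np.log(kappa) - (D / 2) * np.log(2 * np.pi) - np.log(iv(nu, kappa))

# For D = 3 the closed form is C_3(kappa) = kappa / {2 pi (e^kappa - e^-kappa)}.
kappa = 1.7
closed_form = kappa / (2 * np.pi * (np.exp(kappa) - np.exp(-kappa)))
assert np.isclose(np.exp(log_C(3, kappa)), closed_form)
```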

The negative log pdf is
- log pdf(v | μ, κ) = - log C_D(κ) - κ μ·v,
and
log C_D(κ) = (D/2 - 1) log κ - (D/2) log 2π - log I_{D/2-1}(κ).

Given data
{v_0, ..., v_{N-1}}, define their sum (a D-vector),
R = Σ_{i=0..N-1} v_i,
and the mean resultant length
Rbar = ||R|| / N.

The negative log likelihood is
- logLH = - N log C_D(κ) - κ μ·R.
It is obvious that the maximum likelihood estimate of μ is R normalised,
μ_ML = R / ||R||,
and that the MML estimate is the same,
μ_MML = μ_ML = R / ||R||,
the natural prior for μ being the uniform distribution over the (D-1)-sphere.
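A small sketch of the sufficient statistics and the mean-direction estimate, on made-up data (the array `vs` and all names here are illustrative only):

```python
# Sketch: sufficient statistics R, Rbar and the estimate mu_ML = R / ||R||.
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: unit vectors clustered loosely around (1, 0, 0).
vs = rng.normal(size=(100, 3)) * 0.4 + np.array([1.0, 0.0, 0.0])
vs /= np.linalg.norm(vs, axis=1, keepdims=True)  # project onto the 2-sphere

R = vs.sum(axis=0)                  # resultant vector, a D-vector
Rbar = np.linalg.norm(R) / len(vs)  # mean resultant length, in [0, 1]
mu_ML = R / np.linalg.norm(R)       # maximum likelihood mean direction

assert 0.0 <= Rbar <= 1.0
assert np.isclose(np.linalg.norm(mu_ML), 1.0)  # mu_ML is a unit vector
```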

For given μ and κ, the expected value of Rbar equals A_D(κ), where
A_D(κ) = I_{D/2}(κ) / I_{D/2-1}(κ),
and the (less obvious) maximum likelihood estimate of κ is
κ_ML = A_D^{-1}(Rbar).
This is because
∂/∂κ (- logLH) = - N {∂/∂κ log C_D(κ)} - μ·R,
which is zero if
- ∂/∂κ log C_D(κ) = μ·R / N,
where
∂/∂κ log C_D(κ)
= ω/κ - I'_ω(κ) / I_ω(κ),     where ω = D/2 - 1,
= {ω I_ω(κ) - κ I'_ω(κ)} / (κ I_ω(κ))
= {κ/2 (I_{ω-1}(κ) - I_{ω+1}(κ)) - κ/2 (I_{ω-1}(κ) + I_{ω+1}(κ))} / (κ I_ω(κ))
= - I_{D/2}(κ) / I_{D/2-1}(κ),
using the well-known recurrence relations
I_ν(z) = z/(2ν) {I_{ν-1}(z) - I_{ν+1}(z)},
and
I'_ν(z) = 1/2 {I_{ν-1}(z) + I_{ν+1}(z)},     (I'_0(z) = I_1(z)).
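Since A_D(κ) = I_{D/2}(κ) / I_{D/2-1}(κ) is increasing in κ, κ_ML = A_D^{-1}(Rbar) can be found by simple root-finding. A sketch using SciPy (`A` and `kappa_ML` are illustrative names; `ive` is the exponentially scaled Bessel function, whose scaling cancels in the ratio and avoids overflow for large κ):

```python
# Sketch: A_D(kappa) = I_{D/2}(kappa) / I_{D/2-1}(kappa), inverted numerically.
import numpy as np
from scipy.special import iv, ive
from scipy.optimize import brentq

def A(D, kappa):
    # ive(nu, z) = iv(nu, z) * exp(-z); the scaling cancels in the ratio.
    return ive(D / 2, kappa) / ive(D / 2 - 1, kappa)

def kappa_ML(D, Rbar, hi=1e4):
    """Solve A_D(kappa) = Rbar by bracketing root-finding (no closed form)."""
    return brentq(lambda k: A(D, k) - Rbar, 1e-8, hi)

# Check d/dkappa log C_D(kappa) = -A_D(kappa) against a finite difference.
D, k, h = 3, 2.0, 1e-6
logC = lambda kk: (D/2 - 1) * np.log(kk) - (D/2) * np.log(2 * np.pi) \
                  - np.log(iv(D/2 - 1, kk))
assert np.isclose(-(logC(k + h) - logC(k - h)) / (2 * h), A(D, k), atol=1e-5)

# Round trip: kappa -> Rbar = A_D(kappa) -> kappa_ML recovers kappa.
assert np.isclose(kappa_ML(D, A(D, 2.0)), 2.0, atol=1e-6)
```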

The MML estimate, κ_MML, is the value that minimises the two-part message length; no closed form is known for κ_MML. The message length calculations also require a choice of prior for κ, and the vMF's Fisher information, F.

### The Fisher information of the vMF distribution
The expected second derivative of - logLH w.r.t. κ is
∂^2/∂κ^2 (- logLH) = N A'_D(κ).
The vMF distribution is symmetric about μ on the (D-1)-sphere; there is no preferred orientation around μ. A direction, such as μ, has D - 1 degrees of freedom. The expected second derivative of - logLH w.r.t. any one of μ's degrees of freedom is N κ A_D(κ), for the following reason.
Without loss of generality, let μ = (1, 0, ...), and perturb it to μ → (cos δ, sin δ, 0, ...), where δ is small. Then
∂/∂δ (- logLH) = κ ||R|| sin δ,
∂^2/∂δ^2 (- logLH) = κ ||R|| cos δ   ≈ κ ||R||, as δ is small,
and the expected value of ||R|| is N A_D(κ), giving N κ A_D(κ).
Symmetry implies that the off-diagonal elements of F for μ are zero; and since μ is a position parameter and κ a scale parameter, the off-diagonal elements between μ and κ are also zero.
F, the Fisher information of the vMF distribution, is therefore
F = N^D (κ A_D(κ))^{D-1} A'_D(κ).
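A'_D(κ) can be evaluated via the standard Bessel-ratio identity A'_D(κ) = 1 - A_D(κ)^2 - (D-1) A_D(κ)/κ (a well-known vMF result, not derived above). A sketch, with the identity sanity-checked against a finite difference; all function names are illustrative:

```python
# Sketch: F = N^D (kappa A_D(kappa))^{D-1} A'_D(kappa).
import numpy as np
from scipy.special import ive  # exponentially scaled I_nu; scaling cancels below

def A(D, kappa):
    return ive(D / 2, kappa) / ive(D / 2 - 1, kappa)

def A_prime(D, kappa):
    # Standard identity: A'_D = 1 - A_D^2 - (D-1) A_D / kappa.
    a = A(D, kappa)
    return 1.0 - a * a - (D - 1) * a / kappa

def fisher_info(N, D, kappa):
    return N**D * (kappa * A(D, kappa))**(D - 1) * A_prime(D, kappa)

# Sanity check A'_D against a central finite difference of A_D.
D, k, h = 3, 1.5, 1e-6
assert np.isclose((A(D, k + h) - A(D, k - h)) / (2 * h), A_prime(D, k), atol=1e-6)
assert fisher_info(10, D, k) > 0  # F is positive for kappa > 0
```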

### Sources
Search for [vonMises direction] in the [Bib], and see section 6.5, p.266, of Wallace's book (2005).

P. Kasarapu & L. Allison, Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions, Machine Learning (Springer), March 2015.

The special case of the probability distribution where D = 2 is known as the von Mises distribution, for directions in R^2, that is, for angles and periodic quantities such as annual events.
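A quick check of this correspondence: at D = 2 the vMF density over the angle θ reduces to e^{κ cos θ} / (2π I_0(κ)), which matches the von Mises density as implemented by `scipy.stats.vonmises` (the values of κ and θ here are illustrative):

```python
# Sketch: the D = 2 vMF density equals the von Mises density on angles.
import numpy as np
from scipy.special import iv
from scipy.stats import vonmises

kappa, theta = 2.0, 0.8  # mean direction mu = 0
vmf_D2 = np.exp(kappa * np.cos(theta)) / (2 * np.pi * iv(0, kappa))
assert np.isclose(vmf_D2, vonmises.pdf(theta, kappa))
```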