## von Mises - Fisher (vMF)

The von Mises-Fisher (vMF) distribution is a probability distribution on directions in R^D. It is natural to think of it as a distribution on the (D-1)-sphere of unit radius, that is, on the surface of the D-ball of unit radius.

The von Mises-Fisher probability density function is
pdf(v | μ, κ) = C_D(κ) e^{κ μ·v},
where datum v is a normalised D-vector, equivalently a point on the (D-1)-sphere,
μ (mu) is the mean direction (a normalised D-vector), and
κ (kappa) ≥ 0 is the concentration parameter (a scalar).
The distribution's normalising constant is
C_D(κ) = κ^{D/2-1} / {(2π)^{D/2} I_{D/2-1}(κ)},
where I_ν(·) is the modified Bessel function of the first kind of order ν.
In the special case that D = 3,
C_3(κ) = κ / {2π (e^κ - e^{-κ})}.
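As a quick numerical check, the normalising constant can be evaluated with SciPy's modified Bessel function `iv`; a minimal sketch (the helper name `log_C` is illustrative only):

```python
# Sketch: evaluate log C_D(kappa) and check the D = 3 closed form.
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind

def log_C(D, kappa):
    """log C_D(kappa) = (D/2-1) log kappa - (D/2) log 2pi - log I_{D/2-1}(kappa)."""
    nu = D / 2 - 1
    return nu * np.log(kappa) - (D / 2) * np.log(2 * np.pi) - np.log(iv(nu, kappa))

# For D = 3 the closed form is C_3(kappa) = kappa / {2 pi (e^kappa - e^-kappa)}.
kappa = 1.7
closed_form = kappa / (2 * np.pi * (np.exp(kappa) - np.exp(-kappa)))
assert np.isclose(np.exp(log_C(3, kappa)), closed_form)
```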

The negative log pdf is
- log pdf(v | μ, κ) = - log C_D(κ) - κ μ·v,
and
log C_D(κ) = (D/2 - 1) log κ - (D/2) log 2π - log I_{D/2-1}(κ).

Given data
{v_0, ..., v_{N-1}}, define their sum (a D-vector),
R = Σ_{i=0..N-1} v_i,
and the mean resultant length
Rbar = ||R|| / N.

The negative log likelihood is
- logLH = - N log C_D(κ) - κ μ·R.
It is obvious that the maximum likelihood estimate of μ is R normalised,
μ_ML = R / ||R||,
and that the MML estimate is the same,
μ_MML = μ_ML = R / ||R||,
the natural prior for μ being the uniform distribution over the (D-1)-sphere.
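A small sketch of the sufficient statistics and the mean-direction estimate, on made-up data (the array `vs` and all names here are illustrative only):

```python
# Sketch: sufficient statistics R, Rbar and the estimate mu_ML = R / ||R||.
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: unit vectors clustered loosely around (1, 0, 0).
vs = rng.normal(size=(100, 3)) * 0.4 + np.array([1.0, 0.0, 0.0])
vs /= np.linalg.norm(vs, axis=1, keepdims=True)  # project onto the 2-sphere

R = vs.sum(axis=0)                  # resultant vector, a D-vector
Rbar = np.linalg.norm(R) / len(vs)  # mean resultant length, in [0, 1]
mu_ML = R / np.linalg.norm(R)       # maximum likelihood mean direction

assert 0.0 <= Rbar <= 1.0
assert np.isclose(np.linalg.norm(mu_ML), 1.0)  # mu_ML is a unit vector
```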

For given μ and κ, the expected value of Rbar equals A_D(κ), where
A_D(κ) = I_{D/2}(κ) / I_{D/2-1}(κ),
and the (less obvious) maximum likelihood estimate of κ is
κ_ML = A_D^{-1}(Rbar).
This is because
∂/∂κ (- logLH) = - N {∂/∂κ log C_D(κ)} - μ·R,
which is zero if
- ∂/∂κ log C_D(κ) = μ·R / N,
where
∂/∂κ log C_D(κ)
= ω/κ - I'_ω(κ) / I_ω(κ),     where ω = D/2 - 1,
= {ω I_ω(κ) - κ I'_ω(κ)} / (κ I_ω(κ))
= {κ/2 (I_{ω-1}(κ) - I_{ω+1}(κ)) - κ/2 (I_{ω-1}(κ) + I_{ω+1}(κ))} / (κ I_ω(κ))
= - I_{D/2}(κ) / I_{D/2-1}(κ),
using the well-known recurrence relations
I_ν(z) = z/(2ν) {I_{ν-1}(z) - I_{ν+1}(z)},
and
I'_ν(z) = 1/2 {I_{ν-1}(z) + I_{ν+1}(z)},     (I'_0(z) = I_1(z)).
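Since A_D(κ) = I_{D/2}(κ) / I_{D/2-1}(κ) is increasing in κ, κ_ML = A_D^{-1}(Rbar) can be found by simple root-finding. A sketch using SciPy (`A` and `kappa_ML` are illustrative names; `ive` is the exponentially scaled Bessel function, whose scaling cancels in the ratio and avoids overflow for large κ):

```python
# Sketch: A_D(kappa) = I_{D/2}(kappa) / I_{D/2-1}(kappa), inverted numerically.
import numpy as np
from scipy.special import iv, ive
from scipy.optimize import brentq

def A(D, kappa):
    # ive(nu, z) = iv(nu, z) * exp(-z); the scaling cancels in the ratio.
    return ive(D / 2, kappa) / ive(D / 2 - 1, kappa)

def kappa_ML(D, Rbar, hi=1e4):
    """Solve A_D(kappa) = Rbar by bracketing root-finding (no closed form)."""
    return brentq(lambda k: A(D, k) - Rbar, 1e-8, hi)

# Check d/dkappa log C_D(kappa) = -A_D(kappa) against a finite difference.
D, k, h = 3, 2.0, 1e-6
logC = lambda kk: (D/2 - 1) * np.log(kk) - (D/2) * np.log(2 * np.pi) \
                  - np.log(iv(D/2 - 1, kk))
assert np.isclose(-(logC(k + h) - logC(k - h)) / (2 * h), A(D, k), atol=1e-5)

# Round trip: kappa -> Rbar = A_D(kappa) -> kappa_ML recovers kappa.
assert np.isclose(kappa_ML(D, A(D, 2.0)), 2.0, atol=1e-6)
```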

The MML estimate, κ_MML, is the value that minimises the two-part message length; no closed form is known for κ_MML. The message length calculations also require a choice of prior for κ, and the vMF's Fisher information, F.

### The Fisher information of the vMF distribution
The expected second derivative of - logLH w.r.t. κ is
∂^2/∂κ^2 (- logLH) = N A'_D(κ).
The vMF distribution is symmetric about μ on the (D-1)-sphere; there is no preferred orientation around μ. A direction, such as μ, has D - 1 degrees of freedom. The expected second derivative of - logLH w.r.t. any one of μ's degrees of freedom is N κ A_D(κ), for the following reason.
Without loss of generality, let μ = (1, 0, ...), and perturb it to μ → (cos δ, sin δ, 0, ...), where δ is small. Then
∂/∂δ (- logLH) = κ ||R|| sin δ,
∂^2/∂δ^2 (- logLH) = κ ||R|| cos δ   ≈ κ ||R||, as δ is small,
and the expected value of ||R|| is N A_D(κ), giving N κ A_D(κ).
Symmetry implies that the off-diagonal elements of F for μ are zero; and since μ is a position parameter and κ a scale parameter, the off-diagonal elements between μ and κ are also zero.
F, the Fisher information of the vMF distribution, is therefore
F = N^D (κ A_D(κ))^{D-1} A'_D(κ).
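A'_D(κ) can be evaluated via the standard Bessel-ratio identity A'_D(κ) = 1 - A_D(κ)^2 - (D-1) A_D(κ)/κ (a well-known vMF result, not derived above). A sketch, with the identity sanity-checked against a finite difference; all function names are illustrative:

```python
# Sketch: F = N^D (kappa A_D(kappa))^{D-1} A'_D(kappa).
import numpy as np
from scipy.special import ive  # exponentially scaled I_nu; scaling cancels below

def A(D, kappa):
    return ive(D / 2, kappa) / ive(D / 2 - 1, kappa)

def A_prime(D, kappa):
    # Standard identity: A'_D = 1 - A_D^2 - (D-1) A_D / kappa.
    a = A(D, kappa)
    return 1.0 - a * a - (D - 1) * a / kappa

def fisher_info(N, D, kappa):
    return N**D * (kappa * A(D, kappa))**(D - 1) * A_prime(D, kappa)

# Sanity check A'_D against a central finite difference of A_D.
D, k, h = 3, 1.5, 1e-6
assert np.isclose((A(D, k + h) - A(D, k - h)) / (2 * h), A_prime(D, k), atol=1e-6)
assert fisher_info(10, D, k) > 0  # F is positive for kappa > 0
```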

### Sources
Search for [vonMises direction] in the [Bib], and see section 6.5, p.266, of Wallace's book (2005).

P. Kasarapu & L. Allison, Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions, Machine Learning (Springer), March 2015.

The special case of the probability distribution where D = 2 is known as the von Mises distribution, for directions in R^2, that is, for angles and periodic quantities such as annual events.
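A quick check of this correspondence: at D = 2 the vMF density over the angle θ reduces to e^{κ cos θ} / (2π I_0(κ)), which matches the von Mises density as implemented by `scipy.stats.vonmises` (the values of κ and θ here are illustrative):

```python
# Sketch: the D = 2 vMF density equals the von Mises density on angles.
import numpy as np
from scipy.special import iv
from scipy.stats import vonmises

kappa, theta = 2.0, 0.8  # mean direction mu = 0
vmf_D2 = np.exp(kappa * np.cos(theta)) / (2 * np.pi * iv(0, kappa))
assert np.isclose(vmf_D2, vonmises.pdf(theta, kappa))
```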