
 The von Mises-Fisher (vMF) distribution is a probability distribution
on directions in R^{D}.
It is natural to think of it as a distribution on the
(D−1)-sphere of unit radius,
that is, on the surface of the D-ball of unit radius.

 The von Mises-Fisher probability density function is
 pdf(v | μ, κ) = C_{D}(κ) e^{κ μ.v}
 where the datum v is a normalised D-vector,
equivalently a point on the (D−1)-sphere,
 mu, μ, is the mean (a normalised vector), and
 kappa, κ ≥ 0, is the concentration parameter (a scalar).
 The distribution's normalising constant is
 C_{D}(κ) = κ^{D/2−1} / {(2π)^{D/2} I_{D/2−1}(κ)}
 where I_{order}(.) is the
"modified Bessel function of the first kind".
 In the special case that D = 3,
 C_{3}(κ) = κ / {2π (e^{κ} − e^{−κ})}.
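The density above can be evaluated numerically. The following is a minimal sketch, not a reference implementation: the function names (`log_bessel_i`, `log_c`, `vmf_log_pdf`, `c3_closed_form`) are hypothetical, and the Bessel function is summed from its power series using only the standard library, assuming κ > 0 and D ≥ 2. A library routine such as `scipy.special.iv` would normally be used instead.

```python
import math


def log_bessel_i(nu, x):
    """log I_nu(x) from the power series, summed in log space.

    Assumes x > 0 and nu >= 0 (illustrative helper, not a library call).
    """
    log_half = math.log(x / 2.0)
    logs, biggest, k = [], -math.inf, 0
    while True:
        t = (2 * k + nu) * log_half - math.lgamma(k + 1) - math.lgamma(k + nu + 1)
        logs.append(t)
        biggest = max(biggest, t)
        # Terms shrink once k exceeds x/2; stop when they no longer matter.
        if k > x and t < biggest - 40.0:
            break
        k += 1
    return biggest + math.log(sum(math.exp(t - biggest) for t in logs))


def log_c(D, kappa):
    """log C_D(kappa) = (D/2-1) log kappa - (D/2) log 2 pi - log I_{D/2-1}(kappa)."""
    w = D / 2.0 - 1.0
    return (w * math.log(kappa)
            - (D / 2.0) * math.log(2.0 * math.pi)
            - log_bessel_i(w, kappa))


def vmf_log_pdf(v, mu, kappa):
    """log pdf(v | mu, kappa); v and mu are unit vectors of equal length D."""
    dot = sum(a * b for a, b in zip(mu, v))
    return log_c(len(v), kappa) + kappa * dot


def c3_closed_form(kappa):
    """Special case D = 3: kappa / (2 pi (e^kappa - e^-kappa))."""
    return kappa / (2.0 * math.pi * (math.exp(kappa) - math.exp(-kappa)))


if __name__ == "__main__":
    mu = [0.0, 0.0, 1.0]
    v = [0.0, 0.6, 0.8]
    print(vmf_log_pdf(v, mu, kappa=2.0))
    # The general formula and the D = 3 closed form should agree closely.
    print(math.exp(log_c(3, 2.0)), c3_closed_form(2.0))
```

Working with log C_{D} rather than C_{D} itself avoids overflow when κ is large, which is also the form needed for the negative log likelihood.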

 The negative log pdf is
 − log pdf(v | μ, κ) = − log C_{D}(κ) − κ μ.v,
 and
 log C_{D}(κ) = (D/2−1) log κ − (D/2) log 2π − log I_{D/2−1}(κ).

 Given data
{v_{0}, ..., v_{N−1}}, define their sum
(a D-vector),
 R = ∑_{i=0..N−1} v_{i},
 and
 Rbar = |R| / N,
 where |R| is the Euclidean length of R.


The negative log likelihood is
 − logLH = − N log C_{D}(κ) − κ μ.R.
 For any κ > 0 this is smallest when μ.R is largest, that is when μ points along R,
so the maximum likelihood estimate of μ is
R normalised,
 μ_{ML} = R / |R|,
and the MML estimate is the same,
 μ_{MML} = μ_{ML} = R / |R|,
 the most general prior for μ being the uniform distribution.
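As a small illustration of the estimate of μ, the sketch below sums the data, normalises the sum, and also returns the length of R divided by N (a scalar between 0 and 1). The function name `mean_direction` is hypothetical, and the data are assumed to be a non-empty list of unit vectors whose sum is not the zero vector.

```python
import math


def mean_direction(data):
    """Return (mu_ML, Rbar) for a list of unit D-vectors.

    mu_ML is the sum R of the data, normalised; Rbar is |R| / N.
    Assumes the data do not cancel exactly (|R| > 0).
    """
    n = len(data)
    d = len(data[0])
    # R is the component-wise sum of the data vectors.
    r = [sum(v[j] for v in data) for j in range(d)]
    norm_r = math.sqrt(sum(x * x for x in r))
    mu = [x / norm_r for x in r]
    return mu, norm_r / n


if __name__ == "__main__":
    data = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    mu, rbar = mean_direction(data)
    print(mu, rbar)
```

Tightly clustered data give a value near 1 for the second result, and widely scattered data give a value near 0.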

 For given μ and κ,
the expected value of Rbar equals
 A_{D}(κ) = I_{D/2}(κ) / I_{D/2−1}(κ),
 and the (less obvious) maximum likelihood estimate of κ is
 κ_{ML} = A_{D}^{−1}(Rbar).
 This is because
 ∂/∂κ (− logLH) = − N {∂/∂κ log C_{D}(κ)} − μ.R
 which is zero if
 − ∂/∂κ log C_{D}(κ) = μ.R / N,
 where, writing ω = D/2 − 1,
 ∂/∂κ log C_{D}(κ)
 = ω/κ − I'_{ω}(κ) / I_{ω}(κ)
 = ω {I_{ω}(κ) − (κ/ω) I'_{ω}(κ)} / (κ I_{ω}(κ))
 = ω {(κ/2ω) {I_{ω−1}(κ) − I_{ω+1}(κ)} − (κ/2ω) {I_{ω−1}(κ) + I_{ω+1}(κ)}} / (κ I_{ω}(κ))
 = − I_{ω+1}(κ) / I_{ω}(κ)
 = − I_{D/2}(κ) / I_{D/2−1}(κ)
 = − A_{D}(κ),
 using the "well known" relations,
 I_{ν}(z) = (z/2ν) {I_{ν−1}(z) − I_{ν+1}(z)},
 and
 I'_{ν}(z) = (1/2) {I_{ν−1}(z) + I_{ν+1}(z)},
 (I'_{0}(z) = I_{1}(z)).
 With μ set to R normalised, μ.R / N is the length of R divided by N, that is Rbar,
so the condition becomes A_{D}(κ) = Rbar.
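Since A_{D} has no elementary inverse, κ_{ML} is found numerically in practice. The sketch below inverts A_{D} by bisection, relying on A_{D}(κ) increasing from 0 towards 1 as κ grows. It is one possible approach under stated assumptions, not the method of the cited sources: the names `log_bessel_i`, `a_ratio` and `kappa_ml` are hypothetical, the Bessel function comes from a standard-library power series, and Rbar is assumed to lie strictly between 0 and 1.

```python
import math


def log_bessel_i(nu, x):
    """log I_nu(x) from the power series, in log space (x > 0, nu >= 0)."""
    log_half = math.log(x / 2.0)
    logs, biggest, k = [], -math.inf, 0
    while True:
        t = (2 * k + nu) * log_half - math.lgamma(k + 1) - math.lgamma(k + nu + 1)
        logs.append(t)
        biggest = max(biggest, t)
        # Past the peak (k > x/2) the terms fall quickly; stop when negligible.
        if k > x and t < biggest - 40.0:
            break
        k += 1
    return biggest + math.log(sum(math.exp(t - biggest) for t in logs))


def a_ratio(D, kappa):
    """A_D(kappa) = I_{D/2}(kappa) / I_{D/2-1}(kappa)."""
    return math.exp(log_bessel_i(D / 2.0, kappa)
                    - log_bessel_i(D / 2.0 - 1.0, kappa))


def kappa_ml(D, rbar, tol=1e-10):
    """Solve A_D(kappa) = rbar for kappa by bisection."""
    if not 0.0 < rbar < 1.0:
        raise ValueError("Rbar must lie strictly between 0 and 1")
    lo, hi = 1e-9, 1.0
    # Grow the bracket until A_D(hi) reaches rbar.
    while a_ratio(D, hi) < rbar:
        lo, hi = hi, 2.0 * hi
    # Halve the bracket until it is narrow relative to its size.
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if a_ratio(D, mid) < rbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Round trip: pick kappa, compute A_3(kappa), then recover kappa.
    print(kappa_ml(3, a_ratio(3, 5.0)))
```

Bisection is slow but hard to break; a Newton iteration on A_{D}(κ) − Rbar would converge faster at the cost of needing A'_{D}(κ). Values of Rbar very close to 1 imply a very large κ, for which this simple series becomes slow.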

 The MML estimate, κ_{MML},
is the value that minimises the two-part
message length;
no closed form is known for κ_{MML}.
The message length calculations also require
a choice of prior for κ, and
the vMF's Fisher information, F.

 The Fisher information of the vMF distribution.
 The expected second derivative of
 − logLH w.r.t. κ is
 ∂²/∂κ² (− logLH) = N A'_{D}(κ).
 The vMF distribution is symmetric about μ on the (D−1)-sphere;
there is no preferred orientation around μ.
A direction, such as μ, has D − 1 degrees of freedom.
The expected 2nd derivative of − logLH w.r.t. any one of
μ's degrees of freedom is
 N κ A_{D}(κ).
 This is for the following reason:
 Without loss of generality, let
μ = (1, 0, ...), with R lying along μ as it does in expectation, and then
μ → (cos δ, sin δ, 0, ...), say,
where δ is small, so that μ.R = N Rbar cos δ and
 ∂/∂δ (− logLH) = N κ Rbar sin δ,
 ∂²/∂δ² (− logLH) = N κ Rbar cos δ ≈ N κ Rbar, as δ is small,
 which is
 N κ A_{D}(κ) in expectation.
 Symmetry implies that the off-diagonal elements for μ are zero.
And, μ is a position parameter and κ a scale parameter,
so the off-diagonal elements between μ and κ are also zero.
 F, the Fisher information of the vMF, is therefore
 F = N^{D} (κ A_{D}(κ))^{D−1} A'_{D}(κ).
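The Fisher information can be sketched in code as follows. The derivative A'_{D}(κ) is computed here from the identity A'_{D}(κ) = 1 − A_{D}(κ)² − ((D−1)/κ) A_{D}(κ), which is not stated in the text above but follows from the Bessel recurrences; treat its use here as an assumption of this example. The function names are hypothetical, the Bessel function again comes from a standard-library power series (κ > 0, D ≥ 2), and the logarithm of F is returned because F itself can overflow for large N or D.

```python
import math


def log_bessel_i(nu, x):
    """log I_nu(x) from the power series, in log space (x > 0, nu >= 0)."""
    log_half = math.log(x / 2.0)
    logs, biggest, k = [], -math.inf, 0
    while True:
        t = (2 * k + nu) * log_half - math.lgamma(k + 1) - math.lgamma(k + nu + 1)
        logs.append(t)
        biggest = max(biggest, t)
        # Past the peak (k > x/2) the terms fall quickly; stop when negligible.
        if k > x and t < biggest - 40.0:
            break
        k += 1
    return biggest + math.log(sum(math.exp(t - biggest) for t in logs))


def a_ratio(D, kappa):
    """A_D(kappa) = I_{D/2}(kappa) / I_{D/2-1}(kappa)."""
    return math.exp(log_bessel_i(D / 2.0, kappa)
                    - log_bessel_i(D / 2.0 - 1.0, kappa))


def a_prime(D, kappa):
    """A'_D(kappa) via the identity 1 - A^2 - ((D-1)/kappa) A (an assumption here)."""
    a = a_ratio(D, kappa)
    return 1.0 - a * a - (D - 1.0) / kappa * a


def log_fisher(D, N, kappa):
    """log F, with F = N^D (kappa A_D(kappa))^(D-1) A'_D(kappa)."""
    return (D * math.log(N)
            + (D - 1) * math.log(kappa * a_ratio(D, kappa))
            + math.log(a_prime(D, kappa)))


if __name__ == "__main__":
    print(log_fisher(3, 100, 2.0))
```

For D = 3, A_{3}(κ) = coth κ − 1/κ, so A'_{3}(κ) = 1/κ² − 1/sinh²κ, which gives an independent check on `a_prime`.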

 Sources
 Search for [vonMises direction] in the
[Bib], and
 see section 6.5, p.266 of
Wallace's
book (2005).

 P. Kasarapu & L. Allison,
Minimum message length estimation of mixtures of multivariate Gaussian
and von Mises-Fisher distributions,
Machine Learning (Springer Verlag),
March 2015.

 The special case of the probability distribution
where D = 2 is known as the
von Mises
distribution for directions in R^{2},
that is, for angles and periodic quantities such as annual events.

