## t Distribution

Probability density function

f(x | μ, σ, ν) = [ Γ((ν+1)/2) / ( √(πν)·Γ(ν/2)·σ ) ] · [ 1 + (x-μ)^2/(νσ^2) ]^(-(ν+1)/2)

Γ(x) is the gamma function, x > 0. For integer n, Γ(n) = (n-1)!; in general Γ(x) = (x-1)·Γ(x-1).
-∞ < x < ∞,   -∞ < μ < ∞,   σ > 0,   ν > 0.
The mean is μ if ν > 1, and is undefined if ν ≤ 1.
Variance = σ^2·ν / (ν - 2) if ν > 2, else undefined; this tends to σ^2 as ν → ∞.
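As a quick numerical sanity check of the density and of the variance formula, here is a minimal Python sketch (the function name t_pdf and the crude Riemann-sum quadrature are this sketch's own choices, not part of the page):

```python
import math

def t_pdf(x, mu, sigma, nu):
    # density of the t-distribution with location mu, scale sigma, shape nu
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi * nu) - math.log(sigma))
    return math.exp(log_c - ((nu + 1) / 2)
                    * math.log1p((x - mu) ** 2 / (nu * sigma ** 2)))

mu, sigma, nu = 1.0, 2.0, 5.0
step = 0.01
xs = [mu - 400 + i * step for i in range(80001)]   # wide grid: the tails are heavy
total = step * sum(t_pdf(x, mu, sigma, nu) for x in xs)
variance = step * sum((x - mu) ** 2 * t_pdf(x, mu, sigma, nu) for x in xs)
# total should be close to 1, and variance close to sigma^2 nu/(nu-2)
```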

ν is the 'degrees of freedom', or 'shape', parameter.
If ν = 1 the t-distribution is the Cauchy distribution.
As ν → ∞ the t-distribution tends to the normal distribution N(μ,σ); for ν ≥ 30 it is already very close to the normal.

If ν data are drawn from a normal distribution N(0,σ) of unknown σ, the posterior distribution of the next datum is an infinite weighted mixture of normal distributions, which is equivalent to a t-distribution with μ = 0 and variance scaled by σ^2. (There is a little "problem" until at least three values have been drawn (to fix the shape), so choosing them amounts to setting the prior.) The distribution was discovered by W. S. Gosset, writing c. 1908 under the name 'Student'.

Note that f( ) can be rearranged slightly to
f(x | μ, σ, ν) = Γ((ν+1)/2)·ν^(ν/2)·σ^ν / { √π·Γ(ν/2)·{νσ^2 + (x-μ)^2}^((ν+1)/2) }

Three expectations are useful later:
e1 = E_x{ 1 / (νσ^2 + (x-μ)^2) }
e2 = E_x{ 1 / (νσ^2 + (x-μ)^2)^2 }
e3 = E_x{ (x-μ)^2 / (νσ^2 + (x-μ)^2)^2 }

Now, ∫_-∞^+∞ dx / (a + x^2)^k = √π·Γ(k - 1/2) / { a^(k-1/2)·Γ(k) }    (thanks DS)
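The integral identity can be spot-checked numerically; a small sketch (the values of a and k are arbitrary):

```python
import math

a, k = 3.0, 4.0
step = 0.001
# Riemann sum over [-50, 50]; the integrand decays like x^(-2k), so the tails are negligible
numeric = step * sum(1.0 / (a + x * x) ** k
                     for x in (-50 + i * step for i in range(100001)))
closed = math.sqrt(math.pi) * math.exp(math.lgamma(k - 0.5) - math.lgamma(k)) / a ** (k - 0.5)
```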
so
∫_-∞^+∞ dx / (νσ^2 + (x-μ)^2)^((ν+3)/2)   (using a = νσ^2, k = (ν+3)/2)
= √π·Γ(ν/2 + 1) / { (νσ^2)^(ν/2+1)·Γ((ν+1)/2 + 1) }
so
e1 = { Γ((ν+1)/2)·ν^(ν/2)·σ^ν / (√π·Γ(ν/2)) } · { √π·Γ(ν/2 + 1) / ( ν^(ν/2+1)·σ^(ν+2)·Γ((ν+1)/2 + 1) ) }
= { ν·ν^(ν/2)·σ^ν } / { (ν+1)·ν^(ν/2+1)·σ^(ν+2) }
e1 = 1 / ((ν+1)·σ^2)
Similarly
∫_-∞^+∞ dx / (νσ^2 + (x-μ)^2)^((ν+5)/2)   (using a = νσ^2, k = (ν+5)/2)
= √π·Γ(ν/2 + 2) / { (νσ^2)^(ν/2+2)·Γ((ν+1)/2 + 2) }
so
e2 = { ν·(ν+2)·ν^(ν/2)·σ^ν } / { (ν+1)·(ν+3)·ν^(ν/2+2)·σ^(ν+4) }
e2 = (ν+2) / { ν·(ν+1)·(ν+3)·σ^4 }

Now, ∫_-∞^+∞ x^2 dx / (a + x^2)^k = √π·Γ(k - 3/2) / { 2·a^(k-3/2)·Γ(k) }
so
∫_-∞^+∞ (x-μ)^2 dx / {νσ^2 + (x-μ)^2}^((ν+5)/2)   (using a = νσ^2, k = (ν+5)/2)
= √π·Γ(ν/2 + 1) / { 2·(νσ^2)^(ν/2+1)·Γ((ν+1)/2 + 2) }
so
e3 = 1 / { (ν+1)·(ν+3)·σ^2 }
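The closed forms of e1, e2 and e3 can be verified against direct numerical integration under the density; a Python sketch (the grid limits and step are arbitrary choices of this check):

```python
import math

def t_pdf(x, mu, sigma, nu):
    # density of the t-distribution with location mu, scale sigma, shape nu
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi * nu) - math.log(sigma))
    return math.exp(log_c - ((nu + 1) / 2)
                    * math.log1p((x - mu) ** 2 / (nu * sigma ** 2)))

mu, sigma, nu = 0.5, 1.5, 4.0
step = 0.005
xs = [mu - 300 + i * step for i in range(120001)]
w = [step * t_pdf(x, mu, sigma, nu) for x in xs]          # quadrature weights
D = [nu * sigma ** 2 + (x - mu) ** 2 for x in xs]
e1 = sum(wi / d for wi, d in zip(w, D))
e2 = sum(wi / d ** 2 for wi, d in zip(w, D))
e3 = sum(wi * (x - mu) ** 2 / d ** 2 for wi, x, d in zip(w, xs, D))
# closed forms derived above
e1_closed = 1 / ((nu + 1) * sigma ** 2)
e2_closed = (nu + 2) / (nu * (nu + 1) * (nu + 3) * sigma ** 4)
e3_closed = 1 / ((nu + 1) * (nu + 3) * sigma ** 2)
```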

Given n continuous-valued data x_1, x_2, ..., x_n, the negative log likelihood is
L = n { (1/2)·log(πν) + log Γ(ν/2) - log Γ((ν+1)/2) + log σ } + ((ν+1)/2) ∑_i log(1 + (x_i-μ)^2/(νσ^2))
= n { (1/2)·log π + log Γ(ν/2) - log Γ((ν+1)/2) - (ν/2)·log ν - ν·log σ } + ((ν+1)/2) ∑_i log(νσ^2 + (x_i-μ)^2)
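The two forms of L are algebraically equal and easy to check; a small sketch (the function names nll_a, nll_b and the sample data are illustrative only):

```python
import math

def nll_a(xs, mu, sigma, nu):
    # first form of the negative log likelihood
    n = len(xs)
    return (n * (0.5 * math.log(math.pi * nu) + math.lgamma(nu / 2)
                 - math.lgamma((nu + 1) / 2) + math.log(sigma))
            + (nu + 1) / 2 * sum(math.log1p((x - mu) ** 2 / (nu * sigma ** 2))
                                 for x in xs))

def nll_b(xs, mu, sigma, nu):
    # second (rearranged) form
    n = len(xs)
    return (n * (0.5 * math.log(math.pi) + math.lgamma(nu / 2)
                 - math.lgamma((nu + 1) / 2)
                 - nu / 2 * math.log(nu) - nu * math.log(sigma))
            + (nu + 1) / 2 * sum(math.log(nu * sigma ** 2 + (x - mu) ** 2)
                                 for x in xs))

data = [-1.3, 0.2, 0.9, 2.7, 5.0]
diff = nll_a(data, 0.5, 1.2, 3.0) - nll_b(data, 0.5, 1.2, 3.0)
```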

1st derivatives of L

d L / d μ
= - (ν+1) ∑_i{ (x_i-μ) / (νσ^2 + (x_i-μ)^2) }
d L / d σ
= - nν/σ + ν(ν+1)σ ∑_i{ 1 / (νσ^2 + (x_i-μ)^2) }

The digamma function is ψ(x) = d/dx log Γ(x) = Γ'(x)/Γ(x), and the trigamma function is ψ1(x) = d/dx ψ(x).
d L / d ν
= n { (1/2)·ψ(ν/2) - (1/2)·ψ((ν+1)/2) - 1/2 - (1/2)·log ν - log σ } + (1/2) ∑_i log(νσ^2 + (x_i-μ)^2) + ((ν+1)σ^2/2) ∑_i{ 1 / (νσ^2 + (x_i-μ)^2) }
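The three first derivatives can be confirmed against central finite differences of L; a sketch (the finite-difference digamma is a crude stand-in, adequate for a check):

```python
import math

def nll(xs, mu, sigma, nu):
    # negative log likelihood of the t-distribution
    n = len(xs)
    return (n * (0.5 * math.log(math.pi * nu) + math.lgamma(nu / 2)
                 - math.lgamma((nu + 1) / 2) + math.log(sigma))
            + (nu + 1) / 2 * sum(math.log1p((x - mu) ** 2 / (nu * sigma ** 2))
                                 for x in xs))

def digamma(x, h=1e-5):
    # crude digamma via a central difference of lgamma
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

data = [-1.3, 0.2, 0.9, 2.7]
mu, sigma, nu = 0.4, 1.1, 3.0
n = len(data)
D = [nu * sigma ** 2 + (x - mu) ** 2 for x in data]

# the analytic derivatives above
dmu = -(nu + 1) * sum((x - mu) / d for x, d in zip(data, D))
dsigma = -n * nu / sigma + nu * (nu + 1) * sigma * sum(1 / d for d in D)
dnu = (n * (0.5 * digamma(nu / 2) - 0.5 * digamma((nu + 1) / 2)
            - 0.5 - 0.5 * math.log(nu) - math.log(sigma))
       + 0.5 * sum(math.log(d) for d in D)
       + (nu + 1) * sigma ** 2 / 2 * sum(1 / d for d in D))

h = 1e-6   # central finite differences of L
fd = [(nll(data, mu + h, sigma, nu) - nll(data, mu - h, sigma, nu)) / (2 * h),
      (nll(data, mu, sigma + h, nu) - nll(data, mu, sigma - h, nu)) / (2 * h),
      (nll(data, mu, sigma, nu + h) - nll(data, mu, sigma, nu - h)) / (2 * h)]
```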

2nd derivatives

d2 L / d μ2
= (ν+1) ∑_i{ 1 / (νσ^2 + (x_i-μ)^2) - 2(x_i-μ)^2 / (νσ^2 + (x_i-μ)^2)^2 }
using the results for e1, e2 & e3 above, the expectation
= n(ν+1)·{ e1 - 2·e3 }
= n(ν+1)·{ 1/((ν+1)σ^2) - 2/((ν+1)(ν+3)σ^2) }
= n·{ 1 - 2/(ν+3) } / σ^2
= n(ν+1) / { (ν+3)σ^2 }

d2 L / d σ2
= nν/σ^2 + ν(ν+1) ∑_i{ 1 / (νσ^2 + (x_i-μ)^2) - 2νσ^2 / (νσ^2 + (x_i-μ)^2)^2 }
expectation
= nν/σ^2 + nν(ν+1)·{ e1 - 2νσ^2·e2 }
= nν/σ^2 + nν(ν+1)·{ 1/((ν+1)σ^2) - 2νσ^2(ν+2) / (ν(ν+1)(ν+3)σ^4) }
= nν/σ^2 + nν·(ν+3 - 2(ν+2)) / ((ν+3)σ^2)
= nν/σ^2 - nν(ν+1) / ((ν+3)σ^2)
= 2nν / ((ν+3)σ^2)
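The alternating signs make these easy to get wrong, so it is worth checking d2L/dμ2 and d2L/dσ2 against second central differences of L; a sketch:

```python
import math

def nll(xs, mu, sigma, nu):
    # negative log likelihood of the t-distribution
    n = len(xs)
    return (n * (0.5 * math.log(math.pi * nu) + math.lgamma(nu / 2)
                 - math.lgamma((nu + 1) / 2) + math.log(sigma))
            + (nu + 1) / 2 * sum(math.log1p((x - mu) ** 2 / (nu * sigma ** 2))
                                 for x in xs))

data = [-1.3, 0.2, 0.9, 2.7]
mu, sigma, nu = 0.4, 1.1, 3.0
D = [nu * sigma ** 2 + (x - mu) ** 2 for x in data]

# the analytic second derivatives above
d2mu = (nu + 1) * sum(1 / d - 2 * (x - mu) ** 2 / d ** 2 for x, d in zip(data, D))
d2sigma = (len(data) * nu / sigma ** 2
           + nu * (nu + 1) * sum(1 / d - 2 * nu * sigma ** 2 / d ** 2 for d in D))

h = 1e-4   # second central differences of L
fd_mu = (nll(data, mu + h, sigma, nu) - 2 * nll(data, mu, sigma, nu)
         + nll(data, mu - h, sigma, nu)) / h ** 2
fd_sigma = (nll(data, mu, sigma + h, nu) - 2 * nll(data, mu, sigma, nu)
            + nll(data, mu, sigma - h, nu)) / h ** 2
```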

Sanity check: when ν is large (30+), the t-distribution tends to N(μ,σ), and the product of the expected 2nd derivatives w.r.t. μ and σ tends to 2n^2/σ^4, which is the normal's Fisher when that is done w.r.t. μ and σ.

d2 L / d ν2
= n{ (1/4)·ψ1(ν/2) - (1/4)·ψ1((ν+1)/2) - 1/(2ν) } + (σ^2/2) ∑_i{ 1 / (νσ^2 + (x_i-μ)^2) } + (σ^2/2) ∑_i{ 1 / (νσ^2 + (x_i-μ)^2) } - ((ν+1)σ^4/2) ∑_i{ 1 / (νσ^2 + (x_i-μ)^2)^2 }
= n{ (1/4)·ψ1(ν/2) - (1/4)·ψ1((ν+1)/2) - 1/(2ν) } + σ^2 ∑_i{ 1 / (νσ^2 + (x_i-μ)^2) } - ((ν+1)σ^4/2) ∑_i{ 1 / (νσ^2 + (x_i-μ)^2)^2 }
expectation
= n{ (1/4)·ψ1(ν/2) - (1/4)·ψ1((ν+1)/2) - 1/(2ν) + σ^2·e1 - ((ν+1)σ^4/2)·e2 }
= n{ (1/4)·ψ1(ν/2) - (1/4)·ψ1((ν+1)/2) - 1/(2ν) + 1/(ν+1) - (ν+2)/(2ν(ν+3)) }
= n{ (1/4)·ψ1(ν/2) - (1/4)·ψ1((ν+1)/2) - (ν+5)/(2ν(ν+1)(ν+3)) }
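Similarly for d2L/dν2, with the trigamma ψ1 approximated by a second difference of lgamma (a crude but adequate stand-in for a check):

```python
import math

def nll(xs, mu, sigma, nu):
    # negative log likelihood of the t-distribution
    n = len(xs)
    return (n * (0.5 * math.log(math.pi * nu) + math.lgamma(nu / 2)
                 - math.lgamma((nu + 1) / 2) + math.log(sigma))
            + (nu + 1) / 2 * sum(math.log1p((x - mu) ** 2 / (nu * sigma ** 2))
                                 for x in xs))

def trigamma(x, h=1e-4):
    # crude trigamma psi1 via a second central difference of lgamma
    return (math.lgamma(x + h) - 2 * math.lgamma(x) + math.lgamma(x - h)) / h ** 2

data = [-1.3, 0.2, 0.9, 2.7]
mu, sigma, nu = 0.4, 1.1, 3.0
n = len(data)
D = [nu * sigma ** 2 + (x - mu) ** 2 for x in data]

# the analytic second derivative above
d2nu = (n * (0.25 * trigamma(nu / 2) - 0.25 * trigamma((nu + 1) / 2) - 1 / (2 * nu))
        + sigma ** 2 * sum(1 / d for d in D)
        - (nu + 1) * sigma ** 4 / 2 * sum(1 / d ** 2 for d in D))

h = 1e-4   # second central difference of L in nu
fd_nu = (nll(data, mu, sigma, nu + h) - 2 * nll(data, mu, sigma, nu)
         + nll(data, mu, sigma, nu - h)) / h ** 2
```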

Off-diagonal 2nd derivatives

d2 L / d μ d σ = d2 L / d σ d μ
= 2ν(ν+1)σ ∑_i{ (x_i-μ) / (νσ^2 + (x_i-μ)^2)^2 }
expectation = 0 (which is what one would hope), because the summand is an "odd" function about μ (i.e. g(μ+z) = - g(μ-z)).

d2 L / d μ d ν = d2 L / d ν d μ
= (ν+1)σ^2 ∑_i{ (x_i-μ) / (νσ^2 + (x_i-μ)^2)^2 } - ∑_i{ (x_i-μ) / (νσ^2 + (x_i-μ)^2) }
expectation = 0, because of the two "odd" functions.

d2 L / d ν d σ = d2 L / d σ d ν
= - n/σ + (2ν+1)σ ∑_i{ 1 / (νσ^2 + (x_i-μ)^2) } - ν(ν+1)σ^3 ∑_i{ 1 / (νσ^2 + (x_i-μ)^2)^2 }
expectation = - n/σ + n(2ν+1)σ·e1 - nν(ν+1)σ^3·e2
= n·{ - 1/σ + (2ν+1)σ / ((ν+1)σ^2) - ν(ν+1)(ν+2)σ^3 / (ν(ν+1)(ν+3)σ^4) }
= (n/σ)·{ - 1 + (2ν+1)/(ν+1) - (ν+2)/(ν+3) }
= (n/σ)·{ - 2 / ((ν+1)(ν+3)) }
= - 2n / ( σ(ν+1)(ν+3) )
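The last simplification, -1 + (2ν+1)/(ν+1) - (ν+2)/(ν+3) = -2/((ν+1)(ν+3)), is pure algebra and can be spot-checked for a few values of ν:

```python
gaps = []
for nu in [1.5, 2.0, 4.0, 10.0, 30.0]:
    lhs = -1 + (2 * nu + 1) / (nu + 1) - (nu + 2) / (nu + 3)
    rhs = -2 / ((nu + 1) * (nu + 3))
    gaps.append(abs(lhs - rhs))
```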

Fisher

In the parameter order (μ, σ, ν), the Fisher information matrix is

    [ E d2L/dμ2    0            0           ]
    [ 0            E d2L/dσ2    E d2L/dσdν  ]
    [ 0            E d2L/dσdν   E d2L/dν2   ]

= n ×

    [ (ν+1)/((ν+3)σ^2)    0                   0                                                       ]
    [ 0                   2ν/((ν+3)σ^2)       -2/((ν+1)(ν+3)σ)                                        ]
    [ 0                   -2/((ν+1)(ν+3)σ)    (1/4)ψ1(ν/2) - (1/4)ψ1((ν+1)/2) - (ν+5)/(2ν(ν+1)(ν+3)) ]

The determinant picks up a factor of n^3:

det F = F11 · (F22·F33 - F23^2)

= n^3 · { (ν+1)/((ν+3)σ^2) } · { ( 2ν/((ν+3)σ^2) ) · { (1/4)·( ψ1(ν/2) - ψ1((ν+1)/2) ) - (ν+5)/(2ν(ν+1)(ν+3)) } - 4/((ν+1)(ν+3)σ)^2 }

which simplifies, via the step (ν+5)(ν+1) + 4 = ν^2 + 6ν + 9 = (ν+3)^2, to

F = (n^3/σ^4) · { ( ν(ν+1)/(2(ν+3)^2) ) · ( ψ1(ν/2) - ψ1((ν+1)/2) ) - 1/((ν+1)(ν+3)) }
-- LA, July 2007
confirming the equation presented (without working) by Yudi [Agu02].
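The factored form of F can be checked against the explicit product F11·(F22·F33 - F23^2) of matrix entries; a Python sketch (the finite-difference trigamma is this check's own crude stand-in):

```python
import math

def trigamma(x, h=1e-4):
    # crude trigamma psi1 via a second central difference of lgamma
    return (math.lgamma(x + h) - 2 * math.lgamma(x) + math.lgamma(x - h)) / h ** 2

def det_from_entries(n, sigma, nu):
    # determinant from the Fisher matrix entries derived above
    f11 = n * (nu + 1) / ((nu + 3) * sigma ** 2)
    f22 = n * 2 * nu / ((nu + 3) * sigma ** 2)
    f23 = -n * 2 / ((nu + 1) * (nu + 3) * sigma)
    f33 = n * (0.25 * trigamma(nu / 2) - 0.25 * trigamma((nu + 1) / 2)
               - (nu + 5) / (2 * nu * (nu + 1) * (nu + 3)))
    return f11 * (f22 * f33 - f23 ** 2)

def det_factored(n, sigma, nu):
    # the simplified closed form of F
    dpsi1 = trigamma(nu / 2) - trigamma((nu + 1) / 2)
    return (n ** 3 / sigma ** 4) * (nu * (nu + 1) / (2 * (nu + 3) ** 2) * dpsi1
                                    - 1 / ((nu + 1) * (nu + 3)))

a = det_from_entries(10, 1.5, 4.0)
b = det_factored(10, 1.5, 4.0)
```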

Note that log(Fisher) = 3·log n - 4·log σ + log(expression(ν)).

Message length
m = - log(h(μ, σ, ν)) + L + (1/2)·log F + (d/2)·(1 + log κ_d),   (d = 3 parameters; h( ) is the prior)
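As a sketch, the message-length formula as a Python function (the prior h, and hence neg_log_prior, is left abstract; the value of the 3-d lattice constant κ3 ≈ 0.0785 is the standard one from the MML literature — both are assumptions of this sketch rather than details given above):

```python
import math

KAPPA_3 = 0.0785433   # optimal 3-d quantizing-lattice constant (approximate)

def message_length(neg_log_prior, neg_log_likelihood, log_fisher_det, d=3,
                   kappa=KAPPA_3):
    # m = -log h(mu, sigma, nu) + L + (1/2) log F + (d/2)(1 + log kappa_d)
    return (neg_log_prior + neg_log_likelihood + 0.5 * log_fisher_det
            + d / 2 * (1 + math.log(kappa)))

const = message_length(0.0, 0.0, 0.0)   # just the (d/2)(1 + log kappa_d) term
```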

See [IP 1.2] for an implementation of Student's t-distribution.