Linear Regression 1

The simplest linear regression as a case study:

Example data:
X: -9 -8 -6 -3 -2 1 2 4 6 7
Y: -3 -7 -5 -3 -1 0 1 4 5 8
y = a.x + b + N(0,σ),  or equivalently  N(a.x + b, σ).

The probability density of y given x:

f(y) = (1/(√(2 π) σ)) exp( -(y - a.x - b)² / (2 σ²) )

Given {(xi, yi)}, i = 1..n, where the xi are "common knowledge", the negative log-likelihood is

L = (n/2)log(2 π) + (n/2)log(σ²) + (1/(2 σ²)) Σ{yi - a.xi - b}²

= (n/2)log(2 π) + n.log(σ) + (1/(2 σ²)) Σ{yi - a.xi - b}²
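The negative log-likelihood can be evaluated directly; a minimal sketch in Python (the language and the function name `neg_log_likelihood` are choices made here, not part of the page), using the example data from the top of the page:

```python
import math

# Example data from the top of the page.
xs = [-9, -8, -6, -3, -2, 1, 2, 4, 6, 7]
ys = [-3, -7, -5, -3, -1, 0, 1, 4, 5, 8]

def neg_log_likelihood(a, b, sigma, xs, ys):
    # L = (n/2) log(2 pi) + n log(sigma) + (1/(2 sigma^2)) sum (y - a.x - b)^2
    n = len(xs)
    rss = sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))
    return n / 2 * math.log(2 * math.pi) + n * math.log(sigma) + rss / (2 * sigma ** 2)
```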

First partial derivatives...

d L / d a = (-1/σ²) Σ{ xi.(yi - a.xi - b) }

= (-1/σ²) Σ{ xi.yi - a.xi² - xi.b }

d L / d b = (-1/σ²) Σ{yi - a.xi - b}

= (-1/σ²) { (Σ yi) - a(Σ xi) - n.b }

d L / d σ = n/σ - (1/σ³) Σ{yi - a.xi - b}²

Setting dL/db = 0 shows that L is minimized when the line passes through the C of G (centre of gravity) of the points, which leaves only the slope: setting dL/da = 0 gives a = Σ xi(yi - b) / Σ xi², and σ is the square root of the residual variance, σ² = (1/n) Σ{yi - a.xi - b}².
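These closed-form minimizers can be checked numerically; a sketch using the example data (the names `fit_mle`, `a_hat`, etc. are illustrative only, not from the page):

```python
import math

xs = [-9, -8, -6, -3, -2, 1, 2, 4, 6, 7]
ys = [-3, -7, -5, -3, -1, 0, 1, 4, 5, 8]

def fit_mle(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    # Slope from the centred data; the fitted line passes through (xbar, ybar).
    a = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    b = ybar - a * xbar
    # sigma is the square root of the residual variance.
    rss = sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(rss / n)

a_hat, b_hat, sigma_hat = fit_mle(xs, ys)
```

At the fitted parameters both first derivatives of L with respect to a and b vanish, which is a convenient check.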

Second partial derivatives...

d²L / da² = (+1/σ²) Σ{ xi² }
(and remember, the xi are common knowledge)

d²L / db² = n/σ²

d²L / dσ² = -n/σ² + (3/σ⁴) Σ{yi - a.xi - b}²
expectation = 2 n / σ²   (since Ey Σ{yi - a.xi - b}² = n σ²)

Off-diagonal second partial derivatives...

d²L / da db = (+1/σ²) Σ xi
= n . mean{xi} / σ²

d²L / da dσ = (+2/σ³) Σ{ xi.(yi - a.xi - b) }
expectation = 0

d²L / db dσ = (+2/σ³) Σ{yi - a.xi - b}
expectation = 0
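All of the second derivatives above can be checked against a central-difference Hessian of L; a sketch (the evaluation point (0.5, 0.2, 2.0) is an arbitrary assumption, as are the helper names `L`, `d2`):

```python
import math

xs = [-9, -8, -6, -3, -2, 1, 2, 4, 6, 7]
ys = [-3, -7, -5, -3, -1, 0, 1, 4, 5, 8]
n = len(xs)

def L(a, b, s):
    # negative log-likelihood from the page
    rss = sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))
    return n / 2 * math.log(2 * math.pi) + n * math.log(s) + rss / (2 * s * s)

h = 1e-4

def d2(f, i, j, p):
    # central-difference second partial derivative of f at point p
    def shift(p, k, d):
        q = list(p)
        q[k] += d
        return tuple(q)
    if i == j:
        return (f(*shift(p, i, h)) - 2 * f(*p) + f(*shift(p, i, -h))) / h ** 2
    return (f(*shift(shift(p, i, h), j, h)) - f(*shift(shift(p, i, h), j, -h))
            - f(*shift(shift(p, i, -h), j, h)) + f(*shift(shift(p, i, -h), j, -h))) / (4 * h ** 2)

a0, b0, s0 = 0.5, 0.2, 2.0          # arbitrary evaluation point (an assumption)
p = (a0, b0, s0)
rss0 = sum((y - a0 * x - b0) ** 2 for x, y in zip(xs, ys))

# analytic second derivatives, as derived in the text
ana = {
    "aa": sum(x * x for x in xs) / s0 ** 2,
    "bb": n / s0 ** 2,
    "ab": sum(xs) / s0 ** 2,
    "ss": -n / s0 ** 2 + 3 * rss0 / s0 ** 4,
}
num = {"aa": d2(L, 0, 0, p), "bb": d2(L, 1, 1, p),
       "ab": d2(L, 0, 1, p), "ss": d2(L, 2, 2, p)}
```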

Fisher

        a               b               σ
  a   Ey d²L/da²    Ey d²L/da db        0
  b   Ey d²L/da db  Ey d²L/db²          0
  σ   0             0               Ey d²L/dσ²

The determinant of the Fisher matrix:

F = 2 n { n.(Σ xi²) - (n.mean{xi})² } / σ⁶

= 2 n³ { (Σ xi²)/n - (mean{xi})² } / σ⁶

= 2 n³ variance{xi} / σ⁶
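The closed form for F can be checked against the determinant computed entry by entry (the matrix is block-diagonal, so its determinant is the (a, b) block's determinant times the σ entry); a sketch, where σ = 1.5 is an arbitrary assumption:

```python
xs = [-9, -8, -6, -3, -2, 1, 2, 4, 6, 7]
n = len(xs)
sigma = 1.5                         # arbitrary value (an assumption)

Sx = sum(xs)
Sxx = sum(x * x for x in xs)
mean = Sx / n
var = Sxx / n - mean ** 2           # population variance of the x_i

# determinant of the (a, b) block, times the sigma entry 2n/sigma^2
det_ab = (Sxx / sigma ** 2) * (n / sigma ** 2) - (Sx / sigma ** 2) ** 2
F = det_ab * (2 * n / sigma ** 2)

F_closed = 2 * n ** 3 * var / sigma ** 6
```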

Priors

a = tan θ   where θ is the angular slope.
da/dθ = 1/cos²θ = 1 + a²,  so dθ/da = 1/(1 + a²).
The uniform prior, 1/π, on θ ∈ (-π/2, π/2) corresponds to
the Cauchy prior pr(a) = 1/(π (1 + a²)) on 'a'.
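That the induced prior on 'a' is a proper density can be checked by numerical integration: its mass over [-T, T] should equal the probability that |θ| ≤ arctan(T) under the uniform prior, i.e. (2/π) arctan(T). A sketch (the helper name `pr_a` and the grid parameters are choices made here):

```python
import math

def pr_a(a):
    # prior on the slope 'a' induced by the uniform prior 1/pi on theta
    return 1.0 / (math.pi * (1.0 + a * a))

# Trapezoidal integral of pr(a) over [-T, T].
T, m = 50.0, 200_000
h = 2 * T / m
total = (pr_a(-T) + pr_a(T)) / 2 * h + sum(pr_a(-T + k * h) for k in range(1, m)) * h
```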
b can be untangled from 'a' by making the C of G the origin.
Then b plays the role of μ (the mean of the {yi}) in the [normal distribution].
-- L.A. @ Dept. Comp. Sci., U. York, 12/2004