THE LINE OF BEST FIT VIA TRANSFORMATIONS Given a

THE LINE OF BEST FIT VIA TRANSFORMATIONS
DAVID G RADCLIFFE
Given a sequence of n points in the plane (X1 , Y1 ), . . . , (Xn , Yn ) we seek
the linear equation y = a + bx that approximates the
P points as closely as
possible, in that the sum of the squared errors E = ni=1 (Yi − a − bXi )2 is
minimized.
We assume that not all of the points lie on a single horizontal or vertical
line. InPthat case, wePcan apply
P
P a transformation to the points so that
xi =
yi = 0 and
x2i = yi2 = 1. The transformation is defined by
Xi − X
Yi − Y
xi = qP
and yi = qP
.
2
2
(Xi − X)
(Yi − Y )
Let r =
P
xi yi . Then
X
E=
(yi − a − bxi )2
X
=
(yi2 + a2 + b2 x2i − 2ayi − 2bxi yi + abxi )
X
X
X
X
X
X
=
yi2 +
a2 +
b2 x2i −
2ayi −
2bxi yi +
abxi
= 1 + na2 + b2 − 2br
= (1 − r2 ) + na2 + (b − r)2 .
The sum is minimized when a = 0 and b = r, so the line of best fit is
y = rx. What a simple equation!
Unfortunately, the equation is a bit messier when expressed in terms of
the original variables.



P
(X
−
x
−
y−Y
X)(Y
−
Y
)
X
i
i
 q

qP
=  qP
P
P
2
2
2
2
(Yi − Y )
(Yi − Y )
(Xi − X)
(Xi − X)
P
(Xi − X)(Yi − Y )
y−Y =
(x − X) .
P
(Xi − X)2
Date: December 25, 2014.