The Principle of Least Squares

ST. PAUL’S UNIVERSITY
Assignment 2
Question 1
20 Marks
Q.1.a
What assumptions do we make in regression in order to use the standard error of estimate?
Q.1.b
A mathematics placement test is given to all entering freshmen at a small college. A
student who receives a grade below 35 is denied admission to the regular mathematics
course and placed in a remedial class. The placement test scores and the final grades for 20
students who took the regular course were recorded as follows:
Placement Test   Course Grade   Placement Test   Course Grade
      50              53              90              54
      35              41              80              91
      35              61              60              48
      40              56              60              71
      55              68              60              71
      65              36              40              47
      35              11              55              53
      60              70              50              68
      90              79              65              57
      35              59              50              79
a. Find the equation of the regression line to predict course grades from placement
test scores.
b. Also calculate the correlation coefficient r.
Solution:
1. a:
What assumptions do we make in regression in order to use the standard error of estimate?
Answer:
1. The observed values of Y are normally distributed around each estimated
   value of Ŷ.
2. The variance of the distributions around each possible value of Ŷ is the
   same.
If this second assumption were not true, then the standard error at one point on the
regression line could differ from the standard error at another point on the line.
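The quantity these assumptions support can be sketched in code. The example below is a minimal illustration, not part of the original solution: it assumes the usual textbook definition of the standard error of estimate, s_e = sqrt(Σ(y − ŷ)² / (n − 2)), and applies it to the placement-test data from Q.1.b. The variable names are illustrative.

```python
import math

# Placement test scores (x) and course grades (y) from Q.1.b.
x = [50, 35, 35, 40, 55, 65, 35, 60, 90, 35,
     90, 80, 60, 60, 60, 40, 55, 50, 65, 50]
y = [53, 41, 61, 56, 68, 36, 11, 70, 79, 59,
     54, 91, 48, 71, 71, 47, 53, 68, 57, 79]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)

# Least-squares slope and intercept of the line y = a + b*x.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Residuals: observed y minus the value estimated by the line.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Standard error of estimate (assumed definition, n - 2 degrees of freedom).
s_e = math.sqrt(sum(e * e for e in residuals) / (n - 2))
print(f"standard error of estimate = {s_e:.4f}")
```

A single value of s_e describes the scatter along the whole line only if the spread of Y is the same at every X, which is why the second assumption matters.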
1. b:
Placement Test (x)   Course Grade (y)      xy       x²       y²
        50                  53            2650     2500     2809
        35                  41            1435     1225     1681
        35                  61            2135     1225     3721
        40                  56            2240     1600     3136
        55                  68            3740     3025     4624
        65                  36            2340     4225     1296
        35                  11             385     1225      121
        60                  70            4200     3600     4900
        90                  79            7110     8100     6241
        35                  59            2065     1225     3481
        90                  54            4860     8100     2916
        80                  91            7280     6400     8281
        60                  48            2880     3600     2304
        60                  71            4260     3600     5041
        60                  71            4260     3600     5041
        40                  47            1880     1600     2209
        55                  53            2915     3025     2809
        50                  68            3400     2500     4624
        65                  57            3705     4225     3249
        50                  79            3950     2500     6241
   Σx = 1110           Σy = 1173         67690    67100    74725

a:
Equation of Regression Line:
$$y = a + bx$$
Where
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
Substituting the values
$$b = \frac{20(67690) - (1110)(1173)}{20(67100) - (1110)^2} = \frac{1353800 - 1302030}{1342000 - 1232100} = \frac{51770}{109900}$$
$$b = 0.4711$$
And
$$a = \bar{y} - b\bar{x}$$
$$\bar{y} = \frac{\sum y}{n} = \frac{1173}{20} = 58.65, \qquad \bar{x} = \frac{\sum x}{n} = \frac{1110}{20} = 55.5$$
So
$$a = 58.65 - (0.4711)(55.5)$$
$$a = 58.65 - 26.1461$$
$$a = 32.5039$$
So, the required regression line is:
$$Y = a + bX$$
$$Y = 32.5039 + 0.4711X$$
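The hand calculation above can be cross-checked with a short script. This is a minimal sketch using the same sum formulas as the solution; the function name `least_squares_line` is illustrative and does not come from the assignment.

```python
# Placement test scores (x) and course grades (y) for the 20 students.
x = [50, 35, 35, 40, 55, 65, 35, 60, 90, 35,
     90, 80, 60, 60, 60, 40, 55, 50, 65, 50]
y = [53, 41, 61, 56, 68, 36, 11, 70, 79, 59,
     54, 91, 48, 71, 71, 47, 53, 68, 57, 79]


def least_squares_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: b = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = y_bar - b * x_bar
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = least_squares_line(x, y)
print(f"Y = {a:.4f} + {b:.4f}X")
```

The script carries b at full precision, so its intercept comes out near 32.5059. The hand calculation rounds b to 0.4711 before computing a, which gives 32.5039; the difference is rounding only.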
b:
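Correlation Coefficient:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
$$r = \frac{20(67690) - (1110)(1173)}{\sqrt{\left[20(67100) - (1110)^2\right]\left[20(74725) - (1173)^2\right]}}$$
$$r = \frac{1353800 - 1302030}{\sqrt{(1342000 - 1232100)(1494500 - 1375929)}} = \frac{51770}{114153.1993}$$
$$r = 0.4535$$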
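The same sums give the correlation coefficient directly. The sketch below mirrors the formula used in the solution; the function name `pearson_r` is illustrative.

```python
import math

# Placement test scores (x) and course grades (y) for the 20 students.
x = [50, 35, 35, 40, 55, 65, 35, 60, 90, 35,
     90, 80, 60, 60, 60, 40, 55, 50, 65, 50]
y = [53, 41, 61, 56, 68, 36, 11, 70, 79, 59,
     54, 91, 48, 71, 71, 47, 53, 68, 57, 79]


def pearson_r(x, y):
    """Correlation coefficient computed from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) *
                            (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

The numerator is the same quantity that appears in the slope b, so r and b always share the same sign.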
Question 2
15 Marks
Groups      2-4   4-6   6-8   8-10   10-12   12-14   14-16   16-18   18-20
Frequency   18    24    47    80     102     66      40      21      15
Compute:
i. M.D (Mean Deviation)
ii. Coefficient of Skewness. Also interpret the result of skewness.
Solution:
Groups   Frequency (f)    x     fx      fx²    Cumulative frequency (c.f)   f|x − x̄|
2-4           18          3     54      162              18                 139.7286
4-6           24          5    120      600              42                 138.3048
6-8           47          7    329     2303              89                 176.8469
8-10          80          9    720     6480             169                 141.0160
10-12        102         11   1122    12342             271                  24.2046
12-14         66         13    858    11154             337                 147.6618
14-16         40         15    600     9000             377                 169.4920
16-18         21         17    357     6069             398                 130.9833
18-20         15         19    285     5415             413                 123.5595
Total        413         ……   4445    53525             ……                 1191.7975

Mean:
$$\text{Mean} = \bar{x} = \frac{\sum fx}{\sum f} = \frac{4445}{413} = 10.7627$$

(i) Mean Deviation:
$$M.D = \frac{\sum f\,|x - \bar{x}|}{\sum f} = \frac{1191.7975}{413} = 2.8857$$

Mode:
$$\text{Mode} = \hat{x} = l + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$$
$$\hat{x} = 10 + \frac{102 - 80}{(102 - 80) + (102 - 66)} \times 2$$
$$\hat{x} = 10 + \frac{22}{22 + 36} \times 2 = 10.7586$$

Standard Deviation:
$$S.D = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$$
$$S.D = \sqrt{\frac{53525}{413} - (10.7627)^2}$$
$$S.D = \sqrt{129.6005 - 115.8357}$$
$$S.D = \sqrt{13.7648}$$
$$S.D = 3.7101$$

(ii) Coefficient of skewness:
$$Sk = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$$
$$Sk = \frac{10.7627 - 10.7586}{3.7101} = 0.0011$$
Interpretation:
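As the value of Sk is positive, the distribution is positively skewed. Because Sk = 0.0011 is very close to zero, the skewness is slight and the distribution is nearly symmetric.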
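The grouped-data calculations for Question 2 can be reproduced with a short script. This is a minimal sketch using the formulas from the solution; the variable names are illustrative, and the mode step assumes the modal class is not the first or last group (true here).

```python
import math

# Class lower limits, common class width h, and frequencies from Question 2.
lower = [2, 4, 6, 8, 10, 12, 14, 16, 18]
h = 2
freq = [18, 24, 47, 80, 102, 66, 40, 21, 15]

mid = [l + h / 2 for l in lower]   # class midpoints x
n = sum(freq)                      # total frequency

# Mean = sum(f*x) / sum(f)
mean = sum(f * x for f, x in zip(freq, mid)) / n

# Mean deviation about the mean = sum(f*|x - mean|) / sum(f)
mean_dev = sum(f * abs(x - mean) for f, x in zip(freq, mid)) / n

# Mode = l + (fm - f1) / ((fm - f1) + (fm - f2)) * h for the modal class.
i = freq.index(max(freq))          # assumes an interior modal class
fm, f1, f2 = freq[i], freq[i - 1], freq[i + 1]
mode = lower[i] + (fm - f1) / ((fm - f1) + (fm - f2)) * h

# Standard deviation = sqrt(sum(f*x^2)/sum(f) - mean^2)
sd = math.sqrt(sum(f * x * x for f, x in zip(freq, mid)) / n - mean ** 2)

# Coefficient of skewness = (mean - mode) / standard deviation
sk = (mean - mode) / sd

print(f"mean={mean:.4f} M.D={mean_dev:.4f} mode={mode:.4f} "
      f"S.D={sd:.4f} Sk={sk:.4f}")
```

Small differences in the last decimal place relative to the hand calculation can occur, because the script carries the mean at full precision instead of rounding it to 10.7627 first.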