Sample mean: X̄ = (X1 + ⋯ + Xn)/n
Central Limit Theorem: P{(X1 + ⋯ + Xn − nμ)/(σ√n) < x} → P{Z < x}
Step 1: Set up null and alternative hypotheses
Step 2: Find TS T with known distribution under H₀
Step 3: Calculate the p-value
Step 4: Check if the observed TS falls into the critical region
Critical Region: the set of all samples that will lead to rejection of H₀
P-value: probability of observing evidence as much/more in favor of H₁
Sample Variance: S² = Σᵢ(Xi − X̄)²/(n − 1), E[S²] = σ²
Use n − 1 to make E[sample var] = population var.
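The n − 1 correction above can be checked empirically; a minimal numpy sketch with a hypothetical population (σ² = 4), comparing the `ddof=1` (divide by n − 1) and `ddof=0` (divide by n) estimators averaged over many samples:

```python
import numpy as np

# Bessel's correction: averaging many sample variances with ddof=1
# recovers the population variance; ddof=0 underestimates it by (n-1)/n.
rng = np.random.default_rng(0)
n = 5  # small sample size so the bias is visible

samples = rng.normal(loc=0.0, scale=2.0, size=(100_000, n))  # sigma^2 = 4
s2_unbiased = samples.var(axis=1, ddof=1).mean()  # divides by n-1
s2_biased = samples.var(axis=1, ddof=0).mean()    # divides by n

print(round(s2_unbiased, 2))  # close to 4.0
print(round(s2_biased, 2))    # close to (n-1)/n * 4.0 = 3.2
```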
Chi-Square: use to make inferences about σ² from normal samples.
Z1, …, Zn are iid standard normal
X = Z1² + ⋯ + Zn²; X is chi-square with n degrees of freedom, χ²_n
χ²_n + χ²_m = χ²_{n+m} (sum of independent chi-squares)
Two-sided p-value: p = P{|X̄ − μ0|/(σ/√n) ≥ |x̄ − μ0|/(σ/√n)} = 2(1 − Φ(|x̄ − μ0|/(σ/√n)))
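The two-sided p-value formula above can be sketched directly; a minimal example with hypothetical numbers (n = 25, known σ = 10, testing μ0 = 100 against an observed x̄ = 104):

```python
from math import sqrt, erf

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def two_sided_z_pvalue(xbar, mu0, sigma, n):
    """p = 2 * (1 - Phi(|xbar - mu0| / (sigma / sqrt(n))))."""
    z = abs(xbar - mu0) / (sigma / sqrt(n))
    return 2 * (1 - phi(z))

# hypothetical numbers: z = |104 - 100| / (10/5) = 2.0
p = two_sided_z_pvalue(xbar=104.0, mu0=100.0, sigma=10.0, n=25)
print(round(p, 4))  # → 0.0455
```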
Sampling from a Normal Population: X̄ ~ N(μ, σ²/n)
X1 + ⋯ + Xn ~ N(nμ, nσ²)
Step 4: check if p is small enough
If p is smaller than your α, reject H₀
α tells us how cautious/easily convinced we should be
T-Distribution: If Z ⊥ χ²_n, then T = Z/√(χ²_n / n) ~ t_n
(X̄ − μ)/(S/√n) ~ t_{n−1}
Decision Errors Type I: Reject H₀ when it's actually true. "False alarm," probability α.
Type II: Accept H₀ when it's not true. A "miss."
Method of Moments: E[X^k] = g_k(θ1, …, θm)
If there are m unknown parameters, set g_k(θ̂1, …, θ̂m) = (1/n)·Σᵢ Xiᵏ for k = 1, …, m and solve for the θ̂'s.
Operating characteristic (OC) curve: β(θ) = P_θ{accept H₀}
1 − β(θ) = P_θ{reject H₀} is the Power Function
Maximum Likelihood Estimator
Likelihood: L(θ) = f(x1, …, xn; θ) = f_{X1}(x1; θ) ⋯ f_{Xn}(xn; θ)
Maximize the log likelihood log L(θ) = l(θ)
One-Sided Test: H₀: μ = μ0, H₁: μ > μ0
TS: Z = (X̄ − μ0)/(σ/√n), p = P{Z ≥ (x̄ − μ0)/(σ/√n)} = 1 − Φ((x̄ − μ0)/(σ/√n))
Alternately H₀: μ ≤ μ0, a composite hypothesis
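The MLE recipe above can be sketched numerically. For iid Exponential(λ) data the log-likelihood is l(λ) = n·log λ − λ·Σxᵢ, whose closed-form maximizer is λ̂ = 1/x̄; a grid search over the log-likelihood (hypothetical data) recovers the same value:

```python
import numpy as np

# MLE sketch: maximize l(lam) = n*log(lam) - lam*sum(x) for exponential data.
x = np.array([0.5, 1.2, 0.3, 2.0, 0.9, 1.1])  # hypothetical sample, mean 1.0
n = len(x)

lams = np.linspace(0.01, 5.0, 100_000)         # candidate rates
loglik = n * np.log(lams) - lams * x.sum()      # log-likelihood on the grid
lam_grid = lams[np.argmax(loglik)]              # grid-search maximizer
lam_closed = 1 / x.mean()                       # closed-form MLE

print(round(lam_closed, 3), round(lam_grid, 3))
```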
TS same as above, but now p = P{Z ≥ (x̄ − μ0)/(σ/√n)} ≤ 1 − Φ((x̄ − μ0)/(σ/√n))
(the bell curve only has one region that matters)
Unknown Variance: H₀: μ = μ0, TS T = (X̄ − μ0)/(S/√n) ~ t_{n−1}
p = P{|T| ≥ |(x̄ − μ0)/(s/√n)|} = 2(1 − P{t_{n−1} ≤ |(x̄ − μ0)/(s/√n)|})
Evaluating a point estimator
Bias: b(θ̂) = E[θ̂(X1, …, Xn)] − θ. Biased: shooting off-target.
Mean Squared Error: E[(θ̂ − θ)²] = Var(θ̂) + b²(θ̂), so Var(θ̂) = E[(θ̂ − θ)²] − (E[θ̂] − θ)²
Hard to assess the credibility of a point estimate by itself
Precision: measures spread
Estimate Difference in Means
Pooled sample variance: Sp² = [(n − 1)S1² + (m − 1)S2²]/[(n − 1) + (m − 1)]
Like a weighted average, with degrees of freedom as the weights
Equality of Means: σx² and σy² are known, X, Y are normal, 2 sets of independent samples
H₀: μx = μy (or μx ≤ μy)
TS: if μx = μy, |X̄ − Ȳ| is more likely to be small.
X̄ − Ȳ ~ N(μx − μy, σx²/n + σy²/m), since a sum of independent normals is still normal.
Standardize: (X̄ − Ȳ − (μx − μy))/√(σx²/n + σy²/m) ~ N(0,1) = TS. p = P{|TS| ≥ |observed|} = 2(1 − Φ(|x̄ − ȳ|/√(σx²/n + σy²/m)))
Confidence Interval
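The unknown-variance one-sample test above is exactly what `scipy.stats.ttest_1samp` computes; a minimal sketch with hypothetical data against μ0 = 50, verified against the t_{n−1} formula by hand:

```python
import numpy as np
from scipy import stats

# One-sample t-test: T = (xbar - mu0) / (s / sqrt(n)), compared to t_{n-1}.
x = np.array([52.1, 48.3, 55.0, 51.2, 49.8, 53.5, 50.9, 47.6])  # hypothetical
t_stat, p_value = stats.ttest_1samp(x, popmean=50.0)

# Same p-value by hand: p = 2 * (1 - P{t_{n-1} <= |T|})
n = len(x)
t_manual = (x.mean() - 50.0) / (x.std(ddof=1) / np.sqrt(n))
p_manual = 2 * (1 - stats.t.cdf(abs(t_manual), df=n - 1))

print(np.isclose(t_stat, t_manual), np.isclose(p_value, p_manual))
```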
Step 1: find a statistic involving the parameter
Mean with known variance: (X̄ − μ)/(σ/√n) ~ N(0,1)
Mean with unknown variance: (X̄ − μ)/(S/√n) ~ t_{n−1}
Variance with unknown mean: (n − 1)S²/σ² ~ χ²_{n−1}
Step 2: find an initial interval with probability 1 − α
Two-sided: P{−z_{α/2} ≤ (X̄ − μ)/(σ/√n) ≤ z_{α/2}} = 1 − α
Step 3: find the equivalent interval for the parameter
−z_{α/2} ≤ (X̄ − μ)/(σ/√n) ≤ z_{α/2}  ⟺  X̄ − z_{α/2}·σ/√n ≤ μ ≤ X̄ + z_{α/2}·σ/√n
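The pivot manipulation above gives the usual X̄ ± z_{α/2}·σ/√n interval; a minimal sketch with hypothetical numbers (x̄ = 12, known σ = 3, n = 36, α = 0.05):

```python
from math import sqrt
from scipy import stats

# Two-sided (1 - alpha) CI for mu with known sigma:
# xbar ± z_{alpha/2} * sigma / sqrt(n)
xbar, sigma, n, alpha = 12.0, 3.0, 36, 0.05
z = stats.norm.ppf(1 - alpha / 2)   # z_{alpha/2} ≈ 1.96
half_width = z * sigma / sqrt(n)
lo, hi = xbar - half_width, xbar + half_width
print(round(lo, 2), round(hi, 2))   # → 11.02 12.98
```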
Difference in Means
σ1, σ2 known: use X̄ − Ȳ ± z_{α/2}·√(σ1²/n + σ2²/m)
σ1, σ2 unknown but equal: use X̄ − Ȳ ± t_{α/2, n+m−2}·√((1/n + 1/m)·[(n−1)S1² + (m−1)S2²]/(n + m − 2))
Lower confidence interval
σ1, σ2 known: use (−∞, X̄ − Ȳ + z_α·√(σ1²/n + σ2²/m))
σ1, σ2 unknown but equal: use (−∞, X̄ − Ȳ + t_{α, n+m−2}·√((1/n + 1/m)·[(n−1)S1² + (m−1)S2²]/(n + m − 2)))
Two-sample test, variances unknown but assumed equal: (X̄ − Ȳ)/√(Sp²/n + Sp²/m) ~ t_{n+m−2}
p = P{|t_{n+m−2}| ≥ |x̄ − ȳ|/√(sp²/n + sp²/m)}, and reject if p is small
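The pooled two-sample test above matches `scipy.stats.ttest_ind` with `equal_var=True`; a minimal sketch with hypothetical samples, verified against the Sp² formula by hand:

```python
import numpy as np
from scipy import stats

# Pooled t-test: Sp^2 = ((n-1)S1^2 + (m-1)S2^2)/(n+m-2),
# T = (xbar - ybar)/sqrt(Sp^2/n + Sp^2/m) ~ t_{n+m-2} under H0.
x = np.array([5.1, 4.9, 6.0, 5.5, 5.2])            # hypothetical group 1
y = np.array([4.4, 4.8, 4.1, 4.9, 4.5, 4.2])       # hypothetical group 2
n, m = len(x), len(y)

sp2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)
t_manual = (x.mean() - y.mean()) / np.sqrt(sp2 / n + sp2 / m)
p_manual = 2 * (1 - stats.t.cdf(abs(t_manual), df=n + m - 2))

t_stat, p_value = stats.ttest_ind(x, y, equal_var=True)
print(np.isclose(t_stat, t_manual), np.isclose(p_value, p_manual))
```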
Sample correlation coefficient: r = S_xY/√(S_xx·S_YY)
So R² = r², traditionally
r does not measure the slope of the regression line
The slope is B̂ = r·(s_Y/s_x), which is about r if s_Y ≈ s_x
|r| ≤ 1, so usually you get regression to the mean
Inferential Linear Regression
Y = α + βx + e, where e is the error; assume E[e] = 0
E[Y|x] = α + βx
For each observation, Yi = α + βxi + ei
Assume the random errors ei are independent
Estimate α, β, and σ² from the tuples of samples
Estimate α and β from the least squares formulas:
B = Σᵢ(xi − x̄)(Yi − Ȳ)/Σᵢ(xi − x̄)², A = Ȳ − B·x̄
B is a normally distributed, unbiased estimator for β
B ~ N(β, σ²/S_xx), A ~ N(α, σ²·Σᵢxi²/(n·S_xx))
SS_R = Σᵢ(Yi − A − B·xi)²; SS_R/(n − 2) estimates σ²
SS_R/σ² ~ χ²_{n−2}; we lose 2 degrees of freedom since A and B are linear combinations of Y1, …, Yn
E[SS_R/(n − 2)] = σ², and E[SS_R/σ²] = E[χ²_{n−2}] = n − 2
SS_R/σ² ~ χ²_{n−2} is independent of A and B
Confidence Intervals
β: B ± √(SS_R/((n − 2)·S_xx)) · t_{α/2, n−2}
α: A ± √((Σᵢxi²·SS_R)/(n(n − 2)·S_xx)) · t_{α/2, n−2}
α + βx0: A + B·x0 ± √((1/n + (x0 − x̄)²/S_xx)·SS_R/(n − 2)) · t_{α/2, n−2}
Distributional Results
β: √((n − 2)·S_xx/SS_R)·(B − β) ~ t_{n−2}
α: √(n(n − 2)·S_xx/(Σᵢxi²·SS_R))·(A − α) ~ t_{n−2}
Mean Response given x0: E[Y|x0] = α + βx0
A + B·x0 estimates the mean response given input x0
It is unbiased and a linear combination of independent normal r.v.s
(A + B·x0 − α − βx0)/√((1/n + (x0 − x̄)²/S_xx)·SS_R/(n − 2)) ~ t_{n−2}
Future Response Y = α + βx0 + e:
(Y − A − B·x0)/√(((n + 1)/n + (x0 − x̄)²/S_xx)·SS_R/(n − 2)) ~ t_{n−2}
Linear Regression
Sum of squared residuals: SS = Σᵢ(Yi − a − b·xi)²
Choose a and b that minimize this
Take partial derivatives with respect to a, b
Set to 0, get the normal equations: a + x̄·b = Ȳ, and n·x̄·a + b·Σᵢxi² = Σᵢxi·Yi
Solution: b = (Σᵢxi·Yi − n·x̄·Ȳ)/(Σᵢxi² − n·x̄²), a = Ȳ − x̄·b
r = Σ(xi − x̄)(Yi − Ȳ)/√(Σ(xi − x̄)²·Σ(Yi − Ȳ)²)
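The least-squares solution above can be sketched in a few lines; hypothetical data lying near y = 2 + 3x, cross-checked against `np.polyfit`:

```python
import numpy as np

# Least squares from the normal equations:
# b = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2),  a = ybar - b*xbar
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.9, 8.2, 10.8, 14.1])  # hypothetical, near y = 2 + 3x

b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()
print(round(a, 3), round(b, 3))  # a ≈ 2.04, b ≈ 2.99

# np.polyfit gives the same coefficients (highest degree first)
bb, aa = np.polyfit(x, y, 1)
assert np.isclose(a, aa) and np.isclose(b, bb)
```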
If the variances are not assumed equal, replace Sp with Sx and Sy
Test Equality of Proportions
Identities: Σᵢ(xi − x̄)² = Σᵢxi² − n·x̄²
Σᵢ(xi − x̄)(Yi − Ȳ) = Σᵢxi·Yi − n·x̄·Ȳ
Mean Response α + βx0
sx² = (1/(n − 1))·Σᵢ(Xi − X̄)², and similarly for sy²
The pooled sample variance is a weighted average of the group variances
If H₀ is true, then μx − μy = 0
Step 4: plug in the observed numbers
One-sided upper: P{−z_α ≤ (X̄ − μ)/(σ/√n) < ∞} = 1 − α, giving μ ≥ X̄ − z_α·σ/√n
SS_R = Σᵢ(Yi − Â − B̂·xi)²
Prediction Interval
Find an interval that will contain the response with a certain confidence; NOT the same as a confidence interval
Best "point prediction" is A + B·x0, unbiased
Analysis of Residuals
Assume independence, constant variance (homoscedasticity), normality.
Check these assumptions by looking at the residuals
Standardized residuals: (Yi − (A + B·xi))/√(SS_R/(n − 2)) (the denominator estimates σ)
Linearity is violated if the residuals show a discernible pattern
Nonconstant variance shows up as increasing spread/outliers in the residuals
Check Normality: plot the sample quantiles of the standardized residuals vs the standard normal quantiles
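The standardized residuals above can be computed directly; a minimal sketch with hypothetical data (values beyond about ±2, or a visible pattern, would flag assumption violations):

```python
import numpy as np

# Standardized residuals: (yi - (a + b*xi)) / sqrt(SSR/(n-2))
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 4.1, 5.8, 8.4, 9.9, 12.2])  # hypothetical
n = len(x)

# least-squares fit
b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()

resid = y - (a + b * x)                    # raw residuals (sum to ~0)
ssr = (resid ** 2).sum()
std_resid = resid / np.sqrt(ssr / (n - 2)) # standardized residuals
print(np.round(std_resid, 2))
```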
Quantiles: points taken at regular intervals from the CDF of a r.v.
Q-Q plot: compare 2 distributions by plotting their quantiles against each other, to check normality
S_xx = Σᵢ(xi − x̄)² = (n − 1)·sx²
S_YY = Σᵢ(Yi − Ȳ)² = (n − 1)·sY²
This is TSS, the total sum of squares, the amount of variance in the response variables. Part of this is caused by differences in the input variables.
Approximate Confidence Interval (proportion): p̂ ± z_{α/2}·√(p̂(1 − p̂)/n)
Have a 1 − α CI with length no greater than 2Δp. What's the min sample size required? Length: 2·z_{α/2}·√(p̂(1 − p̂)/n) ≤ 2Δp. Substitute p̂ with p* after testing m times.
Transforming to Linearity
N(t) ≈ N₀·e^{dt} ⇒ log N ≈ log N₀ + d·t, where α is log N₀, βx is d·t, and log N is Y
Hypothesis Testing
H₀: presume this; in control, conventional, not effective, no discrimination
H₁: out of control, new finding, discrimination, more effective
Level of significance α. Critical Region Approach:
Step 1: set up the null and alternative hypotheses
Step 2: find a statistic T(X1, …, Xn) with a known distribution, called the test statistic TS
Step 3: choose an α and find the critical region
A smaller α means a smaller critical region
SS_R = Σᵢ(Yi − Ŷi)² = Σᵢ(Yi − Â − B̂·xi)² measures the remaining amount of variance in the response variables after we take into account differences in the input variables, known as RSS, the residual sum of squares
TSS − RSS is the amount of variation explained by the input variables, known as the explained sum of squares ESS
Coefficient of determination: R² = ESS/TSS = 1 − RSS/TSS
This shows how well the model fits the data.
Ex. R² = .75, so 75% of the variation of y is from changes in x. The rest is from variance even when x is constant.
So B̂ = Σᵢ(xi − x̄)(Yi − Ȳ)/Σᵢ(xi − x̄)² = S_xY/S_xx, with S_xY = Σᵢ(xi − x̄)(Yi − Ȳ)
Assumptions: the errors are independent normal r.v.s
Interpret the regression coefficients:
Regression equation Y = ĉ + d̂·t: the average number of defects at t = 0 is ĉ, and as t increases by 1, the number of defects increases by d̂ on average
N(t+1)/N(t) ≈ 1 + d̂ by Taylor expansion
N(1.01t)/N(t) ≈ 1 + 0.01·d̂ by Taylor expansion
Linear Transformations
Exponential: log Y ≈ log c + d·x, Y ≈ c·e^{dx}
X increases by 1 ⇒ Y changes by a factor of e^d ≈ 1 + d (about 100d%), since Y(x+1)/Y(x) = e^d and e^d ≈ 1 + d for small d
Logarithm: Y ≈ c + d·log x
X increases by a factor of 1% ⇒ Y changes by about d/100: Y(1.01x) − Y(x) = d·log(1.01), and log(1 + x) ≈ x, so Y(1.01x) − Y(x) ≈ .01·d
Power: log Y ≈ log c + d·log x, Y ≈ c·x^d
X increases by a factor of 1% ⇒ Y changes by a factor of about d%: Y(1.01x)/Y(x) = (1.01)^d ≈ 1 + .01d, using (1 + x)^d ≈ 1 + dx
Multiple Linear Regression
Least Squares method:
B = [B0, …, Bk]′ = (X′X)⁻¹X′Y, where X has rows [1, xi1, …, xik] and Y = [Y1, …, Yn]′
βi is the amount of change in the response if xi is increased by 1, keeping the others unchanged.
Residuals: ei = Yi − B0 − B1·xi1 − ⋯ − Bk·xik
Matrix form: e = Y − XB
SS_R = Y′Y − B′X′Y, the residual sum of squares
Coefficient of multiple determination: R² = 1 − SS_R/S_YY, the proportion of variation explained by the model
R is the multiple correlation coefficient
As input variables are added, R² goes up and SS_R goes down
…but this could mean overfitting
Adjusted/corrected R²: R̄² = ((n − 1)R² − k)/(n − k − 1)
Inference: B has a multivariate normal distribution, B ~ N(β, σ²(X′X)⁻¹); B is an unbiased estimator of β
σ²(X′X)⁻¹ is the covariance matrix; the diagonals give the variances
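The normal-equations formula B = (X′X)⁻¹X′Y can be sketched directly; hypothetical noise-free data generated from y = 1 + 2·x1 − 3·x2, so the fit should recover the coefficients exactly:

```python
import numpy as np

# Multiple linear regression via the normal equations B = (X'X)^{-1} X'Y.
rng = np.random.default_rng(1)
n = 20
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1 + 2 * x1 - 3 * x2            # hypothetical, no noise

X = np.column_stack([np.ones(n), x1, x2])  # first column of 1s for B0
B = np.linalg.solve(X.T @ X, X.T @ y)      # better-conditioned than inv()
print(np.round(B, 6))  # ≈ [ 1.  2. -3.]
```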
For each i, Bi ~ N(βi, σ²·Cii), where Cii is the ith element on the diagonal of (X′X)⁻¹; use this for inference on βi
SS_R/σ² ~ χ²_{n−k−1}, independent of B
SS_R/(n − k − 1) is an unbiased estimator for σ²
(Bi − βi)/√(σ̂²·Cii) ~ t_{n−k−1}, and the denominator is the estimate for the sd of Bi, the standard error. Test whether xi has an impact on the response keeping the other input variables unchanged: H₀: βi = 0
Two-way ANOVA model: the variances of each cell are the same
Yij = μ + αi + βj, where μ is the overall mean
αi is the correction for the ith row
βj is the correction for the jth column
μ + αi is the mean for the ith row; use μ + βj for the mean of the jth column
Hypotheses: H₀^r: the row factor has no impact; H₀^c: the column factor has no impact
Estimators: Y‥ is an unbiased estimator for μ, the overall mean (overall sample mean → overall mean)
Yi· − Y‥ is an unbiased estimator for αi
Y·j − Y‥ is an unbiased estimator for βj
(sample mean difference → mean difference)
Row sum of squares: SSr = Σᵢ₌₁^m n(Yi· − Y‥)², the variation between rows
Column sum of squares: SSc = Σⱼ₌₁^n m(Y·j − Y‥)², the variation between columns
Error sum of squares: SSe = Σᵢ Σⱼ (Yij − Yi· − Y·j + Y‥)², the variation due to randomness; subtract the mean, αi, and βj from Yij
SSe/((m − 1)(n − 1)) is an unbiased estimator for σ²
Error: SSe/σ² ~ χ²_{(m−1)(n−1)}
Row: if H₀^r is true, SSr/σ² ~ χ²_{m−1}, independent of SSe
Column: if H₀^c is true, SSc/σ² ~ χ²_{n−1}, independent of SSe
Sample Midterm 2
(a) If simple linear regression of Y on X yields a line with slope equal to 2, then simple linear regression of X on Y will yield a line with slope equal to 1/2. False.
(b) A simple linear regression model (Y on x) has R² = 0.7, while an exponential model (log(Y) on x) has R² = 0.8. This means the exponential model can explain more variation in Y. False.
(c) In multiple regression, adding an additional regressor (input variable) will always increase the multiple R-square, and decrease the adjusted R-square. False.
(d) In linear regression, a 95% prediction interval for a future response given a certain input level always contains the 95% confidence interval for the mean response given the same input level. True.
(e) In one-way ANOVA, the null hypothesis states that the population mean within each group is equal to zero. False.
ANOVA Table: let N = (m − 1)(n − 1)
source of var | sum of squares  | d.f. | mean square | F ratio               | p-value
row           | SSr             | m−1  | SSr/(m−1)   | [SSr/(m−1)]/[SSe/N]   | P{F_{m−1,N} ≥ F ratio}
column        | SSc             | n−1  | SSc/(n−1)   | [SSc/(n−1)]/[SSe/N]   | P{F_{n−1,N} ≥ F ratio}
error         | SSe             | N    | SSe/N       |                       |
total         | SSr + SSc + SSe | nm−1 |             |                       |
Having done poorly on their math final exams in June, some students repeat the course in summer school, then take another exam in August. Assume these students are representative of all students who might attend summer school. Given their scores in June and August, explain how you can test whether the summer program is worthwhile.
Use a paired t-test. μx is the mean score in June, μy the mean score in August. H₀: not effective, μy = μx; H₁: μy > μx.
Let Xi and Yi be the scores of the ith student, and take the difference Di = Yi − Xi. Sample mean D̄ = (1/n)·Σᵢ Di.
2-Way ANOVA with Interaction
Without interaction, μ1j − μ2j is the same for all j, and μi1 − μi2 is the same for all i.
Interaction: μij = μ + αi + βj + γij
μ = μ‥: grand mean
αi = μi· − μ‥: effect of the ith row
βj = μ·j − μ‥: effect of the jth column
γij = μij − μ − αi − βj = μij − μi· − μ·j + μ‥: interaction of the ith row and jth column
Data: m rows, n columns, m·n cells, l ≥ 2 observations per cell; Yij· is the sample mean of cell (i, j); Yijk denotes an individual observation
Assumptions: Yijk = μij + eijk = μ + αi + βj + γij + eijk, with the eijk independent random errors ~ N(0, σ²)
H₀^{r}, H₀^{c}, H₀^{int}: no {row, column, interaction} effect
Estimation
• Y… : grand mean, estimates μ
• Yi‥ : row mean, estimates μi·
• Y·j· : column mean, estimates μ·j
• Yij· : cell mean, estimates μij
• Row effect αi = μi· − μ; estimator α̂i = Yi‥ − Y…
• Column effect βj = μ·j − μ; estimator β̂j = Y·j· − Y…
• Interaction γij = μij − μ − αi − βj; estimator γ̂ij = Yij· − Yi‥ − Y·j· + Y…
• Error eijk = Yijk − μij; estimator Yijk − Yij·
• Error sum of squares: SSe = Σᵢ Σⱼ Σₖ (Yijk − Yij·)²
Polynomial Regression: B = (X′X)⁻¹X′Y, with X having rows [1, xi, xi², …]
ANOVA: Analysis Of Variance
Study the impact of a certain factor
H₀: all means are equal
One-way ANOVA: nT is the total number of observations; Xij is the jth observation in the ith group; Xi· is the sample mean of the ith group; X‥ is the sample mean of all observations
Between-group sum of squares: SSb = Σᵢ₌₁^m ni(Xi· − X‥)², the amount of variation that can be explained by the groups
Within-group sum of squares: SSw = Σᵢ Σⱼ (Xij − Xi·)², the amount of variation that can't be explained by the groups; SSw/σ² ~ χ²_{nT−m}, and SSb and SSw are independent
TS: F = [SSb/(m − 1)]/[SSw/(nT − m)] ~ F_{m−1, nT−m} if H₀ is true
p-value = P{F_{m−1, nT−m} ≥ observed TS}
ANOVA table for multiple regression: H₀: Y doesn't depend on the input
Reading Excel
Multiple R: R
R Squared: R² = 1 − SS_R/S_YY
Adjusted R Squared: R̄² = ((n − 1)R² − k)/(n − k − 1)
Standard Error: σ̂ = √(SS_R/(n − k − 1))
Observations: n
One-way ANOVA table:
source of var  | sum of squares | d.f.  | mean square  | F ratio                       | p-value
between groups | SSb            | m−1   | SSb/(m−1)    | [SSb/(m−1)]/[SSw/(nT−m)]      | P{F_{m−1, nT−m} ≥ F ratio}
within groups  | SSw            | nT−m  | SSw/(nT−m)   |                               |
total          | SSb + SSw      | nT−1  |              |                               |
Two-Factor ANOVA
One-way: single factor. Ex. gas mileage: is there a difference between cars and between drivers?
Each row/column represents one level
Column averages Y·1, …, Y·n; row averages Y1·, …, Ym·; overall average Y‥
Interaction sum of squares: SSint = Σᵢ Σⱼ l(Yij· − Yi‥ − Y·j· + Y…)²; under H₀^{int}, SSint/σ² ~ χ²_{(m−1)(n−1)}
Row sum of squares: SSr = Σᵢ₌₁^m nl(Yi‥ − Y…)²; under H₀^r, SSr/σ² ~ χ²_{m−1}
Column sum of squares: SSc = Σⱼ₌₁^n ml(Y·j· − Y…)²; under H₀^c, SSc/σ² ~ χ²_{n−1}
SSe/σ² ~ χ²_{nm(l−1)}, and SSe/(nm(l−1)) is an unbiased estimator for σ²
Sample sd: sD = √((1/(n − 1))·Σᵢ(Di − D̄)²)
TS: T = √n·D̄/sD ~ t_{n−1}; p = P{t_{n−1} ≥ observed TS}
3. In a study of the relationship between the starting salary (Y) and GMAT score (x) of n = 100 MBA students, the following results are obtained: average starting salary avg(Y) = $90,000, sample standard deviation s(Y) = $45,000, average GMAT avg(x) = 600, sample standard deviation s(x) = 100, sample correlation coefficient r = 0.3.
(a) (10 pts) Find and interpret the slope of the simple linear regression of Y on x.
B̂ = r·s(Y)/s(x) = 0.3·45000/100 = 135
Â = Ȳ − B̂·x̄ = 90000 − 135·600 = 9000
Y = 9000 + 135x
(b) (10 pts) Find a 95% confidence interval for the mean salary of people who got 700 on the GMAT.
Get a 100(1 − γ)% C.I. for α + βx0:
(A + B·x0 − (α + βx0)) / √((1/n + (x0 − x̄)²/S_xx)·SS_R/(n − 2)) ~ t_{n−2}
Use A + B·x0 ± √((1/n + (x0 − x̄)²/S_xx)·SS_R/(n − 2)) · t_{γ/2, n−2}
(c) (10 pts) Test at the 0.05 level whether a higher GMAT score means a higher salary.
H₀: β = 0, mean salary doesn't increase with GMAT
TS is t_{n−2}: T = √((n − 2)·S_xx/SS_R)·B
p-value = P(T_{n−2} > √((n − 2)·S_xx/SS_R)·B̂)
Since t_98 is close to standard normal, we know p = 1 − Φ(3)
(d) (15 pts) Construct the ANOVA table for linear regression.
(e) (10 pts) Simple linear regression of log(Y) on x yields slope = 0.015%, and that of Y on log(x) yields slope = 8000. Interpret the results.
If GMAT increases by 1, starting salary increases by a factor of .015% on average.
For slope = 8000: if the GMAT score increases by a factor of 1%, salary increases by 8000·.01 = 80 on average.
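The arithmetic in part (a) can be checked in two lines, using only the summary statistics given in the problem:

```python
# Part (a) check: slope b = r * s_y / s_x, intercept a = ybar - b * xbar
r, s_y, s_x = 0.3, 45_000.0, 100.0
ybar, xbar = 90_000.0, 600.0

b = r * s_y / s_x       # 0.3 * 45000 / 100 = 135
a = ybar - b * xbar     # 90000 - 135*600 = 9000
print(b, a)  # → 135.0 9000.0
```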
An emergency room physician wanted to know whether there were any differences in the amount of time it takes for three different inhaled steroids to clear a mild asthmatic attack. Over a period of weeks she randomly administered these steroids to asthma sufferers, and noted the time it took for the patients' lungs to become clear. Afterward, she discovered that 12 patients had been treated with each type of steroid, with the following sample means (in minutes) and sample variances resulting. Construct the ANOVA table to test whether the mean time to clear a mild asthma attack is the same for all three steroids.
Steroid | Sample mean | Sample variance
A       | 32          | 145
B       | 40          | 138
C       | 30          | 150
m = 3, n = 12; μi is the mean time to clear an attack for the ith steroid
H₀: all means are equal; TS: [SSb/(m − 1)]/[SSw/(nT − m)] ~ F_{m−1, nT−m}
SSb = n·Σᵢ₌₁^m (x̄i − x̄‥)² = 12·[(32 − 34)² + (40 − 34)² + (30 − 34)²] = 672
SSw = Σᵢ Σⱼ (xij − x̄i)² = Σᵢ(n − 1)·si² = 11·(145 + 138 + 150) = 4763
source of variance | sum of squares    | d.f. | mean square | F ratio              | p-value
between groups     | 672               | 2    | 336         | 336/144.33 ≈ 2.33    | P(F_{2,33} ≥ F ratio)
within groups      | 4763              | 33   | 144.33      |                      |
total              | 672 + 4763 = 5435 | 35   |             |                      |
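The table above can be reproduced from the summary statistics alone; a minimal sketch (using scipy for the F tail probability):

```python
from scipy import stats

# One-way ANOVA from summary statistics: k = 3 groups of n = 12 each.
k, n = 3, 12
means = [32.0, 40.0, 30.0]
variances = [145.0, 138.0, 150.0]

grand = sum(means) / k                            # 34.0 (equal group sizes)
ss_b = n * sum((m - grand) ** 2 for m in means)   # between-group SS
ss_w = (n - 1) * sum(variances)                   # within-group SS
df_b, df_w = k - 1, k * n - k                     # 2 and 33
F = (ss_b / df_b) / (ss_w / df_w)
p = 1 - stats.f.cdf(F, df_b, df_w)
print(ss_b, ss_w, round(F, 3), round(p, 3))
```

Since the p-value exceeds 0.05, we would fail to reject H₀ at the 5% level.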