Statistics

Statistics and Data Analysis
Professor William Greene
Stern School of Business
IOMS Department
Department of Economics
Part 14: Statistical Tests – Part 2
Statistics and Data Analysis
Part 14 – Statistical Tests: 2
Statistical Testing Applications
• Methodology
• Analyzing Means
• Analyzing Proportions
Classical Testing Methodology
• Formulate the hypothesis.
• Determine the appropriate test.
• Decide upon the α level. (How confident do we want to be in the results?) The worldwide standard is 0.05.
• Formulate the decision rule (reject vs. do not reject): define the rejection region.
• Obtain the data.
• Apply the test and make the decision.
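The decision rule in the steps above can be sketched in a few lines of Python. This is only an illustration, assuming a two-sided test based on the normal distribution. The function names `critical_value` and `reject_null` are invented for this sketch; the critical value comes from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist


def critical_value(alpha: float = 0.05) -> float:
    """Two-sided critical value from the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def reject_null(test_statistic: float, alpha: float = 0.05) -> bool:
    """Decision rule: reject H0 when |statistic| exceeds the critical value."""
    return abs(test_statistic) > critical_value(alpha)


# With alpha = 0.05 the critical value is the familiar 1.96.
print(round(critical_value(0.05), 2))
```

Choosing α fixes the rejection region before any data are examined, which is why the level is set ahead of the data step in the list.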
Comparing Two Populations
These are data on the number of calls cleared by the operators at two
call centers on the same day. Call center 1 employs a different set of
procedures for directing calls to operators than call center 2.
Do the data suggest that the populations are different?
Call Center 1 (28 observations)
797 794 817 813 817 793 762 719 804 811 747 804 790 796 807 801 805
811 835 787 800 771 794 805 797 724 820 701
Call Center 2 (32 observations)
817 801 798 797 788 802 821 779 803 807 789 799 794 792 826 808 808
844 790 814 784 839 805 817 804 807 800 785 796 789 842 829
Application 1: Equal Means
Application: Mean calls cleared at the two call centers are the same
• H0: μ1 = μ2
  H1: μ1 ≠ μ2
• Rejection region: Sample means from centers 1 and 2 are very different.
• Complication: What to use for the variance(s) for the difference?
Standard Approach
H0: μ1 = μ2
H1: μ1 ≠ μ2
• Equivalent: H0: μ1 – μ2 = 0
• Test is based on the difference of the two sample means, $\bar{x}_1 - \bar{x}_2$
• Reject the null hypothesis if $\bar{x}_1 - \bar{x}_2$ is very different from zero (in either direction).
• Rejection region is large positive or negative values of $\bar{x}_1 - \bar{x}_2$
Rejection Region for Two Means
Reject H0 if $|\bar{x}_1 - \bar{x}_2| > t \cdot s$
where t is the t value (or normal). Use 1.96 as
usual for 5% significance. "s" is the standard
error of the difference in the means. What to use?
Two issues:
Equal variances in the two populations?
Both sample sizes large enough to use CLT?
Easiest Approach: Large Samples
• Assume relatively large samples, so we can use the central limit theorem.
• It won't make much difference whether the variances are assumed to be (or actually are) the same or not.
Variance Estimator
In all cases, you can use
$$ s^* = \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}} $$
Use 1.96 for the critical t value because we are using the central limit theorem to allow us to use the normal distribution.
Test of Means

H0: μCall Center 1 – μCall Center 2 = 0
H1: μCall Center 1 – μCall Center 2 ≠ 0

Use α = 0.05

Rejection region:
$$ \left| \frac{(\bar{x}_1 - \bar{x}_2) - 0}{s^*} \right| = \frac{|\bar{x}_1 - \bar{x}_2|}{\sqrt{(s_1^2 / N_1) + (s_2^2 / N_2)}} > 1.96 $$
Basic Comparisons
Descriptive Statistics: Center1, Center2
Variable   N    Mean     SE Mean   StDev   Min.     Med.     Max.
Center1    28   790.07   6.05      32.00   701.00   798.50   835.00
Center2    32   805.44   2.98      16.87   779.00   802.50   844.00
[Figure: Boxplot of Center1, Center2 (vertical axis: Data, about 700 to 860)]
• Means look different.
• Standard deviations (variances) look quite different.
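The descriptive statistics can be reproduced from the raw call counts listed earlier. The following is a minimal sketch using only Python's standard library; the helper name `describe` is invented for this illustration, and the sample standard deviation (dividing by N - 1) is assumed.

```python
from math import sqrt
from statistics import mean, median, stdev

center1 = [797, 794, 817, 813, 817, 793, 762, 719, 804, 811, 747, 804, 790,
           796, 807, 801, 805, 811, 835, 787, 800, 771, 794, 805, 797, 724,
           820, 701]
center2 = [817, 801, 798, 797, 788, 802, 821, 779, 803, 807, 789, 799, 794,
           792, 826, 808, 808, 844, 790, 814, 784, 839, 805, 817, 804, 807,
           800, 785, 796, 789, 842, 829]


def describe(x):
    """Return the summary measures shown in the descriptive statistics table."""
    n = len(x)
    s = stdev(x)  # sample standard deviation (divides by n - 1)
    return {"N": n, "Mean": mean(x), "SE Mean": s / sqrt(n), "StDev": s,
            "Min": min(x), "Med": median(x), "Max": max(x)}


for name, data in (("Center1", center1), ("Center2", center2)):
    summary = describe(data)
    print(name, {key: round(value, 2) for key, value in summary.items()})
```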
Test for the Difference
$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}} = \frac{790.07 - 805.44}{\sqrt{\frac{32.00^2}{28} + \frac{16.87^2}{32}}} = \frac{-15.37}{\sqrt{45.465}} = \frac{-15.37}{6.742} = -2.279 $$
Note the "minus 0": zero is the hypothesized value of the difference. It could have been some other value. For example, suppose we were investigating a claim that a test prep course would raise scores by 50 points.
This is larger (in absolute value) than 1.96, so we reject the
null hypothesis that the means are equal. It appears that
the means of the numbers of calls cleared at the two centers
are different.
Stat → Basic Statistics → 2-sample t (do not check the equal variances box)
This can also be done by providing just the sample sizes, means and standard deviations.
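As a rough check on the calculation, the same statistic can be computed from just the sample sizes, means and standard deviations. This is a sketch in plain Python, not a substitute for the statistical package; the function name `two_sample_z` and its `hypothesized_diff` parameter are made up for illustration.

```python
from math import sqrt


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Large-sample z statistic for a difference in means (unequal variances)."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / standard_error


# Summary statistics for call centers 1 and 2.
z = two_sample_z(790.07, 32.00, 28, 805.44, 16.87, 32)
print(round(z, 3))
print("reject H0" if abs(z) > 1.96 else "do not reject H0")
```

Setting `hypothesized_diff` to a value other than zero covers claims such as a specific number of points gained.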
Application: Paired Samples
• Example: Do-overs on SAT tests
  • Hypothesis: Scores on the second test are no better than scores on the first.
  • (Hmmm… one-sided test…)
  • Hypothesis: Scores on the second test are the same as on the first.
  • Rejection region: Mean of a sample of second scores is very different from the mean of a sample of first scores.
• Subsidiary question: Is the observed difference (to the extent there is one) explained by the test prep courses? How would we test this?
• Interesting question: Suppose the samples were not paired – just two samples.
Paired Samples
No new theory is needed
• Compute differences for each observation.
• Treat the differences as a single sample from a population with a hypothesized mean of zero.
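The two steps above might look like this in Python. The scores are hypothetical numbers invented for this sketch (they are not data from the lecture), and the function name `paired_statistic` is likewise an assumption. With a sample this small, the statistic would be compared with a t critical value rather than 1.96.

```python
from math import sqrt
from statistics import mean, stdev


def paired_statistic(first, second, hypothesized_mean=0.0):
    """Test statistic for paired data, computed from the differences."""
    diffs = [b - a for a, b in zip(first, second)]  # one difference per pair
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return (mean(diffs) - hypothesized_mean) / standard_error


# Hypothetical first and second test scores for the same six students.
first_scores = [1100, 1250, 980, 1340, 1190, 1050]
second_scores = [1130, 1240, 1010, 1390, 1200, 1090]

t_stat = paired_statistic(first_scores, second_scores)
print(round(t_stat, 3))
```

Working with the differences removes the student-to-student variation that both scores share, which is what distinguishes the paired case from two independent samples.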

Testing Application 2: Proportion
Investigate: Proportion = a value
• Quality control: The rate of defectives produced by a machine has changed.
• H0: θ = θ0 (θ0 = the value we thought it was)
  H1: θ ≠ θ0
• Rejection region: A sample of rates produces a proportion that is far from θ0.
Procedure for Testing a Proportion
Use the central limit theorem:
• The sample proportion, p, is a sample mean.
• Treat this as normally distributed.
• The sample variance is p(1-p).
• The estimator of the variance of the mean is p(1-p)/N.
Testing a Proportion
• H0: θ = θ0
  H1: θ ≠ θ0
• As usual, set α = .05
• Treat this as a test of a mean.
• Rejection region = sample proportions that are far from θ0.
• Test statistic:
$$ z = \frac{p - \theta_0}{\sqrt{\theta_0 (1 - \theta_0) / N}} $$
Note: assuming θ = θ0 implies we are assuming that the variance is θ0(1 - θ0).
Default Rate
• Investigation: Of the 13,444 card applications, 10,499 were accepted.
• The default rate for those 10,499 was 996/10,499 = 0.09487.
• I am fairly sure that this number is higher than was really appropriate for cardholders at this time. I think the right number is closer to 6%.
• Do the data support my hypothesis?
Testing the Default Rate
• Sample data: p = 0.09487
• Hypothesis: θ0 = 0.06
• As usual, use α = 5%.
$$ z = \frac{0.09487 - 0.06}{\sqrt{0.06 (1 - 0.06) / 10{,}499}} = 15.045 $$
This is much larger than the critical value of 1.96,
so my hypothesis is rejected. The default rate in the
population is different from 6%.
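The calculation can be checked with a short Python sketch. The function name `proportion_z` is invented for this illustration. It works from the raw counts (996 defaults among 10,499 accounts) rather than the rounded proportion, so the result may differ from the hand calculation in the later decimal places.

```python
from math import sqrt


def proportion_z(successes, n, theta0):
    """z statistic for H0: theta = theta0, using the variance implied by H0."""
    p = successes / n
    standard_error = sqrt(theta0 * (1 - theta0) / n)
    return (p - theta0) / standard_error


z = proportion_z(996, 10_499, 0.06)
print(round(z, 2))
print("reject H0" if abs(z) > 1.96 else "do not reject H0")
```

Note that the standard error uses θ0, not the sample proportion, because the test is carried out under the assumption that the null hypothesis is true.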
Application 3: Comparing Proportions
Investigate: Owners and Renters have the same credit card acceptance rate
• H0: θRENTERS = θOWNERS
  H1: θRENTERS ≠ θOWNERS
• Rejection region: Acceptance rates for samples of the two types of applicants are very different.
Comparing Proportions
H0: θOWNERS – θRENTERS = 0
H1: θOWNERS – θRENTERS ≠ 0
Use α = 0.05 as usual.
Base the test on
$$ z = \frac{(p_O - p_R) - 0}{\sqrt{\frac{p_O (1 - p_O)}{N_O} + \frac{p_R (1 - p_R)}{N_R}}} $$
Note: here we are not assuming a specific θO or θR, so we use the sample variance.
If |z| is greater than the critical value, reject the null hypothesis. We are using the CLT throughout, so use the normal distribution; the critical value is 1.96.
The Evidence
[Figure not recovered; surviving legend entry: "= Homeowners"]
Analysis of Acceptance Rates
$$ p_O = \frac{5030}{5030 + 1100} = 0.8206, \qquad p_R = \frac{5469}{5469 + 1845} = 0.7477 $$
$$ z = \frac{0.8206 - 0.7477}{\sqrt{\frac{0.8206 (0.1794)}{6130} + \frac{0.7477 (0.2523)}{7314}}} = \frac{0.0729}{0.007058} = 10.33 $$
This is larger than the critical value of 1.96, so the hypothesis
that the proportions are equal is rejected. It looks like owners
are accepted much more often than renters.
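A short Python sketch of the same comparison follows. The function name `two_proportion_z` is an assumption made for this illustration. It starts from the raw counts (5030 of 6130 owners accepted, 5469 of 7314 renters accepted) with no intermediate rounding, so its result can differ in the second decimal place from a hand calculation.

```python
from math import sqrt


def two_proportion_z(accepted1, n1, accepted2, n2):
    """z statistic for H0: equal proportions, using the sample variances."""
    p1, p2 = accepted1 / n1, accepted2 / n2
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / standard_error


# Owners: 5030 accepted, 1100 rejected. Renters: 5469 accepted, 1845 rejected.
z = two_proportion_z(5030, 5030 + 1100, 5469, 5469 + 1845)
print(round(z, 2))
print("reject H0" if abs(z) > 1.96 else "do not reject H0")
```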
Followup Analysis of Default
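Rows: OWNRENT   Columns: DEFAULT
(each cell shows the count, then the percent of the total)

OWNRENT   DEFAULT = 0   DEFAULT = 1   All
0         4854          615           5469
          46.23         5.86          52.09
1         4649          381           5030
          44.28         3.63          47.91
All       9503          996           10499
          90.51         9.49          100.00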
Are the default rates the same for owners and renters? The data for the
10,499 applicants who were accepted are in the table above. Test the
hypothesis that the two default rates are the same.
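One way to set up this test is to reuse the comparison-of-proportions procedure with the default counts from the table. The sketch below is only a starting point under that assumption; the dictionary layout and the function name `default_rate_z` are invented for illustration, and the groups are labeled by their OWNRENT codes exactly as they appear in the table.

```python
from math import sqrt

# Counts from the cross-tabulation: (defaults, accepted applicants) per group.
counts = {"OWNRENT=0": (615, 5469), "OWNRENT=1": (381, 5030)}


def default_rate_z(group_a, group_b):
    """z statistic for H0: the two groups have the same default rate."""
    (defaults_a, n_a), (defaults_b, n_b) = counts[group_a], counts[group_b]
    p_a, p_b = defaults_a / n_a, defaults_b / n_b
    standard_error = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return (p_a - p_b) / standard_error


z = default_rate_z("OWNRENT=0", "OWNRENT=1")
print(round(z, 2))
print("reject H0" if abs(z) > 1.96 else "do not reject H0")
```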