Inference for Population Proportions p and p1 – p2, More Than 2 Samples, and Association

When Y ~ Bin(n, p) and the population proportion p is unknown, we can build confidence intervals and do hypothesis tests for it, just as we learned to make a confidence interval for the population mean, μ. We'll use the same format for the confidence interval and hypothesis test on p as we did when we learned to make a confidence interval on the population mean, μ. First, decide which point estimate to use for the parameter, discover its distribution, and find critical points in that distribution that have a certain percentage between them.

Point Estimator

If asked to make a point estimate (your "best guess") of the population proportion, p, it makes sense to use the sample proportion, $\hat{p} = Y/n$, as the best guess.

Question: What is the distribution of $\hat{p}$? Then, what is the standard error of $\hat{p}$?
Critical Point

Just as the confidence interval we learned for the population mean was a different form of the same inference made using the t hypothesis test, the confidence interval we present here is also a different form of a hypothesis test. We know that Y is binomial, so the distribution of $\hat{p} = Y/n$ is a rescaled binomial, and we could build a confidence interval or hypothesis test based on this exact distribution. Instead, the confidence interval presented here uses an asymptotic distribution (we approximate the binomial distribution with a normal distribution). Use a Z critical point, $Z_{\alpha/2}$.

Confidence Interval for p

Test Statistic for Hypothesis Test for p

Assumption check:
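Returning to the critical point described above, here is a minimal sketch of how $Z_{\alpha/2}$ could be looked up in R with qnorm. The variable names (conf_levels, alpha, z_crit) are chosen for illustration and are not from the notes.

```r
# Confidence levels we might be asked for (illustrative choices).
conf_levels <- c(0.90, 0.95, 0.99)

# alpha is the total area left outside the interval.
alpha <- 1 - conf_levels

# Z_{alpha/2} cuts off an area of alpha/2 in the upper tail of the
# standard normal, so it is the (1 - alpha/2) quantile.
z_crit <- qnorm(1 - alpha / 2)

data.frame(conf_level = conf_levels, alpha = alpha, z_crit = round(z_crit, 3))
```

The area between -z_crit and z_crit under the standard normal curve equals the confidence level, which is the "certain percentage between them" mentioned earlier.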
Note that your text uses something different from this hypothesis test and confidence interval: it makes an adjustment by adding 4 trials (2 more successes and 2 more failures). Many different types of hypothesis tests exist, and that adjustment comes from using a different type of hypothesis test and its corresponding confidence interval. Although the adjusted version is generally considered better than the hypothesis test and interval we are using, we will use ours since there are readily available R functions for it.

The R function prop.test will do a hypothesis test and confidence interval for p:

prop.test(y, n, correct=FALSE)

with the usual arguments for the alternative hypothesis (alternative), null value (p), and confidence level (conf.level). This test outputs a Chi-square test statistic, but this number is simply the square of our Z statistic (the Chi-square distribution is coming later in this set of notes).

Example

A survey of English and Scottish male college students revealed 40 of 400 were left-handed. Construct and interpret a 90% confidence interval for the proportion of those left-handed, p.
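A minimal sketch of how this example might be entered in R. The first call uses only the numbers in the example. The second part checks the "Chi-square equals Z squared" remark; it assumes the Z statistic is the usual large-sample statistic $(\hat{p} - p_0)/\sqrt{p_0(1 - p_0)/n}$ and uses a hypothetical null value p0 = 0.12 that does not come from the notes.

```r
# Data from the left-handedness example.
y <- 40    # number of left-handed students
n <- 400   # sample size

# 90% confidence interval for p, without the continuity correction.
ci_fit <- prop.test(y, n, conf.level = 0.90, correct = FALSE)
ci_fit$estimate   # sample proportion y / n
ci_fit$conf.int   # lower and upper limits of the interval

# Hypothetical null value, for illustration only (not from the notes).
p0 <- 0.12
test_fit <- prop.test(y, n, p = p0, correct = FALSE)

# Large-sample Z statistic computed by hand (assumed form, see lead-in).
p_hat <- y / n
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# The Chi-square statistic reported by prop.test should match z^2.
c(chi_square = unname(test_fit$statistic), z_squared = z^2)
```

Interpreting the interval is still up to you: state what range of values for p is plausible at 90% confidence, in the context of the survey.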
Inference on p1 – p2

Similar to the one-sample case, inference can be made on the difference between two population proportions. For the case where the data are from a binomial setting with Y1 ~ Bin(n1, p1) independent of Y2 ~ Bin(n2, p2), we can make a confidence interval and hypothesis test for p1 – p2.

Angina pectoris is a chronic condition in which the sufferer has periodic attacks of chest pain. In a study to evaluate the effectiveness of the drug Timolol in preventing angina attacks, patients were randomly allocated to receive a daily dosage of either Timolol or placebo for 28 weeks. Let

Y1 = number of patients angina free after Timolol
Y2 = number of patients angina free after placebo

Compute and interpret a 95% confidence interval for the difference in proportion of angina free under Timolol versus placebo.

Use prop.test to enter the data for your two samples. Enter your data as a matrix using the command

angina <- matrix(c(44,116,19,128), nrow=2, byrow=TRUE)

Notice that matrix() fills by columns unless byrow=TRUE is given. When handed a matrix, prop.test expects one row per sample, with the successes in the first column and the failures in the second, so here the rows are Timolol (44 angina free, 116 not) and placebo (19 angina free, 128 not).
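A minimal sketch of the two-sample interval in R. It relies on the documented behavior of prop.test for matrix input: one row per sample, with successes in the first column and failures in the second. The dimnames are added for readability and are not part of the notes.

```r
# One row per sample; columns are successes (angina free) and failures.
angina <- matrix(c(44, 116,
                   19, 128),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Timolol", "Placebo"),
                                 c("Angina free", "Not angina free")))
angina

# 95% confidence interval for p1 - p2, without the continuity correction.
fit <- prop.test(angina, conf.level = 0.95, correct = FALSE)
fit$estimate   # sample proportions for Timolol and placebo
fit$conf.int   # interval for p1 - p2 (Timolol minus placebo)
```

A quick sanity check is that the first estimate equals 44/160 and the second equals 19/147. If they do not, the matrix was filled in the wrong orientation.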
Hypothesis Test for p1 – p2

We have learned a confidence interval for p1 – p2, the difference in the population proportions. We also want a hypothesis testing procedure for this difference.

Definitions

A contingency table is a tabular arrangement of count data showing how the frequencies of the row factor relate to the column factor. We call a contingency table with r rows and c columns an r x c contingency table. Each category in a contingency table is called a cell.

Example

Consider a 2 x 2 contingency table with the row factor denoting a success versus failure, and the column factor denoting Group 1 or Group 2, where the samples for Group 1 and Group 2 are independent of each other. Then the contingency table looks like this:

            Group 1    Group 2
Success     Y1         Y2
Failure     n1 - Y1    n2 - Y2

Recall Example 10.37 regarding the effectiveness of Timolol on angina status. The contingency table would be as follows:

                   Timolol    Placebo
Angina free        44         19
Not angina free    116        128

We have already used these data to construct a 95% confidence interval for the difference in the proportion of angina free for the Timolol versus the placebo conditions.
Let p1 denote the probability (or population proportion) of success for Group 1.
Let p2 denote the probability (or population proportion) of success for Group 2.

To test H0: p1 = p2 against some alternative, we'll introduce Pearson's χ² (Chi-square) statistic.

Definition

Pearson's χ² statistic is

$$X_s^2 = \sum \frac{(O - E)^2}{E}$$

where the sum is over all the cells in the table, O denotes the observed value in each cell, and E denotes the value we'd expect to see in that cell (if H0 were true).

Now, we have the observed values (the data we collected). What are the E's? Remember, we conduct hypothesis tests under the assumption that the null hypothesis is true. If the null hypothesis were true, then _____________. So $\hat{p}_1$ and $\hat{p}_2$ would be estimating a common p (i.e., the probability of a success would be the same under Group 1 or Group 2 in our example). Then we could estimate this common p by using a weighted ("pooled") estimator:

$$\hat{p}_{pool} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} = \frac{n_1 \frac{Y_1}{n_1} + n_2 \frac{Y_2}{n_2}}{n_1 + n_2} = \frac{Y_1 + Y_2}{n_1 + n_2}$$
Little Sidebar…

Suppose you are flipping an unfair coin, where the probability of a heads is 0.3 and the probability of a tails is 0.7. How many heads would you expect to see if you were to flip this unfair coin ten times?

Now, apply this thought process to get the expected successes for Group 1, and compute the expected successes for Group 2.
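The sidebar's reasoning (expected count = number of trials times probability) can be sketched in R for the Timolol data. This is one possible way to organize the arithmetic; the variable names are illustrative, and the sample sizes 160 and 147 are obtained by adding the angina free and not angina free counts from the table in these notes.

```r
# Counts from the Timolol example.
y <- c(timolol = 44, placebo = 19)     # successes (angina free)
n <- c(timolol = 160, placebo = 147)   # sample sizes: 44 + 116 and 19 + 128

# Pooled estimate of the common p: total successes over total sample size.
p_pool <- sum(y) / sum(n)

# Expected counts under H0, using "trials times probability" for each group.
expected_success <- n * p_pool
expected_failure <- n * (1 - p_pool)

p_pool
expected_success
expected_failure
```

The expected successes across the two groups add up to the observed total number of successes, which is one way to check the arithmetic.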
Fill out the "Expected Table" for the Group 1/Group 2 success/failure contingency table.

            Group 1    Group 2
Success     ______     ______
Failure     ______     ______

Things to remember

- The E's (expected counts) need not be integers, and we do not round them.
- The row and column totals are the same for the observed and expected tables (this is a good way to check your calculations!).
- For the Chi-square test (which we'll begin implementing in just a moment) to be valid, we need each E ≥ 1 and the average E ≥ 5.
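The points above can be checked numerically. The sketch below builds the expected table for the Timolol data using E = (row total × column total) / grand total, which is algebraically the same as multiplying each group's sample size by the pooled proportion. The object names are illustrative.

```r
# Observed table from the Timolol example (rows: outcome, columns: group).
observed <- matrix(c(44, 19,
                     116, 128),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Angina free", "Not angina free"),
                                   c("Timolol", "Placebo")))

# Expected count for every cell: row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected   # not integers, and deliberately not rounded

# The margins of the expected table should match the observed margins.
all.equal(rowSums(expected), rowSums(observed))
all.equal(colSums(expected), colSums(observed))

# Validity check from the notes: each E >= 1 and the average E >= 5.
all(expected >= 1) && mean(expected) >= 5
```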
Calculating P-values under the χ² distribution

The χ² distribution is a right-skewed distribution. The values of a χ² random variable are greater than or equal to 0. The χ² distribution has degrees of freedom. The degrees of freedom for a χ² test with a contingency table are

df = (# of rows – 1)(# of columns – 1)

For a non-directional alternative,

$$P = P\{\chi^2_{df} \ge X_s^2\}$$

If df = 1, we have the option of performing a directional alternative. In this case,

$$P = \begin{cases} \frac{1}{2}\, P\{\chi^2_{df} \ge X_s^2\} & \text{if data deviate in the direction specified by } H_A \\ > 0.5 & \text{otherwise} \end{cases}$$
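A minimal sketch of the P-value calculation in R using pchisq. The test statistic 3.84 and df = 1 are hypothetical values chosen for illustration; they do not come from any example in the notes.

```r
# Hypothetical test statistic and degrees of freedom, for illustration only.
x2_s <- 3.84
df   <- 1

# Non-directional P-value: area to the right of x2_s under the
# chi-square curve with df degrees of freedom.
p_nondirectional <- pchisq(x2_s, df = df, lower.tail = FALSE)

# Directional P-value (df = 1 only), used when the data deviate in the
# direction specified by HA: half of the non-directional P-value.
p_directional <- p_nondirectional / 2

p_nondirectional
p_directional
```

Using lower.tail = FALSE asks for the right-tail area directly, which matches the right-skewed picture of the χ² distribution described above.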
Example

Using the table below, conduct a test of hypothesis at the α = 0.01 significance level to determine whether there is a significant difference in the probability of being angina free under Timolol or placebo.

                   Timolol    Placebo
Angina free        44         19
Not angina free    116        128
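One way to carry out this test in R is sketched below, first by hand from the definition of Pearson's statistic and then with chisq.test as a cross-check. Setting correct = FALSE is an assumption made to match the uncorrected statistic used throughout these notes; the object names are illustrative.

```r
# Observed table (rows: outcome, columns: treatment group).
observed <- matrix(c(44, 19,
                     116, 128),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Angina free", "Not angina free"),
                                   c("Timolol", "Placebo")))

# Expected counts under H0: p1 = p2.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson's statistic: sum over all cells of (O - E)^2 / E.
x2_s <- sum((observed - expected)^2 / expected)

# Degrees of freedom and non-directional P-value.
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(x2_s, df = df, lower.tail = FALSE)

# Cross-check with the built-in test (no continuity correction).
fit <- chisq.test(observed, correct = FALSE)

c(by_hand = x2_s, chisq_test = unname(fit$statistic))
p_value
p_value < 0.01   # compare with the significance level alpha = 0.01
```

With these counts the statistic comes out to roughly 9.98 on 1 degree of freedom. The conclusion should still be written in the context of the study, not just as "reject" or "fail to reject".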
A Test for Association

The work-up of all the previous examples assumed we had two independent samples and we were observing those two samples for the outcome of one variable. Many times, we are instead in the situation where we observe one sample for two explanatory factors.

                        Factor 2
                        Level 1    Level 2
Factor 1    Level 1     Y1         Y2
            Level 2

In the case where we have one sample and we're observing it for two explanatory factors, we'll test the hypothesis of association. The test for H0: there is no association is numerically equivalent to that of H0: p1 = p2, but the hypotheses and interpretations are different.
Example 10.21

To study the association of hair color and eye color in a German population, an anthropologist observed a sample of 6,800 men.

                       Hair Color
                       Dark      Light
Eye Color    Dark      726       131
             Light     3,129     2,814

Test at the α = 0.05 significance level whether hair color is associated with eye color in this population of German men.
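A minimal sketch of the association test in R. The calculation is the same as for two independent samples; only the hypotheses and the interpretation change. Using correct = FALSE is an assumption made to match the uncorrected Pearson statistic in these notes.

```r
# Observed table: rows are eye color, columns are hair color.
hair_eye <- matrix(c(726, 131,
                     3129, 2814),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Eye = c("Dark", "Light"),
                                   Hair = c("Dark", "Light")))

sum(hair_eye)   # should equal the sample size of 6,800

# H0: no association between hair color and eye color.
fit <- chisq.test(hair_eye, correct = FALSE)

fit$expected    # expected counts under H0
fit$statistic   # Pearson's chi-square statistic
fit$parameter   # degrees of freedom
fit$p.value     # compare with alpha = 0.05
```

Because this is one sample classified by two factors, the conclusion should be phrased in terms of association in the population, not as a difference between two groups' proportions.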
General r x c Case

The ideas presented in the 2 x 2 cases can be easily extended to general r x c contingency tables.

For the case where we have c different samples (your columns), and we're checking each sample for different levels of the row factor, the hypothesis will change slightly. Here, we'll test whether the distributions are the same for each sample. (Think about it: if we have more than a success and a failure, then for each column we'll have P(level 1), P(level 2), …, P(level r). The null hypothesis would then be testing whether p11 = p12 = … = p1c and p21 = p22 = … = p2c, etc. This is called a compound hypothesis.)

For the case where we have one sample and we're checking that one sample for different levels of two different factors, we'll still be testing association.

Example

The following table shows the observed distribution of A, B, AB, and O blood types in three samples of African Americans living in different locations.

        I (Florida)    II (Iowa)    III (Missouri)
A       122            1781         353
B       117            1351         269
AB      19             289          60
O       244            3301         713

Test at the α = 0.05 level of significance whether the distribution of blood type for African Americans is different across the three regions.
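A sketch of how the r x c test might be run in R for the blood type data. The object name blood and the dimnames are illustrative; the counts are those in the table above.

```r
# Observed 4 x 3 table: rows are blood types, columns are the three samples.
blood <- matrix(c(122, 1781, 353,
                  117, 1351, 269,
                   19,  289,  60,
                  244, 3301, 713),
                nrow = 4, byrow = TRUE,
                dimnames = list(Type = c("A", "B", "AB", "O"),
                                Region = c("Florida", "Iowa", "Missouri")))

# H0: the distribution of blood type is the same in all three regions.
fit <- chisq.test(blood)

fit$expected    # expected counts; check each E >= 1 and the average E >= 5
fit$statistic   # Pearson's chi-square statistic
fit$parameter   # df = (4 - 1) * (3 - 1)
fit$p.value     # compare with alpha = 0.05
```

For tables larger than 2 x 2, chisq.test applies no continuity correction, so the correct argument does not need to be set here. With more than 1 degree of freedom, only the non-directional alternative is available.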