Confidence Intervals & Hypothesis Testing for Proportions

Know the symbols and their meanings. We can always calculate/know a statistic. Often we don't know (and will never know) the value of the parameter(s); we would have to take a census... Is that even possible?

Half of U.S. college graduates agree their education was worth the cost. What are some questions you have about these findings? Discuss with a partner for 2 minutes & then be prepared to share out. Don't look ahead!
http://www.gallup.com/services/185888/gallup-purdue-index-report2015.aspxg_source=REPORT&g_medium=topic&g_campaign=tiles

Who did Gallup ask? AA grads? BA/BS grads? MA/MS grads? What was Gallup's process for selecting the graduates? What was the question that Gallup asked (how was it worded)? How many graduates did Gallup ask?

Since Gallup probably didn't ask EVERY college graduate (do you agree with this assumption?), Gallup is trying to make a statement about ALL college graduates based on just a sample of college graduates. So do you think the sample results exactly predict the entire population of US college graduates? Do you think the proportion of the entire population who feels this way is exactly 50%? More? Less? Do you agree that it could be within a range of reasonable values?

The examples so far have described situations in which we know the value of the population parameter, p. Very unrealistic. The whole point of carrying out a survey (most of the time) is that we don't know the value of p, but we want to estimate it. Think about elections... parties take a lot of sample surveys (polls) to see who is 'leading.'

A survey took a random sample of 2,928 adults in the US and asked them if they believed that reducing the spread of AIDS and other infectious diseases was an important policy goal for the US government. 1,551 responded 'yes'; that's about 53%.

Spiral back for a moment... Random? Large sample? Big population? (So, if we wanted to find a probability using the CLT, we could...) These are the exact conditions we must check to create a confidence interval as well. More on confidence intervals in a few...

The above percentage just tells us about OUR sample of those specific 2,928 people. What about another sample? Would we get a different percentage of yes's? What about the percentage of all adults in the US who believe this? Do you think the percentage of all adults in the US who believe this is exactly 53%? If not, then what is a reasonable 'guess range'? Come up to the board and write your reasonable/likely 'guess range.' How about: from 0% to 100%? What do you think about that reasonable 'guess range'?

We don't know p, the population parameter; we do know p̂ for this sample: it's 53%. We also know:
Our estimate (53%) is unbiased; remember sampling distributions? (It's maybe not exactly equal to p; maybe just a little low or a little high.)
The standard error (typical amount of variability) is sqrt(p̂(1 − p̂)/n) = sqrt(0.53(1 − 0.53)/2928) ≈ 0.0092, or about 0.9%.
Because we have a 'large sample,' the probability distribution of our p̂'s is close to Normally distributed & centered around the true population parameter.

So: the true population parameter is unknown; the sampling distribution of p̂ is probably centered around 0.53; Normally distributed; standard error (SD; the amount of variability in the sample statistic) ≈ 0.009.
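StatCrunch is the tool we use in class; as a supplementary check, here is a minimal Python sketch (not part of the original lecture) that reproduces the sample proportion and standard error for the AIDS-policy poll above:

```python
from math import sqrt

# AIDS-policy poll from the lecture: 1,551 "yes" answers out of 2,928 adults.
yes, n = 1551, 2928

p_hat = yes / n                      # the sample statistic, p-hat
se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of p-hat

print(f"p_hat = {p_hat:.3f}")        # about 0.53
print(f"SE    = {se:.4f}")           # about 0.0092, i.e., roughly 0.9%
```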
So: about 68% of the data is as close or closer than 1 standard error away from the unknown population parameter, p; 95% of the data is as close or closer than 2 standard errors away; and 99.7% of the data is as close or closer than 3 standard errors away.

Sampling distribution probably centered around 0.53; Normally distributed; standard error (SD) ≈ 0.009. So we can be highly confident, 99.7% confident, that the true, unknown population proportion, p, is between 0.53 − (3)(0.009) and 0.53 + (3)(0.009).

This is a confidence interval: we are 99.7% confident that the interval from about 50.3% to 55.7% captures the true, unknown population proportion of Americans who believe that reducing the spread of AIDS and other infectious diseases is an important policy goal for the US government.

How did we do on our reasonable guesses? Look on the board...

Sampling distribution probably centered around 0.53; Normally distributed; standard error (SD) ≈ 0.009. What if we wanted to construct a confidence interval in which we are 95% confident? Let's try it now. Let's do the calculations... What about a 90% confidence interval? Can we do the calculations by hand? Why/why not? Let's use StatCrunch...

What proportion of us have at least one tattoo? So our sample statistic is our p̂ = ____. If we were to ask another group of COC students, we would get another (likely different) p̂.

445 Math 075 students were asked this last spring; 133/445 = 0.299 = 29.9% had at least one tattoo. Remember, a larger n generally means less variation, but the values are still centered at the same place (unbiased estimator).

We want to be able to say with a high level of certainty what proportion of all COC students have at least one tattoo. But we don't know the true, unknown population parameter, p. We do know p̂ (the sample statistic); actually we have 2 sample statistics, our class and the Math 075 data. Our estimators are unbiased (what does that mean?).

Let's check conditions for each of our samples: Random selection; Large sample; Big population. Can we use either sample statistic (either p̂)? If so, which should we use? Then calculate our standard deviation (our standard error). Our distribution is ≈ Normal (because our conditions are met), centered around p̂; 68% within 1 SD; 95% within 2 SDs; 99.7% within 3 SDs.

Let's create a 95% confidence interval... We are 95% confident that the interval from _____ to _____ captures the true, unknown population parameter, p, the proportion of all COC students that have at least one tattoo. This is a confidence interval with a 95% confidence level.

Let's create a 99.7% confidence interval... We are 99.7% confident that the interval from _____ to _____ captures the true, unknown population parameter, p, the proportion of all COC students that have at least one tattoo. This is a confidence interval with a 99.7% confidence level.
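For a by-hand check of these intervals outside of StatCrunch, here is a small Python sketch of the normal-approximation confidence interval formula used above (the helper name prop_ci is just for illustration):

```python
from math import sqrt
from scipy.stats import norm

def prop_ci(successes, n, level):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z_star = norm.ppf(1 - (1 - level) / 2)   # critical value for this confidence level
    return p_hat - z_star * se, p_hat + z_star * se

# AIDS-policy poll: 1,551 of 2,928 said "yes"
print(prop_ci(1551, 2928, 0.997))   # close to the 50.3%-to-55.7% interval above
print(prop_ci(1551, 2928, 0.95))    # the 95% interval is narrower

# Math 075 tattoo data: 133 of 445 students had at least one tattoo
print(prop_ci(133, 445, 0.95))
```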
How about a 90% confidence interval or a 99% confidence interval? StatCrunch! We are xx% confident that the interval from _____ to _____ captures the true, unknown population parameter, p, the proportion of all COC students that have at least one tattoo.

How will we ever know if we did a good job estimating the true proportion of all COC students who have at least one tattoo? What did you notice about the lengths of our confidence intervals as we changed confidence levels? More on this a little later...

Statistical inference provides methods for drawing conclusions about a population based on sample data. The methods used for statistical inference assume that the data were produced by a properly randomized design. Confidence intervals are one type of inference, and are based on sampling distributions of statistics. The other type of inference we will learn and practice is hypothesis testing (more on this later).

Estimator ± margin of error. The estimator we just used was our sample proportion, our p̂. The margin of error we just used was our standard error (our standard deviation) multiplied by the number of standard errors away we are from the center. The margin of error tells us the amount we are most likely 'off' by with our estimate. The margin of error helps account for sampling variability (NOT any of the biases we discussed... voluntary response, nonresponse, etc.).

How confident are you that the mean temperature in Santa Clarita in degrees Fahrenheit is between −50 and 150? That the mean temperature in Santa Clarita in degrees Fahrenheit is between 70 and 70.001?

In general, a large interval goes with a high confidence level; a small interval goes with a lower confidence level. Compare a 99% confidence level, a 95% confidence level, and a 90% confidence level. Typically we want both: a reasonably high confidence level AND a reasonably small interval; but there are trade-offs; more on this in a little bit.

Will we ever know for sure if we captured the true unknown population parameter p? No. The actual p is unknown.

Interpretation of a confidence interval: "I am ___% confident that the interval from _____ to _____ captures the true, unknown population proportion of (context)."

In StatCrunch: Stat, Proportion Stats, One Sample, With Summary; # of successes: 7; # of observations: 10; Confidence Interval for p; Level: 0.90. Now change to 700 successes, 1,000 observations, level 0.90; what do you observe about the width of the confidence interval? Now change to 7,000 successes, 10,000 observations, level 0.90; what do you observe about the width of the confidence interval?

In StatCrunch: Stat, Proportion Stats, One Sample, With Summary; # of successes: 7,000; # of observations: 10,000; Confidence Interval for p; change the Level to 0.99. What do you observe about the width of the confidence interval? Now change to the 0.60 level; what do you observe about the width of the confidence interval?
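The same width exploration can be sketched outside StatCrunch. This hedged Python sketch (the ci_width helper is just for illustration) uses the normal approximation, so the widths should roughly match what StatCrunch reports:

```python
from math import sqrt
from scipy.stats import norm

def ci_width(successes, n, level):
    """Width of the normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z_star = norm.ppf(1 - (1 - level) / 2)
    return 2 * z_star * se

# Same sample proportion (0.70), growing sample size, fixed 90% confidence level:
for successes, n in [(7, 10), (700, 1000), (7000, 10000)]:
    print(n, round(ci_width(successes, n, 0.90), 4))      # width shrinks as n grows

# Fixed data (7,000 of 10,000), changing confidence level:
for level in [0.99, 0.90, 0.60]:
    print(level, round(ci_width(7000, 10000, level), 4))  # width shrinks as the level drops
```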
The lower the confidence level (say 10% confident), the shorter/narrower the confidence interval (I am 10% confident that the mean temperature in Santa Clarita is between 70.01 degrees and 70.02 degrees). The higher the confidence level (say 99% confident), the wider the confidence interval (I am 99% confident that the mean temperature in Santa Clarita is between 40 degrees and 100 degrees).

Also, the larger the n (sample size), the shorter the confidence interval (smaller MOE); the smaller the n (sample size), the longer the confidence interval (larger MOE).

So, if you want (need) a high confidence level AND a small(er) interval (margin of error), it is possible if you are willing to increase n. That can be expensive and time-consuming, and sometimes it is not realistic (why?). In reality, you may need to compromise on the confidence level (a lower confidence level) and/or your n (a smaller n).

Alcohol abuse is considered by some as the #1 problem on college campuses. How common is it? A recent SRS of 10,904 US college students collected information on drinking behavior & alcohol-related problems. The researchers defined "frequent binge drinking" as having 5 or more drinks in a row 3 or more times in the past 2 weeks. According to this definition, 2,486 students were classified as frequent binge drinkers. Based on these data, what can we say about the proportion of all US college students who have engaged in frequent binge drinking?

Let's create a confidence interval so we can approximate the true population proportion of all US college students who engaged in frequent binge drinking. How confident do we want to be (i.e., what confidence level do we want to use)? We must check conditions before we calculate a confidence interval... Random? Large sample? Big population? Perform the StatCrunch calculations: Stat, Proportion Stats, One Sample, With Summary. Always conclude with an interpretation, in context: I am 99% confident that the interval from about 22% to 24% contains the true, unknown population parameter, p, the actual proportion of all US college students who have engaged in frequent binge drinking.

In a random sample of 400 Americans, each person was asked if they are satisfied with the amount of vacation time they are given by their employers. 336 of them said that they were not satisfied with their vacation time. In StatCrunch, construct a 99% confidence interval in order to estimate the true percent of all Americans that are not satisfied with their vacation time. Remember to check conditions & provide a well-worded conclusion. Review: What would happen to the width of the confidence interval if we created a 90% confidence interval?

In a random sample of 72 adults in Santa Clarita, each person was asked if they support the death penalty. 31 adults in the sample said they do support the death penalty. Using StatCrunch, calculate a 95% confidence interval to estimate the true proportion of all people in Santa Clarita that support the death penalty. Remember to check conditions & provide a well-worded conclusion.

What percent of eligible Americans vote? In 2008, a random sample of 3,000 American adults that were eligible to vote was taken, and we found that 2,040 of them voted. Construct a 90% confidence interval estimate of the true population percent of all eligible Americans that vote. Don't forget about conditions & provide a well-worded conclusion.
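If you want to check your StatCrunch answers for the examples above, here is a hedged Python sketch using the binge-drinking numbers; the same normal-approximation formula applies to the vacation, death-penalty, and voting exercises (just swap in their counts and levels):

```python
from math import sqrt
from scipy.stats import norm

# Binge-drinking example: 2,486 of 10,904 students were frequent binge drinkers.
successes, n, level = 2486, 10904, 0.99

p_hat = successes / n
se = sqrt(p_hat * (1 - p_hat) / n)
z_star = norm.ppf(1 - (1 - level) / 2)        # about 2.576 for 99% confidence

lower, upper = p_hat - z_star * se, p_hat + z_star * se
print(f"99% CI: ({lower:.3f}, {upper:.3f})")  # roughly (0.218, 0.238) -- "about 22% to 24%"
```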
Choose a data set from the Math 140 Spring data (that you have not used before); it should be 'yes/no' or 'black/brown/red' or ... What type of data am I telling you to choose? Why? Cut and paste it into StatCrunch. Check conditions to be sure you can calculate a meaningful confidence interval. Calculate a confidence interval (you choose the confidence level) so you can confidently say something about ALL COC students based on the Math 140 data. Interpret your results; be sure to use the word 'all' in your interpretation; print out; turn in.

The basics of significance testing. We have already discussed confidence intervals for an unknown population parameter, p. Confidence intervals are used when the goal is to estimate an unknown population parameter like p (like when we estimated the true proportion of all 20,000 COC students who have at least one tattoo). Now... statistical inference through significance tests: we evaluate the evidence (a statistic) provided by sample data about some claim concerning an unknown population parameter like p.

There once were four students who missed the midterm for their statistics class. They went to the professor together and said, "Please let us make up the exam. We carpool together, and on our way to the exam, we got a flat tire. That's why we missed the exam." The professor didn't believe them, but instead of arguing he said, "Sure, you can make up the exam. Be in my office tomorrow at 8." The next day, they met in his office. He sent each student to a separate room and gave them an exam. The exam consisted of only one question: "Which tire?"

Let's imagine all four students answered, "left rear tire." So... what do you think? Were the students most likely telling the truth? Lying? What are the chances of all of them guessing the same tire? Let's simulate: using StatCrunch, input RFront, LFront, RRear, LRear; then Data, Sample, choose your data, sample size 4, number of samples 10, sample with replacement. How many times, just by random chance, does StatCrunch choose the same tire? Let's create a dot plot on the board; what do you think?

Assuming the students were lying, the chance that all four of them would guess the same tire, just by random chance, according to our simulation, is ... look at our dot plot... If we carried out this simulation again, would we get the same data? The same exact dot plot? Surprised or not?

The professor suspected they had been lying. That's why he did what he did. Maybe they just got lucky... just by chance they all guessed the same tire. How 'lucky' would they have to be? The theoretical probability that all four students would guess the same tire is about ... Do you consider it likely/typical or unlikely/rare that they could have, simply by chance, guessed the same tire? Look at our dot plot...

Let's think about another hypothesis test/inference example/situation... I claim that over the last 5 years of playing basketball, I, on average, make 90% of my basketball free throws. To test my claim, I am asked to shoot 10 free throws. I make 2 of the 10 (only 20%). Do you still believe my claim that I make 90% of my free throws? Why or why not?

Do you agree that statistics vary from sample to sample? So if I attempted another 10 free throws, chances are I would make something other than 2 of them? So the question is... would my actually making only 2 out of 10 (or 20%) happen so, so very rarely (assuming my 90% claim were true) that now you are starting to question/doubt whether my claim really is true?
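The class activity below uses a random-digits table; for comparison, here is a hedged Python sketch of the same check, assuming the 90% claim is true (the seed is hypothetical, just to make the run repeatable). It also shows the exact binomial probability of making 2 or fewer shots out of 10:

```python
import random
from scipy.stats import binom

# Assume the 90% claim is true and simulate many sets of 10 free throws,
# counting how often only 2 (or fewer) shots go in.
random.seed(1)            # hypothetical seed, just to make the run reproducible
trials = 100_000
rare = sum(
    1 for _ in range(trials)
    if sum(random.random() < 0.90 for _ in range(10)) <= 2
)

print(rare / trials)             # essentially 0 in most runs
print(binom.cdf(2, 10, 0.90))    # exact probability, about 0.0000004
```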
Let's simulate. Let's assume that we believe the claim. Pull up the random digits table. Let's say 0 through 8 represent making a basket; 9 represents missing a basket. Go into the table on a random line and look at ten 1-digit numbers (duplicates are OK). Count the number of 'baskets' made in those ten 1-digit numbers. Do this three times. Put your magnets up on the class dot plot.

So the question is... would my actually making only 2 out of 10 (or 20%) happen so, so very rarely (assuming my 90% claim were true) that now you are starting to question/doubt whether my claim really is true?

A significance test is a formal procedure that enables us to choose between two hypotheses when we are uncertain about our measurements. The basic idea: an outcome that would rarely happen if a claim were really true is good evidence that the claim is not true.

If we flip a penny, we can agree that the probability of heads or tails is 0.50; fair. However, some claim that if we spin a penny on a table, because the heads side bulges outward, the lack of symmetry will cause the spinning coin to land on one side more often than the other; the probability is not 0.50 for each side; unfair. Some people might find this claim outrageous; completely false.

Null hypothesis, Ho: p = 0.50. The null hypothesis is always neutral, no change, always an equality, and it is always in terms of a population parameter (like p or μ). Alternative hypothesis, Ha: p ≠ 0.50. The alternative hypothesis is always <, >, or ≠, and it is also always in terms of a population parameter (like p or μ).

In the beginning, we assume the null is true (like a defendant is assumed not guilty at the beginning of a trial) until there is overwhelming evidence that suggests this is not so; then we may reject this belief if/when the evidence is clearly against it. The null hypothesis always gets the benefit of the doubt and is assumed to be true throughout the hypothesis-testing procedure. If we decide at the last step that the observed outcome (our sample statistic) is extremely unusual under this assumption, then and only then do we reject the null hypothesis.

If the null hypothesis is correct, then when we spin a coin a number of times, about ½ of the outcomes should be heads. If the null hypothesis is wrong, we will see either a much larger or a much smaller proportion.

Let's spin some pennies. Spin (on a desk) 20 times. Count the # of heads. Calculate the sample proportion of heads, p̂, and write it on the board. Let's look at our sampling distribution; describe it using SOCS (review). If we did this again, would we get different results? How 'extreme' of a result would we need for you to not believe our null hypothesis, i.e., to reject the null? We will come back to this later in the chapter...

What's wrong with each of the following?
Ho: p = 0.17, Ha: p ≠ 0.19
Ho: p = −0.20, Ha: p < 0.15
Ho: p > 0.45, Ha: p = 0.45
Ho: p = 1.50, Ha: p > 1.50
Ho: p̂ = 0.92, Ha: p̂ < 0.92

A recent Gallup Poll report on a national survey of 1028 teenagers revealed that 72% of teens said they rarely or never argue with their friends. You wonder whether this national result would be different in your school. So you conduct your own survey of a random sample of students at your school.

The proportion of people who live after suffering a stroke is 0.85. A drug manufacturer has just developed a new treatment that they claim will increase the survival rate.

A change is made that should improve student satisfaction with the parking situation at COC. The null hypothesis, that there is an improvement, is tested versus the alternative, that there is no change.
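Back to the penny-spinning activity above: a quick Python sketch (my own supplement, with a hypothetical seed) of what the class dot plot of heads counts would tend to look like IF the null hypothesis p = 0.50 were true, which helps with the "how extreme is extreme?" question:

```python
import random
from collections import Counter

# If Ho (p = 0.50) were true, what would the class dot plot of heads counts
# from 20 spins tend to look like?
random.seed(2)                           # hypothetical seed for reproducibility
spins, groups = 20, 200                  # 200 simulated "students," 20 spins each

counts = Counter(
    sum(random.random() < 0.50 for _ in range(spins)) for _ in range(groups)
)

for heads in sorted(counts):
    print(f"{heads:2d} heads: {'*' * counts[heads]}")   # crude text dot plot
```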
A researcher tests the following null hypothesis: Ho: p̂ = 0.80. (What's wrong with that?)

A statistics instructor at COC read that 90% of all college students use social media on a regular basis. She wonders if the percent of COC students who use social media on a regular basis is different. Ho: p = 0.90, Ha: p > 0.91. (What's wrong with these?)

The Census Bureau reports that households typically spend 31% of their total spending on housing. A homebuilders association in Cleveland believes that it is lower in their area. They interview a sample of 40 households in the Cleveland metropolitan area to learn what percent of their spending goes toward housing. Take p to be the typical percent of spending devoted to housing among all Cleveland households. Ho: p = 31%, Ha: p < 31%.

Surprise itself is the key idea: something unexpected occurs (like only making 20% of free throws when we claimed to make 90%). The null hypothesis tells us what to expect; it's what we believe throughout the process until we see evidence otherwise. If we see something unexpected, then we should doubt the null hypothesis. If we are really surprised, then we should reject it altogether. Instead of just 'not surprising,' 'kind of surprising,' 'very surprising,' etc., we have... the p-value.

A p-value is a probability. Assuming the null hypothesis is true, the p-value is the probability that, if the experiment were repeated many times, we would get an outcome as extreme as or more extreme than the one we actually got (our statistic). A small p-value suggests that a surprising outcome has occurred and discredits the null hypothesis. A p-value is a quantitative measure of how rare/unlikely a finding is. Small p-values are evidence against Ho. Large p-values fail to give evidence against Ho. Understanding how to interpret a p-value is crucial to understanding hypothesis testing. StatCrunch will calculate the p-value, but we need to understand how the software did the calculation.

The meaning of the phrase 'as extreme as or more extreme than' depends on the alternative hypothesis. Note (for the 20 penny spins): the closer the number of heads is to 10, the larger the p-value. Also note that the p-value for an outcome of 11 heads is the same as for 9 heads, etc.

Most of the time, we take one more step to assess the evidence against Ho. We compare the p-value to some predetermined value (rather than just 'unlikely') called a significance level, symbol α (alpha). You can think of this as a rejection zone (sketch). The significance level makes 'not likely' more exact, more informative. The most common α levels are α = 0.05 and α = 0.01. Interpretation: at α = 0.05, the data give evidence against Ho so strong that it would happen no more than 5% of the time if Ho were true. If the p-value is as small as or smaller than α, we say the data are statistically significant at level α. Note: 'significant' in statistics doesn't mean important (like in English); it means not likely to happen by chance.

The process so far: state Ho: p = ... and Ha: p ... (1-sided or 2-sided); gather sample data; calculate a p-value based on the sample data (the probability of getting that value or one more extreme, assuming the null hypothesis is true). If the p-value is p = 0.03, this is significant at the α = 0.05 level (in the rejection zone), but it is not significant at the α = 0.01 level (not in the rejection zone).
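To make the notes above about "as extreme or more extreme" concrete, here is a hedged Python sketch for the 20 penny spins. It tallies the two tails with the exact binomial distribution, so the numbers may differ slightly from the z-based p-values StatCrunch reports, but the pattern is the same:

```python
from scipy.stats import binom

n, p0 = 20, 0.50   # 20 penny spins, null hypothesis Ho: p = 0.50

def two_sided_p_value(heads):
    """P(result as extreme or more extreme than `heads`), assuming Ho is true."""
    tail = min(heads, n - heads)
    return min(1.0, 2 * binom.cdf(tail, n, p0))

for heads in [10, 9, 11, 15, 18]:
    print(heads, round(two_sided_p_value(heads), 4))
# Counts near 10 give large p-values; 9 and 11 heads give the same p-value;
# counts far from 10 give small p-values (strong evidence against Ho).
```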
Reject Ho (null hypothesis): this happens when the sample statistic is statistically significant, the p-value is too unlikely to have occurred by chance (we don't believe the null hypothesis), and the result is in the rejection zone. The wording must reference all of the following for a complete interpretation: p-value, α level, reject Ho, and a conclusion in context (caution about using the words 'cause' or 'prove').

Fail to Reject Ho (null hypothesis): this happens when the sample statistic could have occurred by chance (we do believe the null hypothesis; we don't believe the alternative), and the result is not in the rejection zone. The wording must reference all of the following for a complete interpretation: p-value, α level, fail to reject Ho, and a conclusion in context (caution about using the words 'cause' or 'prove').

Conditions:
Random Sample ... randomly selected or randomly assigned.
Large Sample Size; Normality ... np0 ≥ 10 and n(1 − p0) ≥ 10; the sample has at least 10 expected successes and at least 10 expected failures.
Big Population (Independence) ... the population is at least 10 times the sample size, and each observation has no influence on any other.
...if these conditions are satisfied, then we can use the Central Limit Theorem for sample proportions; the distribution is ≈ Normal! That's a great thing! When doing a hypothesis test, you MUST check conditions... this is an essential part of the hypothesis-testing process.

According to the National Institute for Occupational Safety and Health, job stress poses a major threat to the health of workers. A national survey of restaurant employees found that 75% said that work stress had a negative impact on their personal lives. A random sample of 100 employees from a large restaurant chain finds that 68 answer "Yes" when asked, "Does work stress have a negative impact on your personal life?" Is this good reason to think that the proportion of all employees in this chain who would say "Yes" differs from the national proportion p0 = 0.75?

Ho: p = 0.75, Ha: p ≠ 0.75. We want to test a claim about p, the true proportion of all of this chain's employees who would say that work stress has a negative impact on their personal lives.

Conditions (1-sample proportion hypothesis test; α = 5% (rejection zone)):
Random Sample – stated in the problem.
Large Sample Size/Normality – the expected numbers of "Yes" and "No" responses are (100)(0.75) = 75 and (100)(0.25) = 25, respectively. Both are at least 10.
Big Population (Independence) – since we are sampling without replacement, this "large chain" must have at least (10)(100) = 1000 employees.

Calculations for a 1-sample proportion 2-sided hypothesis test; use StatCrunch: Stat, Proportion Stats, One Sample, With Summary. z = −1.616, p-value = 0.1059.

Interpretation: Fail to reject Ho. With a p-value of 0.1059 and an α = 5%, we fail to reject the null hypothesis and conclude that there is not enough evidence to suggest that the proportion of this chain restaurant's employees who suffer from work stress is different from the national survey result, 0.75.

Follow-up: run a confidence interval; which confidence level? Why can we do this?
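Here is a hedged Python sketch of the same one-proportion z-test done by hand; it should reproduce the StatCrunch output quoted above for the restaurant example:

```python
from math import sqrt
from scipy.stats import norm

# Restaurant work-stress example: 68 "yes" out of n = 100; Ho: p = 0.75, Ha: p != 0.75
x, n, p0 = 68, 100, 0.75

p_hat = x / n
se0 = sqrt(p0 * (1 - p0) / n)        # standard error computed under the null hypothesis
z = (p_hat - p0) / se0
p_value = 2 * norm.cdf(-abs(z))      # two-sided: both tails are "as extreme or more extreme"

print(round(z, 3), round(p_value, 4))   # about z = -1.617, p-value = 0.1059
```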
In a recent study, 73% of first-year college students responding to a national survey identified "being very well-off financially" as an important personal goal. A state university finds that 132 of a random sample of 200 of its first-year students say that this goal is important. Is there evidence that the proportion of all first-year students at this university who think being very well-off is important differs from the national value, 73%? Carry out a significance test to help answer this question.

We want to test Ho: p = 0.73 versus Ha: p ≠ 0.73, where p is the proportion of all first-year students at this university who think being very well-off is important (does it differ from the national value of 73%?).

Conditions (1-sample proportion hypothesis test; α = 5% (rejection zone)):
Random Sample/SRS – stated in the problem.
Large Sample Size/Normality – np0 ≥ 10 & n(1 − p0) ≥ 10: (200)(0.73) ≥ 10 & (200)(1 − 0.73) ≥ 10.
Big Population (Independence) – we must assume there are at least (10)(200) = 2,000 first-year students in the population.

Calculations... StatCrunch: Stat, Proportion Stats, One Sample, With Summary. z = −2.22, p-value = 0.0258.

Reject Ho. With a p-value of 0.0258, and assuming an α = 0.05, we have statistically significant evidence that the proportion of all first-year students at this university who think being very well-off is important differs from the national value. (Decision, p-value, α, and context... always stated in terms of the alternative hypothesis.)

What if... our alpha had been 1%? Would our decision have changed?

Let's go back to the 5% alpha level. Can we calculate a confidence interval? Why? How? Does it confirm our findings? Do we have to calculate a CI to confirm our findings every time we conduct a hypothesis test?

Researchers wondered whether a greater proportion of people now dream in color than did so before color television and movies became as prominent as they are today. In the past, before color TV and movies, this proportion was 0.29. Recently researchers took a random sample of 113 people. Of these 113 people, 92 reported dreaming in color. Is there evidence (at a significance level of 1%) that more people today dream in color than in the past (before color TV and movies became as prominent as they are today)? Carry out an appropriate hypothesis test to help answer this question. What are our null and alternative hypotheses? Conditions? Calculations? Determination and interpretation?

Null and alternative hypotheses: Ho: p = 0.29, Ha: p > 0.29.
Conditions: 1-sample proportion hypothesis test; α = 1% (rejection zone); Random Sample; Large Sample Size/Normality; Big Population/Independence.
Calculations: z = 12.28, p-value ≈ 0.
Decision and interpretation: Reject the null hypothesis. At an alpha level of 1%, and with a p-value of about zero, there is sufficient evidence to suggest that more people today dream in color than in the past (before color TVs, etc.).
Can we/should we calculate a confidence interval to confirm our findings?
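For the dreaming-in-color test just above, a one-sided version of the same by-hand sketch (supplementary; StatCrunch does this for you) looks like:

```python
from math import sqrt
from scipy.stats import norm

# Dreaming-in-color example: 92 of 113 dream in color; Ho: p = 0.29, Ha: p > 0.29
x, n, p0 = 92, 113, 0.29

p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = norm.sf(z)                 # one-sided (right tail) because Ha is p > 0.29

print(round(z, 2), p_value)          # about z = 12.28, p-value ≈ 0
```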
For comparing two proportions: Ho: p1 = p2; Ha: p1 ≠ p2 (or p1 > p2, or p1 < p2). StatCrunch will calculate the test statistic for us; no need to memorize the formula.

Conditions:
Random – each sample must be randomly selected or randomly assigned, and the two samples must be independent of each other.
Large Count/Normality – each of the following must be ≥ 10: n1p̂1, n1(1 − p̂1), n2p̂2, n2(1 − p̂2).
Big Population – each population must be at least 10 times its corresponding sample size.

To study the long-term effects of preschool programs for poor children, a research foundation followed two randomly-chosen/assigned groups of Michigan children since early childhood. A control group of 61 children represents population 1, poor children with no preschool. Another group of 62 from the same area and similar backgrounds attended preschool as 3- and 4-year-olds and represents population 2, poor children who attend preschool. The sizes are n1 = 61 and n2 = 62. One response variable of interest is the need for social services as adults. In the past ten years, 38 of the preschool sample and 49 of the control sample have needed social services (mainly welfare). Carry out a hypothesis test to determine if there is significant evidence that preschool reduces or increases the later need for social services.

State the null and alternative hypotheses:
Ho: p no pre-school = p pre-school (or p no pre-school − p pre-school = 0)
Ha: p no pre-school ≠ p pre-school (or p no pre-school − p pre-school ≠ 0)
where each p is the true, unknown population proportion of all children like these needing social services.

Procedure: 2-proportion hypothesis test. Check Random, Large Count/Normality, Big Population. Use StatCrunch to calculate the test statistic, p-value, etc.: Stat, Proportion Stats, Two Sample, With Summary. z = −2.3201, p-value = 0.0203.

Interpretation: Reject the null hypothesis. At a significance level of 5% (α = 0.05), and with a p-value of approximately 0.02, there is sufficient evidence to show that p no pre-school ≠ p pre-school (i.e., evidence that preschool reduces or increases/changes the later need for social services).

Can/should we calculate a confidence interval to confirm our findings? Why? How?

The elderly fear crime more than younger people, even though they are less likely to be victims of crime. One of the few studies that looked at older blacks recruited random samples of 56 black women and 63 black men over the age of 65 from Atlantic City, New Jersey. Of the women, 27 said they "felt vulnerable" to crime; 46 of the men said this. What proportion of women in the sample feel vulnerable? Of men? (Note: men are victims of crime more often than women, so we expect a higher proportion of men to feel vulnerable.) Test the hypothesis that the true, unknown population proportion of all elderly black men who feel vulnerable is higher than that of all elderly black women who feel vulnerable. You may assume that all conditions have been checked and met.

Sample statistics: 46/63 men & 27/56 women. z = 2.7731, p-value = 0.0028. Reject the null hypothesis. At any reasonable alpha level, with a p-value less than 1%, we have evidence to suggest that the proportion of all black men who feel vulnerable is higher than the proportion of all black women who feel vulnerable. Can/should we calculate a confidence interval to confirm our findings?
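Here is a hedged by-hand sketch of the two-proportion z-test for the elderly-crime example above (StatCrunch pools the two samples under Ho in the same way, so the numbers should match):

```python
from math import sqrt
from scipy.stats import norm

# Elderly-crime example: 46 of 63 men vs. 27 of 56 women felt vulnerable.
# Ho: p_men = p_women, Ha: p_men > p_women (one-sided).
x1, n1 = 46, 63      # men
x2, n2 = 27, 56      # women

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                         # pooled proportion under Ho
se0 = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se0
p_value = norm.sf(z)                                   # right tail only

print(round(z, 4), round(p_value, 4))                  # about z = 2.7731, p-value = 0.0028
```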
California's controversial 'three strikes law' requires judges to sentence anyone convicted of three felony offenses to life in prison. Supporters say that this decreases crime; opponents argue that people serving life sentences have nothing to lose, so violence within the prison system increases. Researchers looked at data from the California Department of Corrections. Of 734 randomly-selected prisoners who had three strikes, 163 of them had committed 'serious' offenses while in the prison system. Of 3,188 randomly-selected prisoners who did not have three strikes, 974 had committed 'serious' offenses while in the prison system. Determine whether those with three strikes tend to have more offenses than those who do not. Use a 5% significance level.

The sample statistic for prisoners who had three strikes was 163/734 ≈ 22.2%. The sample statistic for prisoners who did not have three strikes was 974/3188 ≈ 30.6%. z = −4.49, p-value = 0.9999.

Fail to reject the null hypothesis. At a 5% alpha level and with a p-value ≈ 1, there is not sufficient evidence to conclude that all prisoners who have three strikes commit more serious offenses within the prison system than all prisoners who do not have three strikes. Confidence interval? Why? Why not?

High levels of cholesterol in the blood are associated with higher risk of heart attacks. Will using a drug to lower blood cholesterol reduce heart attacks? The Helsinki Heart Study looked at this question. Middle-aged men were assigned at random to one of two treatments: 2,051 men took the drug gemfibrozil to reduce their cholesterol levels, and a control group of 2,030 men took a placebo. During the next five years, 56 men in the gemfibrozil group and 84 men in the placebo group had heart attacks. Is the apparent benefit of gemfibrozil statistically significant? Use a 1% alpha level.

We want to draw conclusions about p1, the proportion of all middle-aged men who would suffer heart attacks after taking gemfibrozil, and p2, the proportion of all middle-aged men who would suffer heart attacks if they only took a placebo. We hope to show that gemfibrozil reduces heart attacks, so we have a one-sided alternative.

n gemfibrozil = 2,051, x gemfibrozil = 56; n placebo = 2,030, x placebo = 84. The sample statistic for gemfibrozil is 56/2051 ≈ 2.7% who had heart attacks; the sample statistic for the placebo group is 84/2030 ≈ 4.1% who had heart attacks. Is this difference just due to chance? Or is there really a difference between the medication and the placebo? Perform a hypothesis test to help you come to a conclusion.
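If you want to check your Helsinki Heart Study work outside StatCrunch, here is a hedged sketch using the same pooled two-proportion approach, assuming the one-sided alternative described above (the lecture leaves the decision to you; compare the printed p-value with α = 0.01):

```python
from math import sqrt
from scipy.stats import norm

# Helsinki Heart Study: Ho: p1 = p2, Ha: p1 < p2 (gemfibrozil lowers the heart-attack rate)
x1, n1 = 56, 2051    # gemfibrozil group
x2, n2 = 84, 2030    # placebo group

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
se0 = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se0
p_value = norm.cdf(z)                   # left tail, because Ha is p1 < p2

print(round(z, 2), round(p_value, 4))   # compare the p-value with alpha = 0.01
```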
Significance tests are used in a variety of settings: marketing, FDA drug testing, discrimination court cases, etc. Significance tests quantify an event that is unlikely to occur simply by chance. Different levels of significance (α) are chosen depending on the given situation; typically α = 0.10, 0.05, or 0.01. Continue to use caution when using "prove" or "cause"... even when doing hypothesis testing. P-values allow us to decide individually whether the evidence is sufficiently strong. But there is still no practical distinction between p-values of, say, 0.049 and 0.051 if our alpha level was, say, 5%. Statistical inference does not correct basic flaws in survey or experimental design, such as ...

Sometimes we do everything correctly... data collection, conditions, calculations, interpretation... but we still make an incorrect decision/determination. Perhaps we just happen to get a sample statistic that is very extreme, one that really doesn't represent our population accurately. Either we reject the null hypothesis when we really should have failed to reject it (Ho was really true), OR we fail to reject the null hypothesis when we really should have rejected it (Ho was really false). We make an "error"... and that is NOT our fault!

Type I Error: we reject Ho (null hypothesis) when Ho is really true. In other words, we determine that Ha (alternative hypothesis) is true when, in actuality, Ho (null hypothesis) is true.
Type II Error: we fail to reject Ho (null hypothesis) when Ho is really false. In other words, we determine that Ho (null hypothesis) is true when, in reality, Ha (alternative hypothesis) is true.

The probability of a Type I error (rejecting Ho when the null is really true) is α, your significance level for the hypothesis test. The probability of a Type II error (failing to reject Ho when the alternative is really true) is β, which is very complicated to calculate.

Power: the probability that a test will reject Ho when Ha is true. Think of power as making the correct decision, not making an error, not making a mistake. A high level of power is a good thing. Power = 1 − β (remember, β is the probability of making a Type II error); so power and β are complementary. How can we increase power (the chance of making the correct decision)? Increase α; increase n; decrease the standard deviation (which has the same effect as increasing the sample size, n). A short simulation illustrating the Type I error rate is sketched after the practice problems below.

The following 3 problems are practice problems in which you need to decide what procedure to use and why. Be prepared to defend your choice of procedure. After deciding which procedure, go ahead and check conditions and run the procedure. Work with a partner. Try it for 15-20 minutes, then we will debrief.

1. A question was asked in a tattoo magazine whether a man or a woman is more likely to have a tattoo. A random sample of 857 men found that 146 of them had at least one tattoo. A random sample of 794 women found that 137 of them had at least one tattoo.

2. A body mass index of 20-25 indicates that a woman is of a healthy weight. A recent international study reported that 30% of all adult women in the world maintain a healthy BMI. A random survey of 745 women living in Los Angeles found that 198 of them had a healthy BMI score. We wonder if the international study results are also true for women who live in Los Angeles.

3. In March 2003, a research group asked 2400 randomly selected Americans whether they believe that the U.S. made the right or wrong decision to use military force in Iraq. Of the 2400 adults, 1862 said that they believed that the U.S. did make the correct decision. In February 2008, the question was asked again of 2180 randomly selected Americans, and 684 of them said that the U.S. did make the correct decision. Has the proportion of all Americans' opinions changed between 2003 and 2008?

Choices: 1-proportion confidence interval; 2-proportion confidence interval; 1-proportion hypothesis test; 2-proportion hypothesis test. All are inference processes/methods: making a statement about a population based on a sample statistic.
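As referenced in the power discussion above, here is a hedged Python sketch (my own supplement; the n = 100 setup and seed are assumptions, not from the lecture) showing that when Ho is actually true, a test at level α rejects it, i.e., commits a Type I error, about α of the time:

```python
import random
from math import sqrt
from scipy.stats import norm

# Ho (p = 0.50) is actually TRUE here, yet some samples are extreme enough to
# land in the rejection zone anyway -- those are Type I errors.
# Assumed setup (not from the lecture): n = 100 spins per sample, two-sided test, alpha = 0.05.
random.seed(3)
n, p0, alpha, reps = 100, 0.50, 0.05, 10_000
rejections = 0

for _ in range(reps):
    x = sum(random.random() < p0 for _ in range(n))     # data generated with Ho true
    z = (x / n - p0) / sqrt(p0 * (1 - p0) / n)
    if 2 * norm.cdf(-abs(z)) <= alpha:                  # p-value falls in the rejection zone
        rejections += 1

print(rejections / reps)   # near alpha = 0.05 (a bit off because head counts are discrete)
```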