Overview: Part I
December 3, 2012

Basics
Sources of data
Sample surveys
Experiments

1.0 Basics

Observational units. Variables, scales of measurement.

1.1 Walking and Texting

An article in the Seattle Times carried the headline “Walking and texting: watch out!” Here is an excerpt.

Sometimes, pedestrians using phones do not notice objects or people right in front of them, even a clown riding a unicycle! That was the finding of a recent study at Western Washington University in Bellingham by a psychology professor, Ira Hyman, and his students. One student dressed as a clown and unicycled around a central square on campus. About half the people walking in the square by themselves said they had seen the clown; the number was slightly higher for people walking in pairs. But only 25 percent of people talking on a cellphone said they had, Hyman said.

1. The relationship between two variables is under study here. What are they? What is the observational unit? Are the variables quantitative or qualitative?
2. In a study of the relationship between variables, one variable is usually called the response and the other the explanatory variable. Which of the variables mentioned in the previous part is the response and which is explanatory?

1.2 Education and Mortality

An article in the Seattle Times carried the headline “Educating women saves millions of kids”. Here is an excerpt.

Giving young women an education resulted in saving the lives of more than 4 million children worldwide in 2009, a new study says. American researchers analyzed 915 censuses and surveys from 175 countries tracking education, economic growth, HIV rates and child deaths from 1970 to 2009.

1. The relationship between two variables is under study here. What are they? What is the observational unit? Are the variables quantitative or qualitative?
2. In a study of the relationship between variables, one variable is usually called the response and the other the explanatory variable. Which of the variables mentioned in the previous part is the response and which is explanatory?

1.3 Importance of Observational Unit

A common misconception is that relationships that emerge when data have been aggregated in some way also apply to the individuals who form the aggregate. This is not always true.

The nationwide census of 1930 found that states with a larger percentage of foreign-born residents had a higher literacy rate. The correlation is 0.53. Does this imply that a person who is foreign born is more likely to be literate? Not necessarily. The correlation computed at the individual level is -0.11.

Conclusions are restricted to the level of the observational unit.

2.0 Sources of Data

Sample surveys.
Experiments: observational, controlled.

Identify the source of the data on which conclusions are based in each of the following:

1. A newspaper article about a book entitled Wilderness Within, Wilderness Without, by Shannon Szwarc, tells how “living in a rugged outdoor environment with firm but nurturing counselors” at a therapeutic wilderness camp transformed the author. A reader of the news article concludes that the wilderness program is effective.
2. Choosing the winner of American Idol.
3. A Swedish researcher proposed a theory that links the production of shoes to the prevalence of schizophrenia: “Heeled footwear began to be used more than 1,000 years ago, and led to the occurrence of the first cases of schizophrenia ... Mechanization of the production started in Massachusetts, spread from there to England and Germany, and then to the rest of Western Europe. A remarkable increase in schizophrenia prevalence followed the same pattern.”
4. You find 30 adults and divide them into two groups. One group is told to wear a jacket on cold days, the other is told not to. You then compare the number from each group who get sick after a string of cold days.

3.0 Sample Surveys

Sampling designs: convenience, voluntary response, probability.
Terminology: population, sample, parameter, statistic.
Behavior of samples: bias versus variability.
Simple random sampling: M.O.E., confidence.

3.1 Sampling Design

A manufacturer of rubber wishes to evaluate certain characteristics of its product. Its bales of synthetic rubber are stored on 300 pallets, with 15 bales per pallet. What type of sampling scheme is being implemented under each of the following scenarios? Include a brief justification for your answer.

1. Five pallets of bales are randomly chosen; then eight bales of rubber are randomly selected from each pallet.
2. Forty bales are randomly selected from the 4,500 bales.
3. All bales that face the warehouse aisles and can be reached by a forklift truck are selected.

3.2 Sampling Design

1. An opinion poll on whether biology teachers back biblical creationism sent questionnaires to 400 teachers chosen at random from the National Science Teachers Association list. Of these, only 200 usable responses were received. Identify the sampling scheme.
2. The Seattle Times conducts a poll on whether the people of Seattle believe red light cameras are effective. The newspaper contacts 1000 subscribers to get their opinions. Identify the sampling scheme.

3.3 Terminology

1. The Seattle Times conducts a poll on whether the people of Seattle believe red light cameras are effective. The newspaper contacts 1000 subscribers. The population of this poll is:
1.1 The 1000 people surveyed
1.2 Those who favor or disapprove of red light cameras
1.3 Their subscribers
1.4 People who live in Seattle
2. Ballard High School announces the results of a survey: 31% of the senior class has an iPod. This result was based on a random sample of 100 seniors. What is the parameter?
2.1 The random sample of 100 students
2.2 Ballard High School
2.3 The percentage of the senior class who has an iPod
2.4 31%

3.4 Bias and Variability

Two samples from the same population will almost never give the same estimates. The following figure shows the behavior of the sample statistic in many samples in four situations. Label each graph as showing high or low bias and as showing high or low variability.

[Figure: four panels, (a) through (d), each showing the sample statistic over many samples relative to the population parameter.]

Determine if there is any bias in the following sampling designs.

1. The first 50 people exiting a movie are asked what type of movie people in the town like to see.
2. A librarian randomly selects 100 titles from the library database to calculate the average length of a library book.

Identify the source of error in each of the following:

1. Although 18% of the students in the student body are minorities, in a random sample of 20 students, 5 are minorities.
2. In a survey about sexual habits, an embarrassed student deliberately gives the wrong answer.
3. A surveyor mistakenly records answers to one question in the wrong space.

Which of the following are true statements about random sampling error?

1. Random sampling error can be eliminated only if a survey is extremely well designed and also well conducted.
2. Random sampling error concerns natural variation between samples, is always present, and can be described using probability.
3. Random sampling error is generally smaller when the sample size is larger.

3.5 M.O.E. and Simple Random Sampling

1. A survey organization wants to take an S.R.S. in order to estimate the proportion of people who have seen a certain T.V. program. Their client will only tolerate a chance error of 1 percentage point. How large a sample should they use: 100; 2,500 or 10,000?
2. One public opinion poll uses an S.R.S. of size 1,500 drawn from a town with a population of 25,000. Another poll uses an S.R.S. of size 1,500 from a town with a population of 250,000. The polls are trying to estimate the proportion of voters who favor single payer health insurance. Other things being equal, which poll is likely to be more accurate? Or is there no difference in accuracy?

4.0 Experiments

Drawing schematics to deconstruct an experiment.
Understanding confounding.

4.1 Design Schematic

Oregon has an experimental program to rehabilitate prisoners before their release. The object is to reduce the “recidivism” rate. Prisoners volunteer for the program, which lasts several months. Some prisoners drop out before completing the program. To evaluate the program, investigators compared prisoners who completed the program with prisoners who dropped out. The recidivism rate for those who completed the program was 29%. For the dropouts, the recidivism rate was 74%. The difference was highly statistically significant. On this basis, investigators argued that the program worked.

Draw a schematic. Are you skeptical of the results? Why?

4.2 Bells and Whistles

1. Random assignment is important in experimental design because it:
1.1 Reduces bias
1.2 Creates groups that are similar in all variables
1.3 Mitigates the effects of lurking variables
1.4 All of the above
2. The following feature of a designed experiment (when present) enables cause-effect conclusions to be drawn.
2.1 Placebo
2.2 Double blinding
2.3 Random assignment
2.4 Informed consent

4.3 Confounding

Epidemiologists find an association between high levels of cholesterol in the blood and heart disease. They conclude that cholesterol causes heart disease. However, a statistician argues that smoking confounds the association. This means one of the following. Which one?

1. Smoking causes heart disease.
2. Smoking is associated with heart disease, and smokers have high levels of cholesterol in their blood.
3. Smokers tend to eat a less healthful diet than non-smokers. Thus, smokers have high levels of cholesterol in the blood, which in turn causes heart disease.
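
As a numerical companion to the questions in 3.5 (M.O.E. and Simple Random Sampling), the sketch below tabulates how the precision of a sample proportion changes with sample size and with population size. It is a minimal illustration, not the method prescribed by these slides. It assumes the textbook formulas for a simple random sample: standard error sqrt(p(1-p)/n) evaluated at the worst case p = 0.5, a 95% margin of error of about two standard errors, and the usual finite population correction for sampling without replacement. The function names are hypothetical, and whether "chance error" in question 1 means one standard error or the 95% margin of error depends on the course's convention, so both are printed.

```python
import math


def standard_error(n, p=0.5):
    """Standard error of a sample proportion from an SRS of size n.

    p = 0.5 is an assumed worst case; it gives the largest standard error.
    """
    return math.sqrt(p * (1 - p) / n)


def margin_of_error_95(n, p=0.5):
    """Approximate 95% margin of error, taken here as two standard errors."""
    return 2 * standard_error(n, p)


def finite_population_correction(n, population_size):
    """Factor that shrinks the standard error when sampling without replacement."""
    return math.sqrt((population_size - n) / (population_size - 1))


# Question 1: candidate sample sizes, in percentage points.
for n in (100, 2_500, 10_000):
    se_pts = 100 * standard_error(n)
    moe_pts = 100 * margin_of_error_95(n)
    print(f"n = {n:>6}: SE = {se_pts:.1f} pts, 95% MOE = {moe_pts:.1f} pts")

# Question 2: same sample size, two different town sizes.
for population_size in (25_000, 250_000):
    fpc = finite_population_correction(1_500, population_size)
    se_pts = 100 * standard_error(1_500) * fpc
    print(f"N = {population_size:>7}: correction = {fpc:.3f}, SE = {se_pts:.2f} pts")
```

The population size enters only through the correction factor, which stays close to 1 whenever the sample is a small fraction of the population. The sample size enters through the square root, so halving the error requires roughly four times as many respondents.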