Overview: Part I

December 3, 2012
Basics
Sources of data
Sample surveys
Experiments
1.0 Basics
Observational Units.
Variables, Scales of Measurement.
1.1 Walking and Texting
An article in the Seattle Times ran with the headline “Walking and
texting: watch out!” Here is an excerpt.
Sometimes, pedestrians using phones do not notice
objects or people right in front of them, even a clown
riding a unicycle! That was the finding of a recent
study at Western Washington University in
Bellingham by a psychology professor, Ira Hyman,
and his students.
One student dressed as a clown and unicycled around
a central square on campus. About half the people
walking in the square by themselves said they had
seen the clown; the number was slightly higher for
people walking in pairs. But only 25 percent of
people talking on a cellphone said they had, Hyman
said.
1.1 Walking and Texting
1. The relationship between two variables is under study
here. What are they? What is the observational unit?
Are the variables quantitative or qualitative?
2. In a study of the relationship between variables, one
variable is usually called the response and the other an
explanatory variable. Which of the variables mentioned in
the previous part is the response and which is
explanatory?
1.2 Education and Mortality
An article in the Seattle Times ran with the headline “Educating
women saves millions of kids”. Here is an excerpt.
Giving young women an education resulted in saving
the lives of more than 4 million children worldwide in
2009, a new study says.
American researchers analyzed 915 censuses and
surveys from 175 countries tracking education,
economic growth, HIV rates and child deaths from
1970 to 2009.
1.2 Education and Mortality
1. The relationship between two variables is under study
here. What are they? What is the observational unit?
Are the variables quantitative or qualitative?
2. In a study of the relationship between variables, one
variable is usually called the response and the other an
explanatory variable. Which of the variables mentioned in
the previous part is the response and which is
explanatory?
1.3 Importance of Observational Unit
A common misconception is that relationships that emerge
when data have been aggregated in some way also apply to
individuals who form the aggregate. This is not always true.
The nationwide census of 1930 found that states with a larger
percentage of foreign-born residents had a higher literacy rate. The
correlation is 0.53.
Does this imply that a person who is foreign born is more
likely to be literate? Not necessarily. The correlation
computed at the individual level is -0.11.
Conclusions are restricted to the level of observational unit.
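The reversal described above can be reproduced on a toy data set. The sketch below is only an illustration: the three "states" and all counts are hypothetical and are not the 1930 census figures. It computes the correlation once with states as the observational unit and once with individuals as the observational unit.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical counts per state (100 people each):
# (foreign-born literate, foreign-born illiterate, native literate, native illiterate)
states = {
    "A": (5, 5, 63, 27),
    "B": (21, 9, 63, 7),
    "C": (40, 10, 50, 0),
}

people = []              # one (foreign_born, literate) pair per person, coded 0/1
state_pct_foreign = []   # one value per state
state_literacy = []      # one value per state
for fl, fi, nl, ni in states.values():
    rows = [(1, 1)] * fl + [(1, 0)] * fi + [(0, 1)] * nl + [(0, 0)] * ni
    people.extend(rows)
    state_pct_foreign.append(sum(f for f, _ in rows) / len(rows))
    state_literacy.append(sum(lit for _, lit in rows) / len(rows))

state_r = pearson(state_pct_foreign, state_literacy)
individual_r = pearson([f for f, _ in people], [lit for _, lit in people])
print(f"state-level correlation:      {state_r:+.2f}")
print(f"individual-level correlation: {individual_r:+.2f}")
```

In this made-up example the state-level correlation is positive while the individual-level correlation is negative, because the foreign-born are concentrated in states where everyone is more literate.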
2.0 Sources of Data
Sample Surveys
Experiments: observational, controlled.
Identify the source of the data on which conclusions are based
in each of the following:
1. A newspaper article about a book entitled Wilderness
Within, Wilderness Without, by Shannon Szwarc, tells
how “living in a rugged outdoor environment with firm
but nurturing counselors” at a therapeutic wilderness
camp transformed the author. A reader of the news
article concludes that the wilderness program is effective.
2. Choosing the winner of American Idol.
2.0 Sources of Data
Identify the source of the data on which conclusions are based
in each of the following:
1. A Swedish researcher proposed a theory that links the
production of shoes to the prevalence of schizophrenia:
“Heeled footwear began to be used more than 1,000 years
ago, and led to the occurrence of the first cases of
schizophrenia ... Mechanization of the production started
in Massachusetts, spread from there to England and
Germany, and then to the rest of Western Europe. A
remarkable increase in schizophrenia prevalence followed
the same pattern.”
2. You find 30 adults and divide them into two groups. One
group is told to wear a jacket on cold days, the other is
told not to. You then compare the number from each
group who get sick after a string of cold days.
3.0 Sample Surveys
Sampling Designs: convenience, voluntary response,
probability.
Terminology: population, sample, parameter, statistic.
Behavior of samples: bias versus variability.
Simple random sampling: M.O.E., confidence.
3.1 Sampling Design
A manufacturer of rubber wishes to evaluate certain
characteristics of its product. It is known that their bales of
synthetic rubber are stored on 300 pallets with a total of 15
bales per pallet. What type of sampling scheme is being
implemented under the following scenarios? Include a brief
justification for your answer.
1. Five pallets of bales are randomly chosen; then eight bales
of rubber are randomly selected from each pallet.
2. Forty bales are randomly selected from the 4,500 bales.
3. All bales that face the warehouse aisles and can be
reached by a forklift truck are selected.
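To see how the first two selection procedures differ mechanically, here is a minimal sketch using Python's standard `random` module. The function names `scenario_1` and `scenario_2`, the (pallet, bale) labelling, and the seed are assumptions made for illustration; only the counts (300 pallets, 15 bales per pallet, 5 pallets, 8 bales, 40 bales) come from the exercise.

```python
import random

PALLETS = 300           # from the exercise
BALES_PER_PALLET = 15   # from the exercise

def scenario_1(rng, n_pallets=5, n_bales=8):
    """Pick pallets at random, then bales at random within each chosen pallet."""
    chosen = rng.sample(range(PALLETS), n_pallets)
    return [(p, b) for p in chosen
            for b in rng.sample(range(BALES_PER_PALLET), n_bales)]

def scenario_2(rng, n=40):
    """Pick bales at random from the full list of 4,500, ignoring pallets."""
    all_bales = [(p, b) for p in range(PALLETS) for b in range(BALES_PER_PALLET)]
    return rng.sample(all_bales, n)

rng = random.Random(2012)  # arbitrary seed so the draw is repeatable
draw_1 = scenario_1(rng)
draw_2 = scenario_2(rng)
print(len(draw_1), "bales from", len({p for p, _ in draw_1}), "pallets")
print(len(draw_2), "bales from", len({p for p, _ in draw_2}), "pallets")
```

Both draws contain 40 bales, but the first is confined to five pallets, whereas in the second every set of 40 bales is equally likely to be chosen.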
3.2 Sampling Design
1. An opinion poll on whether biology teachers back biblical
creationism sent questionnaires to 400 teachers chosen at
random from the National Science Teachers Association
list. Of these, only 200 usable responses were received.
Identify the sampling scheme.
2. The Seattle Times conducts a poll on whether the people
of Seattle believe red light cameras are effective. The
newspaper contacts 1000 subscribers to get their
opinions. Identify the sampling scheme.
3.3 Terminology
1. The Seattle Times conducts a poll on whether the people
of Seattle believe red light cameras are effective. The
newspaper contacts 1000 subscribers. The population of
this poll is:
1.1 The 1000 people surveyed
1.2 Those who favor or disapprove of red light cameras
1.3 Their subscribers
1.4 People who live in Seattle
2. Ballard High School announces the results of a survey:
31% of the senior class has an iPod. This result was
based on a random sample of 100 seniors. What is the
parameter?
2.1 The random sample of 100 students
2.2 Ballard High School
2.3 The percentage of the senior class who has an iPod
2.4 31%
3.4 Bias and Variability
Two samples from the same population will almost never give
the same estimates. The following figure shows the behavior
of the sample statistic in many samples in four situations.
Label each graph as showing high or low bias and as
showing high or low variability.
[Figure: four panels, (a) to (d), each showing the sample statistic in many samples relative to the population parameter.]
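The two ideas can also be explored by simulation. The sketch below is an illustration under stated assumptions, not a reconstruction of the figure: the population parameter (0.5), the sample sizes, the "skewed subgroup" with proportion 0.6, and the seed are all hypothetical. Bias is measured as the average estimate minus the parameter, and variability as the standard deviation of the estimates.

```python
import random
from statistics import mean, stdev

TRUE_P = 0.5  # hypothetical population parameter

def simulate(rng, n, sampled_p, reps=300):
    """Return `reps` sample proportions, each from n draws with success chance sampled_p."""
    return [sum(rng.random() < sampled_p for _ in range(n)) / n
            for _ in range(reps)]

rng = random.Random(0)  # arbitrary seed so the run is repeatable
designs = {
    "large sample, whole population": (1000, TRUE_P),
    "small sample, whole population": (20, TRUE_P),
    "large sample, skewed subgroup": (1000, 0.6),
    "small sample, skewed subgroup": (20, 0.6),
}

summary = {}
for name, (n, p) in designs.items():
    estimates = simulate(rng, n, p)
    bias = mean(estimates) - TRUE_P   # systematic miss
    spread = stdev(estimates)         # sample-to-sample variability
    summary[name] = (bias, spread)
    print(f"{name:32s} bias={bias:+.3f} spread={spread:.3f}")
```

Sampling from the wrong group shifts the whole cloud of estimates away from the parameter no matter how large the sample is, while a small sample widens the cloud without shifting it.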
3.4 Bias and Variability
Determine if there is any bias in the following sampling
designs.
1. The first 50 people exiting a movie are asked what type
of movie people in the town like to see.
2. A librarian randomly selects 100 titles from the library
database to calculate the average length of a library
book.
3.4 Bias and Variability
Identify the source of error in each of the following:
1. Although 18% of the students in the student body are
minorities, in a random sample of 20 students, 5 are
minorities.
2. In a survey about sexual habits, an embarrassed student
deliberately gives the wrong answer.
3. A surveyor mistakenly records answers to one question in
the wrong space.
3.4 Bias and Variability
Which of the following are true statements about random
sampling error?
1. Random sampling error can be eliminated only if a survey
is extremely well designed and also well conducted.
2. Random sampling error concerns natural variation
between samples, is always present, and can be described
using probability.
3. Random sampling error is generally smaller when the
sample size is larger.
3.5 M.O.E. and Simple Random
Sampling
1. A survey organization wants to take a S.R.S. in order to
estimate the proportion of people who have seen a
certain T.V. program. Their client will only tolerate a
chance error of 1 percentage point. How large a sample should
they use: 100; 2,500 or 10,000?
2. One public opinion poll uses a S.R.S. of size 1,500 drawn
from a town with a population of 25,000. Another poll
uses a S.R.S. of size 1,500 from a town with population
of 250,000. The polls are trying to estimate the
proportion of voters who favor single-payer health
insurance. Other things being equal, which poll is likely to
be more accurate? Or is there no difference in accuracy?
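Both questions can be checked numerically. The sketch below assumes a confidence level of roughly 95% (multiplier z = 2) and the worst-case proportion p = 0.5; neither is stated on the slide. The helper name `margin_of_error` and the optional finite population correction are likewise illustrative assumptions.

```python
from math import sqrt

def margin_of_error(n, p=0.5, z=2.0, population=None):
    """Approximate M.O.E. for a sample proportion from an S.R.S. of size n.

    p=0.5 is the worst case and z=2.0 is roughly 95% confidence (assumptions).
    If `population` is given, the finite population correction is applied.
    """
    se = sqrt(p * (1 - p) / n)
    if population is not None:
        se *= sqrt((population - n) / (population - 1))
    return z * se

# Question 1: candidate sample sizes
for n in (100, 2_500, 10_000):
    print(f"n={n:>6}: M.O.E. = {margin_of_error(n):.4f}")

# Question 2: same sample size, towns of different size
small_town = margin_of_error(1_500, population=25_000)
large_town = margin_of_error(1_500, population=250_000)
print(f"town of 25,000:  M.O.E. = {small_town:.4f}")
print(f"town of 250,000: M.O.E. = {large_town:.4f}")
```

Under these assumptions the margin of error is driven almost entirely by the sample size: the population size enters only through the correction factor, which stays close to 1 when the sample is a small fraction of the population.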
4.0 Experiments
Drawing schematics to deconstruct an experiment.
Understanding confounding.
4.1 Design Schematic
Oregon has an experimental program to rehabilitate prisoners
before their release. The object is to reduce the “recidivism”
rate. Prisoners volunteer for the program which lasts several
months. Some prisoners drop out before completing the
program.
To evaluate the program, investigators compared prisoners
who completed the program with prisoners who dropped out.
The recidivism rate for those who completed the program was
29%. For the dropouts, the recidivism rate was 74%. The
difference was highly statistically significant. On this basis,
investigators argued that the program worked.
Draw a schematic. Are you skeptical of the results? Why?
4.2 Bells and Whistles
1. Random assignment is important in experimental design
because it:
1.1 Reduces bias
1.2 Creates groups that are similar in all variables
1.3 Mitigates the effects of lurking variables
1.4 All of the above
2. The following feature of a designed experiment (when
present) enables cause-effect conclusions to be drawn.
2.1 Placebo
2.2 Double blinding
2.3 Random assignment
2.4 Informed consent
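What random assignment does to a lurking variable can be seen in a small simulation. This is a sketch under stated assumptions: the 2,000 subjects, the lurking variable (age, uniform between 20 and 70), the self-selection rule, and the seed are all hypothetical. It compares the age gap between treatment and control groups under random assignment and under self-selection.

```python
import random
from statistics import mean

rng = random.Random(1)  # arbitrary seed so the run is repeatable
ages = [rng.uniform(20, 70) for _ in range(2000)]  # hypothetical lurking variable

# Random assignment: shuffle the subjects, then split them in half.
shuffled = ages[:]
rng.shuffle(shuffled)
random_gap = mean(shuffled[:1000]) - mean(shuffled[1000:])

# Self-selection: older subjects are more likely to opt in to treatment.
treated, control = [], []
for age in ages:
    opts_in = rng.random() < (age - 20) / 50
    (treated if opts_in else control).append(age)
self_gap = mean(treated) - mean(control)

print(f"age gap under random assignment: {random_gap:+.1f} years")
print(f"age gap under self-selection:    {self_gap:+.1f} years")
```

With random assignment the groups differ in age only by chance, so any difference in outcomes is hard to blame on age; with self-selection the groups differ systematically before the treatment even starts.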
4.3 Confounding
Epidemiologists find an association between high levels of
cholesterol in the blood and heart disease. They conclude that
cholesterol causes heart disease. However, a statistician argues
that smoking confounds the association. This means one of
the following. Which one?
1. Smoking causes heart disease.
2. Smoking is associated with heart disease, and smokers
have high levels of cholesterol in their blood.
3. Smokers tend to eat a less healthful diet than
non-smokers. Thus, smokers have high levels of
cholesterol in the blood, which in turn causes heart
disease.