Chapter 1 Introduction to Statistics

Chapter Problem: Why was the Literary Digest poll so wrong?
The Literary Digest successfully predicted the presidential elections of 1916,
1920, 1924, 1928, and 1932
1936 Presidential Election (between Alf Landon and Franklin Roosevelt)
The Literary Digest sent out 10 million ballots (57% for Landon)
Roosevelt actually received 61% of the votes, Landon 37%.
The Literary Digest suffered a humiliating defeat and soon went out of business.
Gallup used a much smaller poll of 50,000 subjects and predicted the outcome
correctly.
[Figure: Percentage for Roosevelt as predicted by the Literary Digest poll and
by the Gallup poll. Roosevelt actually received 61% of the popular vote.]
1.1 Review and Preview
Definitions
• Data are observations (such as measurements, genders, …)
• Statistics is the science of planning, collecting data, …, and drawing
conclusions
• A population is the complete collection of all elements to be studied
• A census is the collection of data from every member of the
population
• A sample is a subcollection of members selected from a
population
Key concepts:
• Sample data must be collected in an appropriate way
• Otherwise, the data may be completely useless
1.2 Statistical Thinking
Basic principles of statistical thinking used throughout this book
• Context of the data
• Source of the data
• Sampling method
• Conclusion
• Practical implications
1. Context of the data
Consider the data in Table 1-1 (from Data Set 3 in Appendix B)
Table 1-1 Data Used for Analysis
x 56 67 57 60 64
y 53 66 58 61 68
E.g.1 Context for Table 1-1
• Weights (in kg) of Rutgers students
• x values were measured in September of the freshman year
• y values were measured in April of the following spring semester
• Real data! “Changes in Body Weight and Fat Mass of Men and Women in the
First Year of College: A Study of the ‘Freshman 15,’” Journal of American
College Health
• Goal: determine whether college students actually gain 15 lb during their
freshman year.
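As a rough check on the goal above, the weight changes can be computed directly from the five pairs shown in Table 1-1. This is only a sketch: it uses the five listed values (not the full Data Set 3), and the variable names and the conversion factor of about 2.2046 lb per kg are choices made here for illustration.

```python
# Weights (kg) from Table 1-1: x = September, y = April.
september_kg = [56, 67, 57, 60, 64]
april_kg = [53, 66, 58, 61, 68]

LB_PER_KG = 2.2046  # approximate conversion factor (assumption of this sketch)

# Change for each student; a positive value means weight was gained.
changes_kg = [y - x for x, y in zip(september_kg, april_kg)]
mean_change_kg = sum(changes_kg) / len(changes_kg)
mean_change_lb = mean_change_kg * LB_PER_KG

print("changes (kg):", changes_kg)
print(f"mean change: {mean_change_kg:.1f} kg = {mean_change_lb:.2f} lb")
```

For these five students the mean change is 0.4 kg, which is less than 1 lb. Five values are far too few to support a general conclusion, but they show how the comparison with 15 lb is made.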
2. Source of the data
E.g.2 Source of the data for Table 1-1 (the source needs to be objective, not
biased)
• Reputable researchers from the Department of Nutritional
Sciences at Rutgers University compiled the measurements in
table 1-1.
– No incentive to distort or spin results
– They have nothing to gain
– Not paid
• We can be confident that these researchers are unbiased and they
did not distort results
3. Sampling method
E.g.3 Sampling used for Table 1-1 (the sample needs to be random)
• The weights in Table 1-1 are from the larger sample in Data Set 3 in Appendix B
• 217 students participated in September
• All were invited for a follow-up in the spring; 67 of them responded and were
measured in the last two weeks of April
• This is a voluntary response sample
• The researchers wrote that “the sample obtained was not random and may have
introduced self-selection bias”, and that “only those students who felt
comfortable enough with their weights to be measured both times.”
4. Conclusion
E.g.4 Conclusion from the data in Table 1-1
• The researchers' published conclusions:
• Weight gain does occur during the freshman year
• For the small nonrandom group studied, the weight gain was less
than 15 lb, and this amount was not universal
• They concluded that the “Freshman 15” weight gain is a myth
5. Practical implications
E.g.5 Practical implications from the data in Table 1-1
In addition to the conclusions of the statistical analysis, we should also
identify any practical implications of the results.
• They wrote that “it is perhaps most important for students to
recognize that seemingly minor and perhaps even harmless
changes in eating or exercise behavior may result in large changes
in weight and body fat mass over an extended period of time.”
6. Statistical significance versus practical significance
E.g.6 Statistical significance versus practical significance
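Atkins weight loss program: among 40 subjects, the mean weight loss was 2.1 lb.
A result can be statistically significant (unlikely to occur by chance) and
still be too small to matter in practice.
Statistical significance is discussed further throughout the book.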
6. Statistical significance
E.g.7 Statistical significance. A company developed MicroSort, which supposedly
increases the chances of a couple having a baby girl. In a preliminary test,
researchers located 14 couples who wanted baby girls. After using MicroSort, 13
of them had girls, and one had a boy. There are two possible conclusions:
• MicroSort is not effective, and the 13 girls in 14 births occurred by chance
• MicroSort is effective, as claimed by the company
A statistician thinks:
• If MicroSort has no effect, then there is about 1 chance in 1000 of getting
results like those above.
• Because that likelihood is so small, statisticians conclude that the result is
statistically significant. So it appears that MicroSort is effective.
E.g.8 Statistical significance. Consider the same MicroSort test of 14 couples
who wanted baby girls, but suppose that 8 of them had girls and 6 had boys.
Again there are two possible conclusions:
• MicroSort is not effective, and the 8 girls in 14 births occurred by chance
• MicroSort is effective, as claimed by the company
A statistician thinks:
• If MicroSort has no effect, then there are about two chances in 5 of getting
results like those above.
• Unlike the one chance in 1000, two chances in 5 indicate that the results could
easily occur by chance. That indicates that the result of 8 girls in 14 births is
not statistically significant.
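The two likelihoods quoted in E.g.7 and E.g.8 can be reproduced with a short binomial calculation. The sketch below rests on assumptions that the slides do not state explicitly: boys and girls are equally likely, births are independent, and "results like those above" means "at least that many girls in 14 births". The function name prob_at_least is made up for this illustration.

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_13_of_14 = prob_at_least(13, 14)  # E.g.7: 13 or more girls in 14 births
p_8_of_14 = prob_at_least(8, 14)    # E.g.8: 8 or more girls in 14 births

print(f"P(at least 13 girls in 14 births) = {p_13_of_14:.5f}")
print(f"P(at least 8 girls in 14 births)  = {p_8_of_14:.4f}")
```

Under these assumptions the first probability is 15/16384 (about 1 in 1000) and the second is 6476/16384 (about 0.40, or two chances in 5), matching the figures in the examples.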
1.3 Types of Data
Definition – parameter and statistic
• A parameter is a numerical measurement describing some
characteristic of a population
• A statistic is a numerical measurement describing some
characteristic of a sample
• Example 1
– Parameter: there are 100 senators in the 109th Congress, and 55% of them
are Republicans. The 55% is a parameter, since it is based on the entire
population of 100 senators.
– Statistic: in 1936, the Literary Digest polled 2.3 million adults in the U.S.,
and 57% of them said that they would vote for Alf Landon. The 57% is a
statistic, because it is based on a sample.
Definition – quantitative data and qualitative data
• Quantitative data consist of numbers representing counts or
measurements.
• Categorical (or qualitative or attribute) data consist of names or
labels that are not numbers representing counts or measurements.
• Example 2
– Quantitative data: the ages (in years) of survey respondents
– Categorical data: the political party affiliation (Democratic, Republican,
independent, other) of survey respondents
– Categorical data: the numbers 24, 28, 17, 54, and 31 are sewn on the shirts
of the LA Lakers starting basketball team. These numbers are substitutes
for names.
Definition – discrete data and continuous data
Quantitative data can be further divided into discrete data and
continuous data
• Discrete data result when the number of possible values is either
a finite number or a “countable” number
– E.g.3 the number of eggs
• Continuous data result from infinitely many possible values that
correspond to some continuous scale.
– E.g.3 the amount of milk from cows
Four levels of measurement
• Nominal – data that consist of names, labels, or categories only.
– E.g.4 yes/no/undecided responses from a survey
– E.g.4 Political party affiliation
• Ordinal – data can be arranged in order, but differences between values
are not meaningful.
– E.g.5 Course grades (A, B, C, D, or F)
– E.g.5 Ranks (e.g., first, second, third, …)
• Interval – differences between values are meaningful, but there is no
natural zero starting point.
– E.g.6 Temperatures (in °C or °F)
– E.g.6 Years
• Ratio – like the interval level, but with a natural zero starting point, so
ratios are meaningful.
– E.g.7 Distances
– E.g.7 Prices
• Most difficulty arises with the distinction between the
interval and ratio levels
– Hint: use the “ratio test”: consider two quantities where one
number is twice the other, and ask whether “twice” can be
used to correctly describe the quantities.
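One way to see why the ratio test separates the two levels: if "twice" were a meaningful description, it would survive a change of units. The sketch below checks this for temperatures (interval level) and distances (ratio level). The specific values 10, 20, and 5 are made up for illustration, and the kilometers-to-miles factor is approximate.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def km_to_miles(km):
    return km * 0.621371  # approximate conversion factor

# Interval level: 20 °C looks like "twice" 10 °C, but the ratio depends on the unit.
ratio_c = 20 / 10
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)  # 68 / 50

# Ratio level: 10 km is twice 5 km, and the same holds in miles.
ratio_km = 10 / 5
ratio_mi = km_to_miles(10) / km_to_miles(5)

print(f"temperature ratio: {ratio_c:.2f} in °C, {ratio_f:.2f} in °F")
print(f"distance ratio:    {ratio_km:.2f} in km, {ratio_mi:.2f} in miles")
```

The temperature ratio changes from 2.00 to 1.36 when the unit changes, so "twice as hot" is not a correct description; the distance ratio stays at 2.00 because distance has a natural zero.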
1.4 Critical Thinking
Key concept – common sense, interpretation
• Two main sources of deception
– Intentional (Evil intent)
– Unintentional (error)
Example of bad samples
• Definition – a voluntary response sample (or self-selected sample) is one in
which the respondents themselves decide whether to be included.
• E.g.1 Newsweek.msnbc.com Napster poll.
– People with strong opinions are likely to participate
– Other examples: Internet polls, mail-in polls, call-in polls
• E.g.2 Literary Digest. Ballots were sent to magazine
subscribers, registered car owners, and those who used
telephones.
Publication bias: journals tend to publish positive results.
Reported results: when collecting data from people,
• It is better to take measurements yourself
• E.g.3 Voting behavior. When 1002 eligible voters were surveyed
– 70% said that they voted in a recent presidential election (ICR Research)
– Records show that only 61% voted.
Small samples
• E.g.4 Children’s Defense Fund published Children Out of School
in America.
– Among secondary school students, 67% were suspended at least 3 times
– Sample size = 3 students
Percentages
• Can be misleading
– E.g.5 In referring to lost baggage, Continental Airlines ran ads claiming
that this was “an area where we’ve already improved 100% in the last six
months.”
– Taken literally, a 100% improvement would mean that no baggage is lost
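The arithmetic behind this reading can be sketched in a few lines, under the assumption that "improved by some percentage" means the number of lost bags was reduced by that percentage. The baseline of 1000 lost bags and the function name are hypothetical, chosen only to make the calculation concrete.

```python
def after_improvement(lost_before, percent_improvement):
    """Lost bags remaining after reducing the count by the given percentage."""
    return lost_before * (1 - percent_improvement / 100)

baseline = 1000  # hypothetical number of lost bags, for illustration only

for pct in (25, 50, 100):
    remaining = after_improvement(baseline, pct)
    print(f"{pct}% improvement -> {remaining:.0f} bags still lost")
```

With this interpretation a 100% reduction leaves zero lost bags, which is why the advertised figure is hard to take at face value.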
Loaded Questions
• Intentionally worded to elicit a desired response
Example 6
– 97% yes: “Should the President have the line item veto to eliminate waste?”
– 57% yes: “Should the President have the line item veto, or not?”
Order of Questions
Example 7
• “Would you say that traffic contributes more or less to air pollution
than industry?” 45% blame traffic, 27% blame industry
• “Would you say that industry contributes more or less to air
pollution than traffic?” 24% blame traffic, 57% blame industry
Nonresponse
• People refuse to answer or are unavailable
• The refusal rate has been growing in recent years
• People who refuse may view the world around them differently than those
who respond
Missing data
• Sometimes missing at random
• At other times caused by special factors
• The census also suffers from missing data
Correlation and Causality
• Correlation: a statistical association between two variables
• E.g. wealth and IQ
• Correlation does not imply causality
Self-Interest Study
• Sponsored by parties with an interest in promoting products, etc.
• E.g. Kiwi Brands, a maker of shoe polish, commissioned a study about
scuffed shoes
• Pharmaceutical companies pay doctors …
Precise Numbers
• “There are now 103,215,027 households in the U.S.”
• Better to say that the number of households is about 103 million
Deliberate Distortion
• Hertz sued Avis for false advertising based on a survey
• The claim was that Avis was the winner in a survey of people who
rent cars.
1.5 Collecting Sample Data
Key concept – simple random samples
• If sample data are not collected in an appropriate way, the data
may be so completely useless that no amount of statistical
torturing can salvage them.
• Throughout this book
– We will use a variety of different statistical procedures
– These procedures often require that the sample be a simple random sample
1.5 Collecting Sample Data – Part 1
Part 1: Basics of collecting data
Definition – observational studies and experiments.
• In an observational study, we observe and measure specific
characteristics, but we don’t attempt to modify the subjects being
studied - e.g. Gallup Poll.
• In an experiment, we apply some treatment and then proceed to
observe its effects on the subjects. (subjects in experiments are
called experimental units)
• Example 1
– Observational study: a poll in which subjects are surveyed, but they are
not given any treatment.
– Experiment: In the largest public health experiment, 200,745 children
were given a treatment consisting of the Salk vaccine, while 201,229 other
children were given a placebo.
Common sampling methods:
• A simple random sample of n subjects is selected in such a way
that every possible sample of the same size n has the same chance
of being selected.
• In a random sample members from the population are selected in
such a way that each individual member has an equal chance of
being selected.
• A probability sample involves selecting members from a
population in such a way that each member has a known (but not
necessarily the same) chance of being selected.
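• Example 2 Sampling senators. Create 50 index cards, each with the name of
one state. Mix the 50 cards in a bowl and then select one card. If we
consider the two senators from the selected state to be a sample, is
this result a random sample? A simple random sample? A probability sample?
Answer: it is a random sample and a probability sample, but NOT a simple
random sample. Each senator has the same chance of being selected, but
samples of two senators from different states can never occur.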
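The reasoning in Example 2 can be checked by counting. The sketch below labels the senators with hypothetical (state number, seat) pairs; these labels and the function name are assumptions made for illustration, not part of the example.

```python
from math import comb
import random

N_STATES = 50
# Hypothetical labels: (s, 0) and (s, 1) are the two senators of state number s.
senators = [(s, seat) for s in range(N_STATES) for seat in range(2)]

def sample_by_state_card(rng):
    """Draw one state card and return that state's two senators."""
    s = rng.randrange(N_STATES)
    return [(s, 0), (s, 1)]

# A senator is selected exactly when his or her state card is drawn.
chance_per_senator = 1 / N_STATES

possible_pairs = comb(len(senators), 2)  # all possible samples of size 2
reachable_pairs = N_STATES               # only same-state pairs can be drawn

print(sample_by_state_card(random.Random(0)))
print(chance_per_senator, possible_pairs, reachable_pairs)
```

Every senator has the same 1/50 chance (random sample, probability sample), but only 50 of the 4950 possible pairs can ever be selected, so not every sample of size 2 has the same chance (not a simple random sample).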
• Systematic sampling – select a starting point and then select
every kth element (e.g. every 50th)
• Convenience sampling – use results that are very easy to get
• Stratified sampling – subdivide the population into at least two
different subgroups (or strata) so that subjects within the same
subgroup share the same characteristics (e.g. gender), then draw a
sample from each subgroup (or stratum)
• Cluster sampling – first divide the population into sections (or
clusters), then randomly select some of those clusters, and choose
all the members from those selected clusters
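The systematic, stratified, and cluster methods above can be sketched with Python's standard random module. Everything concrete here is an assumption made for illustration: the population of 100 numbered subjects, the two groups "A" and "B" used as strata, the clusters formed from consecutive ids, and the choice of a random starting point for the systematic sample.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 subjects, each with an id and a group label.
population = [{"id": i, "group": "A" if i % 2 == 0 else "B"} for i in range(100)]

def systematic_sample(pop, k, rng):
    """Pick a random start among the first k elements, then every kth element."""
    start = rng.randrange(k)
    return pop[start::k]

def stratified_sample(pop, per_stratum, rng):
    """Draw the same number of subjects at random from each stratum."""
    strata = {}
    for person in pop:
        strata.setdefault(person["group"], []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

def cluster_sample(pop, cluster_size, n_clusters, rng):
    """Split into consecutive clusters, choose some at random, keep all members."""
    clusters = [pop[i:i + cluster_size] for i in range(0, len(pop), cluster_size)]
    chosen = rng.sample(clusters, n_clusters)
    return [person for cluster in chosen for person in cluster]

sys_s = systematic_sample(population, 10, rng)
str_s = stratified_sample(population, 5, rng)
clu_s = cluster_sample(population, 10, 2, rng)

print([p["id"] for p in sys_s])
print(sorted(p["id"] for p in str_s))
print([p["id"] for p in clu_s])
```

Note the contrast between the last two functions: stratified sampling takes some members from every subgroup, while cluster sampling takes every member from some subgroups.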
• Neither stratified sampling nor cluster sampling satisfies the
simple random sample requirement
Example 3. Multistage sample design. The U.S. government’s
unemployment statistics are based on surveyed households. The U.S.
Census Bureau and the Bureau of Labor Statistics conduct a survey called
the Current Population Survey. This survey obtains data describing such
factors as unemployment rates, college enrollments, and weekly
earnings. The survey incorporates a multistage sample
design, roughly following these steps:
1. The entire U.S. is partitioned into 2007 different regions called
primary sampling units (PSUs). The primary sampling units are
metropolitan areas, large counties, or groups of smaller counties.
2. For the Current Population Survey, 792 of the PSUs are used.
(All of the 432 PSUs with the largest populations are used, and
the other 360 PSUs are randomly selected from the other 1575.)
3. Each of the 792 selected PSUs is partitioned into blocks, and
stratified sampling is used to select a sample of blocks.
4. In each selected block, clusters of households that are close to
each other are identified. Clusters are randomly selected, and all
households in the selected clusters are interviewed.
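The counts in steps 1 and 2 can be checked with a small sketch of the first selection stage. The PSU ids are hypothetical (ids 0 to 431 are assumed to be the 432 largest PSUs), and the later stages (blocks and household clusters) are not modeled.

```python
import random

TOTAL_PSU = 2007
LARGEST_PSU = 432       # always included
RANDOMLY_CHOSEN = 360   # drawn at random from the remaining PSUs

rng = random.Random(0)

# Hypothetical ids: 0..431 are the largest PSUs, 432..2006 are the others.
largest = list(range(LARGEST_PSU))
others = list(range(LARGEST_PSU, TOTAL_PSU))

selected = largest + rng.sample(others, RANDOMLY_CHOSEN)

print(len(others))    # number of PSUs outside the 432 largest
print(len(selected))  # number of PSUs used in the survey
print(f"chance for a smaller PSU: {RANDOMLY_CHOSEN / len(others):.3f}")
```

The 432 largest PSUs are selected with certainty while each of the other 1575 has a chance of 360/1575 (about 0.23), so this first stage is a probability sample in which the chances are known but not equal.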
1.5 Collecting Sample Data – Part 2
Part 2: Beyond the Basics of Collecting Data
• Observational study
• Experiment
Figure 1-3. Types of Observational Studies
Observational study: observe and measure, but do not modify.
When are the observations made?
• Past period of time: retrospective (or case-control) study. Go back in
time to collect data over some past period.
• One point in time: cross-sectional study. Data are measured at one
point in time.
• Forward in time: prospective (or longitudinal or cohort) study. Go
forward in time and observe groups sharing common factors, such as
smokers and nonsmokers.
Definitions – types of observational study
• In a cross-sectional study, data are observed, measured and
collected at one point in time.
• In a retrospective (or case-control) study, data are collected
from the past by going back in time
• In a prospective (or longitudinal or cohort) study, data are
collected in the future from groups sharing common factors
(called cohorts)
Design of experiments – start with an example.
E.g.4 The Salk Vaccine Experiment In 1954, a large-scale
experiment was designed to test the effectiveness of the Salk vaccine
in preventing polio.
• 200,745 children were given a treatment consisting of Salk vaccine injection,
while a second group of 201,229 children were injected with a placebo that
contained no drug.
• The children being injected did not know whether they were getting the Salk
vaccine or the placebo.
• Children were assigned to the treatment or placebo group through a process of
random selection, equivalent to flipping a coin.
• Among the children given the Salk vaccine, 33 later developed paralytic polio,
but among the children given the placebo, 115 later developed paralytic polio.
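Because the two groups are nearly but not exactly the same size, the case counts are easier to compare as rates. The sketch below uses only the four numbers given in E.g.4; expressing the rates per 100,000 children is a presentation choice made here.

```python
vaccine_n, vaccine_cases = 200_745, 33
placebo_n, placebo_cases = 201_229, 115

vaccine_rate = vaccine_cases / vaccine_n
placebo_rate = placebo_cases / placebo_n
rate_ratio = placebo_rate / vaccine_rate

print(f"vaccine: {vaccine_rate * 100_000:.1f} cases per 100,000")
print(f"placebo: {placebo_rate * 100_000:.1f} cases per 100,000")
print(f"placebo rate / vaccine rate = {rate_ratio:.2f}")
```

The paralytic polio rate is about 16 per 100,000 in the vaccine group versus about 57 per 100,000 in the placebo group, roughly 3.5 times as high. Whether a difference of this size could plausibly arise by chance is the kind of question that statistical significance addresses.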
Experiment
• Randomization: subjects are assigned to groups through a process of
random selection.
• Replication: the repetition of an experiment on more than one subject.
– Repeating the experiment so that results can be verified.
– Use a large sample size and a good sampling method
– e.g. in the Salk vaccine test, about 200,000 children in each group
• Blinding:
– A technique in which the subject does not know whether he or she is
receiving the treatment or a placebo.
– Blinding allows us to determine whether the treatment effect is
significantly different from the placebo effect.
– The polio experiment was double-blind
• Confounding occurs in an experiment when you are not able to
distinguish among the effects of different factors
Figure 1-4 Controlling Effects of a Treatment Variable
(a) Bad experimental design: treat all women subjects and give all men a
placebo. (Problem: we don’t know if effects are due to sex or to
treatment.)
(b) Completely randomized experimental design: use randomness to
determine who gets the treatment.
(c) Randomized block design:
1. Form a block of women and a block of men
2. Within each block, randomly select subjects to be treated.
Confounding
• In Figure 1-4(a), confounding occurs when the treatment group of women
shows strong positive results: we cannot determine whether the treatment
or the sex of the subjects causes the positive results.
• The Salk vaccine experiment illustrates one method, the completely
randomized experimental design.
• Four methods are used to control the effects of variables.
Controlling Effects of Variables
1. Completely randomized experimental design ( Figure 1-4 (b) )
2. Randomized block design ( Figure 1-4(c) )
3. Rigorously controlled design
4. Matched pairs design
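The first two designs in the list can be sketched as random assignment procedures. The 20 subjects (10 women, 10 men), the equal split between treatment and placebo, and the function names are all hypothetical choices made for illustration.

```python
import random

rng = random.Random(1)

# Hypothetical subjects: (id, sex), 10 women followed by 10 men.
subjects = [(i, "F" if i < 10 else "M") for i in range(20)]

def completely_randomized(subjects, n_treated, rng):
    """Choose the treatment group at random from all subjects."""
    treated = set(rng.sample(subjects, n_treated))
    return {s: ("treatment" if s in treated else "placebo") for s in subjects}

def randomized_block(subjects, rng):
    """Within each block (sex), randomly choose half of the subjects to treat."""
    assignment = {}
    for sex in ("F", "M"):
        block = [s for s in subjects if s[1] == sex]
        treated = set(rng.sample(block, len(block) // 2))
        for s in block:
            assignment[s] = "treatment" if s in treated else "placebo"
    return assignment

crd = completely_randomized(subjects, 10, rng)
rbd = randomized_block(subjects, rng)

for name, design in (("completely randomized", crd), ("randomized block", rbd)):
    women = sum(1 for s, g in design.items() if s[1] == "F" and g == "treatment")
    men = sum(1 for s, g in design.items() if s[1] == "M" and g == "treatment")
    print(f"{name}: {women} women and {men} men treated")
```

In the completely randomized design the number of treated women and men varies from run to run; the randomized block design guarantees the same number of treated subjects in each block, which keeps sex from being confounded with the treatment.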
• Rigorously controlled design
– Carefully assign subjects to different treatment groups, so that
those given each treatment are similar in the ways that are
important to the experiment. In an experiment testing the
effectiveness of aspirin on heart disease, if the placebo group
includes a 30-year-old overweight male smoker who drinks
heavily and consumes an abundance of salt and fat, the
treatment group should also include a person with similar
characteristics.
• Matched pairs design
– Compare exactly two treatment groups (such as treatment and
placebo) by using subjects matched in pairs that are somehow
related or have similar characteristics. A test of Crest
toothpaste used matched pairs of twins, where one twin used
Crest and the other used another toothpaste.
Summary: three very important considerations in the design of
experiments:
• Use randomization to assign subjects to different groups
• Use replication by repeating the experiment on enough subjects so
that effects of treatments or other factors can be clearly seen
• Control the effects of variables by using such techniques as
blinding and a completely randomized experimental design
Sampling Errors
• A sampling error is the difference between a sample result and
the true population result; such an error results from chance
sample fluctuations.
• A nonsampling error occurs when the sample data are
incorrectly collected, recorded, or analyzed (such as by
selecting a biased sample, using a defective measuring
instrument, or copying the data incorrectly).
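Sampling error can be seen in a small simulation. The population below is entirely hypothetical (10,000 voters, 61% of whom favor a candidate), and the sample sizes and number of repetitions are arbitrary choices; the samples are simple random samples, so the differences printed are due to chance alone.

```python
import random

rng = random.Random(2024)

# Hypothetical population: 1 = favors the candidate, 0 = does not.
population = [1] * 6100 + [0] * 3900
true_proportion = sum(population) / len(population)

def sampling_error(sample_size, rng):
    """Sample proportion minus population proportion for one simple random sample."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size - true_proportion

REPEATS = 200
errors_small = [abs(sampling_error(25, rng)) for _ in range(REPEATS)]
errors_large = [abs(sampling_error(2500, rng)) for _ in range(REPEATS)]

print(f"average |error| with n = 25:   {sum(errors_small) / REPEATS:.3f}")
print(f"average |error| with n = 2500: {sum(errors_large) / REPEATS:.3f}")
```

Larger random samples tend to have smaller sampling errors. A nonsampling error, such as drawing only from a biased list, would not shrink this way no matter how large the sample.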