n n - mathchick.net

Statistics 11
Example 8: Please provide an example of each level of measurement:
(a) Nominal
(b) Ordinal
(c) Ratio
(d) Interval
1.4 & 1.5: CRITICAL THINKING & COLLECTING SAMPLE DATA
Key Concepts…
We focus on the meaning of information obtained by studying data, as well as the methods used to
collect sample data.
Goals in this section are to:
Learn how to interpret information based on data.
Think carefully about the context of the data, the source of the data, the method used
in data collection, the conclusions reached, and the practical implications.
Misuse of statistics
Evil intent on the part of a dishonest person
Unintentional errors on the part of people who do not know better.
If sample data are not collected in an appropriate way, the data may be so completely useless that no
amount of statistical torturing can salvage them.
"Lies, damned lies, and statistics" is a phrase describing the persuasive power of numbers, particularly
the use of statistics to bolster weak arguments, and the tendency of people to disparage statistics that
do not support their positions. It is also sometimes colloquially used to doubt statistics used to prove an
opponent's point.
Sampling Techniques

Voluntary Response Sample (self-selected sample) – respondents themselves
decide whether to be included.
o Literary Digest study
Simple Random Sample – every subject in the population has an equal
chance of being selected (each subject has the same probability of
being selected). Every possible group of a given size (pairs, triples, and
so on) also has an equal chance of being selected.
o Drawing names from a hat
o First draw: population size N, so each subject has probability 1/N
o Second draw: population size N - 1, so each remaining subject has probability 1/(N-1)
Random Sample – members of a population are selected in such a way that
each individual member has either an equal or an unequal
chance of being selected; that is, the probabilities of selecting different
subjects need not be equal.
o NBA draft
Systematic Sample – select a starting point and then select every kth element in
the population.
o Put 30 numbers in a hat (there are 30 students on my roster).
o Choose every 6th student to survey.
Convenience Sampling – use results that are very easy to obtain.
o It is convenient for me to take my sample at the grocery store at 10 a.m., close to
the 55+ living community.
By the way, I am asking people whether they believe that abortion should be
legal or illegal.

Stratified Sampling – subdivide the population into at least two different
subgroups (strata) so that subjects within the same subgroup share the
same characteristics (such as gender or age bracket), then draw a
sample from each subgroup (or stratum).
o Suppose a group wants to learn the chances of a bond issue passing. A community
is composed of two distinct income groups, low and high. The population of the community
is stratified into low and high income, and a simple random sample is drawn from each
of the two income groups.

Cluster Sampling – first divide the population into sections (or clusters), then
randomly select some of those clusters, and then survey every member of each
selected cluster.
o Randomly select airline flights, then survey every passenger on each selected plane.
Multistage sampling – use a different sampling method in different stages
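The sampling techniques above can be sketched in a few lines of Python. This is only an illustration: the roster of 30 numbered students echoes the systematic sample example, while the seed, the sample sizes, the two strata, and the six clusters are made-up assumptions, not part of the notes.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical roster: students numbered 1 through 30.
roster = list(range(1, 31))

# Simple random sample: every group of 5 students is equally likely.
simple_random = random.sample(roster, 5)

# Systematic sample: random starting point, then every 6th student.
k = 6
start = random.randrange(k)      # starting index 0..5
systematic = roster[start::k]

# Stratified sample: split the population into strata, then draw a
# simple random sample from EACH stratum. (This split is made up.)
strata = {"low income": roster[:20], "high income": roster[20:]}
stratified = {name: random.sample(members, 3)
              for name, members in strata.items()}

# Cluster sample: split into clusters, randomly choose some clusters,
# then survey EVERY member of the chosen clusters.
clusters = [roster[i:i + 5] for i in range(0, 30, 5)]  # six clusters of 5
chosen_clusters = random.sample(clusters, 2)
cluster_sample = [s for cluster in chosen_clusters for s in cluster]

print("Simple random:", simple_random)
print("Systematic:   ", systematic)
print("Stratified:   ", stratified)
print("Cluster:      ", cluster_sample)
```

Notice the difference between the last two: a stratified sample takes some members from every subgroup, while a cluster sample takes every member from some subgroups.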
Example 9: Determine which sampling technique (voluntary response, random, simple random,
systematic, cluster, convenience, or stratified) is used in each situation.
(a) An experimenter wants to estimate the average water consumption per family in a
city. He chooses a random starting point on every block and uses the water bill
of every 4th house.
____________________
(b) Six different health plans were randomly selected and all of their members were
surveyed about customer satisfaction.
____________________
(c) I surveyed all of my students to obtain sample data consisting of the number of
credit cards students possess.
_____________________
(d) In a study of college programs, 820 students were randomly selected from those
majoring in communications, 1463 were randomly selected from those majoring in
mathematics, and 760 were randomly selected from those majoring in history.
_____________________
(e) Wes set up a booth outside the cafeteria and waited for students to approach
him. He asked students whether professors at IVC should be allowed to assign
homework.
_____________________
(f) On your desk, you have five flash cards. Each flash card has A, B, C, D,
or F on it. Draw a flash card and that is your grade for the course.
_____________________
Misuse of Data

Reported Results – collect the data yourself rather than allowing subjects to report their
own data.
o I will measure the height of each of you rather than have you report to me how tall
you are.

Small Samples – conclusions should not be based on samples that are “not large enough”.
The question arises of how large a sample is large enough, and this brings us to a
discussion of the normal distribution.
o I will ask five students about their feelings on euthanasia.
Misused Percentages – misleading or unclear percentages. (% review at end of section)
o Continental Airlines claimed to have improved 100% on lost luggage in the
last six months. The New York Times (correctly) interpreted this to mean that no
baggage is now being lost, an accomplishment not yet enjoyed by Continental
Airlines.

Loaded Questions – if survey questions are not worded carefully, the results of a study
can be misleading. Survey questions can be “loaded,” or intentionally worded to
elicit a desired response.

Order of Questions – results from a poll conducted in Germany:
o Would you say that traffic contributes more or less to air pollution than industry?
o Would you say that industry contributes more or less to air pollution than traffic?
When traffic was presented first, 45% blamed traffic and 27% blamed industry.
When industry was presented first, 24% blamed traffic and 57% blamed industry.
Missing Data
o Sometimes missing data are out of the control of the experimenter (subjects dropping out
of a study for reasons unrelated to the study).
o Low-income people are less likely to respond to questions about income.
o U.S. Census – missing data from homeless people.
o People who do not respond to phone surveys (or don’t have land lines).
Self-Interest Study
o Be aware of who stands to gain monetarily from the results.
Our survey of 250 hiring professionals demonstrated that you
are more likely to get hired if you purchase Tistoni shoes for men (price
tag: $38,000). (By the way, I work for Tistoni and, as part of my
marketing, I took every one of these executives to dinner.)

Deliberate Distortions – see the graphs in Example 10

Small samples – make sure your sample size is large enough!
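To see why the Continental Airlines claim under Misused Percentages is suspicious, it helps to compute a percent change directly. The sketch below uses made-up counts of lost bags (400, 200, and 0 are assumptions, not real airline data) and a helper named `percent_change` that is introduced here only for illustration.

```python
def percent_change(old, new):
    """Percent change from old to new, measured relative to old."""
    return (new - old) / old * 100

# Hypothetical number of bags lost in a six-month period.
lost_before = 400

# Cutting lost bags in half is a 50% improvement (a -50% change).
print(percent_change(lost_before, 200))  # -50.0

# A 100% improvement means the count dropped all the way to zero.
print(percent_change(lost_before, 0))    # -100.0
```

A reduction of 100% leaves nothing, so a "100% improvement" in lost luggage would mean no bags are lost at all.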
Example 10: Do you want to work for Company A or Company B? The x-axis represents
time and the y-axis represents salary. All aspects of both Company A and
Company B are equivalent.
[Graphs: salary versus time for Company A and Company B]
Bias – every sampling technique will generate some bias. The goal of the researcher should
be to minimize bias. Two biases that can be minimized are non-response bias and response
bias. For non-response bias, at least report the non-response, or take another random
sample from those who did not respond.

Non-response Bias – the answers of those who respond may differ from the answers
of those who did not respond.

Response Bias – respondents answer questions in the way that they believe
the surveyor wants them to respond.
Correlation vs. Causality
Example 11: It is a known fact that as ice cream sales increase, shark attacks increase.
Therefore, we can conclude that sharks love to eat people that eat ice cream!! Obviously,
‘we’ ice cream eaters taste better!
Is this a valid conclusion? Why/ why not?
Correlation vs. Causality and Confounding
A common mistake is to find a statistical association between two variables and then conclude
that one of the variables causes (or directly affects) the other variable.

Two variables may seem to be linked (smoking and pulse rate), but the increase in pulse
rate may or may not be caused by smoking.

The relationships between shark attacks and ice cream sales, and between smoking and pulse rate,
are correlations.

Even though we may find a number of cases where the pattern holds, we cannot conclude that one
variable caused the other.
o Need to consider confounding variables.
Confounding – occurs when we are not able to distinguish among the effects of different factors.
Moral of the story – CORRELATION DOES NOT IMPLY CAUSALITY!!!
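A small simulation can show how a confounding variable produces a correlation with no causation behind it. In this sketch, daily temperature drives both ice cream sales and shark attacks, and neither affects the other. Every number here (the temperature range, the slopes 20 and 0.2, the noise levels, and the "attack index" itself) is invented for illustration and is not real data.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

random.seed(0)

# Confounding variable: daily temperature for one simulated year.
temps = [random.uniform(10, 35) for _ in range(365)]

# Both outcomes depend on temperature plus independent random noise.
# Ice cream never enters the shark formula, and vice versa.
ice_cream_sales = [20 * t + random.gauss(0, 50) for t in temps]
shark_attack_index = [0.2 * t + random.gauss(0, 1) for t in temps]

r_raw = pearson_r(ice_cream_sales, shark_attack_index)

# Subtract the (known, made-up) temperature effect from each variable.
ice_leftover = [s - 20 * t for s, t in zip(ice_cream_sales, temps)]
shark_leftover = [a - 0.2 * t for a, t in zip(shark_attack_index, temps)]
r_adjusted = pearson_r(ice_leftover, shark_leftover)

print(f"Correlation, ignoring temperature:       {r_raw:.2f}")
print(f"Correlation, temperature effect removed: {r_adjusted:.2f}")
```

The first correlation is strong even though the simulation contains no causal link between the two variables; once the temperature effect is removed, the association should largely disappear.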
Example 12: David’s study demonstrated that tall people read better.
(a) Does being tall CAUSE people to read better?
(b) Is it possible that there is a correlation between being tall and reading better?
(c) What are some possible confounding variables?
Studies

Observational Study – we observe and measure specific characteristics, but we
do not attempt to modify the subjects being studied.
o Cross-Sectional Study – data are observed, measured, and collected at one
particular point in time. Aims to provide data on the entire population.
o Retrospective Study (a type of case-control study; looks back historically) – data
are collected from the past by going back in time (through examination of
records, interviews, and so on).
Case-Control Study – data are observed, measured, and collected
from a particular subset of the population. Used frequently in
epidemiology: compare subjects with a particular condition (“the cases”)
with those who do not have the condition (“the controls”) but who are
otherwise similar.
o Prospective (Longitudinal) Study – data are collected in the future from groups
sharing common factors (called cohorts).


Experiment – we apply some treatment and then proceed to observe its effect
on the subjects (Subjects in experiments are called experimental units)
**Note: In an observational study, no treatment is given.
Example 13: Identify whether each study is cross-sectional, retrospective, or prospective.
(a) Qualcomm funded a project that studied, over a period of 4 years, how being taught by a
math specialist affected the ability of 4th–6th grade students to communicate mathematics.
(b) Physicians at the Mount Sinai Medical Center plan to study emergency personnel who worked
at the site of the terrorist attacks in New York City on September 11, 2001. They plan to
study these workers from now until several years into the future.
(c) University of Toronto researchers studied 669 traffic crashes involving drivers with cell
phones. They found that cell phone use quadruples the risk of a collision.
Experimental Design
Designing an experiment is important. A faulty design can result in ‘GIGO’ (as your author
suggests): garbage in, garbage out!
Famous example: The Salk Vaccine Experiment
o 1954 – test the effectiveness of the Salk vaccine in preventing polio.
o 401,974 children were assigned to one of two groups: 201,229 were injected with a
placebo (an injection that has no effect) and 200,745 were injected with the Salk vaccine.
o Result: 115 of those injected with the placebo developed paralytic polio;
33 of those injected with the vaccine developed paralytic polio.
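Because the two groups are not exactly the same size, the fair comparison is between rates rather than raw counts. A quick sketch using the counts above (the helper name `rate_per_100k` is just for illustration):

```python
# Counts from the Salk vaccine experiment as given in these notes.
placebo_n, placebo_cases = 201_229, 115
vaccine_n, vaccine_cases = 200_745, 33

def rate_per_100k(cases, group_size):
    """Cases per 100,000 children in the group."""
    return cases / group_size * 100_000

placebo_rate = rate_per_100k(placebo_cases, placebo_n)
vaccine_rate = rate_per_100k(vaccine_cases, vaccine_n)

print(f"Placebo group: {placebo_rate:.1f} cases per 100,000")
print(f"Vaccine group: {vaccine_rate:.1f} cases per 100,000")
print(f"Placebo rate is {placebo_rate / vaccine_rate:.1f} times the vaccine rate")
```

The placebo group comes out at roughly 57 cases per 100,000 children and the vaccine group at roughly 16, so the paralytic polio rate was about three and a half times higher without the vaccine.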
Types of Experiments

Randomization – subjects are assigned to different groups through a process of random
selection. Equivalent to flipping a coin!
o The children in the Salk Vaccine experiment were assigned to the treatment or control
group based on random selection.
o Use chance as a way to create similar groups.
Replication – repetition of an experiment on more than one subject.
o Sample sizes should be large enough so that the erratic behavior characteristic of
small samples does not disguise the true effects of differing treatments.
o Larger sample sizes increase the chance of recognizing different treatment effects,
but a large sample size does not necessarily indicate a good sample.

Completely Randomized Experimental Design – assign subjects to different
treatment groups through a process of random selection (Salk Vaccine).

Randomized Block Design – a block is a group of subjects that are similar, but blocks
differ in ways that might affect the outcome of the experiment (e.g., blocking by
gender when assigning treatment for a heart medication). Subjects are randomly
assigned to treatments within each block.

Rigorously Controlled Design – subjects are carefully assigned to different treatment groups
so that the groups are similar in the ways that are important to the experiment.

Matched Pairs Design – compare exactly two treatment groups by using subjects matched in
pairs that are somehow related and/or share similar characteristics (for example, twin
studies). Coke/Pepsi, anyone?
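Random assignment is easy to sketch in code. The example below contrasts a completely randomized design with a randomized block design. The 20 subject labels, the two blocks, and the helper `randomize` are all hypothetical and chosen only to keep the illustration small.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical subjects S01 through S20.
subjects = [f"S{i:02d}" for i in range(1, 21)]

def randomize(units):
    """Shuffle the units and split them into two equal-sized groups."""
    shuffled = units[:]          # copy so the original order is untouched
    random.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Completely randomized design: one shuffle over all subjects.
treatment, control = randomize(subjects)

# Randomized block design: form blocks of similar subjects first
# (this split into two blocks is made up), then randomize WITHIN each block.
blocks = {"block A": subjects[:8], "block B": subjects[8:]}
block_assignments = {name: randomize(members)
                     for name, members in blocks.items()}

print("Treatment:", treatment)
print("Control:  ", control)
for name, (treated, untreated) in block_assignments.items():
    print(name, "treatment:", treated, "control:", untreated)
```

Blocking guarantees that each block is split evenly between treatment and control, so a difference between blocks cannot be mistaken for a treatment effect.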
Blinding & Placebo Effect

Blinding is a technique used in an experiment in which the subject doesn’t know
whether he or she is receiving the treatment or the placebo.

In a double-blind experiment, neither the subject nor the investigator knows
whether the subject received the treatment or the placebo.

Placebo effect occurs when an untreated subject reports an improvement in
symptoms.
** Note: Salk Vaccine was a double-blind experiment.
Sampling Errors
There will always be sampling error, no matter how well the experimental design is
planned.
If we randomly sample 1000 IVC students and ask whether they obtained a high school
diploma or a GED, the result will differ slightly from that of another 1000 students asked the
same question. (We often hear the term “margin of error” in reporting statistics. This will
be discussed later.)

Sampling Error – the difference between a sample result and the true population
result; such an error results from chance sample fluctuations.

Non-sampling Error – occurs when the sample data are incorrectly collected,
recorded, or analyzed (such as selecting a biased sample, using a defective
measurement instrument, or copying the data incorrectly).
o The student that ends up with 1005% in the class had an incorrect test score
input in the grade book!
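Sampling error can be seen by drawing two random samples of 1000 from the same population and comparing the results. The population below is entirely hypothetical: 20,000 students, of whom exactly 80% hold a high school diploma (coded 1) and the rest a GED (coded 0). These figures are assumptions for the sketch, not IVC data.

```python
import random

random.seed(3)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1 = high school diploma, 0 = GED.
TRUE_PROPORTION = 0.80
population = [1] * 16_000 + [0] * 4_000

def sample_proportion(pop, n):
    """Proportion of 1s in a simple random sample of size n."""
    return sum(random.sample(pop, n)) / n

first = sample_proportion(population, 1000)
second = sample_proportion(population, 1000)

print(f"True proportion:  {TRUE_PROPORTION:.3f}")
print(f"Sample 1: {first:.3f}  (sampling error {first - TRUE_PROPORTION:+.3f})")
print(f"Sample 2: {second:.3f}  (sampling error {second - TRUE_PROPORTION:+.3f})")
```

Both samples were collected correctly, yet each typically lands a little away from the true value. That gap is sampling error, not a mistake in the data.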
Example 14: Choose something you want to study and design an experiment.
Homework Chapter 1
1.1: NA
1.2: 1-18, 23, 26, 28
1.3: 1-32, 34
1.4: 1, 3, 4, 5, 6, 8, 9, 10, 12, 13, 15-19, 21, 24, 25, 28, 30
1.5: 1-4, 6, 9, 11, 12, 13, 15, 16, 18, 19, 21-26, 27, 29, 31
PERCENTAGE REVIEW
“of” often means multiply
Percent means per hundred, so n% = n/100.
Percentage of: Change the % to 1/100, then multiply.
Fraction to percentage: Divide by denominator and multiply by 100.
Decimal to percentage: Multiply the decimal by 100 and put in the percent symbol.
Percentage to decimal: Remove the percent symbol and divide by 100.
Perform the indicated operation.
a. 12% of 1200
b. Write 5/8 as a percentage.
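The four conversion rules in this review can be checked with a few one-line functions. The function names are made up for this sketch, and the two practice problems above are used as the test cases.

```python
def percent_of(n, amount):
    """n% of amount: change % to 1/100, then multiply."""
    return n * amount / 100

def fraction_to_percent(numerator, denominator):
    """Divide by the denominator and multiply by 100."""
    return numerator / denominator * 100

def decimal_to_percent(decimal):
    """Multiply the decimal by 100 (then write the % symbol)."""
    return decimal * 100

def percent_to_decimal(percent):
    """Remove the % symbol and divide by 100."""
    return percent / 100

print(percent_of(12, 1200))        # 144.0
print(fraction_to_percent(5, 8))   # 62.5
print(decimal_to_percent(0.625))   # 62.5
print(percent_to_decimal(62.5))    # 0.625
```

So 12% of 1200 is 144, and 5/8 written as a percentage is 62.5%.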