Sample

1
Introduction – part 1



Course Syllabus – (Reminder: prepare a folder for homework)
Prepare a 3-ring binder with dividers for class notes,
homework, quizzes, tests, or related information
Homework








Check your answers to the odd-numbered problems in the back of the book, and show
your work. No credit will be awarded if no work is shown.
Use 3-hole lined paper with your name and ID in the upper right corner;
write down the homework number and the assignment numbers.
Place your homework in the homework folder; it is due Monday.
A TI-84+ calculator is required. Write down the calculator commands on
your homework.
Homework log – for your own record
Prepare a folder for homework with a homework log.
Write the date next to the homework number; the homework log
should be kept in your homework folder with your homework.
Failure to comply with these instructions will result in a point deduction.
Introduction – part 2

Lecture notes will be online; please print them out before
class.

Lecture notes may contain only a summary of the lectures;
examples will be shown on the whiteboard, so taking notes
is an essential part of learning.
At the beginning of each class, there may be a warm-up (which may be
counted as class participation points).
At the end of each class, there may be an exit slip that
contains in-class work (which may be counted as class participation
points).
Quizzes (counted toward your grade) may be given
unannounced and may contain vocabulary and homework
problems.
Become familiar with the syllabus, and keep track of your total score on your
own.




1-1 Overview (1)


Key Concept: Overview and introduction to statistics. After this lesson, you
will be able to recognize some examples of statistics and identify the terms
and data types used in statistics.
Chapter Problem – 1936 presidential election predictions by the Literary Digest
poll and George Gallup’s poll





The Literary Digest polled 10 million people, got more than 2 million responses back, and predicted
that Alf Landon would capture 57% of the votes
George Gallup polled 50,000 people and predicted that Roosevelt would capture 57%
of the votes; Roosevelt actually got 61% of the popular vote
Why was the Literary Digest poll so wrong?
Does the size of a poll matter?
What is Statistics?
The word “statistics” is derived from the Latin word status (meaning “state”).
Statistics – the science of collecting, organizing, and interpreting numerical
facts (data)
You live in a world of statistics
Statisticians assemble, classify, and tabulate data, then analyze the data in order
to make generalizations and decisions
Households, governments, and businesses rely heavily on statistical data for
guidance


1-1 Overview (2)

Gallup Poll – conducted by the Gallup Organization (created by
George Gallup). Examples: elections (predicting the % of votes), prescription
drugs (e.g. side effects), or ethnic group populations, etc.
Census – the United States conducts a census every 10 years. The results are used
for many government projects and involve highly political considerations

Baseball batting averages, imported car sales, or the president’s
popularity
Why study statistics?

It provides a perspective on how to look at data and how to effectively
communicate information about it to others.
It helps you understand research reports encountered in the daily news,
such as:

The annual earnings of college graduates exceed, on average, those of high
school graduates by $30,000
Heavy users of tobacco suffer significantly more respiratory ailments than
nonusers
It helps with future research projects that involve statistical analysis
1-1 Overview (3) Definitions used in Statistics

Data


Collections of observations, such as measurements, genders, and survey responses
Statistics – the science of

Planning studies and experiments and obtaining data
Organizing, summarizing, presenting, analyzing, interpreting, and drawing
conclusions based on the data

Population

The complete collection of all elements (e.g. scores, people, measurements)
The collection is complete in the sense that it includes all subjects to be studied
Sample

A subcollection of members selected from a population
Census

The collection of data from every member of the population
1-2 Statistical Thinking

Statistical thinking is the ability to see the big picture and to consider the
following relevant factors:







Context of the data – what are the data values?
Source of the data – were they obtained objectively, or is the source biased?
Sampling method – how were the data collected?
Conclusions – what can we conclude from the analysis?
Practical implications – are there any practical implications, beyond statistical significance?
Statistical thinking may involve determining whether results are statistically
significant (vs. practically significant)
Sample data collection for statistics

Must be collected in an appropriate way, such as through a random process
If not collected in an appropriate way, the data may be useless, because they may
be biased.


Statistics is an interesting subject with extensive, real, and meaningful
real-world applications
HW #1: PP 9-11, #1 – 17 Every Other Odd (EOO)
1-3 Types of Data (1)

Key Concepts

The subject of statistics is largely about using sample data to make inferences
(or generalizations) about an entire population. It is essential to know and
understand the definitions that follow.
To study different types of data (numbers), know the difference between
quantitative data and qualitative data.

Parameter (for population)






A numerical measurement describing some characteristic of a population
New York City has 3250 walk buttons that pedestrians use to cross intersections
77% of them do not work (based on a report)
The figure of 77% is a parameter, because it is based on
the entire population of 3250 walk buttons in NYC
Statistic (for a sample) (Note: a sample is part of a population)

A numerical measurement describing some characteristic of a sample
In a survey of 877 executives, 45% would not hire someone who made typos on their job
application
The figure of 45% is a statistic, because it is based on a sample (877), not the entire
population of executives
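
The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the list of walk buttons is a made-up stand-in for the real population, the count of 2503 broken buttons is chosen so that the proportion is about 77%, and the sample size of 100 is an arbitrary assumption.

```python
import random

def proportion(values):
    """Fraction of True values in a list."""
    return sum(values) / len(values)

# Hypothetical population: 3250 walk buttons, 2503 of them broken
# (2503 / 3250 is about 0.77; the exact count is an assumption).
population = [True] * 2503 + [False] * 747   # True means "does not work"

# Parameter: computed from every member of the population.
parameter = proportion(population)

# Statistic: computed from a sample only, so it varies from sample to sample.
rng = random.Random(0)                # fixed seed so the run is repeatable
sample = rng.sample(population, 100)  # sample size 100 is an arbitrary choice
statistic = proportion(sample)

print(f"parameter (population): {parameter:.3f}")
print(f"statistic (sample):     {statistic:.3f}")
```

Rerunning with a different seed changes the statistic but never the parameter.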
1-3 Types of Data (2)

Quantitative data



Numbers representing counts or measurements (it is important to use
the appropriate units of measurement, such as $, hours, feet, meters,
ages, etc)
Examples: thousands of dollars, hundredths of a second, 250 million
people, the weights of supermodels
Two types of quantitative data



Definitions
Discrete – a finite or countable number of possible values (0, 1, 2, 3, ...), e.g. the number
of eggs that a hen lays
Continuous (numerical) data – values correspond to some continuous scale
(without gaps), e.g. the amount of milk that a cow produces
Qualitative data


Sometimes called categorical or attribute data
Data can be separated into different categories that are distinguished by
some nonnumeric characteristic; example: the genders (male/female) of
professional athletes, colors of eyes, political party affiliations
1-3 Types of Data (3) Classified to Four Levels of Measurement
1.
Nominal – categories only

Data that consist of names, labels, or categories only; cannot be ordered or used for
calculations.
Examples: Yes/No/Undecided; colors of cars; political party affiliations
2.
Ordinal – categories (data) can be arranged in some order

Differences between data values either cannot be determined or are meaningless
Examples: course grades A, B, C, D, F are in order, but A – B is not a quantity; ranks, such
as a magazine’s ranking of cities by livability.
Provides information about relative comparisons, but not the size of the differences
3.
Interval – like ordinal, but with meaningful differences (there is no natural zero starting
point; i.e. zero does not mean that none of the quantity is present)

Examples: (1) Body temperatures of 98.2F and 98.6F – ordered, with a difference of
0.4F; (2) Years 1000, 1492, 2008 – year 0 is an arbitrary reference point, not “no time”
4.
Ratio – like interval, but with a natural zero starting point

Examples: (1) Weights, like that of a diamond in a ring; (2) Prices of college books. Zero means no
cost
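
One way to see why ratios are meaningless at the interval level but meaningful at the ratio level is to change units and check whether the ratio survives. The sketch below is illustrative only: the temperatures and weights are made-up values, not data from the lecture.

```python
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

def pounds_to_kilograms(lb):
    return lb * 0.45359237

# Interval level (temperature): zero is arbitrary, so a ratio depends on the unit.
low_f, high_f = 50.0, 100.0            # hypothetical temperatures
ratio_f = high_f / low_f               # 2.0 in Fahrenheit
ratio_c = fahrenheit_to_celsius(high_f) / fahrenheit_to_celsius(low_f)

# Ratio level (weight): zero means "no weight", so a ratio is the same in any unit.
light_lb, heavy_lb = 50.0, 100.0       # hypothetical weights
ratio_lb = heavy_lb / light_lb
ratio_kg = pounds_to_kilograms(heavy_lb) / pounds_to_kilograms(light_lb)

print(f"temperature ratio: {ratio_f:.2f} (F) vs {ratio_c:.2f} (C)")
print(f"weight ratio:      {ratio_lb:.2f} (lb) vs {ratio_kg:.2f} (kg)")
```

"Twice as hot" changes when the unit changes, while "twice as heavy" does not.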
1-3 Summary (4) and Homework #2

Levels of Measurement
Ratio – There is a natural zero starting point and ratios are meaningful. Example: distance
Interval – Differences are meaningful, but there is no natural zero starting point and ratios are meaningless. Example: body temperatures in degrees Fahrenheit or Celsius
Ordinal – Categories are ordered, but differences can’t be found or are meaningless. Example: rank of colleges in U.S. News and World Report
Nominal – Categories only; data cannot be arranged in an ordering scheme. Example: eye colors

Example: The finishing positions of a sample of drivers in a NASCAR
race: 3, 8, 12, 15, 17 (3rd place, 8th place, etc.)

What is the level of measurement of these data?
Are these data discrete or continuous?
Are the data qualitative or quantitative?

HW #2: Page 16: #1-25 EOO
1-4 Critical Thinking (1)

Key Concepts:
Success in the introductory statistics course typically requires more common
sense than mathematical expertise.
This section is to illustrate how common sense is used when we think critically
about data and statistics.




Importance of good samples: sampling techniques (sampling procedures);
randomness; avoiding biased results
Correlation and causality – correlation does not imply causality

Concluding causation is another way to misinterpret statistical data, i.e. finding a statistical association
between two variables and concluding that one of the variables causes (or
directly affects) the other variable.
Correlation means two variables are related (we will learn more later in Ch. 10);
two variables may be related without one being the cause of the
other
IQ and wealth?
News reports often use wording that implies one is the cause of the other
(1-4) Critical Thinking (2) Bad Samples

Voluntary response sample (or self-selected sample): one in which the
respondents themselves decide whether to be included.

Results in biased samples, e.g. the Literary Digest poll
Examples: polls conducted through the Internet, by mail-in, or by telephone
In this case, valid conclusions can be made only about the specific group of people who
agree to participate.
What went wrong in the Literary Digest poll, which received 2.3 million responses out of
10 million ballots? The sampling method was a voluntary response sample and was biased,
since the ballots were sent to magazine subscribers, car owners, and telephone users.
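
A small simulation can illustrate why a huge biased sample can do worse than a small random one. Everything in it is an assumption made for illustration: the population size, the 61% support level (borrowed from the election result quoted earlier), and especially the probabilities of appearing on the mailing list. It models a biased mailing list rather than the voluntary response itself.

```python
import random

rng = random.Random(1936)  # fixed seed so the run is repeatable

def make_voter():
    """Return (supports_roosevelt, on_mailing_list) for one hypothetical voter."""
    supports = rng.random() < 0.61
    # Assumed bias: Roosevelt supporters are less likely to be on the list
    # of magazine subscribers, car owners, and telephone users.
    on_list = rng.random() < (0.2 if supports else 0.5)
    return supports, on_list

population = [make_voter() for _ in range(200_000)]

def share(voters):
    """Fraction of the given voters who support Roosevelt."""
    return sum(supports for supports, _ in voters) / len(voters)

true_share = share(population)

# Large but biased sample: everyone who is on the mailing list.
biased_sample = [voter for voter in population if voter[1]]

# Small simple random sample drawn from the whole population.
random_sample = rng.sample(population, 2_000)

print(f"true share:           {true_share:.3f}")
print(f"biased sample (n={len(biased_sample)}): {share(biased_sample):.3f}")
print(f"random sample (n={len(random_sample)}):  {share(random_sample):.3f}")
```

Under these assumptions the large sample misses badly while the small random sample lands close to the true share, so how the sample is collected matters more than its size.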

Small Samples: Your opinion: good? bad? biased?

Can we draw a reasonable conclusion or inference?
Example: basing a school suspension rate on a sample of only three students
Whether a sample is small or large may matter less than whether it was collected
appropriately

Reported Results (e.g. voting behavior)

When 1002 eligible voters were surveyed, 70% of them said they had voted in the recent
presidential election. However, voting records show that only 61% of eligible voters
actually did vote.
When you ask subjects for their weights, you will most likely get their desired weights, not
their actual weights.
(1-4) Critical Thinking (3) Bad Samples
Bar graphs or pie graphs can be misleading due to scale; you must
analyze the numerical information given in the graph, not the graph’s
shape
[Figure: two bar graphs titled “Personal Income per Capita” comparing California ($32,996) and Nevada ($30,180). One vertical axis runs from 0 to 35,000; the other runs from $28,500 to $33,500.]
Pictographs – using drawings of objects to depict data can also be
misleading due to scale
[Figure: “Daily Oil Consumption” for USA (20.0) and Japan (5.4), two panels; vertical axis from 0 to 25.]
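
The size of the distortion in these graphs can be estimated with simple arithmetic. The sketch below uses the values from the figures; the assumption that a pictograph drawing is scaled in both width and height (so that its area grows with the square of the value) is mine, not something stated on the slide.

```python
def bar_height_ratio(a, b, axis_start):
    """Ratio of two bar heights when the vertical axis starts at axis_start."""
    return (a - axis_start) / (b - axis_start)

california, nevada = 32_996, 30_180

# Axis starting at 0: bar heights are proportional to the values.
honest_ratio = bar_height_ratio(california, nevada, 0)

# Axis starting at $28,500: the taller bar looks far larger than it should.
truncated_ratio = bar_height_ratio(california, nevada, 28_500)

usa, japan = 20.0, 5.4

# Pictograph: if the drawing is scaled in both width and height (an assumption),
# the area ratio is the square of the true ratio.
value_ratio = usa / japan
area_ratio = value_ratio ** 2

print(f"bar ratio, axis from 0:       {honest_ratio:.2f}")
print(f"bar ratio, axis from $28,500: {truncated_ratio:.2f}")
print(f"oil consumption ratio:        {value_ratio:.2f}")
print(f"pictograph area ratio:        {area_ratio:.2f}")
```

The incomes differ by about 9%, but the truncated axis makes one bar more than two and a half times as tall as the other.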
(1-4) Critical Thinking (4) Bad Samples

Percentages can also be misleading, depending on how the
meaning of the % is interpreted

Example from the book: in referring to lost baggage, Continental Airlines ran ads claiming
that this was “an area where we’ve already improved 100% in the last six months”.
What does 100% mean to you? Some took it to mean that no baggage is being lost, which
is not true

Survey questions can be biased depending on how they are worded and on the order of the questions
97% yes: “Should the President have the line item veto to eliminate waste?”
57% yes: “Should the President have the line item veto, or not?”
“Would you say that traffic contributes more or less to air pollution than industry?” or
“Would you say that industry contributes more or less to air pollution than traffic?”

Nonresponse – nonresponse to survey questions can skew the data, whether
the person is unavailable or refuses to answer the questions, e.g.
telemarketers’ phone calls
Missing Data – results can be dramatically affected by missing data
Data can be missing randomly, i.e. for reasons unrelated to their values or other values
Data can be missing due to specific factors such as non-reporting (as a result of how
the survey was conducted)

(1-4) Critical Thinking (5) Bad Samples


Self-Interest Study – studies are sometimes sponsored by parties with
interests to promote. Can you think of good examples?
Precise Number – can be misleading and can bias the result

Most of us believe that a number stated precisely must also be
accurate. Example: the number of households in the United States is
103,215,027. In this case, it’s better to just say 103 million
Partial Pictures

“90% of all our cars sold in this country in the last 10 years are still on
the road”; most of those cars may have been sold within the last 3 years
Misleading to consumers, even though it’s not a false statement
Deliberate Distortions – produce biased results and can mislead
the users

In a survey of car rental companies, Avis was the winner, but the results were distorted.
Hertz sued Avis for false advertising based on the survey
1-4 Summary and Homework #3




There are many cases in which statistics are misused; you have probably
heard or read about such statistical reports
Studying statistics will help you judge which studies or
research are better than others; use your common sense to
interpret data and statistics
Example: During a show on MTV, the host asks viewers to
call in and vote for or against a new song, with the result that
74% of 12,335 viewers favor it. Given that it is a large sample,
and more than 50% favored the song, is it valid to conclude
that the majority of Americans favor the song? Why or why
not?
Homework #3 (Chap. 1-4)

Page 23-24, 1-17 EOO
1-5 : Collecting Sample Data (1)






Key Concept – methods for collecting sample data
appropriately, which determines the quality of the statistical analysis.
Observational study – observe and measure specific
characteristics, but don’t attempt to modify the subjects being
studied
Experiment – apply some treatment and then
proceed to observe its effects on the subjects (experimental
units)
Cross-sectional study – data are observed, measured, and
collected at one point in time
Retrospective (case-control) study – data are collected from
the past by going back in time (through examination of
records, interviews, and so on)
Prospective (longitudinal or cohort) study – data are collected
in the future from groups sharing common factors (i.e. cohort)
1-5 : Collecting Sample Data (2) Experiments and Important considerations


In an experiment, some treatment is applied. The results can be affected by
confounding (confounding means being unable to distinguish among the effects
of different factors)
Important Considerations:

Control the effects of variables

Blinding – used a lot in experimental drug trials; placebo effect, blinding technique,
double-blinding technique
Blocks – a block is a group of subjects that are similar, e.g. in fertilizer or drug experiments
Randomized block design – group subjects with similar characteristics into blocks, and
randomly assign treatments within each block
Completely randomized design – assign subjects to different treatment groups
through a process of random selection. Example: polio experiment
Rigorously controlled design – subjects are carefully chosen so that those given each
treatment are similar in the ways that are important to the experiment. E.g. blood pressure lowering drug
Use replication

Replication – repetition of an experiment so that differences from different
treatments can be recognized
Use a sample size that is large enough to avoid erratic behavior and to see the true nature
of any effects, and obtain that sample using an appropriate method, such as one
based on randomness
Use randomization (very important, see next chart)
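
The difference between a completely randomized design and a randomized block design can be sketched as follows. The subject names, the two age blocks, and the treatment labels are hypothetical; the sketch only shows how random assignment differs between the two designs.

```python
import random

def completely_randomized(subjects, treatments, rng):
    """Shuffle the subjects, then deal the treatments out in turn."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {s: treatments[i % len(treatments)] for i, s in enumerate(shuffled)}

def randomized_block(blocks, treatments, rng):
    """Randomly assign treatments separately within each block of similar subjects."""
    assignment = {}
    for block_subjects in blocks.values():
        assignment.update(completely_randomized(block_subjects, treatments, rng))
    return assignment

rng = random.Random(42)                          # fixed seed for a repeatable run
subjects = [f"subject_{i}" for i in range(12)]   # hypothetical subjects
treatments = ["drug", "placebo"]                 # hypothetical treatments

# Completely randomized design: one random assignment over all subjects.
crd = completely_randomized(subjects, treatments, rng)

# Randomized block design: subjects are first grouped by a shared characteristic
# (here, assumed age groups), then randomized inside each block.
blocks = {"under_40": subjects[:6], "40_and_over": subjects[6:]}
rbd = randomized_block(blocks, treatments, rng)

for name, group in blocks.items():
    on_drug = [s for s in group if rbd[s] == "drug"]
    print(name, "receiving drug:", on_drug)
```

Blocking guarantees that each treatment is balanced within every block, which the completely randomized design only achieves across the whole group.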
1-5 : Collecting Sample Data (3) Randomization and Sampling
Strategies

Random sample – members of a population are selected
in such a way that each individual member has an equal chance of being selected; e.g. use a computer to
generate telephone numbers.

Simple random sample of n subjects – subjects are selected in
such a way that every possible sample of the same size n has an equal
chance of being chosen; this requirement is used in
various statistical procedures.
Probability sample – members are selected from a population in such a way that each member has a
known (but not necessarily the same) chance of being selected.

1-5 : Collecting Sample Data (4) Sampling Techniques

Systematic sampling – select some starting point and
then every nth element in the population (e.g. every
3rd)

Convenience sampling – use results that are easy to get
1-5 : Collecting Sample Data (5) Sampling Techniques

Stratified sampling – subdivide the population into at least two different
subgroups (strata) whose members share the same characteristics (e.g.
gender or age bracket), then draw a sample from each subgroup
(stratum); e.g. men and women, or Democrats and Republicans.

Cluster sampling – divide the population into sections (or
clusters), then randomly select some of these clusters and use all of their members as
the sample, e.g. voting precincts, or divide all classes at a
college by subject and section, then poll all students in
randomly selected classes.
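
The sampling techniques on these slides can be compared side by side in a short sketch. The population of 60 students, with its `section` and `year` fields and all of the sample sizes, is hypothetical and chosen only to keep the example small.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: 60 students in 6 class sections of 10.
population = [
    {"id": i, "section": i // 10, "year": "freshman" if i % 2 == 0 else "sophomore"}
    for i in range(60)
]

def simple_random_sample(pop, n):
    """Every possible sample of size n is equally likely."""
    return rng.sample(pop, n)

def systematic_sample(pop, k):
    """Pick a random starting point, then take every k-th member."""
    start = rng.randrange(k)
    return pop[start::k]

def stratified_sample(pop, key, n_per_stratum):
    """Split into strata by a shared characteristic, then sample from each stratum."""
    strata = {}
    for member in pop:
        strata.setdefault(member[key], []).append(member)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(pop, key, n_clusters):
    """Randomly choose whole clusters and keep every member of the chosen ones."""
    clusters = sorted({member[key] for member in pop})
    chosen = set(rng.sample(clusters, n_clusters))
    return [member for member in pop if member[key] in chosen]

srs = simple_random_sample(population, 12)
systematic = systematic_sample(population, 5)
stratified = stratified_sample(population, "year", 6)
clustered = cluster_sample(population, "section", 2)

print("systematic ids:", [m["id"] for m in systematic])
print("cluster sections:", sorted({m["section"] for m in clustered}))
```

Note the contrast between the last two: stratified sampling takes some members from every stratum, while cluster sampling takes all members from some clusters.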
1-5 : Collecting Sample Data (6) Multistage Sampling


Multistage sample design – selection of a sample in different stages
(may include random, stratified, and cluster sampling at different stages); it is a
very complicated design
Example: U.S. unemployment statistics are based on a survey of households

The country is partitioned into 2007 different regions (Primary Sampling Units, or PSUs) consisting of metropolitan
areas, counties, or groups of smaller counties
792 of the 2007 PSUs are selected
Each one of the 792 is partitioned into blocks
Clusters of households that are close to each other are identified
The clusters are selected randomly
Sampling error – the difference between a sample result and the true
population result
Nonsampling error – occurs when sample data are incorrectly collected,
recorded, or analyzed (e.g. from a biased sample, a defective measurement
instrument, or data that were entered incorrectly)
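
The two kinds of error can be told apart in a short simulation. This is a sketch under stated assumptions: the population of 10,000 eligible voters with a 61% turnout echoes the voting example from section 1-4, and the 25% chance that a nonvoter claims to have voted is an invented figure used only to produce a nonsampling error.

```python
import random

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical population of eligible voters; True means the person voted.
population = [True] * 6_100 + [False] * 3_900
true_rate = sum(population) / len(population)   # 0.61

# Sampling error: chance difference between a random sample and the population.
sample = rng.sample(population, 1_000)
sample_rate = sum(sample) / len(sample)
sampling_error = sample_rate - true_rate

# Nonsampling error: the data themselves are recorded incorrectly.
# Assumption for illustration: 25% of nonvoters say that they voted.
reported = [voted or rng.random() < 0.25 for voted in sample]
reported_rate = sum(reported) / len(reported)
nonsampling_error = reported_rate - sample_rate

print(f"true rate:         {true_rate:.3f}")
print(f"sample rate:       {sample_rate:.3f} (sampling error {sampling_error:+.3f})")
print(f"reported rate:     {reported_rate:.3f} (nonsampling error {nonsampling_error:+.3f})")
```

A larger random sample shrinks the sampling error, but it does nothing about the misreporting.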
1-5 Statistical Studies & Homework #4
Statistical Studies
Observations or experiments?
Observational study – when were the observations made?
  Past period of time: retrospective study
  One point in time: cross-sectional study
  Forward in time: prospective study
Experimental study – design the experiment
  1. Control effects of the variables by blinding, blocks, completely randomized design
  2. Replication
  3. Randomization
Types of studies and experiments; Controlling the effects of variables; Randomization;
Types of sampling; Sampling errors
HW #4 (Chap 1-5), Pg. 34-36, 1 – 25 EOO