Introduction to Statistics for the Social Sciences
SBS200, COMM200, GEOG200, PA200, POL200, or SOC200
Lecture Section 001, Fall 2015
Room 150 Harvill Building
10:00 - 10:50 Mondays, Wednesdays & Fridays.
http://courses.eller.arizona.edu/mgmt/delaney/d15s_database_weekone_screenshot.xlsx
Everyone will want to be enrolled in one of the lab sessions
Labs continue next week
Please re-register your clicker by the end of lecture today
http://student.turningtechnologies.com/
10/9/15
Law of Large Numbers
Central Limit Theorem
Schedule of readings
Before next exam (October 16th)
Please read chapters 1 - 8 in OpenStax textbook
Please read Chapters 10, 11, 12 and 14 in Plous
Chapter 10: The Representativeness Heuristic
Chapter 11: The Availability Heuristic
Chapter 12: Probability and Risk
Chapter 14: The Perception of Randomness
Homework
On class website:
Please print and complete homework worksheet #11
Due Monday October 12th
Dan Gilbert Reading and Law of Large Numbers
Review of Homework Worksheet, just in case of questions
Homework review
2 / 5 = .40
Based on a priori probability – all options equally likely – not based on previous experience or data
Based on expert opinion – we don't have previous data for these two companies merging together
Based on frequency data (percent of rockets that successfully launched)
Homework review
Based on a priori probability – all options equally likely – not based on previous experience or data
30 / 100 = .30
Based on frequency data (percent of times at bat that resulted in hits)
Based on frequency data (percent of pages that are “fake”)
Homework review
5 / 50 = .10
Based on frequency data (percent of students who chose to be Economics majors)
Mean = 50, standard deviation = 4
Area between 44 and 55:
z = (44 - 50) / 4 = -1.5; z of 1.5 = area of .4332
z = (55 - 50) / 4 = +1.25; z of 1.25 = area of .3944
.4332 + .3944 = .8276
Area above 55:
z = (55 - 50) / 4 = +1.25; z of 1.25 = area of .3944
.5000 - .3944 = .1056
Area between 52 and 55:
z = (52 - 50) / 4 = +.5; z of .5 = area of .1915
z = (55 - 50) / 4 = +1.25; z of 1.25 = area of .3944
.3944 - .1915 = .2029
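The table lookups above can be cross-checked in code. This is a minimal sketch, assuming the mean of 50 and standard deviation of 4 that appear in the z-score arithmetic; the helper names `z_score` and `area_mean_to_z` are made up for illustration. Results can differ from the printed z table in the fourth decimal place because the table values are rounded before they are added or subtracted.

```python
import math

def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd

def area_mean_to_z(z):
    """Area under the standard normal curve between the mean (z = 0) and z.

    This is the quantity a "mean to z" table lists; it uses the sign-free |z|.
    """
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

mean, sd = 50, 4  # read off the worked problems (assumed to apply to all three)

z_44 = z_score(44, mean, sd)  # -1.5
z_52 = z_score(52, mean, sd)  # +0.5
z_55 = z_score(55, mean, sd)  # +1.25

# Scores on opposite sides of the mean: add the two areas.
between_44_and_55 = area_mean_to_z(z_44) + area_mean_to_z(z_55)
# Upper tail: half the curve minus the area from the mean to z.
above_55 = 0.5 - area_mean_to_z(z_55)
# Scores on the same side of the mean: subtract the smaller area.
between_52_and_55 = area_mean_to_z(z_55) - area_mean_to_z(z_52)

print("between 44 and 55:", round(between_44_and_55, 4))
print("above 55:         ", round(above_55, 4))
print("between 52 and 55:", round(between_52_and_55, 4))
```

The add-or-subtract rule is the part worth remembering: add areas when the two scores straddle the mean, subtract when they fall on the same side.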
Homework review
Mean = 2708, standard deviation = 650
Area above 3000:
z = (3000 - 2708) / 650 = 0.45; z of 0.45 = area of .1736
.5000 - .1736 = .3264
Area between 3000 and 3500:
z = (3000 - 2708) / 650 = 0.45; z of 0.45 = area of .1736
z = (3500 - 2708) / 650 = 1.22; z of 1.22 = area of .3888
.3888 - .1736 = .2152
Area between 2500 and 3500:
z = (2500 - 2708) / 650 = -.32; z of -0.32 = area of .1255
z = (3500 - 2708) / 650 = 1.22; z of 1.22 = area of .3888
.3888 + .1255 = .5143
Homework review
Mean = 15, standard deviation = 3.5
Area above 20:
z = (20 - 15) / 3.5 = 1.43; z of 1.43 = area of .4236
.5000 - .4236 = .0764
Area below 20:
z = (20 - 15) / 3.5 = 1.43; z of 1.43 = area of .4236
.5000 + .4236 = .9236
Area between 10 and 12:
z = (10 - 15) / 3.5 = -1.43; z of -1.43 = area of .4236
z = (12 - 15) / 3.5 = -0.86; z of -.86 = area of .3051
.4236 - .3051 = .1185
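As an alternative to looking up areas in a table, the same three answers can be approximated with the cumulative distribution function from Python's standard library. This is a sketch under the assumption that the problems use a mean of 15 and a standard deviation of 3.5, as the z-score arithmetic indicates; the variable names are illustrative. Because no z value is rounded to two decimals along the way, the results differ slightly from the hand-worked answers.

```python
from statistics import NormalDist

# Normal curve with the mean and SD taken from the worked problems (assumed).
dist = NormalDist(mu=15, sigma=3.5)

# cdf(x) is the total area to the LEFT of x, so no "mean to z" step is needed.
below_20 = dist.cdf(20)
above_20 = 1 - dist.cdf(20)
between_10_and_12 = dist.cdf(12) - dist.cdf(10)

print("above 20:         ", round(above_20, 4))
print("below 20:         ", round(below_20, 4))
print("between 10 and 12:", round(between_10_and_12, 4))
```

The table method and the CDF method describe the same areas; the CDF simply measures everything from the far left of the curve instead of from the mean.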
Comments on Dan Gilbert Reading
Law of large numbers: As the number of measurements increases, the data become more stable and a better approximation of the true (theoretical) probability.
As the number of observations (n) increases, or the number of times the experiment is performed, the estimate will become more accurate.
Law of large numbers: As the number of measurements increases, the data become more stable and a better approximation of the true signal (e.g. mean).
As the number of observations (n) increases, or the number of times the experiment is performed, the signal will become clearer (static cancels out).
With only a few people any little error is noticed
(becomes exaggerated when we look at whole group)
With many people any little error is corrected
(becomes minimized when we look at whole group)
http://www.youtube.com/watch?v=ne6tB2KiZuk
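The law of large numbers described above can be watched in a small simulation. This is a minimal sketch, assuming a fair coin (true probability of heads = .50) as the hypothetical experiment; the function name `running_proportion` and the seed are arbitrary choices for illustration.

```python
import random

def running_proportion(n_flips, seed=0):
    """Proportion of heads observed after each flip of a simulated fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = 0
    proportions = []
    for i in range(1, n_flips + 1):
        if rng.random() < 0.5:  # counts as "heads" half of the time
            heads += 1
        proportions.append(heads / i)
    return proportions

props = running_proportion(100_000)

# With few flips the estimate bounces around; with many it settles near .50.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7} flips: proportion of heads = {props[n - 1]:.4f}")
```

Changing the seed changes the early values a lot and the late values very little, which is the "static cancels out" idea in numeric form.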
Sampling distributions of sample means
versus frequency distributions of individual scores
Distribution of raw scores: an empirical probability distribution of the values from a sample of raw scores from a population
Frequency distributions of individual scores
• derived empirically
• we are plotting raw data
• this is a single sample
[Figure: take a single score from the population and repeat over and over; the raw scores are plotted as a frequency distribution, with individual scores labeled Preston, Eugene, and Melvin]
Sampling distribution: a theoretical probability distribution of the possible values of some sample statistic that would occur if we were to draw an infinite number of same-sized samples from a population
Important note: “fixed n”
Sampling distributions of sample means
• theoretical distribution
• we are plotting means of samples
[Figure: take a sample from the population and get its mean (mean for 1st sample); repeat over and over]
[Figure: distribution of means of samples]
Frequency distributions of individual scores
• derived empirically
• we are plotting raw data
• this is a single sample
Sampling distributions of sample means
• theoretical distribution
• we are plotting means of samples
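The contrast between the two kinds of distribution can be sketched in code. The population below (10,000 scores, roughly normal, mean 100, SD 15), the sample size of 25, and the 2,000 repetitions are all hypothetical choices for illustration. A true sampling distribution is theoretical and assumes an infinite number of samples, so the simulated version is only an approximation of it.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population of individual scores.
population = [rng.gauss(100, 15) for _ in range(10_000)]

n = 25  # the "fixed n": every sample has the same size

# Frequency distribution of individual scores: ONE sample, raw data.
single_sample = rng.sample(population, n)

# Sampling distribution of sample means: take a sample, keep only its mean,
# and repeat over and over.
sample_means = [
    statistics.fmean(rng.sample(population, n)) for _ in range(2_000)
]

print("raw scores in one sample: mean =",
      round(statistics.fmean(single_sample), 2),
      " SD =", round(statistics.stdev(single_sample), 2))
print("means of 2,000 samples:   mean =",
      round(statistics.fmean(sample_means), 2),
      " SD =", round(statistics.stdev(sample_means), 2))
```

Each value in `single_sample` is one person's score, while each value in `sample_means` summarizes a whole sample, which is why the second list is much less spread out.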
[Figure: frequency distribution of individual scores (e.g. Eugene, Melvin) beside a sampling distribution of sample means (e.g. 2nd sample, 23rd sample)]
Sampling distribution for continuous distributions
Central Limit Theorem: If random samples of a fixed n are drawn from any population (regardless of the shape of the population distribution), as n becomes larger, the distribution of sample means approaches normality, with the overall mean approaching the theoretical population mean.
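The "regardless of the shape of the population" part can be explored with a deliberately lopsided population. This is a sketch, assuming an exponential population with mean 1 (a strongly right-skewed shape chosen purely for illustration) and using skewness as a rough gauge of how far a shape is from the symmetric normal curve; the helper names are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Skewness of a list of numbers: about 0 for a symmetric shape."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def draw_sample_means(n, n_samples, rng):
    """Means of n_samples random samples, each of fixed size n."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]

rng = random.Random(42)  # fixed seed so the run is repeatable
skew_by_n = {}
mean_by_n = {}
for n in (1, 5, 30, 100):
    means = draw_sample_means(n, 3_000, rng)
    skew_by_n[n] = skewness(means)
    mean_by_n[n] = statistics.fmean(means)
    print(f"n = {n:>3}: mean of sample means = {mean_by_n[n]:.3f}, "
          f"skewness = {skew_by_n[n]:.3f}")
```

With n = 1 the "sample means" are just raw scores and keep the population's skew; as n grows the skewness shrinks toward 0 while the mean of the sample means stays near the population mean of 1.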
[Figure: Distribution of Raw Scores (individual scores such as Melvin and Eugene) versus Sampling Distribution of Sample Means (each point is a sample mean, e.g. 2nd sample, 23rd sample)]
Notice: the standard error of the mean (SEM) is smaller than the standard deviation (SD), especially as n increases
Distribution of raw scores (individual scores such as Eugene and Melvin): Mean = 100, Standard Deviation = 3 (µ = 100, σ = 3)
An example of a sampling distribution of sample means (each point is a sample mean, e.g. 2nd sample, 23rd sample): Mean = 100, Standard Error of the Mean = 1 (µ = 100, SEM = 1)
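The numbers in this example fit the usual relationship SEM = σ / √n. The slide does not state the sample size, so the n = 9 below is an inference from σ = 3 and SEM = 1, not something given in the lecture; the function name `standard_error` is illustrative.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: population SD divided by the square root of n."""
    return sigma / math.sqrt(n)

sigma = 3  # population standard deviation from the example

# SEM shrinks as n grows; n = 9 reproduces the SEM of 1 shown in the example.
for n in (1, 4, 9, 36, 100):
    print(f"n = {n:>3}: SEM = {standard_error(sigma, n):.2f}")
```

Note that quadrupling the sample size only halves the standard error, because n sits under a square root.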
Central Limit Theorem
Proposition 1: If sample size (n) is large enough (e.g. 100), the mean of the sampling distribution will approach the mean of the population. (As n ↑, x̄ will approach µ)
Proposition 2: If sample size (n) is large enough (e.g. 100), the sampling distribution of means will be approximately normal, regardless of the shape of the population. (As n ↑, curve will approach normal shape)
Proposition 3: The standard deviation of the sampling distribution equals the standard deviation of the population divided by the square root of the sample size. As n increases SEM decreases. (As n ↑, curve variability gets smaller)
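Propositions 1 and 3 can be checked numerically. This is a rough sketch, assuming a hypothetical flat (uniform) population on the interval from 0 to 1, for which µ = 0.5 and σ = 1/√12; the sample sizes, the 4,000 repetitions, and the seed are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population: uniform between 0 and 1 (flat, not normal).
mu = 0.5
sigma = 1 / math.sqrt(12)

results = {}
for n in (4, 25, 100):
    # Draw 4,000 samples of fixed size n and keep each sample's mean.
    means = [
        statistics.fmean([rng.random() for _ in range(n)])
        for _ in range(4_000)
    ]
    observed_mean = statistics.fmean(means)   # Proposition 1: close to mu
    observed_sem = statistics.stdev(means)    # Proposition 3: close to sigma / sqrt(n)
    predicted_sem = sigma / math.sqrt(n)
    results[n] = (observed_mean, observed_sem, predicted_sem)

for n, (m, observed, predicted) in results.items():
    print(f"n = {n:>3}: mean of means = {m:.4f}, "
          f"observed SEM = {observed:.4f}, sigma/sqrt(n) = {predicted:.4f}")
```

The observed and predicted SEM columns should track each other closely, and both should shrink as n increases.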