
Introduction to Statistics for the Social Sciences
SBS200 - Lecture Section 001, Fall 2016
Room 150 Harvill Building
10:00 - 10:50 Mondays, Wednesdays & Fridays.
https://www.youtube.com/watch?v=RhyRx42H-EQ
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
By the end of lecture today
10/21/16
Law of Large Numbers
Central Limit Theorem
Before next exam (November 18th)
Please read chapters 1 - 11 in OpenStax textbook
Please read Chapters 2, 3, and 4 in Plous
Chapter 2: Cognitive Dissonance
Chapter 3: Memory and Hindsight Bias
Chapter 4: Context Dependence
Everyone will want to be enrolled in one of the lab sessions
Labs continue next week with Project 3
Homework
On class website:
Homework Assignments 15 & 16
Please complete the homework modules on the D2L website
HW15-Confidence Intervals
Please complete this homework worksheet 16 - Confidence Intervals
Both are due: Monday, October 24th
Central Limit Theorem
Central Limit Theorem: If random samples of a fixed N are drawn
from any population (regardless of the shape of the
population distribution), as N becomes larger, the
distribution of sample means approaches normality, with
the overall mean approaching the theoretical population
mean.
[Figure: Distribution of Raw Scores, with individual scores labeled (Melvin, Eugene), and the Sampling Distribution of Sample Means, with individual sample means labeled (2nd sample, 23rd sample)]
Central Limit Theorem
Proposition 1: If sample size (n) is large enough (e.g. 100), the mean of the sampling distribution will approach the mean of the population.
As n ↑, x̄ will approach µ
Proposition 2: If sample size (n) is large enough (e.g. 100), the sampling distribution of means will be approximately normal, regardless of the shape of the population.
As n ↑, curve will approach normal shape
Proposition 3: The standard deviation of the sampling distribution equals the standard deviation of the population divided by the square root of the sample size. As n increases SEM decreases.
As n ↑, curve variability gets smaller
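The three propositions can be checked with a small simulation. The sketch below is not part of the lecture materials: the skewed (exponential) population, the sample sizes, and the helper name `sample_means` are all assumptions chosen for illustration. It prints numbers for Propositions 1 and 3 only; Proposition 2 (the shape of the curve) is easier to see in the animation linked from these slides.

```python
import random
import statistics

def sample_means(population, n, num_samples, rng):
    """Draw num_samples random samples of size n; return the mean of each sample."""
    return [
        statistics.fmean(rng.choices(population, k=n))
        for _ in range(num_samples)
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical, strongly skewed population (exponential, mean close to 10).
population = [rng.expovariate(1 / 10) for _ in range(20_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)
print(f"population mean = {mu:.2f}, population SD = {sigma:.2f}")

for n in (4, 25, 100):
    means = sample_means(population, n, 1_000, rng)
    print(
        f"n = {n:>3}: mean of sample means = {statistics.fmean(means):.2f}, "
        f"SD of sample means = {statistics.pstdev(means):.2f}, "
        f"sigma / sqrt(n) = {sigma / n ** 0.5:.2f}"
    )
```

As n grows, the mean of the sample means should stay close to the population mean, and the SD of the sample means should track sigma / sqrt(n), which is the standard error of the mean.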
[Animation for creating a sampling distribution of sample means. Panels: Distribution of Raw Scores (e.g. Eugene), Distribution of single sample, and Sampling Distribution of Sample Means (e.g. mean for sample 7, mean for sample 12)]
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
What are the three propositions
of the Central Limit Theorem?
As n goes up …
1. Sample mean approaches
true population mean
2. Curve becomes more “normal”
3. Variability goes down
(includes standard deviation, variance,
width of the curve, and random error)
What is the formula for the
standard error of the mean?
What are confidence intervals for…
Estimating a value (could be a single score, a mean of a sample, or a mean of a population) by providing a range within which we believe (with a certain level of confidence) it falls
Confidence Intervals: What are they used for?
We are estimating a value by providing two scores between which we
believe the true value lies. We can be 95% confident that our mean falls
between these two scores.
95% Confidence Interval: We can be 95% confident that our
population mean falls between these two scores
99% Confidence Interval: We can be 99% confident that our
population mean falls between these two scores
Confidence Intervals: What are they used for?
We use a confidence interval to estimate a value, such as a mean, by giving a range of values with a known degree of certainty
• The interval refers to possible values of the population mean.
• We can be reasonably confident that the population mean falls in this range (90%, 95%, or 99% confident)
• In the long run, a series of intervals like the one we figured out will contain the population mean about 95% of the time.
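The "in the long run" reading can be illustrated by building many intervals and counting how often they contain the true mean. This is a minimal sketch under stated assumptions, not the course's own procedure: the population is assumed normal with a known standard deviation, the values (mean 50, standard deviation 10, n = 100) are borrowed from the worked example in these slides, and the function name `coverage` is hypothetical.

```python
import random
import statistics
from statistics import NormalDist

def coverage(mu, sigma, n, confidence, trials, rng):
    """Fraction of z-based intervals (sigma assumed known) that contain mu."""
    # z that borders the middle `confidence` proportion of the normal curve.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    sem = sigma / n ** 0.5  # standard error of the mean
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = statistics.fmean(sample)
        if sample_mean - z * sem <= mu <= sample_mean + z * sem:
            hits += 1
    return hits / trials

rng = random.Random(42)  # fixed seed so the run is repeatable
for level in (0.90, 0.95, 0.99):
    rate = coverage(mu=50, sigma=10, n=100, confidence=level, trials=2_000, rng=rng)
    print(f"{level:.0%} intervals contained the population mean {rate:.1%} of the time")
```

Each printed rate should land near its nominal confidence level. Any single interval either contains the population mean or it does not; the percentage describes the long-run behavior of the procedure.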
Greater confidence
implies loss of precision.
(95% confidence is most
often used)
Can actually generate
CI for any confidence
level you want –
these are just the
most common
How to find the two scores that border the
middle 95% of the curve up …
[Diagram: Normal distribution, moving between raw scores, z-scores, and area/probability]
Have raw score, find z: use the formula
Have z, find raw score: use the formula
Have z, find area: use the z table
Have area, find z: use the z table
Try this one:
Please find the (2) raw scores that border exactly the middle 95% of the curve
Mean of 30 and standard deviation of 2
Upper score: go to table with area .4750, nearest z = 1.96
mean + z σ = 30 + (1.96)(2) = 33.92
Lower score: go to table with area .4750, nearest z = -1.96
mean + z σ = 30 + (-1.96)(2) = 26.08
Try this one:
Please find the (2) raw scores that border exactly the middle 99% of the curve
Mean of 30 and standard deviation of 2
Upper score: go to table with area .4950, nearest z = 2.58
mean + z σ = 30 + (2.58)(2) = 35.16
Lower score: go to table with area .4950, nearest z = -2.58
mean + z σ = 30 + (-2.58)(2) = 24.84
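The two "Try this one" exercises can also be worked without a printed table. The sketch below uses Python's `statistics.NormalDist` in place of the z table; the function name `middle_bounds` is an assumption made for illustration, not something from the course.

```python
from statistics import NormalDist

def middle_bounds(mean, sd, proportion):
    """Return the (lower, upper) raw scores bordering the middle `proportion` of a normal curve."""
    # Area between the mean and each boundary, e.g. .4750 for the middle 95%.
    half_area = proportion / 2
    # inv_cdf expects the area measured from the far left tail,
    # so add the .5 that lies below the mean.
    z = NormalDist().inv_cdf(0.5 + half_area)
    # raw score = mean + (z)(standard deviation), with z negative for the lower score
    return mean - z * sd, mean + z * sd

for proportion in (0.95, 0.99):
    lower, upper = middle_bounds(mean=30, sd=2, proportion=proportion)
    print(f"middle {proportion:.0%}: {lower:.2f} to {upper:.2f}")
```

For the middle 95% this matches the hand calculation (26.08 to 33.92). For the middle 99% it gives roughly 24.85 to 35.15 rather than 24.84 to 35.16, because the code uses the unrounded z of about 2.576 where the table lookup rounds to 2.58.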
Which is wider?
Please find the raw scores that border the middle 95% of the curve
95% Confidence Interval: We can be 95% confident that the estimated score really does fall between these two scores
Please find the raw scores that border the middle 99% of the curve
99% Confidence Interval: We can be 99% confident that the estimated score really does fall between these two scores
Part 1
Find the scores that border the middle 95%
Mean = 50
Standard deviation = 10
x = mean ± (z)(standard deviation)
1) Go to z table - find z score for area .4750 (the middle .9500 is .4750 on each side of the mean)
z = 1.96
2) x = mean + (z)(standard deviation)
x = 50 + (-1.96)(10)
x = 30.4
3) x = mean + (z)(standard deviation)
x = 50 + (1.96)(10)
x = 69.6
Scores 30.4 - 69.6 capture the middle 95% of the curve
Please note: We will be using this same logic for “confidence intervals”
Construct a 95% confidence interval
Mean = 50
Standard deviation = 10
n = 100
For “confidence intervals”: same logic, same z-score (area .4750 on each side of the mean, .9500 in total)
But we’ll replace standard deviation with the standard error of the mean
standard error of the mean = σ / √n = 10 / √100 = 1
x = mean ± (z)(s.e.m.)
x = 50 + (1.96)(1)
x = 51.96
x = 50 + (-1.96)(1)
x = 48.04
95% Confidence Interval is captured by the scores 48.04 – 51.96
Confidence interval uses SEM
Scores that border the middle of the curve, using z = ±2.58 (middle 99%) and the standard deviation
Upper boundary raw score
x = mean + (z)(standard deviation)
x = 55 + (+ 2.58)(10)
x = 80.8
Lower boundary raw score
x = mean + (z)(standard deviation)
x = 55 + (- 2.58)(10)
x = 29.2
Scores 29.2 - 80.8
Confidence interval, using z = ±2.58 and the standard error of the mean
standard error of the mean = σ / √n = 10 / √49 = 1.43
Upper boundary raw score
x = mean + (z)(standard error mean)
x = 55 + (+ 2.58)(1.43)
x = 58.7
Lower boundary raw score
x = mean + (z)(standard error mean)
x = 55 + (- 2.58)(1.43)
x = 51.3
Scores 51.3 - 58.7
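The confidence interval recipe (same z-score, but with the standard error of the mean in place of the standard deviation) can be wrapped in one small function. This is a sketch under the slides' own assumption that the standard deviation is known and a z-score is appropriate; the function name `z_confidence_interval` is hypothetical.

```python
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, confidence):
    """Return (lower, upper) for a z-based confidence interval around a mean."""
    sem = sd / n ** 0.5  # standard error of the mean = sd / sqrt(n)
    # z that borders the middle `confidence` proportion of the normal curve.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * sem, mean + z * sem

# The two worked examples from the slides.
for mean, sd, n, confidence in ((50, 10, 100, 0.95), (55, 10, 49, 0.99)):
    lower, upper = z_confidence_interval(mean, sd, n, confidence)
    print(f"mean={mean}, sd={sd}, n={n}, {confidence:.0%} CI: {lower:.2f} to {upper:.2f}")
```

The results agree with the hand calculations after rounding. Small differences in the second decimal place can appear because the code uses unrounded values for z and for the standard error.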
Confidence Intervals: A range of values that, with a known degree of certainty, includes an estimated value (like a mean)
• How can we make our confidence interval smaller?
• Decrease variability (make standard deviation smaller)
1. Increase sample size (This will decrease variability of the sample mean)
2. Very careful assessment and measurement practices (improving reliability will minimize noise)
• Decrease level of confidence
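Each of these options can be seen in the width of a z-based interval, which is 2 × z × (standard deviation / √n). The comparison below is an illustrative sketch: the baseline values (standard deviation 10, n = 100, 95% confidence) come from the worked example in these slides, while the alternative values and the function name `interval_width` are assumptions.

```python
from statistics import NormalDist

def interval_width(sd, n, confidence):
    """Full width of a z-based confidence interval: 2 * z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 2 * z * sd / n ** 0.5

baseline = interval_width(sd=10, n=100, confidence=0.95)
print(f"baseline (sd=10, n=100, 95%): {baseline:.2f}")
print(f"less noisy measurement (sd=5): {interval_width(5, 100, 0.95):.2f}")
print(f"larger sample (n=400):         {interval_width(10, 400, 0.95):.2f}")
print(f"lower confidence (90%):        {interval_width(10, 100, 0.90):.2f}")
```

Halving the standard deviation or quadrupling the sample size each cut the width in half, while dropping from 95% to 90% confidence narrows the interval at the cost of being wrong more often.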