
Introduction to Statistics for the Social Sciences
SBS200, COMM200, GEOG200, PA200, POL200, or SOC200
Lecture Section 001, Fall, 2014
Room 120 Integrated Learning Center (ILC)
10:00 - 10:50 Mondays, Wednesdays & Fridays.
http://www.youtube.com/watch?v=oSQJP40PcGI
Labs
Remember:
• Bring electronic copy of your data (flash drive or email it to yourself)
• Your data should have correct formatting
• See Lab Materials link on class website to double-check that the formatting of your Excel file is exactly consistent
Schedule of readings
Before next exam (September 26th)
Please read chapters 1 - 4 in Ha & Ha textbook
Please read Appendix D, E & F online
On syllabus this is referred to as online readings 1, 2 & 3
Please read Chapters 1, 5, 6 and 13 in Plous
Chapter 1: Selective Perception
Chapter 5: Plasticity
Chapter 6: Effects of Question Wording and Framing
Chapter 13: Anchoring and Adjustment
Reminder
Use this as your
study guide
By the end of lecture today
9/17/14
Characteristics of a distribution
Central Tendency
Dispersion
Primary types of “measures of central tendency”?
Mean
Median
Mode
Measures of variability
Range
Standard deviation
Variance
Memorizing the four definitional formulae
Homework due – Monday (September 22nd)
On class website: please print and
complete homework worksheets #6 & 7
Overview
Frequency distributions
Challenge yourself as we work through
characteristics of distributions to try to
categorize each concept as a measure of
1) central tendency
2) dispersion or 3) shape
Mean, Median,
Mode, Trimmed Mean
Standard deviation,
Variance, Range
Mean Absolute Deviation
Skewed right, skewed left
unimodal, bimodal, symmetric
The normal curve
Another example:
How many kids in your family?
Number of kids in family: 1, 3, 1, 4, 2, 4, 2, 8, 2, 14
Measures of Central Tendency
(Measures of location)
The mean, median and mode
Mean: The balance point of a distribution. Found
by adding up all observations and then dividing
by the number of observations
Mean for a sample:
Σx / n = mean = x̄ (x-bar)
Mean for a population:
ΣX / N = mean = µ (mu)
Measures of “location”
Where on the number line
the scores tend to cluster
Note: Σ = add up
x or X = scores
n or N = number of scores
Mean for a sample, worked example:
Number of kids in family: 1, 4, 3, 2, 1, 8, 4, 2, 2, 14
Σx / n = 41 / 10 = 4.1 = x̄
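The worked example above can be checked with a short Python sketch. The function name sample_mean is made up for illustration; the data are the ten family sizes from the slide.

```python
# Number of kids in each family, taken from the lecture example.
kids = [1, 4, 3, 2, 1, 8, 4, 2, 2, 14]


def sample_mean(scores):
    """Return Σx / n: add up all scores, then divide by the number of scores."""
    return sum(scores) / len(scores)


total = sum(kids)              # Σx
n = len(kids)                  # n
mean_kids = sample_mean(kids)  # x-bar

print(total, n, mean_kids)     # 41 10 4.1
```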
How many kids are in your family?
What is the most common family size?
Number of kids in family: 1, 3, 1, 4, 2, 4, 2, 8, 2, 14
Median: The middle value when observations are
ordered from least to most (or most to least)
Unordered: 1, 3, 1, 4, 2, 4, 2, 8, 2, 14
Ordered: 1, 1, 2, 2, 2, 3, 4, 4, 8, 14
The two middle values of the ordered scores are 2 and 3
If there appear to be two medians, take the mean of the two
Median = (2 + 3) / 2 = 2.5
Median always has a percentile rank of 50% regardless of shape of distribution
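As a minimal sketch of the rule above (order the scores, take the middle one, or the mean of the two middle ones when n is even), here is the same calculation in Python. The function name median is our own; Python's built-in statistics.median follows the same rule.

```python
def median(scores):
    """Middle value of the ordered scores; mean of the two middle values if n is even."""
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


kids = [1, 3, 1, 4, 2, 4, 2, 8, 2, 14]

print(sorted(kids))   # [1, 1, 2, 2, 2, 3, 4, 4, 8, 14]
print(median(kids))   # 2.5
```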
Mode: The value of the most frequent observation
Number of kids in family: 1, 3, 1, 4, 2, 4, 2, 8, 2, 14
Bimodal distribution: If there are two most frequent observations
Score   f
1       2
2       3
3       1
4       2
5       0
6       0
7       0
8       1
9       0
10      0
11      0
12      0
13      0
14      1
Please note:
The mode is “2” because
it is the most frequently
occurring score. It occurs
“3” times. “3” is not the
mode, it is just the
frequency for the value
that is the mode
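To make the difference between the mode and its frequency concrete, here is a small Python sketch that builds the frequency table from the raw scores. It uses collections.Counter from the standard library; the variable names are our own.

```python
from collections import Counter

kids = [1, 3, 1, 4, 2, 4, 2, 8, 2, 14]

# Frequency table: score -> how many times it occurs (f).
freq = Counter(kids)
for score in sorted(freq):
    print(score, freq[score])

# most_common(1) returns the single (score, frequency) pair with the highest f.
mode_value, mode_frequency = freq.most_common(1)[0]
print(mode_value, mode_frequency)   # 2 3  (mode is 2; it occurs 3 times)
```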
What about central tendency
for qualitative data?
Mode is good for nominal or ordinal data
Median can be used with ordinal data
Mean can be used with interval or ratio data
A little more about
frequency distributions
An example of a normal distribution
Measure of central tendency: describes how scores tend to
cluster toward the center of the distribution
Normal distribution
In all distributions:
mode = tallest point
median = middle score
mean = balance point
In a normal distribution:
mode = mean = median
Positively skewed distribution
In a positively skewed distribution:
mode < median < mean
Negatively skewed distribution
In a negatively skewed distribution:
mean < median < mode
Note: mean is most affected by outliers or skewed distributions
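The family-size data used earlier in the lecture happen to be positively skewed (one family has 14 kids), so they can illustrate the ordering above. This is only a sketch using Python's standard statistics module; dropping the value 14 is our own illustration of how strongly one outlier pulls the mean.

```python
import statistics

kids = [1, 3, 1, 4, 2, 4, 2, 8, 2, 14]

mode_kids = statistics.mode(kids)      # most frequent score
median_kids = statistics.median(kids)  # middle score
mean_kids = statistics.mean(kids)      # balance point
print(mode_kids, median_kids, mean_kids)   # 2 2.5 4.1

# Remove the outlier (14) and compare how far each measure moves.
without_outlier = [x for x in kids if x != 14]
mean_shift = mean_kids - statistics.mean(without_outlier)
median_shift = median_kids - statistics.median(without_outlier)
print(mean_shift > median_shift)           # True: the mean moves more
```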
Mode: The value of the most frequent observation
Bimodal distribution: Distribution with two most
frequent observations (2 peaks)
Example: Ian coaches two boys’ baseball teams. One team is made up of
10-year-olds and the other is made up
of 16-year-olds. When he measured the
heights of all of his players he found a
bimodal distribution.
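A rough sketch of the coaching example in Python. The heights below are hypothetical numbers invented only for illustration (the lecture gives no data for Ian's teams). statistics.multimode (Python 3.8+) returns every score tied for most frequent.

```python
import statistics

# Hypothetical heights in inches; NOT from the lecture.
ten_year_olds = [54, 55, 55, 55, 56]
sixteen_year_olds = [67, 68, 68, 68, 69]
all_players = ten_year_olds + sixteen_year_olds

# Two scores tie for most frequent, so the distribution has two peaks.
peaks = statistics.multimode(all_players)
print(peaks)   # [55, 68]
```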
Dispersion: Variability
Some distributions are more variable than others
The larger the variability
the wider the curve tends to be
[Figure: three height distributions labeled A, B and C, each on an axis from 5’ to 7’]
The smaller the variability
the narrower the curve tends to be
Range: The difference between
the largest and smallest
observations
Range for distribution A?
Range for distribution B?
Range for distribution C?
Wildcats Basketball team:
Tallest player = 84” (same as 7’0”)
(Kaleb Tarczewski and Dusan Ristic)
Shortest player = 71” (same as 5’11”)
(Parker Jackson-Cartwright)
Fun fact: Mean is 78”
Range: The difference between the largest and smallest scores
xmax - xmin = Range
84” – 71” = 13”
Range is 13”
Wildcats Baseball team:
Tallest player = 77” (same as 6’5”)
(Austin Schnabel)
Shortest player = 70” (same as 5’10”)
(Five players are 5’10”)
Fun fact: Mean is 72”
Range: The difference between the largest and smallest score
xmax - xmin = Range
77” – 70” = 7”
Range is 7”
Please note: No reference is made to numbers between the min and max
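The two range calculations can be sketched in Python as follows. Only the tallest and shortest heights are given in the lecture, and that is all the range needs; the function name score_range is our own.

```python
def score_range(scores):
    """Range = xmax - xmin; every score between the extremes is ignored."""
    return max(scores) - min(scores)


# Heights in inches: tallest and shortest player on each team.
basketball_extremes = [84, 71]
baseball_extremes = [77, 70]

print(score_range(basketball_extremes))   # 13
print(score_range(baseball_extremes))     # 7
```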
Variability
Some distributions are more variable than others
[Figure: distributions of height on an axis from 5’ to 7’. What might this be?]
Let’s say this is our distribution of heights of men on the U of A baseball team
Mean is 6 feet tall
The larger the variability,
the wider the curve and
the larger the deviation
scores tend to be
The smaller the variability,
the narrower the curve and
the smaller the deviation
scores tend to be
Variability
Standard deviation: The average amount by which
observations deviate on either side of their mean
Generally (on average), how far away
is each score from the mean?
Mean is 6’
Let’s build it up again…
U of A Baseball team (axis: 5’8”, 5’10”, 6’0”, 6’2”, 6’4”)
Deviation scores
Diallo is 6’0”
6’0” – 6’0” = 0
Diallo’s deviation score is 0”
Preston is 6’2”
6’2” – 6’0” = 2”
Preston’s deviation score is 2”
Mike is 5’8”
5’8” – 6’0” = -4”
Mike’s deviation score is -4”
Hunter is 5’10”
5’10” – 6’0” = -2”
Hunter’s deviation score is -2”
Shea is 6’4”
6’4” – 6’0” = 4”
Shea’s deviation score is 4”
David is 6’0”
6’0” – 6’0” = 0
David’s deviation score is 0”
Deviation scores
Diallo is 0”
Preston is 2”
Mike is -4”
Hunter is -2”
Shea is 4”
David is 0”
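The deviation scores for the six players can be reproduced with a few lines of Python. Heights are converted to inches (6’0” = 72”), and the mean of 72” is the team mean stated in the lecture; the variable names are our own.

```python
# Player heights in inches (5'8" = 68, 5'10" = 70, 6'0" = 72, 6'2" = 74, 6'4" = 76).
heights = {
    "Diallo": 72,
    "Preston": 74,
    "Mike": 68,
    "Hunter": 70,
    "Shea": 76,
    "David": 72,
}

mean_height = 72  # 6'0", the mean given in the lecture

# Deviation score = score - mean (negative means shorter than the mean).
deviations = {name: height - mean_height for name, height in heights.items()}

for name, deviation in deviations.items():
    print(name, deviation)
```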
Standard deviation: The average amount by which observations deviate on either side of their mean
How do we find the average height?
Σx / N = average height
How do we find the average spread?
Σ(x - µ) / N = average deviation
Σ(x - µ) = ?
5’8” – 6’0” = -4”
5’9” – 6’0” = -3”
5’10” – 6’0” = -2”
5’11” – 6’0” = -1”
6’0” – 6’0” = 0
6’1” – 6’0” = +1”
6’2” – 6’0” = +2”
6’3” – 6’0” = +3”
6’4” – 6’0” = +4”
Σ(x - x̄) = 0
Σ(x - µ) = 0
Big problem: the deviations always sum to zero, so the average deviation, Σ(x - µ) / N, is always zero
Solution: square the deviations
Σ(x - x̄)²
Σ(x - µ)²
Σ(x - µ)² / N
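A short Python sketch of the "big problem" and its fix, using the nine heights from 5’8” to 6’4” (68” to 76”) shown above. Negative and positive deviations cancel, so their sum is zero; squaring makes every term non-negative. Variable names are our own.

```python
# Heights in inches: 5'8", 5'9", ..., 6'4".
heights = list(range(68, 77))

mean_height = sum(heights) / len(heights)          # 72.0, i.e. 6'0"
deviations = [x - mean_height for x in heights]    # -4.0, -3.0, ..., +4.0

# The deviations cancel out exactly.
print(sum(deviations))            # 0.0

# Squaring removes the signs, so the sum no longer cancels.
squared_deviations = [d ** 2 for d in deviations]
print(sum(squared_deviations))    # 60.0
```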
Standard deviation:
The average amount scores
deviate on either side of their mean
Mean:
The average value in the data
Mean is a measure of typical “value”
(where the typical scores are positioned on the number line)
Standard deviation is typical “spread”
(typical size of deviations or distance from mean)
– can never be negative
Standard deviation: The average amount by which observations
deviate on either side of their mean
These would be helpful
to know by heart –
please memorize
these formulae
What do these
two formulae have
in common?
n-1 is “Degrees of Freedom”
More, next lecture
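The formula slides themselves are not reproduced in this text, so the sketch below assumes they are the standard definitional formulae: population variance Σ(x - µ)² / N, sample variance Σ(x - x̄)² / (n - 1), and the square root of each for the standard deviation. The function names are our own, and the data are the six player heights (in inches) from the deviation-score example.

```python
import math

heights = [72, 74, 68, 70, 76, 72]  # Diallo, Preston, Mike, Hunter, Shea, David


def sum_of_squares(scores):
    """Σ(x - mean)²: sum of squared deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def population_variance(scores):
    return sum_of_squares(scores) / len(scores)        # divide by N


def sample_variance(scores):
    return sum_of_squares(scores) / (len(scores) - 1)  # divide by n - 1


def population_sd(scores):
    return math.sqrt(population_variance(scores))


def sample_sd(scores):
    return math.sqrt(sample_variance(scores))


print(sum_of_squares(heights))                 # 40.0
print(round(population_variance(heights), 3))  # 6.667
print(sample_variance(heights))                # 8.0
print(round(population_sd(heights), 3))        # 2.582
print(round(sample_sd(heights), 3))            # 2.828
```

All four share the same numerator, the sum of squared deviations; they differ only in the divisor (N versus n - 1) and in whether the square root is taken.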