1.3 Density Curves and Normal Distributions

Ulrich Hoensch
Tuesday, January 22, 2013
Fitting Density Curves to Histograms
Advanced statistical software (NOT Microsoft Excel) can produce
“smoothed versions” of histograms.
Example: The following are histograms and corresponding density
curves for data representing (a) the acidity of rainwater; (b) the
survival times of guinea pigs.
When fitting a density curve to a histogram, we require that over
any interval of the horizontal axis spanned by a collection of
adjacent rectangles,
area of rectangles ≈ area under density curve.
This requirement follows from the more general fact that for both
histograms and density curves,
area = proportion.
Definition of Density Curve
A density curve is a curve that
- is always on or above the horizontal axis and
- has area exactly 1 underneath it.
In addition, for any two values a and b on the horizontal axis,
area below the density curve between a and b ≈
proportion of observations that fall between a and b.
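As a rough numerical check of "area = proportion", the sketch below draws simulated observations from an assumed example density, f(x) = e^(-x) for x ≥ 0, and compares the proportion of observations between a and b with the area under the curve. The density, sample size, seed, and interval endpoints are illustrative choices, not part of the original notes.

```python
import math
import random

# Assumed example density (not from the notes): f(x) = exp(-x) for x >= 0.
# Its area between a and b has the closed form exp(-a) - exp(-b).
def area_between(a, b):
    return math.exp(-a) - math.exp(-b)

rng = random.Random(1)  # fixed seed so the run is repeatable
data = [rng.expovariate(1.0) for _ in range(10_000)]

a, b = 0.5, 2.0

# Proportion of simulated observations that fall between a and b.
proportion = sum(a <= x < b for x in data) / len(data)

# Area under the density curve between a and b, and under the whole curve.
area = area_between(a, b)
total_area = area_between(0.0, math.inf)

print(f"proportion of observations: {proportion:.3f}")
print(f"area under density curve:   {area:.3f}")
print(f"total area under the curve: {total_area:.1f}")
```

The two values should agree only approximately, which is why the definition uses ≈ rather than =.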
Median of a Density Curve
The median of a density curve is the point M on the horizontal
axis so that the area below the density curve and to the left of M
is 50% (and consequently the area to the right is also 50%).
[Figure: density curve with the median marked; 50% of the area lies on each side.]
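The definition of the median can be turned into a small search: move a candidate point M along the horizontal axis until the area to its left is 0.5. The sketch below does this by bisection for an assumed example density, f(x) = e^(-x) for x ≥ 0 (not from the notes), with the area approximated by the trapezoid rule. The function names are hypothetical helpers.

```python
import math

def density(x):
    # Assumed example density (not from the notes): exp(-x) for x >= 0.
    return math.exp(-x) if x >= 0 else 0.0

def area_left_of(m, steps=2_000):
    """Trapezoid-rule approximation of the area under the density on [0, m]."""
    if m <= 0:
        return 0.0
    h = m / steps
    total = 0.5 * (density(0.0) + density(m))
    for i in range(1, steps):
        total += density(i * h)
    return total * h

def median_of_density(lo=0.0, hi=20.0, tol=1e-6):
    """Bisection: narrow [lo, hi] until the area left of the midpoint is 0.5."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if area_left_of(mid) < 0.5:
            lo = mid   # too little area to the left: move right
        else:
            hi = mid   # enough area to the left: move left
    return (lo + hi) / 2

median = median_of_density()
print(f"numerical median: {median:.4f}")
print(f"exact median ln 2: {math.log(2):.4f}")
```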
Percentiles of a Density Curve
The pth percentile of a density curve is the point P on the
horizontal axis so that p percent of the area below the density
curve lies to the left of P. The first quartile Q1 is the 25th
percentile and the third quartile Q3 is the 75th percentile. The
inter-quartile range Q3 − Q1 is consequently the extent of the
middle 50% of the area.
[Figure: density curve with Q1 and Q3 marked; the middle 50% of the area lies between them.]
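For some density curves the percentile can be written down directly. As an illustration, take the assumed example density f(x) = e^(-x) for x ≥ 0 (not from the notes): the area to the left of P is 1 − e^(-P), so setting this equal to p/100 gives P = −ln(1 − p/100). The helper name below is hypothetical.

```python
import math

def percentile(p):
    """p-th percentile of the assumed density exp(-x), x >= 0, for 0 <= p < 100."""
    return -math.log(1 - p / 100)

q1 = percentile(25)   # first quartile
q3 = percentile(75)   # third quartile
iqr = q3 - q1         # extent of the middle 50% of the area

print(f"Q1  = {q1:.4f}")
print(f"Q3  = {q3:.4f}")
print(f"IQR = {iqr:.4f}")
```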
Mean of a Density Curve
The mean of a density curve is the “balance point” of the curve: if
the area below the curve were made of a solid material, the mean
would be the position of the fulcrum at which it balances.
Mean and Median of a Density Curve
If a density curve is symmetric, the mean equals the median. If the
curve is skewed, the mean is pulled toward the long tail:
- For right-skewed distributions the mean is typically larger than
the median;
- For left-skewed distributions the mean is typically smaller than
the median.
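The balance point of a density curve f can be computed as the integral of x·f(x). The sketch below approximates that integral for an assumed right-skewed example density, f(x) = e^(-x) for x ≥ 0 (not from the notes), and compares the result with the median of the same curve, which is ln 2. The helper names, the integration limit, and the step count are illustrative choices.

```python
import math

def density(x):
    # Assumed right-skewed example density (not from the notes).
    return math.exp(-x) if x >= 0 else 0.0

def mean_of_density(upper=50.0, steps=100_000):
    """Trapezoid-rule approximation of the integral of x * f(x) on [0, upper]."""
    h = upper / steps
    total = 0.5 * (0.0 * density(0.0) + upper * density(upper))
    for i in range(1, steps):
        x = i * h
        total += x * density(x)
    return total * h

mean = mean_of_density()
median = math.log(2)  # exact median of this particular density

print(f"mean   = {mean:.4f}")
print(f"median = {median:.4f}")
```

For this right-skewed curve the mean comes out larger than the median, as the rule of thumb suggests.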
Normal Distributions
Normal curves are the density curves of normal distributions.
They have the following properties.
- They are symmetric, unimodal (have only one peak), and
bell-shaped.
- The mean is denoted by the symbol µ (small Greek letter
“mu”), and the standard deviation is denoted by the symbol
σ (small Greek letter “sigma”).
- On each side of the mean there is a point, called an
inflection point, where the curve changes from bending
downwards (near the mean) to bending upwards (in the tails).
- The standard deviation σ is the horizontal distance from the
mean µ to each of these inflection points.
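The location of the inflection points can be checked numerically: the curve bends downwards where its second derivative is negative and upwards where it is positive. The sketch below estimates the second derivative of a normal density by finite differences just inside and just outside µ ± σ. The values µ = 10 and σ = 2, the step size, and the helper name are illustrative assumptions.

```python
from statistics import NormalDist

mu, sigma = 10.0, 2.0            # assumed example values
curve = NormalDist(mu, sigma)

def second_derivative(x, h=1e-3):
    """Central finite-difference estimate of the density's second derivative."""
    return (curve.pdf(x + h) - 2 * curve.pdf(x) + curve.pdf(x - h)) / h**2

for point in (mu - sigma, mu + sigma):
    # Step 0.1 toward the mean and 0.1 away from it.
    toward_mean = point + 0.1 if point < mu else point - 0.1
    away_from_mean = point - 0.1 if point < mu else point + 0.1
    print(f"near x = {point}: "
          f"inside {second_derivative(toward_mean):+.5f}, "
          f"outside {second_derivative(away_from_mean):+.5f}")
```

A negative value inside and a positive value outside indicates that the bending changes direction at a distance σ from the mean.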
[Figure: two normal curves.]
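The 68-95-99.7 Rule
In a normal distribution with mean µ and standard deviation σ,
approximately
- 68% of the observations fall within σ of the mean µ;
- 95% of the observations fall within 2σ of µ;
- 99.7% of the observations fall within 3σ of µ.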
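The three percentages in the rule's name are the areas under a normal curve within one, two, and three standard deviations of the mean. A minimal sketch of that calculation, using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name and the values µ = 0, σ = 1 are illustrative assumptions, and any other µ and σ give the same proportions.

```python
from statistics import NormalDist

def within_k_sigma(mu, sigma, k):
    """Area under the N(mu, sigma) curve between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} sigma: {within_k_sigma(0.0, 1.0, k):.4f}")
# within 1 sigma: 0.6827
# within 2 sigma: 0.9545
# within 3 sigma: 0.9973
```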
Example: Height of Young Women
The height of young women aged 18 to 24 is approximately
normally distributed with mean µ = 64.5 inches and standard
deviation σ = 2.5 inches.
We write X ∼ N(µ, σ) if a variable X has a normal distribution
with mean µ and standard deviation σ. Consequently, we have
that for the height X of young women, X ∼ N(64.5, 2.5).
[Figure: normal curve for N(64.5, 2.5); horizontal axis from 55 to 70 inches.]
Example: Height of Young Women
Find the following, using a TI-83/TI-83 Plus/TI-84 Plus
calculator: The percentage of women that are between 60 and 65
inches tall.
1. Press [2ND] VARS (DISTR) and select 2: normalcdf(.
2. Type normalcdf(60,65,64.5,2.5) and press ENTER.
3. The proportion is 0.5433 . . ., so the percentage of women who
are between 60 and 65 inches tall is about 54.3%. That is,
the area under the curve between 60 and 65 is 54.3% of the
total.
[Figure: normal curve with the area between 60 and 65 shaded; shaded area 54.3%.]
Note: The general syntax for finding the proportion between a and
b is normalcdf(a,b,µ,σ).
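For readers without a TI calculator, the same proportion can be computed with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The function below is a hypothetical helper that only borrows the calculator's name and argument order; it is not a built-in Python function.

```python
from statistics import NormalDist

def normalcdf(a, b, mu, sigma):
    """Proportion of a N(mu, sigma) distribution that lies between a and b."""
    dist = NormalDist(mu, sigma)
    # Area to the left of b minus area to the left of a.
    return dist.cdf(b) - dist.cdf(a)

p = normalcdf(60, 65, 64.5, 2.5)
print(f"{p:.4f}")  # 0.5433
```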
Example: Height of Young Women
Find the percentage of women who are taller than 62 inches.
1. Press [2ND] VARS (DISTR) and select 2: normalcdf(.
2. Type normalcdf(62,1000000,64.5,2.5). (The number
1000000 can be replaced by any very large positive number.)
3. Press ENTER. The proportion is 0.8413 . . ., so the percentage
of women who are taller than 62 inches is about 84.1%.
[Figure: normal curve with the area to the right of 62 shaded; shaded area 84.1%.]
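In software that exposes the cumulative area directly, the large upper bound is not needed: the area to the right of 62 is 1 minus the area to the left of 62. A minimal sketch with `statistics.NormalDist` (Python standard library, version 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.5, sigma=2.5)

# Area to the right of 62 = total area (1) minus area to the left of 62.
taller_than_62 = 1 - heights.cdf(62)
print(f"{taller_than_62:.4f}")  # 0.8413

# The calculator-style workaround with a very large upper bound agrees.
with_large_bound = heights.cdf(1_000_000) - heights.cdf(62)
print(f"{with_large_bound:.4f}")  # 0.8413
```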
Example: Height of Young Women
What is the cutoff height for the top 10% (i.e., the 90th percentile)?
1. Press [2ND] VARS (DISTR) and select 3: invNorm(.
2. Type invNorm(0.9,64.5,2.5) and press ENTER.
3. The percentile is 67.70 . . ., so 90% of women are shorter than
67.7 inches (and 10% are taller than 67.7 inches).
[Figure: normal curve with the area to the left of 67.7 shaded; shaded area 90%.]
The general syntax for finding the cutoff below which a proportion p
of the observations falls is invNorm(p,µ,σ).
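The counterpart of invNorm in the Python standard library is the `inv_cdf` method of `statistics.NormalDist` (Python 3.8 or later). The wrapper below is a hypothetical helper that borrows the calculator's name and argument order.

```python
from statistics import NormalDist

def invNorm(p, mu, sigma):
    """Cutoff below which a proportion p of a N(mu, sigma) distribution falls."""
    return NormalDist(mu, sigma).inv_cdf(p)

cutoff = invNorm(0.9, 64.5, 2.5)
print(f"{cutoff:.2f}")  # 67.70

# Round trip: the area to the left of the cutoff is 0.9 again.
print(f"{NormalDist(64.5, 2.5).cdf(cutoff):.2f}")  # 0.90
```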
Example: Height of Young Women
Find the range of the middle 80% of the distribution. Since 10% of
the area lies in each tail, we need to find the 10th and the 90th
percentiles.
[Figure: normal curve with the middle 80% of the area shaded.]
1. The 90th percentile was computed above; it is 67.7.
2. Type invNorm(0.1,64.5,2.5) to find the 10th percentile.
It is 61.29 . . . ≈ 61.3. So the middle 80% of the heights
range from 61.3 inches to 67.7 inches.
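The two-percentile approach generalizes to any central proportion: for the middle p of the distribution, (1 − p)/2 of the area lies in each tail. A minimal sketch with `statistics.NormalDist` (Python standard library, version 3.8 or later); the function name is a hypothetical helper.

```python
from statistics import NormalDist

def middle_range(p, mu, sigma):
    """Endpoints enclosing the middle proportion p of a N(mu, sigma) distribution."""
    dist = NormalDist(mu, sigma)
    tail = (1 - p) / 2                 # area left over in each tail
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

low, high = middle_range(0.80, 64.5, 2.5)
print(f"{low:.2f} to {high:.2f}")  # 61.30 to 67.70
```

Because the normal curve is symmetric, the two endpoints are equally far from the mean of 64.5.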