Chapter 8
Probability Distributions and Statistics

8.1 Distributions of Random Variables
I. Random Variables (r.v.s)
Def. A random variable (r.v.) is a function from a sample space of an experiment to real
numbers
X : S → R,
that is, a random variable X is a rule that assigns a number to each outcome of an experiment.
Ex. (Ex 1, p.418, finite discrete r.v.) A coin is tossed three times. Let the r.v. X denote the
number of heads that occur in the three tosses.
1. List the outcomes
2. Find the value assigned to each outcome
3. Find the event comprising the outcomes with X = 2.
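The three steps of this example can be sketched in a few lines of Python. This is only an illustration; the names `outcomes`, `X`, and `event_two_heads` are chosen here and do not come from the textbook.

```python
from itertools import product

# Step 1: the sample space, all sequences of H/T for three tosses (8 outcomes).
outcomes = ["".join(t) for t in product("HT", repeat=3)]

# Step 2: the r.v. X assigns to each outcome its number of heads.
X = {outcome: outcome.count("H") for outcome in outcomes}

# Step 3: the event {X = 2}, i.e. the outcomes with exactly two heads.
event_two_heads = [o for o in outcomes if X[o] == 2]

for o in outcomes:
    print(o, X[o])
print("Event X = 2:", event_two_heads)
```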
Def. Usually, a r.v. X belongs to one of the three types:
1. X is finite discrete if it assumes only finitely many values (See Ex 1 above).
2. X is infinite discrete if it takes on infinitely many values that can be arranged in a
sequence.
3. X is continuous if the values it may assume comprise an interval of real numbers.
Ex. (Ex 2, p.418, infinite discrete r.v.) A coin is tossed repeatedly until a head occurs. Let
the r.v. Y denote the number of coin tosses in the experiment. What are the values of Y ?
Ex. (continuous r.v.) Let the r.v. X denote the life span of an iPhone. The range of X is
[0, ∞). So X is a continuous random variable.
II. Probability Distribution of a Random Variable
In studying an experiment, it is often more convenient to study the probabilities associated
with the values of a random variable instead of the probabilities associated with the outcomes.
Def. The probability distribution of a random variable X is the collection of P (X = x)
for all possible values x of X.
Ex. (Ex 4, p.419) Find the probability distribution of the r.v. X in Ex 1.
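A minimal sketch of this computation, assuming a fair coin so that the 8 outcomes are equally likely (the variable names are illustrative only):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three tosses (assumption: fair coin).
outcomes = list(product("HT", repeat=3))

# Count how many outcomes give each value of X = number of heads.
counts = Counter(o.count("H") for o in outcomes)

# P(X = x) = (number of outcomes with x heads) / (total number of outcomes).
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```

The probabilities add up to 1, as they must for any probability distribution.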
III. Histograms
Let X be a discrete random variable. Then P (X = x) is called the probability mass
function (p.m.f.) of X. The histogram of X is the graph of the probability distribution
of X. For each possible X value x, we sketch a rectangular bar centered at x and with width
1 and height P (X = x). The area of this bar is exactly P (X = x). (See Figs 1,2,3 on p.421)
Ex. (HW 18, p.424) A survey of 1000 families was conducted to determine the distribution
of families by size. The results follow:
Family Size               2    3    4    5    6    7    8
Frequency of Occurrence   350  200  245  125  66   10   4
1. Find the probability distribution of the r.v. X, the number of persons in a randomly chosen family.
2. Draw the histogram of X.
3. Find P (X ≥ 5) and P (3 ≤ X ≤ 6) from the histogram.
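As a check on parts 1 and 3, the relative frequencies can be computed directly from the survey table. The sketch below takes P(X = x) to be frequency/1000; the variable names are illustrative.

```python
# Survey data from the table: family size and frequency of occurrence.
sizes = [2, 3, 4, 5, 6, 7, 8]
freqs = [350, 200, 245, 125, 66, 10, 4]
total = sum(freqs)  # 1000 families

# Part 1: P(X = x) is the relative frequency of size x.
pmf = {x: f / total for x, f in zip(sizes, freqs)}

# Part 3: add the bar areas (width 1, height P(X = x)) over the relevant values.
p_at_least_5 = sum(p for x, p in pmf.items() if x >= 5)
p_3_to_6 = sum(p for x, p in pmf.items() if 3 <= x <= 6)

print(pmf)
print(round(p_at_least_5, 3), round(p_3_to_6, 3))
```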
HW. 8.1:
SC 1, 2,
EX 3, 5, 13, 15
8.2 Expected Value
I. Average/Mean
Def. The average/mean of numbers x1 , x2 , · · · , xn is
x̄ = (x1 + x2 + · · · + xn) / n.
Ex. (Applied Ex 1, p.428) Find the average number of cars waiting in line at the bank’s drive-in
teller at the beginning of each 2-minute interval during the period in question.
Cars              0  1  2   3   4  5  6  7  8
Freq. of Occur.   2  9  16  12  8  6  4  2  1
(Answer: ((0·2) + (1·9) + · · · + (8·1)) / 60 ≈ 3.1)
II. Expected Value
Def. Let X be a finite discrete r.v. that assumes the values x1 , x2 , · · · , xn with associated
probability p1 , p2 , · · · , pn , then the expected value of a random variable X is
E(X) = x1 p1 + x2 p2 + · · · + xn pn .
Ex. (Ex 2, p.428) In Example 1, let X denote the number of cars waiting in a 2-minute interval.
Then
E(X) = 0 · (2/60) + 1 · (9/60) + · · · + 8 · (1/60) ≈ 3.1.
In the histogram of X (Fig 6, p.429), E(X) is the mass center (i.e. the place of balance).
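The expected value in the bank example can be reproduced exactly with fractions. This is a small sketch using the frequency table from Applied Ex 1; the names are illustrative.

```python
from fractions import Fraction

# Number of cars waiting and how often each count was observed (60 intervals).
cars = range(9)
freqs = [2, 9, 16, 12, 8, 6, 4, 2, 1]
n = sum(freqs)

# p_i = relative frequency; E(X) = x1*p1 + x2*p2 + ... + xn*pn.
probs = [Fraction(f, n) for f in freqs]
expected = sum(x * p for x, p in zip(cars, probs))

print(expected, float(expected))
```

The exact value is 185/60 = 37/12, which rounds to the 3.1 quoted above.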
Ex. (HW6, p.437) In a four-child family, what is the expected number of boys?
Ex. (HW10, p.437) The weekly demand for a certain magazine is:
Quantity Demanded   10   11   12   13   14   15
Probability         .05  .15  .25  .30  .20  .05
Find the number of issues of the magazine that the news-stand owner can expect to sell per
week.
Ex. (Applied Ex 7, p.432, Fair Games) Mike and Bill play a card game with a standard deck
of 52 cards. Mike selects a card from a well-shuffled deck and receives A dollars from Bill if
the card is a diamond; otherwise, Mike pays Bill a dollar. Determine the value of A if the
game is to be fair.
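A worked sketch, using the usual convention that a game is fair when each player's expected net winnings are zero. Here X (a symbol introduced only for this sketch) denotes Mike's net winnings; 13 of the 52 cards are diamonds.

```latex
\[
E(X) = A \cdot \frac{13}{52} + (-1) \cdot \frac{39}{52}
     = \frac{A - 3}{4} = 0
\quad\Longrightarrow\quad A = 3 .
\]
```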
III. Median and Mode
Def. The median of a group of numbers x1 , x2 , · · · , xn arranged in increasing order is
1. the middle number if n is odd, or
2. the mean of the two middle numbers if n is even.
Ex. (Applied Ex 10, p.435)
Def. The mode of a group of numbers is the number in the group that occurs most frequently.
Ex. (Ex 11, p.436)
HW. 8.2:
SC 1, 2
EX 11, 13, 25, 27
8.3 Variance and Standard Deviation
I. Variance
The variance of a random variable X measures the dispersion of the r.v. (Fig 9, p.440)
Def. Suppose a r.v. X has the probability distribution
x            x1  x2  x3  · · ·  xn
P (X = x)    p1  p2  p3  · · ·  pn
and expected value E(X) = µ. Then the variance of the r.v. X is
Var(X) = p1 (x1 − µ)^2 + p2 (x2 − µ)^2 + · · · + pn (xn − µ)^2
       = (p1 x1^2 + p2 x2^2 + · · · + pn xn^2) − µ^2.
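The second form of the variance follows from the first by expanding the squares and using the two facts that the probabilities sum to 1 and that the sum of x_i p_i is µ. A short derivation, written in LaTeX:

```latex
\begin{align*}
\mathrm{Var}(X)
  &= \sum_{i=1}^{n} p_i (x_i - \mu)^2 \\
  &= \sum_{i=1}^{n} p_i x_i^2 \;-\; 2\mu \sum_{i=1}^{n} p_i x_i \;+\; \mu^2 \sum_{i=1}^{n} p_i \\
  &= \sum_{i=1}^{n} p_i x_i^2 \;-\; 2\mu \cdot \mu \;+\; \mu^2 \cdot 1 \\
  &= \Bigl( \sum_{i=1}^{n} p_i x_i^2 \Bigr) - \mu^2 .
\end{align*}
```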
Ex. (Ex 1, p.442) Find the variance of the r.v.s X and Y associated with the histograms in
Fig 9a-b p.440.
II. Standard Deviation
Def. Let a r.v. X have the probability distribution:
x            x1  x2  x3  · · ·  xn
P (X = x)    p1  p2  p3  · · ·  pn
The standard deviation of X is defined by
σ = √Var(X) = √( p1 (x1 − µ)^2 + p2 (x2 − µ)^2 + · · · + pn (xn − µ)^2 )
            = √( (p1 x1^2 + p2 x2^2 + · · · + pn xn^2) − µ^2 ),
where µ = E(X).
Ex. Find the standard deviation of the r.v.s X and Y in Ex 1, p.442.
Ex. (HW 20, p.448) The yearly average rents in the greater Boston area are:
Year
2002 2003 2004 2005 2006
Average Rent, $ 1352 1336 1317 1308 1355
Find the average rent, the variance, and the standard deviation for the 5 years in question.
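A sketch of this computation, assuming each of the 5 years carries probability 1/5 so that the definitions above apply directly (that is, dividing by n, not n − 1). Both variance formulas are evaluated to show that they agree.

```python
from math import sqrt

# Yearly average rents for 2002-2006, in dollars.
rents = [1352, 1336, 1317, 1308, 1355]
n = len(rents)

# Mean: each year is weighted equally (assumption: p_i = 1/5).
mean = sum(rents) / n

# Variance by the definition and by the shortcut formula.
variance = sum((r - mean) ** 2 for r in rents) / n
shortcut = sum(r * r for r in rents) / n - mean ** 2

std_dev = sqrt(variance)
print(mean, round(variance, 2), round(std_dev, 2))
```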
HW. 8.3:
SC 1, 2
EX 7, 9, 17, 29
8.4 The Binomial Distribution
An important class of experiments has exactly two outcomes, e.g. win/lose, defective/nondefective,
head/tail, etc. For example, see Ex 7, p.432.
Def. A Bernoulli trial is an experiment with two outcomes named as success and failure.
The probability of success is a fixed number p (so the probability of failure is q = 1 − p).
Def. A binomial experiment consists of n independent, identical Bernoulli trials, in which
each Bernoulli trial has two outcomes: “success” with probability p and “failure” with probability q = 1 − p.
Ex. (Ex 1, p.453) A fair die is rolled four times. Compute the probability of obtaining exactly
one 6 in the four throws. (Answer: .386)
Def. Let the r.v. X denote the number of successes in a binomial experiment of n trials (each
with success rate p). X is called a binomial random variable. The probability distribution
of X is called a binomial distribution, labeled by B(n, p).
Ex. In Ex 1, the binomial r.v. X has the binomial distribution B(4, 1/6). The probability to
get exactly one 6 is
P (X = 1) = C(4, 1) (1/6)^1 (5/6)^3 ≈ .386.
Thm 8.1. (P (X = x) for X ∼ B(n, p)) In a binomial experiment in which the success rate
is p, the probability of exactly x successes in n independent trials is
P (X = x) = C(n, x) p^x q^(n−x),   where q = 1 − p.
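Theorem 8.1 translates directly into code. The function below is a minimal sketch (the name `binomial_pmf` is chosen here); `math.comb(n, x)` computes C(n, x). It is applied to the die example with n = 4 and p = 1/6.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): C(n, x) * p^x * q^(n - x) with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# Exactly one 6 in four rolls of a fair die.
p_one_six = binomial_pmf(1, 4, 1 / 6)
print(round(p_one_six, 3))

# The probabilities over x = 0, ..., n add up to 1.
total = sum(binomial_pmf(x, 4, 1 / 6) for x in range(5))
```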
Ex. (Ex 2, p.455) A fair die is rolled five times. If a 1 or a 6 lands uppermost in a trial, then
the throw is considered a success. Otherwise, the throw is considered a failure.
1. Find the probability of obtaining exactly 0, 1, 2, 3, 4, and 5 successes, respectively.
(Answer: P (X = 0) = .132, P (X = 1) = .329, P (X = 2) = .329, P (X = 3) = .165,
P (X = 4) = .041, P (X = 5) = .004.)
2. Construct the binomial distribution for this experiment and draw the histogram.
Table 1 of Appendix D gives Binomial Probabilities.
Ex. (HW 24, p.460) 40% of the people who are browsing in Kramer’s Book Mart will make
a purchase. What is the probability that, among 10 people who are browsing in the store, at
least three will make a purchase?
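For "at least three" it is easier to subtract the complement: P(X ≥ 3) = 1 − P(X = 0) − P(X = 1) − P(X = 2). A sketch, assuming the 10 browsers decide independently so that X ∼ B(10, 0.4):

```python
from math import comb

n, p = 10, 0.4  # 10 browsers, each buys with probability 0.4 (assumed independent)

def pmf(x):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Complement rule: subtract the probabilities of 0, 1, or 2 purchases.
p_at_least_3 = 1 - sum(pmf(x) for x in range(3))
print(round(p_at_least_3, 4))
```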
Thm 8.2. Suppose a binomial r.v. X ∼ B(n, p). Let q = 1 − p.
• The mean (expected value) of X is µ = E(X) = np.
• The variance of X is Var(X) = npq.
• The standard deviation of X is σX = √(npq).
Ex. (HW 20, p.460) Let the r.v. X denote the number of girls in a five-child family. If the
probability of a female birth is .5,
1. Find the probability of 0, 1, 2, 3, 4, and 5 girls in a five-child family.
2. Construct the binomial distribution and draw the histogram.
3. Compute the mean and the standard deviation of X.
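The formulas of Thm 8.2 can be checked against the definitions of E(X) and Var(X) from Sections 8.2 and 8.3. The sketch below does this for the five-child family, X ∼ B(5, 0.5); the names are illustrative.

```python
from math import comb, sqrt

n, p = 5, 0.5
q = 1 - p

# Parts 1-2: the binomial distribution P(X = x), x = 0, ..., 5.
pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

# Part 3: mean and variance from the definitions ...
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum(px * (x - mean) ** 2 for x, px in enumerate(pmf))
std_dev = sqrt(variance)

# ... which agree with the shortcuts np and npq of Thm 8.2.
print(pmf)
print(mean, n * p, variance, n * p * q, round(std_dev, 3))
```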
HW. 8.4:
SC 1, 2
EX 3, 5, 11, 13, 25, 33
8.5 The Normal Distribution
I. Probability Density Functions (p.d.f.) for Continuous Random Variables
Recall that a continuous random variable is a function X : S → I from a sample
space S to an interval I of real numbers (e.g. the life span of a computer is a continuous r.v.).
Def. The probability density function (p.d.f.) of a continuous r.v. X : S → I is
f (x) = F′(x)
where F (x) = P (X ≤ x) is called the cumulative distribution function (c.d.f.) of X.
Properties:
1. f (x) ≥ 0 for all x, and f (x) = 0 if x is not in I.
2. ∫_{−∞}^{∞} f (x) dx = 1, that is, the area of the region between the graph of f and the x-axis
is equal to 1 (Fig 13).
The p.d.f. plays the same role for a continuous r.v. that the probability mass function plays for
a discrete r.v.: probabilities are areas under its graph, P (a ≤ X ≤ b) = ∫_a^b f (x) dx.
II. Normal Distributions
The most important class of distributions is the class of normal distributions.
Def. A continuous r.v. X is said to follow a normal distribution with mean µ and variance
σ^2, denoted as X ∼ N (µ, σ^2), if X has the p.d.f.
f (x) = (1 / (σ √(2π))) e^(−(x − µ)^2 / (2σ^2))
(you are not required to memorize this formula for this course).
The graph of a normal distribution is called a normal curve.
Properties of the normal curve of N (µ, σ^2):
1. The curve is bell-shaped (Fig 15, p.463).
2. The curve has a peak at x = µ. It is symmetric w.r.t. the line x = µ.
3. The area under the curve is 1.
4. 68.27% of the area under the normal curve lies within [µ − σ, µ + σ], 95.45% of the area
lies within [µ − 2σ, µ + 2σ], 99.73% of the area lies within [µ − 3σ, µ + 3σ].
Figs 16 and 17 (p.464) show some different normal curves.
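The percentages in property 4 do not depend on µ or σ. One way to check them numerically is through the error function: for any normal X, the area within k standard deviations of the mean equals erf(k/√2). This identity is a standard fact that the notes do not state, and the function name `area_within` is chosen here for illustration.

```python
from math import erf, sqrt

def area_within(k):
    """Area under any normal curve over [mu - k*sigma, mu + k*sigma]."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, f"{100 * area_within(k):.2f}%")
```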
Def. The normal distribution N (0, 1) (with mean 0 and variance 1) is called the standard
normal distribution. The p.d.f. of a r.v. X ∼ N (0, 1) is
f (x) = (1 / √(2π)) e^(−x^2 / 2).
Important: Table 2 of Appendix D gives the probabilities P (Z < z) (or P (Z ≤ z))
of the standard normal distribution Z ∼ N (0, 1).
Ex. (Ex 1, p.465) Let Z ∼ N (0, 1). Sketch the appropriate regions and find the values of
a. P (Z < 1.24)
b. P (Z ≥ 0.5)
c. P (0.24 < Z < 1.48)
d. P (−1.65 < Z < 2.02)
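The table values can be cross-checked with Python's standard library, where `statistics.NormalDist().cdf(z)` plays the role of Table 2, i.e. it returns P(Z ≤ z). A sketch; the variable names a to d mirror the four parts.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

a = Z.cdf(1.24)                  # P(Z < 1.24)
b = 1 - Z.cdf(0.5)               # P(Z >= 0.5) = 1 - P(Z < 0.5)
c = Z.cdf(1.48) - Z.cdf(0.24)    # P(0.24 < Z < 1.48)
d = Z.cdf(2.02) - Z.cdf(-1.65)   # P(-1.65 < Z < 2.02)

print(round(a, 4), round(b, 4), round(c, 4), round(d, 4))
```

Results may differ from the table in the last digit because table entries are rounded.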
Ex. (HW 16, p.470) Let Z be the standard normal variable. Find the values of z if z satisfies
a. P (Z > z) = .9678
b. P (−z < Z < z) = .8354
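Finding z from a probability is a reverse table lookup. In Python this is `NormalDist().inv_cdf`, which returns the z with P(Z ≤ z) equal to the given value. The sketch rewrites each condition in terms of P(Z ≤ z) first.

```python
from statistics import NormalDist

Z = NormalDist()

# a. P(Z > z) = .9678  is the same as  P(Z <= z) = 1 - .9678.
z_a = Z.inv_cdf(1 - 0.9678)

# b. P(-z < Z < z) = .8354; by symmetry P(Z <= z) = (1 + .8354) / 2.
z_b = Z.inv_cdf((1 + 0.8354) / 2)

print(round(z_a, 2), round(z_b, 2))
```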
Thm 8.3. If a r.v. X ∼ N (µ, σ^2), then the r.v. Z = (X − µ)/σ ∼ N (0, 1).
Ex. (HW 18, p.470) Suppose X is a normal random variable with µ = 380 and σ = 20. Find
the value of
a. P (X < 405)
b. P (400 < X < 430)
c. P (X > 400)
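Thm 8.3 reduces each part to a standard normal probability by computing z = (x − µ)/σ. The sketch below does that by hand with a helper named `standardize` (a name chosen here) and, for part a, compares against evaluating N(380, 20^2) directly.

```python
from statistics import NormalDist

mu, sigma = 380, 20
Z = NormalDist()            # standard normal
X = NormalDist(mu, sigma)   # N(380, 20^2), used only as a cross-check

def standardize(x):
    """z-score of x under N(mu, sigma^2), as in Thm 8.3."""
    return (x - mu) / sigma

a = Z.cdf(standardize(405))                             # P(X < 405) = P(Z < 1.25)
b = Z.cdf(standardize(430)) - Z.cdf(standardize(400))   # P(1 < Z < 2.5)
c = 1 - Z.cdf(standardize(400))                         # P(Z > 1)

print(round(a, 4), round(b, 4), round(c, 4))
```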
HW. 8.5:
SC 1, 2
EX 1, 3, 5, 13, 17, 19
8.6 Applications of Normal Distribution
I. Applications involving Normal Random Variables
Ex. (EX 1, p.471) At the Kaiser Memorial Hospital, the infants’ birth weights in pounds are
normally distributed with a mean of 7.4 and a standard deviation of 1.2. Find the probability
that an infant selected at random weighs more than 9.2 pounds at birth.
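A worked sketch of this example: standardize with Thm 8.3, then read P(Z < 1.5) ≈ .9332 from the standard normal table.

```latex
\[
P(X > 9.2) = P\!\left(Z > \frac{9.2 - 7.4}{1.2}\right) = P(Z > 1.5)
           = 1 - P(Z \le 1.5) \approx 1 - .9332 = .0668 .
\]
```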
Ex. (HW 2, p.477) In a certain city, the weekly wages of factory workers are normally distributed with a mean of $600 and a standard deviation of $50. What is the probability that a
factory worker selected at random from the city makes a weekly wage
a. Of less than $600?
b. Of more than $760?
c. Between $550 and $650?
Ex. (HW 10, p.477) The scores on an economics examination are normally distributed with
a mean of 72 and a standard deviation of 16. If the instructor assigns a grade of A to 10% of
the class, what is the lowest score a student may have and still obtain an A?
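This problem runs the usual computation backwards: find z with P(Z ≤ z) = .90, then convert back with x = µ + zσ. A sketch, assuming "an A to 10% of the class" means the top 10% of the normal distribution:

```python
from statistics import NormalDist

mu, sigma = 72, 16

# z-value with 90% of the area to its left (so 10% to its right).
z = NormalDist().inv_cdf(0.90)

# Undo the standardization Z = (X - mu) / sigma.
cutoff = mu + z * sigma

print(round(z, 2), round(cutoff, 1))
```

With the table value z ≈ 1.28, the cutoff is about 72 + 1.28 · 16 ≈ 92.5.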
II. Approximating Binomial Distributions
A binomial random variable X ∼ B(n, p) has the probability mass function:
P (X = x) = C(n, x) p^x q^(n−x),   x = 0, 1, 2, · · · , n,   q = 1 − p.
The mean, variance, and standard deviation of X are:
µX = np,   Var(X) = σX^2 = npq,   σX = √(npq).
When n is large, the histogram of a binomial distribution looks like a normal curve (Figs 30,
31, p.473).
Thm 8.4. When n is large, the binomial distribution B(n, p) may be approximated in the
histogram by a normal distribution with the same mean and the same variance:
B(n, p) ≈ N (np, npq).
Each bar of the binomial histogram at an integer x covers the interval [x − 0.5, x + 0.5], so
if X ∼ B(n, p) and Y ∼ N (np, npq), then for any integer x ∈ [0, n],
P (X ≥ x) ≈ P (Y ≥ x − 0.5),
P (X > x) = P (X ≥ x + 1) ≈ P (Y ≥ x + 0.5);
P (X ≤ x) ≈ P (Y ≤ x + 0.5),
P (X < x) = P (X ≤ x − 1) ≈ P (Y ≤ x − 0.5).
Ex. (HW 14, p.478) A fair coin is tossed 20 times. What is the probability of obtaining
a. Fewer than 8 heads? b. More than 6 heads? c. Between 6 and 10 heads inclusive?
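With n = 20 the exact binomial probabilities are still easy to compute, so this example is a convenient check on how well the approximation with the 0.5 correction works. The sketch below computes each answer both ways; the helper name `exact` is chosen here.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5
Y = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))  # N(np, npq)

def exact(lo, hi):
    """Exact P(lo <= X <= hi) for X ~ B(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(lo, hi + 1))

# Each pair is (exact binomial value, normal approximation).
fewer_than_8 = (exact(0, 7), Y.cdf(7.5))                 # P(X < 8) = P(X <= 7)
more_than_6 = (exact(7, n), 1 - Y.cdf(6.5))              # P(X > 6) = P(X >= 7)
six_to_ten = (exact(6, 10), Y.cdf(10.5) - Y.cdf(5.5))    # P(6 <= X <= 10)

for name, pair in [("a", fewer_than_8), ("b", more_than_6), ("c", six_to_ten)]:
    print(name, round(pair[0], 4), round(pair[1], 4))
```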
Ex. (HW 15, p.478) A marksman’s chance of hitting a target with each of his shots is 60%.
If he fires 30 shots, what is the probability of his hitting the target
a. At least 20 times?
b. Fewer than 10 times?
c. Between 15 and 20 times, inclusive?
Ex. (HW 21, p.478, optional) An experiment is conducted to test the effectiveness of a new drug
in treating a certain disease. The drug was administered to 50 mice that had been previously
exposed to the disease. It was found that 35 mice subsequently recovered from the disease. It
has been determined that the natural recovery rate from the disease is 0.5.
a. Determine the probability that 35 or more of the mice not treated with the drug would
recover from the disease.
b. Using the results obtained in part (a), comment on the effectiveness of the drug in the
treatment of the disease.
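A sketch of part (a), assuming the 50 untreated mice recover independently with probability 0.5, so that X ∼ B(50, 0.5) is approximated by Y ∼ N(25, 12.5) with the 0.5 correction:

```python
from math import sqrt
from statistics import NormalDist

n, p = 50, 0.5
mu = n * p                       # np = 25
sigma = sqrt(n * p * (1 - p))    # sqrt(npq) = sqrt(12.5)

# P(X >= 35) is approximated by P(Y >= 34.5).
z = (34.5 - mu) / sigma
p_35_or_more = 1 - NormalDist().cdf(z)

print(round(z, 2), round(p_35_or_more, 4))
```

For part (b), a probability this small means that 35 recoveries would be very unlikely from natural recovery alone, which is the kind of evidence the question asks you to comment on.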
HW. 8.6:
SC 1, 2
EX 1, 5, 11, 13, 17