Advanced Probability and Statistics Module 6
This module covers topics from chapters 7 & 8 that haven’t been covered previously, as well as some additional topics developed herein
(including some very cool probability, set theory, and logic). And away we go.
Some of the people at Urbana High School are musicians (elements of set M), some are athletes (A), some are members of clubs (C), and,
sadly, some are none of these. Supposition #1: There are 100 musicians, 250 athletes, 75 club members, and 1300 total students.
Supposition #2: There are 30 kids involved in music and sports and clubs; 40 kids are musician athletes; 34 of the club kids don’t do
sports; and, of the club members who are musicians, 5/6 of them play sports.
1. Draw a Venn diagram with labels to represent A, M, and C. Put set labels on the outsides of the overlapping circles since putting a label
on the inside would imply that the set represents only a portion of the circle. You do not have to include numbers in your diagram but
for your own purposes you’ll want to do this on scratch paper.
2. Because of the overlap in the three student categories, these categories are not _______ _______ .
3. Working only with supposition #1, determine the minimum and maximum numbers of students who are neither athletes, nor musicians,
nor clubbers.
4. Using suppositions #1 and #2, find how many students are neither athletes, nor musicians, nor clubbers.
5. In terms of student activities, what does M’ represent, where the prime indicates the complement of the set?
6. Find #(M’) and, for a randomly chosen student x, find P(x ∈ M’).
7. Write an equation that relates #(M), #(A), #(M ∩ A), and #(M ∪ A), where these could be sets of musicians and athletes or any sets at
all. Use symbols rather than numbers and start with #(M ∪ A) = …. Explain why you have to add and subtract.
8. Restate your equation in words (in terms of: and’s, or’s, musicians, and athletes).
9. What is the corresponding probability equation?
10. Write an equation that relates #(M), #(A), #(C), #(M ∩ A), #(M ∩ C), #(A ∩ C), #(M ∩ A ∩ C), and #(M ∪ A ∪ C). Start with
#(M ∪ A ∪ C) = …. Explain why you have to add, subtract, and add.
In set theory there are two simple laws called De Morgan’s laws. For sets A and B, the first one is (A ∪ B)’ = A’ ∩ B’. The second one is
(A ∩ B)’ = A’ ∪ B’. It’s easy to see why these are true. Informally speaking, the first one simply says that if you’re not in the region that
includes either A or B, then you’re not in A and you’re not in B. To prove it formally we’ll use the fact that sets S and T are equal if they
are subsets of each other. That is, if S ⊆ T and T ⊆ S, then S = T. Here we go. Let x ∈ (A ∪ B)’. This means x ∉ (A ∪ B). Since A ⊆ (A ∪ B),
x ∉ A, meaning x ∈ A’. Similarly, since B ⊆ (A ∪ B), x ∉ B, meaning x ∈ B’. Thus, x ∈ (A’ ∩ B’). Since an arbitrary element of (A ∪ B)’ has
been shown also to be an element of (A’ ∩ B’), this proves (A ∪ B)’ ⊆ (A’ ∩ B’). Now for the other direction. Let y ∈ (A’ ∩ B’). So y ∈ A’
and y ∈ B’. But if y were an element of A ∪ B, then y would be in A or in B, which would be a contradiction. Hence, y ∈ (A ∪ B)’ and
(A’ ∩ B’) ⊆ (A ∪ B)’. Therefore, (A ∪ B)’ = A’ ∩ B’ and the first law is proven.
11. State informally why the second of De Morgan’s laws above is true. A Venn diagram may be helpful.
12. To prove it formally the way I did the first one, you’d have to show that ______ ⊆ ______ and that ______ ⊆ ______ .
13. One way to show that a set S is a subset of a set T is to show that an arbitrary element of ___ is also an element of ___ .
14. A quicker way to prove the second law is to use the first (now that it has been proven). Since the first law holds for any two sets, we
can replace A with A’ and B with B’, yielding (A’ ∪ B’)’ = (A’)’ ∩ (B’)’, which equals A ∩ B. Thus (A’ ∪ B’)’ = A ∩ B, which implies
that A’ ∪ B’ = (A ∩ B)’ and the second law is proven. In this proof I assumed that the complement of the complement of a set is
______ and in the last line of the proof I also assumed that for any sets S and T, S’ = T implies that S = _____ . Both of these
assumptions can be proven, of course, but I won’t require it. Venn diagrams make them intuitively obvious.
15. There are also De Morgan’s laws in logic, and they are completely analogous to the laws in set theory. Rather than sets, in logic we
deal with statements that are either true or false. The first law is ~(p ∨ q) ≡ ~p ∧ ~q. What does this law say in words?
16. What is the corresponding second law in logic?
17. Prove the first De Morgan law (for logic) with a truth table.
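After building your truth table, you can check both logic laws by machine if you like. This optional Python sketch tests every truth assignment:

```python
from itertools import product

# Brute-force check of both De Morgan laws in logic over all four
# truth assignments (a stand-in for the truth table in question 17).
first_law = all((not (p or q)) == ((not p) and (not q))
                for p, q in product([True, False], repeat=2))
second_law = all((not (p and q)) == ((not p) or (not q))
                 for p, q in product([True, False], repeat=2))
print(first_law, second_law)
```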
In probability it is often very convenient to work with the complement of a set. The birthday problem is a classic example. Suppose we
want to know the probability of at least one matching birthday occurring in a roomful of people. To solve this problem directly would be
difficult. Instead, we’ll use the fact that the complement of “there is at least one match” is “there are no matches at all.” The latter is much
easier to work with. Let X be a random variable corresponding to the number of matches (birthday repeats), and let n be the number of
people in the room. The sample space, then, is {0, 1, 2, …, n}, and P(at least one match) = P(X ≥ 1) = 1 - P(X = 0). Here’s how we find
P(X = 0). (Let’s assume we don’t have to deal with leap years and that all birthdays are equally likely.) We ask anyone in the room his/her
birthday, which could be any date. Then we ask a second person. The probability that the second person’s birthday does not match the first
person’s is 364/365. The probability that the third person matches neither the first nor second is 363/365, and the probability that the fourth
doesn’t match any of the previous people is 362/365. Continuing in this manner, the probability that the nth person doesn’t match any of
the previous n – 1 people is (365 – n + 1)/365. The product of these probabilities is the probability that none of the people match any of the
other people.
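The product described above is easy to compute by machine. Here is an optional Python sketch of the calculation, assuming 365 equally likely birthdays and no leap years, as stated:

```python
# Birthday-problem sketch: P(no match among n people), then the
# complement, assuming 365 equally likely birthdays (no leap years).
def p_no_match(n):
    p = 1.0
    for k in range(n):          # k-th person must avoid k used dates
        p *= (365 - k) / 365
    return p

def p_at_least_one_match(n):
    return 1 - p_no_match(n)

print(round(p_at_least_one_match(23), 4))
```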
18. How many people must gather together to ensure there is at least one match? Hint: no calculation necessary.
19. What is the minimum number of people necessary to ensure that there is at least a 50% chance that at least one pair of people at the
party have matching birthdays?
20. Using the same type of reasoning, find the probability of rolling three ordinary dice and getting a sum of 4, 5, 6, …, or 18.
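If you want to verify your complement reasoning for question 20, this optional Python sketch enumerates all 216 outcomes for three dice:

```python
from itertools import product

# Complement trick: the only three-dice sum below 4 is 3 (all ones),
# so P(sum >= 4) = 1 - P(sum = 3) = 1 - 1/216.
outcomes = list(product(range(1, 7), repeat=3))
p_sum_3 = sum(1 for o in outcomes if sum(o) == 3) / len(outcomes)
print(1 - p_sum_3)   # 215/216
```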
Suppose that on a typical Saturday night in Chambana, 35% of the people go out to eat at a restaurant, 40% have pizza delivered, and the
remaining 25% make their own food (like a giant salad with purple cabbage, mixed greens, spinach, tomato, garlic, and olive oil).
Afterwards, 40% of the restaurant people go to a movie, and the rest go home. 80% of the pizza people go roller skating, and the rest go
pogosticking. 39% of the salad people take their dogs for a run; 16% go to bed early; and the rest read a book about space-time physics.
21. Create a probability tree for Chambana Saturday night activities. Set it up similar to how it’s done on page 213. That is, put the
probabilities next to the line segments and the activities in the circles.
22. A random Chambana resident is chosen. What is the probability that he or she has cabbage salad for dinner before settling down to a
good relativity book?
23. What is the probability that the randomly chosen resident has pizza or goes to the movies?
24. A joint probability table is not well suited to analyzing the Saturday night activities. Create your own simpler scenario (in words) and
display the probabilities in a table.
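After drawing your tree, you can check the multiply-along-branches arithmetic with a short optional Python sketch like this one (the event combinations chosen here are just examples to compare against your own answers):

```python
# Probability-tree arithmetic for the Chambana scenario: multiply
# along branches, add across disjoint branches.
p_restaurant, p_pizza, p_salad = 0.35, 0.40, 0.25

p_salad_and_book = p_salad * (1 - 0.39 - 0.16)    # salad, then physics book
p_pizza_or_movie = p_pizza + p_restaurant * 0.40  # disjoint, so just add
print(p_salad_and_book, p_pizza_or_movie)
```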
Some people wrongly think that, after seeing a fair coin land heads four times in a row, odds are that the next flip will be tails, since “tails
are overdue.” They think that five heads in a row is too unlikely, so they would bet on tails for the fifth flip. Clearly, though, coins have no
recollection of how they’ve landed in the past. Even if they did, they have no control over how they land. Heads is just as likely on the fifth
flip as it was on any of the others, so its probability is ½. “But,” objects the skeptic, “you taught me that the probability of getting five
heads in a row with a fair coin is 1/2^5 = 1/32, so I’d be crazy to bet on heads the fifth time!” In all your mathematical wisdom you reply,
“Tis true that the probability of five heads in a row is 1/32, and you would be wise to bet against this from the start. However, we already
know that heads has come up four times, so what we’re really interested in is a conditional probability: the probability of five heads in a
row given that four heads have already occurred. The formula needed is P(A | B) = P(A ∩ B) ÷ P(B).”
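As a quick numeric check of the conditional formula in this coin setting (this uses one natural choice of events A and B, so compare it with your own answer to question 25):

```python
# Conditional-probability check for the coin run: A = five heads in
# a row, B = first four flips are heads. Note A ∩ B = A here.
p_B = (1/2) ** 4        # 1/16
p_A_and_B = (1/2) ** 5  # 1/32
print(p_A_and_B / p_B)  # 0.5
```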
25. Define events A and B appropriately for this coin situation, use the formula, and show the answer is 0.5.
26. From the UHS student activity problem earlier, what is the probability that a randomly chosen musician is an athlete and how does this
compare to a randomly chosen student being an athlete?
27. What is the probability that a randomly chosen club member is either an athlete or a musician?
28. Explain informally why the conditional formula works.
29. It is wrong to conclude that for any events A and B, P(A | B) = P(B | A). Of course, a single counterexample is sufficient to prove a
statement is not true. Use the UHS example or create your own scenario to show this equation is false.
Schmedrick is building a robot. Everything seems to be working, except, due to a programming glitch, when the robot hears a noise above
70 dB, it goes berserk and destroys everything in its vicinity. Since he can’t find the bug in the robotic hearing program, Schmed decides to
install a high-tech acoustic dampener on his robot. According to the manufacturer, a dampener blocks sounds over 70 dB 93% of the time
that it is exposed to noises that loud. Just to be extra safe, Schmed installs three of them. Unfortunately, on its first test run with the dampeners,
the robot accidentally runs over Schmedrick’s cat’s tail. Despite the acoustic dampeners, the cat’s 72 dB shriek causes the robot to freak
out. In the ensuing carnage, most of Schmedrick’s living room is ripped to shreds, and the cat gets eaten, but Schmedrick escapes
unscathed.
30. The fact that the robot went berserk is evidence of the failure of all three acoustic dampeners. Say we define a random experiment to
be observing whether or not a particular dampener works properly when exposed to a sound over 70 dB, and we perform this
experiment 300 times. Then we define a random variable, X, to be the sum of dampener failures. (We could use the same dampener
over and over, or a different one each time, so long as we assume the dampeners are all identical.) Review questions: What is the
expected value of X, which is written E(X), what is the name of the distribution X has, and what is P(X = 28)?
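For checking your review answers, here is an optional Python sketch of the binomial computations, assuming a 7% failure chance per exposure (the complement of the manufacturer’s 93% claim):

```python
from math import comb

# Binomial sketch for the dampener experiment: n = 300 trials,
# failure probability p = 1 - 0.93 = 0.07, X = number of failures.
n, p = 300, 0.07
expected = n * p                                  # E(X) = np
p_28 = comb(n, 28) * p**28 * (1 - p)**(n - 28)    # P(X = 28)
print(expected, round(p_28, 4))
```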
31. Now let’s hook up three identical dampeners together and put them on the robot. Thus, the only way for the robot to go nuts is for all
three dampeners to fail when exposed to a loud sound. Let’s think of success/failure of each dampener as an event. If we assume that
the failure or success of any one dampener has no effect on the other two, then the three events are ______ , and the probability of all
three failing together is ______ .
32. From the answer to the last question, it is clear that Schmedrick is either very, very unlucky, or he made a bad assumption. Bad
assumptions like these can lead to a false sense of security when it comes to backup systems. (This was an issue with the space shuttle
Challenger that exploded in 1986.) What bad assumption did Schmed most likely make? Explain briefly.
33. Let event A = temp in Urbana is over 90, B = Sox beat the Yankees, and C = snow shovels are selling like hotcakes at Farm & Fleet.
Let P(A) = a, P(B) = b, and P(C) = c. Which of the following is true and why? P(A ∩ B) = ab; P(A ∩ C) = ac; P(B ∩ C) = bc.
34. Give an example of a pair of dependent events in which one causes the other. Then come up with a pair of dependent events in which
there is no causation.
35. The “Three-Door Problem” on page 233 is sometimes called the “Monty Hall Problem” (after Monty Hall, the host of Let’s Make a Deal). Let’s
assume Monte never opens the door with the car behind it. One way to explain why you should switch is that there are only two ways
to win: you pick and stick, or you pick and switch. Since you can’t do both, and you can’t do anything else, the probabilities of these
two events must sum to one. Clearly, if you just pick and stick your chances of winning are 1/3. This forces the chances of winning
when you pick and switch to be 2/3. When I first heard this puzzle, it seemed like a paradox. My argument is valid. However, the
analysis in the book should convince any skeptics. Read the analysis and tell me if it makes sense to you, whether or not you think it’s
a cool problem, and what you would have done as a contestant on the show if you hadn’t already been told the answer.
36. Give me one example of each type of probability: theoretical, empirical, and subjective.
37. What’s the difference between mutually exclusive events and independent events? Explain without formulae.
38. Prove that the two equivalent definitions of independence on page 230 are indeed equivalent. It’s a short proof.
39. Technically, “probability” is not the exact same thing as “odds.” Example: the probability of rolling a 4 on a fair die is 1/6, while the
odds are 5:1 against it happening (5 unfavorable outcomes compared to one favorable outcome). In other words, you’d expect a 4 one
out of every six rolls, but you’d expect to see five “non-fours” for every one 4. If a boxer’s odds of winning a fight are rated at 8:3 in
his favor, what is the probability of his winning?
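If the 2/3 answer to the Three-Door Problem still feels paradoxical, an optional Monte Carlo sketch in Python can make it concrete (the door numbering and the host’s tie-breaking choice are modeling assumptions here):

```python
import random

# Simulate the three-door game: the car is behind a random door, and
# the host always opens a goat door that you didn't pick.
def play(switch, rng):
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

rng = random.Random(1)
trials = 100_000
wins = sum(play(True, rng) for _ in range(trials))
print(wins / trials)   # close to 2/3 when you always switch
```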
In the past we’ve dealt with weighted averages. For example, you might create a 1 to 10 rating system for rock bands based on
showmanship (s), musical skills (m), and song writing ability (w), where you assign each variable a value from 1 to 10. If you feel each of
those criteria is equally important, your rating might be defined as (s + m + w)/3, but if showmanship doesn’t matter much to you, and
you feel song writing is a little more important than musical skills, you might define the rating as (3s + 10m + 12w)/25. This is a weighted
average. Note that the coefficients 3/25, 10/25, and 12/25 sum to one, just like probabilities do. In statistics, expected value is a weighted
average of outcomes. The “weights” are the probabilities of their respective outcomes. Here’s an example analogous to the rock band
rating. Let’s say you have 25 ping pong balls: 3 red, 10 blue, and 12 white. The game is that a ball is drawn at random and you get 80
points for a red ball, 40 points for a blue ball, and 60 points for a white one. The expected number of points upon drawing a ball is (3×80 +
10×40 + 12×60)/25 = 54.4. Notice how similar this is to the rating formula. Obviously, it is impossible to get 54.4 points on a single draw,
but after many draws, you should expect to get, on average, 54.4 points per draw. Notice also that the formula for expected value is
(probability of getting 80 on a draw × 80 points) + (probability of getting 40 on a draw × 40 points) + (probability of getting 60 on a draw
× 60 points). In general, for a random variable X, the expected value of X = E(X) = Σ[x·f(x)] for all possible outcomes x, where f(x) is the
probability density function. In the ping pong example, x can be 40, 60, or 80 points; f(40) = 10/25, f(60) = 12/25, and f(80) = 3/25. E(X) is
also called the mean of the distribution and is denoted by μ.
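The ping-pong expected value can be written directly from the definition E(X) = Σ x·f(x). An optional Python sketch:

```python
# Expected value as a weighted average, using the ping-pong example:
# f(40) = 10/25, f(60) = 12/25, f(80) = 3/25.
pdf = {40: 10/25, 60: 12/25, 80: 3/25}
ev = sum(x * p for x, p in pdf.items())
print(ev)   # 54.4
```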
40. Suppose you’re going to roll a fair six-sided die many times. Since all outcomes are equally likely, we have a uniform distribution
(horizontal line, rather than a bell curve or some other curve). Write a concise, simple formula for the probability density function.
41. Use the mathematical definition to show that the expected value for the die is 3.5.
42. You’re about to take a 100 question, multiple choice, history test. You’re doing so well in the class you decide you’re not even going
to read the questions. You simply guess on each one. Each question is worth one point, no penalty for guessing. There are five choices
per question. What is your expected value for the number of points you earn? Hint: no formula necessary for this one.
43. Some standardized tests do penalize guessing. Usually, for multiple choice tests with 5 choices per question, a correct answer
earns one point, while an incorrect answer earns -1/4 point (a quarter point deduction). If you guess on the same history test, how many
points would you expect now?
44. Explain why it makes sense for tests to be weighted as they are in the last question.
45. Suppose you’re taking the same history test but you’re not doing well in the class and you haven’t studied much. You don’t know any
answers for certain, but you know enough to eliminate exactly one choice from each question. Explain in words and with a calculation
why it makes sense for you to guess.
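Here is an optional Python sketch of the per-test expected scores for questions 42, 43, and 45, to check your hand calculations against:

```python
# Expected scores for guessing on the 100-question test under the
# three scoring schemes described above (per-question EV times 100).
no_penalty = 100 * (0.2 * 1)                    # 1/5 chance, no deduction
with_penalty = 100 * (0.2 * 1 + 0.8 * -0.25)    # -1/4 for each wrong guess
one_removed = 100 * (0.25 * 1 + 0.75 * -0.25)   # guess among 4 choices
print(no_penalty, with_penalty, one_removed)
```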
46. Remember the Excel simulation from a couple modules back—the one in which you simulated the number of rolls required to get the
same outcome back to back? What would you do to approximate the expected value in this situation?
47. To compute the expected value for the back-to-back experiment, let X = # of rolls required to get doubles. The range of X is [2, ∞).
E(X) = 1·P(X = 1) + 2·P(X = 2) + 3·P(X = 3) + ⋯. The tough part is finding each of the infinite probabilities, which is why a simulation
can be so valuable. The first term is zero since it is impossible to roll the same number back to back in only one roll. P(X = 2) = 1/6,
since the first roll can be anything and there is a 1/6 chance that the second roll matches the first. P(X = 3) = 5/36, since the first roll
can be anything, the second cannot match the first, and the third must match the second. So, 1 × 5/6 × 1/6 = 5/36. Explain why
P(X = 4) = 1 × 5/6 × 5/6 × 1/6.
48. Find P(X = x) for x > 1.
49. What is the limit of P(X = x) as x approaches infinity?
50. The terms in the summation for the expected value of X are x·P(X = x). Use a graphing calculator or L’Hopital’s rule to find the limit
of the sequence of these terms (not the series).
51. The limit in the last problem justifies using a finite number of terms to approximate E(X). Make Excel estimate E(X) by entering a
formula for an arbitrary term in the summation, filling down to see the first 50 terms, and then summing those terms. Don’t forget that
the formula holds only for x > 1.
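In place of Excel, the same 50-term estimate can be sketched in Python. This assumes the pattern suggested in question 47 continues, i.e., P(X = x) = (5/6)^(x−2) · (1/6) for x ≥ 2, which is what question 48 asks you to establish:

```python
# Sum the first 50 terms of E(X) = Σ x·P(X = x) for the back-to-back
# dice experiment, assuming P(X = x) = (5/6)**(x-2) * (1/6) for x >= 2.
terms = [x * (5/6) ** (x - 2) * (1/6) for x in range(2, 52)]
print(round(sum(terms), 3))   # the partial sums approach E(X) = 7
```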
52. Variance, and hence standard deviation, can also be defined in terms of expected values. The variance, σ², is defined as
σ² = E[(X - μ)²], which is Σ[(x - μ)² f(x)]. For example, let Y be the number of spots on a roll of a fair die. You’ve already shown μ =
E(Y) = 3.5. Then σ² = E[(Y – 3.5)²], which is Σ[(y – 3.5)² f(y)], where y = 1, 2, …, 6. Since it’s a fair die, f(y) = 1/6 for all the y values.
So, σ² = [(1 – 3.5)² + (2 – 3.5)² + (3 – 3.5)² + (4 – 3.5)² + (5 – 3.5)² + (6 – 3.5)²] (1/6) = 35/12. Notice that the quantities in
parentheses are deviations, each term is the square of a deviation, and the variance is the average of the squares of the deviations, just
like we did in Module 2! Note also that we divided by n rather than n – 1, which means this definition is equivalent to what is normally
used for a population rather than a sample of a population. Suppose you repeat the experiment with a die that isn’t fair: the likelihood
of a 2, 3, 4, or 5 is still 1/6, but the likelihood of a 6 is only 0.16. Find μ and σ². Hint: you’ll have to find μ first.
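An optional Python sketch for checking your unfair-die computation (note that P(1) is forced by the requirement that the probabilities sum to one):

```python
# Mean and variance for the unfair die: P(2) = ... = P(5) = 1/6,
# P(6) = 0.16, and P(1) = 1 - 4/6 - 0.16 so the p.d.f. sums to one.
pdf = {2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 0.16}
pdf[1] = 1 - sum(pdf.values())

mu = sum(y * p for y, p in pdf.items())             # E(Y)
var = sum((y - mu) ** 2 * p for y, p in pdf.items())  # E[(Y - mu)^2]
print(round(mu, 4), round(var, 4))
```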
53. I’ve been using the term “fair” quite a bit when it comes to dice. Games involving gambling, in general, can be fair or unfair. The
technical definition of a fair game is one in which the expected value of your winnings is zero. That is, if you play long enough, you
should expect to break even if the game is fair. Lotteries, casino games, etc. are not at all fair. If all the games in a casino were fair, the
casino would expect to take in as much money as it pays out. This is certainly not the case! Someone is paying for all the bright lights,
free drinks, multimillion dollar attractions, labor, etc. Plus the owners want to make a profit on top of all the expenses. This can only
happen by attracting people from miles around to come in and lose money, and on average, they wouldn’t lose money if the games
were mathematically fair. Being unfair does not imply cheating or deception, though. It means adjusting the payoffs to the odds of
winning so that the house makes money over time. I’ve never been to Vegas, but I believe that in roulette, if you bet $10 on red, you’ll
win $10 if red comes up and lose your $10 if it comes up black. This seems fair, but it’s not. To tilt the odds in its favor, the house has
a couple of numbers that are green. This means P(red) and P(black) are each just under 0.5, which makes all the difference in the
world when thousands upon thousands of people play. Say I offer you a bet on a game: you pay $1 to play, which consists of rolling a
fair die. I give you $5 if you roll a 5. The die is fair, but the game is not. Prove the game is not fair and determine the proper payout
needed to make the game fair.
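An optional Python sketch of the expected-winnings calculation; work out your own proof before comparing:

```python
# Expected winnings for the $1 die game: you net +$4 when a 5 comes
# up (the $5 payout minus your $1 stake) and -$1 otherwise.
ev = (1/6) * 4 + (5/6) * (-1)
print(ev)   # -1/6 per play, so the game is unfair

# A $6 payout on a 5 (net +$5) makes the expected winnings zero:
fair_ev = (1/6) * 5 + (5/6) * (-1)
print(fair_ev)
```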
54. If X is a discrete random variable and f(x) is its probability density function (p.d.f.), then for all x ∈ X, what is the range of f(x) and
what is Σ f(x), summed over all x ∈ X?
In past modules you’ve used the formula Z = (X - μ)/σ. This formula transforms a normal distribution, X, with mean μ and standard
deviation σ into a standard normal distribution, Z, with mean 0 and standard deviation 1. (Recall the soy burger patty mass example.) A
transformation like this allows us to use the table in the front and back covers of the book. Sometimes we use it to find probabilities
involving distributions that are approximately normal, like P(heads comes up ≤ 53 times out of 100 flips), which is really a binomial
distribution, as you have seen. Here are a few important review questions.
55. Assume the number of minutes you work out daily, X, is distributed normally. On average your workouts are 25 minutes long. The
variability in your workout duration can be described by a standard deviation. Let’s say it’s 4 minutes. Recall the 68-95-99.7 rule.
What does this rule mean in terms of how often your workouts are of certain durations?
56. Compute the Z statistic needed to determine the likelihood of a workout being under 24 minutes. This negative Z value tells you the
number of ______ ______ that 24 minutes is below the _____ .
57. P(X ≤ 24) = P(Z ≤ ___).
58. What is the probability that a random workout is over a half hour?
59. How many minutes above and below the mean are required to ensure that the probability of a random workout being within those
times is 0.4?
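For questions 56–58 you should use the Z table, but this optional Python sketch (using the standard library’s NormalDist) lets you verify your table lookups:

```python
from statistics import NormalDist

# Workout-length sketch: X ~ Normal(mean = 25, sd = 4) as in question 55.
X = NormalDist(mu=25, sigma=4)

z = (24 - 25) / 4           # Z stat for question 56
p_under_24 = X.cdf(24)      # equals the standard normal cdf at z
p_over_30 = 1 - X.cdf(30)   # question 58: workout over a half hour
print(z, round(p_under_24, 4), round(p_over_30, 4))
```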
Let’s finish off this module with the Central Limit Theorem. It’s a little confusing at first, so let’s approach it
through an example. The table shows the number of goals that all members of a soccer team scored during a season.
For example, four different girls each scored a total of six goals throughout the season. First some preliminary
questions:
Goals     | 0 | 1 | 2 | 3 | 4 | 5  | 6 | 7 | 8
Frequency | 2 | 0 | 7 | 6 | 9 | 13 | 4 | 0 | 1

60. How many girls are on the team?
61. What is the average number of goals scored, μ?
62. What is the probability of choosing a random player who has scored 5 goals?
63. Is the goal distribution normal, uniform, some other symmetric distribution, or none of these?
64. Let’s pick a player at random and note how many goals she scored. To do this I’ll have Excel generate a random integer from 1 to 42,
inclusive, which will correspond to a number of goals scored according to the table below. Explain why this is an appropriate way to
perform this experiment.

random int | 1-2 | 3-9 | 10-15 | 16-24 | 25-37 | 38-41 | 42
goals      |  0  |  2  |   3   |   4   |   5   |   6   |  8
65. I set up a spreadsheet to simulate picking 10 random players (independently and with replacement). The computer then determines
how many goals each scored (based on the random int) and finds the average number of goals. For my first trial Excel gave me
random ints of {4, 42, 13, 40, 5, 16, 1, 38, 36, 28} which correspond to scoring goals of {2, 8, 3, 6, 2, 4, 0, 6, 5, 5} with the mean
number of goals for this trial being 4.1 (slightly more than the actual mean for the team). My first trial, therefore, was a random
experiment with a random variable, X1, defined to be the mean number of goals for 10 girls chosen randomly. I then repeated this
process until I completed 155 trials. Once the spreadsheet was set up, I only had to press F9 to create a new trial, but I had to compile a
list of means by typing. I now had a list of 155 averages, X1 through X155, each an average number of goals scored by 10 randomly
selected players. Explain why all the Xi have the same distribution.
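The spreadsheet procedure above can be mirrored in Python. This optional sketch uses the same 1-to-42 mapping and runs 155 trials of 10 picks each:

```python
import random

# Python analogue of the Excel trials: map a random integer 1-42 to
# a goal count via the table's cutoffs, then average 10 picks per trial.
cutoffs = [(2, 0), (9, 2), (15, 3), (24, 4), (37, 5), (41, 6), (42, 8)]

def goals(n):
    return next(g for hi, g in cutoffs if n <= hi)

rng = random.Random(0)
means = [sum(goals(rng.randint(1, 42)) for _ in range(10)) / 10
         for _ in range(155)]
print(round(sum(means) / len(means), 2))   # should land near the team mean
```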
66. Let X-bar be the mean of all the Xi. Each of my 155 trials yielded a mean from 2.8 to 5.0. The table below shows how often each mean
came up. X-bar is not quite normal, but it’s a heck of a lot closer to being normal than the original distribution of goals. The Central
Limit Theorem says that no matter what kind of distribution you begin with, independent, random sample averages will be distributed
approximately normally. Here I took samples of size 10 from the original distribution. If I had taken samples of size 20, how would my
distribution for X-bar differ? If I had done more than 155 trials, my distribution of means would have come out about the same. It’s the
size of the sample that matters. The only reason I did so many trials is because it takes many trials to get an idea of what the
distribution looks like.

Mean | 2.8 | 2.9 | 3 | 3.1 | 3.2 | 3.3 | 3.4 | 3.5 | 3.6 | 3.7 | 3.8 | 3.9
Freq |  2  |  2  | 2 |  4  |  4  |  11 |  4  |  9  |  14 |  16 |  12 |  9

Mean | 4 | 4.1 | 4.2 | 4.3 | 4.4 | 4.5 | 4.6 | 4.7 | 4.8 | 4.9 | 5
Freq | 6 |  12 |  10 |  10 |  8  |  4  |  2  |  7  |  4  |  1  | 2
67. Find the approximate expected value (the mean) of X-bar using the table above and compare that to the mean, μ, you computed
earlier. They should be pretty close. (The mean of the averages from the table should be about 3.9.) The Central Limit Theorem says
that, if you take a large sample from any distribution, the mean of your sample will be very close to the mean of the original
distribution (the population). I took samples of 10 players. I could have taken samples of 100 players (ok since I’m sampling with
replacement) and the distribution of the averages of the 100-player groups would have been much more normal than it was with the
10-player groups. If I select individuals (1-player groups) the distribution should be about the same as the original, which might not be
normal at all. The larger the group, the more normal the distribution of the group mean. The theorem says that the mean of the
distribution of the group means is the mean of the original distribution (even though the original distribution isn’t necessarily
normal). That is, X-bar should be close to μ. The theorem also states that the variance of the distribution of the group means is σ²/n,
where σ² is the variance of the original distribution and n is the size of the groups. Suppose test scores on that history test had a mean
of 72 points and a standard deviation of 8 points (a variance of 64). It doesn’t matter whether the distribution of scores is
normal or not. From the stack of test papers you draw at random one at a time with replacement until you’ve drawn 100 tests. You
then compute the average of these tests and repeat the process, drawing sets of 100 tests over and over. Eventually you have many
averages, each coming from a set of 100 tests. All of these averages form a distribution of averages. Some averages will come up
much more often than others. According to the theorem, the distribution of your averages should be approximately ______ , it should
have a mean of about ____ , a variance of about ____ , and a standard deviation of about ____ .
68. Since the theorem states that the mean of your test scores is distributed approximately normally, we can compute the Z stat and use the
standard normal table to find probabilities. What is the probability that the mean of your 100 scores is less than 74?
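An optional Python check of this computation, using the CLT’s prediction of mean 72 and standard deviation 8/√100 = 0.8 for the sample mean, so you can verify your Z-table answer:

```python
from statistics import NormalDist

# CLT sketch for the test-score averages: mean 72, sd 8, n = 100,
# so the sample mean is approximately Normal(72, 0.8).
z = (74 - 72) / (8 / 100 ** 0.5)   # Z stat, = 2.5
p = NormalDist().cdf(z)            # P(sample mean < 74)
print(z, round(p, 4))
```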
69. Admittedly, the Central Limit Theorem is tough to get a handle on. The graph on the left shows the uniform p.d.f. for one die. It really
should be six dots all at 1/6, but for simplicity I connected them with a line. Clearly, it’s not normal. Next is the p.d.f. graph for the
average of two dice. Since the sum of two dice ranges from 2 to 12, the average ranges from 1 to 6. Since 7 is the most likely sum, 3.5
is the most likely average. Sums of 2 and 12 are rare, meaning averages of 1 and 6 are unlikely. For three dice the most likely sums are
10 and 11, meaning the most likely mean is 10.5 ÷ 3 = 3.5. Notice how the graph begins to curve now and that extreme averages are
even less likely. Finally, for four dice, the most likely sum is 14, so the most likely mean is 14 ÷ 4 = 3.5. In this case extreme means are
very unlikely, and most of the time a mean will be between 3 and 4. The point is that the distribution of means becomes more normal
as n gets larger, even though we are sampling from a non-normal distribution. The mean of the distribution of means is close to (in
this case the same as) the original mean of 3.5. (Remember 3.5 was the expected value for the roll of a six-sided die.) Furthermore, as
n gets larger, there is less variation in the distribution of means (the standard deviation decreases). If you were to draw a p.d.f. graph
for 5 dice, how would it compare to the 4-dice case in terms of concavity, max height, mean, and flatness at the ends?
[Four p.d.f. graphs, left to right: 1 die, 2 dice, 3 dice, and 4 dice. Each has averages from 1 to 6 on the horizontal axis and probability from 0 to 0.2 on the vertical axis.]
70. Looking at the four distributions above, it is clear that the area beneath the curve decreases as n increases. The total probability,
though, must be one in each case. In the one-die case, the area is exactly one: 6 × 1/6. In the two-dice case, there are 11 possible
averages. The graph is really an approximation of a histogram with 11 rectangles, the height of each rectangle representing the
probability of a particular average of two dice. The sum of those heights is one. With the dice we have a discrete situation. If it were
continuous rather than discrete, we would require the total area under the curve to be equal to one, rather than the sum of the heights of
the rectangles in the histogram. The dice situation is not continuous because we can’t, for example, get an average of 1.6 with two
dice. We can only get averages of 1, 1.5, 2, 2.5, …, 6. In a continuous situation like the diameter of maple trees in Urbana, to find the
probability that a randomly chosen maple is between 17 and 25 inches in diameter, we would have to find the area under the p.d.f.
curve between 17 and 25, and the total area under the curve would be one. The graph of the three-dice case is an approximation to a
histogram with ____ rectangles. The sum of the ______ of those rectangles is not one, but the sum of their ______ is one. The six
lowest possible outcomes (averages) are 1, ___ , ___ , ___ , ___ , and ___ .
71. In a coin game you get 5 points for heads and 2 points for tails. For one flip (n = 1) the p.d.f. histogram would look like a rectangle
with height 0.5 at 2 on the x-axis and an identical rectangle at 5 on the x-axis. This bimodal distribution certainly doesn’t look normal.
For two flips (n = 2), it’s possible to get two heads, two tails, or one of each. So, if X-bar is the average number of points,
P(X-bar = 2) = 0.25, P(X-bar = 3.5) = 0.5, and P(X-bar = 5) = 0.25. Now the p.d.f. is beginning to appear more normal. In this case
it’s as if we randomly selected two amounts of points (with replacement) from a distribution in which 5 points and 2 points were
equally likely to be selected, and the most likely average of the two point amounts is 3.5, which is the mean of the original distribution.
Show that μ = 3.5 for the original (n = 1) distribution using the formula μ = E(X), where X is the number of points.
72. Now calculate the variance, σ², of the original coin distribution using the formula σ² = E[(X - μ)²]. Hint: the answer is 2.25, but show
your work.
73. Calculate μ and σ² for n = 2.
74. The Central Limit Theorem says that no matter what kind of distribution you start with, if you select from that distribution at random
and with replacement n times, then the average of your selections should have an approximately normal distribution with about the
same mean as the original and a variance that is about n times smaller than the original. How well is the theorem doing so far with n =
2 in the coin game?
75. Repeat the last two questions for n = 3. Hint: you should see a lot of symmetry in your calculations.
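Once you’ve done questions 71–75 by hand, this optional Python sketch enumerates all flip sequences and confirms the σ²/n pattern for n = 1, 2, and 3:

```python
from itertools import product

# μ and σ² for the average points over n flips of the 5-or-2 coin;
# the CLT predicts mean 3.5 and variance 2.25/n for every n.
def coin_stats(n):
    means = [sum(f) / n for f in product([5, 2], repeat=n)]
    mu = sum(means) / len(means)
    var = sum((m - mu) ** 2 for m in means) / len(means)
    return mu, var

for n in (1, 2, 3):
    print(n, coin_stats(n))
```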