Section 1.1

Proportions and probabilities
15 out of 81 students surveyed were smokers, so the
proportion of those who smoke is 15/81 = 0.185.
Suppose each student’s name is placed on a card. I
randomly select a card, note if the person is a smoker, and
then return the card and shuffle the deck. After repeating
this a sufficiently large number of times, the long-run
proportion of smokers is what we call the probability that a
randomly selected student will be a smoker.
Connection: The probability of randomly selecting an
individual with a certain characteristic is equal to the
proportion of the population that has the characteristic.
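The card-drawing experiment above can be sketched as a short simulation. This is only an illustration: the function name, the number of draws, and the seed are assumptions, not part of the original example.

```python
import random

def long_run_proportion(n_smokers=15, n_students=81, n_draws=100_000, seed=0):
    """Draw a card with replacement n_draws times; return the share of smokers."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    # One card per student: True marks a smoker, False a non-smoker.
    deck = [True] * n_smokers + [False] * (n_students - n_smokers)
    hits = sum(rng.choice(deck) for _ in range(n_draws))
    return hits / n_draws

population_proportion = 15 / 81
estimate = long_run_proportion()

print(round(population_proportion, 3))  # 0.185
print(estimate)  # close to 0.185, and closer still as n_draws grows
```

With more draws the simulated proportion settles near 15/81, which is the sense in which the probability equals the population proportion.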
Contingency Table
A table that summarizes data for two categorical
variables is called a contingency table.
Row and column proportions
Row proportions are computed using row totals, and
column proportions using column totals.
Row proportions:
            None                Small                Big                 Total
Spam        149/367 = 0.406     168/367 = 0.458      50/367 = 0.136      1.000
Not spam    400/3554 = 0.113    2659/3554 = 0.748    495/3554 = 0.139    1.000
Total       549/3921 = 0.140    2827/3921 = 0.721    545/3921 = 0.139    1.000
Conditional probabilities
• P(None | Spam) = 0.406
• P(None | Not Spam) = 0.113
Marginal probability
• P(None) = 0.140
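As a rough sketch, the row proportions and the marginal probability can be recomputed from the raw counts in the table. The dictionary layout and function names are assumptions made for illustration. The Small / Not spam count is taken as 2659, the value that agrees with the row total 3554 and the column total 2827.

```python
# Counts read off the contingency table (rows: spam status, columns: None/Small/Big).
counts = {
    "Spam": {"None": 149, "Small": 168, "Big": 50},
    "Not spam": {"None": 400, "Small": 2659, "Big": 495},
}

def row_proportions(table):
    """Divide each cell by its row total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: n / total for col, n in cells.items()}
    return result

def column_marginals(table):
    """Divide each column total by the grand total."""
    grand = sum(sum(cells.values()) for cells in table.values())
    cols = list(next(iter(table.values())))
    return {col: sum(table[row][col] for row in table) / grand for col in cols}

row_props = row_proportions(counts)
marginals = column_marginals(counts)

print(round(row_props["Spam"]["None"], 3))      # P(None | Spam) = 0.406
print(round(row_props["Not spam"]["None"], 3))  # P(None | Not spam) = 0.113
print(round(marginals["None"], 3))              # P(None) = 0.14
```

Each row of `row_props` sums to 1, matching the 1.000 entries in the Total column.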
Column proportions:
            None               Small                Big                Total
Spam        149/549 = 0.271    168/2827 = 0.059     50/545 = 0.092     367/3921 = 0.094
Not spam    400/549 = 0.729    2659/2827 = 0.941    495/545 = 0.908    3554/3921 = 0.906
Total       1.000              1.000                1.000              1.000
Conditional probabilities
• P(Spam | None) = 0.271
• P(Spam | Small) = 0.059
• P(Spam | Big) = 0.092
Marginal probability
• P(Spam) = 0.094
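The column proportions can be checked the same way. This is a minimal sketch under the same assumptions as the table: the names are illustrative, and the Small / Not spam count is taken as 2659 so that the Small column totals 2827.

```python
# Counts read off the contingency table (rows: spam status, columns: None/Small/Big).
counts = {
    "Spam": {"None": 149, "Small": 168, "Big": 50},
    "Not spam": {"None": 400, "Small": 2659, "Big": 495},
}

def column_proportions(table):
    """Divide each cell by its column total."""
    cols = list(next(iter(table.values())))
    col_totals = {c: sum(table[r][c] for r in table) for c in cols}
    return {r: {c: table[r][c] / col_totals[c] for c in cols} for r in table}

def row_marginals(table):
    """Divide each row total by the grand total."""
    grand = sum(sum(cells.values()) for cells in table.values())
    return {r: sum(cells.values()) / grand for r, cells in table.items()}

col_props = column_proportions(counts)
spam_marginal = row_marginals(counts)["Spam"]

print(round(col_props["Spam"]["None"], 3))   # P(Spam | None) = 0.271
print(round(col_props["Spam"]["Small"], 3))  # P(Spam | Small) = 0.059
print(round(col_props["Spam"]["Big"], 3))    # P(Spam | Big) = 0.092
print(round(spam_marginal, 3))               # P(Spam) = 0.094
```

Note that P(Spam | None) = 0.271 differs from P(None | Spam) = 0.406: the two conditional probabilities divide the same count, 149, by different totals.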
Area of histograms
• 10% of the data (2 out of 20) is in the 240 to 280 range.
• 10% of the total area of the histogram is in the 240 to 280 range.
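The link between data share and area share can be illustrated with a small sketch. The 20 values and the bin edges below are hypothetical, made up so that 2 values fall in the 240 to 280 range; they are not the data behind the original histogram.

```python
# Hypothetical data: 20 values, 2 of which lie in the 240 to 280 bin.
values = [90, 110, 125, 130, 140, 150, 155, 165, 170, 175,
          180, 185, 190, 195, 205, 210, 220, 230, 250, 270]
edges = [80, 120, 160, 200, 240, 280]  # assumed equal-width bins

def bin_counts(data, edges):
    """Count the values in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return counts

counts = bin_counts(values, edges)
# Area of each bar = height (count) times bin width.
areas = [c * (edges[i + 1] - edges[i]) for i, c in enumerate(counts)]

data_share = counts[-1] / len(values)  # share of data in 240 to 280
area_share = areas[-1] / sum(areas)    # share of total area in 240 to 280

print(data_share, area_share)  # 0.1 0.1
```

With equal-width bins, every bar's area is its count times the same width, so the share of area in a range matches the share of data in that range.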