Lecture 3
Q1. On this chart what does
each dot represent?
A: A car
Q2. What would be the
interpretation of the number 0.0833?
A: For each additional 1000
miles the gas mileage drops by
0.083 miles per gallon.
Q3. Does the number 48.786
have any meaning here?
A: Yes, it tells us the gas mileage
for a brand new car (i.e. a car
with zero mileage).
Q4. On a scale 0% to 100% how
good is the line fit?
A: About 80-90%
Q1. Does the data exhibit Positive
or negative trend?
A: Negative.
Q2. Is this observed trend
potentially good news or bad news
regarding the fight against cancer?
A: Potentially good news (might
offer another way to influence
cancer).
Q3. What is the slope of the line?
A: -0.015
Q4. What is the interpretation of
this number?
A: For an additional ONE gram fat
consumed daily the age of cancer
detection decreases by 0.015 years.
Q5.The y-intercept is 44.854. What
is the interpretation of this
number? (One line explanation!)
A: For people with near zero fat
diet the expected age of breast
cancer detection is 44.854 years.
Q6. On a scale of 0% to 100% how
good is the line fit?
A: Close to 0%
Reading the chart.
If we have a May with fewer
than 100 tornadoes, then we
expect the coming June to have
considerably fewer tornadoes
as well.
Do you see this on the chart?
Namely, for each dot (year) with
an X-value below 100, the Y-value
was always considerably lower
than the average of all Y-values.
Similarly, the years (i.e. dots)
with a high number of tornadoes
in May had a higher than average
number of tornadoes in June.
Thus the # of tornadoes in May
predicts the # of tornadoes in
June.
Q1. What does each dot on these charts represent? (a tornado, a year, a city, a month)
A: A year
Q2. Which of these two charts offers the better fit?
A: The May – June chart
Q3. If we observe 200 tornadoes in April what would be your (approximate) prediction
for the number of tornadoes in June?
A: About 200
Q4. What is the interpretation (if any) of the number 79.011 on the May vs June chart?
A: If May happens to have zero tornadoes we still expect about 79 tornadoes in June
Regression analysis is a statistical process for estimating the relationships among variables.
What is the role of this R-square?
We use it to judge the “goodness of fit”. The May-June chart has the higher R-square
and thus the better fit.
“Goodness of fit” describes how well the trendline (linear regression line) fits a set
of observations.
Linear regression attempts to model the relationship between two variables by
fitting a linear equation to observed data.
How do we compute this R-square?
A: I am glad you asked.
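As a rough illustration of what "fitting a linear equation" and computing R-square involve, here is a minimal Python sketch. It fits a least-squares line and then computes R-square as 1 minus the ratio of the leftover scatter around the line to the total scatter around the mean. The function names and the five data points are made up for illustration; they are not the tornado data.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys):
    """R-square of the least-squares line: 1 - (scatter around line) / (total scatter)."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data, chosen only so the arithmetic is easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
fit_quality = r_squared(xs, ys)
print(f"y = {slope:.3f}x + {intercept:.3f}, R-square = {fit_quality:.3f}")
```

The closer the points sit to the line, the smaller the leftover scatter and the closer R-square gets to 1.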
• Attendance quiz 1
• Q1: If a person tosses three coins, list the ways the three coins can come up, i.e. list
the possible outcomes.
The actual mathematical formula for R is complicated and beyond the scope
of this book (judge for yourself).
Believe it or not, quite a few standard textbooks actually ask the
students to compute R using this (or a similar) formula.
R-square:
This is just the square of the number R, which
stands for the correlation coefficient between
the variables on the X and Y axes.
This is a well known and all too often abused
quantity. The number R measures the
direction and strength of the linear
“connection”, “relationship” or “correlation”
between two quantities.
R (Pearson’s R) ranges from -1 to 1. The +/- sign gives the direction of the relationship
and the magnitude gives the strength.
For example, a positive R indicates that as
one variable increases so does the other
variable. A negative R indicates that as
one variable increases the other variable
decreases.
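For the curious, one common way to write Pearson's R is: the sum of the products of the deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The sketch below follows that definition. The function name `pearson_r` and the small data sets are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient R for two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: Y rises exactly with X, falls exactly with X, or rises with noise.
perfect_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
perfect_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
noisy_positive = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

print(perfect_positive, perfect_negative, round(noisy_positive, 2))
```

The sign of the result gives the direction of the relationship and its magnitude gives the strength, as described above.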
Although the mathematics behind the correlation coefficient (i.e. R) is complicated, the
intuition should not give us much trouble.
R=0.95 Very strong, near perfect positive correlation
Intuition: Given X, one can offer a very good prediction for Y.
[Scatter plot: R=0.95]
R=0.75 Fairly good positive correlation
Intuition: Given X, we can still offer a decent prediction for Y, although not as good as in the case R=0.95.
[Scatter plot: R=0.75]
R=-0.5 Decent, not too strong, negative correlation
Intuition: The negative sign reflects the negative trend, while the 0.5 tells us that knowing the
X-coordinate does not help much in predicting the Y-coordinate.
[Scatter plot: R=-0.5]
R=0.0032 No correlation (near zero correlation)
Intuition: Having information about the X-coordinate does not help us at all in predicting what the
Y-coordinate would be.
[Scatter plot: R=0.0032]
Breakdown of R^2
from 0 to 0.2 (poor)
from 0.2 to 0.4 (decent)
from 0.4 to 0.6 (good)
from 0.6 to 0.85 (very good)
from 0.85 to 1 (excellent)
Why R-square and not just R?
Good question. When dealing with line fitting, the slope of the line already tells us the direction
(a negative slope means a negative trend). For this reason someone, a long time ago, decided
that we should square R and keep it as information regarding the fit.
Problems?
Yes, by squaring the quantities we lose the intuition. For example, both R-squares in our
earlier example, April/June and May/June, were low (0.11 and 0.2 respectively).
However, these were R-squares, so the actual R’s are 0.33 and 0.45 (taking the square root
of 0.11 and 0.2 respectively), which could be characterized as decent.
Conclusion:
We can see that for May/June the R-square is about twice as big as it is for April/June. This is
expected since the months May and June are next to each other and information from one helps
us predict the other (i.e. X is connected to Y).
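The step from R-square back to R can be sketched in a few lines of Python. The square root only gives the magnitude of R, so the sign has to come from the slope of the trend line. The helper name `r_from_r_squared` is made up for illustration, and both tornado trend lines are assumed here to have a positive slope.

```python
import math


def r_from_r_squared(r_squared, slope):
    """Recover R from R-square, taking the sign from the slope of the trend line."""
    return math.copysign(math.sqrt(r_squared), slope)


# R-square values quoted above for the two tornado charts.
# Assumption: both trend lines slope upward, so any positive slope value will do.
april_june_r = r_from_r_squared(0.11, slope=1.0)
may_june_r = r_from_r_squared(0.2, slope=1.0)

print(f"April/June: R = {april_june_r:.2f}")
print(f"May/June:   R = {may_june_r:.2f}")
```

With a negative slope the same helper returns a negative R, which is why R-square alone cannot tell us the direction of the trend.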
Although the mathematics behind the correlation coefficient (i.e. R) is complicated, the
intuition should not give us much trouble.
R=0.95, R^2 = 0.9025 Very strong, near perfect positive correlation
Intuition: Given X, one can offer a very good prediction for Y.
[Scatter plot: R=0.95]
R=0.75, R^2 = 0.5625 Fairly good positive correlation
Intuition: Given X, we can still offer a decent prediction for Y, although not as good as in the case R=0.95.
[Scatter plot: R=0.75]
Breakdown of R^2
from 0 to 0.2 (poor)
from 0.2 to 0.4 (decent)
from 0.4 to 0.6 (good)
from 0.6 to 0.85 (very good)
from 0.85 to 1 (excellent)
Although the mathematics behind the correlation coefficient (i.e. R) is complicated, the
intuition should not give us much trouble.
R=-0.5, R^2 = 0.25 Decent, not too strong, negative correlation
Intuition: The negative sign reflects the negative trend, while the 0.5 tells us that knowing the
X-coordinate does not help much in predicting the Y-coordinate.
[Scatter plot: R=-0.5]
R=0.0032, R^2 = 0.00001024 No correlation (near zero correlation)
Intuition: Having information about the X-coordinate does not help us at all in predicting what the
Y-coordinate would be.
[Scatter plot: R=0.0032]
Breakdown of R^2
from 0 to 0.2 (poor)
from 0.2 to 0.4 (decent)
from 0.4 to 0.6 (good)
from 0.6 to 0.85 (very good)
from 0.85 to 1 (excellent)
• The R-square has to do with the scatter of the points.
In short, it captures how much randomness is
in the data. This is a very commonly used
statistical quantity, and since it is related to
randomness, it makes sense to touch on
Probability Theory.
• Attendance quiz 2
• Q2: If a person tosses three coins, what is the probability
that the number of heads is equal to 2?
Probability Theory
Q1: If a person tosses three coins, what is the probability that the number of heads is
equal to 2?
A1: First we must discuss the set of all possible outcomes, also called the
sample space.
There are eight different ways for three coins to come up:
Sample space: {(H,H,H), (H,H,T), (H,T,H), (H,T,T), (T,H,H), (T,H,T), (T,T,H), (T,T,T)}, where
T stands for Tail and H for Head.
Now, list the outcomes of interest, i.e. the outcomes with two heads.
Three particular outcomes, {(H,H,T), (H,T,H), (T,H,H)}, have two heads.
Thus the probability is 3-out-of-8 = 3/8.
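The same counting argument can be checked with a short Python sketch: list the whole sample space, keep the outcomes of interest, and divide. The variable names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: every way three coins can come up, e.g. ('H', 'H', 'T').
sample_space = list(product("HT", repeat=3))

# Outcomes of interest: exactly two heads.
two_heads = [outcome for outcome in sample_space if outcome.count("H") == 2]

probability = Fraction(len(two_heads), len(sample_space))
print(len(sample_space), two_heads, probability)
```

`Fraction` keeps the answer as an exact ratio (3/8) instead of a rounded decimal.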
Probability Theory
Q2: If a person tosses three coins, what is the probability that there are more Tails
than Heads?
A2: We just learned that there are 8 possible outcomes and they are: {(H,H,H),
(H,H,T), (H,T,H), (H,T,T), (T,H,H), (T,H,T), (T,T,H), (T,T,T)}, where T stands for Tail and H
for Head.
Now list the outcomes of interest, where there are more Tails than Heads.
Here they are: {(H,T,T), (T,H,T), (T,T,H), (T,T,T)}.
Clearly we have 4 such cases, thus the probability is
4-out-of-8 = 4/8 = 1/2.
Probability Theory
•
Q3: If a person rolls two dice, what is the probability that the sum of the numbers
is equal to 4?
A3: First we must discuss the set of all possible outcomes, also
called the sample space. In this case, it consists of all the pairs of the following type:
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
The sample space contains 36 possible outcomes.
Next we check for which outcomes the sum of the numbers equals 4.
They are: {(1,3),(2,2),(3,1)}.
Thus the answer is 3-out-of-36 = 3/36 = 1/12.
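For two dice the sample space is larger, so letting a short program do the listing can help. The sketch below builds all 36 pairs and counts those that satisfy a condition. The helper name `dice_probability` is made up for illustration.

```python
from fractions import Fraction
from itertools import product


def dice_probability(event):
    """Return the favorable pairs and their probability when rolling two fair dice.

    `event` is a function that takes a pair (first, second) and returns True or False.
    """
    sample_space = list(product(range(1, 7), repeat=2))  # all 36 pairs
    favorable = [pair for pair in sample_space if event(pair)]
    return favorable, Fraction(len(favorable), len(sample_space))


# Event of interest: the two numbers add up to 4.
favorable, probability = dice_probability(lambda pair: pair[0] + pair[1] == 4)
print(favorable, probability)
```

Swapping in a different condition, for example `pair[0] == 3 * pair[1]`, answers other two-dice questions with the same helper.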
Probability Theory
• Q4: If a person rolls two dice, what is the probability that the
number on the first die is three times as large as the number on
the second die?
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
• A4: We just need to list the cases of interest:
• {(3,1), (6,2)}, and observe that there are just two such cases.
• Thus the probability is 2/36 = 1/18.
Recall last lecture: Correlation, R
The actual mathematical formula for R is complicated and beyond the scope
of this book (judge for yourself).
R-square:
This is just the square of the number R, which
stands for the correlation coefficient between
the variables on the X and Y axes.
This is a well known and all too often abused
quantity. The number R measures the
direction and strength of the linear
“connection”, “relationship” or “correlation”
between two quantities.
R (Pearson’s R) ranges from -1 to 1. The +/- sign gives the direction of the relationship
and the magnitude gives the strength.
For example, a positive R indicates that as
one variable increases so does the other
variable. A negative R indicates that as
one variable increases the other variable
decreases.
[Charts: hours of course video watched vs exam marks (left); unmanaged stress vs exam marks (right)]
• The left chart indicates a positive linear relationship. It shows that as the # of
hours a student spends watching the course video increases, the exam marks increase.
• The right chart indicates a negative linear relationship. It shows that as
unmanaged stress increases, the student’s exam marks decrease.
• A researcher investigating the relationship between several factors that may
influence students’ exam marks may want to use a Correlation Matrix.
• A Correlation Matrix is a table showing the correlation coefficients of the variables in a
correlational study.
Correlation Matrix
Question 1: What is the correlation coefficient of completed readings and unmanaged stress? What
does it mean?
Answer: -0.65. It means that the higher a student’s unmanaged stress, the fewer of the course
readings the student tends to complete.
Question 2: Which coefficient gives the closest to a precise prediction?
Answer: -0.93
Question 3: Which correlation is small enough to be of little interest to the researcher?
Answer: 0.29
Question 4: Which correlations have the same strength?
Answer: +0.65 and -0.65
Question 5: Looking at these R values, what could a student do to get a high mark in the exam?
Answer: The student should work on completing quizzes and viewing course videos.
Correlation Matrix
Task 1 Correlation Matrix. Click on Data, then Data Analysis, then Correlation.
Then click on the Input Range box and highlight the desired data. (If the
Data Analysis tab is missing you need to install it; see Lecture 1)
Correlation Matrix
Interpretation:
• The numbers represent the R’s.
• Here, at a glance, we can investigate all
the relations Month_X vs Month_Y.
• For example, the strongest correlation of
0.63 is between July and August.
• However, the June vs October
correlation is 0.53 which is rather
unexpected. These two months are
separated by 120 days and there are no
apparent reasons for such a strong
correlation. And it is exactly these
unexpected situations that are worth
investigating.
• Warning: Correlation matrix gives us R’s
not R-squares!!
Q1: On the above charts, what does each dot represent?
A: A year
Q2: What is the interpretation of 0.1079?
A: For each additional tornado in June we expect about 0.1 more tornadoes in October
(or maybe easier to understand: for TEN additional tornadoes in June we expect
to see ONE more tornado in October)
Q3: What, if any, meaning could we attribute to 22.42?
A: If there are ZERO tornadoes in July we expect to see 22.42 tornadoes in August.
Q4. Which of the two charts offers a better fit? (and why?)
A: July-August, because its R-square is larger.
Breakdown of R^2
from 0 to 0.2 (poor)
from 0.2 to 0.4 (decent)
from 0.4 to 0.6 (good)
from 0.6 to 0.85 (very good)
from 0.85 to 1 (excellent)
Caution!
• Correlation does not indicate causation.
• Correlation only establishes the strength of
the existing relationship; it reflects the
amount of variability that is shared
between two variables.
• Only a well-designed experiment with
proper control groups can prove causation.
• (Not covered in this course)
Example 1
• One study in Victorian England showed a strong correlation between
people wearing top hats, and their life expectancy. This relationship was
shown to be very strong (high ‘r’).
• Does this mean that had Queen Victoria provided free top-hats for all, the
life expectancy in England would have shot up?
• There is a confirmed correlation. However, there is NO causation. That is,
wearing top hats does not cause people to live longer.
So, what’s going on here?
– Answer: There is a lurking variable! In this case, the lurking
variable is income. People with higher incomes could afford doctors
and medicines. These were in no way a given in Victorian England!
– So in this case, while there is correlation between top-hats and life
expectancy, there is no causation.
– However, there would be a causal relationship between Income and
life expectancy.
Another Caution!
• Large R-square does not guarantee that the X
and Y are indeed related. For this particular
issue, Statistics offers a different number. One
that will tell us how likely or what is the
probability, that X and Y are related. And
again, speaking of probability, let us practice a
bit more.
• Probability Theory
• Q1: If a person picks randomly a Month from the above
Tornado data, what is the probability that this will be a
summer month? Summer months: {June, July, August}
A1: There are 12 different months.
Three of them are summer months {June, July, August}.
Thus the probability is
3-out-of-12 = 3/12 = 1/4.
• Probability Theory
Q2: If a person randomly picks a Year from the above Tornado data, what is the
probability that this year will be prior to 1960?
A2: There are 45 years in the Tornado data, from 1950 through 1994.
Exactly ten of those years are prior to 1960, namely the years {1950,1951,…,1959}.
Thus the probability is
10-out-of-45 = 10/45 = 2/9.
• Probability Theory
Q3. If a person picks a pair of Months, where the first one is always chronologically
before the second, how many different pairs are there?
A3. Just count the entries in the Correlation table in Task 1 (and do not count the
diagonal, i.e. the entries equal to “1”). There are 66 of them.
• Probability Theory
Q4. Randomly pick any two (different) months. What is the probability that both
months are summer months?
A4. From the Table, we can see that the sample space consists of 66 pairs of
two different months.
All we need now is to check which of these satisfy the criterion: both are summer months.
Here they are: {(June, July), (June, August), (July, August)}. There are three such
combinations, thus the probability is
3/66 = 1/22.
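The count of 66 pairs and the 3 summer pairs can also be reproduced without the table. The sketch below lists every pair of months in chronological order and counts the pairs made of summer months only. The variable names are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations

months = ["Jan", "Feb", "Mar", "Apr", "May", "June",
          "July", "Aug", "Sept", "Oct", "Nov", "Dec"]
summer = {"June", "July", "Aug"}

# combinations() keeps the list order, so the first month of each pair
# is always chronologically before the second.
month_pairs = list(combinations(months, 2))

# Pairs in which both months are summer months.
summer_pairs = [pair for pair in month_pairs if set(pair) <= summer]

probability = Fraction(len(summer_pairs), len(month_pairs))
print(len(month_pairs), summer_pairs, probability)
```

The number of pairs is 12 × 11 / 2 = 66, which matches the number of off-diagonal entries in the correlation table.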
Task 1: The file is: US_CRime .txt. And for the Scatter plot, X-axis contains
Robberies in New York while Y-axis contains Robberies in California. Obtain the
trend line. Answer the questions:
Q1 What does each dot represent: a
year, a crime or a state?
A: A year
Q2 What is the slope of the trend
line? Interpret the slope based on
the chart.
A: 0.6727
A: For each additional robbery
in NY, an additional
0.6727 robberies are expected in CA.
Q3 What is the highest number of
robberies recorded for California?
(an approximate answer is acceptable)
A: About 130000
Q4 What is the approximate average
number of robberies for New York?
A: About 60000-80000
Task 2: Open Tornadeos .txt. The two months (month X and month Y) for
which we have the strongest correlation are presented (in chronological
order). Answer the questions:
Q1: Each dot represents: a year, a
tornado or a month?
A: A year
Q2 What is the slope of the line and
its interpretation?
A: 0.401
A: For each additional tornado
sighting in July we expect an
increase of 0.401 tornado sightings
in August; or rather, for 10
additional tornado sightings in
July we expect an increase of about 4
tornado sightings in August.
Q3 The Y-intercept is 22.426. What is
the meaning of this number?
A: If a year has ZERO tornadoes in July
we expect to see 22.4 tornadoes in
August.
How did we choose July vs August?
Why not June vs August or May vs October?
• The Correlation table gave us a simultaneous glimpse at all of the 66 connections!
• Every pair month X vs month Y is easily accessible.
• There are a few promising leads, like June vs October and August vs October.
• The highest correlation was observed for July vs August.
       Jan    Feb    Mar    Apr    May    June   July   Aug    Sept   Oct    Nov    Dec
Jan    1
Feb    0.01   1
Mar    0.23   0.21   1
Apr    0.31   -0.22  0.27   1
May    0.29   0.05   0.38   0.37   1
June   0.25   0.26   0.28   0.33   0.45   1
July   0.13   0.15   0.03   -0.02  0.11   0.59   1
Aug    0.01   -0.03  -0.05  0.29   0.13   0.43   0.63   1
Sept   0.39   -0.01  0.15   0.11   0.16   0.33   0.41   0.27   1
Oct    0.05   0.12   0.01   0.46   0.09   0.55   0.44   0.56   0.42   1
Nov    0.17   -0.06  0.01   -0.04  0.12   0.34   0.37   0.24   0.3    0.01   1
Dec    0.33   0.08   0.01   0.15   0.36   0.21   0.08   -0.13  0.27   -0.1   0.17   1
Task 3: The month that has the highest correlation with the total annual
tornadoes is presented. Answer the questions
Q1. This observed positive trend
means that if we observe more
tornadoes for this given month we
will also see more tornadoes for the
whole year. Yes or No?
A: YES
Q2. This observed positive trend
means that as years go by we see
more and more tornadoes. Yes or
No?
A: NO
Q3. What is the month that has this
strongest correlation with the
annual number of tornadoes?
A: June
[Chart: Tornadoes in US. X-axis: June tornadoes; Y-axis: Annual tornadoes. Trend line: y = 14.395x + 414.56, R² = 0.5965]
How did we choose June as the month that best predicts the total amount?
Why not August or May or October?
It is again the Correlation table that
does the trick.
We only need to add another
column (i.e. the total number of
tornadoes).
Warning: although Excel created all
the correlations, we are only
interested in the first column since it
relates to our question: “Finding a
month that is most correlated with
the total number”.
Clearly the month is June and R=0.86
Task 4: Which two months (month X
and month Y) have the strongest
NEGATIVE correlation?
       Total  Jan    Feb    Mar    Apr    May    June   July   Aug    Sept   Oct    Nov    Dec
Total  1
Jan    0.42   1
Feb    0.17   0.01   1
Mar    0.43   0.23   0.21   1
Apr    0.54   0.31   -0.22  0.27   1
May    0.68   0.29   0.05   0.38   0.37   1
June   0.86   0.25   0.26   0.28   0.33   0.45   1
July   0.6    0.13   0.15   0.03   -0.02  0.11   0.59   1
Aug    0.54   0.01   -0.03  -0.05  0.29   0.13   0.43   0.63   1
Sept   0.51   0.39   -0.01  0.15   0.11   0.16   0.33   0.41   0.27   1
Oct    0.55   0.05   0.12   0.01   0.46   0.09   0.55   0.44   0.56   0.42   1
Nov    0.4    0.17   -0.06  0.01   -0.04  0.12   0.34   0.37   0.24   0.3    0.01   1
Dec    0.35   0.33   0.08   0.01   0.15   0.36   0.21   0.08   -0.13  0.27   -0.1   0.17   1
Task 4: For Tornado data the two months (month X and month Y) with the
strongest NEGATIVE correlation are presented. Answer the questions:
Q1: R square is positive but we have
chosen the most negative correlation. How
come?
A: The correlation is R=-0.22, but R square is
never negative. The sign of R is consistent
with the sign of the slope.
Q2: If month X has 50 tornadoes what is
the prediction for Y? (Give the exact number
using the line equation.)
A: 80.05 tornadoes in April.
From: -0.742(50)+117.15 = 80.05
Q3: Based on this analysis, would you say
that this prediction is very reliable,
somewhat reliable or not reliable?
A: Not a very reliable prediction.
Q4 Why?
A: R-square is small and the data on the chart looks
very random. R^2 = 0.0498, and the correlation
R = -0.22 is rather weak, so the # of
tornadoes in Feb is not a good predictor of the
# of tornadoes in April.
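The prediction in Q2 is just the trend line equation evaluated at X = 50. A minimal Python sketch, using the slope and intercept shown on the chart (the function name `predict_april` is made up for illustration):

```python
def predict_april(february_tornadoes, slope=-0.742, intercept=117.15):
    """Predicted April tornadoes from the trend line y = slope * x + intercept."""
    return slope * february_tornadoes + intercept


prediction = predict_april(50)
print(f"Predicted April tornadoes: {prediction:.2f}")
```

The line always produces a number, but with R^2 = 0.0498 that number should not be trusted much, as the answers to Q3 and Q4 point out.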
[Chart: Tornadoes in US. X-axis: February tornadoes; Y-axis: April tornadoes. Trend line: y = -0.742x + 117.15, R² = 0.0498]
R^2 breakdown
from 0 to 0.2 (poor)
from 0.2 to 0.4 (decent)
from 0.4 to 0.6 (good)
from 0.6 to 0.85 (very good)
from 0.85 to 1 (excellent)
Task 5: Compare the # of tornadoes of the months of Jan, Feb,
Mar, Apr, May and June. Which of these months is the most
correlated to Jan?
How would you go about performing this task?
One way is to produce a scatter plot of Jan vs each other month, with trend line and R-square. This
way we end up producing 5 charts before we can find which month is most highly correlated
with Jan. Is there a better way?
Yes: create a correlation matrix and find the month that is most correlated with Jan. Then
produce the chart, trend line and R-square and investigate further.
       Total  Jan    Feb    Mar    Apr    May    June   July   Aug    Sept   Oct    Nov    Dec
Total  1
Jan    0.42   1
Feb    0.17   0.01   1
Mar    0.43   0.23   0.21   1
Apr    0.54   0.31   -0.22  0.27   1
May    0.68   0.29   0.05   0.38   0.37   1
June   0.86   0.25   0.26   0.28   0.33   0.45   1
July   0.6    0.13   0.15   0.03   -0.02  0.11   0.59   1
Aug    0.54   0.01   -0.03  -0.05  0.29   0.13   0.43   0.63   1
Sept   0.51   0.39   -0.01  0.15   0.11   0.16   0.33   0.41   0.27   1
Oct    0.55   0.05   0.12   0.01   0.46   0.09   0.55   0.44   0.56   0.42   1
Nov    0.4    0.17   -0.06  0.01   -0.04  0.12   0.34   0.37   0.24   0.3    0.01   1
Dec    0.35   0.33   0.08   0.01   0.15   0.36   0.21   0.08   -0.13  0.27   -0.1   0.17   1
      Jan       Feb       Mar       Apr       May       June
Jan   1
Feb   0.010717  1
Mar   0.225182  0.209338  1
Apr   0.308953  -0.22309  0.270381  1
May   0.289943  0.045726  0.378989  0.374468  1
June  0.251311  0.257427  0.277147  0.332702  0.45283   1
The month most highly correlated with January
is April, with R = 0.308953, i.e. about 0.31.
More on Probability
Q: One randomly picks two different
months from the Tornado data list, where
the first month is always
chronologically before the second. What
is the probability that both are
winter months (Dec, Jan, Feb)?
A: 3/66 = 1/22. Just observe that in the Table
there are 66 different 2-month pairs and
only 3 of these pairs satisfy the
requirement (Jan-Feb, Jan-Dec, Feb-Dec).
Q: One randomly picks two different
months from the Tornado data list. What
is the probability that their correlation
will be negative? Hint: Check the
correlation table.
A: 9/66 = 3/22. Again, out of the 66 2-month
pairs only 9 have a negative correlation.
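Counting the negative entries by eye is error prone, so here is a minimal Python sketch that does the count. The numbers are the month-vs-month entries of the lecture's correlation table, typed in row by row with the diagonal 1's left out; the dictionary layout and variable names are made up for illustration.

```python
from fractions import Fraction

months = ["Jan", "Feb", "Mar", "Apr", "May", "June",
          "July", "Aug", "Sept", "Oct", "Nov", "Dec"]

# Lower triangle of the correlation table: each row lists that month's
# correlation with every earlier month, in order (Jan first).
lower_triangle = {
    "Feb": [0.01],
    "Mar": [0.23, 0.21],
    "Apr": [0.31, -0.22, 0.27],
    "May": [0.29, 0.05, 0.38, 0.37],
    "June": [0.25, 0.26, 0.28, 0.33, 0.45],
    "July": [0.13, 0.15, 0.03, -0.02, 0.11, 0.59],
    "Aug": [0.01, -0.03, -0.05, 0.29, 0.13, 0.43, 0.63],
    "Sept": [0.39, -0.01, 0.15, 0.11, 0.16, 0.33, 0.41, 0.27],
    "Oct": [0.05, 0.12, 0.01, 0.46, 0.09, 0.55, 0.44, 0.56, 0.42],
    "Nov": [0.17, -0.06, 0.01, -0.04, 0.12, 0.34, 0.37, 0.24, 0.3, 0.01],
    "Dec": [0.33, 0.08, 0.01, 0.15, 0.36, 0.21, 0.08, -0.13, 0.27, -0.1, 0.17],
}

# One (earlier month, later month, R) triple per table entry.
pairs = [(months[column], later_month, r)
         for later_month, row in lower_triangle.items()
         for column, r in enumerate(row)]

negative_pairs = [pair for pair in pairs if pair[2] < 0]
probability = Fraction(len(negative_pairs), len(pairs))

strongest_negative = min(pairs, key=lambda pair: pair[2])
strongest_positive = max(pairs, key=lambda pair: pair[2])

print(len(pairs), len(negative_pairs), probability)
print(strongest_negative, strongest_positive)
```

The same list of triples also recovers the pairs used in the earlier tasks: the strongest positive correlation (July vs August) and the strongest negative one (February vs April).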
       Total  Jan    Feb    Mar    Apr    May    June   July   Aug    Sept   Oct    Nov    Dec
Total  1
Jan    0.42   1
Feb    0.17   0.01   1
Mar    0.43   0.23   0.21   1
Apr    0.54   0.31   -0.22  0.27   1
May    0.68   0.29   0.05   0.38   0.37   1
June   0.86   0.25   0.26   0.28   0.33   0.45   1
July   0.6    0.13   0.15   0.03   -0.02  0.11   0.59   1
Aug    0.54   0.01   -0.03  -0.05  0.29   0.13   0.43   0.63   1
Sept   0.51   0.39   -0.01  0.15   0.11   0.16   0.33   0.41   0.27   1
Oct    0.55   0.05   0.12   0.01   0.46   0.09   0.55   0.44   0.56   0.42   1
Nov    0.4    0.17   -0.06  0.01   -0.04  0.12   0.34   0.37   0.24   0.3    0.01   1
Dec    0.35   0.33   0.08   0.01   0.15   0.36   0.21   0.08   -0.13  0.27   -0.1   0.17   1
More on Probability
Q: If a person rolls two dice, what is the
probability that the sum of the numbers
is equal to 11?
A. If one lists all possible 2-dice outcomes
{(1,1),(1,2),…,(6,6)} one gets 36
combinations. However, only these two
pairs {(5,6),(6,5)} yield the desired sum
of 11. Thus the answer is 2/36 = 1/18.
Q: If a person tosses three coins what is the
probability that there will not be any
Tails?
A. If one lists all possible 3-coin
outcomes: {(H,H,H),(H,H,T),(H,T,H),(T,H,H),
(T,H,T),(T,T,H),(H,T,T),(T,T,T)} one realizes
that there are 8 such outcomes and only
one of the eight has no Tails. Thus the
answer is 1/8.