Chapter 5: Joint Probability Distributions

Outline
– Jointly Distributed Random Variables
– Expected Values, Covariance and Correlation
– Statistics and Their Distributions
– Distribution of the Sample Mean
– The Distribution of Linear Combinations

Joint Distributions
1. So far we have studied probability models for a single discrete or continuous random variable.
2. In many practical cases it is appropriate to take more than one measurement on a random observation. For example:
   1. Height and weight of a medical subject.
   2. Grades on quiz 1, quiz 2, and quiz 3 of a math student.
   3. The case-by-case results of 12 independent measurements of air quality at 524 W 59th Street.
3. How are these variables related? That is, what is their joint probability distribution?
4. The air-quality type of situation is very important and is the foundation of much of inferential statistics.

Joint Probability Mass Function
Joint Probability Density Function

Joint Distributions Example
An insurance company sells both homeowners policies and auto policies. The deductible on the homeowner's policy is the random variable Y, and the deductible on the auto policy is X.

   p(x,y)        x = 100   x = 250   pY(y)
   y = 0           0.20      0.05     0.25
   y = 100         0.10      0.15     0.25
   y = 200         0.20      0.30     0.50
   pX(x)           0.50      0.50

The joint probability mass function p(x,y) gives the probability that X = x and Y = y for each (x,y) pair. For example, p(100,200) = .20. The marginal probability mass function of X is given, for each x, by pX(x) = Σy p(x,y). For example:

Example (continued)
• Note in this example:
  § pX(100) = p(100,0) + p(100,100) + p(100,200) = .20 + .10 + .20 = .50
  § Similarly pX(250) = .50
  § pY(0) = .20 + .05 = .25
  § pY(100) = .10 + .15 = .25
  § pY(200) = .20 + .30 = .50
  § The functions pX and pY are called the marginal probability mass functions of X and Y respectively.

Fubini's Theorem
The next example is also a two-variable situation, but with a continuous density function. To calculate probabilities we use a result for evaluating double integrals: Fubini's theorem.

Fubini's Theorem Example

Joint Density Functions
• Working with joint densities uses the same type of operations as working with joint probability mass functions, except that summing the joint probabilities is replaced by multi-dimensional integration.
• We will see that this is only practical for very simple problems unless the random variables are independent.

Independent Random Variables

Example 5.08
Suppose X1 and X2 represent the lifetimes of two components that are independent of one another. X1 is exponential with parameter λ1 and X2 is exponential with parameter λ2. Then the joint pdf is

   f(x1, x2) = λ1 e^(−λ1 x1) · λ2 e^(−λ2 x2)   for x1 > 0, x2 > 0,

and 0 otherwise. Suppose λ1 = 1/1000 and λ2 = 1/1200. Then the probability that both lifetimes are at least 1500 hours equals

   P(1500 ≤ X1, 1500 ≤ X2) = P(1500 ≤ X1) · P(1500 ≤ X2)
                           = e^(−1500/1000) · e^(−1500/1200)
                           = (.2231)(.2865) = .0639

Using R this is (1-pexp(1500,1/1000))*(1-pexp(1500,1/1200)).

Independent Random Variables
• Note from the previous example that if X and Y are independent random variables, then problems of the form "X in some region and Y in some region" can be solved by multiplying the X probability by the Y probability. This is much easier than working with general joint distributions.
• If X and Y are independent continuous random variables with built-in R distributions, like the exponential random variables on the previous slide, then we can generally use the built-in cdfs (pexp in this case) to calculate probabilities without any explicit integration, as in the sketch below.
• If X and Y are not independent, then it makes sense to talk about conditional distributions.
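To see the Example 5.08 calculation worked out in R, here is a minimal sketch; it evaluates the product of the two survival probabilities in two equivalent ways, once from the exponential survival function directly and once with the built-in pexp cdf.

lambda1 <- 1/1000
lambda2 <- 1/1200
exp(-1500 * lambda1) * exp(-1500 * lambda2)             # survival form: (.2231)(.2865) = .0639
(1 - pexp(1500, lambda1)) * (1 - pexp(1500, lambda2))   # same answer via the built-in cdf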
Independent Random Variables – More than Two

Conditional Distributions

Conditional Distribution – Discrete – Deductible Example
An insurance company sells both homeowners policies and auto policies. The deductible on the homeowner's policy is the random variable Y, and the deductible on the auto policy is X.

   p(x,y)        x = 100   x = 250   pY(y)
   y = 0           0.20      0.05     0.25
   y = 100         0.10      0.15     0.25
   y = 200         0.20      0.30     0.50
   pX(x)           0.50      0.50

More on Independent Random Variables
If X and Y are independent random variables, then the conditional distribution of Y given X does not depend on X, and the conditional distribution of X given Y does not depend on Y. This is a direct consequence of the fact that when X and Y are independent, the joint distribution factors into the product of the marginals.

Independent Random Variables – Characterization
If X and Y are independent random variables, then any probability of the form P(X ≤ a and Y ≤ b) equals the product P(X ≤ a) × P(Y ≤ b). This is reasonably simple to understand in the discrete case; in the continuous case we can see it by writing the probability as a double integral of the product of the marginal densities.

Independent Random Variables – Example
Suppose Xi, i = 1, ..., 5, is the amount of nitrous oxide emissions from a randomly (and independently) chosen Edsel engine (g/gal), and each Xi has a Weibull distribution with shape parameter α = 2 and scale parameter β = 10. What is the probability that the maximum of the 5 emissions is ≤ 12?
Note: Suppose Y is the maximum. Then Y ≤ 12 if and only if each Xi ≤ 12. By independence and identical distribution this is P(X1 ≤ 12)^5. Note that the solution .258729 is just .763072^5.

Dependent Random Variables
When random variables are not independent we call them dependent. If two random variables are dependent, then we might want to have an idea of how closely they are related. This leads us to the concepts of covariance and correlation.

Expected Value
Correlation

Discrete Example
An insurance company sells both homeowners policies and auto policies. The deductible on the homeowner's policy is the random variable Y, and the deductible on the auto policy is X.

   p(x,y)        x = 100   x = 250   pY(y)
   y = 0           0.20      0.05     0.25
   y = 100         0.10      0.15     0.25
   y = 200         0.20      0.30     0.50
   pX(x)           0.50      0.50

The joint probability mass function p(x,y) gives the probability that X = x and Y = y for each (x,y) pair. For example, p(100,200) = .20. The marginal probability mass function of X is given, for each x, by pX(x) = Σy p(x,y).

Example-2
Marginal distributions for X and Y:

   x        100   250
   pX(x)    .50   .50

   y        0     100   200
   pY(y)    .25   .25   .50

Example 2-3
Covariance and Correlation
Definition of a Statistic

Distribution of a Statistic – Example
There are two traffic lights on a commuter's route to and from work. Let X1 be the number of lights at which the commuter must stop on the way to work, and X2 the number of lights at which he must stop when returning from work. Suppose that these two variables are independent, each with the same probability mass function (so X1, X2 is a random sample of size n = 2).
a) Determine the pmf of T0 = X1 + X2.
b) Calculate the mean of X1.
c) Calculate the mean of T0.
d) Calculate the variance of X1.
e) Calculate the variance of T0.
f) How are the various means and variances related?

Distribution of a Statistic – Example Solution

Distribution of a Statistic – Sum
• Notice that in the last example X1 and X2 were independent and T0 = X1 + X2.
• Since X1 and X2 have identical distributions, they have the same expected values and the same variances.
• Our numerical results show that
  – E(X1 + X2) = E(X1) + E(X2) and
  – V(X1 + X2) = V(X1) + V(X2)
• The first relationship holds whenever E(X1) and E(X2) exist.
• The second relationship holds whenever X1 and X2 are independent. A simulation check of both relationships is sketched below.
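The mean and variance relationships above can also be checked numerically. The following is a minimal R sketch of such a check; the Weibull(shape = 2, scale = 10) population is borrowed from the emissions example purely as an illustrative choice, and the number of simulated pairs is arbitrary.

# Simulation check of E(X1+X2) = E(X1)+E(X2) and V(X1+X2) = V(X1)+V(X2)
# for independent X1, X2 drawn from the same (here Weibull) distribution.
set.seed(1)
k  <- 100000
x1 <- rweibull(k, shape = 2, scale = 10)
x2 <- rweibull(k, shape = 2, scale = 10)
t0 <- x1 + x2
mean(t0); mean(x1) + mean(x2)   # the two means agree
var(t0);  var(x1) + var(x2)     # the two variances agree (approximately), since X1, X2 are independent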
Random Samples
Simulation Experiments

Simulation Normal Distribution
We use the rnorm() function to simulate a sample of size 20 from a Normal(65, 3) distribution. We assume that John Jay female student heights are normally distributed with mean 65" and standard deviation 3". We then calculate the mean and standard deviation of the sample. We simulated the sample twice. Note that the two means are different but 'near' 65, and the two standard deviations are different but 'near' 3.

Simulation Normal Distribution - 2
1. If we perform the previous simulation many times, how will the generated means and standard deviations behave? That is, what are their distributions?
2. Will the means cluster around the population mean of 65?
3. What if we look only at the means? Will the standard deviation of the generated means equal the original population standard deviation, or will it be different?
4. We examine these questions by writing an R program that performs the simulation many thousands of times and looks at the results.

Steps in a Simulation Experiment
Simulation Normal Distribution - 3

Simulation Normal Distribution - 4
The following code simulates the selection of a random sample of size n = 20 from a Normal(65, 3) distribution k = 50000 times, determining the mean of each sample and storing the results in a vector. The code plots a histogram of the 50000 means and then calculates the mean of the means and the standard deviation of the means. Is this standard deviation close to 3?

k <- 50000
n <- 20
mns <- numeric(k)
for (i in 1:k) mns[i] <- mean(rnorm(n, 65, 3))  # mean of one simulated sample of size n
hist(mns)        # distribution of the 50000 sample means
mean(mns)        # mean of the means (near 65)
sd(mns)          # standard deviation of the means
3/sqrt(n)        # compare: population sd divided by sqrt(sample size)

Simulation Normal Distribution - 5
We run the code next.
1. Note from the histogram that the distribution of the means looks bell-shaped (normal) and is centered at 65, the true mean.
2. The simulated mean of the means is almost exactly 65, the true mean.
3. The simulated standard deviation of the means does not equal the population standard deviation of 3, but is approximately the population sd divided by sqrt(20), where 20 is the sample size.

Simulating a Sample Mean from a Weibull
Simulating a Sample Mean from a Weibull (cont'd)
Characteristics of the Simulated Values
Density Plot and Histogram
Normal Probability Plot of Sample Means
Multiple Sample Sizes
Histograms of Means of Different Sample Sizes
Densities of Means of Different Sample Sizes
Example 5.23 – Simulating from a Skewed Distribution
Histograms of Means from a Lognormal Distribution
Densities of Means from a Lognormal Distribution
Properties of the Sample Mean and Sample Sum

Sampling Distribution of the Sample Mean and Sample Sum
Thus, the sampling (i.e., probability) distribution of the sample mean X̄ is centered precisely at the mean of the population from which the sample has been selected. Also, the distribution becomes more concentrated about µ as the sample size n increases. In marked contrast, the distribution of To becomes more spread out as n increases. Averaging moves probability in toward the middle, whereas totaling spreads probability out over a wider and wider range of values. The standard deviation σ/√n is often called the standard error of the mean; it describes the magnitude of a typical or representative deviation of the sample mean from the population mean. A small simulation illustrating this contrast is sketched below.
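A minimal R sketch of a simulation illustrating this contrast, using the Normal(65, 3) height population from the earlier slides; the sample sizes 10, 40, and 160 and the number of replications are arbitrary choices made only for illustration.

# As n grows, the sd of the sample means shrinks (like 3/sqrt(n))
# while the sd of the sample totals grows (like 3*sqrt(n)).
k <- 20000
for (n in c(10, 40, 160)) {
  samples <- matrix(rnorm(k * n, mean = 65, sd = 3), nrow = k)   # k samples of size n
  cat("n =", n,
      " sd of sample means =", round(sd(rowMeans(samples)), 3),
      " sd of sample totals =", round(sd(rowSums(samples)), 1),
      "\n")
}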
Example 24
In a notched tensile fatigue test on a titanium specimen, the expected number of cycles to first acoustic emission (used to indicate crack initiation) is µ = 28,000, and the standard deviation of the number of cycles is σ = 5000. Let X1, X2, . . . , X25 be a random sample of size 25, where each Xi is the number of cycles on a different randomly selected specimen. Then the expected value of the sample mean number of cycles until first emission is E(X̄) = µ = 28,000, and the expected total number of cycles for the 25 specimens is E(To) = nµ = 25(28,000) = 700,000.

Example 24 (continued)
The standard deviations of X̄ (the standard error of the mean) and of To are

   σ(X̄) = σ/√n = 5000/√25 = 1000   and   σ(To) = √n · σ = √25 · 5000 = 25,000.

If the sample size increases to n = 100, E(X̄) is unchanged, but σ(X̄) = 500, half of its previous value (the sample size must be quadrupled to halve the standard deviation of X̄).

Central Limit Theorem

Central Limit Theorem Example: Uniform[-1,1] Distribution
The following R code runs 50000 simulations of calculating the mean from Uniform[-1,1] samples of sizes 1 through 6. The Uniform[-1,1] distribution has an expected value of 0 and a variance of 1/3. Notice how the distribution of the means becomes more and more bell-shaped as the sample size increases.

Central Limit Theorem Example: Uniform[-1,1] Distribution
Note that as the sample size increases, the sampling distribution of the means becomes more bell-like and more concentrated. In this case we didn't need an n of 30 for a 'large sample.'

Central Limit Theorem Example: Uniform[-1,1] Distribution
Now let's look at the means and standard deviations of the samples. First the code:

Central Limit Theorem Example: Uniform[-1,1] Distribution
Here is the resulting data frame. Notice that the sample standard deviation is close to (sqrt(3)/3)/sqrt(n). Note that sqrt(3)/3 is the standard deviation of the original uniform distribution, and that the sample mean ex is always near 0, the true uniform mean.

The Case of the Normal Distribution
Figure 5.14 illustrates the proposition: a normal population distribution and the sampling distributions of the sample mean for several sample sizes.

Example 25
The time that it takes a randomly selected rat of a certain subspecies to find its way through a maze is a normally distributed rv with µ = 1.5 min and σ = .35 min. Suppose five rats are selected. Let X1, . . . , X5 denote their times in the maze. Assuming the Xi's to be a random sample from this normal distribution, what is the probability that the total time To = X1 + . . . + X5 for the five rats is between 6 and 8 min?

Example 25 (continued)
Thus, To has a normal distribution with mean µ(To) = nµ = 5(1.5) = 7.5 and variance σ²(To) = nσ² = 5(.1225) = .6125, so σ(To) = √.6125 = .783.
Using R, P(6 ≤ To ≤ 8) = pnorm(8, 7.5, .783) - pnorm(6, 7.5, .783) ≈ .71.
We could also standardize To: subtract µ(To) and divide by σ(To). Note the rounding.

Linear Combinations and Their Means
Variances of Linear Combinations
The Difference Between Random Variables

The Difference Between Random Variables – Example
• Let X1 and X2 represent the number of gallons of gasoline sold at JJ Gasoline on Monday and Tuesday respectively. Suppose X1 and X2 are independent normally distributed RVs with respective means of 1500 and 1200 gallons and respective standard deviations of 100 and 80 gallons. Let TS = X1 + X2 be the total sold and let TD = X1 − X2 be the difference between Monday and Tuesday sales.
• By the previous results:
  – TS is normal with mean 1500 + 1200 = 2700 and standard deviation sqrt(100² + 80²) = sqrt(16400) = 128.0625.
  – TD is normal with mean 1500 − 1200 = 300 and standard deviation sqrt(100² + 80²) = sqrt(16400) = 128.0625.
• Let's simulate this in R to see if this makes sense; a sketch of such a simulation appears below.

The Difference Between Random Variables – Example (p2, p3)
It is not the CLT that is demonstrated here, since we are not dealing with means of large samples. It is the fact that sums and differences of normal random variables are normal.
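The slides do not reproduce the simulation code referred to above, but it might look something like the following sketch; the seed and the number of replications are arbitrary choices.

# Simulate TS = X1 + X2 and TD = X1 - X2 for independent normals
# with the stated means and standard deviations.
set.seed(1)
k  <- 50000
x1 <- rnorm(k, mean = 1500, sd = 100)   # Monday gallons
x2 <- rnorm(k, mean = 1200, sd = 80)    # Tuesday gallons
ts <- x1 + x2
td <- x1 - x2
c(mean(ts), sd(ts))   # should be near 2700 and 128.06
c(mean(td), sd(td))   # should be near 300 and 128.06
hist(td)              # bell-shaped: the difference of normals is itself normal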
Example 29
A gas station sells three grades of gasoline: regular, extra, and super. These are priced at $3.00, $3.20, and $3.40 per gallon, respectively. Let X1, X2, and X3 denote the amounts of these grades purchased (gallons) on a particular day. Suppose the Xi's are independent with E(X1) = 1000, E(X2) = 500, E(X3) = 300, σ1 = 100, σ2 = 80, and σ3 = 50.

Example 29 (continued)
The revenue from sales is Y = 3.0X1 + 3.2X2 + 3.4X3, and

   E(Y) = 3.0E(X1) + 3.2E(X2) + 3.4E(X3) = 3.0(1000) + 3.2(500) + 3.4(300) = $5620
   V(Y) = 3.0²(100²) + 3.2²(80²) + 3.4²(50²)

The Case of Normal Random Variables

Example 31
The total revenue from the sale of the three grades of gasoline on a particular day is Y = 3.0X1 + 3.2X2 + 3.4X3, and we calculated E(Y) = 5620 and (assuming independence) SD(Y) = 429.46. If the Xi's are normally distributed, the probability that revenue exceeds 4500 is P(Y > 4500) = P(Z > (4500 − 5620)/429.46) = P(Z > −2.61) ≈ .9955. Or without normalizing, use pnorm directly in R; note the rounding again.

Example 31 – Continued – Simulate Using R
Example 31 – Continued – Simulate Using R – Histogram
R Code – Picking Elements from a Vector
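The R code for the 'Example 31 – Continued' simulation is not reproduced on these slides; a minimal sketch of the calculation and one possible simulation is given below. The seed and the number of replications are arbitrary choices.

mu  <- 3.0*1000 + 3.2*500 + 3.4*300                   # E(Y) = 5620
sdv <- sqrt(3.0^2*100^2 + 3.2^2*80^2 + 3.4^2*50^2)    # SD(Y), about 429.46
1 - pnorm(4500, mean = mu, sd = sdv)                  # P(revenue > 4500), about .9955

# Simulation check, drawing the three gallon amounts as independent normals
set.seed(1)
k <- 50000
y <- 3.0*rnorm(k, 1000, 100) + 3.2*rnorm(k, 500, 80) + 3.4*rnorm(k, 300, 50)
mean(y > 4500)                                        # simulated estimate, near .9955
hist(y)                                               # roughly normal, centered near 5620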