Chapter 5: Joint Probability Distributions

Outline
– Jointly Distributed Random Variables
– Expected Values, Covariance and Correlation
– Statistics and Their Distributions
– Distribution of the Sample Mean
– The Distribution of Linear Combinations

Joint Distributions
1. So far we have studied probability models for a single discrete or continuous random variable.
2. In many practical cases it is appropriate to take more than one measurement on a random observation. For example:
   1. Height and weight of a medical subject.
   2. Grades on quiz 1, quiz 2, and quiz 3 of a math student.
   3. The case-by-case results of 12 independent measurements of air quality at 524 W 59th Street.
3. How are these variables related? That is, what is their joint probability distribution?
4. The air-quality type of situation is very important and is the foundation of much of inferential statistics.

Joint Probability Mass Function
Joint Probability Density Function

Joint Distributions Example
An insurance company sells both homeowners policies and auto policies. The deductible on the homeowner's policy is the random variable Y, and the deductible on the auto policy is X.

   p(x,y)        x = 100   x = 250   pY(y)
   y = 0           0.20      0.05     0.25
   y = 100         0.10      0.15     0.25
   y = 200         0.20      0.30     0.50
   pX(x)           0.50      0.50

The joint probability mass function p(x,y) gives the probability that X = x and Y = y for each (x,y) pair. For example, p(100,200) = .20. The marginal probability mass function of X is given, for each x, by pX(x) = Σy p(x,y). For example:

Example (continued)
• Note in this example:
  § pX(100) = p(100,0) + p(100,100) + p(100,200) = .20 + .10 + .20 = .50
  § Similarly pX(250) = .50
  § pY(0) = .20 + .05 = .25
  § pY(100) = .10 + .15 = .25
  § pY(200) = .20 + .30 = .50
  § The functions pX and pY are called the marginal probability mass functions of X and Y respectively.

Fubini's Theorem
The next example is also a two-variable situation, but with a continuous density function. To calculate probabilities we use a result for evaluating double integrals: Fubini's theorem.

Fubini's Theorem Example

Joint Density Functions
• Working with joint densities uses the same type of operations as working with joint probability mass functions, except that summing the joint probabilities is replaced by multi-dimensional integration.
• We will see that this is only practical for very simple problems unless the random variables are independent.

Independent Random Variables

Example 5.08
Suppose X1 and X2 represent the lifetimes of two components that are independent of one another. X1 is exponential with parameter λ1 and X2 is exponential with parameter λ2. Then the joint pdf is

   f(x1, x2) = λ1 e^(−λ1 x1) · λ2 e^(−λ2 x2)   for x1 > 0, x2 > 0,

and 0 otherwise. Suppose λ1 = 1/1000 and λ2 = 1/1200. Then the probability that both lifetimes are at least 1500 hours equals

   P(1500 ≤ X1, 1500 ≤ X2) = P(1500 ≤ X1) · P(1500 ≤ X2)
                           = e^(−1500/1000) · e^(−1500/1200)
                           = (.2231)(.2865) = .0639

Using R this is (1-pexp(1500,1/1000))*(1-pexp(1500,1/1200)).

Independent Random Variables
• Note from the previous example that if X and Y are independent random variables, then problems of the form "X in some region and Y in some region" can be solved by multiplying the X probability by the Y probability. This is much easier than working with general joint distributions.
• If X and Y are independent continuous random variables with built-in R distributions, like the exponential random variables on the previous slide, then we can generally use the built-in cdfs (pexp in this case) to calculate probabilities without any explicit integration, as in the sketch below.
• If X and Y are not independent, then it makes sense to talk about conditional distributions.
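To see the Example 5.08 calculation worked out in R, here is a minimal sketch; it evaluates the product of the two survival probabilities in two equivalent ways, once from the exponential survival function directly and once with the built-in pexp cdf.

lambda1 <- 1/1000
lambda2 <- 1/1200
exp(-1500 * lambda1) * exp(-1500 * lambda2)             # survival form: (.2231)(.2865) = .0639
(1 - pexp(1500, lambda1)) * (1 - pexp(1500, lambda2))   # same answer via the built-in cdf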
Independent Random Variables – More than Two

Conditional Distributions

Conditional Distribution – Discrete – Deductible Example
An insurance company sells both homeowners policies and auto policies. The deductible on the homeowner's policy is the random variable Y, and the deductible on the auto policy is X.

   p(x,y)        x = 100   x = 250   pY(y)
   y = 0           0.20      0.05     0.25
   y = 100         0.10      0.15     0.25
   y = 200         0.20      0.30     0.50
   pX(x)           0.50      0.50

More on Independent Random Variables
If X and Y are independent random variables, then the conditional distribution of Y given X does not depend on X, and the conditional distribution of X given Y does not depend on Y. This is a direct consequence of the fact that when X and Y are independent, the joint distribution factors into the product of the marginals.

Independent Random Variables – Characterization
If X and Y are independent random variables, then any probability of the form P(X ≤ a and Y ≤ b) equals the product P(X ≤ a) × P(Y ≤ b). This is reasonably simple to understand in the discrete case; in the continuous case we can see it by writing the probability as a double integral of the product of the marginal densities.

Independent Random Variables – Example
Suppose Xi, i = 1, ..., 5, is the amount of nitrous oxide emissions from a randomly (and independently) chosen Edsel engine (g/gal), and each Xi has a Weibull distribution with shape parameter α = 2 and scale parameter β = 10. What is the probability that the maximum of the 5 emissions is ≤ 12?
Note: Suppose Y is the maximum. Then Y ≤ 12 if and only if each Xi ≤ 12. By independence and identical distribution this is P(X1 ≤ 12)^5. Note that the solution .258729 is just .763072^5.

Dependent Random Variables
When random variables are not independent we call them dependent. If two random variables are dependent, then we might want to have an idea of how closely they are related. This leads us to the concepts of covariance and correlation.

Expected Value
Correlation

Discrete Example
An insurance company sells both homeowners policies and auto policies. The deductible on the homeowner's policy is the random variable Y, and the deductible on the auto policy is X.

   p(x,y)        x = 100   x = 250   pY(y)
   y = 0           0.20      0.05     0.25
   y = 100         0.10      0.15     0.25
   y = 200         0.20      0.30     0.50
   pX(x)           0.50      0.50

The joint probability mass function p(x,y) gives the probability that X = x and Y = y for each (x,y) pair. For example, p(100,200) = .20. The marginal probability mass function of X is given, for each x, by pX(x) = Σy p(x,y).

Example-2
Marginal distributions for X and Y:

   x        100   250
   pX(x)    .50   .50

   y        0     100   200
   pY(y)    .25   .25   .50

Example 2-3
Covariance and Correlation
Definition of a Statistic

Distribution of a Statistic – Example
There are two traffic lights on a commuter's route to and from work. Let X1 be the number of lights at which the commuter must stop on the way to work, and X2 the number of lights at which he must stop when returning from work. Suppose that these two variables are independent, each with the same probability mass function (so X1, X2 is a random sample of size n = 2).
a) Determine the pmf of T0 = X1 + X2.
b) Calculate the mean of X1.
c) Calculate the mean of T0.
d) Calculate the variance of X1.
e) Calculate the variance of T0.
f) How are the various means and variances related?

Distribution of a Statistic – Example Solution

Distribution of a Statistic – Sum
• Notice that in the last example X1 and X2 were independent and T0 = X1 + X2.
• Since X1 and X2 have identical distributions, they have the same expected values and the same variances.
• Our numerical results show that
  – E(X1 + X2) = E(X1) + E(X2) and
  – V(X1 + X2) = V(X1) + V(X2)
• The first relationship holds whenever E(X1) and E(X2) exist.
• The second relationship holds whenever X1 and X2 are independent. A simulation check of both relationships is sketched below.
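The mean and variance relationships above can also be checked numerically. The following is a minimal R sketch of such a check; the Weibull(shape = 2, scale = 10) population is borrowed from the emissions example purely as an illustrative choice, and the number of simulated pairs is arbitrary.

# Simulation check of E(X1+X2) = E(X1)+E(X2) and V(X1+X2) = V(X1)+V(X2)
# for independent X1, X2 drawn from the same (here Weibull) distribution.
set.seed(1)
k  <- 100000
x1 <- rweibull(k, shape = 2, scale = 10)
x2 <- rweibull(k, shape = 2, scale = 10)
t0 <- x1 + x2
mean(t0); mean(x1) + mean(x2)   # the two means agree
var(t0);  var(x1) + var(x2)     # the two variances agree (approximately), since X1, X2 are independent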
Random Samples
Simulation Experiments

Simulation Normal Distribution
We use the rnorm() function to simulate a sample of size 20 from a Normal(65, 3) distribution. We assume that John Jay female student heights are normally distributed with mean 65" and standard deviation 3". We then calculate the mean and standard deviation of the sample. We simulated the sample twice. Note that the two means are different but 'near' 65, and the two standard deviations are different but 'near' 3.

Simulation Normal Distribution - 2
1. If we perform the previous simulation many times, how will the generated means and standard deviations behave? That is, what are their distributions?
2. Will the means cluster around the population mean of 65?
3. What if we look only at the means? Will the standard deviation of the generated means equal the original population standard deviation, or will it be different?
4. We examine these questions by writing an R program that performs the simulation many thousands of times and looks at the results.

Steps in a Simulation Experiment
Simulation Normal Distribution - 3

Simulation Normal Distribution - 4
The following code simulates the selection of a random sample of size n = 20 from a Normal(65, 3) distribution k = 50000 times, determining the mean of each sample and storing the results in a vector. The code plots a histogram of the 50000 means and then calculates the mean of the means and the standard deviation of the means. Is this standard deviation close to 3?

k <- 50000
n <- 20
mns <- numeric(k)
for (i in 1:k) mns[i] <- mean(rnorm(n, 65, 3))  # mean of one simulated sample of size n
hist(mns)        # distribution of the 50000 sample means
mean(mns)        # mean of the means (near 65)
sd(mns)          # standard deviation of the means
3/sqrt(n)        # compare: population sd divided by sqrt(sample size)

Simulation Normal Distribution - 5
We run the code next.
1. Note from the histogram that the distribution of the means looks bell-shaped (normal) and is centered at 65, the true mean.
2. The simulated mean of the means is almost exactly 65, the true mean.
3. The simulated standard deviation of the means does not equal the population standard deviation of 3, but is approximately the population sd divided by sqrt(20), where 20 is the sample size.

Simulating a Sample Mean from a Weibull
Simulating a Sample Mean from a Weibull (cont'd)
Characteristics of the Simulated Values
Density Plot and Histogram
Normal Probability Plot of Sample Means
Multiple Sample Sizes
Histograms of Means of Different Sample Sizes
Densities of Means of Different Sample Sizes
Example 5.23 – Simulating from a Skewed Distribution
Histograms of Means from a Lognormal Distribution
Densities of Means from a Lognormal Distribution
Properties of the Sample Mean and Sample Sum

Sampling Distribution of the Sample Mean and Sample Sum
Thus, the sampling (i.e., probability) distribution of the sample mean X̄ is centered precisely at the mean of the population from which the sample has been selected. Also, the distribution becomes more concentrated about µ as the sample size n increases. In marked contrast, the distribution of To becomes more spread out as n increases. Averaging moves probability in toward the middle, whereas totaling spreads probability out over a wider and wider range of values. The standard deviation σ/√n is often called the standard error of the mean; it describes the magnitude of a typical or representative deviation of the sample mean from the population mean. A small simulation illustrating this contrast is sketched below.
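A minimal R sketch of a simulation illustrating this contrast, using the Normal(65, 3) height population from the earlier slides; the sample sizes 10, 40, and 160 and the number of replications are arbitrary choices made only for illustration.

# As n grows, the sd of the sample means shrinks (like 3/sqrt(n))
# while the sd of the sample totals grows (like 3*sqrt(n)).
k <- 20000
for (n in c(10, 40, 160)) {
  samples <- matrix(rnorm(k * n, mean = 65, sd = 3), nrow = k)   # k samples of size n
  cat("n =", n,
      " sd of sample means =", round(sd(rowMeans(samples)), 3),
      " sd of sample totals =", round(sd(rowSums(samples)), 1),
      "\n")
}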
Example 24
In a notched tensile fatigue test on a titanium specimen, the expected number of cycles to first acoustic emission (used to indicate crack initiation) is µ = 28,000, and the standard deviation of the number of cycles is σ = 5000. Let X1, X2, . . . , X25 be a random sample of size 25, where each Xi is the number of cycles on a different randomly selected specimen. Then the expected value of the sample mean number of cycles until first emission is E(X̄) = µ = 28,000, and the expected total number of cycles for the 25 specimens is E(To) = nµ = 25(28,000) = 700,000.

Example 24 (continued)
The standard deviations of X̄ (the standard error of the mean) and of To are

   σ(X̄) = σ/√n = 5000/√25 = 1000   and   σ(To) = √n · σ = √25 · 5000 = 25,000.

If the sample size increases to n = 100, E(X̄) is unchanged, but σ(X̄) = 500, half of its previous value (the sample size must be quadrupled to halve the standard deviation of X̄).

Central Limit Theorem

Central Limit Theorem Example: Uniform[-1,1] Distribution
The following R code runs 50000 simulations of calculating the mean from Uniform[-1,1] samples of sizes 1 through 6. The Uniform[-1,1] distribution has an expected value of 0 and a variance of 1/3. Notice how the distribution of the means becomes more and more bell-shaped as the sample size increases.

Central Limit Theorem Example: Uniform[-1,1] Distribution
Note that as the sample size increases, the sampling distribution of the means becomes more bell-like and more concentrated. In this case we didn't need an n of 30 for a 'large sample.'

Central Limit Theorem Example: Uniform[-1,1] Distribution
Now let's look at the means and standard deviations of the samples. First the code:

Central Limit Theorem Example: Uniform[-1,1] Distribution
Here is the resulting data frame. Notice that the sample standard deviation is close to (sqrt(3)/3)/sqrt(n). Note that sqrt(3)/3 is the standard deviation of the original uniform distribution, and that the sample mean ex is always near 0, the true uniform mean.

The Case of the Normal Distribution
Figure 5.14 illustrates the proposition: a normal population distribution and the sampling distributions of the sample mean for several sample sizes.

Example 25
The time that it takes a randomly selected rat of a certain subspecies to find its way through a maze is a normally distributed rv with µ = 1.5 min and σ = .35 min. Suppose five rats are selected. Let X1, . . . , X5 denote their times in the maze. Assuming the Xi's to be a random sample from this normal distribution, what is the probability that the total time To = X1 + . . . + X5 for the five rats is between 6 and 8 min?

Example 25 (continued)
Thus, To has a normal distribution with mean µ(To) = nµ = 5(1.5) = 7.5 and variance σ²(To) = nσ² = 5(.1225) = .6125, so σ(To) = √.6125 = .783.
Using R, P(6 ≤ To ≤ 8) = pnorm(8, 7.5, .783) - pnorm(6, 7.5, .783) ≈ .71.
We could also standardize To: subtract µ(To) and divide by σ(To). Note the rounding.

Linear Combinations and Their Means
Variances of Linear Combinations
The Difference Between Random Variables

The Difference Between Random Variables – Example
• Let X1 and X2 represent the number of gallons of gasoline sold at JJ Gasoline on Monday and Tuesday respectively. Suppose X1 and X2 are independent normally distributed RVs with respective means of 1500 and 1200 gallons and respective standard deviations of 100 and 80 gallons. Let TS = X1 + X2 be the total sold and let TD = X1 − X2 be the difference between Monday and Tuesday sales.
• By the previous results:
  – TS is normal with mean 1500 + 1200 = 2700 and standard deviation sqrt(100² + 80²) = sqrt(16400) = 128.0625.
  – TD is normal with mean 1500 − 1200 = 300 and standard deviation sqrt(100² + 80²) = sqrt(16400) = 128.0625.
• Let's simulate this in R to see if this makes sense; a sketch of such a simulation appears below.

The Difference Between Random Variables – Example (p2, p3)
It is not the CLT that is demonstrated here, since we are not dealing with means of large samples. It is the fact that sums and differences of normal random variables are normal.
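The slides do not reproduce the simulation code referred to above, but it might look something like the following sketch; the seed and the number of replications are arbitrary choices.

# Simulate TS = X1 + X2 and TD = X1 - X2 for independent normals
# with the stated means and standard deviations.
set.seed(1)
k  <- 50000
x1 <- rnorm(k, mean = 1500, sd = 100)   # Monday gallons
x2 <- rnorm(k, mean = 1200, sd = 80)    # Tuesday gallons
ts <- x1 + x2
td <- x1 - x2
c(mean(ts), sd(ts))   # should be near 2700 and 128.06
c(mean(td), sd(td))   # should be near 300 and 128.06
hist(td)              # bell-shaped: the difference of normals is itself normal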
Example 29
A gas station sells three grades of gasoline: regular, extra, and super. These are priced at $3.00, $3.20, and $3.40 per gallon, respectively. Let X1, X2, and X3 denote the amounts of these grades purchased (gallons) on a particular day. Suppose the Xi's are independent with E(X1) = 1000, E(X2) = 500, E(X3) = 300, σ1 = 100, σ2 = 80, and σ3 = 50.

Example 29 (continued)
The revenue from sales is Y = 3.0X1 + 3.2X2 + 3.4X3, and

   E(Y) = 3.0E(X1) + 3.2E(X2) + 3.4E(X3) = 3.0(1000) + 3.2(500) + 3.4(300) = $5620
   V(Y) = 3.0²(100²) + 3.2²(80²) + 3.4²(50²)

The Case of Normal Random Variables

Example 31
The total revenue from the sale of the three grades of gasoline on a particular day is Y = 3.0X1 + 3.2X2 + 3.4X3, and we calculated E(Y) = 5620 and (assuming independence) SD(Y) = 429.46. If the Xi's are normally distributed, the probability that revenue exceeds 4500 is P(Y > 4500) = P(Z > (4500 − 5620)/429.46) = P(Z > −2.61) ≈ .9955. Or without normalizing, use pnorm directly in R; note the rounding again.

Example 31 – Continued – Simulate Using R
Example 31 – Continued – Simulate Using R – Histogram
R Code – Picking Elements from a Vector
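The R code for the 'Example 31 – Continued' simulation is not reproduced on these slides; a minimal sketch of the calculation and one possible simulation is given below. The seed and the number of replications are arbitrary choices.

mu  <- 3.0*1000 + 3.2*500 + 3.4*300                   # E(Y) = 5620
sdv <- sqrt(3.0^2*100^2 + 3.2^2*80^2 + 3.4^2*50^2)    # SD(Y), about 429.46
1 - pnorm(4500, mean = mu, sd = sdv)                  # P(revenue > 4500), about .9955

# Simulation check, drawing the three gallon amounts as independent normals
set.seed(1)
k <- 50000
y <- 3.0*rnorm(k, 1000, 100) + 3.2*rnorm(k, 500, 80) + 3.4*rnorm(k, 300, 50)
mean(y > 4500)                                        # simulated estimate, near .9955
hist(y)                                               # roughly normal, centered near 5620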