
Prediction Intervals for random variable $X$ when $\mu$ is unknown and $\sigma^2$ is known

We know how to give a probabilistic interval for $X$ when $X \sim N(\mu, \sigma^2)$ and both $\mu$ and $\sigma^2$ are known. For example, if $X$ = weight of a chocolate bar in ounces, and we know that $X \sim N(8.1, .25)$, so $\sigma = .5$, then we could say that the probability is .95 that $X$ falls in the interval $8.1 \pm 1.96(.5)$, or, in general, $\mu \pm z_{\alpha/2}\,\sigma$. Source of uncertainty about the value of $X$ when $\mu$ is known: the fact that $X$ is a random variable.

We just learned how to use data to find a confidence interval when we don't know $\mu$ (so we must estimate $\mu$ using $\bar{X}_n$) and $\sigma^2$ is known, provided that:

1. $X_i$, $i = 1, \dots, n$, represent a random sample from the population of interest (iid, remember?).
2. Either

   2.1. $X_i \sim N(\mu, \sigma^2)$, $n$ any size, or

   2.2. $X_i \sim \text{Other}(\mu, \sigma^2)$, $n$ large enough to apply the CLT.
3. The confidence level is $C\% = 100(1 - \alpha)\%$, where $0 < \alpha < 1$.

Suppose we took a sample of 10 chocolate bars and got a sample mean of $\bar{x}_n = 8.2$. A 95% confidence interval for $\mu$ is $8.2 \pm 1.96(.5)/\sqrt{10} = (7.89,\ 8.51)$, or, in general, $\bar{x} \pm z_{\alpha/2}\,\sigma_X/\sqrt{n}$. Source of uncertainty about the value of $\mu$: the fact that we estimated it using $\bar{X}_n$.

But suppose we still want a prediction interval for $X$, the random variable, not just a confidence interval for $\mu$, the population mean. Sources of uncertainty about the value of $X$ when $\mu$ is unknown:

1. The fact that $X$ is a random variable.
2. The fact that we must estimate the mean $\mu$ using $\bar{X}_n$.

To find a prediction interval for $X$ when $\mu$ is unknown, notice that

$$\operatorname{Var}(X - \bar{X}_n) = \operatorname{Var}(X) + \operatorname{Var}(\bar{X}_n) = \sigma^2 + \frac{\sigma^2}{n} = \sigma^2\left(1 + \frac{1}{n}\right).$$
Wait! That formula assumes the two RVs are independent! Are they? Yes. Remember that $X_1, \dots, X_n$ represent a random sample from the population of interest (independent and identically distributed). Think of the plain-old $X$ as $X_{n+1}$: it's independent of the other $X_i$ and identically distributed. So for a symmetric prediction interval for $X$ with confidence level $C$, we use:

$$\bar{x} \pm z_{\alpha/2}\,\sigma_X\sqrt{1 + \frac{1}{n}}.$$
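The variance formula and the independence argument can be checked by simulation. The Python sketch below (standard library only) draws many samples of size $n$ plus an independent new observation $X_{n+1}$, compares the empirical variance of $X_{n+1} - \bar{X}_n$ with $\sigma^2(1 + 1/n)$, and records how often the interval $\bar{x} \pm z_{\alpha/2}\,\sigma_X\sqrt{1 + 1/n}$ captures $X_{n+1}$. It assumes case 2.1 (normal $X_i$) and borrows $\mu = 8.1$, $\sigma = .5$, $n = 10$ from the chocolate-bar example purely as simulation settings; the function name `simulate_prediction` is hypothetical.

```python
import math
import random
import statistics


def simulate_prediction(mu=8.1, sigma=0.5, n=10, conf=0.95, reps=20_000, seed=1):
    """Hypothetical helper: Monte Carlo check of Var(X_{n+1} - Xbar_n) and PI coverage.

    Assumes X_1, ..., X_{n+1} are iid N(mu, sigma^2) (case 2.1) with sigma known.
    """
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half_width = z * sigma * math.sqrt(1 + 1 / n)           # z * sigma * sqrt(1 + 1/n)

    diffs = []
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # X_1, ..., X_n
        xbar = statistics.fmean(sample)                    # estimate of mu
        x_new = rng.gauss(mu, sigma)                       # X_{n+1}, independent of the sample
        diffs.append(x_new - xbar)
        if xbar - half_width <= x_new <= xbar + half_width:
            hits += 1

    return statistics.variance(diffs), hits / reps


var_diff, coverage = simulate_prediction()
print(f"simulated Var(X - Xbar_n): {var_diff:.4f}")
print(f"theory sigma^2 (1 + 1/n):  {0.5 ** 2 * (1 + 1 / 10):.4f}")
print(f"coverage of the 95% prediction interval: {coverage:.3f}")
```

The simulated variance should land near $.25(1 + 1/10) = .275$ and the coverage near .95. Using the confidence-interval half-width $z_{\alpha/2}\,\sigma_X/\sqrt{n}$ in place of the prediction-interval one would capture $X_{n+1}$ far less often, which is the point of the extra term.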
We still say: We are C% confident that $X$ will fall between $\bar{x} - z_{\alpha/2}\,\sigma_X\sqrt{1 + 1/n}$ and $\bar{x} + z_{\alpha/2}\,\sigma_X\sqrt{1 + 1/n}$.
Why? 


Probabilistic statements about $X$ do make sense, because $X$ is a random variable. But we can't make a probabilistic statement about a random variable unless we know its distribution:

- The name or pdf/pmf of the distribution, in this case, Normal.
- The value of $\mu$.
- The value of $\sigma^2$.

We don't know all three of these things for $X$ (here $\mu$ is unknown), so we can only make a confidence statement.

Finishing the chocolate bar example: to calculate a 95% prediction interval for $X$, we calculate:

$$8.2 \pm 1.96(.5)\sqrt{1 + \frac{1}{10}} = (7.17,\ 9.23).$$
We say: We are 95% confident that the weight of a randomly selected chocolate bar will fall between 7.17 and 9.23 ounces. If we want more precision, we can reduce the width of this interval somewhat by increasing the sample size. But increasing the sample size isn't as effective at narrowing a prediction interval as it is at narrowing a confidence interval, because we still have the variability of the random variable itself, which we can't divide by $n$: as $n$ grows, the half-width $z_{\alpha/2}\,\sigma_X\sqrt{1 + 1/n}$ shrinks toward $z_{\alpha/2}\,\sigma_X$ (here $1.96(.5) = .98$ ounces), never toward zero.
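The chocolate-bar arithmetic, and the claim that a larger sample helps a prediction interval much less than a confidence interval, can be checked with a short Python sketch. It assumes the example's values ($\bar{x} = 8.2$, $\sigma = .5$, $n = 10$) and the rounded critical value $z_{.025} = 1.96$; the helper name `interval_half_widths` is hypothetical, not from these notes.

```python
import math


def interval_half_widths(sigma, n, z=1.96):
    """Hypothetical helper: half-widths of the CI for mu and the PI for a new X (sigma known)."""
    ci_half = z * sigma / math.sqrt(n)          # z * sigma / sqrt(n)
    pi_half = z * sigma * math.sqrt(1 + 1 / n)  # z * sigma * sqrt(1 + 1/n)
    return ci_half, pi_half


# Chocolate bar example: xbar = 8.2 from n = 10 bars, sigma = .5, 95% level
xbar, sigma = 8.2, 0.5
ci_half, pi_half = interval_half_widths(sigma, 10)
ci = (xbar - ci_half, xbar + ci_half)
pi = (xbar - pi_half, xbar + pi_half)
print(f"95% CI for mu:    ({ci[0]:.2f}, {ci[1]:.2f})")  # (7.89, 8.51)
print(f"95% PI for new X: ({pi[0]:.2f}, {pi[1]:.2f})")  # (7.17, 9.23)

# Widths as n grows: the CI width heads to 0, the PI width levels off at 2 * 1.96 * sigma
print(f"{'n':>6} {'CI width':>9} {'PI width':>9}")
for n in (10, 100, 1000, 10_000):
    c, p = interval_half_widths(sigma, n)
    print(f"{n:>6} {2 * c:9.3f} {2 * p:9.3f}")
```

In the printed table the confidence-interval width shrinks like $1/\sqrt{n}$, while the prediction-interval width only falls from about 2.06 to about 1.96 ounces.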