Lecture 16 Expected Values Related to n-Dimensional Random Variables
Recall the properties below.
Property (P1): $E(aX + bY + c) = aE(X) + bE(Y) + c$
Property (P2): $Var(aX + bY + c) = a^2 Var(X) + b^2 Var(Y) + 2ab\,Cov(X, Y)$
Property (P3): $Cov(aX + bY + c, W) = a\,Cov(X, W) + b\,Cov(Y, W)$
These properties can be applied to n-dimensional random variables.
Example 1 Let $\{X_k\}_{k=1}^{n}$ be a collection of iid data collection variables for $X$ having mean $\mu_X$ and standard deviation $\sigma_X$.
(a) The most common estimator of $\mu_X$ is $\hat{\mu}_X = \frac{1}{n}\sum_{k=1}^{n} X_k = \overline{X}$. Use (P1) to show that $E(\hat{\mu}_X) = \mu_X$ (i.e. to show that $\hat{\mu}_X$ is an unbiased estimator for $\mu_X$).
Solution: Define the random variable $W = \sum_{k=1}^{n} X_k$. Then from (P1) we have $E\!\left(\frac{1}{n}W\right) = \frac{1}{n}E(W)$. From this same property, we also have $E(W) = E\!\left(\sum_{k=1}^{n} X_k\right) = \sum_{k=1}^{n} E(X_k) = \sum_{k=1}^{n} \mu_X = n\mu_X$. Hence, $E(\hat{\mu}_X) = \mu_X$.
Note that we could have also written $E(\hat{\mu}_X) = E\!\left(\frac{1}{n}\sum_{k=1}^{n} X_k\right) = \frac{1}{n}\sum_{k=1}^{n} E(X_k) = \frac{1}{n}\sum_{k=1}^{n} \mu_X = \frac{1}{n}(n\mu_X) = \mu_X$.
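As an aside, the unbiasedness can also be checked numerically. The following Matlab sketch is illustrative only: the values muX = 5, sigX = 2, n = 25 and the use of normally distributed X_k are assumptions made for the demonstration, not part of the example.

    % Numerical check that the sample mean is unbiased (illustrative values)
    muX = 5; sigX = 2;                      % assumed mean and std of X
    n = 25;                                 % sample size
    M = 1e5;                                % number of simulated data sets
    Xbar = mean(normrnd(muX, sigX, n, M));  % one sample mean per column
    fprintf('average of sample means = %.4f (true mean = %.1f)\n', mean(Xbar), muX)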
(b) Use (P2) to show that $\sigma^2_{\hat{\mu}_X} = \sigma_X^2 / n$.
Solution: Since $\{X_k\}_{k=1}^{n}$ are iid, $Cov(X_j, X_k) = 0$ for $j \neq k$, and so (P2) reduces to $Var(aX + bY) = a^2 Var(X) + b^2 Var(Y)$. Hence,
$\sigma^2_{\hat{\mu}_X} = Var(\hat{\mu}_X) = Var\!\left(\frac{1}{n}\sum_{k=1}^{n} X_k\right) = \frac{1}{n^2}\sum_{k=1}^{n} Var(X_k) = \frac{1}{n^2}\sum_{k=1}^{n} \sigma_X^2 = \frac{n\sigma_X^2}{n^2} = \sigma_X^2 / n$.
(c) Suppose that $\sigma_X = 10$. Find the smallest sample size, $n$, that would result in $\sigma_{\hat{\mu}_X} \le 0.1$.
Solution: $\sigma^2_{\hat{\mu}_X} = \sigma_X^2 / n \;\Rightarrow\; n = \sigma_X^2 / \sigma^2_{\hat{\mu}_X} = 10^2 / 0.1^2 = 10^4$. □
Remark 1 The relation $\sigma^2_{\hat{\mu}_X} = \sigma_X^2 / n$ is, arguably, the most important relation in statistics, because it allows one to determine the sample size needed to achieve a specified level of uncertainty, $\sigma_{\hat{\mu}_X}$, for the estimator $\hat{\mu}_X$. □
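As a quick sanity check of the calculation in (c), the sample-size formula can be evaluated in Matlab (a minimal sketch using the values from the example):

    % Smallest n such that the standard deviation of the sample mean <= target
    sigX   = 10;                       % standard deviation of X (Example 1(c))
    target = 0.1;                      % required bound on sigma of the sample mean
    nMin   = ceil((sigX/target)^2);    % n >= (sigX/target)^2
    fprintf('smallest sample size n = %d\n', nMin)   % 10000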
Example 2 Let $X$ denote the act of recording whether or not a randomly selected part from a production line meets $[X = 0]$ or does not meet $[X = 1]$ specifications. Your job is to arrive at an estimate of $\Pr[X = 1] = p$.
(a) Specify the data collection variables, and how you will use them to estimate p.
Solution: Let $\{X_k\}_{k=1}^{n}$ be iid $Ber(p)$, so that $\mu_{X_k} = p$ and $\sigma_{X_k} = \sqrt{p(1-p)}$. Let $\hat{p} = \hat{\mu}_X = \frac{1}{n}\sum_{k=1}^{n} X_k$. Then $\mu_{\hat{p}} = p$ and $\sigma^2_{\hat{p}} = p(1-p)/n$. Also, since $n\hat{p} = \sum_{k=1}^{n} X_k = Y \sim bino(n, p)$, it follows that $\hat{p} = Y/n$ is just a scaled binomial random variable.
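The claim that $\hat{p} = Y/n$ is a scaled binomial random variable with the stated mean and standard deviation can be illustrated by simulation. The sketch below uses hypothetical values p = 0.1 and n = 100; these are assumptions for the demonstration, not part of the example.

    % p-hat = Y/n as a scaled binomial random variable (illustrative values)
    p = 0.1; n = 100; M = 1e5;          % assumed defect probability, sample size, trials
    phat = binornd(n, p, M, 1) / n;     % M realizations of p-hat
    fprintf('mean(phat) = %.4f, p = %.4f\n', mean(phat), p)
    fprintf('std(phat)  = %.4f, sqrt(p(1-p)/n) = %.4f\n', std(phat), sqrt(p*(1-p)/n))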
(b) Find the ‘worst case’ value of n such that $\sigma_{\hat{p}} \le 0.02$.
Solution: Since $\sigma_X^2 = p - p^2$, setting $d\sigma_X^2/dp = 0 = 1 - 2p$ gives $p = 0.5$. Hence, the biggest that $\sigma_X^2$ can be is $\sigma_X^2 = p - p^2 = 0.5 - 0.5^2 = 0.25$, so that the biggest that $\sigma_X$ can be is $\sigma_X^{max} = 0.5$.
Using the ‘worst case’ $\sigma_X^{max} = 0.5$ gives $n = (\sigma_X^{max} / \sigma_{\hat{p}})^2 = (0.5 / 0.02)^2 = 625$.
(c) Suppose that we know for a fact that p could not possibly be greater than 0.2 (because if it were, we would be bankrupt by now!). Now the ‘worst case’ value for $\sigma_X$ becomes $\sigma_X^{max} = \sqrt{p_{max}(1 - p_{max})} = \sqrt{0.2(0.8)} = 0.4$, so that $n = (\sigma_X^{max} / \sigma_{\hat{p}})^2 = (0.4 / 0.02)^2 = 400$.
Remark: With that little bit of extra thought in (c), we were able to reduce the needed sample size from 625 to 400. Suppose that the cost of inspecting each sample is $50. Then we have saved the company $11,250. □
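The sample-size and cost comparison in (b) and (c) can be reproduced with the short Matlab sketch below (the $50-per-inspection cost is the one quoted in the remark above):

    % Worst-case sample sizes from parts (b) and (c), and the resulting savings
    target  = 0.02;                        % required sigma of p-hat
    sigMaxB = sqrt(0.5*(1 - 0.5));         % worst case over all p (p = 0.5): 0.5
    sigMaxC = sqrt(0.2*(1 - 0.2));         % worst case when p <= 0.2: 0.4
    nB = ceil((sigMaxB/target)^2);         % 625
    nC = ceil((sigMaxC/target)^2);         % 400
    fprintf('n = %d vs. n = %d, savings = $%d\n', nB, nC, 50*(nB - nC))   % $11,250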
Example 3 Wind turbine energy systems use feedback control in an attempt to maintain a constant turbine speed. Any
information that can be used in relation to the wind properties can only enhance the performance of this control system.

Here, we will estimate the mean wind speed based on n samples of wind speed. Let $\vec{X} = (X_1, X_2, \ldots, X_n)$ denote the n-dimensional data collection action associated with measurements spaced T seconds apart. The sampling interval, T, will be chosen to be large enough that the components of $\vec{X}$ can be assumed to be uncorrelated. We will assume that $\vec{X}$ has a normal distribution, and that the marginal pdf of each component is also normally distributed. Finally, we will assume that they have one and the same pdf; that is, they each have the same mean and standard deviation. Then $\{X_k\}_{k=1}^{n}$ is a collection of iid $N(\mu_X, \sigma_X)$ random variables. Our estimator of the unknown mean, $\mu_X$, will be $\hat{\mu}_X = \overline{X}$, which has a normal pdf with mean $\mu_{\hat{\mu}_X} = \mu_X$ and standard deviation $\sigma_{\hat{\mu}_X} = \sigma_X / \sqrt{n}$.
Suppose that for n = 25, we obtained a sample mean $\hat{\mu}_X = 19.0$ mph and a sample standard deviation $\hat{\sigma}_X = 2.5$ mph.

(a) Give your best ‘guess’ of the pdf of  X , including numerical values for all the pdf parameters.


Solution: We know that  X ~ Normal(  X ,  X / n ) . Hence, we will assume that  X ~ Normal(19.2, 2.5 / 25 ) , or

 X ~ Normal(19.0, 0.5) .

[NOTE: Here, we have made the assumption that $\mu_X = \hat{\mu}_X$ and $\sigma_X = \hat{\sigma}_X$. Clearly, this is not the case. To assume that a random variable is equal to a non-random parameter is to totally ignore the essence of this course. Even so, one could argue: What’s the alternative? It’s better than nothing, right? And I would agree; however, ONLY if one clearly acknowledges the ‘stretch’ of such an assumption. And, by acknowledging it, one is essentially asking for a better alternative. That alternative will be covered in the Chapter 6 notes.]
(b) Suppose that the turbine has a high-speed cut-off clutch that disengages the turbine shaft from the transmission at
averaged wind speeds above 20 mph. Use your result in (a) to estimate the probability that the clutch will disengage.

Solution: $\Pr[\hat{\mu}_X > 20] = 1 - F_{\hat{\mu}_X}(20)$ = 1 - normcdf(20, 19, 0.5) = 0.0228.
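The numbers in (a) and (b) can be reproduced in Matlab as follows. This sketch carries over the (acknowledged) assumption that $\mu_X$ and $\sigma_X$ equal their estimates, as discussed in the note under part (a).

    % Estimated pdf of the sample mean and the clutch-disengagement probability
    xbar = 19.0; s = 2.5; n = 25;                 % sample statistics from the example
    sigMean = s/sqrt(n);                          % 0.5 mph
    pDisengage = 1 - normcdf(20, xbar, sigMean)   % 0.0228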

(c) The goal of this part is to find the smallest sample size, n, such that $\Pr[\,|\hat{\mu}_X - \mu_X| \ge 1\,] \le 0.001$. In words, we want the probability that $\hat{\mu}_X = \overline{X}$ deviates from the unknown true mean, $\mu_X$, by a significant amount (in this case, over one mile per hour) to be small (in this case, no greater than 0.001).
Solution: Even though our focus is on $\mu_X$, it is necessary to address $\sigma_X$ in order to solve this problem. We will assume, again, that $\sigma_X$ is known. Recall that $\hat{\mu}_X \sim Normal(\mu_X, \sigma_X/\sqrt{n})$. Hence, using equivalent events:

$$0.001 \ge \Pr[\,|\hat{\mu}_X - \mu_X| \ge 1\,] = \Pr\!\left[\left|\frac{\hat{\mu}_X - \mu_X}{\sigma_X/\sqrt{n}}\right| \ge \frac{1}{\sigma_X/\sqrt{n}}\right]. \qquad (3.1)$$

Let $Z = \dfrac{\hat{\mu}_X - \mu_X}{\sigma_X/\sqrt{n}}$.

From (P1): $E(Z) = E\!\left(\dfrac{\hat{\mu}_X - \mu_X}{\sigma_X/\sqrt{n}}\right) = \dfrac{E(\hat{\mu}_X) - \mu_X}{\sigma_X/\sqrt{n}} = \dfrac{\mu_X - \mu_X}{\sigma_X/\sqrt{n}} = 0$.

From (P2): $Var(Z) = Var\!\left(\dfrac{\hat{\mu}_X - \mu_X}{\sigma_X/\sqrt{n}}\right) = \left(\dfrac{1}{\sigma_X/\sqrt{n}}\right)^{2} Var(\hat{\mu}_X - \mu_X) = \dfrac{n}{\sigma_X^2}\,Var(\hat{\mu}_X) = \dfrac{n}{\sigma_X^2}\cdot\dfrac{\sigma_X^2}{n} = 1$.

Hence, since $\hat{\mu}_X \sim Normal(\mu_X, \sigma_X/\sqrt{n})$, we have $Z \sim Normal(0, 1)$. In (3.1) let $z = \dfrac{1}{\sigma_X/\sqrt{n}}$. Then (3.1) becomes $\Pr[\,|Z| \ge z\,] = \Pr[Z \ge z] + \Pr[Z \le -z] = 2\Pr[Z \le -z]$, where the rightmost equality follows from the fact that the bell curve is symmetric (draw a picture). Hence,

$2\Pr[Z \le -z] \le 0.001 \;\Rightarrow\; \Pr[Z \le -z] \le 0.0005 \;\Rightarrow\; z \ge -F_Z^{-1}(0.0005)$ = -norminv(0.0005) = 3.29.

Hence, $z = \dfrac{1}{\sigma_X/\sqrt{n}} = \dfrac{\sqrt{n}}{\sigma_X} \ge 3.29 \;\Rightarrow\; n \ge (\sigma_X z)^2 = (2.5 \times 3.29)^2 = 67.67$, or $n_{min} = 68$ measurements.
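The sample-size calculation above can be packaged as a few lines of Matlab (again under the assumption that $\sigma_X = 2.5$ is known):

    % Smallest n such that Pr[|muhat_X - mu_X| >= 1] <= 0.001
    sigX  = 2.5;  d = 1;  alpha = 0.001;   % assumed std, allowed deviation, probability
    z     = -norminv(alpha/2);             % 3.29
    nMin  = ceil((sigX*z/d)^2)             % 68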
Comment In this problem we assumed that the time interval, T, between wind measurements was large enough to assume
that the data collection random variables are iid. Suppose that our assumption holds for T = 5 seconds. Then we would need to collect data for almost six minutes (68 × 5 s = 340 s) prior to obtaining a reliable estimate of $\mu_X$ to be used to decide whether or not to
disengage the clutch. During such a long period of time, if indeed the true mean wind speed increased, the higher load on
the wind turbine blades and transmission (as well as the electrical load) could result in serious problems. And so, the
comfort afforded by statistical reliability with a large sample may be offset by the discomfort caused by the increase in
likelihood of a mechanical/electrical failure. The moral of the story? The real world can present far greater problems than
those encountered in convenient textbook settings. Nonetheless, the academic setting is needed to begin to appreciate and
identify specific assumptions that may not hold in the real world. It is also a logical setting to expand upon. □
Example 4 In monitoring the blood pressure of the heart of a person who has had heart-related surgery, a device is used to
continuously monitor systolic blood pressure [ http://en.wikipedia.org/wiki/Blood_pressure ]. If a 1-minute average of this
pressure goes above a specified level, a warning signal is emitted. Let W = the act of measuring the 1-minute average
systolic pressure at any chosen instant in time. Let X = the act of measuring the 1-minute average systolic pressure of the
heart in good condition at any instant, and let Y = the act of measuring the 1-minute average systolic pressure of the heart
in bad condition at any instant. In this setting, we desire to know whether W = X or W = Y.
Since the average of a sequence of measurements is being used to make this determination, this suggests that we are, in fact, attempting to determine whether $\mu_W = \mu_X$ or $\mu_W = \mu_Y$. There is no explicit assumption concerning the structure of either $f_X(w)$ or $f_Y(w)$. In fact, if the sample size, n, is assumed to be sufficiently large such that the Central Limit Theorem (CLT) applies, then these pdf's will both have a bell-shaped or normal pdf. Since the normal pdf is parameterized by its mean and variance, our determination presumes that the means of these two pdf's will differ. Since we are using only the average, we will make the following second assumption:
$\mu_X \neq \mu_Y$.    (A2)
We will now proceed to consider two special cases. In both cases we will assume that we know the numerical value of the
parameter $\mu_X$. In the first case we will also assume that we know the numerical value of $\mu_Y$. In the second case we will loosen this to the assumption that, while we do not know the numerical value of $\mu_Y$, we do know that $\mu_Y > \mu_X$.
Case 1: Suppose that X and Y both have a normal pdf and that $\sigma_X = \sigma_Y = 3$. Furthermore, suppose that $\mu_X = 125$ and that $\mu_Y = 135$. The two pdf's are shown below.
[Figure: plot of the two pdf's, f(x) versus Blood Pressure (x), over the range 110 to 150.]
Figure 4.1 pdf’s of X and Y. The dashed line at 130 represents the threshold that will be used to decide whether we think
that a given measurement, w, is associated with X or with Y.
Suppose that the threshold level 130 has been chosen as the basis for deciding whether a measurement, w, belongs to X or to Y. We can formalize this decision process as the following hypothesis test:
Ho: $\mu_W = \mu_X$  versus  H1: $\mu_W = \mu_Y$
In order to decide whether to announce the null hypothesis, Ho, or the alternative hypothesis, H1, our decision rule is:
If $w \le 130$, then announce that we think Ho is the case. If $w > 130$, then announce that we think H1 is the case. There are
two types of errors that we can make:
Type 1 error: We announce H1 when Ho is true. Type 2 error: We announce Ho when H1 is true.
We are now in a position to compute the probability of each type of error:
Pr[Type 1 Error]: This type of error is premised on the assumption that Ho is true. Stated another way, it is premised on the fact that we have only the black pdf in Figure 4.1. Now, since the event that we (wrongly) announce that we believe that H1 is true is the open interval $(130, \infty)$, it follows immediately that the probability that we are wrong is $\Pr[W > 130] = \Pr[(130, \infty)]$. Graphically, this is simply the area above this interval. Numerically, this probability is computed from Matlab as: Pr = 1 - normcdf(130,125,3); Pr = 0.0478.
Pr[Type 2 Error]: This error is premised on the fact that H1 is the truth; that is, that we are dealing with the red pdf in Figure 4.1. Since we will wrongly announce that we think Ho is true in this case, the probability of this error is $\Pr[W \le 130] = \Pr[(-\infty, 130)] = 0.0478$.
Since, typically, we choose Ho to be the default hypothesis, it follows that the ‘burden of proof’ lies with proving that H1
is true. For this reason, Ho is termed the null hypothesis, and H1 is termed the alternative hypothesis. By announcing that
we think H1 is true, we are setting off an ‘alarm’ that signals that the default hypothesis Ho is wrong. For this reason, the
Type I error is often called the false alarm error. Consequently, the corresponding probability is called the false alarm
probability.
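The two error probabilities of Case 1 can be computed directly in Matlab:

    % Case 1: error probabilities for the 130 threshold
    muX = 125; muY = 135; sig = 3; thresh = 130;
    pType1 = 1 - normcdf(thresh, muX, sig)   % false alarm probability, 0.0478
    pType2 = normcdf(thresh, muY, sig)       % Type 2 error probability, 0.0478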
Case 2: Again, suppose that X and Y both have a normal pdf and that $\sigma_X = \sigma_Y = 3$. Furthermore, suppose that $\mu_X = 125$, but that all we know now is that $\mu_Y > 125$. We then have the following hypothesis test:
Ho: $\mu_W = \mu_X$  versus  H1: $\mu_W = \mu_Y > \mu_X$
There are two ways that we can obtain a threshold, call it $w_{th}$, such that if a measurement of W is greater than that value, we will announce H1. The first way is to simply choose it, just as we did in Case 1 above. Were we to choose it to be 130, then we would have the same false alarm probability as we had in that case. Note, however, that we cannot compute a Type 2 error probability, since we do not know the value of the mean under the hypothesis H1. While this may seem like a detriment, it also allows us to consider a variety of possible numerical values for the mean under this alternative hypothesis. For example, suppose that we want to know our chances of wrongly announcing Ho when the person's mean blood pressure is very high, say 140. Visually, this amounts to shifting the red pdf in Figure 4.1 to the right by an amount equal to 5. Then $W = Y \sim N(140, 3)$.
Using this shifted pdf, the Type 2 error probability is $\Pr[Y \le 130]$ = normcdf(130, 140, 3) = 0.00043.
The second way to arrive at a threshold is to specify the value of the false alarm probability. Suppose that it is required that a false alarm occurs with probability 0.001; that is, $\Pr[X > w_{th}] = 0.001$. The Matlab command norminv(.999,125,3) will give $w_{th}$; that is, it computes the inverse function value $w_{th} = F_W^{-1}(0.999)$, where $F_W(w_{th}) = 0.999$. Since Ho is the truth, $W = X \sim N(125, 3)$. And so, this command gives $w_{th} = 134.27$.
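A short sketch that reproduces the Case 2 numbers (the 'very high' mean of 140 is the hypothetical value considered above):

    % Case 2: Type 2 error for an assumed bad-condition mean, and the threshold
    % that yields a specified false alarm probability
    sig = 3;
    pType2 = normcdf(130, 140, sig)          % 0.00043, with threshold 130 and mean 140
    wth    = norminv(1 - 0.001, 125, sig)    % 134.27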
(b) Suppose that instead of a 1-minute average, we use a 10-minute average. Then we have $\sigma_X = \sigma_Y = 3/\sqrt{10} \approx 0.95$. For each case above, repeat the analysis.
Case 1: Now we have $X \sim N(125, 0.95)$ and $Y \sim N(135, 0.95)$.
Pr[Type 1 Error]: $\Pr[X > 130]$ = 1 - normcdf(130, 125, 0.95) = $7 \times 10^{-8}$
Pr[Type 2 Error]: $\Pr[Y \le 130]$ = normcdf(130, 135, 0.95) = $7 \times 10^{-8}$
Case 2: Suppose that we require a false alarm probability $\Pr[X > w_{th}] = 10^{-4}$. Then $w_{th}$ = norminv(1-10^-4, 125, 0.95) = 128.5. □
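For reference, the part (b) computations can be reproduced with the following Matlab sketch:

    % Part (b): 10-minute average, sigma = 3/sqrt(10)
    sig10  = 3/sqrt(10);                        % approximately 0.95
    pType1 = 1 - normcdf(130, 125, sig10)       % ~7e-8
    pType2 = normcdf(130, 135, sig10)           % ~7e-8
    wth    = norminv(1 - 1e-4, 125, sig10)      % ~128.5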