13 Continuous random variables

13.1 Introduction

We continue with the study of random variables, but now consider the case where the range R_X of a random variable X is either the real line R, or an interval (or collection of intervals) contained in R. Examples of continuous random variables are

• the total rainfall (in cm) in Sheffield tomorrow;
• your current blood pressure level (in millimetres of mercury);
• the time in seconds for a runner to complete the London Marathon.

(In practice, measurements of these quantities would actually be discrete, but it's usually more convenient (and harmless) to treat them as continuous.)

13.2 Why make the distinction between continuous and discrete random variables?

Let's try to write down a probability mass function for a continuous random variable. Consider again the spinning roulette wheel example from Section 1:

Example 47. A (European) roulette wheel is spun. Consider the angle between the horizontal axis (viewing the wheel from above) and a line from the centre of the wheel that bisects the zero on the wheel. Assuming that any angle is equally likely, (i) what is the probability that this angle is π/2? (ii) What is the probability that this angle is between π/2 and π?

Denote the random angle by X. This is a continuous random variable with range R_X = [0, 2π]. We assume all possible values of X to be 'equally likely', so that P(X = a) = P(X = b) for any a, b ∈ [0, 2π]. If we write P(X = x) = k for some constant value k, what value would k be? Now, P(X ∈ [0, 2π]) = 1, so we would need

1 = P(X ∈ [0, 2π]) =? Σ_{x ∈ R_X} P(X = x) = Σ_{x ∈ R_X} k.

But there are (uncountably) infinitely many different x in the set R_X, so how can we sum k an infinite number of times and get 1? (And how would we even write down a sum of uncountably many values?) Clearly, this isn't going to work.
In fact, we've already dealt with this problem earlier in the course, when we considered the problem of a randomly generated angle, and defined probability as a measure. We will repeat the discussion here and extend it to consider continuous random variables in general.

13.3 Probability measures for continuous random variables

Recall that in Section 7.2 we said that a probability mass function for a discrete random variable X could be thought of as defining a probability measure m_X on R_X, such that for a set of interest A we have P(X ∈ A) = m_X(A). We consider how to do something similar in a continuous setting.

For the example above, we want our range to be R_X = [0, 2π]. We don't want one part of the circle to be favoured over any other, so for two intervals of the same length [a, a + w] and [b, b + w] (with a, a + w, b, b + w ∈ R_X) we would like the probabilities that X is in each of them to be the same. We can achieve this by considering the measure m_X defined by

m_X([a, b]) = (b − a)/(2π).    (6)

(This is the Lebesgue measure, divided by 2π.) Assume that this measure defines the distribution of X, so that P(X ∈ A) = m_X(A). Then, as we wanted, any two intervals of the same width as above will be equally likely:

P(X ∈ [a, a + w]) = P(X ∈ [b, b + w]) = w/(2π).

However, any single value has zero probability:

P(X = a) = m_X([a, a]) = (a − a)/(2π) = 0.

We can now answer the questions in Example 47 without difficulty. We have P(X = π/2) = 0 and

P(π/2 ≤ X ≤ π) = m_X([π/2, π]) = (π − π/2)/(2π) = 0.25.

One way to think of the measure defined in (6) is as an area under a curve. If we draw the 'curve' y = 1/(2π) for x ∈ [0, 2π], and y = 0 otherwise, then m_X([a, b]) is the area under the curve between x = a and x = b. An illustration is given in Figure 10. Again, as the area between x = a and x = a is zero, this reinforces the point that P(X = a) = 0.
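As an aside (not part of the original notes, which use R later on), the calculations above can be checked with a few lines of Python; the helper name m_X below is ours, introduced purely for illustration of the measure in (6).

```python
import math

def m_X(a, b):
    """Probability that the roulette angle lands in [a, b] (a subset of [0, 2*pi]):
    the area under the constant curve y = 1/(2*pi) between a and b."""
    return (b - a) / (2 * math.pi)

print(m_X(0, 2 * math.pi))        # total probability: 1.0
print(m_X(math.pi / 2, math.pi))  # P(pi/2 <= X <= pi): 0.25
print(m_X(1.0, 1.0))              # a zero-width interval has probability 0.0
```

Note that any single point, not just a = 1, gets probability zero, matching the discussion above.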
The 'area under the curve' interpretation suggests that we can construct other valid probability measures, by drawing different curves. We can consider any function f(x) with f(x) ≥ 0 for all x, such that the total area under the curve is 1. Another example is given in Figure 11, for a random variable X with range R_X = [0, 10], using the curve

f(x) = 3x(10 − x)/500,

for x ∈ [0, 10], and f(x) = 0 otherwise. This measure will give more probability to X lying in the interval [4, 6] than the interval [0, 2], say, even though the two intervals have the same width.

Figure 10: P(X ∈ [π/2, π]) is given by the area under the curve y = 1/(2π) between x = π/2 and x = π.

Based on this, we make the following definition.

Definition 27. A probability density function (p.d.f. for short) f_X is a function such that both f_X(x) ≥ 0 for all x, and

∫_{−∞}^{∞} f_X(t) dt = 1.

A random variable X with p.d.f. f_X has the property that

P(a ≤ X ≤ b) = ∫_a^b f_X(t) dt.

We have exactly the same definition as in Section 7.3 for the cumulative distribution function:

F_X(x) := P(X ≤ x),

but for a continuous random variable, this is calculated as an area under a curve instead of a summation (see (5) on page 36 in Section 7.3):

F_X(x) = ∫_{−∞}^{x} f_X(t) dt,

where f_X(x) is the p.d.f. of X.

Figure 11: P(X ∈ [2, 4]) is given by the area under the curve y = 3x(10 − x)/500 between x = 2 and x = 4.

So, whereas a discrete random variable has a probability mass function, a continuous random variable has a probability density function. From the definitions, it follows that

d/dx F_X(x) = f_X(x).

In summary, for the distribution of a continuous random variable, we use 'area under a curve' as a probability measure, and the curve that we choose for a particular random variable X is called the probability density function of X.
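The two conditions in Definition 27 can be verified directly for the curve f(x) = 3x(10 − x)/500. A minimal Python sketch (the helper names f and F are ours, for illustration; F is the antiderivative of f on [0, 10], i.e. the c.d.f.):

```python
def f(x):
    """Candidate p.d.f. on [0, 10] (zero elsewhere); non-negative on its range."""
    return 3 * x * (10 - x) / 500 if 0 <= x <= 10 else 0.0

def F(x):
    """Integral of f from 0 to x, for x in [0, 10]: the c.d.f. of X."""
    return (15 * x**2 - x**3) / 500

# Total area under the curve is 1, so f is a valid p.d.f.
print(F(10))                     # 1.0
# Two intervals of the same width need not be equally likely:
print(round(F(6) - F(4), 3))     # P(4 <= X <= 6) = 0.296
print(round(F(2) - F(0), 3))     # P(0 <= X <= 2) = 0.104
```

This confirms numerically that this density favours the middle of the range [0, 10], unlike the uniform 'curve' of Figure 10.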
Note that the condition

∫_{−∞}^{∞} f_X(t) dt = 1

applies because this integral represents P(−∞ < X < ∞). Note also that

P(X = a) = P(a ≤ X ≤ a) = ∫_a^a f_X(x) dx = 0,

and that

P(X < a) = P(X ≤ a) − P(X = a) = P(X ≤ a).

Example 48. Let X be a random variable with p.d.f. given by f_X(x) = ke^{−x} for x ∈ [0, 2], and 0 otherwise.

1. Find the value of k.
2. Derive the cumulative distribution function.
3. Calculate the probability that X ≤ 0.5.

14 Expectation and variance of continuous random variables

Definition 28. For a continuous random variable X, the expectation of g(X), for some function g defined on R_X, is defined as

E{g(X)} := ∫_{−∞}^{∞} g(x) f_X(x) dx.

Setting g(X) = X gives

E{X} = ∫_{−∞}^{∞} x f_X(x) dx,

and we again use the notation µ_X := E(X), with µ_X referred to as the mean of X. Variance has the same definition as before, but it is now calculated using integration:

Var(X) := E{(X − µ_X)^2} = ∫_{−∞}^{∞} (x − µ_X)^2 f_X(x) dx.

Note: for a continuous random variable X, the identity Var(X) = E(X^2) − E(X)^2 still holds, so, as with discrete random variables, we can calculate Var(X) by calculating E(X) and E(X^2).

All the properties of expectation and variance that we met in Section 8 still hold:

E(aX + b) = aE(X) + b,
E(X + Y) = E(X) + E(Y),
Var(aX + b) = a^2 Var(X),

and, if X and Y are independent,

E(XY) = E(X)E(Y),
Var(X + Y) = Var(X) + Var(Y).

Example 49. Calculate the expectation and variance of the random variable defined in Example 48.

15 Standard continuous probability distributions

15.1 The exponential distribution

We consider three standard probability distributions for continuous random variables: the exponential distribution, the uniform distribution, and the normal distribution. The exponential distribution is used to represent a 'time to an event'.
Examples of 'experiments' that we might describe using an exponential random variable are

• a patient with heart disease is given a drug, and we observe the time until the patient's next heart attack;
• a new car is bought and we observe how many miles the car is driven before it has its first breakdown.

Definition 29. If a random variable X has an exponential distribution, with rate parameter λ, then its probability density function is given by

f_X(x) = λe^{−λx},

for x ≥ 0, and 0 otherwise. We write X ∼ Exp(rate = λ), or just X ∼ Exp(λ), to mean "X has an exponential distribution with rate parameter λ".

Theorem 21. (Cumulative distribution function of an exponential random variable)

If X ∼ Exp(λ), then

F_X(x) = 1 − e^{−λx}.

We can see that lim_{x→∞} F_X(x) = 1 (so that "F_X(∞) = 1"), so that, as required of a p.d.f.,

∫_{−∞}^{∞} f_X(x) dx = ∫_0^{∞} λe^{−λx} dx = 1.

We plot both the p.d.f. and c.d.f. of an Exp(2) random variable in Figure 12.

Figure 12: The p.d.f. (left plot) and c.d.f. (right plot) of an exponential random variable X with rate parameter λ = 2.

Theorem 22. (Expectation and variance of an exponential random variable)

If X ∼ Exp(λ), then

E(X) = 1/λ,
Var(X) = 1/λ^2.

Theorem 23. (The 'lack of memory' property of an exponential random variable)

If X ∼ Exp(λ), then

P(X > x + a | X > a) = P(X > x).

In other words, exponential random variables have the interesting property that they 'forget' how 'old' they are. If the lifetime of some object has an exponential distribution, and the object survives from time 0 to time a, it will 'carry on' as if it was starting at time 0.

Example 50. A computer is left running continuously until it first develops a fault. The time until the fault, X, is to be modelled with an exponential distribution. The expected time until the first fault is 100 days.

1. If X ∼ Exp(λ), determine the value of λ.
What is the standard deviation of X?
2. What is the probability that the computer develops a fault within the first 100 days?
3. If the computer is still working after 100 days, what is the probability that it will still be working after 150 days?

Example 51. Suppose the number of earthquakes N_t in an interval [0, t] has a Poisson(φt) distribution, for any value of t. Recall that if X ∼ Poisson(λ),

p_X(x) = P(X = x) = e^{−λ}λ^x / x!

Let T be the time until the first earthquake. What is the cumulative distribution function of T? What is the distribution of T?

15.2 The uniform distribution

The uniform distribution is used to describe a random variable that is constrained to lie in some interval [a, b], but has the same probability of lying in any interval contained within [a, b] of a fixed width. The uniform distribution is an important concept in probability theory, but it is less useful for modelling uncertainty in the real world; it is not often plausible in real situations that all intervals of the same width are equally likely.

Definition 30. If a random variable X has a uniform distribution over the interval [a, b], then its probability density function is given by

f_X(x) = 1/(b − a),

for x ∈ [a, b], and 0 otherwise. We write X ∼ U[a, b] to mean "X has a uniform distribution over the interval [a, b]."

Theorem 24. (Cumulative distribution function of a uniform random variable)

If X ∼ U[a, b], then for x ∈ [a, b]

F_X(x) = (x − a)/(b − a).

Plotting the c.d.f. between x = a and x = b will give a straight line, joining the points (a, 0) and (b, 1). We plot the p.d.f. and c.d.f. in Figure 13.

Figure 13: The p.d.f. (left plot) and c.d.f. (right plot) of a uniform random variable X over the interval [10, 20].

Theorem 25. (Expectation and variance of a uniform random variable)

If X ∼ U[a, b], then

E(X) = (a + b)/2,
Var(X) = (b − a)^2/12.

Example 52. Let X ∼ U[−1, 1].
Calculate E(X), Var(X) and P(X ≤ −0.5 | X ≤ 0).

15.3 The standard normal distribution

The normal distribution is a very important distribution in both probability and statistics. Before studying it, we first introduce the Gaussian integral:

∫_{−∞}^{∞} e^{−x^2} dx = √π.    (7)

(A proof is given in Applebaum (2008), though you will need to understand changes of variables within double integration). We first define the standard normal distribution, before considering the more general case.

Definition 31. If a random variable Z has a standard normal distribution, then its probability density function is given by

f_Z(z) = (1/√(2π)) exp(−z^2/2).

We write Z ∼ N(0, 1) to mean "Z has the standard normal distribution."

We can use (7) to confirm that this is a valid p.d.f. Starting with the Gaussian integral, we make the substitution x = z/√2, so that dx/dz = 1/√2: (7) immediately gives

∫_{−∞}^{∞} (1/√π) e^{−x^2} dx = 1,

and the substitution then gives

∫_{−∞}^{∞} (1/√(2π)) e^{−z^2/2} dz = 1.

The p.d.f. is plotted in Figure 14 (left plot). Note the distinctive 'bell-shaped' curve and the symmetry about z = 0.

15.3.1 The cumulative distribution function of a standard normal random variable

The c.d.f. is

F_Z(z) = P(Z ≤ z) = ∫_{−∞}^{z} (1/√(2π)) exp(−t^2/2) dt.

However, we can't evaluate this integral analytically, and have to use numerical methods. There are various statistical tables available that give the value of F_Z(z) for different z, but these have now largely been superseded by modern computing packages, and we will see how to calculate F_Z(z) using R in Section 15.4.3. The c.d.f. is plotted in Figure 14 (right plot).

The notation Φ is commonly used to represent the c.d.f., and φ to represent the p.d.f.:

Φ(z) := P(Z ≤ z) = ∫_{−∞}^{z} (1/√(2π)) exp(−t^2/2) dt,
φ(z) := (1/√(2π)) exp(−z^2/2).

Then

d/dz Φ(z) = φ(z).

Theorem 26. (Relationship between Φ(z) and Φ(−z))

Φ(−z) = 1 − Φ(z).

Figure 14: The p.d.f.
(left plot) and c.d.f. (right plot) of a standard normal random variable. Note the 'bell shape' of the p.d.f.; the p.d.f. is sometimes referred to as the 'bell-shaped curve'.

This can be seen in Figure 15. We denote the quantile function by Φ^{−1}. If we want z such that P(Z ≤ z) = α, then we write

Φ(z) = α  ⟺  z = Φ^{−1}(α).

Theorem 27. (Expectation and variance of a standard normal random variable)

If Z ∼ N(0, 1), then

E(Z) = 0,
Var(Z) = 1.

Figure 15: As f_Z is symmetric about z = 0, the area under the curve between −∞ and −z is the same as the area under the curve between z and ∞.

15.4 The normal distribution: the general case

The standard normal distribution is one example of the family of normal distributions, in which the mean is 0 and the variance is 1, but, in general, normal random variables can have any values for the mean and variance (though variances cannot be negative, of course). Normal distributions are used very widely in many situations, for example:

• many physical characteristics of humans and other animals, for example the distribution of heights of females in a particular age group, can be well represented with a normal distribution;
• scientists often assume that 'measurement errors' are normally distributed;
• normal distributions are commonly used in finance to model changes in stock prices (though not always sensibly!).

Some idea why the normal distribution is so important will be given later in the course in the section on the Central Limit Theorem.

Definition 32. If a random variable X has a normal distribution with mean µ and variance σ^2, then its probability density function is given by

f_X(x) = (1/√(2πσ^2)) exp(−(x − µ)^2/(2σ^2)).

We write X ∼ N(µ, σ^2) to mean "X has a normal distribution with mean µ and variance σ^2."

Immediately, we can see that by setting µ = 0 and σ^2 = 1 in Definition 32, we get the standard normal p.d.f. in Definition 31.

Theorem 28.
(Definition of a general normal random variable via transformation of a standard normal random variable)

Let Z ∼ N(0, 1), and define X = µ + σZ. Then E(X) = µ, Var(X) = σ^2 and

X ∼ N(µ, σ^2).

15.4.1 Summary

It's worth stating again the relationship between a standard normal random variable Z and a 'general' normal random variable X.

• Given Z ∼ N(0, 1), we can obtain X ∼ N(µ, σ^2) by transforming Z: X = µ + σZ.
• Given X ∼ N(µ, σ^2), we can obtain Z ∼ N(0, 1) by transforming X:

Z = (X − µ)/σ,

and we refer to transforming X to get a standard N(0, 1) random variable as standardising X.

Traditionally, we would calculate the c.d.f. of X via standardising and using the Φ(z) function:

P(X ≤ x) = P((X − µ)/σ ≤ (x − µ)/σ) = Φ((x − µ)/σ),

where Φ(z) is given in statistical tables for various values of z. As discussed before, statistical tables have become largely obsolete given computer packages such as R, although the technique of standardising is still computationally useful.

15.4.2 Visualising the mean and variance

By plotting the density function, we can see the effect of changing the value of µ and σ^2 and so interpret these parameters more easily. Starting with the mean, we see in Figure 16 that if X ∼ N(µ, σ^2), then the maximum of the p.d.f. is at x = µ. If we change µ whilst leaving σ^2 unchanged (Figure 16, top plot), the p.d.f. 'shifts' along the x-axis, but the shape of the p.d.f. is unchanged.

The variance parameter σ^2 determines how 'spread out' the p.d.f. is. If we increase σ^2, whilst leaving µ unchanged (Figure 16, bottom plot), the peak of the p.d.f. is in the same place, but we get a flatter curve. This is to be expected, remembering that random variables with larger variances are more likely to be further away from their expectations.

Figure 16: Top plot: p.d.f.s for the N(0, 1) and N(2, 1) distributions.
Bottom plot: p.d.f.s for the N(0, 1) and N(0, 4) distributions.

15.4.3 The normal distribution in R

R will calculate the p.d.f., c.d.f. and quantile functions, and will also generate normal random variables. Note that in R, we specify the standard deviation rather than the variance.

• Calculate the p.d.f.: dnorm(x,mu,sigma)
Example: calculate f_X(2) when X ∼ N(1, 4).
> dnorm(2,1,2)
[1] 0.1760327

• Calculate the c.d.f.: pnorm(x,mu,sigma)
Example: calculate F_X(−1) = P(X ≤ −1) when X ∼ N(1, 4).
> pnorm(-1,1,2)
[1] 0.1586553

• Invert the c.d.f. to find the α quantile: qnorm(alpha,mu,sigma)
Example: if X ∼ N(0, 1), what value of x satisfies the equation F_X(x) = P(X ≤ x) = 0.95?
> qnorm(0.95,0,1)
[1] 1.644854
Check:
> pnorm(1.644854,0,1)
[1] 0.95

• Generate m random observations from a normal distribution: rnorm(m,mu,sigma)
Example: generate 3 random observations from the N(15, 25) distribution.
> rnorm(3,15,5)
[1] 6.985971 20.671469 11.637691

Example 53. If X ∼ N(3, 4), what is the 25th percentile of the distribution of X? Use Φ^{−1}(0.75) = 0.67.

Example 54. (from Ross, 2010). An expert witness in a paternity suit testifies that the length, in days, of human gestation is approximately normally distributed, with mean 270 days and standard deviation 10 days. The defendant has proved that he was out of the country during a period between 290 days before the birth of the child and 240 days before the birth of the child, so if he is the father, the gestation period must have either exceeded 290 days, or been shorter than 240 days. How likely is this?

15.4.4 The two-σ rule

For a standard normal random variable Z,

P(−1.96 ≤ Z ≤ 1.96) = 0.95.

Since E(Z) = 0 and Var(Z) = 1, the probability of Z being within two standard deviations of its mean value is approximately 0.95 (i.e. P(−2 ≤ Z ≤ 2) = 0.9545 to 4 d.p.). We illustrate this in Figure 17.
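In R, both figures come from pnorm, e.g. pnorm(1.96) - pnorm(-1.96). As an alternative illustration (not part of the notes' R material), Python's standard library provides statistics.NormalDist, which confirms the same values:

```python
from statistics import NormalDist

# Standard normal random variable Z ~ N(0, 1)
Z = NormalDist(mu=0, sigma=1)

p_196 = Z.cdf(1.96) - Z.cdf(-1.96)  # P(-1.96 <= Z <= 1.96)
p_2 = Z.cdf(2) - Z.cdf(-2)          # P(-2 <= Z <= 2)

print(round(p_196, 4))  # 0.95
print(round(p_2, 4))    # 0.9545
```

So 1.96 standard deviations capture probability 0.95 exactly (to 4 d.p., 0.9500), while a full two standard deviations capture 0.9545.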
If we now consider any normal random variable X ∼ N(µ, σ^2), the probability that it will lie within a distance of two standard deviations from its mean is approximately 0.95. This is straightforward to verify:

P(|X − µ| ≤ 1.96σ) = P(µ − 1.96σ ≤ X ≤ µ + 1.96σ)
                   = P(−1.96 ≤ (X − µ)/σ ≤ 1.96)
                   = P(−1.96 ≤ Z ≤ 1.96) = 0.95,

(with P(|X − µ| ≤ 2σ) = 0.9545 to 4 d.p.).

Figure 17: The p.d.f. of a N(0, 1) random variable (mean 0, standard deviation 1). There is a 95% probability that a normal random variable will lie within 1.96 standard deviations of its mean.

In Statistics, there is a convention of using 0.05 as a threshold for a 'small' probability (more of this in Semester 2), though the choice of 0.05 is arbitrary. However, the two-σ rule is an easy-to-remember fact about normal random variables, and can be a useful yardstick in various situations.
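The standardising argument says the 1.96σ probability is the same for every choice of µ and σ. We can check this numerically (an illustrative Python sketch; the (µ, σ) pairs below are our own hypothetical choices, the last one borrowed from the gestation setting of Example 54):

```python
from statistics import NormalDist

# P(mu - 1.96*sigma <= X <= mu + 1.96*sigma) should not depend on mu or sigma.
for mu, sigma in [(0, 1), (3, 2), (270, 10)]:
    X = NormalDist(mu, sigma)
    p = X.cdf(mu + 1.96 * sigma) - X.cdf(mu - 1.96 * sigma)
    print(mu, sigma, round(p, 4))  # 0.95 in every case
```

Each line prints 0.95, confirming that the two-σ rule is a property of the whole normal family, not just of N(0, 1).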