CHAPTER 1

Preliminaries of Probability

1. Transformation of densities

Exercise 1. If $X$ has cdf $F_X(x)$ and $g$ is increasing and continuous, then $Y = g(X)$ has cdf
\[
F_Y(y) = F_X\big(g^{-1}(y)\big)
\]
for all $y$ in the image of $g$. If $g$ is decreasing and continuous, the formula is
\[
F_Y(y) = 1 - F_X\big(g^{-1}(y)\big).
\]

Exercise 2. If $X$ has continuous pdf $f_X(x)$ and $g$ is increasing and differentiable, then $Y = g(X)$ has pdf
\[
f_Y(y) = \frac{f_X\big(g^{-1}(y)\big)}{g'\big(g^{-1}(y)\big)} = \left.\frac{f_X(x)}{g'(x)}\right|_{y=g(x)}
\]
for all $y$ in the image of $g$. If $g$ is decreasing and differentiable, the formula is
\[
f_Y(y) = \left.-\frac{f_X(x)}{g'(x)}\right|_{y=g(x)}.
\]

Thus, in general, we have the following result.

Proposition 1. If $g$ is monotone and differentiable, the transformation of densities is given by
\[
f_Y(y) = \left.\frac{f_X(x)}{|g'(x)|}\right|_{y=g(x)}.
\]

Remark 1. Under proper assumptions, when $g$ is not injective the formula generalizes to
\[
f_Y(y) = \sum_{x:\, y = g(x)} \frac{f_X(x)}{|g'(x)|}.
\]

Remark 2. A second proof of the previous formula comes from the following characterization of the density: $f$ is the density of $X$ if and only if
\[
E[h(X)] = \int_{\mathbb{R}} h(x) f(x)\, dx
\]
for all continuous bounded functions $h$. Let us use this fact to prove that
\[
f_Y(y) = \left.\frac{f_X(x)}{|g'(x)|}\right|_{y=g(x)}
\]
is the density of $Y = g(X)$. Let us compute $E[h(Y)]$ for a generic continuous bounded function $h$. We have, from the definition of $Y$ and from the characterization applied to $X$,
\[
E[h(Y)] = E[h(g(X))] = \int_{\mathbb{R}} h(g(x)) f(x)\, dx.
\]
Let us change variable $y = g(x)$, under the assumption that $g$ is monotone, bijective and differentiable. We have $x = g^{-1}(y)$, $dx = \frac{1}{|g'(g^{-1}(y))|}\, dy$ (we put the absolute value since we do not change the extremes of integration, but just rewrite $\mathbb{R}$), so that
\[
\int_{\mathbb{R}} h(g(x)) f(x)\, dx = \int_{\mathbb{R}} h(y)\, \frac{f\big(g^{-1}(y)\big)}{|g'(g^{-1}(y))|}\, dy.
\]
If we set
\[
f_Y(y) := \left.\frac{f_X(x)}{|g'(x)|}\right|_{y=g(x)}
\]
we have proved that
\[
E[h(Y)] = \int_{\mathbb{R}} h(y) f_Y(y)\, dy
\]
for every continuous bounded function $h$. By the characterization, this implies that $f_Y(y)$ is the density of $Y$. This proof is thus based on the change of variable formula.

Remark 3. The same proof works in the multidimensional case, using the change of variable formula for multiple integrals. Recall that in place of $dy = g'(x)\, dx$ one has to use $dy = |\det Dg(x)|\, dx$, where $Dg$ is the Jacobian (the matrix of first derivatives) of the transformation $g : \mathbb{R}^n \to \mathbb{R}^n$. In fact we need the inverse transformation, so we use the corresponding formula
\[
dx = \left|\det Dg^{-1}(y)\right| dy = \frac{1}{\left|\det Dg\big(g^{-1}(y)\big)\right|}\, dy.
\]
With the same passages performed above, one gets the following result.

Proposition 2. If $g$ is a differentiable bijection and $Y = g(X)$, then
\[
f_Y(y) = \left.\frac{f_X(x)}{|\det Dg(x)|}\right|_{y=g(x)}.
\]

Exercise 3. If $X$ (in $\mathbb{R}^n$) has density $f_X(x)$ and $Y = UX$, where $U$ is an orthogonal linear transformation of $\mathbb{R}^n$ (meaning that $U^{-1} = U^T$), then $Y$ has density
\[
f_Y(y) = f_X\big(U^T y\big).
\]

1.1. Linear transformation of moments. The solution of the following exercises is based on the linearity of the expected value (and thus of the covariance in each argument).

Exercise 4. Let $X = (X_1, \ldots, X_n)$ be a random vector, $A$ a $d \times n$ matrix, and $Y = AX$. Let $\mu^X = (\mu^X_1, \ldots, \mu^X_n)$ be the vector of mean values of $X$, namely $\mu^X_i = E[X_i]$. Then $\mu^Y := A \mu^X$ is the vector of mean values of $Y$, namely $\mu^Y_i = E[Y_i]$.

Exercise 5. Under the same assumptions, if $Q_X$ and $Q_Y$ are the covariance matrices of $X$ and $Y$, then
\[
Q_Y = A Q_X A^T.
\]
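As a quick numerical illustration of Exercises 4 and 5 (this sketch is not part of the original notes), one can draw many samples of a random vector with prescribed mean and covariance, apply a linear map, and compare the empirical moments of the image with $A\mu^X$ and $AQ_XA^T$. The specific matrices, the sample size, and the use of NumPy are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices: a 2-dimensional X with known mean and covariance,
# and a 3x2 matrix A defining Y = AX (so d = 3, n = 2 here).
mu_X = np.array([1.0, -2.0])
Q_X = np.array([[2.0, 0.5],
                [0.5, 1.0]])
A = np.array([[1.0, 0.0],
              [2.0, -1.0],
              [0.5, 3.0]])

# Draw many samples of X (Gaussian only for convenience; Exercises 4 and 5
# do not require Gaussianity) and map each sample through A.
X = rng.multivariate_normal(mu_X, Q_X, size=200_000)
Y = X @ A.T

print(Y.mean(axis=0))           # empirical mean, close to A @ mu_X
print(A @ mu_X)
print(np.cov(Y, rowvar=False))  # empirical covariance, close to A @ Q_X @ A.T
print(A @ Q_X @ A.T)
```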
2. About covariance matrices

The covariance matrix $Q$ of a vector $X = (X_1, \ldots, X_n)$, defined as $Q_{ij} = \operatorname{Cov}(X_i, X_j)$, is symmetric:
\[
Q_{ij} = \operatorname{Cov}(X_i, X_j) = \operatorname{Cov}(X_j, X_i) = Q_{ji}
\]
and non-negative definite:
\[
x^T Q x = \sum_{i,j=1}^n Q_{ij} x_i x_j = \sum_{i,j=1}^n \operatorname{Cov}(X_i, X_j)\, x_i x_j = \sum_{i,j=1}^n \operatorname{Cov}(x_i X_i, x_j X_j) = \operatorname{Cov}\Big(\sum_{i=1}^n x_i X_i, \sum_{j=1}^n x_j X_j\Big) = \operatorname{Var}[W] \geq 0
\]
where $W = \sum_{i=1}^n x_i X_i$.

The spectral theorem states that any symmetric matrix $Q$ can be diagonalized, namely there exists an orthonormal basis $e_1, \ldots, e_n$ of $\mathbb{R}^n$ where $Q$ takes the form
\[
Q_e = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}.
\]
Moreover, the numbers $\lambda_i$ are eigenvalues of $Q$, and the vectors $e_i$ are corresponding eigenvectors. Since the covariance matrix $Q$ is also non-negative definite, we have
\[
\lambda_i \geq 0, \quad i = 1, \ldots, n.
\]

Remark 4. To better understand this theorem, recall a few facts of linear algebra. $\mathbb{R}^n$ is a vector space with a scalar product $\langle \cdot, \cdot \rangle$, namely a set of elements (called vectors) with certain operations (sum of vectors, multiplication by real numbers, scalar product between vectors) and properties. We may call intrinsic the objects defined in these terms, as opposed to the objects defined by means of numbers with respect to a given basis. A vector $x \in \mathbb{R}^n$ is an intrinsic object; but we can write it as a sequence of numbers $(x_1, \ldots, x_n)$ in infinitely many ways, depending on the basis we choose. Given an orthonormal basis $u_1, \ldots, u_n$, the components of a vector $x \in \mathbb{R}^n$ in this basis are the numbers $\langle x, u_j \rangle$, $j = 1, \ldots, n$. A linear map $L$ in $\mathbb{R}^n$, given the basis $u_1, \ldots, u_n$, can be represented by the matrix of components $\langle L u_i, u_j \rangle$. We shall write $y^T x$ for $\langle x, y \rangle$ (or $\langle y, x \rangle$).

Remark 5. After these general comments, we see that a matrix represents a linear transformation, given a basis. Thus, given the canonical basis of $\mathbb{R}^n$, which we shall denote by $u_1, \ldots, u_n$, and given the matrix $Q$, a linear transformation $L$ from $\mathbb{R}^n$ to $\mathbb{R}^n$ is defined. The spectral theorem states that there is a new orthonormal basis $e_1, \ldots, e_n$ of $\mathbb{R}^n$ such that, if $Q_e$ represents the linear transformation $L$ in this new basis, then $Q_e$ is diagonal.

Remark 6. Let us recall more facts about linear algebra. Start with an orthonormal basis $u_1, \ldots, u_n$, which we call the canonical or original basis. Let $e_1, \ldots, e_n$ be another orthonormal basis. The vector $u_1$, in the canonical basis, has components $(1, 0, \ldots, 0)^T$, and so on for the other vectors. Each vector $e_j$ has certain components. Denote by $U$ the matrix whose first column has the same components as $e_1$ (those in the canonical basis), and so on for the other columns. We could write $U = (e_1, \ldots, e_n)$. Also, $U_{ij} = e_j^T u_i$. Then
\[
U (1, 0, \ldots, 0)^T = e_1
\]
and so on, namely $U$ represents the linear map which maps the canonical (original) basis of $\mathbb{R}^n$ into $e_1, \ldots, e_n$. This is an orthogonal transformation:
\[
U^{-1} = U^T.
\]
Indeed, $U^{-1}$ maps $e_1, \ldots, e_n$ into the canonical basis (by the above property of $U$), and $U^T$ does the same:
\[
U^T e_1 = (e_1^T e_1, e_2^T e_1, \ldots, e_n^T e_1)^T = (1, 0, \ldots, 0)^T
\]
and so on.
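The content of the spectral theorem and of Remark 6 can be checked numerically. The following sketch (not part of the original notes) uses NumPy's `eigh`, whose returned matrix has the orthonormal eigenvectors $e_1, \ldots, e_n$ as columns, i.e. plays the role of the matrix $U$ described above; the test matrix is an arbitrary illustrative choice.

```python
import numpy as np

# An arbitrary symmetric non-negative definite matrix, built as Q = M M^T
# so that it could be a covariance matrix (illustrative choice).
M = np.array([[1.0, 2.0],
              [0.0, 1.5],
              [1.0, -1.0]])
Q = M @ M.T

# eigh returns the eigenvalues lambda_i (in ascending order) and a matrix U
# whose columns are the corresponding orthonormal eigenvectors e_1, ..., e_n.
lam, U = np.linalg.eigh(Q)
Q_e = np.diag(lam)

print(np.allclose(U.T @ U, np.eye(3)))  # U^{-1} = U^T
print(np.allclose(U @ Q_e @ U.T, Q))    # Q = U Q_e U^T
print(np.allclose(U.T @ Q @ U, Q_e))    # Q_e = U^T Q U
print(np.all(lam >= -1e-12))            # eigenvalues are non-negative
```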
Remark 7. Let us now go back to the covariance matrix $Q$ and the matrix $Q_e$ given by the spectral theorem: $Q_e$ is a diagonal matrix which represents the same linear transformation $L$ in the new basis $e_1, \ldots, e_n$. Assume we do not know anything else, except that they describe the same map $L$ and that $Q_e$ is diagonal, namely of the form
\[
Q_e = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}.
\]
Let us deduce a number of facts:

i) from basic linear algebra we know the relation $Q_e = U^T Q U$;

ii) the diagonal elements $\lambda_j$ are eigenvalues of $L$, with eigenvectors $e_j$;

iii) $\lambda_j \geq 0$, $j = 1, \ldots, n$.

To prove (ii), let us write the vector $L e_1$ in the basis $e_1, \ldots, e_n$: $e_1$ is the vector $(1, 0, \ldots, 0)^T$, the map $L$ is represented by $Q_e$, hence $L e_1$ is equal to
\[
Q_e (1, 0, \ldots, 0)^T = (\lambda_1, 0, \ldots, 0)^T = \lambda_1 (1, 0, \ldots, 0)^T,
\]
which is $\lambda_1 e_1$ in the basis $e_1, \ldots, e_n$. We have checked that $L e_1 = \lambda_1 e_1$, namely that $\lambda_1$ is an eigenvalue and $e_1$ is a corresponding eigenvector. The proof for $\lambda_2$, etc. is the same.

To prove (iii), just note that, in the basis $e_1, \ldots, e_n$,
\[
e_j^T Q_e e_j = \lambda_j.
\]
But
\[
e_j^T Q_e e_j = e_j^T U^T Q U e_j = v^T Q v \geq 0,
\]
where $v = U e_j$, having used the property that $Q$ is non-negative definite. Hence $\lambda_j \geq 0$.

3. Gaussian vectors

Recall that a Gaussian, or Normal, r.v. $N(\mu, \sigma^2)$ is a r.v. with probability density
\[
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{|x-\mu|^2}{2\sigma^2}\right).
\]
We have shown that $\mu$ is the mean value and $\sigma^2$ the variance. The standard Normal is the case $\mu = 0$, $\sigma^2 = 1$. If $Z$ is a standard normal r.v., then $\mu + \sigma Z$ is $N(\mu, \sigma^2)$. We may give the definition of Gaussian vector in two ways, generalizing either the expression of the density or the property that $\mu + \sigma Z$ is $N(\mu, \sigma^2)$. Let us start with a lemma.

Lemma 1. Given a vector $\mu = (\mu_1, \ldots, \mu_n)$ and a symmetric positive definite $n \times n$ matrix $Q$ (namely $v^T Q v > 0$ for all $v \neq 0$), consider the function
\[
f(x) = \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left(-\frac{(x-\mu)^T Q^{-1} (x-\mu)}{2}\right)
\]
where $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. Notice that the inverse $Q^{-1}$ is well defined for positive definite matrices, $(x-\mu)^T Q^{-1} (x-\mu)$ is a non-negative quantity, and $\det(Q)$ is a positive number. Then:

i) $f(x)$ is a probability density;

ii) if $X = (X_1, \ldots, X_n)$ is a random vector with such joint probability density, then $\mu$ is the vector of mean values, namely $\mu_i = E[X_i]$, and $Q$ is the covariance matrix: $Q_{ij} = \operatorname{Cov}(X_i, X_j)$.

Proof. Step 1. In this step we explain the meaning of the expression $f(x)$. We have recalled above that any symmetric matrix $Q$ can be diagonalized, namely there exists an orthonormal basis $e_1, \ldots, e_n$ of $\mathbb{R}^n$ where $Q$ takes the form
\[
Q_e = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}.
\]
Moreover, the numbers $\lambda_i$ are eigenvalues of $Q$, and the vectors $e_i$ are corresponding eigenvectors. See above for more details. Let $U$ be the matrix introduced there, such that $U^{-1} = U^T$. Recall the relation $Q_e = U^T Q U$. Since $v^T Q v > 0$ for all $v \neq 0$, we deduce $v^T Q_e v = (Uv)^T Q (Uv) > 0$ for all $v \neq 0$ (since $Uv \neq 0$). Taking $v = e_i$, we get $\lambda_i > 0$. Therefore the matrix $Q_e$ is invertible, with inverse given by
\[
Q_e^{-1} = \begin{pmatrix} \lambda_1^{-1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n^{-1} \end{pmatrix}.
\]
It follows that $Q$, being equal to $U Q_e U^T$ (the relation $Q = U Q_e U^T$ comes from $Q_e = U^T Q U$), is also invertible, with inverse $Q^{-1} = U Q_e^{-1} U^T$. Easily one gets $(x-\mu)^T Q^{-1} (x-\mu) > 0$ for $x \neq \mu$. Moreover,
\[
\det(Q) = \det(U) \det(Q_e) \det(U^T) = \lambda_1 \cdots \lambda_n
\]
because $\det(Q_e) = \lambda_1 \cdots \lambda_n$ and $\det(U) = \pm 1$. The latter property comes from
\[
1 = \det I = \det(U^T U) = \det(U^T) \det(U) = \det(U)^2
\]
(to be used in Exercise 3). Therefore $\det(Q) > 0$. The formula for $f(x)$ is meaningful and defines a positive function.
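Before continuing, here is a small numerical sketch of the identities obtained in Step 1 (not part of the original notes): the inverse and the determinant of a positive definite $Q$ computed through its spectral decomposition, and the resulting value of $f(x)$ at one point. The matrix, the mean, and the evaluation point are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative positive definite covariance and mean (not from the text).
Q = np.array([[2.0, 0.8],
              [0.8, 1.0]])
mu = np.array([0.5, -1.0])

# Spectral decomposition Q = U Q_e U^T, then Q^{-1} = U Q_e^{-1} U^T and
# det(Q) = lambda_1 * ... * lambda_n, as in Step 1.
lam, U = np.linalg.eigh(Q)
Q_inv = U @ np.diag(1.0 / lam) @ U.T
det_Q = np.prod(lam)

print(np.allclose(Q_inv, np.linalg.inv(Q)))
print(np.isclose(det_Q, np.linalg.det(Q)))

# The density of Lemma 1 evaluated at an arbitrary point x.
x = np.array([1.0, 0.0])
d = x - mu
f_x = np.exp(-0.5 * d @ Q_inv @ d) / np.sqrt((2 * np.pi) ** 2 * det_Q)
print(f_x)  # a positive number
```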
Step 2. Let us prove that $f(x)$ is a density. By the theorem of change of variables in multidimensional integrals, with the change of variables $x = Uy$,
\[
\int_{\mathbb{R}^n} f(x)\, dx = \int_{\mathbb{R}^n} f(Uy)\, dy
\]
because $|\det U| = 1$ (and the Jacobian of a linear transformation is the linear map itself). Now, since $U^T Q^{-1} U = Q_e^{-1}$, $f(Uy)$ is equal to the following function:
\[
f_e(y) = \frac{1}{\sqrt{(2\pi)^n \det(Q_e)}} \exp\left(-\frac{(y - \mu_e)^T Q_e^{-1} (y - \mu_e)}{2}\right)
\]
where $\mu_e = U^T \mu$. Since
\[
(y - \mu_e)^T Q_e^{-1} (y - \mu_e) = \sum_{i=1}^n \frac{(y_i - (\mu_e)_i)^2}{\lambda_i}
\]
and $\det(Q_e) = \lambda_1 \cdots \lambda_n$, we get
\[
f_e(y) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left(-\frac{(y_i - (\mu_e)_i)^2}{2\lambda_i}\right).
\]
Namely, $f_e(y)$ is the product of $n$ Gaussian densities $N((\mu_e)_i, \lambda_i)$. We know from the theory of joint probability densities that the product of densities is the joint density of a vector with independent components. Hence $f_e(y)$ is a probability density. Therefore $\int_{\mathbb{R}^n} f_e(y)\, dy = 1$. This proves $\int_{\mathbb{R}^n} f(x)\, dx = 1$, so that $f$ is a probability density.

Step 3. Let $X = (X_1, \ldots, X_n)$ be a random vector with joint probability density $f$, when written in the original basis. Let $Y = U^T X$. Then (Exercise 3) $Y$ has density $f_Y(y)$ given by $f_Y(y) = f(Uy)$. Thus
\[
f_Y(y) = f_e(y) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left(-\frac{(y_i - (\mu_e)_i)^2}{2\lambda_i}\right).
\]
Thus $(Y_1, \ldots, Y_n)$ are independent $N((\mu_e)_i, \lambda_i)$ r.v. and therefore
\[
E[Y_i] = (\mu_e)_i, \qquad \operatorname{Cov}(Y_i, Y_j) = \delta_{ij}\lambda_i.
\]
From Exercises 4 and 5 we deduce that $X = UY$ has mean $\mu_X = U \mu_Y$ and covariance $Q_X = U Q_Y U^T$. Since $\mu_Y = \mu_e$ and $\mu_e = U^T \mu$, we readily deduce $\mu_X = U U^T \mu = \mu$. Since $Q_Y = Q_e$ and $Q = U Q_e U^T$, we get $Q_X = Q$. The proof is complete.

Definition 1. Given a vector $\mu = (\mu_1, \ldots, \mu_n)$ and a symmetric positive definite $n \times n$ matrix $Q$, we call Gaussian vector of mean $\mu$ and covariance $Q$ a random vector $X = (X_1, \ldots, X_n)$ having joint probability density function
\[
f(x) = \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left(-\frac{(x-\mu)^T Q^{-1} (x-\mu)}{2}\right)
\]
where $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. We write $X \sim N(\mu, Q)$.

The only drawback of this definition is the restriction to strictly positive definite matrices $Q$. It is sometimes useful to have the notion of Gaussian vector also in the case when $Q$ is only non-negative definite (sometimes called the degenerate case). For instance, we shall see that any linear transformation of a Gaussian vector is a Gaussian vector, but in order to state this theorem in full generality we need to consider also the degenerate case. In order to give a more general definition, let us take the idea recalled above for the 1-dimensional case: affine transformations of Gaussian r.v. are Gaussian.

Definition 2. i) The standard $d$-dimensional Gaussian vector is the random vector $Z = (Z_1, \ldots, Z_d)$ with joint probability density $f(z_1, \ldots, z_d) = \prod_{i=1}^d p(z_i)$, where $p(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$.

ii) All other Gaussian vectors $X = (X_1, \ldots, X_n)$ (in any dimension $n$) are obtained from standard ones by affine transformations: $X = AZ + b$, where $A$ is a matrix and $b$ is a vector. If $X$ has dimension $n$, we require $A$ to be $n \times d$ and $b$ to have dimension $n$ (but $n$ can be different from $d$).

The graph of the density of a standard 2-dimensional Gaussian vector is shown below.

[Figure: bell-shaped surface of the standard 2-dimensional Gaussian density.]

The graph of the other Gaussian vectors can be guessed by linear deformations of the base plane $xy$ (deformations defined by $A$) and a shift (by $b$). For instance, if
\[
A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix},
\]
a matrix which enlarges the $x$ axis by a factor 2, we get the graph below.

[Figure: the same surface stretched by a factor 2 along the $x$ axis.]
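Definition 2 also suggests a direct way to produce Gaussian vectors by simulation. The sketch below (not part of the original notes) builds $X = AZ + b$ in $\mathbb{R}^3$ from a standard 2-dimensional $Z$, so $n \neq d$ and the resulting covariance $AA^T$ is singular; the empirical mean and covariance anticipate the proposition stated next. All numerical values are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# A standard d-dimensional Gaussian Z (here d = 2) mapped affinely into R^3:
# an instance of Definition 2 with n = 3 different from d = 2 (illustrative values).
A = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, -1.0, 0.0])

Z = rng.standard_normal((500_000, 2))
X = Z @ A.T + b

print(X.mean(axis=0))            # close to b
print(np.cov(X, rowvar=False))   # close to A @ A.T, which is singular since n > d
print(A @ A.T)
```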
First, let us compute the mean and covariance matrix of a vector of the form $X = AZ + b$, with $Z$ of standard type. From Exercises 4 and 5 we readily have:

Proposition 3. The mean $\mu$ and covariance matrix $Q$ of a vector $X$ of the previous form are given by
\[
\mu = b, \qquad Q = A A^T.
\]

When two different definitions are given for the same object, one has to prove their equivalence. If $Q$ is positive definite, the two definitions aim to describe the same object; but for $Q$ non-negative definite and not strictly positive definite, we have only the last definition, so we do not have to check any compatibility.

Proposition 4. If $Q$ is positive definite, then Definitions 1 and 2 are equivalent. More precisely, if $X = (X_1, \ldots, X_n)$ is a Gaussian random vector with mean $\mu$ and covariance $Q$ in the sense of Definition 1, then there exists a standard Gaussian random vector $Z = (Z_1, \ldots, Z_n)$ and an $n \times n$ matrix $A$ such that
\[
X = AZ + \mu.
\]
One can take $A = \sqrt{Q}$, as described in the proof. Vice versa, if $X = (X_1, \ldots, X_n)$ is a Gaussian random vector in the sense of Definition 2, of the form $X = AZ + b$, then $X$ is Gaussian in the sense of Definition 1, with mean $\mu$ and covariance $Q$ given by the previous proposition.

Proof. Let us prove the first claim. Let us define
\[
\sqrt{Q} = U \sqrt{Q_e}\, U^T
\]
where $\sqrt{Q_e}$ is simply defined as
\[
\sqrt{Q_e} = \begin{pmatrix} \sqrt{\lambda_1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \sqrt{\lambda_n} \end{pmatrix}.
\]
We have
\[
\big(\sqrt{Q}\big)^2 = U \sqrt{Q_e}\, U^T U \sqrt{Q_e}\, U^T = U \sqrt{Q_e}\sqrt{Q_e}\, U^T = U Q_e U^T = Q
\]
and
\[
\big(\sqrt{Q}\big)^T = \big(U \sqrt{Q_e}\, U^T\big)^T = U \sqrt{Q_e}\, U^T = \sqrt{Q}
\]
because $\sqrt{Q_e}\sqrt{Q_e} = Q_e$. Set
\[
Z = \sqrt{Q}^{\,-1} (X - \mu),
\]
where notice that $\sqrt{Q}$ is invertible, from its definition and the strict positivity of the $\lambda_i$. Then $Z$ is Gaussian. Indeed, from the formula for the transformation of densities,
\[
f_Z(z) = \left.\frac{f_X(x)}{|\det Dg(x)|}\right|_{z=g(x)},
\]
where $g(x) = \sqrt{Q}^{\,-1}(x - \mu)$; hence $\det Dg(x) = \det \sqrt{Q}^{\,-1} = \frac{1}{\sqrt{\lambda_1} \cdots \sqrt{\lambda_n}}$; therefore, with $x = g^{-1}(z) = \sqrt{Q}\, z + \mu$,
\[
f_Z(z) = \frac{\prod_{i=1}^n \sqrt{\lambda_i}}{\sqrt{(2\pi)^n \det(Q)}} \exp\left(-\frac{\big(\sqrt{Q}\, z\big)^T Q^{-1} \big(\sqrt{Q}\, z\big)}{2}\right) = \frac{1}{\sqrt{(2\pi)^n}} \exp\left(-\frac{z^T z}{2}\right)
\]
(using $\sqrt{Q}\, Q^{-1} \sqrt{Q} = I$), which is the density of a standard Gaussian vector. From the definition of $Z$ we get $X = \sqrt{Q}\, Z + \mu$, so the first claim is proved. The proof of the second claim is a particular case of the next exercise, which we leave to the reader.

Exercise 6. Let $X = (X_1, \ldots, X_n)$ be a Gaussian random vector, $B$ an $m \times n$ matrix, $c$ a vector of $\mathbb{R}^m$. Then $Y = BX + c$ is a Gaussian random vector of dimension $m$. The relations between the means and covariances are
\[
\mu_Y = B \mu_X + c, \qquad Q_Y = B Q_X B^T.
\]

Remark 8. We see from the exercise that we may start with a non-degenerate vector $X$ and get a degenerate one $Y$, if $B$ is not a bijection. This always happens when $m > n$.

Remark 9. The law of a Gaussian vector is determined by the mean vector and the covariance matrix. This fundamental fact will be used below when we study stochastic processes.

Remark 10. Some of the previous results are very useful if we want to generate random vectors according to a prescribed Gaussian law. Assume we have prescribed a mean $\mu$ and a covariance $Q$, $n$-dimensional, and want to generate a random sample $(x_1, \ldots, x_n)$ from such an $N(\mu, Q)$. Then we may generate $n$ independent samples $z_1, \ldots, z_n$ from the standard one-dimensional Gaussian law and compute
\[
\sqrt{Q}\, z + \mu
\]
where $z = (z_1, \ldots, z_n)$. In order to have the entries of the matrix $\sqrt{Q}$, if the software does not provide them (certain software do), we may use the formula $\sqrt{Q} = U \sqrt{Q_e}\, U^T$. The matrix $\sqrt{Q_e}$ is obvious. In order to get the matrix $U$, recall that its columns are the vectors $e_1, \ldots, e_n$ written in the original basis, and such vectors are an orthonormal basis of eigenvectors of $Q$. Thus one has to use at least a software package that computes the spectral decomposition of a matrix, to get $e_1, \ldots, e_n$.
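A minimal implementation of the recipe of Remark 10, assuming NumPy is the software at hand (the function name and the numerical values are illustrative, not from the text): the spectral decomposition gives $U$ and $Q_e$, the square root $\sqrt{Q} = U \sqrt{Q_e}\, U^T$ is formed, and samples are obtained as $\sqrt{Q}\, z + \mu$.

```python
import numpy as np

def sample_gaussian(mu, Q, size, rng):
    """Draw samples from N(mu, Q) through sqrt(Q) = U sqrt(Q_e) U^T (Remark 10).
    Q is assumed symmetric and non-negative definite."""
    lam, U = np.linalg.eigh(Q)                 # spectral decomposition of Q
    lam = np.clip(lam, 0.0, None)              # guard against round-off below zero
    sqrt_Q = U @ np.diag(np.sqrt(lam)) @ U.T   # sqrt(Q) = U sqrt(Q_e) U^T
    Z = rng.standard_normal((size, len(mu)))   # independent standard normal samples
    return Z @ sqrt_Q + mu                     # each row is sqrt(Q) z + mu

# Illustrative prescribed mean and covariance (not from the text).
mu = np.array([1.0, 2.0, -1.0])
Q = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])

rng = np.random.default_rng(2)
samples = sample_gaussian(mu, Q, 300_000, rng)
print(samples.mean(axis=0))            # close to mu
print(np.cov(samples, rowvar=False))   # close to Q
```

The clipping of tiny negative eigenvalues is only a numerical safeguard; it also lets the same sketch handle a non-negative definite (degenerate) $Q$, in the spirit of Definition 2.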