Chapter 2: Review of Probability

Karim Naguib
Boston University
September 5, 2013
Random Variables
Outcomes are the mutually exclusive potential results of a random
process (e.g. roll of a die, flip of a coin)
The sample space is the set of all possible outcomes
Each outcome has its own probability
An event is a subset of the sample space
A random variable is a mapping from a random outcome to a
numerical value (denoted by an uppercase letter, e.g., X). There are
two types of random variables:
- discrete random variables take on a discrete set of values (e.g., integers)
- continuous random variables take on a continuum of values (real numbers)
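As an illustration (a minimal numpy sketch, not part of the original slides), a die roll is a discrete random variable with sample space {1, …, 6}:

```python
import numpy as np

rng = np.random.default_rng(0)

# X: discrete random variable mapping each die-roll outcome to a number 1..6
rolls = rng.integers(1, 7, size=10)   # ten independent rolls
print(rolls)

# An event is a subset of the sample space, e.g. "the roll is even"
print(np.mean(rolls % 2 == 0))        # empirical frequency of that event
```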
Probability Distribution
For discrete random variables we have a probability mass function
(pmf) that gives the probability of every possible value, denoted by
Pr(X = x) = P(x)
For continuous random variables we have a probability density
function (pdf), typically denoted by f(x)
- The area under this function between any two values is the probability
  that the random variable takes on a value between those two values:

  Pr(a < X < b) = ∫_a^b f(x) dx

The cumulative distribution function (cdf) gives the probability
that a random variable is less than or equal to a particular value.
It is typically denoted by F(x):

Pr(X ≤ x) = F(x)
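A sketch connecting pdf and cdf numerically (assumes scipy is available; the standard normal is an arbitrary illustrative choice):

```python
from scipy import stats, integrate

a, b = -1.0, 1.0
X = stats.norm()                      # standard normal random variable

# Pr(a < X < b) as the area under the pdf between a and b ...
area, _ = integrate.quad(X.pdf, a, b)

# ... equals the difference of cdf values F(b) − F(a)
print(area, X.cdf(b) - X.cdf(a))      # both ≈ 0.6827
```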
Mean and Variance
The expected value or mean of a discrete random variable X with
sample space {x_1, …, x_k} is

µ_X = E[X] = Σ_i x_i P(x_i)

The variance is

σ_X² = Var(X) = E[(X − µ_X)²] = Σ_i (x_i − µ_X)² P(x_i)
and standard deviation is the square root of the variance
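A quick numerical check (an illustrative numpy sketch; the pmf below is a made-up example, not from the slides):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])      # sample space {x_1, x_2, x_3}
p = np.array([0.2, 0.5, 0.3])      # pmf, must sum to 1

mu = np.sum(x * p)                 # µ_X = Σ_i x_i P(x_i)
var = np.sum((x - mu) ** 2 * p)    # σ_X² = Σ_i (x_i − µ_X)² P(x_i)
sd = np.sqrt(var)                  # standard deviation
print(mu, var, sd)                 # 2.1, 0.49, 0.7
```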
Mean and Variance of a Linear Function
Consider a linear function

Y = a + bX

where a and b are fixed constants. Then

µ_Y = E[Y] = a + bE[X] = a + bµ_X
σ_Y² = Var(Y) = b²Var(X) = b²σ_X²
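A simulation sketch of these two identities (numpy; the values of a, b, and the distribution of X are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, -3.0
X = rng.normal(loc=5.0, scale=2.0, size=1_000_000)  # any distribution works
Y = a + b * X

print(Y.mean(), a + b * X.mean())   # E[Y] ≈ a + b·E[X]
print(Y.var(), b**2 * X.var())      # Var(Y) = b²·Var(X) (exact in-sample)
```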
Joint and Marginal Distributions
Let X and Y be two discrete random variables
The joint probability distribution of X and Y is

Pr(X = x, Y = y)

which is the probability that X = x and Y = y occur together.
Given the joint probability Pr(X = x, Y = y), the marginal
probability distribution of Y is

Pr(Y = y) = Σ_i Pr(X = x_i, Y = y)
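For illustration, a sketch with a made-up 2×3 joint pmf (the numbers are assumptions, not from the slides); marginals are just row and column sums:

```python
import numpy as np

# joint[i, j] = Pr(X = x_i, Y = y_j); rows index X, columns index Y
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.30, 0.15]])
assert np.isclose(joint.sum(), 1.0)

pr_y = joint.sum(axis=0)   # marginal of Y: sum over the values of X
pr_x = joint.sum(axis=1)   # marginal of X: sum over the values of Y
print(pr_y)                # [0.25 0.5 0.25]
print(pr_x)                # [0.4 0.6]
```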
Conditional Distributions
Let X and Y be two discrete random variables
The probability of Y = y conditional on X = x is

Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x)

The conditional expectation (or mean) of Y given X is

E[Y | X = x] = Σ_i y_i Pr(Y = y_i | X = x)

The law of iterated expectations states

E[Y] = Σ_i E[Y | X = x_i] Pr(X = x_i) = E[E[Y | X]]

The conditional variance of Y given X is

Var(Y | X = x) = Σ_i [y_i − E(Y | X = x)]² Pr(Y = y_i | X = x)
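Continuing the made-up joint pmf from above (an illustrative sketch, not from the slides): conditioning on X = x_i is row normalization, and the law of iterated expectations can be checked directly:

```python
import numpy as np

joint = np.array([[0.10, 0.20, 0.10],    # Pr(X = x_i, Y = y_j)
                  [0.15, 0.30, 0.15]])
y_vals = np.array([0.0, 1.0, 2.0])       # assumed values of Y
pr_x = joint.sum(axis=1)                 # marginal of X

# Pr(Y = y_j | X = x_i): divide each row by Pr(X = x_i)
cond_y_given_x = joint / pr_x[:, None]

# E[Y | X = x_i] for each i
cond_mean = cond_y_given_x @ y_vals

# Law of iterated expectations: E[Y] = Σ_i E[Y | X = x_i] Pr(X = x_i)
print(cond_mean @ pr_x)                  # 1.0
print(joint.sum(axis=0) @ y_vals)        # 1.0, E[Y] computed directly
```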
Independence
Definition
Two random variables X and Y are independent if and only if

Pr(Y = y | X = x) = Pr(Y = y)

or, equivalently,

Pr(X = x, Y = y) = Pr(X = x)Pr(Y = y)
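A sketch checking independence cell by cell on the joint pmf used above (illustrative; in this made-up table X and Y happen to be independent):

```python
import numpy as np

joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.30, 0.15]])
pr_x = joint.sum(axis=1)
pr_y = joint.sum(axis=0)

# Independence: Pr(X = x, Y = y) = Pr(X = x)·Pr(Y = y) for every cell
print(np.allclose(joint, np.outer(pr_x, pr_y)))   # True
```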
Covariance and Correlation
Definition
The covariance between two random variables is

Cov(X, Y) = σ_XY = E[(X − µ_X)(Y − µ_Y)]
          = Σ_i Σ_j (x_i − µ_X)(y_j − µ_Y) Pr(X = x_i, Y = y_j)

and the correlation between them is

Corr(X, Y) = Cov(X, Y) / √(Var(X)Var(Y)) = σ_XY / (σ_X σ_Y)
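A simulation sketch (numpy; the parameters are arbitrary illustrative choices) comparing these population formulas with their sample counterparts:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=500_000)
Y = 0.5 * X + rng.normal(size=500_000)   # built to be correlated with X

cov_xy = np.cov(X, Y)[0, 1]              # sample covariance
corr_xy = np.corrcoef(X, Y)[0, 1]        # sample correlation
print(cov_xy)                            # ≈ 0.5
print(corr_xy, cov_xy / (X.std() * Y.std()))   # both ≈ 0.45
```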
Mean and Variance of Sums of Random Variables
Let X and Y be two random variables
The mean of their sum is

E[X + Y] = E[X] + E[Y] = µ_X + µ_Y

The variance of their sum is

Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y) = σ_X² + σ_Y² + 2σ_XY
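A quick simulation check of the variance-of-a-sum identity (an illustrative sketch, reusing correlated X and Y built as above):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=500_000)
Y = 0.5 * X + rng.normal(size=500_000)

lhs = np.var(X + Y)
rhs = np.var(X) + np.var(Y) + 2 * np.cov(X, Y, ddof=0)[0, 1]
print(lhs, rhs)   # identical up to floating-point error
```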
Introduction to Random Sampling and Sample Distribution
A population is the set of all possible entities (e.g. individuals, firms)
that are of interest. Typically, we assume that their number N
approaches infinity
The statistical procedures we are interested in involve making
inferences about the population using a sample collected by simple
random sampling.
A sample is composed of n observations, denoted X_1, X_2, …, X_n,
which we model as a collection of n random variables.
We often assume that the sample random variables X_1, …, X_n are
independent and are drawn from an identical distribution—they are
said to be independently and identically distributed (i.i.d.).
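In simulation terms, an i.i.d. sample is just n independent draws from one population distribution (a minimal numpy sketch; the population is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)

# An i.i.d. sample of n = 100 observations from a N(10, 2²) population
sample = rng.normal(loc=10.0, scale=2.0, size=100)
print(sample[:5])
```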
Sampling Distribution of the Sample Mean
The sample mean (which is itself a random variable) is

X̄ = (1/n) Σ_i X_i

Since X̄ is a random variable it has its own sampling distribution
- The mean of X̄ is

  E[X̄] = (1/n) Σ_i E[X_i] = µ_X

- The variance of X̄ is

  Var(X̄) = (1/n²) Σ_i Var(X_i) + (1/n²) Σ_i Σ_{j≠i} Cov(X_i, X_j) = σ_X²/n

  where the covariance terms are all zero because the X_i are independent
If X_1, …, X_n are i.i.d. draws from N(µ_X, σ_X²) then X̄ has the finite-sample
distribution

X̄ ∼ N(µ_X, σ_X²/n)
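A simulation sketch of the sampling distribution of X̄ (numpy; the population N(10, 2²) and n = 25 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
mu_x, sigma_x, n = 10.0, 2.0, 25

# Draw 100,000 samples of size n and compute each sample's mean
xbar = rng.normal(mu_x, sigma_x, size=(100_000, n)).mean(axis=1)

print(xbar.mean())   # ≈ µ_X = 10
print(xbar.var())    # ≈ σ_X²/n = 4/25 = 0.16
```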
Large Sample Approximations to Sampling Distributions
Unfortunately, unless the X_i are normally distributed, it is difficult to
characterize the finite-sample distribution of X̄ exactly.
Instead we rely on its asymptotic distribution, which approximates the
exact distribution as n → ∞.
Law of Large Numbers and Consistency
Definition
The law of large numbers states that, as n → ∞, X̄ converges in
probability to µ_X (provided the X_i are i.i.d. with finite variance).
This property is referred to as consistency: X̄ is a consistent
estimator of µ_X
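A sketch of the LLN in action (a Bernoulli(p = 0.78) population, chosen to match the example on the next slide):

```python
import numpy as np

rng = np.random.default_rng(6)
p = 0.78

for n in (10, 100, 10_000, 1_000_000):
    ybar = rng.binomial(1, p, size=n).mean()
    print(n, ybar)   # ybar settles toward p = 0.78 as n grows
```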
LLN Example
Suppose Y takes on 0 and 1 (a Bernoulli random variable) with probability

Pr(Y = 1) = p = 0.78 and Pr(Y = 0) = 1 − p = 0.22

Then

E[Y] = p × 1 + (1 − p) × 0 = p = 0.78
Var(Y) = p(1 − p) = 0.1716

Consider a sample with n = 2. The sample mean Ȳ can take three values:

Pr(Ȳ = 0) = 0.22² = 0.0484
Pr(Ȳ = 0.5) = 2 × 0.22 × 0.78 = 0.3432
Pr(Ȳ = 1) = 0.78² = 0.6084
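A sketch verifying these three probabilities by simulating many samples of size n = 2:

```python
import numpy as np

rng = np.random.default_rng(7)
p = 0.78

# Simulate many samples of size 2 and tabulate the sample mean
ybar = rng.binomial(1, p, size=(1_000_000, 2)).mean(axis=1)
for value, count in zip(*np.unique(ybar, return_counts=True)):
    print(value, count / len(ybar))
# ≈ 0.0 0.0484, 0.5 0.3432, 1.0 0.6084
```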
Central Limit Theorem
Definition
The central limit theorem states that, when n is large, the distribution of
X̄ is approximately N(µ_X, σ_X²/n) even when the X_i themselves are not
normally distributed.
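A sketch of the CLT (standardized sample means from a decidedly non-normal Bernoulli population; n and p are illustrative, and scipy is assumed available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
p, n = 0.78, 100
mu, sigma = p, np.sqrt(p * (1 - p))

# 100,000 standardized sample means from a Bernoulli(p) population
ybar = rng.binomial(1, p, size=(100_000, n)).mean(axis=1)
z = (ybar - mu) / (sigma / np.sqrt(n))

# Compare coverage of ±1.96 with the standard normal's 95%
print((np.abs(z) <= 1.96).mean())                     # ≈ 0.95
print(stats.norm.cdf(1.96) - stats.norm.cdf(-1.96))   # 0.950...
```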