Some Essentials of Probability

Introduction to Econometrics
Spring 2012
Ken Simons
Key Concepts from Probability
1. Probabilities and distributions
2. Expected values; variances, moments
3. Multiple random variables
4. The normal and other distributions
5. Sample averages
6. Large sample distribution
1. Random Variables
• Random variables vary randomly
– Multiple outcomes possible
• Outcomes are finest-grain (indivisible)
– An event is one or more outcomes
– The sample space is all possible outcomes
• Probability of each outcome is its chance of happening
(with many trials, proportion of times it occurs)
• Types of random variables:
– Discrete: e.g. Heads, Tails; or 0, 1, 2, 3, …
– Continuous: any value in a range, e.g. [0,1], any number from 0 to 1
Probability Distributions
• For a discrete random variable,
– Probability distribution gives probability for each outcome
• E.g.,
\[ p(x) = \begin{cases} 0.2 & \text{if } x = 0 \\ 0.8 & \text{if } x = 1 \end{cases} \]
– X is called a Bernoulli random variable: outcomes are 0 and 1
– p(x) is the Bernoulli distribution: tells probability of each outcome
• Or, a table listing all outcomes and their probabilities
• Note that the probabilities add to 1
– Cumulative distribution function (c.d.f.) is probability that X≤x
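A minimal Python sketch of the Bernoulli example, using the slide's probabilities 0.2 and 0.8. The names `pmf` and `cdf` are illustrative choices, not notation from the course text.

```python
# Bernoulli example from the slide: Pr(X=0) = 0.2, Pr(X=1) = 0.8.
pmf = {0: 0.2, 1: 0.8}


def cdf(x, pmf=pmf):
    """Pr(X <= x): add up the probabilities of all outcomes not exceeding x."""
    return sum(p for outcome, p in pmf.items() if outcome <= x)


# The probabilities of all outcomes add to 1.
total = sum(pmf.values())

# The c.d.f. is a step function: 0 below the smallest outcome, 1 at the largest.
print(total)
print(cdf(-1), cdf(0), cdf(0.5), cdf(1))
```

The c.d.f. jumps only at the outcomes 0 and 1, so `cdf(0)` and `cdf(0.5)` return the same value.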
Probability Density Functions
• For a continuous random variable,
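– Probability density function gives the relative likelihood of each value; the probability that X falls in a range is the area under the density over that range
• E.g.,
\[ f(x) = \frac{1}{\sigma_X \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu_X}{\sigma_X} \right)^2 \right] \]
– X is called a normal random variable
– ƒ(x) is the normal density (the p.d.f. of the normal distribution)
• Total probability is 1, because \( \int_{-\infty}^{\infty} f(x)\,dx = 1 \)
– Cumulative distribution function (c.d.f.) is probability that X≤x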
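The claim that the normal density integrates to 1 can be checked numerically. The sketch below is only an illustration: the values of `mu` and `sigma` are arbitrary, the midpoint-rule helper `integrate` is hypothetical, and the c.d.f. uses the standard identity with `math.erf`.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Pr(X <= x) for a normal random variable, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


mu, sigma = 1.0, 2.0  # arbitrary illustrative values


def density(x):
    return normal_pdf(x, mu, sigma)


# Area under the whole density (truncated at 10 standard deviations) is about 1.
total_area = integrate(density, mu - 10 * sigma, mu + 10 * sigma)

# Probability of an interval = area under the density = difference of c.d.f. values.
prob_by_area = integrate(density, mu - sigma, mu + sigma)
prob_by_cdf = normal_cdf(mu + sigma, mu, sigma) - normal_cdf(mu - sigma, mu, sigma)

print(total_area, prob_by_area, prob_by_cdf)
```

The density value at a single point is not itself a probability; only areas under the curve are.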
The next two slides each present one half of Figure 2.2.
2. Moments
\( E(X) \): Expected value or mean of X is the average outcome (in infinitely many trials)
\( \sigma_X^2 \equiv \mathrm{Var}(X) \): Variance of X is \( E[(X - E(X))^2] \); a measure of how much X varies
\( \sigma_X \): Standard deviation of X is \( \sqrt{\mathrm{Var}(X)} \); a "typical" deviation from the mean
More generally:
\( E(X^r) \): r-th moment of X is the average value of \( X^r \)
\( E[(X - E(X))^r] \): r-th central moment of X, the average value of \( (X - E(X))^r \)
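Discrete r.v.:
\[ \begin{aligned}
E(X) &= \sum_{i=1}^{k} x_i p_i \\
\sigma_X^2 &= \sum_{i=1}^{k} (x_i - E(X))^2 p_i \\
\sigma_X &= \sqrt{\mathrm{Var}(X)} \\
E(X^2) &= \sum_{i=1}^{k} x_i^2 p_i
\end{aligned} \]
Continuous r.v.:
\[ \begin{aligned}
E(X) &= \int_{-\infty}^{\infty} x f(x)\,dx \\
\sigma_X^2 &= \int_{-\infty}^{\infty} (x - E(X))^2 f(x)\,dx \\
\sigma_X &= \sqrt{\mathrm{Var}(X)} \\
E(X^2) &= \int_{-\infty}^{\infty} x^2 f(x)\,dx
\end{aligned} \]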
If Y=a+bX, where a and b are constants, then
E(Y)=a+bE(X),
\( \sigma_Y^2 = b^2 \sigma_X^2 \)
If X = winnings from: flip a coin, get $4 if heads, $0 if tails
What are: \( E(X) \), \( \mathrm{Var}(X) \), \( \sigma_X \), \( E(X^2) \), \( E[(X - E(X))^2] \)?
Another way to compute \( \sigma_X^2 \):
\[ \sigma_X^2 = E(X^2) - [E(X)]^2 \]
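The coin-flip questions can be worked through with a few lines of Python. This sketch assumes a fair coin (probability 0.5 for each side), which the slide does not state. The constants `a` and `b` are hypothetical values used only to illustrate the rule for Y = a + bX.

```python
import math

outcomes = [4.0, 0.0]  # winnings in dollars: heads, tails
probs = [0.5, 0.5]     # ASSUMPTION: fair coin; not stated on the slide

mean = sum(x * p for x, p in zip(outcomes, probs))               # E(X)   -> 2.0
second_moment = sum(x**2 * p for x, p in zip(outcomes, probs))   # E(X^2) -> 8.0
var_def = sum((x - mean) ** 2 * p for x, p in zip(outcomes, probs))  # E[(X-E(X))^2] -> 4.0
var_shortcut = second_moment - mean**2                           # E(X^2) - E(X)^2 -> 4.0
sd = math.sqrt(var_def)                                          # sigma_X -> 2.0

# Linear transformation Y = a + bX with hypothetical constants a and b.
a, b = 1.0, 3.0
y_values = [a + b * x for x in outcomes]
mean_y = sum(y * p for y, p in zip(y_values, probs))                 # a + b*E(X) -> 7.0
var_y = sum((y - mean_y) ** 2 * p for y, p in zip(y_values, probs))  # b^2 * Var(X) -> 36.0

print(mean, var_def, var_shortcut, sd, second_moment, mean_y, var_y)
```

Both routes to the variance agree, and the transformed variable follows E(Y) = a + bE(X) and Var(Y) = b² Var(X).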
3. Multiple Random Variables
• Joint probability distribution for outcomes of 2 or
more random variables together
– E.g., temperature (X) and precipitation (Y)
• E.g., probabilities:

                 Y = 0 (Dry)   Y = 1 (Snow/Rain)
  X = 0 (Warm)       0.2              0.1
  X = 1 (Cold)       0.4              0.3

– Marginal probability distribution is then another name
for probability distribution of one of the variables
• E.g., Pr(X=0)=0.3, Pr(X=1)=0.7
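• In general: \( \Pr(Y = y) = \sum_{i=1}^{\ell} \Pr(X = x_i, Y = y) \), where X takes the values \( x_1, \dots, x_\ell \)
• Continuous
• as discrete, but (a) substitute “p.d.f.” for “probability distribution,” (b) replace the sum with an integral: \( f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx \)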
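As a sketch of how marginal distributions fall out of a joint distribution, the temperature and precipitation table can be stored as a dictionary keyed by (x, y) and summed over the other variable. The helper name `marginal` is an illustrative choice.

```python
# Joint distribution from the slide: keys are (x, y), values are Pr(X=x, Y=y).
joint = {
    (0, 0): 0.2,  # warm, dry
    (0, 1): 0.1,  # warm, snow/rain
    (1, 0): 0.4,  # cold, dry
    (1, 1): 0.3,  # cold, snow/rain
}


def marginal(joint, axis):
    """Sum the joint probabilities over the other variable.

    axis=0 gives the distribution of X, axis=1 the distribution of Y.
    """
    result = {}
    for key, p in joint.items():
        result[key[axis]] = result.get(key[axis], 0.0) + p
    return result


marginal_x = marginal(joint, 0)  # approximately {0: 0.3, 1: 0.7}
marginal_y = marginal(joint, 1)  # approximately {0: 0.6, 1: 0.4}

print(marginal_x)
print(marginal_y)
```

The printed values may show tiny floating-point rounding, for example 0.30000000000000004 instead of 0.3.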
Conditional distributions
• With 2 random variables, X and Y
– If know X takes on a particular value, x, but don’t
know the value of Y
• Conditional distribution of Y given X=x
then tells the probability of each outcome for Y
– \( \Pr(Y = y \mid X = x) = \dfrac{\Pr(X = x, Y = y)}{\Pr(X = x)} \)
• Conditional expectation (or variance, etc.) of Y given X=x then tells the expectation (etc.) for Y
– E.g., \( E(Y \mid X = x) = \sum_{i=1}^{k} y_i \Pr(Y = y_i \mid X = x) \)
– Example:

                 Y = 0 (Dry)   Y = 1 (Snow/Rain)
  X = 0 (Warm)       0.2              0.1
  X = 1 (Cold)       0.4              0.3

What are: \( \Pr(Y=0 \mid X=0) \), \( \Pr(Y=1 \mid X=0) \), \( E(Y \mid X=0) \), \( \mathrm{Var}(Y \mid X=0) \), \( \sigma_{Y \mid X=0} \), \( E(Y^2 \mid X=0) \)?
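One way to work the example is to divide each joint probability in the X = 0 row by Pr(X = 0) and then take moments of the resulting conditional distribution. This is a sketch; the function name `conditional_y_given_x` is hypothetical.

```python
import math

# Joint distribution from the example: keys are (x, y), values are Pr(X=x, Y=y).
joint = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.3}


def conditional_y_given_x(joint, x):
    """Pr(Y=y | X=x) = Pr(X=x, Y=y) / Pr(X=x) for every y."""
    prob_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {yi: p / prob_x for (xi, yi), p in joint.items() if xi == x}


cond = conditional_y_given_x(joint, 0)  # Pr(Y=0|X=0) is about 2/3, Pr(Y=1|X=0) about 1/3

cond_mean = sum(y * p for y, p in cond.items())        # E(Y | X=0), about 1/3
cond_second = sum(y**2 * p for y, p in cond.items())   # E(Y^2 | X=0), about 1/3
cond_var = cond_second - cond_mean**2                  # Var(Y | X=0), about 2/9
cond_sd = math.sqrt(cond_var)                          # about 0.471

print(cond, cond_mean, cond_var, cond_sd, cond_second)
```

Because Y only takes the values 0 and 1, E(Y | X=0) and E(Y² | X=0) coincide.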
Independence
• Independent random variables:
– Value taken by 1 variable is unrelated to the
value taken by another variable
– Mathematically:
• Pr(Y=y | X=x) = Pr(Y=y), or
• Pr(X=x,Y=y) = Pr(X=x) × Pr(Y=y)
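The product rule gives a direct test for independence: compare each joint probability with the product of the marginals. The sketch below applies it to the temperature and precipitation table from the earlier slides. The second table, `product_joint`, is a hypothetical one built from the same marginals so that it is independent by construction.

```python
import math


def is_independent(joint, tol=1e-9):
    """True if Pr(X=x, Y=y) equals Pr(X=x) * Pr(Y=y) for every (x, y)."""
    prob_x, prob_y = {}, {}
    for (x, y), p in joint.items():
        prob_x[x] = prob_x.get(x, 0.0) + p
        prob_y[y] = prob_y.get(y, 0.0) + p
    return all(
        math.isclose(joint.get((x, y), 0.0), prob_x[x] * prob_y[y], abs_tol=tol)
        for x in prob_x
        for y in prob_y
    )


# Temperature (X) and precipitation (Y) table from the slides.
weather_joint = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.3}

# Hypothetical table with the same marginals, constructed as their product.
marg_x = {0: 0.3, 1: 0.7}
marg_y = {0: 0.6, 1: 0.4}
product_joint = {(x, y): px * py for x, px in marg_x.items() for y, py in marg_y.items()}

weather_independent = is_independent(weather_joint)   # False: 0.2 differs from 0.3 * 0.6 = 0.18
product_independent = is_independent(product_joint)   # True by construction

print(weather_independent, product_independent)
```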
Covariance and Correlation
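Covariance: a measure of how much 2 variables vary together
\[ \begin{aligned}
\mathrm{Cov}(X,Y) = \sigma_{XY} &= E[(X - \mu_X)(Y - \mu_Y)] \quad \text{where } \mu_X = E(X),\ \mu_Y = E(Y) \\
&= \sum_{i=1}^{k} \sum_{j=1}^{\ell} (x_j - \mu_X)(y_i - \mu_Y) \Pr(X = x_j, Y = y_i)
\end{aligned} \]
Correlation: a similar, but unit-free, measure
\[ \mathrm{corr}(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \]
this is a number from -1 to +1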
X and Y are uncorrelated if corr(X,Y) = 0
If X and Y are independent, then Cov(X,Y) = 0 and corr(X,Y) = 0
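To make the double sum concrete, here is a sketch that computes the covariance and correlation for the temperature and precipitation table used in the earlier slides. Variable names are illustrative.

```python
import math

# Joint distribution from the slides: keys are (x, y), values are Pr(X=x, Y=y).
joint = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.3}

mu_x = sum(x * p for (x, _), p in joint.items())  # E(X), about 0.7
mu_y = sum(y * p for (_, y), p in joint.items())  # E(Y), about 0.4

# Covariance: probability-weighted sum of products of deviations from the means.
cov_xy = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())  # about 0.02

var_x = sum((x - mu_x) ** 2 * p for (x, _), p in joint.items())  # about 0.21
var_y = sum((y - mu_y) ** 2 * p for (_, y), p in joint.items())  # about 0.24

# Correlation: covariance rescaled to be unit-free.
corr_xy = cov_xy / math.sqrt(var_x * var_y)  # about 0.089

print(cov_xy, corr_xy)
```

The small positive correlation says cold days are slightly more likely to have precipitation in this table.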
[Figure: Investment % of GDP (CI) versus GDP (RGDPL), 1985, Different Nations. Estimated Correlation: +0.50]
[Figure: Investment % of GDP (CI) versus Consumption % of GDP (CC), 1985. Estimated Correlation: -0.53]
[Figure: Consumption % of GDP (CC) vs. Population (POP), 1985. Estimated Correlation: -0.05]
Key Concept 2.3
4. The Normal and Other Distributions
Four PDFs We May Use & Their Relationships
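Normal with mean \( \mu \) and variance \( \sigma^2 \): \( x \sim N(\mu, \sigma^2) \),
\[ f_x(x_0) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x_0 - \mu)^2 / (2\sigma^2)} \]
Standard normal has mean 0 and variance 1: \( x \sim N(0,1) \)
Standardization: if \( x \sim N(\mu, \sigma^2) \), then \( \dfrac{x - \mu}{\sigma} \sim N(0,1) \)
Multiple independent standard normal random variables: \( x \sim N(0, I_n) \)
Chi-square with m degrees of freedom: \( x \sim \chi^2(m) \),
\[ f_x(x_0) = \begin{cases} \left[ \left( \frac{m}{2} - 1 \right)! \right]^{-1} 2^{-m/2}\, x_0^{(m/2)-1} e^{-x_0/2} & \text{if } x_0 > 0 \\ 0 & \text{otherwise} \end{cases} \]
(for odd m, read \( (\frac{m}{2} - 1)! \) as \( \Gamma(\frac{m}{2}) \))
Note E(x)=m, Var(x)=2m
Student's t with m degrees of freedom: \( x \sim t(m) \),
\[ f_x(x_0) = \frac{\Gamma\left( \frac{m+1}{2} \right)}{(m\pi)^{1/2}\, \Gamma\left( \frac{m}{2} \right)} \left( 1 + \frac{x_0^2}{m} \right)^{-(m+1)/2} \]
F with m and n degrees of freedom: \( x \sim F(m,n) \),
\[ f_x(x_0) = \frac{\Gamma[(m+n)/2]\, m^{m/2}\, n^{n/2}}{\Gamma(m/2)\, \Gamma(n/2)} \cdot \frac{x_0^{(m/2)-1}}{(m x_0 + n)^{(m+n)/2}} \quad \text{for } x_0 > 0 \]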
[Figure: Normal Probability Density Function for a Single Variable]
Relationships between the Four PDFs
Weighted sum of independent normal random variables is normal:
\[ ax + by \sim N(a\mu_x + b\mu_y,\ a^2\sigma_x^2 + b^2\sigma_y^2) \]
if \( x \sim N(\mu_x, \sigma_x^2) \), \( y \sim N(\mu_y, \sigma_y^2) \) (remember, covariance is 0 from independence)
Sum of squares of m independent standard normal random variables is \( \chi^2(m) \):
\[ x^2 + y^2 + z^2 + \cdots \sim \chi^2(m) \]
if x, y, z, ..., are independent and standard normal
If \( x \sim N(0,1) \) and \( y \sim \chi^2(m) \), with x and y independent:
\[ \frac{x}{\sqrt{y/m}} \sim t(m) \]
If \( x \sim \chi^2(m) \) and \( y \sim \chi^2(n) \), with x and y independent:
\[ \frac{x/m}{y/n} \sim F(m,n) \]
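These relationships can be explored by simulation. The following sketch builds chi-square and t draws directly from standard normal draws and compares the simulated chi-square mean and variance with m and 2m. The degrees of freedom, number of draws, and seed are arbitrary choices, and the function names are hypothetical.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible
m = 3           # degrees of freedom (arbitrary illustrative value)
draws = 20_000


def chi_square_draw(m):
    """Sum of squares of m independent standard normal draws."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(m))


def t_draw(m):
    """Standard normal divided by sqrt(independent chi-square(m) / m)."""
    x = random.gauss(0.0, 1.0)
    y = chi_square_draw(m)
    return x / math.sqrt(y / m)


chi_sample = [chi_square_draw(m) for _ in range(draws)]
chi_mean = statistics.fmean(chi_sample)    # should be close to m
chi_var = statistics.variance(chi_sample)  # should be close to 2m

t_sample = [t_draw(m) for _ in range(draws)]
share_negative = sum(t < 0 for t in t_sample) / draws  # t is symmetric about 0, so near 0.5

print(chi_mean, chi_var, share_negative)
```

An F(m, n) draw could be built the same way, as the ratio of two independent chi-square draws each divided by its degrees of freedom.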
5. Sample Averages
Collect data from n entities, \( Y_1, Y_2, \dots, Y_n \)
Suppose \( Y_1, Y_2, \dots, Y_n \) are independent
and that they all have the same distribution
(called independent and identically distributed, i.i.d.).
Sample average:
\[ \bar{Y} = \frac{1}{n}(Y_1 + Y_2 + \cdots + Y_n) = \frac{1}{n} \sum_{i=1}^{n} Y_i \]
Since each \( Y_i \) is a random number, so is \( \bar{Y} \).
Questions:
What is \( E(\bar{Y}) \)?
What is \( \mathrm{Var}(\bar{Y}) \)?
What is \( \sigma_{\bar{Y}} \)?
If \( Y_i \sim N(\mu_Y, \sigma_Y^2) \), what precisely is the pdf of \( \bar{Y} \)?
Hints:
Use the relations in Key Concept 2.3.
It might help to think of n=2, so \( \bar{Y} = \frac{1}{2}(Y_1 + Y_2) \), first.
Recall that the weighted sum of independent normal random variables is normal.
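Answers
\[ \bar{Y} = \frac{1}{n}(Y_1 + Y_2 + \cdots + Y_n) = \frac{1}{n} \sum_{i=1}^{n} Y_i \]
\[ E(\bar{Y}) = \frac{1}{n} \sum_{i=1}^{n} E(Y_i) = \frac{1}{n} \sum_{i=1}^{n} \mu_Y = \mu_Y \quad \text{where } \mu_Y = E(Y_i) \text{ for all } i \]
\[ \begin{aligned}
\mathrm{Var}(\bar{Y}) &= \mathrm{Var}\left( \frac{1}{n} \sum_{i=1}^{n} Y_i \right) = \mathrm{Var}\left( \sum_{i=1}^{n} \frac{1}{n} Y_i \right) = \mathrm{Var}\left( \frac{1}{n} Y_1 + \frac{1}{n} Y_2 + \cdots \right) \\
&= \frac{1}{n^2} \mathrm{Var}(Y_1) + \frac{1}{n^2} \mathrm{Var}(Y_2) + \cdots + 2 \frac{1}{n^2} \mathrm{Cov}(Y_1, Y_2) + \cdots && \text{(Use what formulas in Key Concept 2.3?)} \\
&= \frac{1}{n^2} \mathrm{Var}(Y_1) + \frac{1}{n^2} \mathrm{Var}(Y_2) + \cdots + 0 && \text{(Covariances zero for independent r.v.'s)} \\
&= \frac{1}{n^2} \sigma_Y^2 + \frac{1}{n^2} \sigma_Y^2 + \cdots && \text{where } \sigma_Y^2 = \mathrm{Var}(Y_i) \text{ for all } i \\
&= n \left( \frac{1}{n^2} \sigma_Y^2 \right) = \frac{\sigma_Y^2}{n}
\end{aligned} \]
\[ \sigma_{\bar{Y}} = \sqrt{\sigma_Y^2 / n} = \frac{\sigma_Y}{\sqrt{n}} \]
If \( Y_i \sim N(\mu_Y, \sigma_Y^2) \), since the sum of independent normal random variables is
a normal random variable, \( \bar{Y} \) is a normal random variable.
The mean of \( \bar{Y} \) is \( \mu_Y \), and the variance of \( \bar{Y} \) is \( \sigma_Y^2 / n \).
Hence: \( \bar{Y} \sim N(\mu_Y, \sigma_Y^2 / n) \).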
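A quick simulation can illustrate that the sample average has mean µ_Y and variance σ²_Y / n. This is a sketch under assumed values: `mu_y`, `sigma_y`, `n`, the number of replications, and the seed are all arbitrary choices, not numbers from the course.

```python
import random
import statistics

random.seed(1)            # fixed seed so the run is reproducible
mu_y, sigma_y = 5.0, 2.0  # hypothetical population mean and standard deviation
n = 25                    # sample size
replications = 20_000     # number of independent samples drawn


def sample_average(n):
    """Average of n i.i.d. N(mu_y, sigma_y^2) draws."""
    return statistics.fmean([random.gauss(mu_y, sigma_y) for _ in range(n)])


averages = [sample_average(n) for _ in range(replications)]

mean_of_averages = statistics.fmean(averages)    # should be close to mu_y
var_of_averages = statistics.variance(averages)  # should be close to sigma_y**2 / n
theoretical_var = sigma_y**2 / n                 # 0.16 for these values

print(mean_of_averages, var_of_averages, theoretical_var)
```

Raising `n` shrinks the spread of the sample averages in proportion to 1/n, which is the pattern the derivation predicts.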
6. Large Sample Distribution
What if the Yi variables are not normally distributed? Just
assume they are collected by random sampling (this ensures
they are i.i.d.) and have a finite variance.
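Law of Large Numbers: as \( n \to \infty \), \( \bar{Y} \) converges in probability
to the mean \( \mu_Y \). I.e., the probability that \( \bar{Y} \) differs from \( \mu_Y \)
by more than any fixed amount becomes zero.
To be precise: \( \lim_{n \to \infty} \Pr(|\bar{Y} - \mu_Y| > \varepsilon) = 0 \) for every \( \varepsilon > 0 \).
Hence \( \bar{Y} \) is called consistent for \( \mu_Y \).
Central Limit Theorem: as \( n \to \infty \), \( \bar{Y} \), once centered and scaled, converges in distribution
to the normal distribution; specifically,
\( \sqrt{n}(\bar{Y} - \mu_Y) \) converges to the distribution \( N(0, \sigma_Y^2) \).
Equivalently, for large n, \( \bar{Y} \) is approximately \( N(\mu_Y, \sigma_Y^2 / n) \).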
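Both results can be illustrated with variables that are clearly not normal. The sketch below uses Bernoulli draws with success probability 0.8, as in the earlier Bernoulli example. The tolerance `eps`, the sample sizes, the replication counts, and the seed are arbitrary illustrative choices.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is reproducible
p = 0.8
mu_y = p                            # mean of a Bernoulli(p) variable
sigma_y = math.sqrt(p * (1.0 - p))  # standard deviation, 0.4 for p = 0.8


def sample_average(n):
    """Average of n i.i.d. Bernoulli(p) draws."""
    return sum(1 for _ in range(n) if random.random() < p) / n


def share_far_from_mean(n, eps, reps):
    """Estimate Pr(|Ybar - mu_y| > eps) from reps independent samples of size n."""
    return sum(abs(sample_average(n) - mu_y) > eps for _ in range(reps)) / reps


# Law of large numbers: large deviations become rare as n grows.
eps = 0.05
far_small_n = share_far_from_mean(10, eps, reps=2000)
far_large_n = share_far_from_mean(1000, eps, reps=1000)

# Central limit theorem: sqrt(n) * (Ybar - mu_y) is roughly N(0, sigma_y^2).
n, reps = 400, 3000
scaled = [math.sqrt(n) * (sample_average(n) - mu_y) for _ in range(reps)]
scaled_mean = statistics.fmean(scaled)  # should be near 0
scaled_sd = statistics.stdev(scaled)    # should be near sigma_y
share_within = sum(abs(z) <= 1.96 * sigma_y for z in scaled) / reps  # roughly 0.95

print(far_small_n, far_large_n, scaled_mean, scaled_sd, share_within)
```

Because the underlying draws are discrete, the share within 1.96 standard deviations only approximates the normal value of 0.95.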
Gaussian Large Sample Distribution
• This central limit theorem is used repeatedly
• Tells typical dispersion in estimates we make
– Like here, the estimates will be averages (albeit special types of
averages)
• This works even if we don’t know true distributions of the
variables being averaged
– If we do know, can improve knowledge of dispersion
– Most texts focus on improved formulae assuming true distributions
are normal; our text avoids the assumption and considers large
samples only
You have learned (or reviewed):
1. Probabilities and distributions
2. Expected values; variances, moments
3. Multiple random variables
4. Normal and other distributions we’ll use
5. Sample averages & their distribution
6. Large sample distribution
What’s Next
• Do assignment for chapter 2 (“assignment ch2”)
– Due Thursday Feb. 2nd
– It’s long, so start now
• Next Monday Jan. 30th, lab session using Stata
• Thursday Feb. 2nd, start chapter 3: statistics