Bernoulli, Binomial, Trinomial, Multinomial Bernoulli, Binomial

Bernoulli, Binomial, Trinomial, Multinomial
Bernoulli, Binomial, Trinomial, Multinomial
Binomial
X # Successes in n Bernoulli trials (i.e., counts in 1 of 2 allelic/genotypic categories)
Bernoulli RV
X 0,1
X 0, n
P( X 1) p
n
f ( x) P( X x) p x (1 p) n x
x
n!
p x (1 p) n x
x !(n x)!
f ( x) p x (1 p)1 x
1
x E ( X ) x f ( x)
x E ( X ) np
x 0
Var ( X ) E X x 2
x
2
x2 Var ( X ) np(1 p)
• Note: X/n is the observed sample proportion, p̂ (which is the MLE for p)
Bernoulli, Binomial, Trinomial, Multinomial
f ( x1, x2 , x3 ) P( X 1 x1, X 2 x2 , X 3 x3 ) f ( x1, x2 ) Bernoulli, Binomial, Trinomial, Multinomial
n!
p x1 p x2 p x3
x1 ! x2 ! x3 ! 1 2 3
n!
p x1 p x2 (1 p1 p 2 )n x1 x2
x1 ! x2 !(n x1 x2 )! 1 2
f ( x1, x2 )
x1
x2
Multinomial (m categories)
x1 ... xm n
Trinomial (3 categories)
xAA x Aa xaa n
p1 ... pm 1
p AA p Aa paa 1
f ( x1 ,..., xm ) n!
p1x1
x1 ! xm !
p mxm
f ( xAA , x Aa , xaa ) n!
p AAxAA p AaxAa p aaxaa
xAA ! xAa ! xaa !
f (n1 ,..., nm ) n!
p1n1
n1 ! nm !
p mnm
f (nAA , nAa , naa ) n!
p nAA p nAa p naa
nAA !nAa !naa ! AA Aa aa
Marginal Distributions
•Since marginal counts for each cell (vs. “other”) are Binomial, we have
n!
f ( xAA , xAa , xaa ) p AAxAA p AaxAa p aaxaa
xAA ! xAa ! xaa !
•Summing over one (or more) of the categories gives a marginal distribution
for the remaining count(s).
•For the trinomial, collapsing two categories gives the Binomial Distribution –
with category i versus “other”.
f ( xAA , x Aa aa ) E ( X i ) npi
Var ( X i ) npi (1 pi )
X p (1 pi )
Var ( pˆ i ) Var i i
n
n
n!
p AAxAA p AaxAaaaaa
xAA ! xAa aa !
n!
p xAA 1 p AA
xAA !(n xAA )! AA
n x AA
Maximum Likelihood Estimation (Binomial)
f ( x) Multinomial Moments (Expected Cell Counts & Var)
n!
p x (1 p)n x
x !(n x)!
L( p ) Æ
n!
p x (1 p)n x
x !(n x)!
•The maximum likelihood estimate (MLE) is the value of the parameter that
maximizes the likelihood (or the log-likelihood - which is easier to work with).
l ( p) ln L( p) ln n! ln[ x!(n x)!] x ln[ p] (n x)ln[1 p]
d l ( p ) x (n x)
0
dp
p
1 p
Æ
pˆ x is the MLE for p
n
Maximum Likelihood Estimation (Trinomial)
f ( x1 , x2 , x3 | p1 , p2 , p3 ) n!
p1x1 p 2x2 p3x3
x1 ! x2 ! x3 !
Æ L( p1 , p2 , p3 ) n!
p1x1 p 2x2 p3x3
x1 ! x2 ! x3 !
•In order to maximize the (log-) likelihood, we need to take into account the
constraint on the frequencies (p1+ p2+ p3=1) and use Lagrange multipliers
Æ maximize the Lagrangian function H(p1,p2,p3, λ),
which incorporates the constraint
H ( p1 , p2 , p3 , ) ln L( p) 1 pi i H
H
H
H
0,
0,
0,
0
p1
p2
p3
Æ a set of equations to solve for pˆ i
Æ
pˆ i xi
n
Taylor Series Expansion of a function f(x)
The Delta Method for var[f(X)]
•It is based on Taylor series expansions for the mean and variance of a
function of a Random Variable.
•It is necessary because there is no general theoretical expression for the
variance of most functions of a mean (e.g., the inverse of a mean, or the ratio
of two means) and other statistics.
•Estimators for many genetic parameters involve ratios of multinomial
frequencies, meaning that exact expressions for variances can not be found.
The Delta Method for var[f(X)]
(T )
y f ( x ) f ( x ) x x E (Y ) f ( x )
2
x x 2!
2
(T )
f ( x) f (a) f (a) x a f (a)
2
x a 2!
(T )
f ( x) f (a) f (a) x a •The Delta Method uses a TS expansion with a=μx , the mean of X
(T )
y f ( x) f ( x ) f ( x ) x x The Delta Method for var[f(X)]
f (n)
n
(x ) x x n!
E f ( X ) E f ( x ) f ( x )( x x ) H .O.T .
Var (Y ) Var f ( X ) E Y E (Y ) •Taylor Series expansion of f(·) about the value x=a
Var (Y ) Var f ( X ) f ( x ) Var ( X )
2
The Delta Method for var[T=f(X1, X2,…)]
The Delta Method
•For a function of a single variable, we use a TS expansion about x=μx , the
mean of the RV X
•For 2 dimensions [T=f(x1,x2)]
(T )
y f ( x) f ( x ) f ( x ) x x 2
to get
Var (Y ) Var f ( X ) f ( x ) Var ( X )
2
•For k dimensions
Let T f ( x1 ,
•For T=f(x1,x2), we use a TS expansion about the point (x,y)=(μx1 , μx2)
(T )
T f ( x1 , x2 ) f ( x1 , x2 ) f
f
x1 x1 x2 x2
x1
x2
to get
2
2
f f f f
Var (T ) Var ( X 1 ) Cov( X 1 , X 2 )
Var ( X 2 ) 2
x1 x2
x1 x2 , xk )
T
Var (T ) i xi
2
T T
Cov( xi , x j )
Var ( xi ) i j i xi x j
2
f f f f
Var (T ) Var ( X 1 ) Cov( X 1 , X 2 )
Var ( X 2 ) 2
x1 x2
x1 x2 Multinomial Covariances
The Delta Method for var[f(X,Y)]
•The covariance of counts (or proportions) is the expected value of the product
of deviations of the counts from their means:
Cov( X i , X j ) E ( X i E ( X i ) ( X j E ( X j ) E( X i , X j ) E( X i )E( X j )
npi p j
1
Cov( pˆ i , pˆ j ) pi p j
n
Y
Var X
y y
1
(Y y ) Var 2 ( X x ) x
x
x