Probability & Statistics Refresher
Everything I need to know about Probability and Statistics in one easy lecture
1. Introduction to random variables
A. What is a random variable?
A random variable can be viewed as the name of an experiment with a probabilistic outcome. Its value is the
outcome of that experiment.
Tom Mitchell, 1997
• A discrete random variable can assume only a countable number of values.
Example: coin flip (heads/tails), Let's Make a Deal (Door 1, 2, or 3), die roll (1, 2, 3, 4, 5, or 6).
• A real (or continuous) random variable can assume a range of values, x ∈ [a, b].
Example: most sensor readings.
[Note: We’ll be dealing a lot with both types of random variables.]
B. Why should we care about random variables?
• Every sensor reading is a random variable (e.g. thermal noise, etc.).
• Many things in this world can be appropriately viewed as random events (e.g. your instructor's choice of
marker colors, start time of lecture).
• There is some degree of uncertainty in almost everything we do.
C. Some synonyms for “random”
• stochastic
• nondeterministic
2. Discrete random variables
A. Notation
• Let A, B, and C denote discrete random variables, and let a, b, and c denote possible values of the
corresponding random variables A, B, and C, respectively. Hence, P(A = a) denotes the probability that the
random variable A assumes the value a.
• In the special case when the random variable A can assume only two values, (T)rue or (F)alse, denote
P(A) as the probability that the random variable A is true, and denote P(¬A) as the probability that the
random variable A is false.
• Let P(A ∧ B) = P(A, B) = P(A ∩ B) = P(A and B).
• Let P(A ∨ B) = P(A ∪ B) = P(A or B).
• Let P(A = a | B = b) denote the conditional probability that A = a given that B = b. Often we use the
shorthand P(A | B) when A and B are true/false random variables.
B. Axioms of probability
0 ≤ P(A) ≤ 1    (1)
P(True) = 1    (2)
P(False) = 0    (3)
P(A ∨ B) = P(A) + P(B) − P(A ∧ B)    (4)
C. Conditional probability
P(A | B) = P(A ∧ B) / P(B),   P(B) ≠ 0    (5)

From (5),

P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A)    (6)    (Chain rule)

Equation (6) leads to Bayes Theorem:

P(A | B) = P(B | A) P(A) / P(B)    (7)

Assume that precisely one of n mutually exclusive events A₁, A₂, ..., Aₙ can occur. Then,

P(A₁ ∨ A₂ ∨ ... ∨ Aₙ) = Σ_{i=1}^{n} P(Aᵢ) = 1    (8)

and the general form of Bayes Theorem is given by,

P(Aᵢ | B) = P(B | Aᵢ) P(Aᵢ) / P(B)    (9)

where,

P(B) = Σ_{j=1}^{n} P(B | Aⱼ) P(Aⱼ)    (10)
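To make the general form concrete, here is a minimal numerical sketch in Python of equations (9) and (10); the priors and likelihoods below are invented purely for illustration:

# Hypothetical example: three mutually exclusive causes A1, A2, A3 of an observation B.
prior = [0.5, 0.3, 0.2]          # P(A_i); must sum to 1 (equation (8))
likelihood = [0.9, 0.4, 0.1]     # P(B | A_i)

# Equation (10): P(B) = sum_j P(B | A_j) P(A_j)
p_b = sum(l * p for l, p in zip(likelihood, prior))

# Equation (9): P(A_i | B) = P(B | A_i) P(A_i) / P(B)
posterior = [l * p / p_b for l, p in zip(likelihood, prior)]

print(p_b)        # 0.59
print(posterior)  # [0.7627..., 0.2033..., 0.0339...]; the posteriors sum to 1

Note how conditioning on B shifts probability mass toward the cause with the largest likelihood.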
D. Other laws of probability
• P(¬A) = 1 − P(A)    (11)

Easy to prove. From (4),

P(A ∨ ¬A) = P(A) + P(¬A) − P(A ∧ ¬A)    (12)
P(A ∧ ¬A) = 0 (something can't be true and false at the same time).
P(A ∨ ¬A) = 1 (a binary random variable has to be either true or false).
1 = P(A) + P(¬A)
P(¬A) = 1 − P(A)    (13)

• P(A) = P(A ∧ B) + P(A ∧ ¬B)    (14)

• P(A ∧ B ∧ C) = P(A | B ∧ C) P(B | C) P(C)    (15)
E. Independence
A and B are independent if and only if P(A = a) = P(A = a | B = b), ∀a, b. From this definition, we can
easily derive the following properties of independent events:

P(B) = P(B | A)    (16)
P(A ∧ B) = P(A) P(B)    (17)

Example: coin toss

F. Conditional independence
A and B are conditionally independent given C if and only if

P(A = a | B = b ∧ C = c) = P(A = a | C = c),   ∀a, b, c    (18)

Example: Traveling salesman

[Figure 1: diagram relating the random variables A, B, and C.]
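As a quick numerical illustration of definition (18), the Python sketch below builds a small joint distribution over three true/false variables in which A and B are conditionally independent given C but are not marginally independent; all probability values are made up for illustration:

from itertools import product

# Hypothetical conditional probability tables.
p_c = {True: 0.5, False: 0.5}          # P(C)
p_a_given_c = {True: 0.9, False: 0.2}  # P(A = True | C)
p_b_given_c = {True: 0.8, False: 0.1}  # P(B = True | C)

def bern(p, val):
    # Probability that a True/False variable with P(True) = p takes value `val`.
    return p if val else 1.0 - p

# Joint distribution constructed so that A and B are conditionally independent given C.
joint = {}
for a, b, c in product([True, False], repeat=3):
    joint[(a, b, c)] = p_c[c] * bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b)

def prob(pred):
    # Probability of the event described by `pred` under the joint distribution.
    return sum(p for (a, b, c), p in joint.items() if pred(a, b, c))

# Check P(A | B, C) = P(A | C), i.e. equation (18).
p_a_given_bc = prob(lambda a, b, c: a and b and c) / prob(lambda a, b, c: b and c)
p_a_given_c_only = prob(lambda a, b, c: a and c) / prob(lambda a, b, c: c)
print(p_a_given_bc, p_a_given_c_only)   # both 0.9

# A and B are NOT marginally independent: P(A ∧ B) differs from P(A) P(B).
print(prob(lambda a, b, c: a and b), prob(lambda a, b, c: a) * prob(lambda a, b, c: b))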
3. Probability distributions

A. Discrete random variables
For a given discrete random variable X, the probability distribution for X assigns a probability to each possible value or outcome of X,

P(X = x) = p(x),   ∀x    (19)

For n random variables, X₁, X₂, ..., Xₙ, the joint probability distribution assigns a probability to all possible combinations of values,

P(X₁ = x₁ ∧ X₂ = x₂ ∧ ... ∧ Xₙ = xₙ) = p(x₁, x₂, ..., xₙ),   ∀x₁, x₂, ..., xₙ    (20)

Example: If each random variable can assume one of k different values, then the joint probability distribution for n different random variables is fully specified by kⁿ values (mention Bayes nets).

Given two discrete random variables X, Y, the univariate probability distribution p(x) is related to the joint probability distribution p(x, y) by,

p(x) = Σ_y p(x, y)    (21)
B. Real random variables
A real-valued random variable X is characterized by a continuous probability distribution function,

F(x) = P(X ≤ x),   −∞ < x < ∞    (22)
F(x₁) ≤ F(x₂) if x₁ ≤ x₂    (23)
F(−∞) = 0,   F(∞) = 1    (24)
P(X = x) = 0    (25)

Often, however, the distribution is expressed as a probability density function (pdf) p(x),

p(x) = dF(x)/dx    (26)
p(x) ≥ 0,   ∀x    (27)
∫_{−∞}^{∞} p(x) dx = 1    (28)

From (26),

F(x) = ∫_{−∞}^{x} p(t) dt    (29)

For n real-valued random variables, X₁, X₂, ..., Xₙ, the joint probability density function is given by,

p(x₁, x₂, ..., xₙ) ≥ 0,   ∀x₁, x₂, ..., xₙ    (30)
∫_{−∞}^{∞} ... ∫_{−∞}^{∞} p(x₁, x₂, ..., xₙ) dx₁ dx₂ ... dxₙ = 1    (31)

[Note: p(x) and p(x₁, x₂, ..., xₙ) are often referred to as the likelihood of x and x₁, x₂, ..., xₙ, respectively.]

Given two continuous random variables X, Y, the univariate probability density p(x) is related to the joint probability density p(x, y) by,

p(x) = ∫_{−∞}^{∞} p(x, y) dy    (32)
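As a quick numerical illustration of equations (26) through (29), the Python sketch below checks that the standard exponential density integrates to 1 and that numerically integrating the density recovers the distribution function F(x) = 1 − e^(−x); NumPy is assumed to be available:

import numpy as np

# Standard exponential pdf p(x) = exp(-x) for x >= 0, used purely as an example.
x = np.linspace(0.0, 50.0, 500001)   # grid wide enough that the tail is negligible
p = np.exp(-x)

# Equation (28): the density integrates to (approximately) 1.
print(np.trapz(p, x))                # ≈ 1.0

# Equation (29): F(x0) is the integral of p(t) up to x0 (here the support starts at 0).
x0 = 1.5
mask = x <= x0
print(np.trapz(p[mask], x[mask]))    # ≈ 0.7769
print(1.0 - np.exp(-x0))             # exact F(1.5) = 0.77687...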
C. Normal (Gaussian) distribution
A very important continuous probability distribution is the Normal, or Gaussian, distribution. It is defined
by the following probability density function (pdf),

p(x) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²)) = N(x; μ, σ²)    (33)

for specific values of the scalar parameters μ and σ². In multiple dimensions,

p(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp(−(1/2)(x − μ)ᵀ Σ⁻¹ (x − μ)) = N(x; μ, Σ)    (34)

where μ and x are d-dimensional vectors, and Σ is a d × d positive-definite matrix.

[Note: A square matrix M is positive definite if and only if,

vᵀ M v > 0,   ∀v ≠ 0    (35)

Equation (35) also implies that all the eigenvalues of M are greater than 0 and that M is invertible.]

Example: Below, the Normal distribution is plotted in one and two dimensions.

[Figure 2: the one-dimensional Normal density with μ = 0, σ² = 1, and the two-dimensional Normal density with μ = (0, 0)ᵀ and Σ = I.]
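A minimal sketch in Python evaluating equations (33) and (34) directly from their formulas; NumPy is assumed, and the parameter values are arbitrary examples:

import numpy as np

def normal_pdf(x, mu, sigma2):
    # Equation (33): univariate Normal density N(x; mu, sigma^2).
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

def mvn_pdf(x, mu, Sigma):
    # Equation (34): multivariate Normal density N(x; mu, Sigma), Sigma positive definite.
    d = len(mu)
    diff = x - mu
    norm_const = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(Sigma))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

print(normal_pdf(0.0, 0.0, 1.0))                      # 1/sqrt(2*pi) ≈ 0.3989
print(mvn_pdf(np.zeros(2), np.zeros(2), np.eye(2)))   # 1/(2*pi) ≈ 0.1592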
4. Expected value, variance, standard deviation

A. Expected value for a discrete random variable
The expected value for a discrete random variable is given by,

E[X] = Σ_x x p(x)    (36)

The mean x̄ for a random variable X is given by,
x̄ = (1/n) Σ_{i=1}^{n} xᵢ    (37)

where xᵢ denotes the ith measurement of X. The mean and expected value for a random variable are related
by,

E[X] = lim_{n→∞} (1/n) Σ_{i=1}^{n} xᵢ    (38)
Example: Let A be a random variable denoting the outcome of a die roll. Find E[A].
Assume a fair die; then P(A = i) = p(i) = 1/6, ∀i.

E[A] = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6)    (39)
E[A] = (1/6) Σ_{i=1}^{6} i = 3.5    (40)
[Note: The term “expected value” can be misleading, since the expected value of a random variable may not
even be a possible value for the random variable itself.]
Example: You shake two 6-sided dice. What is the expected value of the lower of the two numbers (A)?
Define two random variables D₁ and D₂, which denote the outcomes of the rolls of dice 1 and 2.

P(A = 6) = P(D₁ = 6) P(D₂ = 6) = 1/36    (41)
P(A = 5) = P(D₁ = 5) P(D₂ = 5) + P(D₁ = 5) P(D₂ = 6) + P(D₁ = 6) P(D₂ = 5)    (42)
P(A = 5) = 3/36    (43)

Similarly,

P(A = i) = [(7 − i) + (6 − i)] / 36 = (13 − 2i)/36    (44)

Therefore,

E[A] = Σ_{i=1}^{6} i (13 − 2i)/36 = 91/36 = 2.52778    (45)
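A quick Python sketch to double-check equation (45), both by exact enumeration of the 36 equally likely outcomes and from the closed-form distribution P(A = i) = (13 − 2i)/36:

from itertools import product

# Exact enumeration: average the minimum over all 36 equally likely (d1, d2) pairs.
outcomes = [min(d1, d2) for d1, d2 in product(range(1, 7), repeat=2)]
print(sum(outcomes) / 36)                                 # 2.52777...

# Same value from the closed-form distribution.
print(sum(i * (13 - 2 * i) / 36 for i in range(1, 7)))    # 91/36 ≈ 2.52778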
B. Expected value for a continuous random variable
The expected value for a continuous random variable is given by,

E[X] = ∫_{−∞}^{∞} x p(x) dx    (46)

Example: E[X] for the Normal distribution is μ.
C. Variance and standard deviation
The variance V(X) of a random variable X is defined by,

V(X) = E[(X − μ)²]    (47)

where μ = E[X]. The standard deviation σ of a random variable X is defined by,

σ = √V(X)    (48)

Because of (48), we often denote V(X) = σ².

Example: The variance of the Normal distribution is σ².
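Continuing the fair-die example, here is a short Python sketch that computes V(A) and σ directly from definitions (47) and (48):

# Fair six-sided die: p(i) = 1/6 for i = 1..6.
values = range(1, 7)
p = 1.0 / 6.0

mu = sum(i * p for i in values)                   # E[A] = 3.5
var = sum((i - mu) ** 2 * p for i in values)      # V(A) = E[(A - mu)^2] = 35/12
sigma = var ** 0.5                                # standard deviation

print(mu, var, sigma)   # 3.5, 2.9166..., 1.7078...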
D. Covariance of two random variables
The covariance of two random variables X, Y is defined by,

Cov(X, Y) = E[(X − μ_x)(Y − μ_y)]    (49)

where μ_x = E[X] and μ_y = E[Y]. If X and Y are independent, then,

Cov(X, Y) = 0    (50)

Example: The matrix Σ in the multivariate Normal distribution is the covariance matrix. The (i, j) element σᵢⱼ of the covariance matrix is given by,

σᵢⱼ = E[(Xᵢ − μᵢ)(Xⱼ − μⱼ)]    (51)

where μᵢ is the ith element of the μ vector, and Xᵢ is the ith random variable in the joint Normal distribution.

Consider, for example, the two plots below (Figure 3).
[Figure 3: samples from two bivariate Normal distributions with μ = (0, 0)ᵀ, one with identity covariance Σ = I and one with a covariance matrix whose off-diagonal entries are nonzero (correlated components).]
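A minimal sketch in Python of the covariance definitions (49) and (51): draw samples from a correlated bivariate Normal and compare the sample covariance matrix to the Σ used to generate them. NumPy is assumed, and the particular Σ below is just an example:

import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])          # example covariance matrix (positive definite)

# Draw many samples from N(mu, Sigma).
samples = rng.multivariate_normal(mu, Sigma, size=100_000)

# Sample estimate of Cov(X_i, X_j) = E[(X_i - mu_i)(X_j - mu_j)].
centered = samples - samples.mean(axis=0)
sample_cov = centered.T @ centered / (len(samples) - 1)

print(sample_cov)         # ≈ [[1.0, 0.6], [0.6, 1.0]]
print(np.cov(samples.T))  # NumPy's built-in estimate agrees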
E. Properties of the expected value operator
• The expected value of a function g(X) is given by,

E[g(X)] = Σ_x g(x) p(x)   (discrete random variable)    (52)
E[g(X)] = ∫_{−∞}^{∞} g(x) p(x) dx   (continuous random variable)    (53)

• Linearity: Let a, b be constants and let X, Y be random variables. Then,

E[a f(X) + b g(Y)] = a E[f(X)] + b E[g(Y)]    (54)
F. Central limit theorem
Consider a set of independent, identically distributed random variables X₁, X₂, ..., Xₙ governed by an arbitrary probability distribution with mean μ and finite variance σ². Also, define the sample mean,

X̄ = (1/n) Σ_{i=1}^{n} Xᵢ    (55)

Then, as n → ∞, the distribution governing X̄ approaches a Normal distribution with,

mean μ, and    (56)
standard deviation σ/√n    (57)

This theorem often justifies the use of the Normal distribution in real life for large n.
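The following Python sketch illustrates the theorem empirically: sample means of n Uniform(0, 1) variables cluster around μ with spread close to σ/√n. NumPy is assumed, and n and the number of trials are arbitrary choices:

import numpy as np

rng = np.random.default_rng(1)
n = 50                       # sample size for each mean
trials = 100_000             # number of sample means to draw

# Uniform(0, 1) has mean mu = 0.5 and variance sigma^2 = 1/12.
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)

sample_means = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)

print(sample_means.mean())    # ≈ 0.5                      (equation (56))
print(sample_means.std())     # ≈ sigma/sqrt(n) ≈ 0.0408   (equation (57))
print(sigma / np.sqrt(n))     # theoretical value

A histogram of sample_means would look close to the Normal density even though the underlying distribution is uniform.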