CSE 151 Machine Learning
Instructor: Kamalika Chaudhuri
Probability Review
Probabilistic Events and Outcomes
Sample space: set of all possible outcomes of an experiment
Event: subset of a sample space
Example:
Toss a coin two times. Sample space is {HH, HT, TH, TT}
A = {HH, HT} is the event that toss 1 is a H
Measure the temperature. Sample space is (−∞, ∞)
A = [32, 100] is the event that the temperature lies between
32 and 100 degrees F
Probability
We assign a number P(A) to all events A in a sample space Ω s.t.
Axiom 1. P(A) ≥ 0, for all A
Axiom 2. P(Ω) = 1
Axiom 3. If A1, A2, … are disjoint, then P(∪i Ai) = Σi P(Ai)
Such a function P(A) is called a probability distribution
Example:
Two tosses of a coin. P(HH) = P(HT) = P(TH) = P(TT) = 0.25
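A quick sanity check (a minimal Python sketch; the event choices are illustrative):

```python
from itertools import product

# Sample space for two tosses of a fair coin: {HH, HT, TH, TT}
omega = [''.join(t) for t in product('HT', repeat=2)]
P = {outcome: 0.25 for outcome in omega}

assert all(p >= 0 for p in P.values())      # Axiom 1: nonnegativity
assert abs(sum(P.values()) - 1) < 1e-12     # Axiom 2: P(omega) = 1
A, B = {'HH'}, {'TT'}                       # two disjoint events
union = sum(P[o] for o in A | B)            # Axiom 3 (finite case)
assert union == sum(P[o] for o in A) + sum(P[o] for o in B)
```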
Independence and Conditional Probability
Two events A and B are independent if P(A ∩ B) = P(A)P(B)
Events A1, …, Ak are independent if
P(A1 ∩ … ∩ Ak) = P(A1) P(A2) … P(Ak)
If P(B) > 0 then the conditional probability of event A given B is
P(A|B) = P(A ∩ B) / P(B)
If A, B are independent, what is P(A|B)?
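A minimal sketch that answers this numerically for the two-toss example, taking A = "toss 1 is H" and B = "toss 2 is H" (which are independent):

```python
from itertools import product

omega = [''.join(t) for t in product('HT', repeat=2)]
P = {o: 0.25 for o in omega}
prob = lambda E: sum(P[o] for o in E)

A = {o for o in omega if o[0] == 'H'}  # toss 1 is heads
B = {o for o in omega if o[1] == 'H'}  # toss 2 is heads

# P(A|B) = P(A ∩ B) / P(B); for independent events this equals P(A)
print(prob(A & B) / prob(B), prob(A))  # 0.5 0.5
```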
Random Variables
A random variable X is a mapping that assigns a real number X(ω)
to each outcome ω in a probability space
Example:
Flip a coin two times, and let X be the number of heads
Measure the temperature two times and let X be the sum of
the measurements
We will look at discrete and continuous random variables
Cumulative Distribution Functions
A cumulative distribution function (cdf) of a random variable X
is a function FX defined by:
FX(x) = P(X ≤ x)
Example:
X is the number of heads in two tosses of a fair coin
[Figure: the cdf of X, a step function rising from 0 to 1 with jumps at x = 0, 1, and 2]
Probability Mass Functions
The probability mass function (pmf) of a discrete random
variable X is a function fX defined by:
fX(x) = P(X = x)
Example:
X is the number of heads in two tosses of a fair coin
What is fX?
fX(0) = 0.25, fX(1) = 0.5, fX(2) = 0.25,
fX(x) = 0, otherwise
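The pmf (and the cdf from the previous slide) can be checked by enumeration; a minimal sketch:

```python
from itertools import product
from collections import Counter

# X = number of heads in two tosses of a fair coin
outcomes = [''.join(t) for t in product('HT', repeat=2)]
counts = Counter(o.count('H') for o in outcomes)
pmf = {x: c / len(outcomes) for x, c in sorted(counts.items())}
print(pmf)                                   # {0: 0.25, 1: 0.5, 2: 0.25}

# cdf: accumulate the pmf, F(x) = P(X <= x)
cdf = {x: sum(p for y, p in pmf.items() if y <= x) for x in pmf}
print(cdf)                                   # {0: 0.25, 1: 0.75, 2: 1.0}
```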
Examples of pmfs
Binomial distribution: Bin(n, p)
fX(k) = (n choose k) p^k (1 − p)^(n−k), 0 ≤ k ≤ n
fX(k) = 0, otherwise
Exercise: How will you verify this is a pmf?
Exercise: How will you calculate the cdf?
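One way to approach both exercises numerically; a sketch using only the Python standard library (the parameters n and p below are arbitrary):

```python
from math import comb

def binom_pmf(k, n, p):
    # Bin(n, p): (n choose k) p^k (1-p)^(n-k) for 0 <= k <= n
    return comb(n, k) * p**k * (1 - p)**(n - k) if 0 <= k <= n else 0.0

def binom_cdf(k, n, p):
    # cdf at an integer k >= 0: sum the pmf over 0..k
    return sum(binom_pmf(j, n, p) for j in range(k + 1))

n, p = 10, 0.3
# Verify the pmf: nonnegative and sums to 1 over its support
assert abs(sum(binom_pmf(k, n, p) for k in range(n + 1)) - 1) < 1e-12
print(binom_cdf(3, n, p))  # P(X <= 3)
```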
Examples of pmfs
Geometric distribution: Geom(p)
fX(k) = p(1 − p)^(k−1), k = 1, 2, 3, …
fX(k) = 0, otherwise
Exercise: How will you verify this is a pmf?
Exercise: How will you calculate the cdf?
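A sketch for both exercises (p arbitrary): a long partial sum approximates the infinite series, and the cdf has the closed form 1 − (1 − p)^k:

```python
def geom_pmf(k, p):
    # Geom(p): probability the first success occurs on trial k
    return p * (1 - p)**(k - 1) if k >= 1 else 0.0

p = 0.3
# The geometric series p * sum_{k>=1} (1-p)^(k-1) sums to 1
assert abs(sum(geom_pmf(k, p) for k in range(1, 1000)) - 1) < 1e-9

# cdf: P(X <= k) = 1 - P(first k trials all fail) = 1 - (1-p)^k
k = 5
partial = sum(geom_pmf(j, p) for j in range(1, k + 1))
assert abs(partial - (1 - (1 - p)**k)) < 1e-12
```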
Probability Density Functions
The probability density function (pdf) of a continuous random
variable X is a function fX such that:
1. fX(x) ≥ 0, for all x
2. ∫_{−∞}^{∞} fX(x) dx = 1
3. For all a ≤ b, P(a ≤ X ≤ b) = ∫_a^b fX(x) dx
For a continuous random variable X, the cdf FX(x) is:
FX(x) = ∫_{−∞}^{x} fX(t) dt
Examples of pdfs
Uniform(a,b):
fX(x) = 1 / (b − a), a ≤ x ≤ b
fX(x) = 0, otherwise
Exercise: Verify this is a pdf
Exercise: How will you calculate the cdf?
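A sketch covering both exercises (a and b arbitrary): a Riemann sum checks the integral, and integrating the constant density gives a piecewise-linear cdf:

```python
def uniform_pdf(x, a, b):
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    # integral of the constant density from a up to x
    if x < a: return 0.0
    if x > b: return 1.0
    return (x - a) / (b - a)

a, b = 2.0, 5.0
n = 100_000
dx = (b - a) / n  # midpoint Riemann sum over [a, b]
total = sum(uniform_pdf(a + (i + 0.5) * dx, a, b) * dx for i in range(n))
assert abs(total - 1) < 1e-9
print(uniform_cdf(3.5, a, b))  # 0.5
```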
Examples of pdfs
Normal: N(µ, σ²)
fX(x) = (1 / (σ√(2π))) e^(−(x−µ)² / (2σ²))
Exercise: Verify this is a pdf
Standard normal: normal distribution with mean 0 and standard deviation 1
Φ(z): cdf of a standard normal
Properties:
1. If X is N(µ, σ²), then Z = (X − µ)/σ is a standard normal
2. If Xi is N(µi, σi²) and the Xi are independent, then
Z = X1 + … + Xk is N(µ1 + … + µk, σ1² + … + σk²)
3. If Z is N(0, 1), then µ + σZ is N(µ, σ²)
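A sampling sketch of property 1 (seed and parameters arbitrary): standardized draws from N(µ, σ²) should have mean ≈ 0 and variance ≈ 1:

```python
import random

random.seed(0)
mu, sigma = 3.0, 2.0
xs = [random.gauss(mu, sigma) for _ in range(100_000)]

# Property 1: Z = (X - mu) / sigma should look like N(0, 1)
zs = [(x - mu) / sigma for x in xs]
m = sum(zs) / len(zs)
v = sum((z - m)**2 for z in zs) / len(zs)
print(round(m, 3), round(v, 3))  # approximately 0 and 1
```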
Examples of pdfs
Exponential: Exp(β)
fX(x) = (1/β) e^(−x/β), x > 0
fX(x) = 0, otherwise
Exercise: Verify this is a pdf
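A sketch for the exercise (β arbitrary): a truncated Riemann sum approximates the integral, and the same integration gives the cdf 1 − e^(−x/β):

```python
from math import exp

def exp_pdf(x, beta):
    return exp(-x / beta) / beta if x > 0 else 0.0

beta = 2.0
n, upper = 200_000, 50 * beta   # truncate the integral; the tail is negligible
dx = upper / n
total = sum(exp_pdf((i + 0.5) * dx, beta) * dx for i in range(n))
assert abs(total - 1) < 1e-6

# cdf: integrating the density from 0 to x gives 1 - e^(-x/beta)
x = 3.0
print(1 - exp(-x / beta))  # P(X <= 3)
```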
Bivariate Distributions
In the discrete case, a function f(x, y) is a joint mass
function for random variables (X,Y) if:
f(x, y) = P(X = x, Y = y)
Bivariate Distributions
In the continuous case, a function f(x, y) is a joint density
function for random variables (X,Y) if:
1. f(x, y) ≥ 0, for all x, y
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1
3. For any set A, P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy
Examples of joint pdfs
Uniform:
f(x, y) = 1, 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
f(x, y) = 0, otherwise
Exercise: Verify this is a pdf
Exercise: What is P(X > 0.5, Y < 0.5)?
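A Monte Carlo sketch for the second exercise: sample (X, Y) uniformly on the unit square and count how often the event occurs:

```python
import random

random.seed(0)
n = 1_000_000
hits = 0
for _ in range(n):
    x, y = random.random(), random.random()  # independent Uniform(0, 1)
    if x > 0.5 and y < 0.5:
        hits += 1
print(hits / n)  # estimates P(X > 0.5, Y < 0.5)
```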
Examples of joint pdfs
Multivariate Normal:
f(x; µ, Σ) = (1 / ((2π)^(d/2) det(Σ)^(1/2))) exp(−(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ))
Properties:
1. The standard multivariate normal has mean 0_d and covariance I_d
2. If Z is standard multivariate normal,
then X = µ + Σ^(1/2) Z is N(µ, Σ)
More properties are left as an exercise
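A sketch of property 2 in two dimensions (µ and Σ chosen arbitrarily); a Cholesky factor L with L Lᵀ = Σ stands in for Σ^(1/2):

```python
import random
from math import sqrt

random.seed(0)
mu = [1.0, -2.0]
Sigma = [[2.0, 0.6],
         [0.6, 1.0]]

# 2x2 Cholesky factor of Sigma, computed by hand
l11 = sqrt(Sigma[0][0])
l21 = Sigma[1][0] / l11
l22 = sqrt(Sigma[1][1] - l21**2)

def sample():
    # X = mu + L Z, with Z a standard multivariate normal
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    return (mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2)

xs = [sample() for _ in range(100_000)]
print(round(sum(x for x, _ in xs) / len(xs), 2),   # close to mu[0]
      round(sum(y for _, y in xs) / len(xs), 2))   # close to mu[1]
```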
Marginal Mass & Density Functions
In the discrete case, if random variables (X,Y) have a joint mass
function fXY, then the marginal mass function for X is:
fX(x) = P(X = x) = Σ_y P(X = x, Y = y) = Σ_y fXY(x, y)
What is the marginal mass function for Y?
In the continuous case, if random variables (X, Y) have a joint
density function fXY, then the marginal density function for X is:
fX(x) = ∫ fXY(x, y) dy
What is the marginal density function for Y?
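A discrete sketch (the joint pmf values are made up for illustration): the marginal of X sums the joint mass over y:

```python
# Joint pmf of (X, Y) as a dictionary keyed by (x, y)
f_XY = {(0, 0): 0.1, (0, 1): 0.2,
        (1, 0): 0.3, (1, 1): 0.4}

f_X = {}
for (x, y), p in f_XY.items():
    f_X[x] = f_X.get(x, 0.0) + p   # sum over y for each x

print({x: round(p, 3) for x, p in f_X.items()})  # {0: 0.3, 1: 0.7}
```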
Conditional Mass & Density Functions
In the discrete case, the conditional mass function for X is:
fX|Y(x|y) = P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y) = fXY(x, y) / fY(y)
if fY(y) > 0
In the continuous case, the conditional density function for X is:
fX|Y(x|y) = fXY(x, y) / fY(y)
if fY(y) > 0
Expectation
For a discrete random variable X, the expectation is defined as:
E[X] = Σ_x x fX(x)
For a continuous random variable X, the expectation is defined as:
E[X] = ∫_{−∞}^{∞} x fX(x) dx
Let Y = r(X). Then E[Y] can be computed as:
E[Y] = Σ_x r(x) fX(x)              (discrete)
E[Y] = ∫_{−∞}^{∞} r(x) fX(x) dx    (continuous)
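A sketch using the two-toss pmf from earlier: E[X] from the definition, and E[Y] for Y = r(X) = X² computed directly from the pmf of X:

```python
# pmf of X = number of heads in two fair tosses
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

EX = sum(x * p for x, p in pmf.items())
print(EX)  # 1.0

# E[Y] for Y = r(X) = X^2, from the pmf of X alone
r = lambda x: x**2
EY = sum(r(x) * p for x, p in pmf.items())
print(EY)  # 1.5
```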
Variance
For a random variable X, the variance is defined as:
Var(X) = E[(X − E[X])²]
Property:
E[(X − E[X])²] = E[X²] − (E[X])²
Exercise: Prove the property
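A numeric check of the property on the two-toss pmf (a sanity check, not a proof):

```python
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

EX = sum(x * p for x, p in pmf.items())
lhs = sum((x - EX)**2 * p for x, p in pmf.items())   # E[(X - E[X])^2]
rhs = sum(x**2 * p for x, p in pmf.items()) - EX**2  # E[X^2] - (E[X])^2
assert abs(lhs - rhs) < 1e-12
print(lhs)  # Var(X) = 0.5
```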
Independence of Random Variables, Covariance
Random variables X and Y are independent if for any two
sets A and B,
P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B)
For two random variables X, Y,
Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
Cov(X, Y) can also be written as:
Cov(X, Y) = E[XY] − E[X] E[Y]
Property: If X and Y are independent, then
Var(X + Y) = Var(X) + Var(Y)
Cov(X, Y) = 0
Does the converse hold?
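A sampling sketch of the property (seed arbitrary): for independent draws, the sample covariance is near 0 and Var(X + Y) ≈ Var(X) + Var(Y):

```python
import random

random.seed(0)
n = 200_000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]   # independent of xs

mean = lambda v: sum(v) / len(v)
mx, my = mean(xs), mean(ys)
cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
print(round(cov, 3))                          # close to 0 under independence

def var(v):
    m = mean(v)
    return sum((u - m)**2 for u in v) / len(v)

s = [x + y for x, y in zip(xs, ys)]
print(round(var(s), 2), round(var(xs) + var(ys), 2))  # approximately equal
```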