Lecture 1: Probability and Stochastic Process
Kai Yu
Speech Lab
Department of Computer Science & Engineering
Shanghai Jiao Tong University
Autumn 2014
Table of Contents

- Probability basics
- Discrete and continuous distributions
- Statistics and properties
- Multiple random variables and joint probability
- Stochastic processes
Probability
Event, Experiment and Random Variable

Probability is a measure or estimation of how likely it is that something will happen or that a statement is true.

- Uncertainty comes from the non-deterministic results of experiments
- Countability is the key to differentiating discrete from continuous random variables

Events → Random Variable

- Discrete: count the students of each gender in the CS department at SJTU
- Continuous: measure the morning arrival times of students

Q1: Must a discrete variable be finite?
Q2: Can an event be both discrete and continuous?
Frequency Probability
Mathematical Interpretation

The probability of a random event denotes the relative frequency of occurrence of an experiment's outcome when the experiment is repeated.

- Measure theory is used to rigorously define the probability function
- Prerequisite: a well-defined event set
  - Ω is an arbitrary non-empty set
  - F is a set of subsets of Ω
- Definition and properties: probability P is a function defined on F such that
  1. Non-negativity: 0 ≤ P(A) ≤ 1 for any A ∈ F, and P(∅) = 0
  2. Normalization: P(Ω) = 1
  3. Countable additivity: if A_i ∈ F, i = 1, 2, ··· and A_i ∩ A_j = ∅ for any i ≠ j, then

       P(⋃_{i=1}^{N} A_i) = Σ_{i=1}^{N} P(A_i)    (1)
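As an illustration (not from the slides), a minimal Python sketch checking these properties for a fair six-sided die, where events are subsets of Ω:

```python
# Fair six-sided die: sample space Omega and a uniform measure on its subsets
omega = frozenset({1, 2, 3, 4, 5, 6})

def P(A):
    """P(A) = |A| / |Omega| -- the classical (frequency) probability."""
    return len(A) / len(omega)

assert P(set()) == 0 and all(0 <= P({w}) <= 1 for w in omega)  # non-negativity
assert P(omega) == 1                                           # normalization

A, B = {1, 2}, {5}                             # disjoint events
assert A & B == set()
assert abs(P(A | B) - (P(A) + P(B))) < 1e-12   # additivity, eq. (1)
```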
Probability Distribution - Discrete Random Variable
Probability Mass Function

Example: the throw of a die. 6 possible events

  Ω = {up, down, left, right, front, back}

mapped to values of a random variable x, x ∈ X

  X = {1, 2, 3, 4, 5, 6}

- No meaningful quantitative comparison for x
- Representation: Probability Mass Function (PMF)

    Σ_{x∈X} P(x) = 1,   P(x) > 0 ∀x

- A discrete PMF is essentially a lookup table:

    x    | 1   | 2   | 3   | 4   | 5   | 6
    P(x) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6
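A small sketch (illustration only) of the lookup-table view, storing the PMF of the fair die as a dictionary:

```python
from fractions import Fraction

# PMF of a fair die as a lookup table: value -> probability
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

assert sum(pmf.values()) == 1               # normalization
assert all(p > 0 for p in pmf.values())     # P(x) > 0 for all x
print(pmf[3])                               # P(x = 3) = 1/6
```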
Discrete Probability Distribution
Bernoulli Distribution

Bernoulli trial: consider coin tossing; a random variable x ∈ {0, 1} can then be introduced, where 1 denotes success (getting "head") and 0 denotes failure ("tail").

Figure: toss a coin
Discrete Probability Distribution
Bernoulli Distribution

Bernoulli trial: consider coin tossing; a random variable x ∈ {0, 1} can then be introduced, where 1 denotes success (getting "head") and 0 denotes failure ("tail"). The Bernoulli distribution can then be defined as

  x    | 1 | 0
  P(x) | p | 1 − p

- 0 < p < 1 is the probability of success (getting a head)
- The probability of getting k successes from n trials is

    P(k) = C(n, k) p^k (1 − p)^{n−k},   C(n, k) = n! / (k!(n − k)!)
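A short sketch (illustration, with an assumed fair coin, p = 0.5) of the binomial formula above:

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(k) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 3 heads in 10 tosses of a fair coin
print(binom_pmf(3, 10, 0.5))                                    # 0.1171875
assert abs(sum(binom_pmf(k, 10, 0.5) for k in range(11)) - 1) < 1e-12
```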
Q1: What's the formula for the one-of-K scenario?
Q2: What's the difference between quantitative and non-quantitative discrete variables?
Probability Distribution - Continuous Random Variable
Probability Density Function

Example: measure the exact height of people. Infinitely many possible events

  Ω = (−∞, ∞)

naturally mapped to values of a random variable x, x ∈ (−∞, ∞)

- It is meaningless to consider the probability of a "particular" event, as it is infinitely small
- It is meaningful to consider the probability of events falling into a "range", which introduces the Probability Density Function (pdf):

    P(x ∈ (a, b)) = ∫_a^b p(x) dx,   p(a) ≈ P(x ∈ (a, a + Δx)) / Δx

It is simple to see that

  p(x) > 0,   ∫_{−∞}^{∞} p(x) dx = 1
Q: What is the maximum value of a pdf?
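A numerical illustration (hypothetical density, assuming scipy is available) of these properties, using an exponential pdf p(x) = λe^{−λx}; it also bears on the question above, since a density value may exceed 1:

```python
from math import exp
from scipy.integrate import quad

lam = 2.0                              # hypothetical rate parameter
p = lambda x: lam * exp(-lam * x)      # p(x) = lam * exp(-lam * x), x >= 0

total, _ = quad(p, 0, float("inf"))    # pdf integrates to one over its support
prob, _ = quad(p, 0.5, 1.5)            # P(x in (0.5, 1.5)) = integral from a to b
print(total)    # ~1.0
print(prob)     # ~0.318
print(p(0.0))   # 2.0 -- a density value may exceed 1, unlike a probability
```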
Continuous Probability Distribution
Gaussian Distribution

A Gaussian (or normal) distribution with mean µ and variance σ² has the form

  p(x) ≡ N(µ, σ²) = (1 / √(2πσ²)) exp{ −(x − µ)² / (2σ²) }

Figure: Gaussian with mean 0 and variance 1.5
Continuous Probability Distribution
Gaussian Distribution

A Gaussian (or normal) distribution with mean µ and variance σ² has the form

  p(x) ≡ N(µ, σ²) = (1 / √(2πσ²)) exp{ −(x − µ)² / (2σ²) }

- If µ = 0 and σ = 1, it is called the standard or unit Gaussian distribution
- Under mild conditions, the mean of a sufficiently large number of independent random variables will be approximately normally distributed
- Convenient for mathematical derivation
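An empirical sketch (illustration, with assumed parameters µ = 0 and σ² = 1.5 to match the earlier figure): sampling from a Gaussian and comparing sample statistics to the parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 0.0, 1.5                  # assumed parameters

x = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=100_000)
print(x.mean(), x.var())               # sample mean ~0.0, sample variance ~1.5

# Density formula evaluated at x = 0
pdf0 = np.exp(-(0 - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
print(pdf0)                            # ~0.326
```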
Useful Statistics - Mean and Variance
Concept of Expectation

Expectation is the average value weighted by the probability function. It is the result you would get by sampling a distribution and computing the average.

Mean:

  E_{q(x)}[x] ≡ ⟨x⟩_{q(x)} = µ = { Σ_{x∈X} x P(x),   q(x) ≡ P(x)
                                 { ∫ x p(x) dx,      q(x) ≡ p(x)

Variance:

  E_{q(x)}[(x − µ)²] ≡ ⟨(x − µ)²⟩_{q(x)} = σ² = { Σ_{x∈X} (x − µ)² P(x)
                                                { ∫ (x − µ)² p(x) dx

Note that

  σ² = E[(x − µ)²] = E[x²] − (E[x])²
Mean and Variance
Calculation

Example:

  p(x) = { 1 − x/2,   0 ≤ x ≤ 2
         { 0,         otherwise

  E[x] = ∫_0^2 x (1 − x/2) dx = [x²/2 − x³/6]_0^2 = 2/3

  E[x²] = ∫_0^2 x² (1 − x/2) dx = [x³/3 − x⁴/8]_0^2 = 2/3

Hence

  µ = E[x] = 2/3
  σ² = E[x²] − µ² = 2/3 − 4/9 = 2/9
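A symbolic check of this example (illustration, assuming sympy is available):

```python
import sympy as sp

x = sp.symbols("x")
p = 1 - x / 2  # density on [0, 2], zero elsewhere

assert sp.integrate(p, (x, 0, 2)) == 1            # normalization
mu = sp.integrate(x * p, (x, 0, 2))               # E[x] = 2/3
ex2 = sp.integrate(x**2 * p, (x, 0, 2))           # E[x^2] = 2/3
var = sp.simplify(ex2 - mu**2)                    # sigma^2 = 2/9
print(mu, ex2, var)                               # 2/3 2/3 2/9
```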
Useful Statistics - Moment
High-order statistics

General form of expectation:

  E[f(x)] ≡ ⟨f(x)⟩_{q(x)} = { Σ_{x∈X} f(x) P(x),   q(x) ≡ P(x)   (discrete)
                            { ∫ f(x) p(x) dx,      q(x) ≡ p(x)   (continuous)

Moments are high-order statistics. The nth moment of a probability distribution is defined as

  E[xⁿ] ≡ ⟨xⁿ⟩_{q(x)}

- Moments are most often defined for continuous random variables
- A central moment is a moment about the mean
- The variance E[(x − µ)²] ≡ ⟨(x − µ)²⟩_{q(x)} is the second-order central moment of q(x)
Information and Entropy
Measure of uncertainty

Information: the "message being conveyed". It resolves uncertainty.
Uncertainty: measured by the probability of event occurrence, to which it is inversely related. The more uncertain an event is, the more information is required to resolve its uncertainty.

- The information in a discrete variable x is I(x) = − log₂ P(x)
- Entropy: the average information of the whole source, a measure of overall uncertainty, measured in bits:

    H = E[− log₂ P(x)] = − Σ_{x∈X} P(x) log₂ P(x)

- Entropy for a continuous variable (measured in nats, not bits):

    H = E[− log_e p(x)] = − ∫_{−∞}^{∞} p(x) log_e(p(x)) dx
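A minimal sketch (illustration) computing discrete entropies in bits:

```python
import numpy as np

def entropy_bits(pmf):
    """H = -sum_x P(x) log2 P(x), in bits (assumes all P(x) > 0)."""
    p = np.asarray(pmf, dtype=float)
    return float(-np.sum(p * np.log2(p)))

print(entropy_bits([1/6] * 6))   # fair die: log2(6) ~ 2.585 bits
print(entropy_bits([0.9, 0.1]))  # biased coin: ~0.469 bits (less uncertain)
print(entropy_bits([0.5, 0.5]))  # fair coin: exactly 1 bit
```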
Statistics for Bernoulli and Gaussian Distribution

  Dist.             | mean | variance | 3rd cen. mom.    | entropy
  Bernoulli B(1)    | p    | p(1 − p) | p(1 − p)(1 − 2p) | −q ln q − p ln p  (q = 1 − p)
  Gaussian N(µ, σ²) | µ    | σ²       | 0                | (1/2) ln(2πeσ²)

- The Gaussian distribution is the only absolutely continuous distribution all of whose cumulants beyond the first two (i.e., other than the mean and variance) are zero
- It is the continuous distribution with the maximum entropy for a given mean and variance
Summation of Random Variables
Convolution

Given a discrete variable Z = X + Y with distributions P_x(X) and P_y(Y), what is the distribution P_z(Z)?

  P_z(Z) = Σ_Y P_x(Z − Y) P_y(Y) = Σ_X P_x(X) P_y(Z − X)

This is known as the convolution of P_x(X) and P_y(Y). For the continuous case:

  p_z(z) = ∫_{y=−∞}^{∞} p_x(z − y) p_y(y) dy = ∫_{x=−∞}^{∞} p_x(x) p_y(z − x) dx

Convolution computes the area of overlap between the two functions as the mirror image of one function slides over the other.
Convolution
Example

Let X and Y be the results of rolling two dice and Z their sum. Then

  P_z(2) = P_x(1)P_y(1) = 1/6 × 1/6 = 1/36
  P_z(3) = P_x(1)P_y(2) + P_x(2)P_y(1) = 1/6 × 1/6 + 1/6 × 1/6 = 2/36
  P_z(4) = P_x(1)P_y(3) + P_x(2)P_y(2) + P_x(3)P_y(1) = 3/36
  ...

  Z    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   | 11   | 12
  P(Z) | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36

The sum of two uniformly distributed variables results in a triangular distribution after convolution.
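A sketch (illustration) reproducing this table with numpy's discrete convolution:

```python
import numpy as np

die = np.full(6, 1 / 6)          # P_x = P_y: uniform over faces 1..6
pz = np.convolve(die, die)       # PMF of Z = X + Y, supported on Z = 2..12

for z, p in zip(range(2, 13), pz):
    print(z, round(p * 36))      # numerators over 36: 1 2 3 4 5 6 5 4 3 2 1
```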
Convolution
Visualization

  P_z(Z) = Σ_Y P_x(Z − Y) P_y(Y) = Σ_X P_x(X) P_y(Z − X)

Figure: sliding-window visualization of the discrete convolution
Central Limit Theorem

- Sum of random variables:
  - u_n are i.i.d. and p(u_n) follows the uniform distribution on [0, 1]
  - x = Σ_{n=1}^{N} u_n
- p(x) depends on N:
  - N = 1 (top): uniform
  - N = 2 (middle): triangular on [0, 2]
  - N = 1000 (bottom): approximately Gaussian
- Central limit theorem: as N → ∞, p(x) tends to be Gaussian regardless of p(u) (see the simulation sketch below)
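A minimal simulation sketch (illustration, with assumed N = 1000 and 10,000 trials):

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 1000, 10_000

# Each row: x = sum of N i.i.d. Uniform[0, 1] draws
x = rng.random((trials, N)).sum(axis=1)

# CLT prediction: x is approximately N(N/2, N/12)
print(x.mean(), N / 2)     # ~500
print(x.var(), N / 12)     # ~83.3
```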
Joint Probability and Multivariate Distribution

- Joint probability is the probability that all random variables x_i ∈ X_i, i = 1, ···, d occur simultaneously
- It is often notationally convenient to combine the joint events into a vector x ∈ X, where

    x = (x₁, ···, x_d)ᵀ,   X = (X₁, ···, X_d)ᵀ

- The multivariate distribution is then written as

    P(x) > 0,   Σ_{x∈X} P(x) = 1
Bayes' Theorem and Important Concepts
Discrete distribution as example

Bayes' theorem gives the relationship between the probabilities of two random variables:

  P(X|Y) = P(Y|X) P(X) / P(Y)

- Prior or marginal: P(X) is the prior (or marginal) distribution of X, as it does not take into account any information about Y
- Posterior or conditional: P(X|Y) describes the probability of X conditioned on Y, i.e., the probability of X given that Y happens
- Independence: P(X, Y) = P(X)P(Y)
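A numerical illustration with hypothetical numbers (a diagnostic test with a 1% base rate, 95% sensitivity, and a 10% false-positive rate):

```python
# Hypothetical example of P(X|Y) = P(Y|X) P(X) / P(Y)
p_x = 0.01                 # prior: P(disease)
p_y_given_x = 0.95         # likelihood: P(positive | disease)
p_y_given_notx = 0.10      # false-positive rate: P(positive | no disease)

# Marginal P(Y) by summing over both values of X
p_y = p_y_given_x * p_x + p_y_given_notx * (1 - p_x)

posterior = p_y_given_x * p_x / p_y
print(posterior)  # ~0.088: positives are still mostly false at a 1% base rate
```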
Statistics for Multivariate Distributions
Continuous distribution as example

- Mean: the vector extension of the scalar mean

    µ = E[x] = ∫ x p(x) dx

- Covariance: a matrix extension, due to inter-dimensional correlation

    Σ = E[(x − µ)(x − µ)ᵀ] = E[xxᵀ] − µµᵀ

      = [ E[x₁²] − µ₁²       ···   E[x₁x_d] − µ₁µ_d ]
        [ ⋮                   ⋱    ⋮                ]
        [ E[x_d x₁] − µ_dµ₁  ···   E[x_d²] − µ_d²   ]

  The covariance matrix is always symmetric
More about Covariance: Correlation

A covariance matrix of variables x and y may be written as

  Σ = [ σ_xx  σ_xy ]
      [ σ_xy  σ_yy ]

where

  σ_xy = E[(x − µ_x)(y − µ_y)]   and   σ_x² = σ_xx

The correlation coefficient, ρ, is defined as

  ρ = σ_xy / (σ_x σ_y),   −1 ≤ ρ ≤ 1

When ρ = 0 the two random variables are said to be uncorrelated.

Q: Does uncorrelated imply independent?
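A quick sketch (illustration on synthetic data) estimating Σ and ρ with numpy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated data: y = x + noise
x = rng.normal(size=10_000)
y = x + 0.5 * rng.normal(size=10_000)

cov = np.cov(x, y)   # 2x2 matrix [[s_xx, s_xy], [s_xy, s_yy]]
rho = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
print(cov)
print(rho, np.corrcoef(x, y)[0, 1])   # both ~0.894
```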
Multivariate Gaussian Distribution

The multivariate form of the Gaussian distribution for a d-dimensional feature is

  p(x) = (1 / ((2π)^{d/2} |Σ|^{1/2})) exp{ −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) }

If the covariance matrix is diagonal, this expression may be simplified to

  p(x) = ∏_{i=1}^{d} (1 / √(2πσ_i²)) exp{ −(x_i − µ_i)² / (2σ_i²) }
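A sketch (illustration, assuming scipy is available, with hypothetical µ, Σ, and evaluation point) checking the diagonal-covariance factorization numerically:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([0.0, 1.0])                # hypothetical mean
sigma2 = np.array([3.0, 1.0])            # diagonal covariance entries
x = np.array([0.5, 0.2])                 # evaluation point

joint = multivariate_normal(mean=mu, cov=np.diag(sigma2)).pdf(x)
product = np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))
assert np.isclose(joint, product)        # diagonal Sigma => pdf factorizes
print(joint)
```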
Multivariate Gaussian Distribution

Figure: a 2D Gaussian
Multivariate Gaussian Distribution

  Σ = [ 2  1 ]     Σ = [ 3  0 ]     Σ = [ 3  0 ]
      [ 1  2 ]         [ 0  1 ]         [ 0  3 ]

Figure: 2D Gaussians with different Σ
Properties of Multivariate Gaussians

- The marginal distribution p(x_i) of any component is Gaussian.
- The joint marginal distribution of any subset p(x_i, x_j, ...) is Gaussian.
- The conditional distribution p(x_i|x_j) is Gaussian.
- If x is Gaussian and y = Ax + b, then y is Gaussian with mean Aµ_x + b and covariance AΣ_xAᵀ (checked numerically in the sketch below).
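A numerical check of the affine-transformation property (illustration, with hypothetical A, b, µ_x, and Σ_x):

```python
import numpy as np

rng = np.random.default_rng(0)

mu_x = np.array([0.5, 0.5])                    # hypothetical parameters
Sigma_x = np.array([[2.0, 1.0], [1.0, 2.0]])
A = np.array([[1.0, 2.0], [0.0, 1.0]])
b = np.array([1.0, -1.0])

x = rng.multivariate_normal(mu_x, Sigma_x, size=200_000)
y = x @ A.T + b                                # y = A x + b, per sample

print(y.mean(axis=0), A @ mu_x + b)            # empirical vs. A mu_x + b
print(np.cov(y.T))                             # empirical covariance
print(A @ Sigma_x @ A.T)                       # predicted A Sigma_x A^T
```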
Properties of Multivariate Gaussians

Figure: left, contours of a 2D Gaussian with µ = (0.5, 0.5)ᵀ and Σ = [[2, 1], [1, 2]], with the slice y = 0.7 marked; right, the marginal p(x) (mean 0.5, variance 2) and the conditional p(x|y = 0.7) (mean 0.6, variance 1.5)

The conditional distribution of a Gaussian is also Gaussian (figure on the right).
Stochastic Process
Definition

- Deterministic part of a stochastic process: the time or space indices are deterministic
- A stochastic process is in concept determined by the joint probability of its random variables at any set of indices:

    P(x₁, ···, x_N),   N → ∞

- Stationarity:
  - Strict/strong: the joint distribution is time invariant,

      P_X(x_{t₁}, ···, x_{t_k}) = P_X(x_{t₁+τ}, ···, x_{t_k+τ})

    Note that τ does not affect P_X, which is not a function of time.
  - Weak/wide-sense: the 1st and 2nd moments are invariant,

      E[x(t)] = E[x(t + τ)] = µ   ∀τ ∈ ℝ
      E[x(t₁)x(t₁ + τ)] = Σ_x(t₁, t₁ + τ) = Σ(τ)
Stochastic Process
Quasi-stationary and Ergodic

- Weak stationarity is not necessarily strict stationarity, and vice versa. A weakly stationary Gaussian process is, however, also strictly stationary.
- Quasi-stationary: short-time or local stationarity,

    E[x(t)] = µ(t),   |µ(t)| ≤ C   ∀t
    E[x(t₁)x(t₁ + τ)] = Σ_x(t₁, t₁ + τ),   |Σ_x(t₁, t₁ + τ)| ≤ C
    lim_{N→∞} (1/N) Σ_{t₁=1}^{N} Σ_x(t₁, t₁ + τ) = Σ_x(τ)

  It is normally assumed that speech is stationary over a short time interval (about 10-30 ms).
- Ergodic: a dynamical system has the same behavior averaged over time as averaged over the space of all the system's states.
Stochastic Process
Example: Brownian Motion

A standard Wiener process (often called Brownian motion) on the interval [0, T] is a random variable W(t) that depends continuously on t ∈ [0, T] and satisfies the following: W(0) = 0 and W(t) − W(s) ~ √(t − s) N(0, 1), i.e., N(0, t − s), for 0 ≤ s < t ≤ T.

Figure: a sample path of W(t) on [0, 1]
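A minimal simulation sketch (illustration) of a standard Wiener path on [0, 1], using the increment property above:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 1.0, 1000
dt = T / n

# W(0) = 0; each increment W(t + dt) - W(t) ~ sqrt(dt) * N(0, 1)
dW = np.sqrt(dt) * rng.normal(size=n)
W = np.concatenate(([0.0], np.cumsum(dW)))

print(W[-1])   # one realization of W(T); across runs, W(T) ~ N(0, T)
```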