
Financial Calculus
Probability Intro
version 01 (Feb’10)
Probability History.
...
Anagke (Ἀνάγκη: destiny, fate, inevitability, necessity, ...) - mother of the
Moirae (Μοῖραι)
Khronos (Χρόνος) time, Aion (Αἰών) eternity
Tyche (Τύχη) good fortune of cities
...
Chu 1100
Bhaskara 1150
Stifel 1544
Tartaglia 1556
Cardano 1570
Pascal 1665
...
... Blaise Pascal (1623-1662) ... In 1654, prompted by a friend interested in
gambling problems, he corresponded with Fermat on the subject, and from that
collaboration was born the mathematical theory of probabilities. The friend
was the Chevalier de Méré, and the specific problem was that of two players
who want to finish a game early and, given the current circumstances of the
game, want to divide the stakes fairly, based on the chance each has of winning
the game from that point. From this discussion, the notion of expected value
was introduced. Pascal later (in the Pensées) used a probabilistic argument,
Pascal’s Wager, to justify belief in God and a virtuous life. The work of
Fermat and Pascal on the calculus of probabilities laid important groundwork
for Leibniz’s formulation of the infinitesimal calculus. http://en.wikipedia.org/wiki/Blaise_Pascal
...
The idea of probability as intensity of belief was introduced by John Maynard
Keynes in his Treatise on Probability (Macmillan, 1921). ... Probability as relative frequency is the standard interpretation of probability in the physical sciences, introduced by Richard von Mises in Wahrscheinlichkeit,
Statistik und Wahrheit (Vienna: Verlag von Julius Springer, 1928). .... Probability
as an axiomatic system was formulated by Andrei N. Kolmogorov in Grundbegriffe
der Wahrscheinlichkeitsrechnung (Berlin: Springer, 1933). ... In economics and
finance theory, probability might have two different meanings: (1) as a descriptive concept and (2) as a determinant of the agent's decision-making process.
As a descriptive concept, probability is used in the sense of relative frequency,
similar to its use in the physical sciences: the probability of an event is assumed
to be approximately equal to the relative frequency of its occurrence in a large
number of experiments. There is one difficulty with this interpretation, which is
peculiar to economics: empirical data (i.e., financial and economic time series)
have only one realization. Every estimate is made on a single time-evolving series. If stationarity (or a well-defined time process) is not assumed, performing
statistical estimation is impossible. .... In making probability statements we
must distinguish between outcomes and events. Outcomes are the possible
results of an experiment or an observation, such as the price of a security at a
given moment. However, probability statements are not made on outcomes but
on events, which are sets of possible outcomes. ......... The axiomatic theory of
probability is based on three fundamental concepts: (1) outcomes, (2) events,
and (3) measure. The set of all possible outcomes is often written as the set Ω
...
....
Sets. Algebra of sets. Measure.
....
Operations on sets - union, intersection, complementation ..................... de
Morgan rules, .......... power sets .... functions .... inverse image function ....
partitions ...... disjoint union A + B
Definition. Fix a set (a universe) Ω and consider a collection F of subsets
of the universe. This collection is a field of sets (or algebra of sets) if it is
closed under the basic set operations (union, intersection, complementation),
i.e., performing them we stay in the collection. It is a σ-field if it is closed under
countable unions and intersections. The pair (Ω, F) where F is a σ-field on Ω
is called a measurable space and elements of the field are called measurable
sets.
Exercise. Let us have a finite partition A = {A1, . . . , An } of the universe.
Taking all possible unions of parts of the partition one obtains a (finite) field.
Conversely, any finite field is generated in this way by a partition consisting of
the atoms of the field.
... P(Ω) is a σ-field .... the intersection of any family of σ-fields is a σ-field .... for any
collection of subsets C we can consider all σ-fields that contain this collection
and take their intersection, in this way we obtain the smallest σ-field containing
the collection, the σ-field generated by C denoted σ(C). .... If F = σ(C) then
the sets in C are the generators of F. ... the σ-field on R generated by the
intervals is the Borel σ-field ....
Definition. A content µ is a nonnegative additive function on a field of sets
F, more precisely µ : F → [0, ∞], µ(∅) = 0, and µ(A + B) = µ(A) + µ(B). If
it is additive for countable disjoint collections then we say it is σ-additive. A
σ-additive content on a σ-field is called a measure. A measurable space with a
choice of measure is called a measure space. A measure P such that P (Ω) =
1 is called a probability measure. A measurable space with a probability
measure is called a probability space.
Exercise. Show that a content is monotone, i.e., A ⊆ B implies µ(A) ≤ µ(B).
Show that for any sets in the field one has µ(A) + µ(B) = µ(A ∪ B) + µ(A ∩ B)
(the inclusion-exclusion law).
Exercise. Let us have a finite field generated by a finite partition of atoms A =
{A1, . . . , An }. Show that for any choice of nonnegative numbers ai , i = 1, . . . , n
the assignment µ(Ai ) = ai extends to a content. Conversely, any content on a
finite field is obtained in this way.
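A computational aside (an added illustration, not part of the original notes): both exercises can be checked mechanically on a small made-up example. The Python sketch below builds the finite field generated by a three-atom partition and extends a choice of nonnegative atom weights to a content; all sets and numbers are invented.

```python
# Sketch: the finite field generated by a partition, and a content on it.
from itertools import combinations

omega = frozenset(range(6))
atoms = [frozenset({0, 1}), frozenset({2, 3, 4}), frozenset({5})]  # a partition of omega
weights = {atoms[0]: 0.5, atoms[1]: 0.3, atoms[2]: 0.2}            # mu on the atoms

# The field consists of all unions of subcollections of atoms (2^n sets).
field = set()
for r in range(len(atoms) + 1):
    for combo in combinations(atoms, r):
        field.add(frozenset().union(*combo))

# The content of a measurable set is the sum of the weights of its atoms.
def mu(A):
    assert A in field, "A must be a union of atoms"
    return sum(weights[a] for a in atoms if a <= A)

assert len(field) == 2 ** len(atoms)                 # 8 measurable sets
assert mu(frozenset()) == 0                          # mu(empty set) = 0
assert abs(mu(omega) - 1.0) < 1e-12                  # this mu happens to be a probability
assert abs(mu(atoms[0] | atoms[2]) - 0.7) < 1e-12    # additivity on disjoint sets
```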
......
Let α : R → R be an increasing function. Define α(−∞) = inf α and α(∞) =
sup α . It will be shown in the following exercises that λα ((a, b]) = α(b) − α(a)
gives a content on the field generated by intervals. ... Lebesgue content ....
The content λα on the field generated by intervals is σ-additive iff α is right
continuous .........
.... convention ... (X ≤ x) stands for the set $X^{-1}((-\infty, x]) = \{\omega \in \Omega : X(\omega) \le x\}$ .... (X ∈ B) stands for the set $X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\}$ ..... the distribution function of the random variable X is $F(x) = P(X \le x)$ ....
.........
Theorem (Carathéodory extension). Every σ-additive content µ defined on a field A has a measure extension to F = σ(A); the extension is uniquely determined when µ is σ-finite (in particular, for probability measures).
Borel measurable sets .....
.....
Definition. Given two measurable spaces (X, FX ) and (Y, FY ) a function f :
X → Y is measurable, or more precisely (FX , FY )-measurable, if the preimage
function sends measurable sets to measurable sets, i.e., if f −1 (B) ∈ FX for
every B ∈ FY. (It is a common convention that when we think of the reals as a
measurable space we assume by default (R, B), i.e., the reals equipped with the
Borel field B. Also, if (Ω, F) is a measurable space, by a measurable function we
understand f : (Ω, F) → (R, B).) We can transfer measures from the source to
the target, i.e., if f : (X, FX) → (Y, FY) is a measurable function and if moreover
the source (X, FX, µ) is a measure space, then $\mu_f$ defined by $\mu_f(B) = \mu(f^{-1}(B))$
for B ∈ FY is a measure on Y (prove it), called the image of µ under f or the
distribution of f under µ.
....
Probability. Random variables. Distribution functions.
...
...
Definition. Let (Ω, F, P) be a probability space. Any F-measurable real-valued function X : Ω → R is called a random variable. The function $F_X$ defined by $F_X(x) = P(X \le x)$ for x ∈ R is called the cumulative distribution function (c.d.f.), or simply distribution function (d.f.), of the random variable X. Suppose that there is a function $f_X$ such that $F_X(x) = \int_{-\infty}^{x} f_X(u)\, du$, or equivalently $F_X'(x) = f_X(x)$; then $f_X$ is called the probability density function of the random variable X.
....
Proposition. Every probability distribution function is monotone increasing,
right continuous, with asymptotes 0 at −∞ and 1 at ∞.
...
Note that for the left limit at a point we have $F(a-) = P(X < a)$ ........
....
....
Proposition. Let X be a random variable, let g be strictly increasing on the
range of X, and define Y = g(X). Then the distribution and density functions are related by
$$F_Y(y) = F_X(g^{-1}(y)) \qquad \text{and} \qquad f_Y(y) = f_X(g^{-1}(y))\, \frac{d}{dy} g^{-1}(y)\,.$$
If g is strictly decreasing then
$$F_Y(y) = 1 - F_X(g^{-1}(y)) \qquad \text{and} \qquad f_Y(y) = -f_X(g^{-1}(y))\, \frac{d}{dy} g^{-1}(y)\,.$$
proof. If g is strictly increasing then (X ≤ g −1 (y)) = (g(X) ≤ y). Hence
FY (y) = P (g(X) ≤ y) = P (X ≤ g −1 (y)) = FX (g −1 (y)).
Corollary. Let F be strictly increasing for those y for which 0 < F(y) < 1. If U is uniformly distributed on [0, 1] then $Y = F^{-1}(U)$ has distribution function $F_Y(y) = F(y)$.
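A numerical illustration (added, not in the original notes) of this corollary, i.e., inverse transform sampling; the unit-rate exponential is chosen only because its inverse c.d.f. has the closed form $F^{-1}(u) = -\ln(1-u)$:

```python
# Sketch: inverse transform sampling, Y = F^{-1}(U) with U uniform on [0, 1].
import math
import random

random.seed(0)
n = 100_000
# Unit-rate exponential: F(x) = 1 - exp(-x), hence F^{-1}(u) = -ln(1 - u).
samples = [-math.log(1.0 - random.random()) for _ in range(n)]

# Spot check at one point: F(1) = 1 - e^{-1} ~ 0.632.
empirical = sum(1 for x in samples if x <= 1.0) / n
print(round(empirical, 3), round(1.0 - math.exp(-1.0), 3))
```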
Corollary. Let U and V be independent and uniformly distributed on [0, 1]. Then
$$X = \sqrt{-2 \ln U}\, \cos 2\pi V \qquad \text{and} \qquad Y = \sqrt{-2 \ln U}\, \sin 2\pi V$$
are independent random variables with standard normal distribution N(0, 1).
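This is the Box-Muller transform; a hedged sketch (added illustration) that checks the sample mean and variance of the output against 0 and 1:

```python
# Sketch: Box-Muller, two independent uniforms in, two standard normals out.
import math
import random

random.seed(1)
xs, ys = [], []
for _ in range(50_000):
    u = 1.0 - random.random()        # in (0, 1], so log(u) is finite
    v = random.random()
    r = math.sqrt(-2.0 * math.log(u))
    xs.append(r * math.cos(2.0 * math.pi * v))
    ys.append(r * math.sin(2.0 * math.pi * v))

for zs in (xs, ys):                  # each sample should look like N(0, 1)
    m = sum(zs) / len(zs)
    var = sum((z - m) ** 2 for z in zs) / len(zs)
    print(round(m, 3), round(var, 3))
```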
Corollary. Let $X \sim N(\mu, \sigma^2)$, i.e., X is a normal random variable with parameters (µ, σ²), i.e., it has density function $f_X(x) = (\sigma\sqrt{2\pi})^{-1} \exp(-(x-\mu)^2/2\sigma^2)$. Then the random variable Z = (X − µ)/σ has a standard normal distribution, i.e., Z ∼ N(0, 1).
....
Examples. Uniform distribution on [0, 1]:
$$F(x) = \begin{cases} 0 & \text{if } x \le 0 \\ x & \text{if } 0 \le x < 1 \\ 1 & \text{if } 1 \le x \end{cases}$$
Unit normal distribution:
$$F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du$$
We write $X \sim N(\mu, \sigma^2)$ to indicate that a variable has normal distribution with parameters (µ, σ²), i.e., it has density function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp(-(x-\mu)^2/2\sigma^2)\,.$$
It is straightforward to show that Z = (X − µ)/σ has a standard normal distribution, i.e., Z ∼ N(0, 1).
...
....
Expectation.
Definition. Let (Ω, F, P) be a probability space. The space R(Ω) of random variables (i.e., measurable functions) X : Ω → R is a vector space. (The vector space structure on R(Ω) is induced from R pointwise.) The expectation is a linear map E : R(Ω) → R defined by averaging random variables over the probability space, i.e., $E[X] = \int_\Omega X(\omega)\, dP(\omega)$. Given a random variable X : Ω → R denote its distribution function by $F_X(x) = P(X \le x)$ and its density function by $f_X(x) = \frac{d}{dx} F_X(x)$. Then the expectation can be equivalently described as $E[X] = \int_{\mathbb{R}} x\, dF_X(x) = \int_{\mathbb{R}} x f_X(x)\, dx$. For a measurable function g : R → R, composing with a random variable X we get another random variable g ∘ X, the expectation of which is
$$E[g \circ X] = \int_\Omega g(X(\omega))\, dP(\omega) = \int_{\mathbb{R}} g(x)\, dF_X(x) = \int_{\mathbb{R}} g(x) f_X(x)\, dx\,.$$
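An added numerical aside: the two descriptions of the expectation (average over the sample space vs. integral against the density) can be compared directly. The sketch below takes X standard normal and g(x) = x², where both sides equal 1.

```python
# Sketch: E[g(X)] as a Monte Carlo average vs. quadrature of g(x) f_X(x).
import math
import random

random.seed(2)
n = 200_000
mc = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n   # average over Omega

f = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)  # N(0,1) density
m, a, b = 4000, -8.0, 8.0
h = (b - a) / m
quad = 0.0
for i in range(m):                   # midpoint rule for the integral of x^2 f(x)
    x = a + (i + 0.5) * h
    quad += x * x * f(x) * h

print(round(mc, 3), round(quad, 6))  # both approximately 1
```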
...
....
Independence.
Definition. Let (Ω, F, P ) be a probability space. Two events A, B ∈ F are
independent if P (A ∩ B) = P (A)P (B).
Let C and D be subfamilies of F. The families C and D are said to be independent (with respect to P) if P(A ∩ B) = P(A)P(B) for every choice A ∈ C and B ∈ D. Two random variables X and Y are independent if $P(X \in B,\, Y \in B') = P(X \in B)\, P(Y \in B')$ for any Borel sets B, B′ ∈ B. In other words, two random variables X and Y are independent if σ(X) and σ(Y) are independent. (Recall that the information set σ(X) of a random variable X is the system of sets (X ∈ B) where B is an arbitrary Borel set, i.e., $\sigma(X) = X^{-1}(\mathcal B)$. Conceptually, the information set σ(X) is the σ-field generated by all events which can be observed through X.)
....
Proposition. Let (Ω, F, P ) be a probability space and X, Y two independent
random variables. Then
E(XY ) = E(X) E(Y )
proof. We illustrate the proof in the case of random variables with a finite number of values. In this case we can write $X(\omega) = \sum_{x\in\mathbb{R}} x\, 1_{(X=x)}(\omega)$ and this sum is finite. Similarly for Y. Note that $1_{A\cap B} = 1_A 1_B$, so for the product we have $XY = \sum_{(x,y)\in\mathbb{R}^2} xy\, 1_{(X=x)}\, 1_{(Y=y)}$. From the definition of expectation $E(1_A) = P(A)$ we have $E(1_{(X=x)} 1_{(Y=y)}) = P((X=x) \cap (Y=y))$. Since X and Y are independent, $P((X=x) \cap (Y=y))$ equals $P(X=x)\, P(Y=y)$ and hence $E(1_{(X=x)} 1_{(Y=y)}) = E(1_{(X=x)})\, E(1_{(Y=y)})$. Now taking the expectation of XY, and using the linearity of E and the above discussion, we obtain a sum over (x, y) that factors into a sum over x and a sum over y:
$$E(XY) = \sum_{(x,y)\in\mathbb{R}^2} xy\, P(X=x,\, Y=y) = \sum_{x\in\mathbb{R}} x\, P(X=x) \sum_{y\in\mathbb{R}} y\, P(Y=y) = E(X)\, E(Y)\,.$$
Corollary. Given independent random variables X and Y, for the variance we have
$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$$
proof. From the definition we have $\operatorname{Var}(X+Y) = E((X+Y)^2) - (E(X+Y))^2$. Expanding, using the linearity of expectation and the fact that the expectation of a constant is that same constant, we get $\operatorname{Var}(X+Y) = E(X^2) + 2E(XY) + E(Y^2) - (EX)^2 - 2(EX)(EY) - (EY)^2$. Now we use independence, so E(XY) = (EX)(EY), and the two cross terms cancel; regrouping, we are done.
...
Random process. Bernoulli process. Random walk.
...
Definition. A random process is a parametrized random variable; we usually think of the parameter as continuous or discrete time, so a process is a variable evolving with time. Formalizing this, we assume that we are given (Ω, F), i.e., a sample space Ω with an algebra of events F, and an “interval of time” I (with a natural algebra I of subsets of I). A process is a random variable X : Ω × I → R.
Time could be continuous with I being finite [0, T ] or infinite [0, ∞), or discrete
which again could be finite {0, 1, . . . , n} or infinite {0, 1, . . . }. A process could be
viewed either as a map t 7→ Xt from time to random variables (it is common to
denote Xt (ω) = X(ω, t)) or as a map from samples to paths ω 7→ (t 7→ Xt (ω)),
where for a fixed ω ∈ Ω the map t 7→ Xt (ω) is called a path.
...
http://en.wikipedia.org/wiki/Bernoulli_process (named after Swiss scientist and mathematician Jacob Bernoulli (Basel, 1654-1705), ... law of large
numbers, ..., grand-uncle of Hermann Hesse, author of Siddhartha,)
In probability and statistics, a Bernoulli process is a discrete-time stochastic
process consisting of a sequence of independent random variables taking values
over two symbols. Prosaically, a Bernoulli process is coin flipping several times,
possibly with an unfair coin.
...
In other words, a Bernoulli process is a sequence of independent identically distributed Bernoulli trials. The two possible values of each Xi are often called
"success" (or "arrival") and "failure", so that, when expressed as a number, 0 or
1, the value is said to be the number of successes on the i-th "trial". The individual success/failure variables Xi are also called Bernoulli trials. Independence of the Bernoulli trials implies the memorylessness property: past trials do not provide any information regarding future outcomes. From any given time, the future trials also form a Bernoulli process, independent of the past (the fresh-start property). Random variables associated with the Bernoulli process include
* The number of successes in the first n trials; this has a binomial distribution;
....
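An added simulation sketch (not from the notes): flipping an unfair coin n times repeatedly and tallying the successes reproduces the binomial probabilities discussed below.

```python
# Sketch: a Bernoulli process as n unfair coin flips; the success count is B(n, p).
import random
from math import comb

random.seed(3)
n, p, runs = 10, 0.3, 100_000
counts = [0] * (n + 1)
for _ in range(runs):
    successes = sum(random.random() < p for _ in range(n))  # one run of the process
    counts[successes] += 1

for k in range(n + 1):
    pmf = comb(n, k) * p**k * (1 - p) ** (n - k)            # binomial mass at k
    print(k, round(counts[k] / runs, 4), round(pmf, 4))
```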
...
In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of successes in a sequence of n
independent yes/no experiments, each of which yields success with probability
p. Such a success/failure experiment is also called a Bernoulli experiment or
Bernoulli trial. In fact, when n = 1, the binomial distribution is a Bernoulli
distribution. ... In general, if the random variable K follows the binomial distribution with parameters n and p, we write K ∼ B(n, p). The probability of
getting exactly k successes in n trials is given by the probability mass function:
$$f_{B(n,p)}(k) = P(K = k) = \binom{n}{k} p^k (1-p)^{n-k}$$
... The cumulative distribution function can be expressed as
$$F(x; n, p) = P(X \le x) = \sum_{i=0}^{\lfloor x \rfloor} \binom{n}{i} p^i (1-p)^{n-i}\,,$$
where ⌊x⌋ is the "floor" under x, i.e., the greatest integer less than or equal to x.
... If X ∼ B(n, p) (that is, X is a binomially distributed random variable), then
the expected value of X is E[X] = np and the variance is V ar[X] = np(1 − p).
... If n is large enough, then the skew of the distribution is not too great. In this
case, if a suitable continuity correction is used, then an excellent approximation
to B(n, p) is given by the normal distribution N (np, np(1 − p)).
The approximation generally improves as n increases and is better when p is
not near to 0 or 1.
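A hedged numeric check of this approximation (added illustration), comparing the exact binomial c.d.f. with the normal c.d.f. after a continuity correction:

```python
# Sketch: P(X <= k) for X ~ B(n, p) vs. Phi((k + 0.5 - np) / sqrt(np(1 - p))).
import math
from math import comb

def binom_cdf(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def phi(z):
    # standard normal c.d.f. via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p, k = 100, 0.4, 45
exact = binom_cdf(k, n, p)
approx = phi((k + 0.5 - n * p) / math.sqrt(n * p * (1 - p)))
print(round(exact, 4), round(approx, 4))   # close for n this large
```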
...
Conditional Expectation.
...
...
Let (Ω, F, P) be a probability space. Take an event B ∈ F with 0 < P(B) < 1. For any event A ∈ F the conditional probability of A given B is defined to be P(A|B) = P(A ∩ B)/P(B). Writing P(A ∩ B) as $E[1_A 1_B]$ we can extend from random variables of the form $1_A$ to any random variable X and define $E[X|B] = \frac{1}{P(B)} \int_B X\, dP$. (Note that E[1_A|B] = P(A|B).) Similarly we can define the conditional expectation of X given the complement $B^c$ as $E[X|B^c] = \frac{1}{P(B^c)} \int_{B^c} X\, dP$. Denote the σ-algebra generated by B as G = {∅, B, B^c, Ω}. This σ-algebra can also be described as the information set of the random variable $1_B$, i.e., the σ-algebra generated by the random variable $1_B$, hence G = σ(1_B). Obviously G ⊂ F. We have defined in this way a new random variable Y = E[X|G], called the conditional expectation of X given G, by Y(ω) = E[X|B] if ω ∈ B and Y(ω) = E[X|B^c] if ω ∈ B^c, or equivalently $Y = E[X|B] \cdot 1_B + E[X|B^c] \cdot 1_{B^c}$. It satisfies the condition $\int_C Y\, dP = \int_C X\, dP$ for all C ∈ G.
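An added finite illustration of this construction: on a six-point uniform space (made-up data) the conditional expectation given G = {∅, B, Bᶜ, Ω} averages X separately over B and Bᶜ, and the defining property can be verified set by set.

```python
# Sketch: E[X|G] for the sigma-algebra generated by a single event B.
omega = [0, 1, 2, 3, 4, 5]
P = {w: 1 / 6 for w in omega}                    # uniform probability
X = {0: 1.0, 1: 4.0, 2: 2.0, 3: 0.0, 4: 3.0, 5: 5.0}
B = {0, 1, 2}
Bc = set(omega) - B

def cond_exp(event):                             # E[X | event]
    return sum(X[w] * P[w] for w in event) / sum(P[w] for w in event)

Y = {w: cond_exp(B) if w in B else cond_exp(Bc) for w in omega}

# Defining property: the integrals of Y and X agree on every C in G.
for C in (set(), B, Bc, set(omega)):
    assert abs(sum(Y[w] * P[w] for w in C) - sum(X[w] * P[w] for w in C)) < 1e-12
```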
Definition. Let (Ω, F, P) be a probability space, X an F-measurable random variable on it, and G ⊂ F a σ-subalgebra. The conditional expectation of X given G is a G-measurable random variable E[X|G] such that
$$\int_C E[X|\mathcal{G}]\, dP = \int_C X\, dP \qquad \text{for all } C \in \mathcal{G}\,.$$
If G = σ(Y) is the information set of a random variable Y, i.e., the σ-algebra generated by the random variable Y, then we denote E[X|Y] = E[X|σ(Y)].
....
From this property it follows that E[... |G] is a linear projection from the convex cone of non-negative random variables on (Ω, F, P) to the convex cone of non-negative random variables on (Ω, G, P). Moreover E[X] = E[E[X|G]]. ... The defining property for non-negative integrable X says that the conditional expectation is a density for the measure on G defined by $C \mapsto \int_C X\, dP$. ........
Proposition. Conditional expectation has the following properties:
• E[aX + bY |G] = aE[X|G] + bE[Y |G] (linearity)
• if X ≥ 0 then E[X|G] ≥ 0 (positivity)
• E[Y X|G] = Y E[X|G] if Y is G-measurable (taking out what is known)
• E(E(X|G)|H) = E(X|H) = E(E(X|H)|G) if H ⊂ G (tower property)
....
Exercise. (Law of total probability; this partition identity is the key ingredient of Bayes' theorem.) Let us have a partition of the sample space Ω by a collection of events {B_i ∈ F : i = 1, 2, ...}, i.e., the B_i are pairwise disjoint and their union is the whole sample space. Then
$$P(A) = P(A|B_1)\, P(B_1) + P(A|B_2)\, P(B_2) + \cdots$$
for any event A.
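A tiny numeric instance (added, with invented numbers):

```python
# Sketch: law of total probability over a two-set partition.
P_B = [0.2, 0.8]            # P(B1), P(B2): a partition of the sample space
P_A_given_B = [0.5, 0.25]   # P(A|B1), P(A|B2)

P_A = sum(pa * pb for pa, pb in zip(P_A_given_B, P_B))
print(round(P_A, 10))       # 0.2*0.5 + 0.8*0.25 = 0.3
```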
....
Law of large numbers. Central limit theorem.
...
Theorem (Law of Large Numbers). Let X₁, X₂, X₃, ... be a sequence of independent random variables having the same finite mean and variance, i.e., $E(X_1) = E(X_2) = \cdots = \mu < \infty$ and $\operatorname{Var}(X_1) = \operatorname{Var}(X_2) = \cdots = \sigma^2 < \infty$. If $S_n = \sum_{i=1}^{n} X_i$ denotes the partial sums and $\bar{X}_n = S_n/n$ the sample mean, then the LLN states that the sample mean converges to the mean as the size of the sample grows. More precisely, $\lim_{n\to\infty} P(|\bar{X}_n - \mu| < \varepsilon) = 1$ for every ε > 0.
... note that the LLN is saying that the limit of $\bar{X}_n = S_n/n$ is not a random variable but a fixed, non-random number .....
... first proved by James Bernoulli ... justification of the frequency interpretation
...
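An added simulation sketch of the LLN with fair-die rolls (µ = 3.5):

```python
# Sketch: sample means of die rolls settle toward mu = 3.5 as n grows.
import random

random.seed(4)
for n in (10, 1_000, 100_000):
    mean = sum(random.randint(1, 6) for _ in range(n)) / n
    print(n, round(mean, 4))
```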
... a special case of the CLT is ...
Theorem (de Moivre-Laplace). Let $X_n$ be a sequence of binomial random variables, $X_n$ with distribution B(n, p). Let $Y_n = (np(1-p))^{-1/2}(X_n - np)$. Then
$$\lim_{n\to\infty} P(Y_n \le y) = \int_{-\infty}^{y} (2\pi)^{-1/2} \exp(-x^2/2)\, dx\,.$$
proof. Note that $X_n$ is a sum of n independent Bernoulli trials distributed like $X_1$. The expectation of a product of independent variables is the product of the expectations, hence $E(e^{aX_n}) = \left(E\, e^{aX_1}\right)^{n}$. Using this we can write the moment generating function of $Y_n$ as
$$E(e^{tY_n}) = \left( E \exp\!\left( \frac{t(X_1 - p)}{\sqrt{np(1-p)}} \right) \right)^{\!n}\,.$$
Since $X_1$ takes the value 1 with probability p and 0 with probability (1 − p), for the latter expression we get
$$\left( p \exp\frac{(1-p)\,t}{\sqrt{np(1-p)}} + (1-p) \exp\frac{-p\,t}{\sqrt{np(1-p)}} \right)^{\!n}\,.$$
Expanding the two exponentials we get
$$E(e^{tY_n}) = \left( 1 + \frac{t^2}{2n} + O(n^{-3/2}) \right)^{\!n}$$
and using the fact that $\lim_{n\to\infty} \left( 1 + a n^{-1} + o(n^{-1}) \right)^{n} = e^{a}$ we finally obtain
$$\lim_{n\to\infty} E(e^{tY_n}) = e^{t^2/2}$$
which is the moment generating function of a standard normal distribution. QED
...
Theorem (Central Limit Theorem). Let X₁, X₂, X₃, ... be a sequence of independent and identically distributed random variables, each having finite expectation µ and variance σ². Denote the sum of the first n by $S_n = X_1 + \cdots + X_n$ and the sample mean by $\bar{X}_n = S_n/n$. The standardized sample mean is given by $Z_n = (S_n - n\mu)/(\sigma\sqrt{n})$, or equivalently $Z_n = (\bar{X}_n - \mu)/(\sigma/\sqrt{n})$. The CLT states that as n approaches infinity, $Z_n$ converges in distribution to the standard normal distribution N(0, 1), i.e.,
$$\lim_{n\to\infty} P(Z_n \le z) = \int_{-\infty}^{z} (2\pi)^{-1/2} \exp(-x^2/2)\, dx\,.$$
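An added simulation sketch of the CLT: sums of uniforms on [0, 1] (µ = 1/2, σ² = 1/12), standardized, should be approximately N(0, 1); we compare $P(Z_n \le 1)$ with Φ(1) ≈ 0.8413.

```python
# Sketch: standardized sums of uniforms vs. the standard normal c.d.f. at z = 1.
import math
import random

random.seed(5)
n, runs = 50, 50_000
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
hits = 0
for _ in range(runs):
    s = sum(random.random() for _ in range(n))
    z = (s - n * mu) / (sigma * math.sqrt(n))
    hits += z <= 1.0
phi_1 = 0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))
print(round(hits / runs, 4), round(phi_1, 4))
```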
...
...
...
Example: Doob Martingale. Let X, X₀, X₁, ... be any collection of jointly distributed random variables with E[|X|] < ∞. Define $M_n = E(X \mid X_0, X_1, \ldots, X_n)$. Then by Jensen's inequality,
$$E[|M_n|] = E[\,|E[X \mid X_0, X_1, \ldots, X_n]|\,] \le E[\,E[|X| \mid X_0, X_1, \ldots, X_n]\,] = E[|X|] < \infty$$
and by the tower property
$$E[M_{n+1} \mid X_0, X_1, \ldots, X_n] = E[\,E[X \mid X_0, X_1, \ldots, X_{n+1}] \mid X_0, X_1, \ldots, X_n\,] = E[X \mid X_0, X_1, \ldots, X_n] = M_n\,,$$
so $M_n$ is a martingale.
...
....
Moments and Generating Functions.
Definition. The moments of a random variable X are given by $m_n := E(X^n)$, where n = 0, 1, 2, ... (Note that we always have $m_0 = 1$.) The moment-generating function of a random variable X is a real-valued function of a real argument t ∈ R given by
$$M_X(t) := E\, e^{tX} = 1 + t\, m_1 + \frac{t^2}{2!}\, m_2 + \cdots\,,$$
hence we recover the moments as $m_n = \frac{d^n}{dt^n} M_X(t)\big|_{t=0}$. It can be written as a Riemann-Stieltjes integral $M_X(t) = \int_{-\infty}^{\infty} e^{tx}\, dF(x)$ where F is the cumulative distribution function. If X has a continuous probability density function f(x), then $M_X(-t)$ is the two-sided Laplace transform of f(x), i.e., $M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\, dx$. The characteristic function $\varphi_X(t)$ is related to the moment-generating function via $\varphi_X(t) = M_{iX}(t) = M_X(it)$, where i is the imaginary unit; i.e., the characteristic function is the moment-generating function of iX, or the moment-generating function of X evaluated on the imaginary axis, or the Fourier transform of the probability density function (hence the density function can be obtained from the characteristic function by inverse Fourier transform). The probability generating function is defined as $G_X(s) := E(s^X)$, i.e., setting $s = e^t$ we get $G_X(e^t) = M_X(t)$.
... The probability generating function of an integer-valued random variable X is given by $G_X(s) = \sum_k P(X = k)\, s^k$; in general $G_X(s) = E(s^X)$.
...
Let X have a binomial distribution with parameters n and p. Then, with q = 1 − p as usual,
$$G(s) = \sum_{k=0}^{n} \binom{n}{k} q^{n-k} p^k s^k = (q + ps)^n\,.$$
k
...
The most important property of the moment-generating function is that it determines the distribution: if two distributions have the same moment-generating function (finite on a neighborhood of 0), then the distributions are identical. That is, if for all values of t we have $M_X(t) = M_Y(t)$, then $F_X(x) = F_Y(x)$ for all values of x (or equivalently, X and Y have the same distribution).
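An added numeric sketch: differentiating the binomial m.g.f. $M(t) = (1 - p + pe^t)^n$ at t = 0 (here by finite differences) recovers the moments quoted earlier.

```python
# Sketch: moments of B(n, p) from its m.g.f. by central differences at t = 0.
import math

n, p = 10, 0.3
M = lambda t: (1.0 - p + p * math.exp(t)) ** n

h = 1e-5
m1 = (M(h) - M(-h)) / (2.0 * h)                 # ~ M'(0)  = E[X]
m2 = (M(h) - 2.0 * M(0.0) + M(-h)) / h**2       # ~ M''(0) = E[X^2]

print(round(m1, 4), n * p)                      # 3.0
print(round(m2 - m1**2, 4), n * p * (1 - p))    # variance 2.1
```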
...
...
Brownian motion.
nice Wikipedia article on Normal distribution ....
...
... $a(W_t - W_s) \sim N(0,\, a^2(t - s))$
...
Definition. A Wiener process Wt starts at zero, i.e. W0 = 0, has continuous
paths, i.e., [0, T ] 3 t 7→ Wt (ω) ∈ R is continuous for every ω, has independent
increments for nonoverlapping time intervals, e.g., (Wt+s − Wt ) and Wt are
independent, and the increments are normally distributed with zero mean and
variance equal to the time increment, i.e., (Wt+s − Wt ) ∼ N (0, s).
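A minimal simulation sketch of this definition (added illustration): a discretized path accumulating independent N(0, dt) increments.

```python
# Sketch: one discretized Wiener path on [0, T] from N(0, dt) increments.
import math
import random

random.seed(6)
T, n = 1.0, 1_000
dt = T / n
W = [0.0]                                        # W_0 = 0
for _ in range(n):
    W.append(W[-1] + random.gauss(0.0, math.sqrt(dt)))

print(W[0], round(W[-1], 4))                     # W_T is one N(0, T) sample
```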
Theorem. A Wiener process exists.
...
Proposition. If $W_t$ is a Wiener process then besides $W_t$ also $(W_t^2 - t)$ and $\exp(aW_t - \frac{1}{2}a^2 t)$, for a any real number, are martingales.
proof. Let s ≤ t. For the conditional expectation of something given the information up to time s write $E_s(\ldots) \equiv E(\ldots|s) \equiv E(\ldots|\mathcal{F}_s)$. To show that $W_t$ is a martingale write $E_s(W_t) = E_s(W_s + (W_t - W_s))$. Apply linearity of expectation, use $E(W_s|s) = W_s$, which follows from the definition of conditional expectation, and $E(W_t - W_s|s) = 0$, because $(W_t - W_s) \sim N(0, t-s)$ is independent of $\mathcal{F}_s$. Thus we obtain $E(W_t|s) = W_s$, hence $W_t$ is a martingale. Now we check that $(W_t^2 - t) = ((W_t - W_s + W_s)^2 - t)$ is a martingale. Expanding and using linearity we obtain $E_s(W_t^2 - t) = E_s((W_t - W_s)^2 - t) + 2E_s((W_t - W_s)W_s) + E_s(W_s^2)$. The first term gives $(t - s) - t = -s$, the second term is zero, while the third term gives $W_s^2$, and we are done. To check that $\exp(aW_t - \frac{1}{2}a^2 t)$ is a martingale, first recall that $W_t - W_s$ and $W_s$ are independent, hence $E_s(e^{a(W_t - W_s + W_s)}) = E(e^{a(W_t - W_s)})\, e^{aW_s}$ (take out what is known). Next recall that $(W_t - W_s) \sim N(0, t-s)$ and that the moment generating function of a normal random variable $X \sim N(0, t-s)$ is given by $E\, e^{aX} = \exp(a^2(t-s)/2)$, i.e., $E(e^{a(W_t - W_s)})\, e^{aW_s} = \exp(a^2(t-s)/2)\, e^{aW_s}$. Collecting everything we get $E_s \exp(aW_t - \frac{1}{2}a^2 t) = \exp(aW_s - \frac{1}{2}a^2 s)$ and we are done.
.........
...
Proposition. If Xt is a continuous process with X0 = 0 such that Xt and
(Xt2 − t) are martingales then Xt is a Wiener process.
Exercise. Assume $W_t$ is a Wiener process. Show that (a) $W_t^3 - 3tW_t$ and (b) $W_t^4 - 6tW_t^2 + 3t^2$ are martingales.
...
Integration. Riemann and Riemann-Stieltjes.
...
How should we understand the integral $\int_a^b f(x)\, dg(x)$? (It is common to call f the integrand and g the integrator.) If the integrator g is differentiable we can write $dg(x) = g'(x)\, dx$ and understand this integral as a Riemann integral $\int_a^b f(x)\, g'(x)\, dx$. If g is of bounded total variation then we can interpret this integral as a Riemann-Stieltjes integral (see the following definitions).
Definition. The total variation of a real function g on an interval [a, b] is defined to be $V_{[a,b]}(g) = \sup \sum_i |g(x_{i+1}) - g(x_i)|$ where the supremum is taken over all finite partitions of the interval [a, b]. If g is of finite total variation, the Riemann-Stieltjes integral $\int_a^b f(x)\, dg(x)$ is defined by taking a partition $a = x_0 < x_1 < \cdots < x_{n+1} = b$, choosing points $x_i^* \in [x_i, x_{i+1}]$, forming the partial sum $\sum_{i=0}^{n} f(x_i^*)\, (g(x_{i+1}) - g(x_i))$, and taking the limit (if it exists) as the partitions become finer and finer. Note the fact, very important for what will follow, that if the Riemann-Stieltjes integral exists it does not matter how we choose $x_i^* \in [x_i, x_{i+1}]$.
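An added sketch of a Riemann-Stieltjes partial sum, for a differentiable integrator where the exact value is $\int f g'\, dx$; here f(x) = x and g(x) = x² on [0, 1], so the integral is $\int_0^1 2x^2\, dx = 2/3$.

```python
# Sketch: Riemann-Stieltjes partial sum with midpoint evaluation points.
def rs_sum(f, g, a, b, n):
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        xi, xi1 = a + i * h, a + (i + 1) * h
        x_star = 0.5 * (xi + xi1)        # any point of [xi, xi1] would do
        total += f(x_star) * (g(xi1) - g(xi))
    return total

print(rs_sum(lambda x: x, lambda x: x * x, 0.0, 1.0, 10_000))  # ~ 2/3
```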
If we try to use a Wiener process as an integrator we cannot use the Riemann
or the Riemann-Stieltjes interpretation due to the following important facts.
Proposition. For almost all ω the path t 7→ Wt (ω) is nowhere differentiable
and not of bounded variation.
Hence if we want to make sense of an integral with Brownian motion as integrator we have to define it in a new way. ...
...
Integration. Lebesgue.
...
Integration. L2 -spaces.
...
As a warm-up let us describe the finite-dimensional Euclidean spaces $\mathbb{R}^n$ as spaces of "square integrable functions". Denote by n = {1, 2, ..., n} the set with n elements. First consider the set {f : 1 → R} of all maps from a set with a single element to R. This is just R itself. What is {f : 2 → R}? All possible maps from a two-element set to the reals are the same as a choice of two real numbers, a first real number f(1) and a second real number f(2), i.e., an ordered pair of reals, i.e., the two-dimensional Euclidean space R², where all operations are done pointwise, or as we would say in linear algebra, coordinate-wise. The scalar product of two functions, or if you prefer two vectors, f and g is given by $\langle f, g\rangle = f(1)g(1) + f(2)g(2)$ and the norm, or length, of a vector f is given by the Euclidean norm $\|f\|_2 = \sqrt{(f(1))^2 + (f(2))^2}$. In the same way we identify $\mathbb{R}^n$ with the space of functions {f : n → R} from an n-element set to the reals. The scalar product and norm are given by $\langle f, g\rangle = \sum_{i=1}^{n} f(i)g(i)$ and $\|f\|_2 = \sqrt{\sum_{i=1}^{n} (f(i))^2}$. In the case when the functions are from some fixed finite set we do not mention that $\|f\|_2 < \infty$ because we have a finite sum and this is obvious. The subscript 2 on the norm indicates that we are .... and such spaces we can denote $L^2(n)$, i.e., we can identify $L^2(n) = \mathbb{R}^n$.
...
$L^2(dt) \equiv L^2([0,T]) \equiv L^2([0,T], dt)$ is the space of functions $f, g : [0,T] \to \mathbb{R}$ with scalar product and norm given by $\langle f, g\rangle = \int_0^T f(t)g(t)\, dt$ and $\|f\|_2 = \sqrt{\int_0^T (f(t))^2\, dt}$.
...
$L^2(dP) \equiv L^2(\Omega) \equiv L^2(\Omega, dP)$ is the space of functions $f, g : \Omega \to \mathbb{R}$ with scalar product and norm given by $\langle f, g\rangle = \int_\Omega f(\omega)g(\omega)\, dP(\omega) = E(fg)$ and $\|f\|_2 = \sqrt{\int_\Omega (f(\omega))^2\, dP(\omega)} = \sqrt{E(f^2)}$.
... Hilbert space ...
...
$L^2(dt \times dP) \equiv L^2([0,T] \times \Omega) \equiv L^2([0,T] \times \Omega,\ dt \times dP)$
...
Ito integral. Ito formula.
...
integrands: .... f ∈ L²(dt × dP) such that f is W-adapted, i.e., f(t, ·) is $\mathcal{F}_t$-measurable, where the filtration is the natural filtration of W .... ....................................................
MORE ............... natural filtration $\mathcal{F}_t = \sigma\{W_s,\ s \le t\}$ ... “history of W up to time t”
... for a time interval Δ = (a, b] we denote $I_\Delta(f)(\omega) \equiv I_a^b(f)(\omega) \equiv \int_a^b f(t, \omega)\, dW_t(\omega)$, or skipping the dependence on ω we often write $I_\Delta(f) \equiv I_a^b(f) \equiv \int_a^b f(t)\, dW_t$, and if the time interval is clear we write just I(f).
Properties:
(a) additive on time intervals (if Δ₁ and Δ₂ are nonoverlapping time intervals): $I_{\Delta_1 \cup \Delta_2} = I_{\Delta_1} + I_{\Delta_2}$, i.e., $\int_{t_1}^{t_3} \ldots\, dW_t = \int_{t_1}^{t_2} \ldots\, dW_t + \int_{t_2}^{t_3} \ldots\, dW_t$ for $t_1 \le t_2 \le t_3$
(b) $\int_{t_1}^{t_2} \ldots\, dW_t$ is linear in the integrand
(c) $E\, I(f) = 0$
(d) $I_a^s(f) \equiv \int_a^s f(t)\, dW_t$ is $\mathcal{F}_s$-measurable
(e) the stochastic process $t \mapsto I_0^t(f)$ is a martingale
...
...
We start defining the Ito integral $I(f)(\omega) = \int_0^T f(\omega, t)\, dW_t(\omega)$ first for simple functions f and then take a limit. If (a, b] ⊂ [0, T] and $f = 1_{(a,b]}$ we require that
$$I(1_{(a,b]}) = \int_{t=a}^{b} dW_t = W_b - W_a\,.$$
Next we consider simple functions in the time variable, i.e.,
$$f(\omega, t) = \sum_{i=0}^{n} a_i(\omega)\, 1_i(t)$$
where we have some partition $0 = t_0 < t_1 < \cdots < t_{n+1} = T$, for short we denote by $1_i = 1_{(t_i, t_{i+1}]}$ the indicator functions, and $a_i \in \mathcal{F}_{t_i}$ with $E(a_i^2) < \infty$. For such functions define
$$I(f)(\omega) = \sum_{i=0}^{n} a_i(\omega)\, (W_{t_{i+1}} - W_{t_i})\,.$$
We have the Ito isometry
...
Proposition. Ito isometry. For f square integrable we have
$$\|I(f)\|_{L^2(dP)} = \|f\|_{L^2(dP \times dt)}$$
proof. For simple f of the form above we have
$$f^2(\omega, t) = \sum_{i=0}^{n} a_i^2(\omega)\, 1_i(t)$$
and the r.h.s. of the Ito isometry is
$$E\left( \int_0^T f^2(\omega, t)\, dt \right) = \sum_{i=0}^{n} E(a_i^2)\, \Delta_i t$$
where $\Delta_i t = (t_{i+1} - t_i)$. While for the l.h.s. note that $a_i$ is independent ... ... For the square of the Ito integral we have
$$I(f)^2 = \left( \sum_{i=0}^{n} a_i\, \Delta_i W \right)^{\!2} = \sum_{i=0}^{n} a_i^2 (\Delta_i W)^2 + 2 \sum_{0 \le i < j \le n} a_i a_j\, \Delta_i W\, \Delta_j W\,.$$
Next we take the expectation of the above; using the linearity of expectation we can push it inside all the sums, and we have to consider the terms with squares $E(a_i^2 (\Delta_i W)^2)$ and the mixed terms $E(a_i a_j\, \Delta_i W\, \Delta_j W)$. For short let us write $E_j(\ldots) \equiv E(\ldots|\mathcal{F}_{t_j})$ for the conditional expectation. We have in particular $E_0(\ldots) = E(\ldots)$ and the tower property $E(\ldots) = E(E_j(\ldots))$. For the terms with squares we use the tower property, then take out of the conditional expectation $E_i$ what is known up to time $t_i$, then use $E_i(\Delta_i W)^2 = \Delta_i t$ to obtain
$$E(a_i^2 (\Delta_i W)^2) = E(E_i(a_i^2 (\Delta_i W)^2)) = E(a_i^2\, E_i(\Delta_i W)^2) = \Delta_i t\; E a_i^2\,.$$
For the mixed terms (i < j) also apply the tower property, take out what is known, and use $E_j \Delta_j W = 0$, to obtain
$$E(a_i a_j\, \Delta_i W\, \Delta_j W) = E(E_j(a_i a_j\, \Delta_i W\, \Delta_j W)) = E(a_i a_j\, \Delta_i W\, E_j \Delta_j W) = 0\,.$$
Hence we obtain
$$E\, I(f)^2 = \sum_{i=0}^{n} E(a_i^2)\, \Delta_i t\,.$$
Kuo, page 44; Etheridge, page .... QED.
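An added Monte Carlo sanity check of the isometry for the simple integrand with $a_i = W_{t_i}$ on each subinterval: both sides should approximate $\sum_i t_i\, \Delta t \approx T^2/2$.

```python
# Sketch: E[I(f)^2] vs. E[int f^2 dt] for f(t, w) = W_{t_i}(w) on (t_i, t_{i+1}].
import math
import random

random.seed(7)
T, n, paths = 1.0, 50, 20_000
dt = T / n
lhs = rhs = 0.0
for _ in range(paths):
    W = I = F2 = 0.0
    for _ in range(n):
        dW = random.gauss(0.0, math.sqrt(dt))
        I += W * dW          # Ito sum: integrand at the left endpoint
        F2 += W * W * dt     # int f^2 dt along this path
        W += dW
    lhs += I * I
    rhs += F2
print(round(lhs / paths, 3), round(rhs / paths, 3))  # both ~ 0.49 (~ T^2/2)
```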
...
Example. $\int_0^T W_t\, dW_t = (W_T^2 - T)/2$
proof.
Partition the interval [0, T] into n equal intervals of length Δt, hence T = n Δt; denoting $t_i = i\,\Delta t$ we have the partition $\{t_0 < t_1 < \cdots < t_n = T\}$, and the i-th time interval runs from $t_i$ to $t_{i+1}$. For short write $W_i \equiv W_{t_i}$, and for the increment of the Wiener process over the i-th time interval write $\Delta_i W \equiv W_{i+1} - W_i$. We want to verify
$$\lim_{n\to\infty} E\left( \frac{W_T^2 - T}{2} - \sum_{i=0}^{n-1} W_i\, \Delta_i W \right)^{\!2} = 0$$
or, denoting for short $V \equiv (W_T^2 - T)/2$ and $V_n \equiv \sum_{i=0}^{n-1} W_i\, \Delta_i W$, we are after $\lim_{n\to\infty} E(V - V_n)^2 = 0$.
Using polarization, i.e., writing a product as a combination of squares, $2ab = (a+b)^2 - a^2 - b^2$, and noting that $W_i + \Delta_i W = W_{i+1}$, we have
$$2V_n = \sum_{i=0}^{n-1} W_{i+1}^2 - \sum_{i=0}^{n-1} W_i^2 - \sum_{i=0}^{n-1} (\Delta_i W)^2\,.$$
Note that $\sum_{i=0}^{n-1} W_{i+1}^2 - \sum_{i=0}^{n-1} W_i^2 = W_T^2 - W_0^2 = W_T^2$ (the sum telescopes) and
$$E \sum_{i=0}^{n-1} (\Delta_i W)^2 = \sum_{i=0}^{n-1} E(\Delta_i W)^2 = \sum_{i=0}^{n-1} \Delta_i t = T - 0 = T\,.$$
So far we have obtained
$$2(V_n - V) = T - \sum_{i=0}^{n-1} (\Delta_i W)^2\,,$$
hence $4\, E(V - V_n)^2 = E\left( T - \sum_{i=0}^{n-1} (\Delta_i W)^2 \right)^2$ and it suffices to show that this expectation tends to zero. Expanding,
$$\left( T - \sum_{i=0}^{n-1} (\Delta_i W)^2 \right)^{\!2} = T^2 - 2T \sum_{i=0}^{n-1} (\Delta_i W)^2 + \left( \sum_{i=0}^{n-1} (\Delta_i W)^2 \right)^{\!2}$$
and
$$\left( \sum_{i=0}^{n-1} (\Delta_i W)^2 \right)^{\!2} = \sum_{i=0}^{n-1} (\Delta_i W)^4 + 2 \sum_{0 \le i < j \le n-1} (\Delta_i W)^2 (\Delta_j W)^2\,.$$
Now we take the expectation, use linearity, and consider each term separately:
$$E(\Delta_i W)^2 = \Delta_i t\,, \qquad E(\Delta_i W)^4 = 3(\Delta_i t)^2\,, \qquad E\big((\Delta_i W)^2 (\Delta_j W)^2\big) = \Delta_i t\, \Delta_j t \quad (i \ne j)\,.$$
Writing $\Delta \equiv \Delta_i t = T/n$, so that $n\Delta = T$,
$$E\left( T - \sum_{i=0}^{n-1} (\Delta_i W)^2 \right)^{\!2} = T^2 - 2T\, n\Delta + 3n\Delta^2 + 2\, \frac{n(n-1)}{2}\, \Delta^2 = T^2 - 2T^2 + 3T\Delta + T^2 - T\Delta = 2T\Delta$$
and because Δ → 0 as n → ∞ we are done:
$$\lim_{n\to\infty} E\left( T - \sum_{i=0}^{n-1} (\Delta_i W)^2 \right)^{\!2} = 0\,.$$
QED.
...
$(dW_t)^2 = dt$
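An added Monte Carlo check of this example: along each discretized path the Ito sum $\sum_i W_i\, \Delta_i W$ should approach $(W_T^2 - T)/2$, with mean squared error of order $T\Delta/2$, matching the computation above.

```python
# Sketch: discretized Ito sums vs. (W_T^2 - T)/2; the MSE shrinks like T*dt/2.
import math
import random

random.seed(8)
T, paths = 1.0, 2_000
for n in (10, 100, 1_000):
    dt = T / n
    mse = 0.0
    for _ in range(paths):
        W = ito = 0.0
        for _ in range(n):
            dW = random.gauss(0.0, math.sqrt(dt))
            ito += W * dW
            W += dW
        mse += (ito - (W * W - T) / 2.0) ** 2
    print(n, round(mse / paths, 5))   # ~ 0.05, 0.005, 0.0005
```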
...
http://en.wikipedia.org/wiki/Binomial_distribution
In general, if the discrete random variable K follows the binomial distribution with parameters n and p, we write K ∼ B(n, p). The probability of getting exactly k successes in n trials is given by the probability mass function $P(K = k) = \binom{n}{k} p^k (1-p)^{n-k}$. ... mean np and variance np(1 − p). ... This fact is easily proven as follows. Suppose first that we have a single Bernoulli trial. There are two possible outcomes: 1 and 0, the first occurring with probability p and the second having probability 1 − p. The expected value in this trial is equal to µ = 1·p + 0·(1−p) = p. The variance in this trial is calculated similarly: σ² = (1−p)²·p + (0−p)²·(1−p) = p(1 − p). The generic binomial distribution is a sum of n independent Bernoulli trials. The mean and the variance of such a distribution are equal to the sums of the means and variances of each individual trial, i.e., we have $\mu_n = \sum_{k=1}^{n} \mu = np$ and $\sigma_n^2 = \sum_{k=1}^{n} \sigma^2 = np(1-p)$.
The binomial distribution B(n, p) has moment generating function $M(t) = (1 - p + pe^t)^n$. The normal distribution N(µ, σ²) has moment generating function $M(t) = \exp(t\mu + \frac{1}{2}\sigma^2 t^2)$.
Let X ∼ B(n, p). Then as n approaches ∞ while p remains fixed, the distribution of $\frac{X - np}{\sqrt{np(1-p)}}$ approaches the normal distribution with expected value 0 and variance 1 (this is just a specific case of the Central Limit Theorem).
...
...
Bibliography.
Grinstead & Snell’s Introduction to Probability (GNU license book 2006)
..
..
...
Concept check.
probability space (Ω, F, P ): sample space Ω; collection of events, or σ-algebra
or σ-field of events, or information set F (a collection of sets closed under ...);
probability measure P (main property of measure is additivity, i.e., ...)
finite algebras of events generated by their atoms, examples ....
conditional probability ....
Cartesian product A1 × A2 and projections on Ai; product of one space with itself
n-times An = A × · · · × A viewed as all functions {1, . . . , n} → A ; independent
events; product of two probability spaces ...
the Borel σ-algebra B on the reals is generated by ...
random variable X : Ω → R is a measurable function ...; the information set, or
σ-algebra generated by X, is σ(X) ...; examples in finite cases ...
a random variable induces a probability distribution on the reals given by FX
...; probability density and mass function ...; properties of F ...; examples ...
normal distribution ....
binomial distribution ...
independent random variables ...; law of large numbers and central limit theorem
...
expectation ...; integral over source or integral over target ...; properties: linear;
normalized to one; multiplicative on independent variables; examples ...
stochastic process (discrete case); trees; filtration of information sets; random
walk and Bernoulli process
conditional expectation as a random variable ... .... .... ....
properties of conditional expectation: linearity ...; tower ...; take out what is
known ...; ...
conditional expectation with respect to a filtration; discrete martingales
BS model on a tree; European option; hedging portfolio; riskless (or martingale)
measure; fair price as expectation
Brownian motion (main property – independent normal increments...); as limit
of random walk; ...
Ito integral
Ito calculus