Martingale Transforms
D. L. Burkholder
The Annals of Mathematical Statistics, Vol. 37, No. 6 (Dec., 1966), pp. 1494-1504.
So, what is a martingale transform? Let f_n be a martingale and let d_n be the corresponding martingale difference sequence, d_n = f_n − f_{n−1}. Think of f_n as a price and of d_n as the change in the price. A martingale transform is then

g_n = Σ_{i=1}^n c_i d_i

for a multiplier sequence c_i that is predictable (each c_i is determined by the information available before time i). You can think of g_n as the wealth process of a predetermined trading strategy.
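A quick illustration (mine, not from the paper; the random walk and the multiplier rule below are arbitrary choices): a minimal numerical sketch of a martingale transform as the wealth of a predetermined strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy martingale: a symmetric random walk f_n = d_1 + ... + d_n.
n_steps = 1000
d = rng.choice([-1.0, 1.0], size=n_steps)        # martingale differences d_1, ..., d_n
f = np.concatenate(([0.0], np.cumsum(d)))        # prices f_0, ..., f_n

# A predictable multiplier sequence: c_i may depend only on f_0, ..., f_{i-1}.
# Here: hold one unit while the price is non-negative, otherwise stay out.
c = (f[:-1] >= 0).astype(float)

# The martingale transform g_n = sum_{i<=n} c_i d_i: the wealth of the strategy.
g = np.concatenate(([0.0], np.cumsum(c * d)))

print("final price f_n:", f[-1], " final wealth g_n:", g[-1])
```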
The following example shows that a transform of an L^1-bounded martingale may itself fail to be L^1-bounded.
Let the probability space be the set of all positive integers, with the measure given by

Pr{k} = 1/k − 1/(k+1).

Let

f_n(k) = n if n < k, and f_n(k) = −1 if n ≥ k,

for n = 0, 1, . . . .
The sequence f_n is a martingale (on {k > n} one checks that E[f_{n+1} | k > n] = (n+1)·(n+1)/(n+2) − 1·1/(n+2) = n = f_n, while on {k ≤ n} the sequence is already frozen at −1), and it is L^1-bounded:

E f_n = n Pr{k > n} + (−1) Pr{k ≤ n} = n·1/(n+1) − (1 − 1/(n+1)) = 0,

E|f_n| = n Pr{k > n} + Pr{k ≤ n} = n·1/(n+1) + (1 − 1/(n+1)) < 2.
In particular, by Doob's martingale convergence theorem, f_n converges almost surely (to −1, since k is finite almost surely).
Clearly,

d_n(k) = 1 if n < k, −k if n = k, and 0 if n > k,

for n = 1, 2, . . . .
Consider the sequence g_n, the martingale transform of f_n with the multiplier sequence 1, −1, 1, . . . . Then

g_n(k) = [1 + (−1)^{n−1}]/2 if n < k, and g_n(k) = [1 + (−1)^k]/2 + (−1)^k k if n ≥ k.
This martingale is not L^1-bounded. Its mean is still zero:

E g_n = ([1 + (−1)^{n−1}]/2) Pr{k > n} + Σ_{k=1}^n ([1 + (−1)^k]/2 + (−1)^k k) Pr{k} = 0.

However,

E|g_n| ∼ Σ_{k=1}^n k Pr{k} ∼ log n.
Of course, this martingale is still convergent almost surely.
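For a fixed n, both f_n and g_n take only finitely many distinct values and Pr{k > n} = 1/(n+1), so the expectations above can be computed exactly. A short numerical check (mine):

```python
import numpy as np

def pr(k):
    """Point masses of the example: Pr{k} = 1/k - 1/(k+1), k = 1, 2, ..."""
    return 1.0 / k - 1.0 / (k + 1)

def moments(n):
    k = np.arange(1, n + 1)                              # atoms with k <= n
    p = pr(k)
    tail = 1.0 / (n + 1)                                 # Pr{k > n}
    f_vals = np.full(n, -1.0)                            # f_n(k) = -1 for k <= n
    g_vals = (1 + (-1.0) ** k) / 2 + (-1.0) ** k * k     # g_n(k) for k <= n
    f_tail, g_tail = float(n), (1 + (-1.0) ** (n - 1)) / 2
    Ef = f_vals @ p + f_tail * tail
    E_abs_f = np.abs(f_vals) @ p + abs(f_tail) * tail
    Eg = g_vals @ p + g_tail * tail
    E_abs_g = np.abs(g_vals) @ p + abs(g_tail) * tail
    return Ef, E_abs_f, Eg, E_abs_g

# E f_n and E g_n stay at 0, E|f_n| stays below 2, E|g_n| grows like log n.
for n in (10, 100, 1000, 10000):
    print(n, [round(x, 6) for x in moments(n)], round(float(np.log(n)), 3))
```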
Intuitively, in this example there is a random crash date. Before the crash, the gambler's wealth grows steadily, in an almost non-random fashion, provided that she puts 1 unit of capital into the market every period. At the crash date she loses everything and stays at −1 forever. (This is reminiscent of the recent financial history of the US, except for the staying at −1 forever.)
The gambler who uses the strategy 1, −1, 1, . . . is not very wise from the traditional financial point of view. He bets alternately for and against the market despite the clear trend. As a consequence, his wealth fluctuates around zero until finally, at the crash date, he either earns a huge amount of money, if he happens to hold a short position, or loses an equally huge amount of money and commits suicide, if he is long. (After that, his wealth, or rather the logarithm of his wealth, is constant at a rather large negative number.)
The results of Burkholder show that what matters for the eventual convergence of the wealth process is not its own L^1-boundedness, which does not always hold in realistic situations, but rather that the wealth is a transform, with bounded multipliers, of an L^1-bounded martingale. Intuitively, this should apply to a very wide class of financial markets. (In certain situations the price process itself may become unbounded, and in this case there is little hope for almost sure convergence.)
Theorem 1. Suppose that g is the transform of an L^1-bounded martingale f by a multiplier sequence c. Then g converges almost everywhere on the set where the maximal function c* = sup_n |c_n| is finite.
The proof goes in several steps. First it is shown that a bounded transform of an L^2-bounded martingale is also L^2-bounded, so it converges almost surely. Then it is shown that if g is a bounded transform of a uniformly bounded submartingale, then it converges almost surely. Finally, every L^1-bounded martingale can be represented as a difference of non-negative martingales, and for each of those we can consider −min(f_n, C). This is a uniformly bounded submartingale, and the theorem follows by letting C go to infinity.
Now define the square function

S(f) = (Σ_{n=1}^∞ d_n²)^{1/2},

and write S_n(f) = (Σ_{k=1}^n d_k²)^{1/2} for its truncated version.
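To connect this with the L^2 step in the proof sketch above: martingale differences are orthogonal, so E f_n² = Σ_{k≤n} E d_k² = E S_n(f)², and a transform with multipliers bounded by 1 satisfies E g_n² ≤ E f_n². A small simulation (mine, with an arbitrary toy martingale and multiplier rule):

```python
import numpy as np

rng = np.random.default_rng(4)
n_paths, n_steps = 200_000, 20

# A toy martingale: d_i = xi_i * scale_i with i.i.d. signs and deterministic step sizes.
xi = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))
scale = rng.uniform(0.5, 1.5, size=n_steps)
d = xi * scale
f = np.cumsum(d, axis=1)

# Predictable multipliers bounded by 1: c_1 = 1 and c_i = sign(f_{i-1}) for i >= 2.
c = np.ones((n_paths, n_steps))
c[:, 1:] = np.sign(f[:, :-1])
g = np.cumsum(c * d, axis=1)

print(np.mean(f[:, -1] ** 2), np.sum(np.mean(d ** 2, axis=0)))  # E f_n^2 vs sum_k E d_k^2
print(np.mean(g[:, -1] ** 2))                                   # E g_n^2 <= E f_n^2
```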
Theorem 2. If f is a martingale such that E S(f) < ∞, then f converges almost everywhere.
The proof is based on an ingenious argument showing that f can be represented as a martingale transform of an L^1-bounded martingale.
Let r_k(t) be the Rademacher functions on the unit interval. Then

∫_0^1 E|Σ_{k=1}^n r_k(t) d_k| dt ≤ E[ ∫_0^1 |Σ_{k=1}^n r_k(t) d_k|² dt ]^{1/2} = E S_n(f).
Here the d_k are treated as Fourier coefficients with respect to the Rademacher system, and the equality in the last line is Parseval's identity. For each fixed t the partial sums form a martingale in n, so E|Σ_{k=1}^n r_k(t) d_k| is nondecreasing in n, the supremum over n can be taken under the integral, and it follows that

∫_0^1 sup_n E|Σ_{k=1}^n r_k(t) d_k| dt ≤ E S(f).
Consequently, for some t,

sup_n E|Σ_{k=1}^n r_k(t) d_k| ≤ E S(f).
For this t, the sums Σ_{k=1}^n r_k(t) d_k define an L^1-bounded martingale g_n, and f_n is a martingale transform of g_n (with multipliers r_k(t) = ±1). Therefore the conclusion of the theorem follows from Theorem 1.
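The Parseval step can be checked numerically: the Rademacher functions are orthonormal in L²[0, 1], so ∫_0^1 |Σ_k r_k(t) d_k|² dt = Σ_k d_k² for any fixed coefficients d_k. A small check (mine, with arbitrary coefficients and an exact dyadic quadrature):

```python
import numpy as np

def rademacher(k, t):
    """The k-th Rademacher function r_k(t) = sign(sin(2^k * pi * t))."""
    return np.sign(np.sin(2.0 ** k * np.pi * t))

rng = np.random.default_rng(1)
d = rng.normal(size=8)                         # coefficients d_1, ..., d_8
t = (np.arange(2 ** 12) + 0.5) / 2 ** 12       # midpoints of a dyadic grid on [0, 1]

s = sum(d[k - 1] * rademacher(k, t) for k in range(1, len(d) + 1))

lhs = np.mean(s ** 2)                          # integral of |sum_k r_k(t) d_k|^2 over [0, 1]
rhs = np.sum(d ** 2)                           # sum_k d_k^2
print(lhs, rhs)                                # the two values agree
```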
The next theorem gives another operational criterion for almost sure convergence.
Theorem 3. Suppose that f and g are martingales relative to the same sequence of sigma-fields. If f is L^1-bounded and S_n(g) ≤ S_n(f) for all n ≥ 1, then g converges almost everywhere.
Radon-Nikodym Derivatives of Gaussian Measures
L. A. Shepp
The Annals of Mathematical Statistics, Vol. 37, No. 2. (Apr., 1966), pp.
321-354.
The paper gives a formula for the Radon-Nikodym derivative (i.e., the likelihood ratio) of a Gaussian measure equivalent to the Wiener measure, taken with respect to the Wiener measure.
First of all, we need a condition for equivalence to the Wiener measure. Suppose that µ is a Gaussian measure with mean function m and covariance function R, and let µ_W be the standard Wiener measure.
Theorem 4. µ ∼ µ_W if and only if there exist a kernel K ∈ L²(R+ × R+), for which

R(s, t) = min(s, t) − ∫_0^s ∫_0^t K(u, v) du dv

and whose spectrum does not contain 1, and a function k ∈ L²(R+) for which

m(t) = ∫_0^t k(u) du.

The kernel K is unique and symmetric and is given by

K(s, t) = −∂²R(s, t)/∂s∂t

for almost every (s, t). The function k is unique and is given by k(t) = m′(t) for almost every t.
The condition on the spectrum is essentially the condition of positive definiteness of R.
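A small symbolic check (mine, for a hypothetical kernel chosen only for illustration) of the reconstruction formula K(s, t) = −∂²R/∂s∂t in Theorem 4:

```python
import sympy as sp

s, t, u, v = sp.symbols("s t u v", positive=True)

# A hypothetical symmetric L^2 kernel, not taken from the paper.
K = sp.exp(-(u + v))

# Covariance of the candidate measure: R(s, t) = min(s, t) - int_0^s int_0^t K(u, v) du dv.
# On the region s < t we may write min(s, t) = s, which avoids the delta function
# that differentiating min(s, t) would produce on the diagonal.
R_region = s - sp.integrate(K, (u, 0, s), (v, 0, t))

K_recovered = -sp.diff(R_region, s, t)                   # Theorem 4: K(s, t) = -d^2 R / ds dt (a.e.)
print(sp.simplify(K_recovered - K.subs({u: s, v: t})))   # prints 0
```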
Assume now that µ is equivalent to µ_W. Since 1 does not belong to the spectrum of K, we can define the resolvent H:

H = (I − K)^{−1}.
The formula for the Radon-Nikodym derivative is given by the following
theorem.
Theorem 5. If K is continuous and of trace class, then

(dµ/dµ_W)(X) = (1/√d) exp[ −(1/2) ∫_0^T ∫_0^T H(s, t) dX(s) dX(t) + ∫_0^T k(u) dX(u) + (1/2) ∫_0^T k²(u) du ],

where

d = ∏_j (1 − λ_j)

and the λ_j are the eigenvalues of K.
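The constant d is the Fredholm determinant of K. As a numerical illustration (mine, for a hypothetical kernel K(s, t) = c·min(s, t), whose Fredholm determinant det(I − c·min) equals cos(√c·T)), the product over eigenvalues can be approximated by discretizing the integral operator:

```python
import numpy as np

c, T, m = 0.5, 1.0, 500
grid = (np.arange(m) + 0.5) * (T / m)            # midpoint grid on [0, T]

Kmat = c * np.minimum.outer(grid, grid)          # kernel values K(s_i, s_j)
lam = np.linalg.eigvalsh(Kmat * (T / m))         # approximate eigenvalues of the operator K
d = np.prod(1.0 - lam)                           # d = prod_j (1 - lambda_j)

print(d, np.cos(np.sqrt(c) * T))                 # approximately equal
```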
This theorem can be thought of as a certain variant of the Ito formula.
This version is especially suitable for estimation problems.
One example is a shift of the Wiener measure. Let

Z(t) = −αt + W(t).

Then

(dµ_Z/dµ_W)(X) = e^{−αX(T)} e^{−α²T/2}.

This formula allows us to calculate probabilities for the shifted measure from the corresponding probabilities for the original Wiener measure.
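A small Monte Carlo sketch (mine; the parameter values are arbitrary) of how the density reweights Wiener probabilities into probabilities under the shifted measure:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, T, c, n = 0.7, 1.0, 0.5, 10 ** 6

# Under the Wiener measure, X(T) ~ N(0, T); the density depends on the path only through X(T).
x = rng.normal(0.0, np.sqrt(T), size=n)
weights = np.exp(-alpha * x - 0.5 * alpha ** 2 * T)     # dmu_Z/dmu_W evaluated on each draw

p_reweighted = np.mean(weights * (x > c))               # Pr{Z(T) > c} via change of measure
p_direct = np.mean(rng.normal(-alpha * T, np.sqrt(T), size=n) > c)   # direct simulation of Z(T)

print(p_reweighted, p_direct)                           # agree up to Monte Carlo error
```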
For another example, consider the time-changed Wiener process

Z(t) = W(h(t)) / √(h′(t)).

The Radon-Nikodym derivative of the measure defined by this process with respect to the original measure is given in the following theorem.
Theorem 6. Z ∼ W if and only if g = (h′)^{−1/2} is absolutely continuous and g′ ∈ L². For smooth h, the Radon-Nikodym derivative is given by the formula

(dµ_Z/dµ_W)(X) = √(h′(T)/h′(0)) exp{ −(X²(T)/4)(h″(T)/h′(T)) − (1/2) ∫_0^T f(t) X²(t) dt },

where

f = −(1/2)(h″/h′)′ + (1/4)(h″/h′)².
With the help of the Radon-Nikodym derivatives, it is possible to calculate certain Wiener integrals of exponentials of quadratic forms. Let

A(f) = E_W exp( −(1/2) ∫_0^T f(t) X²(t) dt ).

We want to calculate A(f) explicitly. Define g by the equation

d²g/dt² = f g,   g′(T) = 0,   g > 0 on [0, T).

Then

A(f) = √( g(T)/g(0) ).

If there is no positive solution, then A(f) = +∞.
(This problem can also be approached with the use of the Feynman-Kac
formula.)
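For constant f ≡ λ² the recipe can be checked against the classical Cameron-Martin formula E_W exp(−(λ²/2) ∫_0^T X²(t) dt) = (cosh λT)^{−1/2}. A small numerical sketch (mine), integrating the equation for g backwards from T:

```python
import numpy as np
from scipy.integrate import solve_ivp

lam, T = 0.8, 2.0
f = lambda t: lam ** 2                       # constant potential, chosen only as a test case

def rhs(t, y):                               # y = (g, g'); the equation is g'' = f g
    return [y[1], f(t) * y[0]]

# Integrate backwards from t = T with g(T) = 1, g'(T) = 0 (the ratio g(T)/g(0) is scale-free).
sol = solve_ivp(rhs, (T, 0.0), [1.0, 0.0], rtol=1e-10, atol=1e-12)
g0 = sol.y[0, -1]                            # g(0)

A_numeric = np.sqrt(1.0 / g0)                # sqrt(g(T)/g(0)) with g(T) = 1
A_exact = 1.0 / np.sqrt(np.cosh(lam * T))    # Cameron-Martin closed form
print(A_numeric, A_exact)
```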
Pooling Cross Section and Time Series Data in the Estimation of a
Dynamic Model: The Demand for Natural Gas
Pietro Balestra; Marc Nerlove
Econometrica, Vol. 34, No. 3. (Jul., 1966), pp. 585-612.
This is the seminal paper on panel data with a dynamic time structure. Dynamics means that lags of the dependent variable are included among the explanatory variables. The paper points out that the usual pooled OLS gives inconsistent estimates, because the lagged dependent variable is correlated with the individual effects in the error term. It suggests overcoming the difficulty by a two-stage procedure. In the first stage a consistent estimate of the parameters is obtained by an instrumental-variables regression whose instruments avoid the lagged dependent variable. In the second stage the structure of the covariance matrix is estimated, including the degree of intertemporal correlation, and the regression is re-estimated by maximum likelihood, GLS, or a similar method.
The method is illustrated with an estimation of consumers' demand for natural gas.
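A stylized sketch (mine, on simulated data; the variable names, the instrument choice, and the variance-component formulas are simplifications for illustration, not the paper's exact estimator) of the two-stage idea: a consistent instrumental-variables first stage, then estimation of the error-components covariance and a feasible GLS second stage.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 200, 10                                # individuals and time periods (arbitrary)
rho, beta, s_mu, s_eps = 0.5, 1.0, 1.0, 1.0   # true parameters of the simulated model

# Dynamic error-components panel: y_it = rho*y_i,t-1 + beta*x_it + mu_i + eps_it.
x = rng.normal(size=(N, T))
mu = s_mu * rng.normal(size=N)
y = np.zeros((N, T))
for t in range(1, T):
    y[:, t] = rho * y[:, t - 1] + beta * x[:, t] + mu + s_eps * rng.normal(size=N)

# Stack observations for t = 2, ..., T-1 so that y_{t-1} and the instrument x_{t-1} exist.
Y, Ylag = y[:, 2:].ravel(), y[:, 1:-1].ravel()
X, Xlag = x[:, 2:].ravel(), x[:, 1:-1].ravel()
W = np.column_stack([Ylag, X, np.ones_like(Y)])     # regressors: lagged y, x, constant
Z = np.column_stack([Xlag, X, np.ones_like(Y)])     # instruments: x_{t-1} replaces y_{t-1}

# Stage 1: instrumental-variables estimate, avoiding the lagged dependent variable as instrument.
theta_iv = np.linalg.solve(Z.T @ W, Z.T @ Y)

# Stage 2: estimate the error-components covariance Omega = s2_eps*I + s2_mu*J from the
# IV residuals, then re-estimate by feasible GLS, one individual block at a time.
resid = (Y - W @ theta_iv).reshape(N, -1)
Tb = resid.shape[1]
s2_mu_hat = max(float(np.mean((resid.sum(1) ** 2 - (resid ** 2).sum(1)) / (Tb * (Tb - 1)))), 0.0)
s2_eps_hat = float(np.mean(resid ** 2)) - s2_mu_hat
Oinv = np.linalg.inv(s2_eps_hat * np.eye(Tb) + s2_mu_hat * np.ones((Tb, Tb)))

Wb, Yb = W.reshape(N, Tb, 3), Y.reshape(N, Tb)
A = sum(Wb[i].T @ Oinv @ Wb[i] for i in range(N))
b = sum(Wb[i].T @ Oinv @ Yb[i] for i in range(N))
theta_gls = np.linalg.solve(A, b)

print("IV :", np.round(theta_iv, 3))    # compare with the simulated (rho, beta, 0) = (0.5, 1.0, 0.0)
print("GLS:", np.round(theta_gls, 3))
```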