Hilbert Space Embeddings and Dynamical Systems

S.V.N. Vishwanathan
[email protected]
National ICT Australia,
Australian National University
and
Indian Institute of Science
Joint work with Alex Smola
S.V.N. Vishwanathan: Hilbert Space Embeddings and Dynamical Systems, Page 1
Classification
Data: Pairs of observations $(x_i, y_i)$ generated from some distribution $P(x, y)$, e.g., (blood status, cancer), (credit transaction information, fraud), (sound profile of jet engine, defect).
Task: Predict y given x at a new location.
Modification: Find a function $f(x)$ that does the task.
Optimal Separating Hyperplane
Minimize $\frac{1}{2}\|w\|^2$ subject to $y_i(\langle w, x_i\rangle + b) \ge 1$ for all $i$.
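As a minimal sketch (plain Python, no optimizer), the helper below checks whether a candidate hyperplane $(w, b)$ satisfies every constraint $y_i(\langle w, x_i\rangle + b) \ge 1$ and, if so, returns the objective $\frac{1}{2}\|w\|^2$. The name `hyperplane_objective` and the two-point data set are illustrative assumptions; finding the minimizer itself requires a quadratic-programming solver, which this sketch does not attempt.

```python
def hyperplane_objective(w, b, data):
    """Return (1/2)||w||^2 if (w, b) satisfies y_i(<w, x_i> + b) >= 1 for all i, else None."""
    for x, y in data:
        margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        if margin < 1:
            return None  # constraint violated: (w, b) is infeasible
    return 0.5 * sum(wj * wj for wj in w)


# Usage: one positive and one negative point on the first axis.
data = [((2.0, 0.0), 1), ((-2.0, 0.0), -1)]
print(hyperplane_objective((0.5, 0.0), 0.0, data))   # → 0.125
print(hyperplane_objective((0.25, 0.0), 0.0, data))  # → None
```

For this data both points lie exactly on the margin when $w = (0.5, 0)$; the constraints force $4w_1 \ge 2$, so any $w$ with smaller norm is infeasible for every $b$.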
Kernels and Nonlinearity
Problem: Linear functions are often too
simple to provide good estimators.
Idea 1: Map to a higher-dimensional
feature space via $\Phi : x \mapsto \Phi(x)$ and
solve the problem there. Replace every $\langle x, x'\rangle$ by $\langle \Phi(x), \Phi(x')\rangle$.
Idea 2: Instead of computing $\Phi(x)$ explicitly, use a kernel function
$k(x, x') := \langle \Phi(x), \Phi(x')\rangle$.
A large class of functions is admissible as kernels: any symmetric function whose Gram matrices are all positive semidefinite.
Non-vectorial data can be handled if
we can compute a meaningful $k(x, x')$.
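To illustrate Idea 2, here is a small sketch using the homogeneous polynomial kernel $k(x, x') = \langle x, x'\rangle^2$ on $\mathbb{R}^2$, chosen only as an example. Its explicit feature map is $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, and the code compares the kernel value with the dot product of the explicit features.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for x in R^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, xp):
    """k(x, x') = <x, x'>^2, computed without ever forming phi(x)."""
    return dot(x, xp) ** 2


# Usage: both routes agree up to floating-point round-off.
x, xp = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(x, xp), dot(phi(x), phi(xp)))
```

The kernel route needs one 2-dimensional dot product, while the explicit route builds 3-dimensional features first; for higher polynomial degrees and input dimensions the explicit feature space grows much faster than the cost of evaluating $k$.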
The Basic Idea
Key Observation:
Trajectories are easily observable.
Similar trajectories ⇒ similar systems.
Restrict attention to interesting cases.
Kernels Using Dynamical Systems:
Simulate the system for both inputs.
Similar time evolution ⇒ similar inputs.
Kernels on Dynamical Systems:
Restrict to interesting initial conditions.
Simulate both systems.
Similar time evolution ⇒ similar systems.
Notation
$\mathcal{X}$ - state space (a Hilbert space).
$\mathcal{A}$ - set of time evolution operators.
$T$ - set of measurement times.
$\mu$ - a suitable probability measure on $T$.
Discounting Factors:
For some $\lambda > 0$:
$\mu(t) = \lambda^{-1} e^{-\lambda t}$ for $T = \mathbb{R}_0^+$,
$\mu(t) = \frac{e^{-\lambda t}}{1 - e^{-\lambda}}$ for $T = \mathbb{N}_0$.
Time Evolution:
We study
$x_A(t) := A(t)x$ for $A \in \mathcal{A}$
Trajectories and Kernels
Comparing Trajectories:
Define a Hilbert space on $\mathcal{X}^T$ via
$\langle \theta, \theta'\rangle := \mathbf{E}_\mu[\langle \theta(t), \theta'(t)\rangle]$ for $\theta, \theta' \in \mathcal{X}^T$.
Extending to Dynamical Systems:
Identify a dynamical system with its trajectory and define
$k((x, A), (\tilde x, \tilde A)) := \mathbf{E}_\mu[\langle A(t)x, \tilde A(t)\tilde x\rangle]$.
Defining Distances:
A Mercer kernel k is a dot product in feature space.
Distances can be defined as
$d((x, A), (\tilde x, \tilde A))^2 := k((x, A), (x, A)) + k((\tilde x, \tilde A), (\tilde x, \tilde A)) - 2k((x, A), (\tilde x, \tilde A))$.
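The trajectory kernel and distance can be approximated by simulation. The sketch below assumes discrete time ($T = \mathbb{N}_0$), truncates the expectation after a fixed horizon, and weights step $t$ by $e^{-\lambda t}$; this drops the constant normalization of $\mu$, which rescales $k$ (and $d$) but does not affect positive semidefiniteness. The function names, the `horizon` parameter and the `linear_step` helper are hypothetical. Because `step` may be any map, linear or not, the same code follows the idea of simulating both systems and comparing their time evolution.

```python
import math


def trajectory(step, x0, horizon):
    """States x(0), ..., x(horizon - 1) obtained by iterating x(t + 1) = step(x(t))."""
    states = [list(x0)]
    for _ in range(horizon - 1):
        states.append(step(states[-1]))
    return states


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def trajectory_kernel(step_a, x, step_b, x_tilde, lam=0.5, horizon=200):
    """Truncated sum over t < horizon of e^{-lam t} <x_A(t), x~_B(t)>."""
    traj_a = trajectory(step_a, x, horizon)
    traj_b = trajectory(step_b, x_tilde, horizon)
    return sum(math.exp(-lam * t) * dot(u, v)
               for t, (u, v) in enumerate(zip(traj_a, traj_b)))


def trajectory_distance(step_a, x, step_b, x_tilde, **kw):
    """d = sqrt(k(a, a) + k(b, b) - 2 k(a, b)); clipped at 0 to absorb round-off."""
    d2 = (trajectory_kernel(step_a, x, step_a, x, **kw)
          + trajectory_kernel(step_b, x_tilde, step_b, x_tilde, **kw)
          - 2.0 * trajectory_kernel(step_a, x, step_b, x_tilde, **kw))
    return math.sqrt(max(d2, 0.0))


def linear_step(A):
    """Hypothetical helper: the linear map x -> A x, with A given as a list of rows."""
    return lambda x: [dot(row, x) for row in A]


# Usage: two linear systems started from the same initial condition.
A = [[0.9, 0.1], [0.0, 0.8]]
A_tilde = [[0.7, 0.0], [0.2, 0.9]]
print(trajectory_distance(linear_step(A), [1.0, 0.0], linear_step(A_tilde), [1.0, 0.0]))
```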
Special Cases
Kernels on Dynamical Systems:
Restrict attention to x = x̃.
Compare trajectory for identical initial conditions.
Take the expectation over $x$ if interested in a range of initial conditions:
$k(A, \tilde A) := \mathbf{E}_x[k((x, A), (x, \tilde A))]$.
More generally
$k(A, \tilde A) := \mathbf{E}_A \mathbf{E}_{\tilde A} \mathbf{E}_x[k((x, A), (x, \tilde A))]$.
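For kernels on dynamical systems, the expectation over $x$ can be approximated by averaging over a finite set of initial conditions. The sketch below assumes $x$ is uniform over a user-supplied list, uses linear maps $x \mapsto Ax$, and computes a truncated discounted sum with weights $e^{-\lambda t}$ (dropping the constant normalization of $\mu$). `same_start_kernel` and `system_kernel` are hypothetical names.

```python
import math


def matvec(A, x):
    """Matrix-vector product A x, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def same_start_kernel(A, A_tilde, x, lam=0.5, horizon=200):
    """Truncated sum_t e^{-lam t} <A^t x, A~^t x>: both systems start from the same x."""
    u, v, total = list(x), list(x), 0.0
    for t in range(horizon):
        total += math.exp(-lam * t) * sum(a * b for a, b in zip(u, v))
        u, v = matvec(A, u), matvec(A_tilde, v)
    return total


def system_kernel(A, A_tilde, initial_conditions, **kw):
    """k(A, A~) = E_x k((x, A), (x, A~)), assuming x is uniform over the given list."""
    values = [same_start_kernel(A, A_tilde, x, **kw) for x in initial_conditions]
    return sum(values) / len(values)


# Usage: compare two stable 2-D systems, averaging over the standard basis vectors.
basis = [[1.0, 0.0], [0.0, 1.0]]
print(system_kernel([[0.9, 0.1], [0.0, 0.8]], [[0.7, 0.0], [0.2, 0.9]], basis))
```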
Kernels Using Dynamical Systems:
Restrict attention to a particular dynamical system.
As before, we can take expectations over $A$:
$k(x, \tilde x) := \mathbf{E}_x \mathbf{E}_{\tilde x} \mathbf{E}_A[k((x, A), (\tilde x, A))]$
Discrete Linear Systems
Linear Systems:
We assume time propagation occurs as
$x_A(t + 1) = A x_A(t) + a_t + \xi_t$
For simplicity, assume that $a_t = 0$; unrolling the recursion from $x_A(0) = x_0$ gives
$x_A(t) = A^t x_0 + \sum_{i=0}^{t-1} A^{t-1-i} \xi_i$.
Our Kernel:
Using an exponential down-weighting scheme,
$k((x_0, A), (\tilde x_0, \tilde A)) = \operatorname{tr}(C \tilde M) + x_0^\top M \tilde x_0$.
$C$ is the covariance matrix of the random variables $\xi_t$.
Computing the Kernel
More on $M$ and $\tilde M$:
The matrices $M$ and $\tilde M$ are defined as
$M := \sum_{t=0}^{\infty} e^{-\lambda t} (A^t)^\top \tilde A^t$
and
$\tilde M := \sum_{t=0}^{\infty} e^{-\lambda t} (A^t)^\top M \tilde A^t$.
Sylvester Equation:
Both $M$ and $\tilde M$ satisfy Sylvester equations (with $I$ the identity matrix):
$e^{-\lambda} A^\top M \tilde A + I = M$ and $e^{-\lambda} A^\top \tilde M \tilde A + M = \tilde M$.
They can be solved in cubic time.
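Shifting the summation index in the series for $M$ gives $e^{-\lambda} A^\top M \tilde A = M - I$, and the same shift for $\tilde M$ gives the second equation. Equations of this form are also called Stein (discrete-time Sylvester) equations. The sketch below does not use a cubic-time direct solver; it substitutes plain fixed-point iteration $M \leftarrow e^{-\lambda} A^\top M \tilde A + Q$, which converges when $e^{-\lambda}\rho(A)\rho(\tilde A) < 1$ ($\rho$ = spectral radius) and is only meant for small examples. The helper names are hypothetical; the final function assembles $\operatorname{tr}(C\tilde M) + x_0^\top M \tilde x_0$.

```python
import math


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def scale(c, P):
    return [[c * p for p in row] for row in P]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def solve_stein(A, A_tilde, Q, lam, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration for M = e^{-lam} A^T M A~ + Q.

    Converges when e^{-lam} * rho(A) * rho(A~) < 1 (rho = spectral radius).
    """
    At, c = transpose(A), math.exp(-lam)
    M = [row[:] for row in Q]
    for _ in range(max_iter):
        M_next = add(scale(c, matmul(matmul(At, M), A_tilde)), Q)
        change = max(abs(a - b) for ra, rb in zip(M_next, M) for a, b in zip(ra, rb))
        M = M_next
        if change < tol:
            return M
    raise RuntimeError("fixed-point iteration did not converge")


def linear_system_kernel(A, x0, A_tilde, x0_tilde, C, lam):
    """tr(C M~) + x0^T M x~0, with M and M~ solving the two matrix equations."""
    n = len(A)
    M = solve_stein(A, A_tilde, identity(n), lam)  # M = e^{-lam} A^T M A~ + I
    M_tilde = solve_stein(A, A_tilde, M, lam)      # M~ = e^{-lam} A^T M~ A~ + M
    CM = matmul(C, M_tilde)
    trace_term = sum(CM[i][i] for i in range(n))
    quad_term = sum(x0[i] * M[i][j] * x0_tilde[j] for i in range(n) for j in range(n))
    return trace_term + quad_term


# Usage: two stable 2-D systems, identity noise covariance, lam = 0.1.
A = [[0.6, 0.2], [0.0, 0.5]]
A_tilde = [[0.3, 0.0], [0.4, 0.7]]
print(linear_system_kernel(A, [1.0, 0.0], A_tilde, [0.0, 1.0], identity(2), 0.1))
```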
Continuous Linear Systems
Linear Systems:
Consider systems with dynamics described by
$\frac{d}{dt} x_A(t) = A x_A(t) + a(t) + \xi(t)$.
Here $\xi(t)$ with $\mathbf{E}[\xi(t)] = 0$ is a stochastic process and
$x_A(t) = \exp(At)x_0 + \int_0^t \exp(A(t - \tau))(a(\tau) + \xi(\tau))\,d\tau$.
Our Kernel:
We consider $a(t) = \xi(t) = 0$ and define
$k((x_0, A), (\tilde x_0, \tilde A)) = \lambda^{-1} \int_0^\infty e^{-\lambda t} \langle \exp(At)x_0, \exp(\tilde A t)\tilde x_0\rangle\,dt$,
which again can be cast as a Sylvester equation.
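A short derivation shows where the Sylvester equation comes from. With $M := \int_0^\infty e^{-\lambda t} e^{A^\top t} e^{\tilde A t}\,dt$, the kernel is $\lambda^{-1} x_0^\top M \tilde x_0$. Differentiating the integrand and integrating from $0$ to $\infty$ (assuming it decays, i.e. $\operatorname{Re}\,\alpha_i + \operatorname{Re}\,\beta_j < \lambda$ for all eigenvalues $\alpha_i$ of $A$ and $\beta_j$ of $\tilde A$) gives $A^\top M + M \tilde A - \lambda M = -I$. The sketch below solves this by vectorizing $M$ and running Gaussian elimination on the resulting $n^2 \times n^2$ system. That costs $O(n^6)$ and is only suitable for tiny $n$; a Bartels-Stewart solver (for example SciPy's `scipy.linalg.solve_sylvester`) achieves the cubic cost. All function names here are hypothetical.

```python
def gauss_solve(K, rhs):
    """Solve K m = rhs by Gaussian elimination with partial pivoting."""
    n = len(K)
    aug = [row[:] + [b] for row, b in zip(K, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    m = [0.0] * n
    for r in range(n - 1, -1, -1):
        m[r] = (aug[r][n] - sum(aug[r][c] * m[c] for c in range(r + 1, n))) / aug[r][r]
    return m


def continuous_gram(A, A_tilde, lam):
    """Solve A^T M + M A~ - lam M = -I for M by vectorizing it (n^2 unknowns)."""
    n = len(A)
    K = [[0.0] * (n * n) for _ in range(n * n)]
    rhs = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            r = i * n + j  # equation for entry (i, j)
            K[r][r] -= lam
            for k in range(n):
                K[r][k * n + j] += A[k][i]        # (A^T M)_ij = sum_k A_ki M_kj
                K[r][i * n + k] += A_tilde[k][j]  # (M A~)_ij = sum_k M_ik A~_kj
            rhs[r] = -1.0 if i == j else 0.0
    m = gauss_solve(K, rhs)
    return [m[i * n:(i + 1) * n] for i in range(n)]


def continuous_kernel(A, x0, A_tilde, x0_tilde, lam):
    """lam^{-1} x0^T M x~0; assumes Re(eig(A)) + Re(eig(A~)) < lam for every pair."""
    M = continuous_gram(A, A_tilde, lam)
    n = len(A)
    return sum(x0[i] * M[i][j] * x0_tilde[j] for i in range(n) for j in range(n)) / lam


# Usage: two stable 2-D systems with lam = 1.
A = [[-1.0, 0.5], [0.0, -1.0]]
A_tilde = [[-0.5, 0.0], [0.2, -1.0]]
print(continuous_kernel(A, [1.0, 0.0], A_tilde, [1.0, 1.0], 1.0))
```

For diagonal $A$ and $\tilde A$ the solution is explicit, $M_{ij} = \delta_{ij} / (\lambda - a_i - \tilde a_j)$, which makes a convenient sanity check.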
Graph Kernels
Diffusion Process:
We can define a diffusion process by
$\frac{d}{dt} x(t) = L x(t)$.
Undirected Graphs (Kondor and Lafferty, 2002):
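Here $L$ is symmetric, so comparing the states at time $T$ gives $\langle e^{LT}x, e^{LT}\tilde x\rangle = x^\top e^{2LT}\tilde x$, which yields
$K = \exp(2LT)$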
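The diffusion kernel can be evaluated directly for a small graph. The sketch assumes $L$ is the adjacency matrix minus the diagonal degree matrix (the negative graph Laplacian), so that $\frac{d}{dt}x(t) = Lx(t)$ is heat diffusion, and computes $\exp(2LT)$ by a truncated Taylor series. That is accurate only when $\|2LT\|$ is modest; for larger graphs or times, a scaling-and-squaring routine such as `scipy.linalg.expm` is the usual choice. `expm_taylor` and `diffusion_kernel` are hypothetical names.

```python
import math


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def expm_taylor(X, terms=40):
    """exp(X) via the truncated Taylor series sum_{k < terms} X^k / k!.

    Adequate only when the norm of X is modest (a few units at most).
    """
    n = len(X)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, X)]  # now X^k / k!
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result


def diffusion_kernel(adjacency, T):
    """K = exp(2 L T), assuming L = adjacency - diag(degrees)."""
    n = len(adjacency)
    L = [[adjacency[i][j] - (sum(adjacency[i]) if i == j else 0) for j in range(n)]
         for i in range(n)]
    return expm_taylor([[2.0 * T * v for v in row] for row in L])


# Usage: two nodes joined by one edge, T = 0.5, so 2LT = L.
K = diffusion_kernel([[0, 1], [1, 0]], 0.5)
# Closed form for this graph: exp(L) = 0.5 * [[1 + e^-2, 1 - e^-2], [1 - e^-2, 1 + e^-2]].
print(K[0][0], 0.5 * (1 + math.exp(-2.0)))
```

Because every row of this $L$ sums to zero, every row of $K$ sums to one, which is a quick sanity check on the result.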
Labeled Graphs (Gärtner, 2002):
Insert a normalizing matrix $W$ in all equations.
Set $W_{ij} = 1$ if the two nodes have the same label.
For other weighting schemes, see (Kashima et al., 2003).
Conclusion
A new method to embed dynamical systems.
Analytical solutions to linear systems.
Many graph kernels are special cases.
Extensions to automata are possible.
Analytical solutions require cubic time.
Are better solutions possible for special cases?
Questions?
Shameless Plug
We are hiring at NICTA.
Please contact
S V N Vishwanathan ([email protected])
Alex Smola ([email protected])
