Hilbert Space Embeddings and Dynamical Systems

S.V.N. Vishwanathan
[email protected]
National ICT Australia, Australian National University and Indian Institute of Science
Joint work with Alex Smola

S.V.N. Vishwanathan: Hilbert Space Embeddings and Dynamical Systems, Page 1

Classification

Data: Pairs of observations $(x_i, y_i)$ generated from some distribution $P(x, y)$, e.g., (blood status, cancer), (credit transaction information, fraud), (sound profile of jet engine, defect).
Task: Predict $y$ given $x$ at a new location.
Modification: Find a function $f(x)$ that does the task.

Optimal Separating Hyperplane

Minimize $\frac{1}{2}\|w\|^2$ subject to $y_i(\langle w, x_i \rangle + b) \geq 1$ for all $i$.

Kernels and Nonlinearity

Problem: Linear functions are often too simple to provide good estimators.
Idea 1: Map to a higher dimensional feature space via $\Phi : x \to \Phi(x)$ and solve the problem there. Replace every $\langle x, x' \rangle$ by $\langle \Phi(x), \Phi(x') \rangle$.
Idea 2: Instead of computing $\Phi(x)$ explicitly, use a kernel function $k(x, x') := \langle \Phi(x), \Phi(x') \rangle$.
A large class of functions are admissible as kernels.
Non-vectorial data can be handled if we can compute meaningful $k(x, x')$.

The Basic Idea

Key Observation: Trajectories are easily observable. Similar trajectories ⇒ similar systems. Restrict attention to interesting cases.
Kernels Using Dynamical Systems: Simulate system for both inputs. Similar time evolution ⇒ similar inputs.
Kernels on Dynamical Systems: Restrict to interesting initial conditions. Simulate both the systems. Similar time evolution ⇒ similar systems.

Notation

$\mathcal{X}$ - state space (Hilbert space).
$\mathcal{A}$ - time evolution operators.
$\mathcal{T}$ - time of measurement.
$\mu$ - nice probability measure on $\mathcal{T}$.
Discounting Factors: For some $\lambda > 0$,
\[ \mu(t) = \lambda^{-1} e^{-\lambda t} \quad \text{for } \mathcal{T} = \mathbb{R}_0^+ \]
\[ \mu(t) = \frac{e^{-\lambda t}}{1 - e^{-\lambda}} \quad \text{for } \mathcal{T} = \mathbb{N}_0 \]
Time Evolution: We study $x_A(t) := A(t)x$ for $A \in \mathcal{A}$.

Trajectories and Kernels

Comparing Trajectories: Define a Hilbert space on $\mathcal{X}^{\mathcal{T}}$ via
\[ \langle \theta, \theta' \rangle := \mathbf{E}_{\mu}\left[\langle \theta(t), \theta'(t) \rangle\right] \quad \text{for } \theta, \theta' \in \mathcal{X}^{\mathcal{T}}. \]
Extending to Dynamical Systems: Identify a dynamical system with its trajectory and define
\[ k((x, A), (\tilde{x}, \tilde{A})) := \mathbf{E}_{\mu}\left[\langle A(t)x, \tilde{A}(t)\tilde{x} \rangle\right]. \]
Defining Distances: A Mercer kernel $k$ is a dot product in feature space. Distances can be defined as
\[ d((x, A), (\tilde{x}, \tilde{A}))^2 := k((x, A), (x, A)) + k((\tilde{x}, \tilde{A}), (\tilde{x}, \tilde{A})) - 2k((x, A), (\tilde{x}, \tilde{A})). \]

Special Cases

Kernels on Dynamical Systems: Restrict attention to $x = \tilde{x}$. Compare trajectories for identical initial conditions. Take the expectation if interested in a range of $x$:
\[ k(A, \tilde{A}) := \mathbf{E}_{x}\left[k((x, A), (x, \tilde{A}))\right]. \]
More generally,
\[ k(A, \tilde{A}) := \mathbf{E}_{A}\mathbf{E}_{\tilde{A}}\mathbf{E}_{x}\left[k((x, A), (x, \tilde{A}))\right]. \]
Kernels Using Dynamical Systems: Restrict attention to a particular dynamical system. As before, we can take expectations over $A$:
\[ k(x, \tilde{x}) := \mathbf{E}_{x}\mathbf{E}_{\tilde{x}}\mathbf{E}_{A}\left[k((x, A), (\tilde{x}, A))\right]. \]

Discrete Linear Systems

Linear Systems: We assume time propagation occurs as
\[ x_A(t + 1) = A x_A(t) + a_t + \xi_t. \]
For simplicity assume that $a_t = 0$ and hence
\[ x_A(t) = A^t x_0 + \sum_{i=0}^{t} A^{t-i} \xi_i. \]
Our Kernel: Using an exponential down-weighting scheme,
\[ k((A, x), (\tilde{A}, \tilde{x})) = \operatorname{tr}(C \tilde{M}) + x_0^\top M \tilde{x}_0. \]
$C$ is the covariance matrix of the random variables $\xi_t$.

Computing the Kernel

More on $M$ and $\tilde{M}$: The matrices $M$ and $\tilde{M}$ look like
\[ M := \sum_{t=0}^{\infty} e^{-\lambda t} (A^t)^\top \tilde{A}^t \quad \text{and} \quad \tilde{M} := \sum_{t=0}^{\infty} e^{-\lambda t} (A^t)^\top M \tilde{A}^t. \]
Sylvester Equation: Both $M$ and $\tilde{M}$ satisfy a Sylvester equation,
\[ e^{-\lambda} A^\top M \tilde{A} + \mathbf{1} = M \quad \text{and} \quad e^{-\lambda} A^\top \tilde{M} \tilde{A} + M = \tilde{M}, \]
where $\mathbf{1}$ denotes the identity matrix. Both can be solved in cubic time.
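The Sylvester equation for $M$ can be checked numerically on a small example. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, it covers only the initial-condition term $x_0^\top M \tilde{x}_0$ (the noise term is left out), and it assumes the discounted series converges, for instance when $e^{-\lambda}\rho(A)\rho(\tilde{A}) < 1$. For brevity it vectorizes the equation with a Kronecker product, which costs far more than cubic time; a cubic-time solver would use a Schur-based method instead.

```python
import numpy as np


def solve_discounted_sylvester(A, A_tilde, lam):
    """Solve exp(-lam) * A.T @ M @ A_tilde + I = M for M.

    Uses the row-major identity vec(P @ X @ Q) = kron(P, Q.T) @ vec(X)
    to turn the matrix equation into one dense linear system.
    Illustrative only: this is O(n^6), not the cubic-time route.
    """
    n = A.shape[0]
    # vec(A.T @ M @ A_tilde) = kron(A.T, A_tilde.T) @ vec(M)
    K = np.kron(A.T, A_tilde.T)
    lhs = np.eye(n * n) - np.exp(-lam) * K
    rhs = np.eye(n).reshape(-1)
    return np.linalg.solve(lhs, rhs).reshape(n, n)


def noise_free_kernel(x0, A, x0_tilde, A_tilde, lam):
    """Return x0^T M x0_tilde, the initial-condition term of the kernel."""
    M = solve_discounted_sylvester(A, A_tilde, lam)
    return float(x0 @ M @ x0_tilde)


def truncated_series_kernel(x0, A, x0_tilde, A_tilde, lam, steps=200):
    """Cross-check: sum over t < steps of exp(-lam*t) <A^t x0, A_tilde^t x0_tilde>."""
    total, x, y = 0.0, x0.copy(), x0_tilde.copy()
    for t in range(steps):
        total += np.exp(-lam * t) * float(x @ y)
        x, y = A @ x, A_tilde @ y
    return total


# Hypothetical example systems, chosen so that the series converges.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
A_tilde = np.array([[0.5, -0.2], [0.3, 0.7]])
x0 = np.array([1.0, -1.0])
x0_tilde = np.array([0.5, 2.0])
lam = 0.5

k_closed = noise_free_kernel(x0, A, x0_tilde, A_tilde, lam)
k_series = truncated_series_kernel(x0, A, x0_tilde, A_tilde, lam)
```

Here `k_closed` and `k_series` should agree up to truncation error, which illustrates that solving the Sylvester equation replaces the infinite sum over time steps.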

Continuous Linear Systems

Linear Systems: Consider systems with dynamics described by
\[ \frac{d}{dt} x_A(t) = A x_A(t) + a(t) + \xi(t). \]
Here $\xi(t)$ with $\mathbf{E}[\xi(t)] = 0$ is a stochastic process and
\[ x_A(t) = \exp(At) x_0 + \int_0^t \exp(A(t - \tau)) (a(\tau) + \xi(\tau)) \, d\tau. \]
Our Kernel: We consider $a(t) = \xi(t) = 0$ and define
\[ k((x, A), (\tilde{x}, \tilde{A})) = \lambda^{-1} \int_0^{\infty} e^{-\lambda t} \langle \exp(At) x_0, \exp(\tilde{A}t) \tilde{x}_0 \rangle \, dt, \]
which again can be cast as a Sylvester equation.

Graph Kernels

Diffusion Process: We can define a diffusion process by
\[ \frac{d}{dt} x(t) = L x(t). \]
Undirected Graphs (Kondor and Lafferty, 2002): Here $L$ is symmetric and hence yields $K = \exp(2LT)$.
Labeled Graphs (Gärtner, 2002): Insert a normalizing matrix $W$ in all equations. Set $W_{ij} = 1$ if two nodes have the same label. For other fancy weights see (Kashima et al., 2003).

Conclusion

A new method to embed dynamical systems.
Analytical solutions to linear systems.
Many graph kernels are special cases.
Extensions to automata are possible.
Analytical solutions require cubic time. Are better solutions possible for special cases?

Questions?

Shameless Plug

We are hiring at NICTA. Please contact
S V N Vishwanathan ([email protected])
Alex Smola ([email protected])
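
Returning to the Continuous Linear Systems slide: the slides state only that the continuous-time kernel can be cast as a Sylvester equation, without giving its form. One way to obtain such an equation, offered here as an assumption rather than the authors' derivation, is to set $M := \int_0^\infty e^{-\lambda t} \exp(At)^\top \exp(\tilde{A}t)\,dt$, differentiate the integrand, and integrate from $0$ to $\infty$. If the integrand decays to zero, this gives $A^\top M + M\tilde{A} - \lambda M = -\mathbf{1}$, and the kernel is $\lambda^{-1} x_0^\top M \tilde{x}_0$. The sketch below uses hypothetical function names and a Kronecker-product solve chosen for brevity, not for cubic-time efficiency.

```python
import numpy as np


def solve_continuous_sylvester(A, A_tilde, lam):
    """Solve A.T @ M + M @ A_tilde - lam * M = -I for M.

    Assumes the discounted integral converges (the integrand decays).
    Row-major vec identities used:
      vec(A.T @ M)     = kron(A.T, I) @ vec(M)
      vec(M @ A_tilde) = kron(I, A_tilde.T) @ vec(M)
    """
    n = A.shape[0]
    eye = np.eye(n)
    lhs = np.kron(A.T, eye) + np.kron(eye, A_tilde.T) - lam * np.eye(n * n)
    rhs = -eye.reshape(-1)
    return np.linalg.solve(lhs, rhs).reshape(n, n)


def continuous_kernel(x0, A, x0_tilde, A_tilde, lam):
    """Return (1 / lam) * x0^T M x0_tilde for the noise-free continuous system."""
    M = solve_continuous_sylvester(A, A_tilde, lam)
    return float(x0 @ M @ x0_tilde) / lam
```

For diagonal $A$ and $\tilde{A}$ the solution is diagonal with entries $1 / (\lambda - a_i - \tilde{a}_i)$, which gives a quick sanity check of the solver.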