Model-Free Stochastic Perturbative Adaptation and Optimization

Gert Cauwenberghs
Johns Hopkins University
[email protected]

520.776 Learning on Silicon
http://bach.ece.jhu.edu/gert/courses/776

OUTLINE
• Model-Free Learning
  – Model Complexity
  – Compensation of Analog VLSI Mismatch
• Stochastic Parallel Gradient Descent
  – Algorithmic Properties
  – Mixed-Signal Architecture
  – VLSI Implementation
• Extensions
  – Learning of Continuous-Time Dynamics
  – Reinforcement Learning
• Model-Free Adaptive Optics
  – AdOpt VLSI Controller
  – Adaptive Optics "Quality" Metrics
  – Applications to Laser Communication and Imaging

The Analog Computing Paradigm
• Local functions are efficiently implemented with minimal circuitry, exploiting the physics of the devices.
• Excessive global interconnects are avoided:
  – Currents or charges are accumulated along a single wire.
  – Voltage is distributed along a single wire.
• Pros:
  – Massive parallelism
  – Low power dissipation
  – Real-time, real-world interface
  – Continuous-time dynamics
• Cons:
  – Limited dynamic range
  – Mismatches and nonlinearities (WYDINWYG)

Effect of Implementation Mismatches
[Block diagram: a system with parameters {p_i} maps inputs to outputs; the error ε(p) is measured against a reference.]
• Associative element:
  – Mismatches can be properly compensated by adjusting the parameters p_i accordingly, provided sufficient degrees of freedom are available to do so.
• Adaptive element:
  – Requires precise implementation.
  – The accuracy of the implemented polarity (rather than amplitude) of the parameter update increments Δp_i is the performance-limiting factor.

Example: LMS Rule
A linear perceptron under supervised learning:
$$y_i^{(k)} = \sum_j p_{ij}\, x_j^{(k)}, \qquad \varepsilon = \tfrac{1}{2} \sum_k \sum_i \left( y_i^{\mathrm{target}(k)} - y_i^{(k)} \right)^2$$
with gradient descent
$$\Delta p_{ij}^{(k)} = -\eta\, \frac{\partial \varepsilon^{(k)}}{\partial p_{ij}} = \eta\, x_j^{(k)} \left( y_i^{\mathrm{target}(k)} - y_i^{(k)} \right)$$
reduces to an incremental outer-product update rule, with a scalable, modular implementation in analog VLSI (a software sketch of this update follows the gradient descent slide below).

Incremental Outer-Product Learning in Neural Nets
Multi-layer perceptron:
$$x_i = f\Big(\sum_j p_{ij}\, x_j\Big)$$
Outer-product learning update:
$$\Delta p_{ij} = \eta\, x_j\, e_i$$
– Hebbian (Hebb, 1949): $e_i = x_i$
– LMS rule (Widrow-Hoff, 1960): $e_i = f'_i \cdot \left( x_i^{\mathrm{target}} - x_i \right)$
– Backpropagation (Werbos, Rumelhart, LeCun): $e_j = f'_j \cdot \sum_i p_{ij}\, e_i$

Gradient Descent Learning
Minimize ε(p) by iterating
$$p_i^{(k+1)} = p_i^{(k)} - \eta\, \frac{\partial \varepsilon}{\partial p_i}^{(k)}$$
from calculation of the gradient
$$\frac{\partial \varepsilon}{\partial p_i} = \sum_l \sum_m \frac{\partial \varepsilon}{\partial y_l} \cdot \frac{\partial y_l}{\partial x_m} \cdot \frac{\partial x_m}{\partial p_i}$$
Implementation problems:
– Requires an explicit model of the internal network dynamics.
– Sensitive to model mismatches and noise in the implemented network and learning system.
– The amount of computation typically scales strongly with the number of parameters.
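The LMS example above lends itself to a compact simulation. The following is a minimal numpy sketch of the outer-product update; the layer sizes, learning rate, and synthetic data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and learning rate (not from the slides).
n_in, n_out, eta = 8, 4, 0.05
P = np.zeros((n_out, n_in))              # learned parameters p_ij
P_true = rng.normal(size=(n_out, n_in))  # unknown mapping generating targets

for k in range(2000):
    x = rng.normal(size=n_in)   # input pattern x^(k)
    y_target = P_true @ x       # supervised target y_i^target(k)
    y = P @ x                   # linear perceptron output y_i^(k)
    # LMS outer-product update: delta p_ij = eta * (y_target_i - y_i) * x_j
    P += eta * np.outer(y_target - y, x)

print("max parameter error:", np.abs(P - P_true).max())
```

Note that each weight update uses only locally available quantities (x_j and e_i), which is what makes the rule modular in analog VLSI.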
Gradient-Free Approach to Error-Descent Learning
Avoid the model sensitivity of gradient descent by observing the parameter dependence of the performance error directly on the network, rather than calculating gradient information from a pre-assumed model of the network.
• Stochastic approximation:
  – Multi-dimensional Kiefer-Wolfowitz (Kushner & Clark, 1978)
  – Function smoothing global optimization (Styblinski & Tang, 1990)
  – Simultaneous perturbation stochastic approximation (Spall, 1992)
• Hardware-related variants:
  – Model-free distributed learning (Dembo & Kailath, 1990)
  – Noise injection and correlation (Anderson & Kerns; Kirk et al., 1992-93)
  – Stochastic error descent (Cauwenberghs, 1993)
  – Constant perturbation, random sign (Alspector et al., 1993)
  – Summed weight neuron perturbation (Flower & Jabri, 1993)

Stochastic Error-Descent Learning
Minimize ε(p) by iterating
$$\mathbf{p}^{(k+1)} = \mathbf{p}^{(k)} - \mu\, \hat{\varepsilon}^{(k)}\, \boldsymbol{\pi}^{(k)}$$
from observation of the gradient in the direction of π^(k):
$$\hat{\varepsilon}^{(k)} = \tfrac{1}{2} \left[ \varepsilon\big(\mathbf{p}^{(k)} + \boldsymbol{\pi}^{(k)}\big) - \varepsilon\big(\mathbf{p}^{(k)} - \boldsymbol{\pi}^{(k)}\big) \right]$$
with random, uncorrelated binary components of the perturbation vector π^(k):
$$\pi_i^{(k)} = \pm\sigma; \qquad E\big(\pi_i^{(k)} \pi_j^{(l)}\big) \approx \sigma^2\, \delta_{ij}\, \delta_{kl}$$
Advantages:
– No explicit model knowledge is required.
– Robust in the presence of noise and model mismatches.
– The computational load is significantly reduced.
– Allows a simple, modular, and scalable implementation.
– Convergence properties are similar to those of exact gradient descent.

Stochastic Perturbative Learning Cell Architecture
[Block diagram: each local cell holds the stored parameter p_i(t) in a register (z⁻¹), applies the perturbed value p_i(t) + φ(t)π_i(t) to the network, and accumulates the update −η ε̂(t) π_i(t); the error observation ε(p(t) + φ(t)π(t)) is the only global signal.]
$$\mathbf{p}^{(k+1)} = \mathbf{p}^{(k)} - \mu\, \hat{\varepsilon}^{(k)}\, \boldsymbol{\pi}^{(k)}, \qquad \hat{\varepsilon}^{(k)} = \tfrac{1}{2} \left[ \varepsilon\big(\mathbf{p}^{(k)} + \boldsymbol{\pi}^{(k)}\big) - \varepsilon\big(\mathbf{p}^{(k)} - \boldsymbol{\pi}^{(k)}\big) \right]$$

Stochastic Perturbative Learning Circuit Cell
[Circuit schematic: the perturbed parameter p_i(t) + φ(t)π_i(t) is formed on a perturbation capacitor C_perturb; a charge pump gated by enable signals EN_p/EN_n, bias voltages V_bp/V_bn, and the update polarity POL = sign(ε̂) increments or decrements the weight voltage stored on C_store.]

Charge Pump Characteristics
[Measured data: voltage increments and decrements per update versus gate voltages V_bp and V_bn (0 to 0.6 V), for pulse widths Δt = 23 µs, 1 ms, and 40 ms; the programmable step size spans roughly 10⁻⁵ V to 10⁰ V.]

Supervised Learning of Recurrent Neural Dynamics
[Chip architecture: a fully connected six-neuron recurrent network with weights W_11 … W_66, thresholds θ_1 … θ_6, and offset weights W_off; binary quantization Q(·) of the states, teacher forcing of x_1(t) and x_2(t), and horizontal/vertical perturbation lines π for update activation and probe multiplexing.]
The network learns a continuous-time dynamical system
$$\frac{d\mathbf{x}}{dt} = F(\mathbf{p}, \mathbf{x}, \mathbf{y}), \qquad \mathbf{z} = G(\mathbf{x})$$

The Credit Assignment Problem, or How to Learn from Delayed Rewards
[Block diagram: the system with parameters {p_i} maps inputs to outputs; an adaptive critic transforms the external reinforcement r(t) into an internal reinforcement r*(t).]
– External, discontinuous reinforcement signal r(t).
– Adaptive critics:
  • Discrimination learning (Grossberg, 1975)
  • Heuristic dynamic programming (Werbos, 1977)
  • Reinforcement learning (Sutton and Barto, 1983)
  • TD(λ) (Sutton, 1988)
  • Q-learning (Watkins, 1989)

Reinforcement Learning (Barto and Sutton, 1983)
Locally tuned, address-encoded neurons:
$$\chi(t) \in \{0, \ldots, 2^n - 1\}: \text{n-bit address encoding of the state space}$$
$$y(t) = y_{\chi(t)}: \text{classifier output}, \qquad q(t) = q_{\chi(t)}: \text{adaptive critic}$$
Adaptation of classifier and adaptive critic:
$$y_k(t+1) = y_k(t) + \alpha\, \hat{r}(t)\, e_k(t)\, y_k(t)$$
$$q_k(t+1) = q_k(t) + \beta\, \hat{r}(t)\, e_k(t)$$
– eligibilities: $e_k(t+1) = \lambda\, e_k(t) + (1 - \lambda)\, \delta_{k,\chi(t)}$
– internal reinforcement: $\hat{r}(t) = r(t) + \gamma\, q(t) - q(t-1)$
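As a software reference for the update equations above, here is a minimal sketch of the classifier/critic adaptation with eligibility traces. The constants, the random state sequence, and the sparse reward signal are illustrative assumptions (only the 64-neuron count matches the chip described next); this is not the chip's actual protocol.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative constants (not from the slides), 64 states as on the chip.
n_states, alpha, beta, lam, gamma = 64, 0.1, 0.2, 0.8, 0.95

y = rng.uniform(-0.1, 0.1, n_states)  # classifier outputs y_k
q = np.zeros(n_states)                # adaptive critic values q_k
e = np.zeros(n_states)                # eligibilities e_k
q_prev = 0.0                          # critic value at t-1

for t in range(200):
    state = rng.integers(n_states)    # address-encoded state chi(t)
    r = float(rng.random() < 0.05)    # sparse external reinforcement r(t)
    # internal reinforcement: r_hat(t) = r(t) + gamma*q(t) - q(t-1)
    r_hat = r + gamma * q[state] - q_prev
    q_prev = q[state]
    # classifier and critic updates, gated by the eligibilities
    y += alpha * r_hat * e * y
    q += beta * r_hat * e
    # eligibility traces: e_k(t+1) = lam*e_k(t) + (1-lam)*delta_{k,chi(t)}
    e *= lam
    e[state] += 1.0 - lam
    action = np.sign(y[state])  # binary action y = +-1 (drives the plant)

print("critic values (first 8 states):", np.round(q[:8], 3))
```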
Reinforcement Learning Classifier for Binary Control
[Chip architecture: an array of 64 reinforcement learning neurons, each holding a state eligibility e_k, an adaptive critic value q_k, and an action value y_k; horizontal/vertical select lines (SEL_hor, SEL_vert) address the neuron corresponding to the quantized state (x_1(t), x_2(t)), and the action network produces the binary control y = ±1 driving u(t).]

A Biological Adaptive Optics System
[Anatomy of the human eye: cornea, iris, lens, zonule fibers, retina, optic nerve, and brain together form a closed-loop adaptive optical system.]

Wavefront Distortion and Adaptive Optics
• Imaging:
  – defocus
  – motion
• Laser beam:
  – beam wander/spread
  – intensity fluctuations

Adaptive Optics: Conventional Approach
– Performs phase conjugation
  • assumes the intensity is unaffected
– Complex
  • requires an accurate wavefront phase sensor (Shack-Hartmann; Zernike nonlinear filter; etc.)
  • computationally intensive control system

Adaptive Optics: Model-Free Integrated Approach
[Diagram: an incoming wavefront impinges on a wavefront corrector with N elements u_1, …, u_n, …, u_N.]
– Optimizes a direct measure J of optical performance (a "quality metric")
– No (explicit) model information is required
  • any type of quality metric J and wavefront corrector (MEMS, LC, …)
  • no need for a wavefront phase sensor
– Tolerates imprecision in the implementation of the updates
  • system-level precision is limited only by the accuracy of the measured J

Adaptive Optics Controller Chip: Optimization by Parallel Perturbative Stochastic Gradient Descent
[Control loop, shown in three build-up slides: the wavefront Φ(u) passes through the wavefront corrector onto a performance metric sensor reporting J(u); the AdOpt VLSI wavefront controller applies u + δu and measures J(u + δu), applies u − δu and measures J(u − δu), then updates each channel by u_j(k+1) = u_j(k) + γ δJ(k) δu_j(k).]

Parallel Perturbative Stochastic Gradient Descent Architecture
[Architecture: a generator supplies uncorrelated three-level ({−1, 0, +1}) perturbations δu_j; a register (z⁻¹) holds each u_j, and the global product γ δJ scales every stored perturbation.]
Uncorrelated perturbations: $\langle \delta u_i\, \delta u_j \rangle = \sigma^2\, \delta_{ij}$
$$u_j^{(k+1)} = u_j^{(k)} + \gamma\, \delta J^{(k)}\, \delta u_j^{(k)}$$
$$\delta J^{(k)} = J\big(u_1^{(k)} + \delta u_1^{(k)}, \ldots, u_N^{(k)} + \delta u_N^{(k)}\big) - J\big(u_1^{(k)} - \delta u_1^{(k)}, \ldots, u_N^{(k)} - \delta u_N^{(k)}\big)$$

Parallel Perturbative Stochastic Gradient Descent: Mixed-Signal Architecture
The update factors into a digital polarity and an analog amplitude, with Bernoulli perturbations δu_j = ±1 and the sign sgn(δJ) handled digitally:
$$u_j^{(k+1)} = u_j^{(k)} + \gamma\, \underbrace{\big|\delta J^{(k)}\big|}_{\text{amplitude (analog)}}\; \underbrace{\mathrm{sgn}\big(\delta J^{(k)}\big)\, \delta u_j^{(k)}}_{\text{polarity (digital)}}$$

Wavefront Controller: VLSI Implementation
Edwards, Cohen, Cauwenberghs, Vorontsov & Carhart (1999)
• Generate Bernoulli-distributed $\delta u_j^{(k)}$ with $\big|\delta u_j^{(k)}\big| = \sigma$ and $\mathrm{sgn}\big(\delta u_j^{(k)}\big) = \pi_j^{(k)} = \pm 1$
• Decompose $\delta J^{(k)}$ into $\mathrm{sgn}\big(\delta J^{(k)}\big) \cdot \big|\delta J^{(k)}\big|$
• Update $u_j^{(k+1)} = u_j^{(k)} - \gamma'\, \mathrm{xor}\big(\mathrm{sgn}(\delta J^{(k)}),\, \pi_j^{(k)}\big)$
AdOpt mixed-mode chip:
– 2.2 mm², 1.2 µm CMOS
– controls 19 channels
– interfaces with LC SLM or MEMS mirrors

Wavefront Controller: Chip-Level Characterization
[Measured chip-level transfer characteristics.]
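A minimal simulation of the parallel perturbative update is sketched below. The 19-channel count matches the AdOpt chip, but the quality metric J is a toy stand-in (a quadratic peak at a hypothetical optimum u_opt), and the step sizes are chosen only so the toy problem converges quickly; none of this models the optics.

```python
import numpy as np

rng = np.random.default_rng(2)

# 19 channels as in the AdOpt chip; sigma, gamma, and J are toy stand-ins.
N, sigma, gamma = 19, 0.1, 0.5
u_opt = rng.uniform(-1.0, 1.0, N)  # hypothetical optimal corrector setting

def J(u):
    """Toy scalar quality metric, maximal at u_opt (not an optical model)."""
    return -np.sum((u - u_opt) ** 2)

u = np.zeros(N)
for k in range(2000):
    pi = rng.choice([-1.0, 1.0], size=N)  # Bernoulli polarities pi_j
    du = sigma * pi                       # perturbations |du_j| = sigma
    dJ = J(u + du) - J(u - du)            # differential metric measurement
    # Full update u_j += gamma * dJ * du_j; the chip realizes the polarity
    # part as xor(sgn(dJ), pi_j) and scales by the amplitude |dJ|.
    u += gamma * dJ * du

print("metric before/after:", J(np.zeros(N)), J(u))
```

Note that only the single scalar dJ is broadcast to all channels per iteration, which is why the architecture scales to many corrector elements.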
Wavefront Controller: System-Level Characterization (LC SLM)
[Experimental setup with a HEX-127 liquid-crystal spatial light modulator.]

Wavefront Controller: System-Level Characterization (SLM)
[Convergence traces: J(u) maximized and minimized 100 times; the max and min trajectories bracket the achievable metric range.]

Wavefront Controller: System-Level Characterization (MEMS)
Weyrauch, Vorontsov, Bifano, Hammer, Cohen & Cauwenberghs
[Response function of the MEMS deformable mirror (OKO).]

Wavefront Controller: System-Level Characterization
[Convergence statistics over 500 trials.]

Quality Metrics
– imaging: $J_{\text{image}} = \int \left| \nabla I(\mathbf{r}, t) \right|^{\upsilon} d^2\mathbf{r}$ — Horn (1968), Delbruck (1999)
– laser beam focusing: $J_{\text{beam}} = F\{I(\mathbf{r}, t)\}$, a functional of the beam intensity profile — Muller et al. (1974), Vorontsov et al. (1996)
– laser comm: $J_{\text{comm}}$ = bit-error rate

Image Quality Metric Chip
Cohen, Cauwenberghs, Vorontsov & Carhart (2001)
$$\mathrm{IQM} = \frac{\sum_{i,j}^{N,M} \big| (I \ast K)_{i,j} \big|}{\sum_{i,j}^{N,M} I_{i,j}}, \qquad K = \begin{bmatrix} 0 & -1 & 0 \\ -1 & +4 & -1 \\ 0 & -1 & 0 \end{bmatrix}$$
where I ∗ K is the edge image of the input image I.
– 2 mm die, 22 × 22 array of 120 µm × 120 µm pixels

Architecture: (3 × 3) Edge and Corner Kernels
[Pixel-level architecture implementing the 3 × 3 Laplacian kernel (+4 center, −1 edge neighbors) by summation over nearest-neighbor pixels.]

Image Quality Metric Chip Characterization
[Experimental setup and measured results: the metric spans a dynamic range of roughly 20 between a focused image and uniform illumination.]

Beam Variance Metric Chip
Cohen, Cauwenberghs, Vorontsov & Carhart (2001)
$$\mathrm{BVM} = N \cdot M \cdot \frac{\sum_{i,j}^{N,M} I_{i,j}^2}{\Big( \sum_{i,j}^{N,M} I_{i,j} \Big)^2}$$
– 22 × 22 array (20 × 20 active pixels plus a perimeter of dummy pixels), 70 µm pixel
[Circuit analysis: the implemented metric deviates from the ideal BVM through a transistor nonideality exponent κ_u in the current-squaring circuit.]

Beam Variance Metric Chip Characterization
[Experimental setup and measured results: the metric spans a dynamic range of roughly 5.]

Beam Variance Metric Sensor in the Loop
[Laser receiver setup with the BVM sensor closing the adaptive optics control loop, and measured closed-loop results.]
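For intuition, here is a software analogue of the two metric chips. The IQM normalization by total intensity follows the reconstructed formula above and should be read as an assumption; the 22 × 22 Gaussian test spots are synthetic.

```python
import numpy as np

def iqm(I):
    """Image quality metric: sum of the absolute Laplacian (edge) image,
    normalized by total intensity (normalization is an assumption)."""
    lap = (4 * I[1:-1, 1:-1] - I[:-2, 1:-1] - I[2:, 1:-1]
           - I[1:-1, :-2] - I[1:-1, 2:])
    return np.abs(lap).sum() / I.sum()

def bvm(I):
    """Beam variance metric: N*M * sum(I^2) / (sum(I))^2; equals 1 for
    uniform illumination and grows as the beam concentrates."""
    return I.size * np.square(I).sum() / I.sum() ** 2

# Sharper focus raises both metrics: compare a tight Gaussian spot
# against a defocused (broader) one on a 22x22 grid.
y, x = np.mgrid[:22, :22] - 10.5
for w in (2.0, 6.0):
    I = np.exp(-(x**2 + y**2) / (2 * w**2))
    print(f"width {w}: IQM = {iqm(I):.3f}  BVM = {bvm(I):.2f}")
```

Both functions reduce a full image to a single scalar, which is exactly what the perturbative controller needs as its measured J.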
Conclusions
• Computational primitives of adaptation and learning are naturally implemented in analog VLSI, and make it possible to compensate for inaccuracies in the physical implementation of the system under adaptation.
• Care should still be taken to avoid inaccuracies in the implementation of the adaptive element itself. This is readily achieved by ensuring the correct polarity, rather than amplitude, of the parameter update increments.
• Adaptation algorithms based on physical observation of the "performance" gradient in parameter space are better suited to analog VLSI implementation than algorithms based on a calculated gradient.
• Among the most generally applicable learning architectures are those that operate on reinforcement signals, and those that blindly extract and classify signals.
• Model-free adaptive optics leads to an efficient and robust analog implementation of the control algorithm, using a performance criterion that can be freely chosen to accommodate different wavefront correctors and different imaging or laser communication applications.