Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology

Mark Hasegawa-Johnson, [email protected]
University of Illinois at Urbana-Champaign, USA

Lecture 4: Hyperplanes, Perceptrons, and Kernel-Based Classifiers

• Definition: Hyperplane Classifier
• Minimum Classification Error Training Methods
– Empirical risk
– Differentiable estimates of the 0-1 loss function
– Error backpropagation
• Kernel Methods
– Nonparametric expression of a hyperplane
– Mathematical properties of a dot product
– Kernel-based classifier
– The implied high-dimensional space
– Error backpropagation for a kernel-based classifier
• Useful kernels
– Polynomial kernel
– RBF kernel

Classifier Terminology

Hyperplane Classifier
[Figure: two classes of points, the normal vector w, and the class boundary ("separatrix"): the plane w^T x = b, at distance b from the origin (x = 0).]

Loss, Risk, and Empirical Risk

Empirical Risk with 0-1 Loss Function = Error Rate on Training Data

Differentiable Approximations of the 0-1 Loss Function: Hinge Loss

Differentiable Empirical Risks

Error Backpropagation: Hyperplane Classifier with Sigmoidal Loss

Sigmoidal Classifier = Hyperplane Classifier with Fuzzy Boundaries
[Figure: the same two point clouds, with the hard boundary replaced by a gradual transition: more red, less red, less blue, more blue.]

Error Backpropagation: Sigmoidal Classifier with Absolute Loss

Sigmoidal Classifier: Signal Flow Diagram
[Figure: inputs x1, x2, x3, scaled by connection weights w1, w2, w3 and summed to form the sigmoid input g(x), which yields the hypothesis h(x).]

Multilayer Perceptron
[Figure: signal flow diagram of a two-layer network. The input h0(x) ≡ x feeds, through first-layer connection weights and biases b11, b12, b13, the sigmoid inputs g1(x) and outputs h1(x); these feed, through second-layer connection weights and bias b21, the sigmoid input g2(x) and the hypothesis h2(x).]

Multilayer Perceptron: Classification Equations

Error Backpropagation for a Multilayer Perceptron

Classification Power of a One-Layer Perceptron

Classification Power of a Two-Layer Perceptron

Classification Power of a Three-Layer Perceptron

Output of Multilayer Perceptron is an Approximation of Posterior Probability

Kernel-Based Classifiers

Representation of Hyperplane in Terms of Arbitrary Vectors

Kernel-Based Classifier

Error Backpropagation for a Kernel-Based Classifier

The Implied High-Dimensional Space

Some Useful Kernels

Polynomial Kernel

Polynomial Kernel: Separatrix (Boundary Between Two Classes) is a Polynomial Surface

Classification Boundaries Available from a Polynomial Kernel (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)

Implied Higher-Dimensional Space has a Dimension of K^d

The Radial Basis Function (RBF) Kernel

RBF Classifier Can Represent Any Classifier Boundary (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)
[Figure: two RBF decision boundaries compared.]
– More training corpus errors, smoother boundary
– Fewer training corpus errors, wigglier boundary
In these figures, C was adjusted, not γ, but a similar effect can be achieved by setting N << M and adjusting γ.
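To tie the kernel slides together, here is a minimal NumPy sketch of an RBF kernel classifier, h(x) = sign(Σ_n α_n K(x_n, x) − b) with K(x_n, x) = exp(−γ ||x_n − x||²). It is illustrative, not the lecture's code: all function names, the choice of logistic loss, and the hyperparameter values are assumptions. N centers are drawn from M > N training samples, and α, b are fit by gradient descent on a differentiable loss, as in the error-backpropagation slides.

```python
import numpy as np

def rbf_kernel(centers, X, gamma):
    """Kmat[n, m] = exp(-gamma * ||centers[n] - X[m]||^2)."""
    sq_dists = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

def train_rbf_classifier(X, y, n_centers=10, gamma=1.0, lr=0.1,
                         n_iters=2000, seed=0):
    """Fit h(x) = sum_n alpha_n K(x_n, x) - b by gradient descent on a
    logistic loss.  X: (M, K) training data; y: (M,) labels in {-1, +1}.
    With N = n_centers << M, gamma adjusts the separatrix's wiggliness."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_centers, replace=False)]  # N << M
    Kmat = rbf_kernel(centers, X, gamma)                       # (N, M)
    alpha, b = np.zeros(n_centers), 0.0
    for _ in range(n_iters):
        g = alpha @ Kmat - b               # discriminant g(x_m) per sample
        p = 1.0 / (1.0 + np.exp(-y * g))   # probability of the correct label
        dg = -y * (1.0 - p) / len(y)       # gradient of mean -log(p) w.r.t. g
        alpha -= lr * (Kmat @ dg)
        b -= lr * (-dg.sum())
    return centers, alpha, b

def predict(x, centers, alpha, b, gamma):
    """Classify one sample (or a batch of rows) as -1 or +1."""
    return np.sign(alpha @ rbf_kernel(centers, np.atleast_2d(x), gamma) - b)
```

Sweeping γ in this sketch reproduces the trade-off in the Hastie et al. figures: small γ gives wide basis functions and a smoother boundary with more training errors, while large γ lets the separatrix wiggle around individual centers.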
If N < M, γ Can Adjust Boundary Smoothness

Summary

• Classifier definitions
– Classifier = a function from x into y
– Loss = the cost of a mistake
– Risk = the expected loss
– Empirical risk = the average loss on training data
• Multilayer Perceptrons
– Sigmoidal classifier is similar to a hyperplane classifier with sigmoidal loss function
– Train using error backpropagation (see the sketch after this summary)
– With two hidden layers, can model any boundary (the MLP is a "universal approximator")
– MLP output is an estimate of p(y|x)
• Kernel Classifiers
– Equivalent to: (1) project into f(x), (2) apply a hyperplane classifier
– Polynomial kernel: separatrix is a polynomial surface of order d
– RBF kernel: separatrix can be any surface (the RBF classifier is also a "universal approximator")
– RBF kernel: if N < M, γ can adjust the "wiggliness" of the separatrix
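To make the error-backpropagation recipe in the summary concrete, the sketch below trains the sigmoidal classifier h(x) = σ(w^T x − b) by gradient descent on a half-squared-error loss. It is again illustrative, with assumed names and hyperparameters; applying the same chain-rule step layer by layer yields backpropagation for the full MLP.

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

def train_sigmoidal_classifier(X, y, lr=0.5, n_iters=5000):
    """Hyperplane classifier with fuzzy boundaries: h(x) = sigmoid(w.x - b).
    X: (M, K) training data; y: (M,) targets in {0, 1}.
    Gradient descent on the mean of 0.5 * (h - y)^2, i.e. error
    backpropagation for a one-layer perceptron."""
    M, K = X.shape
    w, b = np.zeros(K), 0.0
    for _ in range(n_iters):
        g = X @ w - b                    # sigmoid input g(x)
        h = sigmoid(g)                   # hypothesis h(x)
        # backpropagate: dE/dg = (h - y) * h * (1 - h), averaged over data
        delta = (h - y) * h * (1.0 - h) / M
        w -= lr * (X.T @ delta)
        b -= lr * (-delta.sum())
    return w, b
```

Swapping in the absolute or hinge loss only changes the delta term; the rest of the update is unchanged.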