Lecture 4

Landmark-Based Speech
Recognition:
Spectrogram Reading,
Support Vector Machines,
Dynamic Bayesian Networks,
and Phonology
Mark Hasegawa-Johnson
[email protected]
University of Illinois at Urbana-Champaign, USA
Lecture 4: Hyperplanes, Perceptrons,
and Kernel-Based Classifiers
• Definition: Hyperplane Classifier
• Minimum Classification Error Training Methods
– Empirical risk
– Differentiable estimates of the 0-1 loss function
– Error backpropagation
• Kernel Methods
– Nonparametric expression of a hyperplane
– Mathematical properties of a dot product
– Kernel-based classifier
– The implied high-dimensional space
– Error backpropagation for a kernel-based classifier
• Useful kernels
– Polynomial kernel
– RBF kernel
Classifier Terminology
Hyperplane Classifier
[Figure: a scatter of training points x. The class boundary (“separatrix”) is the plane wᵀx = b, drawn with its normal vector w; it lies at distance b from the origin (x = 0).]
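The slide's equation did not survive extraction; a minimal reconstruction from the figure's notation (assuming the label convention y ∈ {−1, +1}):

```latex
h(x) = \mathrm{sign}\big(w^T x - b\big) =
\begin{cases}
  +1, & w^T x - b > 0 \\
  -1, & w^T x - b < 0
\end{cases}
```

Points on the side of the separatrix that w points toward are labeled +1; points on the other side, −1.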
Loss, Risk, and Empirical Risk
Empirical Risk with 0-1 Loss Function
= Error Rate on Training Data
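The definitions these two slides depend on were lost in extraction; the standard forms are below (M training tokens (x_i, y_i) is my notation, chosen to match the N<<M discussion later in the lecture):

```latex
R(h) = E\big[\ell(y, h(x))\big],
\qquad
\hat{R}(h) = \frac{1}{M}\sum_{i=1}^{M} \ell\big(y_i, h(x_i)\big)
```

With the 0-1 loss, the empirical risk is exactly the training error rate:

```latex
\ell_{01}\big(y, h(x)\big) =
\begin{cases} 0, & h(x) = y \\ 1, & h(x) \neq y \end{cases}
\qquad\Rightarrow\qquad
\hat{R}_{01}(h) = \frac{\#\{i : h(x_i) \neq y_i\}}{M}
```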
Differentiable Approximations of the
0-1 Loss Function: Hinge Loss
Differentiable Empirical Risks
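A reconstruction of the hinge loss and the empirical risk built from it (standard forms, with g(x) = wᵀx − b as above and y ∈ {−1, +1}):

```latex
\ell_{\mathrm{hinge}}\big(y, g(x)\big) = \max\big(0,\; 1 - y\,g(x)\big),
\qquad
\hat{R}_{\mathrm{hinge}}(w, b) = \frac{1}{M}\sum_{i=1}^{M}
  \max\big(0,\; 1 - y_i\,(w^T x_i - b)\big)
```

Unlike the 0-1 loss, this is differentiable almost everywhere, so it can be minimized by gradient descent.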
Error Backpropagation: Hyperplane
Classifier with Sigmoidal Loss
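A numpy sketch of the resulting update rule. The sigmoidal-loss form ℓᵢ = σ(−yᵢ g(xᵢ)), the learning rate, and the epoch count are assumptions for illustration, not the slide's exact derivation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_hyperplane(X, y, eta=0.1, epochs=500):
    """Gradient descent on g(x) = w.x - b with sigmoidal loss
    loss_i = sigmoid(-y_i * g(x_i)),  y_i in {-1, +1}."""
    M, K = X.shape
    w, b = np.zeros(K), 0.0
    for _ in range(epochs):
        g = X @ w - b                  # signed distance to the boundary
        s = sigmoid(-y * g)            # per-token loss values
        dg = -y * s * (1.0 - s)        # d(loss_i)/d(g_i), via the chain rule
        w -= eta * (X.T @ dg) / M      # dR/dw = mean_i dg_i * x_i
        b -= eta * (-dg.mean())        # dR/db = mean_i dg_i * (-1)
    return w, b
```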
Sigmoidal Classifier = Hyperplane
Classifier with Fuzzy Boundaries
[Figure: the same scatter of points, now shaded continuously rather than split by a hard boundary: More Red fading to Less Red on one side of the separatrix, Less Blue deepening to More Blue on the other.]
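A reconstruction of the fuzzy-boundary classifier (the standard logistic form):

```latex
h(x) = \sigma\big(w^T x - b\big) = \frac{1}{1 + e^{-(w^T x - b)}}
```

h(x) approaches 1 deep inside one class, approaches 0 deep inside the other, and equals 1/2 exactly on the separatrix.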
Error Backpropagation: Sigmoidal
Classifier with Absolute Loss
Sigmoidal Classifier: Signal Flow
Diagram
[Signal flow diagram: inputs x1, x2, x3 are scaled by connection weights w1, w2, w3 and summed (+) to form the sigmoid input g(x); the sigmoid's output is the hypothesis h(x).]
Multilayer Perceptron
[Network diagram: the input h0(x) ≡ x (components x1, x2, x3) feeds through first-layer connection weights w1 and biases b11, b12, b13 into the sigmoid inputs g1(x) and sigmoid outputs h1(x); these feed through second-layer connection weights w2 and bias b21 into the sigmoid input g2(x) and the hypothesis h2(x).]
Multilayer Perceptron:
Classification Equations
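The classification equations were lost in extraction; the standard layer-by-layer form matching the diagram's notation (W_k and b_k denote the k-th layer's weights and biases):

```latex
h_0(x) \equiv x, \qquad
g_k(x) = W_k\, h_{k-1}(x) + b_k, \qquad
h_k(x) = \sigma\big(g_k(x)\big), \quad k = 1, 2, \ldots
```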
Error Backpropagation for a
Multilayer Perceptron
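A minimal numpy sketch of backpropagation for the two-layer network above. The squared-error loss, full-batch updates, and hyperparameters are my assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, hidden=3, eta=0.5, epochs=5000, seed=0):
    """Two-layer MLP trained by error backpropagation.
    Loss: E = mean_i (h2(x_i) - y_i)^2 / 2, with y_i in {0, 1}."""
    rng = np.random.default_rng(seed)
    M, K = X.shape
    W1 = rng.normal(0, 1, (K, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1, (hidden, 1)); b2 = np.zeros(1)
    Y = y.reshape(-1, 1)
    for _ in range(epochs):
        # forward pass: h0 = x, then sigmoid layers
        h1 = sigmoid(X @ W1 + b1)          # first-layer outputs
        h2 = sigmoid(h1 @ W2 + b2)         # hypothesis
        # backward pass: propagate dE/dg from the output layer inward
        dg2 = (h2 - Y) * h2 * (1 - h2) / M
        dg1 = (dg2 @ W2.T) * h1 * (1 - h1)
        W2 -= eta * h1.T @ dg2; b2 -= eta * dg2.sum(axis=0)
        W1 -= eta * X.T @ dg1;  b1 -= eta * dg1.sum(axis=0)
    return W1, b1, W2, b2
```

For example, trained on the four XOR points with y = (0, 1, 1, 0), the network typically learns a boundary no one-layer perceptron can represent.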
Classification Power of a One-Layer Perceptron
Classification Power of a Two-Layer Perceptron
Classification Power of a Three-Layer Perceptron
Output of Multilayer Perceptron is an
Approximation of Posterior Probability
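A one-line version of the argument (a standard result, stated here under the assumption of squared-error training with targets y ∈ {0, 1}): the minimizer of the expected squared error is the conditional mean, which for a binary target is the posterior probability:

```latex
h^{*}(x) = \arg\min_{h} E\big[(y - h(x))^2\big] = E[y \mid x] = p(y = 1 \mid x)
```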
Kernel-Based
Classifiers
Representation of Hyperplane in
terms of Arbitrary Vectors
Kernel-based Classifier
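A reconstruction of the equations behind these two slides: write the normal vector as a weighted sum of N reference vectors, so the classifier touches the data only through dot products:

```latex
w = \sum_{n=1}^{N} a_n x_n
\qquad\Rightarrow\qquad
w^T x - b = \sum_{n=1}^{N} a_n\, x_n^T x - b
```

Replacing the dot product xₙᵀx with a kernel K(xₙ, x) gives the kernel-based classifier:

```latex
h(x) = \sigma\!\left( \sum_{n=1}^{N} a_n\, K(x_n, x) - b \right)
```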
Error Backpropagation for a Kernel-Based Classifier
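Training is the same gradient descent as before; the only new derivative is with respect to the coefficients a_n, where the kernel values play the role the raw inputs played earlier:

```latex
\frac{\partial h(x)}{\partial a_n} = h(x)\big(1 - h(x)\big)\, K(x_n, x)
```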
The Implied High-Dimensional
Space
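The defining property (Mercer's condition guarantees such a map exists): a kernel is a dot product in an implied, usually much higher-dimensional, feature space, so the kernel classifier is just a hyperplane classifier in that space:

```latex
K(x, z) = \varphi(x)^T \varphi(z)
```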
Some Useful Kernels
Polynomial Kernel
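The usual form, for input dimension K and polynomial order d (the additive constant is one common convention; some versions use (xᵀz)^d):

```latex
K(x, z) = \big(1 + x^T z\big)^{d}
```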
Polynomial Kernel: Separatrix
(Boundary Between Two Classes)
is a Polynomial Surface
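Using the kernel classifier form above, the boundary is the zero set of g(x); each term of the sum expands into monomials of x of degree at most d, so the separatrix is a polynomial surface of order d:

```latex
\sum_{n=1}^{N} a_n \big(1 + x_n^T x\big)^{d} - b = 0
```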
Classification Boundaries Available
from a Polynomial Kernel
(Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)
Implied Higher-Dimensional Space
has a Dimension of K^d
The Radial Basis Function (RBF)
Kernel
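The standard form, with γ controlling the kernel width (some texts parameterize γ = 1/(2σ²)):

```latex
K(x, z) = e^{-\gamma\, \|x - z\|^{2}}
```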
RBF Classifier Can Represent Any
Classifier Boundary
(Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)
[Figure: two RBF decision boundaries side by side. One setting yields more training corpus errors but a smoother boundary; the other yields fewer training corpus errors but a wigglier boundary.]
In these figures, C was adjusted, not γ, but a similar effect can be achieved by setting N<<M and adjusting γ.
If N<M, Gamma can Adjust
Boundary Smoothness
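A sketch of that last point. The data, the γ values, the choice of N = 10 centers from M = 200 samples, and the gradient-descent training are all assumptions for illustration; the printed error rates depend on the random draw:

```python
import numpy as np

def rbf(X, centers, gamma):
    """K[i, n] = exp(-gamma * ||X_i - centers_n||^2)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def train_rbf(X, y, N=10, gamma=1.0, eta=0.5, epochs=3000, seed=0):
    """h(x) = sigmoid(sum_n a_n K(x_n, x) - b), y in {0, 1},
    trained by gradient descent on squared error."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), N, replace=False)]
    K = rbf(X, centers, gamma)
    a, b = np.zeros(N), 0.0
    for _ in range(epochs):
        h = 1.0 / (1.0 + np.exp(-(K @ a - b)))
        dg = (h - y) * h * (1 - h) / len(X)   # dE/dg, squared-error loss
        a -= eta * K.T @ dg
        b -= eta * (-dg.sum())
    return centers, a, b

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = ((X ** 2).sum(1) < 1).astype(float)       # circular true boundary
for gamma in (0.3, 10.0):                     # small gamma: smooth g(x); large: wiggly
    c, a, b = train_rbf(X, y, N=10, gamma=gamma)
    err = ((rbf(X, c, gamma) @ a - b > 0) != (y > 0.5)).mean()
    print(f"gamma={gamma}: training error {err:.1%}")
```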
Summary
• Classifier definitions
– Classifier = a function from x into y
– Loss = the cost of a mistake
– Risk = the expected loss
– Empirical Risk = the average loss on training data
• Multilayer Perceptrons
– Sigmoidal classifier is similar to hyperplane classifier with sigmoidal
loss function
– Train using error backpropagation
– With two hidden layers, can model any boundary (MLP is a “universal
approximator”)
– MLP output is an estimate of p(y|x)
• Kernel Classifiers
– Equivalent to: (1) project into φ(x), (2) apply hyperplane classifier
– Polynomial kernel: separatrix is polynomial surface of order d
– RBF kernel: separatrix can be any surface (RBF is also a “universal
approximator”)
– RBF kernel: if N<M, γ can adjust the “wiggliness” of the separatrix