CS 461: Machine Learning
Lecture 4
Dr. Kiri Wagstaff
[email protected]
Plan for Today
- Solution to HW 2
- Support Vector Machines
- Neural Networks
  - Perceptrons
  - Multilayer Perceptrons
Review from Lecture 3
- Decision trees
  - Regression trees, pruning, extracting rules
- Evaluation
  - Comparing two classifiers: McNemar's test
- Support Vector Machines
  - Classification
  - Linear discriminants, maximum margin
  - Learning (optimization): gradient descent, QP
Neural Networks
Chapter 11
It Is Pitch Dark
Perceptron
Graphical: [perceptron diagram]

Math:

$$y = \sum_{j=1}^{d} w_j x_j + w_0 = \mathbf{w}^T \mathbf{x}$$

$$\mathbf{w} = [w_0, w_1, \ldots, w_d]^T$$

$$\mathbf{x} = [1, x_1, \ldots, x_d]^T$$
[Alpaydin 2004 © The MIT Press]
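Below is a minimal NumPy sketch of this computation (the weights and inputs are made-up values for illustration, not from the lecture):

```python
import numpy as np

def perceptron_output(w, x):
    """y = w^T x, with x augmented by a leading 1 so w_0 acts as the bias."""
    x_aug = np.concatenate(([1.0], x))   # x = [1, x_1, ..., x_d]^T
    return np.dot(w, x_aug)              # y = sum_j w_j x_j + w_0

# Example with d = 2 inputs; w = [w_0, w_1, w_2]
w = np.array([-0.5, 1.0, 1.0])
print(perceptron_output(w, np.array([0.2, 0.7])))   # 0.2 + 0.7 - 0.5 = 0.4
```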
“Smooth” Output: Sigmoid Function
1. Calculate g(x) = w^T x and choose C1 if g(x) > 0, or
2. Calculate y = sigmoid(w^T x) and choose C1 if y > 0.5

Why?
- Converts output to probability!
- Less "brittle" boundary

$$y = \mathrm{sigmoid}(\mathbf{w}^T \mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{w}^T \mathbf{x})}$$
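A small sketch, assuming NumPy, showing that the two rules draw the same boundary while the sigmoid output can be read as a probability (function names are illustrative):

```python
import numpy as np

def sigmoid(a):
    """Logistic function: 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def choose_class(w, x_aug):
    g = np.dot(w, x_aug)            # rule 1: choose C1 if g(x) > 0
    y = sigmoid(g)                  # rule 2: choose C1 if y > 0.5
    assert (g > 0) == (y > 0.5)     # same boundary, but y is a probability
    return "C1" if y > 0.5 else "C2"
```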
K outputs
Regression:

$$y_i = \sum_{j=1}^{d} w_{ij} x_j + w_{i0} = \mathbf{w}_i^T \mathbf{x}$$

$$\mathbf{y} = \mathbf{W}\mathbf{x}$$

Classification (softmax):

$$o_i = \mathbf{w}_i^T \mathbf{x}, \qquad y_i = \frac{\exp(o_i)}{\sum_k \exp(o_k)}$$

Choose C_i if y_i = max_k y_k
[Alpaydin 2004 © The MIT Press]
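A minimal sketch of the softmax step, assuming NumPy (subtracting max(o) is a standard numerical-stability trick, not shown on the slide):

```python
import numpy as np

def softmax_classify(W, x_aug):
    """Rows of W are the per-class weight vectors w_i (bias folded in)."""
    o = W @ x_aug                    # o_i = w_i^T x
    e = np.exp(o - o.max())          # subtract max(o) for numerical stability
    y = e / e.sum()                  # y_i = exp(o_i) / sum_k exp(o_k)
    return np.argmax(y), y           # choose C_i with the largest y_i
```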
Training a Neural Network
1. Randomly initialize weights
2. Update = Learning rate * (Desired - Actual) * Input

$$\Delta w_j^t = \eta \, (y^t - \hat{y}^t) \, x_j^t$$
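One way steps 1-2 might look in code for a single sigmoid unit (a sketch; the learning rate, epoch count, and function name are assumptions):

```python
import numpy as np

def train_perceptron(X, y, epochs=100, eta=0.1, seed=0):
    """X rows are augmented inputs [1, x_1, ..., x_d]; y holds desired outputs."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=X.shape[1])        # 1. random initialization
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):                     # 2. per-example updates
            y_hat = 1.0 / (1.0 + np.exp(-w @ x_t))     # current (actual) output
            w += eta * (y_t - y_hat) * x_t             # eta * (desired - actual) * input
    return w
```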
Learning Boolean AND
$$\Delta w_j^t = \eta \, (y^t - \hat{y}^t) \, x_j^t$$
Perceptron demo
[Alpaydin 2004 © The MIT Press]
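For instance, a hypothetical run of that update rule on the four AND examples (learning rate and epoch count chosen arbitrarily):

```python
import numpy as np

# The four AND examples, each as [1, x1, x2], with desired outputs 0/0/0/1.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([0.0, 0.0, 0.0, 1.0])

w = np.random.default_rng(0).normal(scale=0.1, size=3)   # random initialization
for _ in range(1000):                                     # repeated passes over the data
    for x_t, y_t in zip(X, y):
        y_hat = 1.0 / (1.0 + np.exp(-w @ x_t))
        w += 0.5 * (y_t - y_hat) * x_t                    # eta = 0.5

print([float(1.0 / (1.0 + np.exp(-w @ x_t))) > 0.5 for x_t in X])
# expected: [False, False, False, True]
```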
Multilayer Perceptrons = MLP = ANN
$$y_i = \mathbf{v}_i^T \mathbf{z} = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}$$

$$z_h = \mathrm{sigmoid}(\mathbf{w}_h^T \mathbf{x}) = \frac{1}{1 + \exp\left[-\left(\sum_{j=1}^{d} w_{hj} x_j + w_{h0}\right)\right]}$$
[Alpaydin 2004 © The MIT Press]
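A minimal sketch of this forward pass, assuming NumPy (one hidden layer; the weight-matrix shapes are an assumed convention):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(W, V, x):
    """W: H x (d+1) first-layer weights; V: K x (H+1) second-layer weights."""
    x_aug = np.concatenate(([1.0], x))   # [1, x_1, ..., x_d]
    z = sigmoid(W @ x_aug)               # z_h = sigmoid(w_h^T x)
    z_aug = np.concatenate(([1.0], z))   # leading 1 supplies the bias v_i0
    return V @ z_aug                     # y_i = v_i^T z (linear outputs)
```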
x1 XOR x2 = (x1 AND ~x2) OR (~x1 AND x2)
[Alpaydin 2004 © The MIT Press]
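As an illustration, hand-picked, illustrative weights (not from the lecture) under which a two-layer MLP reproduces this decomposition:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hidden unit 1 ~ (x1 AND ~x2), hidden unit 2 ~ (~x1 AND x2), output ~ OR.
# The factor of 10 pushes the sigmoids toward 0/1.
W = 10.0 * np.array([[-0.5,  1.0, -1.0],    # rows are [w_h0, w_h1, w_h2]
                     [-0.5, -1.0,  1.0]])
V = 10.0 * np.array([-0.5, 1.0, 1.0])       # [v_0, v_1, v_2]

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    z = sigmoid(W @ np.concatenate(([1.0], x)))
    y = sigmoid(V @ np.concatenate(([1.0], z)))
    print(x, bool(y > 0.5))   # False, True, True, False
```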
Examples
- Digit Recognition
- Ball Balancing
ANN vs. SVM
- SVM with sigmoid kernel = 2-layer MLP
- Parameters
  - ANN: # hidden layers, # nodes
  - SVM: kernel, kernel params, C
- Optimization
  - ANN: local minimum (gradient descent)
  - SVM: global minimum (QP)
- Interpretability? About the same…
- So why SVMs?
  - Sparse solution, geometric interpretation, less likely to overfit data
Summary: Key Points for Today
- Support Vector Machines
- Neural Networks
  - Perceptrons
  - Sigmoid
  - Training by gradient descent
  - Multilayer Perceptrons
- ANN vs. SVM
Next Time
- Midterm Exam!
  - 9:10 – 10:40 a.m.
  - Open book, open notes (no computer)
  - Covers all material through today
- Neural Networks (read Ch. 11.1-11.8)
  - Questions to answer from the reading
  - Posted on the website (calendar)
  - Three volunteers?