Introduction to Machine Learning

Lecture Slides for
ETHEM ALPAYDIN
© The MIT Press, 2010
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e
Neural Networks
• Networks of processing units (neurons) with connections (synapses) between them
• Large number of neurons: $10^{10}$
• Large connectivity: $10^5$ connections per neuron
• Parallel processing
• Distributed computation/memory
• Robust to noise, failures
Understanding the Brain
• Levels of analysis (Marr, 1982)
1. Computational theory
2. Representation and algorithm
3. Hardware implementation
• Reverse engineering: From hardware to theory
• Parallel processing: SIMD vs MIMD
• Neural net: SIMD with modifiable local memory
• Learning: Update by training/experience
Perceptron
$$y = \sum_{j=1}^{d} w_j x_j + w_0 = \mathbf{w}^T \mathbf{x}$$
$$\mathbf{w} = [w_0, w_1, \ldots, w_d]^T$$
$$\mathbf{x} = [1, x_1, \ldots, x_d]^T$$
(Rosenblatt, 1962)
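As a minimal illustration (not part of the original slides), the perceptron's output is a single dot product over the augmented input; NumPy and the weight values below are assumptions for the example.

```python
import numpy as np

# Minimal sketch of the perceptron output y = w^T x.
# The input is augmented with x_0 = +1 so that w_0 acts as the bias.
# The weight values are made up for illustration.
w = np.array([0.5, -1.0, 2.0, 0.3])    # [w_0, w_1, w_2, w_3]
x = np.array([1.0, 0.2, -0.4, 1.1])    # [1, x_1, x_2, x_3]

y = w @ x                              # y = sum_j w_j x_j + w_0
print(y)
```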
What a Perceptron Does
• Regression: $y = wx + w_0$
• Classification: $y = 1(wx + w_0 > 0)$

(Figure: the perceptron as a line for regression and as a threshold or sigmoid unit for classification; the bias is implemented by an extra input fixed at $x_0 = +1$ with weight $w_0$.)

$$y = \text{sigmoid}(o) = \frac{1}{1 + \exp(-\mathbf{w}^T \mathbf{x})}$$
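A small sketch (illustrative values, NumPy assumed) of the three readouts of the same pre-activation $o = \mathbf{w}^T \mathbf{x}$: identity for regression, a hard threshold, and the sigmoid.

```python
import numpy as np

# One unit, three readouts of the same pre-activation o = w^T x.
w = np.array([0.5, -1.0, 2.0])          # [w_0, w_1, w_2], made-up values
x = np.array([1.0, 0.3, -0.7])          # x_0 = +1 carries the bias

o = w @ x
y_regression = o                        # y = w^T x
y_threshold = 1.0 if o > 0 else 0.0     # y = 1(w^T x > 0)
y_sigmoid = 1.0 / (1.0 + np.exp(-o))    # y = sigmoid(o)
print(y_regression, y_threshold, y_sigmoid)
```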
K Outputs
• Regression:
$$y_i = \sum_{j=1}^{d} w_{ij} x_j + w_{i0} = \mathbf{w}_i^T \mathbf{x}$$
$$\mathbf{y} = \mathbf{W}\mathbf{x}$$
• Classification:
$$o_i = \mathbf{w}_i^T \mathbf{x}$$
$$y_i = \frac{\exp o_i}{\sum_k \exp o_k}$$
$$\text{choose } C_i \text{ if } y_i = \max_k y_k$$
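A sketch of the K-output case (NumPy; $\mathbf{W}$, $\mathbf{x}$, and the random seed are illustrative): one matrix-vector product for regression, and softmax over the linear scores for classification.

```python
import numpy as np

# K linear outputs share the same augmented input x.
rng = np.random.default_rng(0)
K, d = 3, 4
W = rng.normal(size=(K, d + 1))                  # row i holds [w_i0, ..., w_id]
x = np.concatenate(([1.0], rng.normal(size=d)))  # [1, x_1, ..., x_d]

o = W @ x                                        # regression: y = Wx (scores o_i = w_i^T x)
y = np.exp(o - o.max())                          # shift for numerical stability
y /= y.sum()                                     # softmax: y_i = exp(o_i) / sum_k exp(o_k)
print("choose C_%d" % y.argmax())                # choose C_i if y_i = max_k y_k
```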
Training
• Online (instances seen one by one) vs batch (whole sample) learning:
  • No need to store the whole sample
  • Problem may change in time
  • Wear and degradation in system components
• Stochastic gradient-descent: Update after a single pattern
• Generic update rule (LMS rule):
$$\Delta w_{ij}^t = \eta \, (r_i^t - y_i^t) \, x_j^t$$
$$\text{Update} = \text{LearningFactor} \cdot (\text{DesiredOutput} - \text{ActualOutput}) \cdot \text{Input}$$
Training a Perceptron: Regression
• Regression (linear output):
$$E^t(\mathbf{w} \mid \mathbf{x}^t, r^t) = \frac{1}{2}\,(r^t - y^t)^2 = \frac{1}{2}\,(r^t - \mathbf{w}^T \mathbf{x}^t)^2$$
$$\Delta w_j^t = \eta \, (r^t - y^t) \, x_j^t$$
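A sketch of online training of a perceptron for regression (NumPy; the synthetic data, learning rate, and epoch count are assumptions): one stochastic gradient step on $E^t$ per instance.

```python
import numpy as np

# Online (stochastic) gradient descent for linear regression.
rng = np.random.default_rng(1)
N, d, eta = 200, 2, 0.05
X = np.column_stack([np.ones(N), rng.normal(size=(N, d))])  # augmented inputs
true_w = np.array([1.0, -2.0, 0.5])                         # target weights (synthetic)
r = X @ true_w + 0.1 * rng.normal(size=N)                   # noisy desired outputs

w = np.zeros(d + 1)
for epoch in range(20):
    for t in rng.permutation(N):          # visit instances one by one
        y_t = w @ X[t]
        w += eta * (r[t] - y_t) * X[t]    # Delta w_j^t = eta (r^t - y^t) x_j^t
print(w)                                  # approaches true_w
```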
Classification
• Single sigmoid output
$$y^t = \text{sigmoid}(\mathbf{w}^T \mathbf{x}^t)$$
$$E^t(\mathbf{w} \mid \mathbf{x}^t, r^t) = -r^t \log y^t - (1 - r^t) \log(1 - y^t)$$
$$\Delta w_j^t = \eta \, (r^t - y^t) \, x_j^t$$
• K>2 softmax outputs
$$y_i^t = \frac{\exp \mathbf{w}_i^T \mathbf{x}^t}{\sum_k \exp \mathbf{w}_k^T \mathbf{x}^t}$$
$$E^t(\{\mathbf{w}_i\}_i \mid \mathbf{x}^t, \mathbf{r}^t) = -\sum_i r_i^t \log y_i^t$$
$$\Delta w_{ij}^t = \eta \, (r_i^t - y_i^t) \, x_j^t$$
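A per-pattern sketch of both classification updates (NumPy; all values illustrative). Note that with cross-entropy error, the sigmoid and softmax cases share the same (desired - actual) times input form.

```python
import numpy as np

eta = 0.1
x_t = np.array([1.0, 0.4, -0.9])             # augmented input, x_0 = +1

# Two classes: single sigmoid output, r_t in {0, 1}
w = np.zeros(3)
r_t = 1.0
y_t = 1.0 / (1.0 + np.exp(-(w @ x_t)))
w += eta * (r_t - y_t) * x_t                 # Delta w_j = eta (r - y) x_j

# K > 2 classes: softmax outputs, desired output one-hot
K = 3
W = np.zeros((K, 3))
r_vec = np.array([0.0, 1.0, 0.0])
o = W @ x_t
y = np.exp(o - o.max()); y /= y.sum()
W += eta * np.outer(r_vec - y, x_t)          # Delta w_ij = eta (r_i - y_i) x_j
```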
Learning Boolean AND
XOR
• No $w_0, w_1, w_2$ satisfy all four constraints imposed by the XOR truth table:
$$(0,0) \mapsto 0: \quad w_0 \le 0$$
$$(0,1) \mapsto 1: \quad w_2 + w_0 > 0$$
$$(1,0) \mapsto 1: \quad w_1 + w_0 > 0$$
$$(1,1) \mapsto 0: \quad w_1 + w_2 + w_0 \le 0$$
(Minsky and Papert, 1969)
Multilayer Perceptrons
$$y_i = \mathbf{v}_i^T \mathbf{z} = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}$$
$$z_h = \text{sigmoid}(\mathbf{w}_h^T \mathbf{x}) = \frac{1}{1 + \exp\left[-\left(\sum_{j=1}^{d} w_{hj} x_j + w_{h0}\right)\right]}$$
(Rumelhart et al., 1986)
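A forward-pass sketch of this one-hidden-layer MLP (NumPy; the sizes and random weights are illustrative).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(2)
d, H = 2, 3
W = rng.normal(size=(H, d + 1))                  # row h holds [w_h0, ..., w_hd]
v = rng.normal(size=H + 1)                       # [v_0, v_1, ..., v_H]

x = np.concatenate(([1.0], rng.normal(size=d)))  # augmented input
z = sigmoid(W @ x)                               # z_h = sigmoid(w_h^T x)
y = v[0] + v[1:] @ z                             # y = sum_h v_h z_h + v_0
print(y)
```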
x1 XOR x2 = (x1 AND ~x2) OR (~x1 AND x2)
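A sketch of this decomposition with hand-picked weights (NumPy; the specific magnitudes are an assumption, chosen large so the sigmoids are near-binary): $z_1 \approx x_1 \wedge \lnot x_2$, $z_2 \approx \lnot x_1 \wedge x_2$, and the output unit ORs them.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

W = np.array([[-5.0,  10.0, -10.0],    # z_1 fires only for (x1, x2) = (1, 0)
              [-5.0, -10.0,  10.0]])   # z_2 fires only for (x1, x2) = (0, 1)
v = np.array([-5.0, 10.0, 10.0])       # output ~ z_1 OR z_2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    z = sigmoid(W @ np.array([1.0, x1, x2]))
    y = sigmoid(v @ np.concatenate(([1.0], z)))
    print(x1, x2, round(y))            # reproduces the XOR truth table
```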
Backpropagation
$$y_i = \mathbf{v}_i^T \mathbf{z} = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}$$
$$z_h = \text{sigmoid}(\mathbf{w}_h^T \mathbf{x}) = \frac{1}{1 + \exp\left[-\left(\sum_{j=1}^{d} w_{hj} x_j + w_{h0}\right)\right]}$$
$$\frac{\partial E}{\partial w_{hj}} = \frac{\partial E}{\partial y_i} \, \frac{\partial y_i}{\partial z_h} \, \frac{\partial z_h}{\partial w_{hj}}$$
Regression
$$E(\mathbf{W}, \mathbf{v} \mid \mathcal{X}) = \frac{1}{2} \sum_t (r^t - y^t)^2$$
Forward:
$$y^t = \sum_{h=1}^{H} v_h z_h^t + v_0, \qquad z_h = \text{sigmoid}(\mathbf{w}_h^T \mathbf{x})$$
Backward:
$$\Delta v_h = \eta \sum_t (r^t - y^t) \, z_h^t$$
$$\Delta w_{hj} = -\eta \frac{\partial E}{\partial w_{hj}} = -\eta \sum_t \frac{\partial E}{\partial y^t} \frac{\partial y^t}{\partial z_h^t} \frac{\partial z_h^t}{\partial w_{hj}} = \eta \sum_t (r^t - y^t) \, v_h \, z_h^t (1 - z_h^t) \, x_j^t$$
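A batch-backpropagation sketch for this regression network (NumPy; the sine data, sizes, learning rate, and epoch count are assumptions): a forward pass, then the $\Delta v_h$ and $\Delta w_{hj}$ updates above, with both deltas computed before either weight set is changed.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(3)
N, H, eta = 100, 10, 0.1
X = np.column_stack([np.ones(N), rng.uniform(-3, 3, N)])   # augmented inputs
r = np.sin(X[:, 1]) + 0.1 * rng.normal(size=N)             # synthetic targets

W = rng.normal(scale=0.5, size=(H, 2))     # hidden weights, row h = [w_h0, w_h1]
v = rng.normal(scale=0.5, size=H + 1)      # output weights [v_0, v_1, ..., v_H]

for epoch in range(5000):
    Z = sigmoid(X @ W.T)                   # forward: z_h^t, shape (N, H)
    y = v[0] + Z @ v[1:]                   # forward: y^t
    err = r - y                            # (r^t - y^t)
    dv0 = eta * err.mean()                 # Delta v_0 (averaged over the batch)
    dv = eta * (Z.T @ err) / N             # Delta v_h = eta sum_t (r - y) z_h
    dZ = np.outer(err, v[1:]) * Z * (1 - Z)
    dW = eta * (dZ.T @ X) / N              # Delta w_hj = eta sum_t (r-y) v_h z_h (1-z_h) x_j
    v[0] += dv0; v[1:] += dv; W += dW
```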
Regression with Multiple Outputs
$$E(\mathbf{W}, \mathbf{V} \mid \mathcal{X}) = \frac{1}{2} \sum_t \sum_i (r_i^t - y_i^t)^2$$
$$y_i^t = \sum_{h=1}^{H} v_{ih} z_h^t + v_{i0}$$
$$\Delta v_{ih} = \eta \sum_t (r_i^t - y_i^t) \, z_h^t$$
$$\Delta w_{hj} = \eta \sum_t \left[ \sum_i (r_i^t - y_i^t) \, v_{ih} \right] z_h^t (1 - z_h^t) \, x_j^t$$

(Figure: network diagram with inputs $x_j$, hidden units $z_h$, outputs $y_i$, and weights $w_{hj}$, $v_{ih}$.)
(Figure: evolution during training of the hidden-unit pre-activations $\mathbf{w}_h^T \mathbf{x} + w_{h0}$, the hidden-unit outputs $z_h$, and their weighted contributions $v_h z_h$.)
Two-Class Discrimination
• One sigmoid output $y^t$ for $P(C_1 \mid \mathbf{x}^t)$, with $P(C_2 \mid \mathbf{x}^t) \equiv 1 - y^t$
$$y^t = \text{sigmoid}\left( \sum_{h=1}^{H} v_h z_h^t + v_0 \right)$$
$$E(\mathbf{W}, \mathbf{v} \mid \mathcal{X}) = -\sum_t \left[ r^t \log y^t + (1 - r^t) \log(1 - y^t) \right]$$
$$\Delta v_h = \eta \sum_t (r^t - y^t) \, z_h^t$$
$$\Delta w_{hj} = \eta \sum_t (r^t - y^t) \, v_h \, z_h^t (1 - z_h^t) \, x_j^t$$
t
K>2 Classes
$$o_i^t = \sum_{h=1}^{H} v_{ih} z_h^t + v_{i0}$$
$$y_i^t = \frac{\exp o_i^t}{\sum_k \exp o_k^t} \approx P(C_i \mid \mathbf{x}^t)$$
$$E(\mathbf{W}, \mathbf{V} \mid \mathcal{X}) = -\sum_t \sum_i r_i^t \log y_i^t$$
$$\Delta v_{ih} = \eta \sum_t (r_i^t - y_i^t) \, z_h^t$$
$$\Delta w_{hj} = \eta \sum_t \left[ \sum_i (r_i^t - y_i^t) \, v_{ih} \right] z_h^t (1 - z_h^t) \, x_j^t$$
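A single-pattern sketch of these K-class updates (NumPy; sizes, seed, and the one-hot target are illustrative); the backpropagated factor for $\Delta w_{hj}$ is computed with the pre-update output weights.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(4)
d, H, K, eta = 4, 5, 3, 0.1
W = rng.normal(scale=0.1, size=(H, d + 1))     # hidden weights w_h
V = rng.normal(scale=0.1, size=(K, H + 1))     # output weights, column 0 = v_i0

x_t = np.concatenate(([1.0], rng.normal(size=d)))
r_t = np.array([0.0, 0.0, 1.0])                # one-hot desired output

z = np.concatenate(([1.0], sigmoid(W @ x_t)))  # z_0 = +1 carries v_i0
o = V @ z                                      # o_i = sum_h v_ih z_h + v_i0
y = np.exp(o - o.max()); y /= y.sum()          # y_i ~ P(C_i | x)

e = r_t - y                                    # (r_i^t - y_i^t)
back = (V[:, 1:].T @ e) * z[1:] * (1 - z[1:])  # sum_i e_i v_ih, times z_h (1 - z_h)
V += eta * np.outer(e, z)                      # Delta v_ih = eta e_i z_h
W += eta * np.outer(back, x_t)                 # Delta w_hj = eta back_h x_j
```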
Multiple Hidden Layers
• MLP with one hidden layer is a universal approximator (Hornik et al., 1989), but using multiple layers may lead to simpler networks
$$z_{1h} = \text{sigmoid}(\mathbf{w}_{1h}^T \mathbf{x}) = \text{sigmoid}\left( \sum_{j=1}^{d} w_{1hj} x_j + w_{1h0} \right), \quad h = 1, \ldots, H_1$$
$$z_{2l} = \text{sigmoid}(\mathbf{w}_{2l}^T \mathbf{z}_1) = \text{sigmoid}\left( \sum_{h=1}^{H_1} w_{2lh} z_{1h} + w_{2l0} \right), \quad l = 1, \ldots, H_2$$
$$y = \mathbf{v}^T \mathbf{z}_2 = \sum_{l=1}^{H_2} v_l z_{2l} + v_0$$
Improving Convergence
• Momentum
$$\Delta w_i^t = -\eta \frac{\partial E^t}{\partial w_i} + \alpha \, \Delta w_i^{t-1}$$
• Adaptive learning rate
$$\Delta \eta = \begin{cases} +a & \text{if } E^{t+\tau} < E^t \\ -b\,\eta & \text{otherwise} \end{cases}$$
Overfitting/Overtraining
Number of weights: $H(d+1) + (H+1)K$
Structured MLP
(Le Cun et al, 1989)
Weight Sharing
Hints
(Abu-Mostafa, 1995)
• Invariance to translation, rotation, size
• Virtual examples
• Augmented error: $E' = E + \lambda_h E_h$
If $\mathbf{x}'$ and $\mathbf{x}$ are the "same": $E_h = [g(\mathbf{x} \mid \theta) - g(\mathbf{x}' \mid \theta)]^2$
Approximation hint:
$$E_h = \begin{cases} 0 & \text{if } g(\mathbf{x} \mid \theta) \in [a_x, b_x] \\ \left(g(\mathbf{x} \mid \theta) - a_x\right)^2 & \text{if } g(\mathbf{x} \mid \theta) < a_x \\ \left(g(\mathbf{x} \mid \theta) - b_x\right)^2 & \text{if } g(\mathbf{x} \mid \theta) > b_x \end{cases}$$
Tuning the Network Size
• Destructive: weight decay
$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i} - \lambda w_i$$
$$E' = E + \frac{\lambda}{2} \sum_i w_i^2$$
• Constructive: growing networks (Ash, 1989; Fahlman and Lebiere, 1989)
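A one-step sketch of the weight-decay update (NumPy; the gradient vector here is a stand-in for a backpropagated $\partial E / \partial w_i$, and $\eta$, $\lambda$ are illustrative).

```python
import numpy as np

eta, lam = 0.1, 0.01
w = np.array([1.5, -0.8, 0.3])          # current weights (illustrative)
grad_E = np.array([0.2, -0.1, 0.05])    # stand-in for dE/dw_i from backprop

# Delta w_i = -eta dE/dw_i - lambda w_i: the extra decay term pulls every
# weight toward zero, i.e. gradient descent on the penalized error
# E' = E + (lambda/2) sum_i w_i^2 (with eta absorbed into lambda).
w += -eta * grad_E - lam * w
print(w)
```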
Bayesian Learning
• Consider weights $w_i$ as random variables with prior $p(w_i)$
$$p(\mathbf{w} \mid \mathcal{X}) = \frac{p(\mathcal{X} \mid \mathbf{w}) \, p(\mathbf{w})}{p(\mathcal{X})}$$
$$\hat{\mathbf{w}}_{MAP} = \arg\max_{\mathbf{w}} \log p(\mathbf{w} \mid \mathcal{X})$$
$$\log p(\mathbf{w} \mid \mathcal{X}) = \log p(\mathcal{X} \mid \mathbf{w}) + \log p(\mathbf{w}) + C$$
$$p(\mathbf{w}) = \prod_i p(w_i) \quad \text{where} \quad p(w_i) = c \cdot \exp\left[ -\frac{w_i^2}{2(1/2\lambda)} \right]$$
$$E' = E + \lambda \|\mathbf{w}\|^2$$
• Weight decay, ridge regression, regularization
cost = data-misfit + λ · complexity
More about Bayesian methods in Chapter 14
Dimensionality Reduction
Learning Time
• Applications:
  • Sequence recognition: Speech recognition
  • Sequence reproduction: Time-series prediction
  • Sequence association
• Network architectures:
  • Time-delay networks (Waibel et al., 1989)
  • Recurrent networks (Rumelhart et al., 1986)
Time-Delay Neural Networks
Recurrent Networks
Unfolding in Time