Introduction to Artificial Neural Networks

Introduction to Artificial Neural Networks
Lecturer: 虞台文
Content

• Fundamental Concepts of ANNs
• Basic Models and Learning Rules
  – Neuron Models
  – ANN Structures
  – Learning
• Distributed Representations
• Conclusions
Introduction to Artificial Neural Networks
Fundamental Concepts of ANNs
What is ANN? Why ANN?

• ANN = Artificial Neural Networks
  – Designed to simulate human brain behavior.
  – A new generation of information processing systems.
Applications

• Pattern Matching
• Pattern Recognition
• Associative Memory (Content Addressable Memory)
• Function Approximation
• Learning
• Optimization
• Vector Quantization
• Data Clustering
• ...
Applications

Traditional computers are inefficient at these tasks, even though their raw computation speed is higher.
The Configuration of ANNs

• An ANN consists of a large number of interconnected processing elements (PEs) called neurons.
  – A human brain consists of roughly $10^{11}$ neurons of many different types.
• How does an ANN work?
  – Collective behavior.
The Biological Neuron

[Figure: two biological neurons, with labels for the synaptic junction between them, the axon, and the dendrites]

Synaptic connections may be excitatory or inhibitory.
The Artificial Neuron

[Figure: neuron i with inputs x1, x2, ..., xm, weights wi1, wi2, ..., wim, a summation node with threshold θi, integration function f(·), activation function a(·), and output yi]
The Artificial Neuron

The output at the next time step is
$y_i(t+1) = a(f_i)$,
where the integration function is
$f_i = \sum_{j=1}^{m} w_{ij} x_j - \theta_i$
and the activation function is the step function
$a(f) = \begin{cases} 1, & f \ge 0 \\ 0, & \text{otherwise.} \end{cases}$

The sign of a weight $w_{ij}$ determines the connection type:
• positive → excitatory
• negative → inhibitory
• zero → no connection
The Artificial Neuron

Proposed by McCulloch and Pitts [1943]: the M-P neuron.

[Figure: the same neuron diagram as above]
What can be done by M-P neurons?

• A hard limiter.
• A binary threshold unit.
• Hyperspace separation.

$y = \begin{cases} 1, & \text{if } f = w_1 x_1 + w_2 x_2 - \theta_i \ge 0 \\ 0, & \text{otherwise} \end{cases}$

[Figure: the line $w_1 x_1 + w_2 x_2 - \theta_i = 0$ splits the $(x_1, x_2)$ plane into a region where $y = 0$ and a region where $y = 1$]
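To make the threshold behavior concrete, here is a minimal sketch of an M-P neuron in Python (NumPy assumed; the AND-gate weights and threshold are illustrative choices, not taken from the slides):

```python
import numpy as np

def mp_neuron(x, w, theta):
    """McCulloch-Pitts neuron: fires (1) iff the weighted sum reaches the threshold."""
    f = np.dot(w, x) - theta      # integration: w1*x1 + ... + wm*xm - theta
    return 1 if f >= 0 else 0     # step activation

# Illustrative choice: these weights/threshold realize a logical AND of two binary inputs.
w, theta = np.array([1.0, 1.0]), 1.5
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, mp_neuron(np.array(x, dtype=float), w, theta))
```

The decision boundary $w_1 x_1 + w_2 x_2 = \theta$ is a straight line, which is exactly the hyperspace separation described above.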
What will ANNs be?

• ANN: a neurally inspired mathematical model.
• Consists of a large number of highly interconnected PEs.
• Its connections (weights) hold the knowledge.
• The response of each PE depends only on local information.
• Its collective behavior demonstrates the computational power.
• Has learning, recall, and generalization capability.
Three Basic Entities of ANN Models

• Models of neurons or PEs.
• Models of synaptic interconnections and structures.
• Training or learning rules.
Introduction to Artificial Neural Networks
Basic Models and Learning Rules
• Neuron Models
• ANN Structures
• Learning
Processing Elements

Extensions of M-P neurons:

[Figure: generic PE with integration function f(·) and activation function a(·)]

• What integration functions may we have?
• What activation functions may we have?
Integration Functions

M-P neuron (linear):
$f_i = net_i = \sum_{j=1}^{m} w_{ij} x_j - \theta_i$

Quadratic function:
$f_i = \sum_{j=1}^{m} w_{ij} x_j^2 - \theta_i$

Spherical function:
$f_i = \sum_{j=1}^{m} (x_j - w_{ij})^2 - \theta_i$

Polynomial function:
$f_i = \sum_{j=1}^{m} \sum_{k=1}^{m} w_{ijk}\, x_j x_k + x_j + x_k - \theta_i$
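A sketch of these integration functions in Python (NumPy assumed; the function names are mine, and the placement of the $x_j + x_k$ terms in the polynomial case follows the reconstruction above, which is an assumption):

```python
import numpy as np

def linear_net(x, w, theta):        # M-P neuron: weighted sum minus threshold
    return w @ x - theta

def quadratic_net(x, w, theta):     # weighted sum of squared inputs
    return w @ (x ** 2) - theta

def spherical_net(x, w, theta):     # squared distance from the weight vector
    return np.sum((x - w) ** 2) - theta

def polynomial_net(x, W2, theta):   # second-order terms; W2 is an m-by-m weight array
    m = len(x)
    f = -theta
    for j in range(m):
        for k in range(m):
            # assumed reading of the slide: w_ijk * x_j * x_k + x_j + x_k, summed over j, k
            f += W2[j, k] * x[j] * x[k] + x[j] + x[k]
    return f
```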
Activation Functions

M-P neuron (step function):
$a(f) = \begin{cases} 1, & f \ge 0 \\ 0, & \text{otherwise} \end{cases}$

[Plot: step function, a jumps from 0 to 1 at f = 0]
Activation Functions

Hard limiter (threshold function):
$a(f) = \operatorname{sgn}(f) = \begin{cases} +1, & f \ge 0 \\ -1, & f < 0 \end{cases}$

[Plot: hard limiter, a jumps from -1 to +1 at f = 0]
Activation Functions

Ramp function:
$a(f) = \begin{cases} 1, & f \ge 1 \\ f, & 0 \le f < 1 \\ 0, & f < 0 \end{cases}$

[Plot: ramp function, linear between 0 and 1, saturating at 0 and 1]
Activation Functions

Unipolar sigmoid function:
$a(f) = \dfrac{1}{1 + e^{-\lambda f}}$

[Plot: unipolar sigmoid over f ∈ [-4, 4], rising from 0 to 1]
Activation Functions

Bipolar sigmoid function:
$a(f) = \dfrac{2}{1 + e^{-\lambda f}} - 1$

[Plot: bipolar sigmoid over f ∈ [-4, 4], rising from -1 to +1]
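A sketch of these activation functions in Python (NumPy assumed; `lam` stands for the slope parameter λ):

```python
import numpy as np

def step(f):                      # M-P neuron
    return np.where(f >= 0, 1.0, 0.0)

def hard_limiter(f):              # sign / threshold function
    return np.where(f >= 0, 1.0, -1.0)

def ramp(f):                      # clamp to [0, 1]
    return np.clip(f, 0.0, 1.0)

def unipolar_sigmoid(f, lam=1.0): # output in (0, 1)
    return 1.0 / (1.0 + np.exp(-lam * f))

def bipolar_sigmoid(f, lam=1.0):  # output in (-1, 1)
    return 2.0 / (1.0 + np.exp(-lam * f)) - 1.0
```

As λ grows, the sigmoids approach the step and hard-limiter functions; the activation-surface example below illustrates this for λ = 2, 3, 5, and 10.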
Example: Activation Surfaces

[Figure: three lines L1, L2, L3 in the (x, y) plane bounding a triangular region]

Use one M-P neuron per line:
• L1: $x - 1 = 0$ (weights (1, 0), threshold $\theta_1 = 1$)
• L2: $y - 1 = 0$ (weights (0, 1), threshold $\theta_2 = 1$)
• L3: $-x - y + 4 = 0$ (weights (-1, -1), threshold $\theta_3 = -4$)

[Figure: the two inputs x and y feeding the three neurons L1, L2, L3 with these weights]
Example: Activation Surfaces

[Figure: region codes. The three lines partition the (x, y) plane into seven regions; each region is labeled with the 3-bit code of the outputs of L1, L2, L3: 100, 110, 010, 011, 001, 101, and 111 (the triangle interior).]
Example: Activation Surfaces

Add an output neuron L4 that combines the outputs of L1, L2, L3 into z:
• z = 1 inside the triangle, z = 0 outside.
• The weights from L1, L2, L3 to L4 are (1, 1, 1) and the threshold is $\theta_4 = 2.5$, so L4 fires only when all three hidden neurons fire (region code 111).

[Figure: the two-layer network x, y → L1, L2, L3 → L4, and the resulting activation surface z over the (x, y) plane]
Example: Activation Surfaces

With the M-P neuron (step function), $a(f) = 1$ for $f \ge 0$ and $0$ otherwise, the output surface z(x, y) is a flat plateau of height 1 over the triangle.

Replacing the step by the unipolar sigmoid, $a(f) = \dfrac{1}{1 + e^{-\lambda f}}$, gives a smooth surface.

[Figure: the activation surface z(x, y) of the network with step activations]
Example: Activation Surfaces

[Figure: activation surfaces z(x, y) with unipolar sigmoid activations for λ = 2, 3, 5, and 10; as λ grows, the surface approaches the step-activation plateau]
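A sketch of this two-layer example in Python, using the weights and thresholds reconstructed above (the signs of the L3 weights are my reading of the slide):

```python
import numpy as np

def step(f):
    return (f >= 0).astype(float)

# Hidden layer: one M-P neuron per line L1, L2, L3.
W_hidden = np.array([[ 1.0,  0.0],    # L1: x - 1 >= 0
                     [ 0.0,  1.0],    # L2: y - 1 >= 0
                     [-1.0, -1.0]])   # L3: -x - y + 4 >= 0
theta_hidden = np.array([1.0, 1.0, -4.0])

# Output neuron L4: fires only when L1, L2, L3 all fire (region code 111).
w_out, theta_out = np.array([1.0, 1.0, 1.0]), 2.5

def z(point):
    h = step(W_hidden @ point - theta_hidden)
    return step(np.array([w_out @ h - theta_out]))[0]

print(z(np.array([2.0, 1.5])))   # inside the triangle -> 1.0
print(z(np.array([0.0, 0.0])))   # outside the triangle -> 0.0
```

Replacing `step` with a sigmoid of growing λ reproduces the sequence of smoothed surfaces shown above.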
Introduction to Artificial Neural Networks
Basic Models and Learning Rules
• Neuron Models
• ANN Structures
• Learning
ANN Structure (Connections)

Single-Layer Feedforward Networks

[Figure: inputs x1, x2, ..., xm fully connected to output neurons y1, y2, ..., yn through weights w11, w12, ..., wnm]
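A sketch of the forward pass of such a single-layer feedforward network, y = a(Wx − θ), with one row of W per output neuron (NumPy assumed; the dimensions and random weights are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 5                                  # n output neurons, m inputs
W = rng.normal(size=(n, m))                  # weight matrix, one row per neuron
theta = np.zeros(n)                          # thresholds (biases)

def forward(x, activation=lambda f: 1.0 / (1.0 + np.exp(-f))):
    """Single-layer feedforward pass: each output depends only on its own weight row."""
    return activation(W @ x - theta)

y = forward(rng.normal(size=m))
print(y)
```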
Multilayer Feedforward Networks

[Figure: an input layer x1, x2, ..., xm, one or more hidden layers, and an output layer y1, y2, ..., yn, with connections only from each layer to the next]
Multilayer Feedforward Networks

[Figure: a multilayer feedforward network mapping an input pattern to an output, used for tasks such as pattern recognition, classification, and analysis]

Where does the knowledge come from? From learning.
Single Node with Feedback to Itself

[Figure: a single neuron whose output feeds back into its own input through a feedback loop]
Single-Layer Recurrent Networks

[Figure: inputs x1, x2, ..., xm and output neurons y1, y2, ..., yn, with feedback connections among the output neurons]
Multilayer Recurrent Networks

[Figure: a multilayer network whose outputs y1, y2, y3 are fed back toward the inputs x1, x2, x3]
Introduction to Artificial Neural Networks
Basic Models and Learning Rules
• Neuron Models
• ANN Structures
• Learning
Learning

Consider an ANN with n neurons, each with m adaptive weights.

Weight matrix:
$$W = \begin{bmatrix} \mathbf{w}_1^T \\ \mathbf{w}_2^T \\ \vdots \\ \mathbf{w}_n^T \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{bmatrix}$$
How? By "learning" the weight matrix W.
Learning Rules

• Supervised learning
• Reinforcement learning
• Unsupervised learning
Supervised Learning

• Learning with a teacher
• Learning by examples
• Training set:
$$T = \{(\mathbf{x}^{(1)}, \mathbf{d}^{(1)}), (\mathbf{x}^{(2)}, \mathbf{d}^{(2)}), \ldots, (\mathbf{x}^{(k)}, \mathbf{d}^{(k)}), \ldots\}$$
Supervised Learning

[Figure: input x enters the ANN (weights W), which produces output y; an error-signal generator compares y with the desired output d and feeds the error signal back to adjust W]
Reinforcement Learning

• Learning with a critic
• Learning by comments
Reinforcement Learning

[Figure: input x enters the ANN (weights W), which produces output y; a critic-signal generator evaluates y and feeds a reinforcement signal back to adjust W]
Unsupervised Learning

• Self-organizing
• Clustering
  – Form proper clusters by discovering the similarities and dissimilarities among objects.
Unsupervised Learning

[Figure: input x enters the ANN (weights W), which produces output y; there is no teacher or critic signal]
The General Weight Learning Rule

[Figure: neuron i with inputs x1, ..., x(m-1), weights wi1, ..., wi,m-1, threshold θi, and output yi]

Input: $net_i = \sum_{j=1}^{m-1} w_{ij} x_j - \theta_i$
Output: $y_i = a(net_i)$

We want to learn the weights and the bias.
The General Weight Learning Rule

Let $x_m = -1$ and $w_{im} = \theta_i$. The bias is then absorbed into the weight vector:
$$net_i = \sum_{j=1}^{m} w_{ij} x_j$$

We want to learn $\mathbf{w}_i = (w_{i1}, w_{i2}, \ldots, w_{im})^T$.
The General Weight Learning Rule

[Figure: the same neuron with the bias absorbed, i.e. x_m = -1 and w_im = θi]

How should the weights change over time: $\Delta \mathbf{w}_i(t) = \,?$
The General Weight Learning Rule

[Figure: input x feeds neuron i with weight vector w_i, producing output y_i; a learning-signal generator takes w_i, x, and the teacher signal d_i and emits the learning signal r]

The learning signal is a function $r = f_r(\mathbf{w}_i, \mathbf{x}, d_i)$.

The weight change is proportional to the learning signal and the input:
$$\Delta \mathbf{w}_i(t) = \eta\, r\, \mathbf{x}(t),$$
where $\eta$ is the learning rate.
The General Weight Learning Rule

$$\Delta \mathbf{w}_i(t) = \eta\, r\, \mathbf{x}(t), \qquad r = f_r(\mathbf{w}_i, \mathbf{x}, d_i)$$

Discrete-time weight modification rule:
$$\mathbf{w}_i^{(t+1)} = \mathbf{w}_i^{(t)} + \eta\, f_r(\mathbf{w}_i^{(t)}, \mathbf{x}^{(t)}, d_i^{(t)})\, \mathbf{x}^{(t)}$$

Continuous-time weight modification rule:
$$\frac{d\mathbf{w}_i(t)}{dt} = \eta\, r\, \mathbf{x}(t)$$
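A sketch of the discrete-time rule in Python; the learning-signal function f_r is passed in, since it is what distinguishes one learning rule from another (the error-style signal below is only an illustration, not a rule from the slides):

```python
import numpy as np

def update_weights(w_i, x, d_i, learning_signal, eta=0.1):
    """One step of the general rule: w_i(t+1) = w_i(t) + eta * r * x(t),
    where r = f_r(w_i, x, d_i) is the rule-specific learning signal."""
    r = learning_signal(w_i, x, d_i)
    return w_i + eta * r * x

# Illustration: with r = d_i - y_i, this becomes an error-correction style update.
def error_signal(w_i, x, d_i):
    y_i = 1.0 if w_i @ x >= 0 else 0.0   # M-P style output
    return d_i - y_i

w = np.zeros(3)
w = update_weights(w, np.array([1.0, -1.0, 0.5]), d_i=1.0, learning_signal=error_signal)
print(w)
```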
Hebb’s Learning Law

• Hebb [1949] hypothesized that when an axonal input from neuron A to neuron B repeatedly or persistently causes neuron B to immediately emit a pulse (fire),
• then the efficacy of that axonal input, in terms of its ability to help neuron B fire in the future, is somehow increased.
• Hebb’s learning rule is an unsupervised learning rule.
Hebb’s Learning Law

$$r = f_r(\mathbf{w}_i, \mathbf{x}, d_i) = a(\mathbf{w}_i^T \mathbf{x}) = y_i$$
$$\Delta \mathbf{w}_i(t) = \eta\, r\, \mathbf{x}(t) = \eta\, y_i\, \mathbf{x}$$
$$\Delta w_{ij} = \eta\, y_i\, x_j$$

A weight grows (+) when its input and the output have the same sign, and shrinks (–) when they differ.
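A sketch of a Hebbian update step in Python (NumPy assumed; the hard-limiter activation and the sample inputs are illustrative):

```python
import numpy as np

def hard_limiter(f):
    return 1.0 if f >= 0 else -1.0

def hebb_update(w_i, x, eta=0.1, activation=hard_limiter):
    """Unsupervised Hebbian step: delta w_ij = eta * y_i * x_j (no teacher signal d_i)."""
    y_i = activation(w_i @ x)      # r = a(w_i^T x) = y_i
    return w_i + eta * y_i * x

w = np.zeros(4)
for x in np.array([[ 1, -1,  1, -1],
                   [ 1, -1,  1,  1]], dtype=float):
    w = hebb_update(w, x)
print(w)
```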
Introduction to Artificial Neural Networks
Distributed Representations
Distributed Representations

• Distributed representation:
  – An entity is represented by a pattern of activity distributed over many PEs.
  – Each processing element is involved in representing many different entities.
• Local representation:
  – Each entity is represented by one PE.
Example

        P0 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15
Dog      +  _  +  +  _  _  _  _  +  +  +   +   +   _   _   _
Cat      +  _  +  +  _  _  _  _  +  _  +   _   +   +   _   +
Bread    _  _  _  _  _  _  +  +  +  +  +   +   +   +   +   +

• Acts as a content-addressable memory.
Advantages

• Acts as a content-addressable memory.
• Makes induction easy.

[Figure: the Dog/Cat/Bread table again, with a partial activity pattern presented over P0 ... P15 and the question: What is this?]
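A sketch of content-addressable recall over these patterns in Python: present a partial cue and return the stored entity with the largest overlap (the stored rows follow the table reconstructed above; Bread's row in particular is my best reading of the garbled layout):

```python
import numpy as np

# Stored activity patterns over P0..P15 (+1 / -1), from the table above.
patterns = {
    "Dog":   np.array([+1,-1,+1,+1,-1,-1,-1,-1,+1,+1,+1,+1,+1,-1,-1,-1]),
    "Cat":   np.array([+1,-1,+1,+1,-1,-1,-1,-1,+1,-1,+1,-1,+1,+1,-1,+1]),
    "Bread": np.array([-1,-1,-1,-1,-1,-1,+1,+1,+1,+1,+1,+1,+1,+1,+1,+1]),
}

def recall(probe):
    """Content-addressable lookup: return the stored entity most similar to the probe.
    Unspecified PEs in the probe are set to 0 so they do not vote."""
    return max(patterns, key=lambda name: patterns[name] @ probe)

# A partial cue: only a few PEs are specified, the rest are unknown (0).
cue = np.zeros(16)
cue[[0, 2, 3, 9]] = +1
print(recall(cue))   # -> "Dog"
```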
Advantages

• Acts as a content-addressable memory.
• Makes induction easy.

        P0 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15
Fido     +  _  _  +  _  _  _  _  +  +  +   +   +   +   _   _

Fido's pattern overlaps heavily with Dog's, so properties of Dog generalize to Fido: a dog has 4 legs; how many legs does Fido have?
Advantages

• Acts as a content-addressable memory.
• Makes induction easy.
• Makes the creation of new entities or concepts easy (without allocating new hardware).

[Table: a new row, Doughnut, is added to the Dog/Cat/Bread table as another +/_ pattern over the same PEs P0 ... P15]

Doughnut is added simply by changing weights; no new PEs are required.
Advantages

• Acts as a content-addressable memory.
• Makes induction easy.
• Makes the creation of new entities or concepts easy (without allocating new hardware).
• Fault tolerance.

The breakdown of a few PEs does not cause problems, because each entity is spread over many PEs.
Disadvantages
• How to understand?
• How to modify?
Learning procedures are required.