Neural networks

Context: Artificial intelligence
What does Artificial Intelligence (AI) mean?
• Artificial intelligence (AI) is an area of computer science
that emphasizes the creation of intelligent machines that
work and react like humans. AI techniques increasingly
extend and enrich decision support, for example by
coordinating data delivery, analyzing data and trends,
providing forecasts, ensuring data consistency, quantifying
uncertainty, anticipating the user's data needs, presenting
information in the most appropriate form, and suggesting
courses of action.
Introduction
• Many well-known methods fall within bio-inspired
computing, such as neural networks, genetic algorithms,
memetic algorithms, swarm intelligence, etc.
Introduction: Neural networks
• The nervous system contains about 10^12
interconnected neurons.
Introduction: Neural networks
• In neuroscience, neural networks denote parts of the
nervous system. A single network is defined by a
population of neurons. Neurons are the basic elements
that propagate excitation through the network. Simply
speaking, when the fingertips touch a table, information
about what happened starts to propagate through the
body toward the effectors.
• This transport of excitation is carried out by neurons
shaped into a neural network.
Introduction: Neural networks
• Dr. Robert Hecht-Nielsen defines a neural
network as:
“...a computing system made up of a
number of simple, highly interconnected
processing elements, which process
information by their dynamic state
response to external inputs.”
Artificial neural network (ANN)
• An artificial neural network is a computational model
inspired by its biological counterpart. It consists of an
interconnected group of artificial neurons. Information is
distributed across the neurons; the main trait, however,
is the ability to learn, which is referred to as adaptation.
History and applications
• Classification
Divide objects into different classes
Quantitative data --> qualitative information
Pattern recognition, language recognition, etc.
• Operations research
Solve hard problems
• Associative memory
Reproduce data from incomplete and/or noisy
information.
• Machine learning
History and applications
• McCulloch & Pitts [1943]
Gave a simple mathematical representation of the
biological neuron and constructed a primitive neural
network based thereon using electrical circuits.
• Hebb [1949]
- Organization of Behavior
- Conditioning is a property of neurons
- Learning
History and applications
• Rosenblatt [1957]:
Perceptron, the first operational recognition model
with tolerance to noise.
• Widrow [1960]:
ADALINE, the "adaptive linear element".
Artificial neuron model
• Inputs – A neuron has n inputs, representing the
dendrites of the biological model. We can formally
denote them as a vector (x1, x2, . . . , xn).
Artificial neuron model
• Weights – Each input is weighted with its synaptic
weight. The weight simulates the permeability of the
membrane: the bigger the weight, the more permeable
the membrane would be in the corresponding biological
neuron. We can therefore write the weights as a vector
of n numbers (w1, w2, . . . , wn).
Artificial neuron model
• Bias – In biology, the neuron produces an output when
its threshold is reached. The negative value of the
threshold t is represented in the artificial model as the
weight of a special input called the bias, i.e. w0 = −t. Its
formal input x0 always satisfies x0 = 1, so the bias value
w0 is fully applied when computing the inner potential.
Artificial neuron model
• Inner potential – The weighted sum of all inputs
(including the bias) is called the inner potential. Formally:
a = Σ(i=0..n) wi·xi = w0 + w1·x1 + . . . + wn·xn
Artificial neuron model
• Activation function – The inner potential is evaluated by
an activation function. Various activation functions are
used in the field of neural networks. With the most basic
one, the unit step function, the value of the function is:
y = 1 if a ≥ 0, and y = 0 otherwise
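The neuron just described can be sketched in a few lines of Python. This is an illustrative sketch, not code from the course: the function name and the example weights are assumptions, chosen so the neuron behaves like a logical AND.

```python
# Minimal sketch of the artificial neuron from the slides: n inputs,
# synaptic weights, a bias w0 (with its fixed input x0 = 1), and a
# unit step activation on the inner potential.

def neuron_output(x, w, w0):
    """Unit-step output for inputs x, weights w, and bias w0."""
    potential = w0 + sum(wi * xi for wi, xi in zip(w, x))  # inner potential a
    return 1 if potential >= 0 else 0                      # unit step

# With weights 0.5, 0.5 and bias -0.6 the neuron fires only when
# both inputs are 1 (a logical AND).
print(neuron_output([1, 0], [0.5, 0.5], -0.6))  # -> 0
print(neuron_output([1, 1], [0.5, 0.5], -0.6))  # -> 1
```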
Artificial neuron model
• Output – The value of the activation function is denoted
by y. It is the output of the neuron.
Artificial neuron model
• We divide the description of a neural network into three
parts:
• architecture – represents the structure of the network,
i.e. how its neurons are connected. It can be imagined
simply as a view of the network from the outside.
• active phase – the opposite of the architecture: it
describes the inner workings of the network, what
happens from the moment the input enters the network
until the computation reaches its output.
• adaptation – the network's reaction to the ongoing
computations. It denotes alterations of the neurons'
weights.
Learning
• What is it?
• Language: English
• Something that flies? A bird, an insect, an airplane?
Learning
• The idea is to take a large number of handwritten digits,
known as training examples, and then develop a system
which can learn from those examples.
Learning
• Learning is a phase of the development of a neural
network in which the behavior of the network is altered
until it exhibits the desired behavior. It consists of
changing the network weights so that the network
responds correctly to examples and experiences.
• It is difficult to choose the values of the connection
weights of a neural network by hand for a given
application.
• There are two main classes of learning algorithms:
– Supervised learning.
– Unsupervised learning.
Supervised learning
supervisor
desired results
Error
Network
Obtained results
26
Unsupervised Learning
Network
Obtained network
27
Learning rules
Learning means changing the weights of the connections
between neurons (the weight wij connects neuron i to
neuron j).
There are many rules to alter the weights:
– Hebb: Δwij(t) = R·xi·xj
– Widrow-Hoff (delta rule): Δwij = R(d − aj)·ai
– Grossberg: Δwij = R(aj − wij)·ai
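As an illustration, the three update rules above can be written as one-line Python functions. The names and argument order are illustrative choices, not from the course; R is the learning rate, x are neural activities, d the desired output, and a the obtained activations.

```python
# Sketch of the three weight-update rules for one connection w_ij
# between neurons i and j.

def hebb_delta(R, xi, xj):
    # Hebb: strengthen w_ij when both neurons are active together.
    return R * xi * xj

def widrow_hoff_delta(R, d, aj, ai):
    # Widrow-Hoff (delta rule): move toward the desired output d.
    return R * (d - aj) * ai

def grossberg_delta(R, aj, wij, ai):
    # Grossberg: pull w_ij toward the post-synaptic activity a_j.
    return R * (aj - wij) * ai

print(hebb_delta(1, 1, 1))            # R=1, both active -> +1
print(widrow_hoff_delta(1, 1, 0, 1))  # desired 1, obtained 0 -> +1
print(grossberg_delta(1, 1, 0, 1))    # w_ij pulled toward a_j=1 -> +1
```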
Hebb rule
• A change in the strength of a connection is a function of
the pre- and post-synaptic neural activities. It is called
the "Hebbian learning rule" after D. Hebb ("When neuron
A repeatedly participates in firing neuron B, the strength
of the action of A onto B increases").

xi  xj  δwij
0   0   0
0   1   0
1   0   0
1   1   +
Hebb rule
• A more general form of a Hebbian learning rule would be:
wij(t+1) = wij(t) + δwij(t)
where δwij(t) = xi·xj,
in which time and learning thresholds can be taken into
account.
Hebb learning algorithm
Given: learning rate μ, threshold S, and weights wij.

Learning knowledge base:
e1   e2   xd
 1    1    1   (1)
 1   -1    1   (2)
-1    1   -1   (3)
-1   -1   -1   (4)

Flowchart: initialize wij and S randomly. For each
combination (e1, e2), compute x:
a = w1·e1 + w2·e2 − S
a ≤ 0 => x = −1
a > 0 => x = 1
If x ≠ xd, alter the weights
w1 = w1 + μ(e1·xd)
w2 = w2 + μ(e2·xd)
and recheck; otherwise move to combination (n+1).
Learning ends when the last combination is processed
correctly.
Hebb learning algorithm
1. Randomly initialize the weights and the threshold value S.
2. Present a combination E = (e1, ..., en) from the training set.
3. Compute the output x obtained for this input (the
threshold value is introduced here in the calculation of
the weighted sum):
a = Σ wi·ei − S
x = sign(a) (if a > 0 then x = 1; if a ≤ 0 then x = −1)
4. If the output x differs from the desired output xd, change
the weights (μ is a positive constant that specifies the
learning rate):
wi = wi + μ(ei·xd)
5. Repeat until all examples of the training set are treated
properly (back to step 2).
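The five steps above can be sketched as a short Python loop. This is a minimal sketch, not course code: it assumes bipolar ±1 inputs and outputs, a fixed threshold S = 0 and learning rate μ = 1 as in the worked example, and it repeats full passes over the training set until a pass produces no weight change.

```python
# Hebb-style learning loop from the slides, with the update
# w_i += mu * e_i * x_d applied whenever the output is wrong.

def sign(a):
    """Step used in the slides: a <= 0 -> -1, a > 0 -> +1."""
    return 1 if a > 0 else -1

def train_hebb(examples, mu=1.0, s=0.0, max_passes=100):
    w = [0.0, 0.0]                        # initial weights (0, 0)
    for _ in range(max_passes):
        changed = False
        for e1, e2, xd in examples:
            a = w[0] * e1 + w[1] * e2 - s   # inner potential
            x = sign(a)
            if x != xd:                     # wrong output: update weights
                w[0] += mu * e1 * xd
                w[1] += mu * e2 * xd
                changed = True
        if not changed:                     # full pass with no errors: done
            return w
    return w

# Knowledge base from the example (desired output equals e1).
examples = [(1, 1, 1), (1, -1, 1), (-1, 1, -1), (-1, -1, -1)]
print(train_hebb(examples))  # -> [2.0, 0.0], as in the worked example
```

The loop reaches the same final weights (w1 = 2, w2 = 0) as the step-by-step example that follows.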
Learning example (Hebb rule)
• Example:
• 2 binary inputs (e1, e2)
• A network of 3 neurons
Learning example (Hebb rule)
Knowledge base examples using the Hebb rule:

e1   e2   xd
 1    1    1   (1)
 1   -1    1   (2)
-1    1   -1   (3)
-1   -1   -1   (4)
Learning example (Hebb rule)
1. Initialization: μ = +1, weights w1 = 0 and w2 = 0, and the
threshold is null (S = 0).
2. Compute x for combination (1): e1 = 1, e2 = 1.
3. a = w1·e1 + w2·e2 − S = (0 × 1) + (0 × 1) − 0 = 0
a ≤ 0 => x = −1
4. The output differs from xd = 1, so we change the weights:
5. w1 = w1 + μ(e1·xd) = 0 + 1 × (1 × 1) = 1
6. w2 = w2 + μ(e2·xd) = 0 + 1 × (1 × 1) = 1
Learning example (Hebb rule)
1. Current conditions: weights w1 = 1 and w2 = 1, and the
threshold is null (S = 0).
2. Compute x for combination (1): e1 = 1, e2 = 1.
3. a = w1·e1 + w2·e2 − S = (1 × 1) + (1 × 1) − 0 = 2
a > 0 => x = 1
4. The output is correct, so process the next combination.
Learning example (Hebb rule)
1. Current conditions: weights w1 = 1 and w2 = 1, and the
threshold is null (S = 0).
2. Compute x for combination (2): e1 = 1, e2 = −1.
3. a = w1·e1 + w2·e2 − S = (1 × 1) + (−1 × 1) − 0 = 0
a ≤ 0 => x = −1
4. The output is wrong, so change the weights:
5. w1 = w1 + μ(e1·xd) = 1 + (1 × 1) = 2
6. w2 = w2 + μ(e2·xd) = 1 + (−1 × 1) = 0
Learning example (Hebb rule)
1. Current conditions: weights w1 = 2 and w2 = 0, and the
threshold is null (S = 0).
2. Compute x again for combination (1): e1 = 1, e2 = 1.
3. a = w1·e1 + w2·e2 − S = (1 × 2) + (1 × 0) − 0 = 2
a > 0 => x = 1 (correct)
4. Compute x again for combination (2): e1 = 1, e2 = −1.
5. a = w1·e1 + w2·e2 − S = (1 × 2) + (−1 × 0) − 0 = 2
a > 0 => x = 1 (correct)
Learning example (Hebb rule)
1. Current conditions: weights w1 = 2 and w2 = 0, and the
threshold is null (S = 0).
2. Compute x for combination (3): e1 = −1, e2 = 1.
3. a = w1·e1 + w2·e2 − S = (−1 × 2) + (1 × 0) − 0 = −2
a ≤ 0 => x = −1 (correct)
4. Compute x for combination (4): e1 = −1, e2 = −1.
5. a = w1·e1 + w2·e2 − S = (−1 × 2) + (−1 × 0) − 0 = −2
a ≤ 0 => x = −1 (correct)
Learning example (Hebb rule)
• In the end, the whole learning base is reviewed without
changing the weights again. The final weights are w1 = 2
and w2 = 0.
Hebb rule limitation
• If the input patterns are not mutually orthogonal,
interference may occur and the network may not be able
to learn the associations. This limitation of Hebbian
learning (unsupervised) can be overcome by using the
perceptron and the delta rule.
Next
• Perceptron Model