3.5. Why does this sometimes not succeed?

(translation by Agata Krawcewicz)
Learning is not always such a simple and pleasant process as the one I showed in fig. 3.7. At times
the network "struggles" for a long time before it finds the required solution. A detailed
discussion of the difficulties that appear here would again demand references to complicated
considerations from infinitesimal and differential calculus, together with the
analysis of the convergence of algorithms and other rather difficult things. However,
instead of tormenting you with complicated mathematics, I will try to explain it by telling
you the story of a blind kangaroo (look at fig. 3.8).
Fig. 3.8. Learning of a neural network presented as a blind kangaroo's hike. Discussion in text.
Imagine that a blind kangaroo got lost in the mountains and wishes to come back to his own
house, about which he knows only that it is situated at the very bottom of the deepest
valley. The kangaroo thinks of a simple method. He is blind, so he cannot see the whole
landscape of the mountains surrounding him (just as the learning algorithm
cannot check the value of the error function at all points, for all sets of weights), but he can feel
with his little paws which way the ground subsides (in the same way as the
learning algorithm can find out which way the weights must be changed so that the
error grows smaller). In this way the kangaroo finds the proper direction and hop! He jumps
with as much power as he has in his legs, counting on the fact that he is heading towards his own
little house.
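The kangaroo's method can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the one-dimensional terrain `error`, the helpers `feel_slope` and `hop`, and the jump length `jump` are hypothetical names invented for this example and do not describe any particular network.

```python
def error(w):
    """Hypothetical terrain: the height of the ground at position w.
    The kangaroo's house (the lowest point) is assumed to lie at w = 3."""
    return (w - 3.0) ** 2


def feel_slope(f, w, h=1e-6):
    """Estimate the local slope by probing the ground just left and right of w."""
    return (f(w + h) - f(w - h)) / (2.0 * h)


def hop(f, w, jump):
    """One jump: move against the slope, that is, towards lower ground."""
    return w - jump * feel_slope(f, w)


w = 0.0                          # the place where the kangaroo got lost
for _ in range(50):
    w = hop(error, w, jump=0.1)  # feel the slope, then jump

print(f"position after 50 jumps: {w:.4f}")
```

On this gentle, hypothetical terrain the position ends up very close to 3. Notice that the kangaroo never looks at the whole landscape; he only probes the ground right next to where he stands.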
Unfortunately, a few surprises are waiting for the kangaroo (and for the algorithm
learning the neural network). If you look closely at figure 3.8, you will notice that
an imprudent jump can land the kangaroo at the bottom of the rift which separates him from his
house. Another situation is also possible (not drawn, but easy to imagine): the ground may drop
in a certain direction but rise suddenly a little further on, so that
by performing a long jump in the seemingly promising direction the kangaroo makes his
situation worse, because he lands higher (that is to say, further from the
aim) than he was at the start!
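A single jump of this unlucky kind can be reproduced numerically. The sketch below assumes a hypothetical valley described by `height(w) = w**2`, with the bottom at `w = 0`; the starting point and the two jump lengths are likewise chosen only for illustration.

```python
def height(w):
    """Hypothetical terrain: a valley whose bottom lies at w = 0."""
    return w ** 2


def slope(w):
    """Exact slope of the terrain above (the derivative of w**2)."""
    return 2.0 * w


start = 1.0
short_landing = start - 0.1 * slope(start)   # a cautious jump
long_landing = start - 1.5 * slope(start)    # a jump with all the power in the legs

print("start height:      ", height(start))
print("after short jump:  ", height(short_landing))
print("after long jump:   ", height(long_landing))
```

Both jumps go in the same, locally correct direction. The short one lands lower than the starting point (at a height of about 0.64), while the long one flies over the bottom of the valley and lands on the opposite slope at a height of 4, higher than where the kangaroo began.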
The success of the poor little kangaroo depends mostly on whether he can
properly choose the length of the jump. If he performs small jumps, the way home
will take him a lot of time. But if he decides on a jump that is too long, and there are
crags or rifts in the surroundings, he will harm himself!
When a network is being trained, the creator of the algorithm must also decide how big the
changes of the weights should be for particular values of the input signals and a specific
size of the error. This decision is made by setting the so-called proportion coefficient, also known as the learning rate. It may seem that it can be chosen just as we wish; however, every particular choice
has specific consequences. Choosing a coefficient that is too small makes the process of
learning very slow (the weights are improved only very slightly in every step, so in order for them to
reach the desired values we have to perform a lot of such steps). Choosing a learning
coefficient that is too large causes very abrupt changes of the parameters of the network, which in
extreme cases can even lead to instability of the learning process (the network thrashes about,
unable to find the correct values of the weights, because they change so quickly
that precisely "hitting" the necessary solution is very hard).
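The effect of the learning rate can be tried out on the smallest possible example. The sketch below assumes a single linear neuron `y = w * x` and a weight change proportional to the learning rate, the error and the input signal; the function `train`, the training example and the three values of `eta` are hypothetical choices made only for this illustration.

```python
def train(eta, steps=20):
    """Train one linear neuron y = w * x with the rule
    w += eta * (target - y) * x  and record the error after every step."""
    x, target = 2.0, 4.0      # one hypothetical training example (ideal weight: 2.0)
    w = 0.0                   # starting weight
    history = []
    for _ in range(steps):
        y = w * x
        delta = target - y            # size of the error
        w += eta * delta * x          # weight change scaled by the learning rate
        history.append(abs(target - w * x))
    return w, history


for eta in (0.001, 0.1, 0.6):
    w, history = train(eta)
    print(f"eta={eta}: weight after 20 steps {w:.4f}, remaining error {history[-1]:.4g}")
```

In this toy setting the smallest coefficient barely moves the weight in 20 steps, the middle one brings it very close to the ideal value of 2, and the largest one makes the weight swing from one side of the solution to the other with a growing error, which is exactly the instability described above. The concrete thresholds depend on the example and should not be read as general recommendations.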
One can look at this problem from yet another point of view. Large values of
the learning coefficient resemble the attitude of a teacher who is very strict and difficult to
please, and who punishes the pupil for his mistakes too radically and severely. Such a teacher
seldom attains good learning results, because he confuses his pupils and causes them
excessive stress. On the other hand, small values of the learning coefficient resemble a
teacher who is excessively tolerant and whose pupils progress too slowly, because he
does not push them to work hard enough.
When training networks, as when teaching pupils, it is necessary to find a compromise
that takes into account both the advantages of quick work and the safety considerations
that call for a stable learning process. You can,
however, help yourself in another smart manner, which I will describe in the following
subchapter.