
Comparison of Neural Network Learning Algorithms for Prediction Enhancement of a Planning Tool
Zakaria Nouir, Berna Sayrac and Benoît Fourestié
Walid Tabbara and Françoise Brouaye
France Telecom, R&D Division, 38 rue Général Leclerc, Issy-les-Moulineaux, France
E-mail: [email protected]
LSS Supélec, Gif-sur-Yvette, France
E-mail: [email protected]
Abstract—This work presents the results of studies concerning the application of different neural network training algorithms to enhance the predictions of a radio network planning tool. Investigations are made on a hybrid model that combines the a-priori information, in the form of simulation results, with the a-posteriori knowledge contained in measurement data. The performances of the Back Propagation and Levenberg-Marquardt algorithms are compared against measured values. The comparison is based on the absolute mean error, standard deviation and root mean square error between predicted and measured values. The study is made in the Empirical Risk Minimization context and the neural network generalization error (Real Risk) is given with a 95% confidence interval.
Key-Words– Neural Networks, Back-Propagation, Levenberg-Marquardt, Radio Network Planning Tool, Prediction Enhancement
I. INTRODUCTION
To predict the quality of service of the 3G network, many
works propose to use radio network planning tools (RNP)
based on theoretical models [1]. On the other hand, a lot of
research work has used empirical models [2] [3] [4] where
an Artificial Neural Network (ANN) is used to parameterize
the prediction tool. While empirical models are based on the
a-posteriori knowledge (i.e. the measurements), the theoretical models deal with the fundamental principles of physical
phenomena.
The theoretical models are all statistical models and therefore they are naturally imperfect because of mathematical
model simplifications. This leads to discrepancies between
simulation results and reality (measurements).
In the empirical models, all environmental influences are
taken into account. However, the a priori knowledge (physical
models) is not used and results depend not only on the
accuracy of the measurements but also on similarities between
the environment to be analysed and the environment where the
measurements are carried out.
In [5] we have proposed a hybrid model to benefit from
the combination of both the a priori and the a posteriori
information by making use of the measurement data in the
simulation tool to enhance the simulation results. We have used a Multi Layer Perceptron Neural Network to learn the statistical features between simulations and measurements, as described in Figure 1. Thus the radio network planning tool delivers enhanced predictions that are close to measurements (Figure 2). A pre-processing based on Independent Component Analysis, histogram transformation and k-Means clustering is performed, so that the learning process works on independent distributions of the variables to be predicted.

Fig. 1. Diagram for the training process
In this work we compare two different Multi Layer Neural
Network learning algorithms that have been tested to learn
the correspondence between simulations and measurements.
The performances of the learning algorithms are evaluated
by comparing the convergence speed and the prediction error.
These error statistics are the empirical absolute mean error, standard deviation and root mean square error between measured and predicted values. The real absolute mean error with a 95% confidence interval is also given.
The remainder of this paper is structured as follows: In Section II we give an overview of neural networks, especially the Multi Layer Perceptron Neural Network, and present the fundamental aspects of the two training algorithms tested in our study (Back Propagation and Levenberg-Marquardt). In Section III we present the different performance criteria used for algorithm comparison. Next, in Section IV two types of results are given: the first deals with learning algorithm performances (real and learning errors) and the second with the application of the two algorithms to radio network prediction enhancement. Each type of result is followed by a discussion. In the last section we conclude this work.

II. THE ANN OVERVIEW

Neural Networks are very powerful tools that have been used in many domains [6]. They can be applied to any problem of prediction, classification or control where there exists a sufficient amount of observation data. Neural Networks owe this popularity to their powerful capacity to model extremely complex nonlinear functions and to their relatively easy use, which is based on training-prediction cycles. In the training cycle the user presents to the network a training pattern that contains a set of inputs and the set of desired outputs that corresponds to those inputs. Next, in the prediction cycle, the network is supposed to be able to supply the user with output values corresponding to input values that it has never seen, thanks to its generalization capability. Good generalization is generally a complex task: the training set must contain sufficient information representing all cases so that a valid general mapping between outputs and inputs can be found. Furthermore, the training sets must be sufficiently large and representative of all cases [7] [8] [9].
A. Multilayer Perceptron Neural Network (MLP-NN)
Figure 3 shows the configuration of a multilayer perceptron
with one hidden layer and one output layer. In this MLP each
neuron is connected to each neuron in the next layer. The
output of the MLP is described by the following equation:

y_p = F_O\left( \sum_{j=0}^{N_H} w^H_{jp}\, F_H\left( \sum_{i=0}^{N_I} w^I_{ij}\, x_i \right) \right), \qquad p = 1, 2, \ldots, N \qquad (1)
Fig. 2. Diagram for the prediction process
Where:
• w^H_{jp} represents the weight from neuron j in the hidden layer to the p-th output neuron,
• x_i represents the i-th element in the input layer,
• F_H and F_O represent the activation functions in the hidden and output layers respectively,
• w^I_{ij} are the weights from neuron i in the input layer to neuron j in the hidden layer.
The learning phase consists of the minimization of the cost
function defined by:
E = \frac{1}{2} \sum_{p=1}^{N} (y_p - d_p)^2 = \frac{1}{2} \sum_{p=1}^{N} e_p^2 \qquad (2)
Where yp is the pth output value calculated by the network
and dp represents the expected value.
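For illustration, a minimal NumPy sketch of the forward pass of equation (1) and the cost of equation (2). The sigmoid activations, the absence of explicit bias terms and the toy dimensions are assumptions; the paper does not specify F_H, F_O or the weight initialization.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, w_in, w_out, f_hidden=sigmoid, f_out=sigmoid):
    """One-hidden-layer MLP: y_p = F_O( sum_j w^H_jp * F_H( sum_i w^I_ij * x_i ) )."""
    h = f_hidden(w_in.T @ x)   # hidden activations, shape (N_H,)
    y = f_out(w_out.T @ h)     # outputs, shape (N,)
    return y

def cost(y, d):
    """Quadratic cost of eq. (2): E = 1/2 * sum_p (y_p - d_p)^2."""
    return 0.5 * np.sum((y - d) ** 2)

# Toy dimensions: 20 inputs, 20 hidden neurons, 20 outputs (as in Section IV-A).
rng = np.random.default_rng(0)
w_in = rng.normal(scale=0.1, size=(20, 20))   # w^I_ij
w_out = rng.normal(scale=0.1, size=(20, 20))  # w^H_jp
x, d = rng.random(20), rng.random(20)
print(cost(mlp_forward(x, w_in, w_out), d))
```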
B. Back propagation algorithm
The Back propagation algorithm is a simple gradient descent technique that minimizes the squared-error cost defined in equation (2).
The output of each neuron in the output layer is a function of
the weights w. To minimize the cost function we must have:
\nabla E(w) = \frac{\partial E(w)}{\partial w_i} = 0 \quad \text{for all } i \qquad (3)

The update rule in the back propagation algorithm is:
w(t+1) = w(t) + \Delta w(t) \qquad (4)

where

\Delta w(t) = -\eta\, \frac{\partial E(t)}{\partial w(t)} \qquad (5)

and \eta represents the learning parameter.
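As an illustration of the update rule of equations (4) and (5), a small sketch in which a finite-difference gradient stands in for the back-propagated derivatives; the toy cost function and the learning rate are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def numerical_gradient(E, w, eps=1e-6):
    """Central-difference estimate of dE/dw, standing in for back-propagation."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus.flat[i] += eps
        w_minus.flat[i] -= eps
        grad.flat[i] = (E(w_plus) - E(w_minus)) / (2 * eps)
    return grad

def bp_step(E, w, eta=0.1):
    """w(t+1) = w(t) + Delta_w(t),  Delta_w(t) = -eta * dE/dw  (eqs. 4-5)."""
    return w - eta * numerical_gradient(E, w)

# Toy quadratic cost with minimum at w = [1, 2].
E = lambda w: 0.5 * np.sum((w - np.array([1.0, 2.0])) ** 2)
w = np.zeros(2)
for _ in range(100):
    w = bp_step(E, w)
print(w)  # approaches [1, 2]
```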
C. Levenberg-Marquardt algorithm
This algorithm is a blend of gradient descent and Gauss-Newton iteration. The gradient and the Hessian of the cost function can be written as:

\nabla E(w) = J(w)^T e(w) \qquad (6)

\nabla^2 E(w) \approx J(w)^T J(w) = H \qquad (7)

where (J(w))_{ij} = \partial e_i / \partial w_j is the Jacobian matrix of the error vector e with respect to w.
Fig. 3. Fully connected multi-layer perceptron with one hidden layer
To find the minimum of the cost function we write \nabla E(w) = 0. We expand the gradient of E in a Taylor series around the current state:

\nabla E\{w(t+1)\} = \nabla E\{w(t)\} + \nabla^2 E\{w(t)\}\,\{w(t+1) - w(t)\} + \ldots \qquad (8)
This leads to:

w(t+1) = w(t) - \left[\nabla^2 E(w(t))\right]^{-1} \nabla E(w(t)) \qquad (9)

By combining the gradient method and the Gauss-Newton method we obtain the update rule for the Levenberg-Marquardt algorithm:

w(t+1) = w(t) - (H + \eta I)^{-1} \nabla E(w(t)) \qquad (10)
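A sketch of a single Levenberg-Marquardt step built from equations (6), (7) and (10); the damping value and the toy curve-fitting problem are assumptions for illustration only.

```python
import numpy as np

def lm_step(residuals, jacobian, w, eta=1e-2):
    """One LM update: w <- w - (J^T J + eta*I)^{-1} J^T e  (eqs. 6, 7, 10)."""
    e = residuals(w)
    J = jacobian(w)
    H = J.T @ J               # Gauss-Newton approximation of the Hessian
    grad = J.T @ e            # gradient of E = 1/2 ||e||^2
    return w - np.linalg.solve(H + eta * np.eye(w.size), grad)

# Toy problem: fit y = a*x + b to noisy data.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 0.5 + 0.01 * rng.normal(size=x.size)

residuals = lambda w: w[0] * x + w[1] - y                    # e_p(w)
jacobian = lambda w: np.column_stack([x, np.ones_like(x)])   # d e_p / d w_j

w = np.zeros(2)
for _ in range(20):
    w = lm_step(residuals, jacobian, w)
print(w)  # close to [2.0, 0.5]
```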
III. EVALUATION OF TRAINING ALGORITHM PERFORMANCES

As the generalization property is very important in practical prediction situations, the selection of training examples is important to achieve good generalization. The set of available examples is separated into two disjoint sets: a training set and a test set.
In the first stage we present the training set to the MLP-NN in order to perform its learning based on one of the training algorithms presented above. Next, the performances of the two algorithms are compared on the test set.
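A minimal sketch of such a disjoint split; the 1000/500 sizes match those reported in Section IV-A, while the arrays themselves are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
patterns = rng.random((1500, 20))   # inputs (e.g. simulation histograms)
targets = rng.random((1500, 20))    # desired outputs (measurement histograms)

# Disjoint training/test split: 1000 training patterns, 500 test patterns.
idx = rng.permutation(len(patterns))
train_idx, test_idx = idx[:1000], idx[1000:]

x_train, d_train = patterns[train_idx], targets[train_idx]
x_test, d_test = patterns[test_idx], targets[test_idx]
```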
A. Empirical Mean Error
The empirical absolute error between the measured and
predicted value is computed with:
E_i = \sum_{p=1}^{N} \left| y_{ip}^{predicted} - y_{ip}^{measured} \right| \qquad (11)
where i represents the sample index in the test set and N the dimension of the NN output vector. The empirical absolute mean error is computed by:

\mu_{emp} = \frac{1}{M} \sum_{i=1}^{M} E_i \qquad (12)

where M is the size of the test set.
B. Empirical Standard Deviation
The standard deviation is determined from the empirical
absolute error (eq.11) and the empirical absolute mean error
(eq.12):
\sigma_{emp} = \sqrt{\frac{1}{M-1}\left(\sum_{i=1}^{M} E_i^2 - M\,\mu_{emp}^2\right)} \qquad (13)
C. Empirical Root Mean Square Error
The Empirical Root Mean Square error (RMS) is given by:

RMS = \sqrt{\mu_{emp}^2 + \sigma_{emp}^2} \qquad (14)
D. Real Mean Error
The errors described above are empirical because they are
calculated using the test set. According to the Empirical Risk Minimization (ERM) theory, the empirical error converges to the real error when M → ∞.
To approximate the real absolute error we have considered that
the error Ei is normally distributed with a mean equal to µemp
and a standard deviation equal to σemp . This consideration is
justified because Ei is a sum of random variables that tends to
a Gaussian distribution according to the central limit theorem.
With regard to this consideration, the 95% confidence interval for the real absolute mean error is given by:

\mu_{real} = \mu_{emp} \pm 2.26\,\sqrt{\frac{\mu_{emp}(1 - \mu_{emp})}{M}} \qquad (15)

where M is the size of the test set and 2.26 is the value given by the Student distribution with M − 1 degrees of freedom for a confidence level of 95%.
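A sketch that computes the statistics of equations (11) through (15) on a test set; the array shapes and the placeholder data are assumptions, and the factor 2.26 is taken directly from the paper.

```python
import numpy as np

def error_statistics(predicted, measured):
    """Empirical error statistics for a test set of shape (M, N)."""
    E = np.sum(np.abs(predicted - measured), axis=1)                    # eq. (11), one E_i per sample
    M = E.size
    mu_emp = E.mean()                                                   # eq. (12)
    sigma_emp = np.sqrt((np.sum(E ** 2) - M * mu_emp ** 2) / (M - 1))   # eq. (13)
    rms = np.sqrt(mu_emp ** 2 + sigma_emp ** 2)                         # eq. (14)
    half_width = 2.26 * np.sqrt(mu_emp * (1 - mu_emp) / M)              # eq. (15)
    return mu_emp, sigma_emp, rms, (mu_emp - half_width, mu_emp + half_width)

# Placeholder test data: M = 500 samples, N = 20 outputs per sample.
rng = np.random.default_rng(3)
pred = rng.random((500, 20))
meas = pred + 0.001 * rng.normal(size=pred.shape)
print(error_statistics(pred, meas))
```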
IV. RESULTS
A. Neural Networks Algorithms Comparison Results
1) Results: We have applied the proposed method to simulations of a static third generation (3G) RNP tool: Odyssee.
This tool is used to predict Radio Access Network (RAN)
performance and to guide operators during the deployment
and optimization phases. In the context of 3G networks, the
main inputs of this tool are the network configuration (i.e.
site locations, NodeB parameters, power settings, antenna
specifications, etc.), the propagation model with correlated
shadowing, the traffic distribution, service parameters (average
target signal-to-interference ratio UL/DL, throughput, etc.),
and Radio Resource Management (RRM) parameters (macro
diversity, admission control, load control thresholds). The
outputs consist of the UpLink (UL) and DownLink (DL) transmission powers, interference, but also performance indicators
such as access and dropping probabilities, average throughput,
etc., which are calculated by taking into account mobility and
RRM algorithms.
Odyssee performs Monte Carlo simulations: Positioning the
mobiles, checking their conditions of access to the RAN and
calculating their UL/DL powers and transmission characteristics such as throughput, etc. In this work, Odyssee operates in
the static mode, i.e., each Monte Carlo draw is an independent
snapshot of the network.
The performance indicators we are interested in are situated
at the base station level: Emission powers, received interference levels, call blocking rates, access rates, etc. An important
property of these indicators is that they are inter-correlated. If
we have excessive transmission powers, interference levels will
increase, increasing call blocking rates and reducing access
rates to the RAN.
Measurements of the indicators at the base station level
which we compare to simulation results can either be found
at the Operation and Maintenance Center (OMC) or can be
obtained via capture tools that work on interfaces such as Iub
for 3G (Figure 4).
The proposed scheme is tested on the Paris UMTS network.
The MLP used has one hidden layer having the same number
of hidden neurons as the input and output layers. In each
layer there are 20 neurons (two histograms of 10 bins each).
The simulations yield two variables: Uplink Load (ULL) and
Downlink Load (DLL) for each station in the network.
Table I shows the statistics of the two training algorithms.
The training set consists of 1000 patterns, while for test purposes a set of 500 patterns was used.
TABLE I
ERROR STATISTICS

                BP                LM
µ_emp           0.019             0.4
σ_emp           0.003             0.01
RMS             0.02              0.05
µ_real (95%)    [0.016, 0.021]    [0.03, 0.04]
In Figure 5 we plot the convergence speed of the two algorithms. The figure shows the value of the training error as a function of the number of iterations. Note that the figure is plotted on a logarithmic scale.
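A sketch of how such a convergence plot can be produced with matplotlib; the two error curves below are synthetic placeholders, not the measured training curves of the paper.

```python
import numpy as np
import matplotlib.pyplot as plt

iterations = np.arange(1, 46)
# Synthetic stand-ins: a slow, shallow decay (BP-like) and a fast, deep decay (LM-like).
bp_error = 1e-1 * np.exp(-0.1 * iterations)
lm_error = 1e-1 * np.exp(-1.5 * iterations)

plt.semilogy(iterations, bp_error, label="Back Propagation Algorithm")
plt.semilogy(iterations, lm_error, label="Levenberg Marquardt Algorithm")
plt.xlabel("Iteration")
plt.ylabel("Error")
plt.title("Convergence Speed")
plt.legend()
plt.show()
```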
Fig. 4. 3G Radio Access Network and performance indicators

Fig. 5. Convergence Speed Comparison
2) Discussions: Measurements taken on a 3G radio network were used to design a neural network based model with different training algorithms. This neural network also uses the simulation results of a 3G radio network planning tool to learn the statistical differences between measurements and simulations. This learned relation is then used to enhance the prediction quality of the planning tool. To compare the tested learning algorithms, the empirical error between neural network results and measurements is computed on a test set.
For the Back propagation algorithm an RMS error of 0.02 is obtained, whilst an RMS error of 0.05 was obtained when applying the Levenberg-Marquardt algorithm (Table I). On the other hand, the error computed on the training set is equal to 10^{-31} for the Levenberg-Marquardt algorithm and equal to 10^{-3} for the Back propagation algorithm (Figure 5). This shows that, although the LM training error is lower than the BP training error, the best result on the test set is obtained with the BP training algorithm. This means that with the LM algorithm we have learned all the details of the relation between simulations and measurements, noise included, which leads to overfitting. To overcome this overfitting we must stop the LM training at the 20th iteration.
At this point we can say that the LM algorithm gives more accurate results, but we must be careful to stop training at the right time in order to avoid overfitting. On the other hand, it is very important to consider the execution time and the algorithmic complexity. For example, in our study we have measured the execution time for 100 iterations and observed that the BP algorithm is faster than the LM algorithm (2 s versus 19 s). Regarding algorithmic complexity, it is clear that the LM algorithm is more complex and needs more memory because of the Hessian computation.
B. Radio Network Prediction Enhancement Results
1) Learning Results: In this section we will apply the
scheme proposed in [5] using the Back propagation algorithm
and the LM algorithm. The method is applied to 2 stations and
2 indicators (ULL and DLL) with 20 bins in each histogram.
For illustration purposes, we give the results by scatter plots.
Each point in the scatter plot corresponds to a snapshot data
sample. The vertical axis corresponds to the ULL and the
horizontal axis to the DLL. The numerical results of the
comparison are given by the 2-D Kolmogorov Smirnov test
(KS-test) [10] that determines the difference between two
datasets. According to this test, two datasets are supposed to come from the same distribution if the value returned by the KS-Test is close to zero. If the two datasets are far from each other, the KS-Test returns a value close to 1.
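As a rough illustration of this kind of distance, a simplified two-dimensional KS sketch in the spirit of Fasano and Franceschini [10]; the quadrant-counting procedure and the sample data are simplifying assumptions, not the exact implementation used in the paper.

```python
import numpy as np

def quadrant_fractions(points, cx, cy):
    """Fraction of points in each of the four quadrants around (cx, cy)."""
    x, y = points[:, 0], points[:, 1]
    return np.array([
        np.mean((x > cx) & (y > cy)),
        np.mean((x <= cx) & (y > cy)),
        np.mean((x <= cx) & (y <= cy)),
        np.mean((x > cx) & (y <= cy)),
    ])

def ks2d_distance(sample_a, sample_b):
    """Max quadrant-fraction difference over all data points of both samples."""
    d = 0.0
    for cx, cy in np.vstack([sample_a, sample_b]):
        fa = quadrant_fractions(sample_a, cx, cy)
        fb = quadrant_fractions(sample_b, cx, cy)
        d = max(d, np.max(np.abs(fa - fb)))
    return d

# Example: (DLL, ULL) snapshots from measurements vs. enhanced predictions.
rng = np.random.default_rng(4)
measured = rng.normal(loc=[0.4, 0.3], scale=0.05, size=(200, 2))
predicted = rng.normal(loc=[0.42, 0.31], scale=0.05, size=(200, 2))
print(ks2d_distance(measured, predicted))  # close to 0 for similar datasets
```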
Figure 6 shows the results of the learning phase where independent data are used to train the MLP with the Back Propagation algorithm. Figure 7 shows the results of the learning phase where independent data are used to train the MLP with the Levenberg-Marquardt algorithm. The black points correspond to measurements, the dark grey points to simulations, and the light grey points to the outputs of the proposed scheme. Note that these are the results of the learning phase, obtained by passing the simulation data set of the learning phase through the trained MLP. As shown in these figures, the MLP trained by the LM algorithm learns the data better than when trained by the BP algorithm: the predicted-data distribution is closer to the measurement distribution for the LM (KS distance = 0.069) than for the BP (KS distance = 0.12). All these results are summarized in Table II.

TABLE II
COMPARISON OF 2-D KOLMOGOROV-SMIRNOV DISTANCES

                            BP       LM
Measurement-Simulation      0.737    0.737
Measurement-NN Result       0.12     0.069

Fig. 6. Radio Network Prediction Enhancement Results with BP

Fig. 7. Radio Network Prediction Enhancement Results with LM
2) Generalization Results: The real interest of the scheme
proposed in [5] lies in using the generalization capability of
the ANN: the input simulation data of the prediction phase
corresponds to a case that has never been encountered by
the ANN during the learning phase (such as different traffic,
different network parameters, etc.) and the ANN succeeds
in correcting the simulations of the new case. However, the
generalization is not always easy to achieve since it is an
extrapolation operation that requires special attention. In this
section we will compare the generalization results of the two learning algorithms in the case of a tilt change.
A new simulation data set is generated by modifying the (mechanical) tilt of an antenna from 0° to 10°. In practical cases, we would not have measurement data for such a case, and we would like to be able to obtain accurate predictions. This ability saves us the cost of going out to the field to modify the tilt and collect the measurements.
Thus, the new simulation data set corresponding to a tilt value of 10° is passed through the proposed scheme (all the coefficients and parameters of the trained ANN are preserved, as well as those of the learning algorithm) and we obtain the results depicted in Table III. The scatter plots are given in Figure 8 for the LM algorithm generalization and in Figure 9 for the BP algorithm generalization.
TABLE III
COMPARISON OF GENERALIZATION RESULTS

                                  BP       LM
Measurement-Simulation (10°)      0.730    0.730
Measurement-NN Result (10°)       0.050    0.23
Fig. 8. Generalization results with LM

Fig. 9. Generalization results with BP
3) Discussions: As shown in the previous section, the LM algorithm gives the best results in the learning phase. This result is expected because we have shown in the learning algorithm comparison that the LM algorithm has a lower learning error than the BP algorithm.
On the other hand, Table III shows that the BP algorithm has better generalization results than the LM algorithm. This result is also predictable because we have shown previously that the test error of the BP algorithm is lower than that of the LM algorithm.
V. CONCLUSION

In previous work we have proposed to combine a-priori information with a-posteriori knowledge to enhance the prediction results of a radio network planning tool. This combination is based on a learning system using a Multi Layer Perceptron neural network. In this paper we compared two learning algorithms to train the MLP: Back-Propagation and Levenberg-Marquardt. Results have shown that we obtain a lower real error for Back-Propagation than for Levenberg-Marquardt, despite a higher training error. This result is illustrated by the comparison of the performances of the two algorithms when applied to the enhancement of the predictions of a radio network planning tool. The generalization case taken into account in this study is that of an antenna tilt change.

REFERENCES
[1] M. Centeno and M. Reyes, “So you have your model: what to do
next? a tutorial on simulation output analysis,” Simulation Conference
Proceedings, vol. 1, pp. 23–29, Dec 1998.
[2] N. Andrea, C. Cecchetti, and A. Lipparwi, “Fast prediction of the
performance of wireless links by simulation trained neural network,”
Proc. IEEE MTT-S Digest 2000, pp. 429–432, 2000.
[3] T. Balandier, A. Caminada, V. Lemoine, and F. Alexandre, “170 MHz field strength prediction in urban environments using neural nets,” in Proc. IEEE Int. Symp. Personal, Indoor and Mobile Radio Comm., vol. 1, 1995, pp. 120–124.
[4] P. Chang and W.-H. Yang, “Environment-adaptation mobile radio propagation prediction using radial basis function neural networks,” IEEE Trans. Veh. Technol., vol. 46, 1997, pp. 155–160.
[5] Z. Nouir, B. Sayrac, and B. Fourestié, “Enhancement of network
planning tool predictions through measurements,” in Proc. IEEE trans.
Vech. Techno , vol. 46, 2006, pp. 155–160.
[6] S. Haykin, Neural Networks: A comprehensive foundation, 2nd ed.
Prentice Hall, 1998.
[7] D. H. Wolpert, “The mathematics of generalization,” in The Proceedings
of the SFI/CNLS Workshop on Formal Approaches to Supervised Learning, Santa Fe Institute Studies in the Sciences of Complexity, vol. 20.
MA: Addison-Wesley, 1994.
[8] ——, “The lack of A priori distinctions between learning algorithms,”
Neural Computation, vol. 8, no. 7, pp. 1341–1390, 1996.
[9] ——, “The existence of A priori distinctions between learning
algorithms,” Neural Computation, vol. 8, no. 7, pp. 1391–1420, 1996.
[Online]. Available: citeseer.ist.psu.edu/88072.html
[10] G. Fasano and A. Franceschini, “A multidimensional version of the Kolmogorov-Smirnov test,” Royal Astronomical Society, vol. 255, pp. 155–170, 1987.