Comparison of Neural Network Learning Algorithms for Prediction Enhancement of a Planning Tool

Zakaria Nouir, Berna Sayrac and Benoît Fourestié
France Telecom, R&D Division, 38 rue Général Leclerc, Issy-les-Moulineaux, FRANCE
E-mail: [email protected]

Walid Tabbara and Françoise Brouaye
LSS Supélec, Gif sur Yvette, FRANCE
E-mail: [email protected]

Abstract—This work presents the results of studies on the application of different neural network training algorithms to enhance the predictions of a radio network planning tool. The investigations are made on a hybrid model that combines a-priori information, in the form of simulation results, with the a-posteriori knowledge contained in measurement data. The performances of the Back Propagation and Levenberg-Marquardt algorithms are compared with respect to the measured values. The comparison is based on the absolute mean error, standard deviation and root mean square error between predicted and measured values. The study is made in the Empirical Risk Minimization context, and the neural network generalization error (Real Risk) is given with a 95% confidence interval.

Key-Words– Neural Networks, Back-Propagation, Levenberg-Marquardt, Radio Network Planning Tool, Prediction Enhancement

I. INTRODUCTION

To predict the quality of service of the 3G network, many works propose to use radio network planning (RNP) tools based on theoretical models [1]. On the other hand, a lot of research work has used empirical models [2] [3] [4] where an Artificial Neural Network (ANN) is used to parameterize the prediction tool. While empirical models are based on a-posteriori knowledge (i.e. the measurements), theoretical models deal with the fundamental principles of physical phenomena. The theoretical models are all statistical models and are therefore inherently imperfect because of mathematical model simplifications. This leads to discrepancies between simulation results and reality (measurements). In the empirical models, all environmental influences are taken into account. However, the a-priori knowledge (physical models) is not used, and the results depend not only on the accuracy of the measurements but also on the similarity between the environment to be analysed and the environment where the measurements were carried out.

In [5] we have proposed a hybrid model that benefits from the combination of both the a-priori and the a-posteriori information by using measurement data in the simulation tool to enhance the simulation results. We have used a Multi Layer Perceptron Neural Network to learn the statistical relation between simulations and measurements, as described in Figure 1. The radio network planning tool thus delivers enhanced predictions that are close to measurements (Figure 2). A pre-processing based on Independent Component Analysis, histogram transformation and k-means clustering is applied, so that the learning process works on independent distributions of the variables to be predicted.

Fig. 1. Diagram for the training process
Fig. 2. Diagram for the prediction process

In this work we compare two different Multi Layer Neural Network learning algorithms that have been tested to learn the correspondence between simulations and measurements. The performances of the learning algorithms are evaluated by comparing the convergence speed and the prediction error. The error statistics are the empirical absolute mean error, standard deviation and root mean square error between measured and predicted values. The real absolute mean error with a 95% confidence interval is also given.
The remainder of this paper is structured as follows. In Section 2 we give an overview of neural networks, especially the Multi Layer Perceptron Neural Network, and present the fundamental aspects of the two training algorithms tested in our study (Back Propagation and Levenberg-Marquardt). In Section 3 we present the different performance criteria used for the algorithm comparison. Next, in Section 4, two types of results are given: the first deals with the learning algorithm performances (real and learning errors) and the second with the application of the two algorithms to radio network prediction enhancement. Each type of result is followed by a discussion. In the last section we conclude this work.

II. THE ANN OVERVIEW

Neural Networks are very powerful tools that have been used in many domains [6]. They can be applied to any problem of prediction, classification or control where a sufficient amount of observation data exists. Neural Networks owe this popularity to their powerful capacity to model extremely complex non-linear functions and to their relatively easy use, based on training-prediction cycles. In the training cycle the user presents to the network a training pattern that contains a set of inputs and the set of desired outputs corresponding to these inputs. Next, in the prediction cycle, the network is expected to supply the user with output values corresponding to input values it has never seen, thanks to its generalization capability. Good generalization is generally a complex task: the training set must contain sufficient information representing all cases so that a valid general mapping between outputs and inputs can be found. Furthermore, the training sets must be sufficiently large and representative of all cases [7] [8] [9].

A. Multilayer Perceptron Neural Network (MLP-NN)

Figure 3 shows the configuration of a multilayer perceptron with one hidden layer and one output layer. In this MLP each neuron is connected to each neuron in the next layer.

Fig. 3. Fully connected multi-layer perceptron with one hidden layer

The output of the MLP is described by the following equation:

y_p = F_O\left( \sum_{j=0}^{N} w^{H}_{jp} \, F_H\left( \sum_{i=0}^{N} w^{I}_{ij} \, x_i \right) \right), \quad p = 1, 2, \ldots, N   (1)

where:
• w^{H}_{jp} represents the weight from neuron j in the hidden layer to the p-th output neuron,
• x_i represents the i-th element of the input layer,
• F_H and F_O represent the activation functions of the hidden and output layers respectively,
• w^{I}_{ij} are the weights from neuron i in the input layer to neuron j in the hidden layer.

The learning phase consists of the minimization of the cost function defined by:

E = \frac{1}{2} \sum_{p=1}^{N} (y_p - d_p)^2 = \frac{1}{2} \sum_{p=1}^{N} e_p^2   (2)

where y_p is the p-th output value calculated by the network and d_p represents the expected value.

B. Back propagation algorithm

The Back propagation algorithm is a simple gradient descent technique that minimizes the mean squared error defined in equation (2). The output of each neuron in the output layer is a function of the weights w. To minimize the cost function we must have:

\nabla E(w) = \frac{\partial E(w)}{\partial w_i} = 0 \quad \text{for all } i   (3)

The update rule of the back propagation algorithm is:

w(t+1) = w(t) + \Delta w(t)   (4)

\Delta w(t) = -\eta \, \frac{\partial E(t)}{\partial w(t)}   (5)

where η represents the learning parameter.

C. Levenberg-Marquardt algorithm

This algorithm is a blend of gradient descent and Gauss-Newton iteration. The gradient and the Hessian of the cost function can be written as:

\nabla E(w) = J(w)^T e(w)   (6)

\nabla^2 E(w) \approx J(w)^T J(w) = H   (7)

where (J(w))_{ij} = \frac{\partial e_i}{\partial w_j} is the Jacobian matrix of the error vector e with respect to w.

To find the minimum of the cost function we write \nabla E(w) = 0 and expand the gradient of E in a Taylor series around the current state:

\nabla E\{w(t+1)\} = \nabla E\{w(t)\} + \{w(t+1) - w(t)\} \, \nabla^2 E\{w(t)\} + \ldots   (8)

This leads to the Newton update:

w(t+1) = w(t) - \left[\nabla^2 E(w(t))\right]^{-1} \nabla E(w(t))   (9)

By combining the gradient method and the Gauss-Newton method we obtain the update rule of the Levenberg-Marquardt algorithm:

w(t+1) = w(t) - (H + \eta I)^{-1} \nabla E(w(t))   (10)
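As a concrete illustration of the two update rules above, the following minimal NumPy sketch performs one Back Propagation step (equations (4)-(5)) and one Levenberg-Marquardt step (equations (6), (7) and (10)) on the one-hidden-layer MLP of equation (1). It is only a sketch under simplifying assumptions, not the implementation used in this study: sigmoid activations are assumed for F_H and F_O, a single training pattern is used, the layer sizes, the learning parameter and the damping value are arbitrary example values, and the Jacobian of the LM step is approximated by finite differences (in practice it would be computed analytically and the damping adapted during training).

```python
# Minimal NumPy sketch: one Back Propagation step and one Levenberg-Marquardt step
# on the one-hidden-layer MLP of equation (1). Sizes, data and parameters are
# arbitrary example values; sigmoid activations are an assumption of this sketch.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 20, 20, 20                    # example sizes (cf. Section IV)
x = rng.random(n_in)                               # one input pattern x_i
d = rng.random(n_out)                              # desired outputs d_p

f = lambda a: 1.0 / (1.0 + np.exp(-a))             # assumed F_H and F_O (sigmoid)
df = lambda a: f(a) * (1.0 - f(a))

W_I = rng.normal(scale=0.1, size=(n_hid, n_in))    # w^I_ij : input -> hidden
W_H = rng.normal(scale=0.1, size=(n_out, n_hid))   # w^H_jp : hidden -> output

def forward(W_I, W_H, x):
    """Equation (1): y_p = F_O( sum_j w^H_jp F_H( sum_i w^I_ij x_i ) )."""
    a_h = W_I @ x
    h = f(a_h)
    a_o = W_H @ h
    return f(a_o), h, a_h, a_o

# --- Back Propagation step: w(t+1) = w(t) - eta * dE/dw (equations (4)-(5)) ---
eta = 0.1                                          # learning parameter (example value)
y, h, a_h, a_o = forward(W_I, W_H, x)
e = y - d                                          # e_p of equation (2)
delta_o = e * df(a_o)                              # dE/da_o
delta_h = (W_H.T @ delta_o) * df(a_h)              # error back-propagated to hidden layer
W_H_bp = W_H - eta * np.outer(delta_o, h)
W_I_bp = W_I - eta * np.outer(delta_h, x)

# --- Levenberg-Marquardt step (equations (6), (7), (10)) ---
def residuals(w_flat):
    """e(w) for the flattened weight vector w."""
    Wi = w_flat[:W_I.size].reshape(W_I.shape)
    Wh = w_flat[W_I.size:].reshape(W_H.shape)
    y_, _, _, _ = forward(Wi, Wh, x)
    return y_ - d

w = np.concatenate([W_I.ravel(), W_H.ravel()])
e0 = residuals(w)
J = np.empty((n_out, w.size))                      # (J)_ij = de_i / dw_j
step = 1e-6
for j in range(w.size):                            # finite-difference Jacobian (illustration only)
    wp = w.copy()
    wp[j] += step
    J[:, j] = (residuals(wp) - e0) / step

grad = J.T @ e0                                    # equation (6)
H = J.T @ J                                        # equation (7)
mu = 0.01                                          # damping parameter (example value)
w_lm = w - np.linalg.solve(H + mu * np.eye(w.size), grad)   # equation (10)
```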
III. EVALUATION OF TRAINING ALGORITHM PERFORMANCES

As the generalization property is very important in practical prediction situations, the selection of training examples is important to achieve good generalization. The set of available examples is separated into two disjoint sets: a training set and a test set. In a first stage we present the training set to the MLP-NN in order to perform its learning with one of the training algorithms presented above. Next, the performances of the two algorithms are compared on the test set.

A. Empirical Mean Error

The empirical absolute error between the measured and predicted values is computed as:

E_i = \sum_{p=1}^{N} \left| y_{ip}^{predicted} - y_{ip}^{measured} \right|   (11)

where i is the sample index in the test set and N the dimension of the NN output vector. The empirical absolute mean error is computed as:

\mu_{emp} = \frac{1}{M} \sum_{i=1}^{M} E_i   (12)

where M is the size of the test set.

B. Empirical Standard Deviation

The standard deviation is determined from the empirical absolute error (eq. 11) and the empirical absolute mean error (eq. 12):

\sigma_{emp} = \sqrt{ \frac{1}{M-1} \left( \sum_{i=1}^{M} E_i^2 - M \mu_{emp}^2 \right) }   (13)

C. Empirical Root Mean Square Error

The empirical Root Mean Square error (RMS) is given by:

RMS = \sqrt{ \mu_{emp}^2 + \sigma_{emp}^2 }   (14)

D. Real Mean Error

The errors described above are empirical because they are calculated on the test set. According to the Empirical Risk Minimization (ERM) theory, the empirical error converges to the real error when M → ∞. To approximate the real absolute error we consider that the error E_i is normally distributed with mean \mu_{emp} and standard deviation \sigma_{emp}. This assumption is justified because E_i is a sum of random variables, which tends to a Gaussian distribution according to the central limit theorem. Under this assumption, the 95% confidence interval for the real absolute mean error is given by:

\mu_{real} = \mu_{emp} \pm 2.26 \sqrt{ \frac{\mu_{emp}(1 - \mu_{emp})}{M} }   (15)

where M is the size of the test set and 2.26 is the value given by a Student distribution with M − 1 degrees of freedom for a confidence level of 95%.
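The error statistics of equations (11)-(15) can be computed as in the following sketch, where y_pred and y_meas are assumed to be M×N arrays holding the predicted and measured output vectors of the M test patterns; the array names and the example data are placeholders.

```python
# Sketch of the error statistics of equations (11)-(15); y_pred and y_meas are
# assumed to be (M, N) arrays of predicted and measured test outputs.
import numpy as np

def error_statistics(y_pred, y_meas, t_95=2.26):
    M = y_pred.shape[0]
    E = np.abs(y_pred - y_meas).sum(axis=1)                             # equation (11)
    mu_emp = E.mean()                                                   # equation (12)
    sigma_emp = np.sqrt(((E ** 2).sum() - M * mu_emp ** 2) / (M - 1))   # equation (13)
    rms = np.sqrt(mu_emp ** 2 + sigma_emp ** 2)                         # equation (14)
    # Equation (15); as in the text, mu_emp is assumed to lie in [0, 1].
    half = t_95 * np.sqrt(mu_emp * (1.0 - mu_emp) / M)
    return mu_emp, sigma_emp, rms, (mu_emp - half, mu_emp + half)

# Example call on placeholder data (500 test patterns, 20 outputs):
rng = np.random.default_rng(0)
mu, sigma, rms, ci = error_statistics(0.02 * rng.random((500, 20)),
                                      0.02 * rng.random((500, 20)))
```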
IV. RESULTS

A. Neural Network Algorithm Comparison Results

1) Results: We have applied the proposed method to simulations of a static third-generation (3G) RNP tool, Odyssee. This tool is used to predict Radio Access Network (RAN) performance and to guide operators during the deployment and optimization phases. In the context of 3G networks, the main inputs of this tool are the network configuration (i.e. site locations, NodeB parameters, power settings, antenna specifications, etc.), the propagation model with correlated shadowing, the traffic distribution, service parameters (average target signal-to-interference ratio UL/DL, throughput, etc.), and Radio Resource Management (RRM) parameters (macro diversity, admission control, load control thresholds). The outputs consist of the UpLink (UL) and DownLink (DL) transmission powers and interference, but also performance indicators such as access and dropping probabilities, average throughput, etc., which are calculated by taking into account mobility and RRM algorithms. Odyssee performs Monte Carlo simulations: positioning the mobiles, checking their conditions of access to the RAN and calculating their UL/DL powers and transmission characteristics such as throughput. In this work, Odyssee operates in the static mode, i.e., each Monte Carlo draw is an independent snapshot of the network.

The performance indicators we are interested in are situated at the base station level: emission powers, received interference levels, call blocking rates, access rates, etc. An important property of these indicators is that they are inter-correlated: excessive transmission powers increase the interference levels, which in turn increases the call blocking rates and reduces the access rates to the RAN. Measurements of these indicators at the base station level, which we compare to simulation results, can either be found at the Operation and Maintenance Center (OMC) or be obtained via capture tools that work on interfaces such as Iub for 3G (Figure 4).

Fig. 4. 3G Radio Access Network and performance indicators

The proposed scheme is tested on the Paris UMTS network. The MLP used has one hidden layer with the same number of neurons as the input and output layers. Each layer contains 20 neurons (two histograms of 10 bins each; an illustrative sketch of this histogram representation is given at the end of this subsection). The simulations yield two variables: Uplink Load (ULL) and Downlink Load (DLL) for each station in the network. Table I shows the statistics of the two training algorithms. The training set consists of 1000 patterns, while a set of 500 patterns is used for testing.

TABLE I
ERROR STATISTICS

                  BP                LM
μ_emp             0.019             0.4
σ_emp             0.003             0.01
RMS               0.02              0.05
μ_real (95%)      [0.016, 0.021]    [0.03, 0.04]

In Figure 5 we plot the convergence speed of the two algorithms, i.e. the value of the error as a function of the number of iterations, computed on the training set and shown in logarithmic scale.

Fig. 5. Convergence speed comparison (training error vs. iteration, Back Propagation and Levenberg-Marquardt, logarithmic scale)
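As an aside on the data representation used in these experiments, the sketch below illustrates how per-snapshot load values of one station could be turned into the two 10-bin histograms whose concatenation forms the 20-element input and target vectors. It is only an illustration with placeholder data and bin edges; the actual pre-processing chain of [5] also involves Independent Component Analysis and k-means clustering.

```python
# Illustration of the histogram representation: per-snapshot load values of one
# station are turned into two normalized 10-bin histograms (ULL and DLL), whose
# concatenation gives the 20-element vector fed to the MLP. Data and bin edges
# are placeholders; the actual pre-processing of [5] also includes ICA and k-means.
import numpy as np

def load_histogram(samples, n_bins=10):
    """Normalized histogram of a cell-load indicator, assumed to lie in [0, 1]."""
    counts, _ = np.histogram(samples, bins=n_bins, range=(0.0, 1.0))
    return counts / counts.sum()

rng = np.random.default_rng(0)
ull_snapshots = rng.beta(2, 5, size=1000)     # placeholder uplink-load snapshots
dll_snapshots = rng.beta(2, 4, size=1000)     # placeholder downlink-load snapshots

feature_vector = np.concatenate([load_histogram(ull_snapshots),
                                 load_histogram(dll_snapshots)])   # 20 elements
```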
2) Discussions: Measurements taken on a 3G radio network were used to design a neural network based model with different training algorithms. This neural network also uses the simulation results of a 3G radio network planning tool to learn the statistical differences between measurements and simulations. This learned relation is then used to enhance the prediction quality of the planning tool. To compare the tested learning algorithms, the empirical error between the neural network results and the measurements is computed on a test set. For the Back propagation algorithm an RMS error of 0.02 is obtained, whilst an RMS error of 0.05 is obtained with the Levenberg-Marquardt algorithm (Table I).

On the other hand, the error computed on the training set is about 10^-31 for the Levenberg-Marquardt algorithm and about 10^-3 for the Back propagation algorithm (Figure 5). Thus, although the LM training error is lower than the BP training error, the BP algorithm gives the better test error. This means that with the LM algorithm the network has learned all the details of the relation between simulations and measurements, noise included, which leads to overfitting. To overcome this overfitting, the LM training should be stopped at about the 20th iteration. At this point we can say that the LM algorithm gives more accurate results, but training must be stopped at the right time in order to avoid overfitting. It is also important to consider the execution time and the algorithmic complexity. For our study we have measured the execution time for 100 iterations and observed that the BP algorithm is faster than the LM algorithm (2 s versus 19 s). Regarding complexity, the LM algorithm is clearly more complex and needs more memory because of the Hessian computation.

B. Radio Network Prediction Enhancement Results

1) Learning Results: In this section we apply the scheme proposed in [5] using the Back propagation algorithm and the LM algorithm. The method is applied to 2 stations and 2 indicators (ULL and DLL) with 20 bins in each histogram. For illustration purposes, we give the results as scatter plots. Each point in a scatter plot corresponds to a snapshot data sample; the vertical axis corresponds to the ULL and the horizontal axis to the DLL. The numerical results of the comparison are given by the 2-D Kolmogorov-Smirnov test (KS-test) [10], which quantifies the difference between two datasets (an illustrative sketch of such a distance computation is given at the end of this subsection). According to this test, two datasets are considered to come from the same distribution if the value returned by the KS-test is close to zero; if the two datasets are far from each other, the KS-test returns a value close to 1.

Figure 6 shows the results of the learning phase where independent data are used to train the MLP with the Back Propagation algorithm, and Figure 7 shows the corresponding results with the Levenberg-Marquardt algorithm. The black points correspond to measurements, the dark grey points to simulations and the light grey points to the outputs of the proposed scheme. Note that these are the results of the learning phase, obtained by passing the simulation data set of the learning phase through the trained MLP. As shown in these figures, the MLP trained with the LM algorithm learns the data better than the one trained with the BP algorithm: the predicted-data distribution is closer to the measurement distribution for the LM (KS distance = 0.069) than for the BP (KS distance = 0.12). All these results are summarized in Table II.

TABLE II
COMPARISON OF 2-D KOLMOGOROV-SMIRNOV DISTANCES

                              BP       LM
Measurement-Simulation        0.737    0.737
Measurement-NN Result         0.12     0.069

Fig. 6. Radio network prediction enhancement results with BP
Fig. 7. Radio network prediction enhancement results with LM
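The 2-D KS distance used in Table II (and in Table III below) can be illustrated with the following sketch of a two-sample statistic written in the spirit of Fasano and Franceschini [10]: for each point of each sample taken as origin, the fractions of the two samples falling in the four quadrants are compared, and the maximum differences are averaged. This is a simplified, unoptimized illustration on placeholder data, not the exact procedure of [10].

```python
# Sketch of a two-sample 2-D Kolmogorov-Smirnov distance in the spirit of [10]:
# the maximum difference between the quadrant fractions of the two samples,
# averaged over using each sample's points as origins. Unoptimized, placeholder data.
import numpy as np

def _max_quadrant_diff(origins, a, b):
    d = 0.0
    for ox, oy in origins:
        for sx, sy in ((1, 1), (1, -1), (-1, 1), (-1, -1)):   # the four quadrants
            fa = np.mean((sx * (a[:, 0] - ox) > 0) & (sy * (a[:, 1] - oy) > 0))
            fb = np.mean((sx * (b[:, 0] - ox) > 0) & (sy * (b[:, 1] - oy) > 0))
            d = max(d, abs(fa - fb))
    return d

def ks2d(a, b):
    """2-D KS distance between samples a and b of shape (n, 2), e.g. columns (DLL, ULL)."""
    return 0.5 * (_max_quadrant_diff(a, a, b) + _max_quadrant_diff(b, a, b))

rng = np.random.default_rng(0)
measured = rng.multivariate_normal([0.4, 0.3], [[0.01, 0.004], [0.004, 0.01]], size=300)
predicted = measured + rng.normal(scale=0.02, size=measured.shape)
print(ks2d(measured, predicted))              # small value: similar distributions
```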
2) Generalization Results: The real interest of the scheme proposed in [5] lies in the generalization capability of the ANN: the input simulation data of the prediction phase correspond to a case that has never been encountered by the ANN during the learning phase (e.g. different traffic, different network parameters, etc.), and the ANN succeeds in correcting the simulations of this new case. However, generalization is not always easy to achieve, since it is an extrapolation operation that requires special attention.

In this section we compare the generalization results of the two learning algorithms in the case of a tilt change. New simulation data are generated by modifying the (mechanical) tilt of an antenna from 0° to 10°. In practical cases we would not have measurement data for such a case, yet we would still like to obtain accurate predictions; this ability saves the cost of going out to the field to modify the tilt and collect measurements. Thus, the new simulation data corresponding to a tilt value of 10° are passed through the proposed scheme (all the coefficients and parameters of the trained ANN are preserved, as well as those of the learning algorithm) and we obtain the results reported in Table III. The scatter plots are given in Figure 8 for the LM algorithm and in Figure 9 for the BP algorithm.

TABLE III
COMPARISON OF GENERALIZATION RESULTS

                                    BP       LM
Measurement-Simulation (10°)        0.730    0.730
Measurement-NN Result (10°)         0.050    0.23

Fig. 8. Generalization results with LM
Fig. 9. Generalization results with BP

3) Discussions: As shown in the previous section, the LM algorithm gives the best results in the learning phase. This is consistent with the learning algorithm comparison, where the LM algorithm had a lower learning error than the BP algorithm. On the other hand, Table III shows that the BP algorithm gives better generalization results than the LM algorithm. This result is also expected, because we have shown previously that the test error of the BP algorithm is lower than that of the LM algorithm.

V. CONCLUSION

In previous work we proposed to combine a-priori information with a-posteriori knowledge to enhance the prediction results of a radio network planning tool. This combination is based on a learning system using a Multi Layer Perceptron neural network. In this paper we compared two learning algorithms to train the MLP: Back-Propagation and Levenberg-Marquardt. The results show that we obtain a lower real error for Back-Propagation than for Levenberg-Marquardt, despite a higher training error. This result is illustrated by the comparison of the performances of the two algorithms when applied to the enhancement of the predictions of a radio network planning tool. The generalization case considered in this study is that of an antenna tilt change.

REFERENCES

[1] M. Centeno and M. Reyes, "So you have your model: what to do next? A tutorial on simulation output analysis," Simulation Conference Proceedings, vol. 1, pp. 23–29, Dec. 1998.
[2] N. Andrea, C. Cecchetti, and A. Lipparwi, "Fast prediction of the performance of wireless links by simulation trained neural network," Proc. IEEE MTT-S Digest 2000, pp. 429–432, 2000.
[3] T. Balandier, A. Caminada, V. Lemoine, and F. Alexandre, "170 MHz field strength prediction in urban environments using neural nets," in Proc. IEEE Int. Symp. Personal, Indoor and Mobile Radio Comm., vol. 1, 1995, pp. 120–124.
[4] P. Chang and W.-H. Yang, "Environment-adaptation mobile radio propagation prediction using radial basis function neural networks," IEEE Trans. Veh. Technol., vol. 46, 1997, pp. 155–160.
[5] Z. Nouir, B. Sayrac, and B. Fourestié, "Enhancement of network planning tool predictions through measurements," IEEE Trans. Veh. Technol., vol. 46, 2006, pp. 155–160.
[6] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. Prentice Hall, 1998.
[7] D. H. Wolpert, "The mathematics of generalization," in The Proceedings of the SFI/CNLS Workshop on Formal Approaches to Supervised Learning, Santa Fe Institute Studies in the Sciences of Complexity, vol. 20. MA: Addison-Wesley, 1994.
[8] D. H. Wolpert, "The lack of a priori distinctions between learning algorithms," Neural Computation, vol. 8, no. 7, pp. 1341–1390, 1996.
[9] D. H. Wolpert, "The existence of a priori distinctions between learning algorithms," Neural Computation, vol. 8, no. 7, pp. 1391–1420, 1996. [Online]. Available: citeseer.ist.psu.edu/88072.html
[10] G. Fasano and A. Franceschini, "A multidimensional version of the Kolmogorov-Smirnov test," Monthly Notices of the Royal Astronomical Society, vol. 225, pp. 155–170, 1987.