Applied Soft Computing 11 (2011) 3690–3696

Contents lists available at ScienceDirect: Applied Soft Computing
Journal homepage: www.elsevier.com/locate/asoc

Assessing the contribution of variables in feed forward neural network

Mukta Paliwal, Usha A. Kumar*
Shailesh J. Mehta School of Management, Indian Institute of Technology, Powai, Mumbai 400 076, India

Article history: Received 29 January 2010; received in revised form 4 May 2010; accepted 30 January 2011; available online 2 March 2011.

Keywords: Network weights; Prediction; Regression; Relative importance; Simulation; Multicollinearity

Abstract

Neural networks are being used as tools for data analysis in a variety of applications. The neural network technique is often described in the literature as a 'black box' approach and is criticized chiefly for the lack of interpretability of the network weights obtained during the model building process. Some attempts have been made in the past to interpret the contributions of explanatory variables in prediction problems using the weights of the neural network. In the present study, a new approach is proposed to interpret the relative importance of independent variables in neural networks, and a comparison with the connection weight approach is presented. The performance of this approach is studied for various data characteristics and it is found to be a better method in comparison to a well known method existing in the literature. An example from behavioral science is also considered to illustrate how the performance of the proposed approach translates to a real life situation.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Recently, neural networks and the regression technique have been used interchangeably for prediction problems. The regression technique draws its strength mainly from its ability to support inference and to explain how the input variables contribute to the prediction of the dependent variable.
* Corresponding author. Tel.: +91 22 25767786; fax: +91 22 25722872. E-mail addresses: [email protected] (M. Paliwal), [email protected] (U.A. Kumar). doi:10.1016/j.asoc.2011.01.040

However, the neural network technique is described in the literature as a 'black box' approach and is criticized most for the lack of interpretability of the network weights obtained during the model building process. This arises from the fact that the internal characteristic of a trained network is a set of numbers that is very difficult to relate back to the application in a meaningful fashion. The study of the contributions of input variables in neural network models has been attempted by only a few authors. For example, Duh et al. [1] have presented a methodology to understand how an input descriptor is correlated to the output predicted by the network. They have tested this methodology on three datasets and have shown that the results correspond well to the partial least squares interpretation for linear models. Olden and Jackson [2] have reviewed a number of methods (Neural Interpretation Diagram, Garson's algorithm, and sensitivity analysis) and demonstrated the utility of these methods for interpreting neural network connection weights. They have also proposed a randomization procedure for testing the statistical significance of these contributions in terms of individual connection weights and the overall influence of each of the input variables. This randomization procedure enables the removal of null neural network connections and non-significant input variables, and thereby aids in the interpretation of the neural network by reducing its complexity. Gaudart et al. [3] have attempted to interpret neural network weights through the role of empirical variances of weights obtained from bootstrapped samples for feed forward neural networks. Papadokonstantakis et al.
[4] have compared four different methods, namely information theory (ITSS), the Bayesian framework (ARD), the analysis of the network's weights (GIM) and the sequential omission of variables (SZW), for inferring variable influence in neural networks. They have concluded that the SZW/GIM algorithms are in general more robust than ARD, on the basis of four simulated data sets and a real life example. Kemp et al. [5] have also proposed a method to achieve an understanding of the relative importance of input variables by systematically altering input data patterns, and termed it the holdback input randomization method. They have validated this method using a simulated data set in which the relationship between input parameters and output parameters was completely known. Perturbation analysis for determining the order of influence of the elements in the input vector on the output vector is discussed by Azamathulla et al. [6]. The approach is illustrated through neural networks for prediction of the scour pattern downstream of a flip bucket spillway. The analyses of the results suggest that each variable in the input vector (discharge intensity, head, tail water depth, bed material, lip angle and radius of the bucket) influences the depth of scour in different ways. An attempt was also made by Guven and Gunal [7] to assess the influence of the input parameters on the performance of neural network modeling using sensitivity analysis. They have presented an explicit neural network formulation for predicting local scour downstream of grade control structures. Some authors have presented reviews and comparisons of various approaches for interpreting the importance and contribution of input variables in neural network models. Sung [8] has compared and analyzed the effectiveness of fuzzy curves, sensitivity analysis and change of mean square error methods of ranking input importance.
They have concluded that the fuzzy curve method performs better than the other two methods if the training samples are representative. Gevrey et al. [9] have used real life data from ecology to compare seven different methods which can give the relative contribution of input variables in artificial neural networks. Olden et al. [10] have provided a comparison of nine different methodologies for assessing variable contributions in artificial neural networks, using simulated data exhibiting defined numeric relationships between a response variable and a set of predictor variables. The connection weight (CW) method proposed by Olden and Jackson [2] was shown by Olden et al. [10] to outperform the other approaches in quantifying the importance of variables; they have shown that the CW method is the least biased among them. This method has also been used, in comparison to other available methods, for assigning the relative contribution of input variables in prediction of the output by Watts and Worner [11]. In the present work, we propose an alternative method to rank the independent variables in order of their importance in predicting the output variable. This method is also compared with the connection weight approach [2]. The proposed method is based on the interquartile range of the empirical distribution of the network weights obtained from training the network. Monte Carlo simulation is used to generate data sets with different characteristics, varying the amount of noise and the sample size at various levels. Further experiments are carried out to investigate the performance of the proposed approach in the case of data with multicollinearity. This helps in gaining some insight into the performance of the proposed approach for a variety of data characteristics. The proposed approach is also used to obtain the relative importance of predictor variables for a real life data set in order to illustrate the application of the new approach.
In the next section, we define the proposed approach along with brief descriptions of the other methods used in this study to obtain the relative importance of variables. In Section 3, the experimental design and data generation procedure are discussed. Details of the data analysis and a discussion of the results are provided in Sections 4 and 5 respectively. Section 6 discusses the real life example, followed by the conclusion in the last section.

2. Relative importance of independent variables

By the importance of predictor variables, we mean the relative contribution of each of the variables to the prediction of the dependent variable. In this section, the proposed approach is defined after a brief description of the connection weight method for ranking the importance of independent variables in predicting the output variable of a neural network. A brief review of different measures used in multiple regression analysis for finding the relative importance of independent variables is also provided.

2.1. Connection weight method

The connection weight method [2] calculates, for each input node, the sum over all hidden nodes of the product of the raw weight of the connection from the input node to a hidden node and the weight of the connection from that hidden node to the output node. The larger the sum for a given input node, the more important the corresponding input variable. The relative importance of a given input variable can be defined as

RI = Σ_{H=1}^{h} (W_IH × W_HO)    (1)

where RI is the relative importance of the input variable I, h is the total number of hidden nodes, W_IH is the weight of the connection between input node I and hidden node H, and W_HO is the weight of the connection between hidden node H and the output node. This approach is based on estimates of the network weights obtained by training the network only once. It is observed that these estimates of the weights may vary with a change in the initial weights used for starting the training process.
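Eq. (1) can be sketched in code as follows. This is a minimal illustration, not the authors' implementation; the weight arrays and their toy values are hypothetical:

```python
import numpy as np

def connection_weight_importance(w_ih, w_ho):
    """Connection weight (CW) measure of Olden and Jackson: for each
    input node, sum over hidden nodes of the product of the
    input-to-hidden weight and the hidden-to-output weight (Eq. (1)).

    w_ih : array of shape (n_inputs, n_hidden), input-to-hidden weights
    w_ho : array of shape (n_hidden,), hidden-to-output weights
    """
    return w_ih @ w_ho  # RI_I = sum_H W_IH * W_HO

# Toy network with 2 inputs and 2 hidden nodes (illustrative numbers only)
w_ih = np.array([[0.5, -1.0],
                 [2.0,  0.3]])
w_ho = np.array([1.0, 0.5])
print(connection_weight_importance(w_ih, w_ho))  # [0.   2.15]
```

Note that positive and negative products can cancel within the sum, which is why the sign and the magnitude of RI are read together when interpreting a variable's contribution.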
This aspect is taken into consideration in the proposed method, where the network is trained a number of times, each time starting from randomly chosen initial weights.

2.2. The proposed method

In the proposed method, the empirical distribution of the connection weights of the neural network model is obtained by training the network a number of times (say t). Every time the training is carried out, the initial weights for training the network are chosen randomly. For a given input node, the interquartile range of each of the network weights from the input node to the hidden nodes is calculated and averaged over all hidden units. For this reason, the method is referred to as the interquartile range (IQR) method. The relative importance of a given input variable is defined as

RI = (1/h) Σ_{H=1}^{h} Interquartile Range(W_IH)    (2)

where RI is the relative importance of the input variable I, the interquartile range is the difference between the third and first quartiles of the distribution of the network weight W_IH connecting input node I to hidden node H, and h is the total number of hidden nodes in the hidden layer. The larger the value of RI for a given input node, the more important the corresponding input variable.

2.3. Relative importance of independent variables in case of regression

The standardized regression coefficient has been suggested as a measure of relative importance by many authors (e.g. [12,13]). For each predictor variable, the standardized regression coefficient is obtained by standardizing the variable to zero mean and unit standard deviation before multiple regression is carried out. However, when variables are correlated, the confounding influence of correlations between predictor variables makes standardized regression coefficients uninterpretable in terms of relative importance [14]. Lebreton et al.
[15] have suggested exercising caution when interpreting standardized regression coefficients as indicators of relative importance for predictors with even moderate levels of collinearity. Researchers have developed other indices that accurately reflect the contribution of predictor variables to the prediction of a dependent variable when variables are correlated. The two most recent methodologies are dominance analysis [16,17] and Johnson's epsilon (also referred to as relative weights; [14]). Johnson and LeBreton [18] have briefly reviewed the history of research on predictor importance in multiple regression and provided a recent review of the literature on different measures of relative importance. In the present work, we have used the standardized regression coefficient as a measure of relative importance in the case of no multicollinearity, and dominance analysis when multicollinearity is present in the data. Dominance analysis determines the dominance of one predictor over another by comparing their additional contributions across all subset models. Dominance analysis is chosen over other recent measures for the reasons stated in Lebreton et al. [15].

3. Data generation

Regression analysis and the neural network model are used to analyze the same data under different experimental conditions, in order to understand the importance of independent variables in neural networks as explained by regression coefficients. The Monte Carlo simulation method is used to simulate the data sets from a linear functional model of the form (3), satisfying all the assumptions of the multiple regression model.

Table 1
Error variance values for different levels of SNR and β'β.

SNR     β'β = 30    β'β = 100
1       5.5         10
9       1.82        3.33
25      1.1         2
Y_j = β_0 + Σ_{i=1}^{p} β_i X_ij + ε_j,    j = 1, 2, ..., n    (3)

where Y_j is the dependent variable, the X_ij are the p independent variables, generated from a Normal distribution with mean zero and variance 1, the β_i are the parameters of model (3), and ε_j is the random error component, generated from a Normal distribution with mean zero and constant variance σ². The variation in the random noise present in the generated data is measured in terms of the signal-to-noise ratio (SNR), defined as

SNR = λ² / σ²    (4)

where the squared length of the regression parameter vector, λ² = β'β, is the measure of the strength of the signal; the values chosen for this study are 30 and 100. We have used p = 4 and β = {1, 2, 3, 4} and {1, 3, 5, 8}, corresponding to the two values of β'β (30 and 100). The three values of SNR used in this study are 1, 9 and 25, and these three levels of SNR are referred to as high, medium and low noise levels. Table 1 shows the resulting error variance values for each combination of SNR and regression vector length β'β. A rule of thumb given by Sawyer [19] is used for deciding the ratio of sample size (n) to number of variables. Accordingly, the three values of n chosen for this study are 60, 510 and 1680, considered as small, medium and large sample sizes respectively. Thus the data conditions considered in this study are 3 levels of sample size, 3 levels of noise and two sets of regression coefficients, resulting in two different regression vector lengths. The values considered here for the different parameters (noise levels, sample sizes, etc.) represent a variety of real life applications in past research. Additional data sets are generated in order to introduce multicollinearity in the data, using singular value decomposition as given in Delaney and Chatterjee [20].
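The simulation setup of Eqs. (3) and (4) can be sketched as follows. This is a hedged illustration, not the authors' SAS code; in particular, the intercept β_0 is taken as zero here, which is an assumption:

```python
import numpy as np

def simulate_dataset(beta, snr, n, rng):
    """Simulate one data set from the linear model of Eq. (3):
    Y = sum_i beta_i * X_i + eps, with X_i ~ N(0, 1) and
    eps ~ N(0, sigma^2). The error scale sigma is set from the
    signal-to-noise ratio of Eq. (4): SNR = (beta' beta) / sigma^2,
    so sigma = sqrt(beta' beta / SNR)."""
    p = len(beta)
    X = rng.standard_normal((n, p))      # predictors ~ N(0, 1)
    sigma = np.sqrt(beta @ beta / snr)   # e.g. beta'beta = 30, SNR = 1 -> sigma ~ 5.5
    y = X @ beta + sigma * rng.standard_normal(n)
    return X, y

rng = np.random.default_rng(0)
X, y = simulate_dataset(np.array([1.0, 2.0, 3.0, 4.0]), snr=25, n=60, rng=rng)
print(X.shape, y.shape)  # (60, 4) (60,)
```

Under this reading of Eq. (4), the error scale for β'β = 30 and SNR = 1 is √30 ≈ 5.5, and for β'β = 100 and SNR = 9 it is 10/3 ≈ 3.33, which reproduces the entries of Table 1.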
These data sets are then used to conduct additional experiments to see the impact of multicollinearity on interpreting the relative importance of independent variables in neural networks, using the proposed approach and the connection weight method. In order to vary the degree of collinearity in the data, three values of the condition number index (CI), namely 2, 10 and 50, are chosen. These three values are referred to as low, medium and high levels of multicollinearity respectively. For these experiments, the regression vector length β'β is chosen to be 30, with β = {1, 2, 3, 4}.

4. Data analysis

Data matrices with the required data characteristics (three levels of sample size, three levels of random noise, two sets of regression coefficients) were generated using Monte Carlo simulation and analyzed by both regression analysis and neural network techniques. We standardized all the independent variables in each data set before starting the data analysis. The entire analysis was carried out using the SAS 9.1 software package [21].

4.1. Neural network training and architecture

For the neural network, the proposed approach obtains the empirical distribution of the network weights by training the neural network a number of times (say t), re-initializing the weights each time training is carried out. The choice of t depends on the data sets under consideration, and for a given situation the appropriate value needs to be determined iteratively. In the present study, initial experiments were carried out to choose an appropriate value of t by training the network with t = 50, 100 and 200. It was observed that the weight distribution corresponding to t = 50 captured the distribution reasonably well, and hence this value was chosen for the experiments. A three layer feed forward neural network is considered and trained using the Levenberg–Marquardt (LM) algorithm. The LM algorithm [22–25] is specifically designed for the squared error function and is used because of its faster convergence compared to the most commonly used back propagation algorithm.
In the last few years, the LM algorithm, taken from the optimization field, has become increasingly popular within the neural networks community. It is an advanced non-linear optimization algorithm used to train the network, and it offers a good compromise between the speed of the Newton algorithm and the stability of the steepest descent method. The algorithm approximates gradient descent when the error is large, and approaches the Gauss–Newton method, which is faster and more efficient, as the error becomes smaller [26]. The three layer feed forward network contains one input layer, one hidden layer and one output layer. The input layer contains 4 nodes corresponding to the 4 independent variables, and the output layer contains one node corresponding to the one dependent variable. Initially the neural network is trained using 1 node in the hidden layer. Subsequently, experiments are carried out for larger numbers of hidden nodes (h), namely 3 and 6, in order to study the sensitivity of the proposed approach to the number of hidden units. The hyperbolic tangent activation function is used at the hidden layer and the identity activation function is used at the output layer. The commonly used weight decay regularizer has been used for controlling the neural network training. Different weight decay parameter values were selected for different experimental conditions in order to get optimum generalization for any given experimental condition.

4.2. Computational aspects

For neural networks, the relative importance of the independent variables is obtained using the proposed method and the connection weight approach. The predictor variables are ranked based on their relative importance from the two methods separately for each of the experimental conditions (2 sets of regression coefficients, 3 noise levels and 3 sample sizes).
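The IQR measure of Eq. (2) and the subsequent ranking can be sketched as follows; a minimal illustration under the assumption that the input-to-hidden weights of the t trained networks have already been stacked into one array (the training step itself is omitted):

```python
import numpy as np

def iqr_importance(w_ih_samples):
    """Proposed IQR measure (Eq. (2)): for each input node, take the
    interquartile range of each input-to-hidden weight across the t
    training runs, then average over the hidden nodes.

    w_ih_samples : array of shape (t, n_inputs, n_hidden)
    """
    q1, q3 = np.percentile(w_ih_samples, [25, 75], axis=0)  # quartiles over runs
    return (q3 - q1).mean(axis=1)  # average IQR over hidden nodes

def rank_by_importance(ri):
    """Rank variables so that rank 1 is the most important."""
    order = np.argsort(-np.asarray(ri))
    ranks = np.empty(len(ri), dtype=int)
    ranks[order] = np.arange(1, len(ri) + 1)
    return ranks

# Toy check: input 0's weights vary widely across 5 runs, input 1's do not,
# so input 0 should come out as the more important variable.
w = np.zeros((5, 2, 3))
w[:, 0, :] = np.linspace(-2, 2, 5)[:, None]  # spread across runs -> large IQR
w[:, 1, :] = 0.1                             # constant across runs -> zero IQR
ri = iqr_importance(w)
print(rank_by_importance(ri))  # [1 2]
```

The same `rank_by_importance` step can be applied to the CW sums or to standardized regression coefficients (on their magnitudes), so the orderings from the different measures are directly comparable.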
Regression analysis is performed and the β coefficients corresponding to the independent variables are estimated using the least squares estimation method for each of the designs. The relative importance of the independent variables is obtained in the form of the magnitudes of the β coefficients, as the data sets are already standardized. Predictor variables are then ranked based on their relative importance as suggested by the magnitudes of the standardized regression coefficients. In order to account for sampling fluctuations, the entire analysis is performed 30 times.

Table 2
Mean and standard deviations of rank correlation coefficients.

                        β'β = 30                            β'β = 100
                        h = 1       h = 3       h = 6       h = 1       h = 3       h = 6
Size  Noise  Method     Mean  SD    Mean  SD    Mean  SD    Mean  SD    Mean  SD    Mean  SD
1     1      IQR        0.87  0.14  0.55  0.41  0.31  0.56  0.89  0.14  0.23  0.46  0.45  0.29
             CW         0.87  0.18  0.27  0.63  0.49  0.48  0.93  0.13  0.53  0.38  0.29  0.58
      2      IQR        0.99  0.04  0.73  0.13  0.80  0.16  0.98  0.06  0.93  0.13  0.80  0.09
             CW         1.00  0.00  0.56  0.46  0.59  0.41  0.99  0.05  0.66  0.35  0.51  0.52
      3      IQR        1.00  0.00  0.92  0.10  0.86  0.12  1.00  0.00  0.95  0.09  0.94  0.16
             CW         1.00  0.00  0.72  0.34  0.72  0.39  1.00  0.00  0.80  0.35  0.61  0.40
2     1      IQR        0.99  0.04  0.75  0.17  0.23  0.53  1.00  0.00  0.79  0.07  0.63  0.34
             CW         1.00  0.00  0.44  0.55  0.49  0.56  1.00  0.00  0.50  0.51  0.51  0.41
      2      IQR        1.00  0.00  1.00  0.00  0.99  0.05  1.00  0.00  1.00  0.00  0.99  0.04
             CW         1.00  0.00  0.83  0.22  0.67  0.37  1.00  0.00  0.89  0.16  0.62  0.43
      3      IQR        1.00  0.00  0.99  0.04  0.98  0.11  1.00  0.00  0.99  0.04  0.96  0.10
             CW         1.00  0.00  0.98  0.06  0.91  0.15  1.00  0.00  0.97  0.08  0.93  0.14
3     1      IQR        1.00  0.00  0.96  0.08  0.93  0.16  1.00  0.00  0.94  0.26  0.97  0.15
             CW         1.00  0.00  0.60  0.44  0.49  0.42  1.00  0.00  0.39  0.63  0.48  0.53
      2      IQR        1.00  0.00  0.99  0.04  0.97  0.11  1.00  0.00  1.00  0.00  0.99  0.04
             CW         1.00  0.00  0.93  0.19  0.91  0.15  1.00  0.00  0.97  0.08  0.85  0.18
      3      IQR        1.00  0.00  1.00  0.00  0.99  0.04  1.00  0.00  1.00  0.00  1.00  0.00
             CW         1.00  0.00  1.00  0.00  0.97  0.12  1.00  0.00  1.00  0.00  0.97  0.07

(Sizes 1–3 correspond to the small, medium and large sample sizes; noise levels 1–3 to the high, medium and low noise levels.)
The degree of similarity between the estimated ranked importance and the true ranked importance (as estimated by the standardized regression coefficients) of the independent variables is calculated using Spearman's rank correlation for both methods. The average rank correlation coefficient over 30 replications from each of the two methods is then compared for each of the experimental conditions, and the results are presented in the next section. To study the impact of multicollinearity, the following design parameters are considered: three levels of multicollinearity, three levels of noise and three levels of sample size. The relative importance of the independent variables is obtained from the proposed method and the connection weight approach for each of the above mentioned experimental conditions. As the data exhibit multicollinearity, we have used dominance analysis (general dominance, as proposed by Azen and Budescu [17]) to determine the relative importance of the predictors from the regression analysis. To examine the degree of agreement among the various importance indices, Spearman's rank correlation between the rank ordering achieved by each of the importance measures (i.e., the proposed approach and the connection weight approach for neural networks) and the rank ordering achieved by general dominance is calculated. The entire analysis is performed 30 times for each of the experimental conditions. The average rank correlation coefficient of the 30 replications obtained from each of the two methods is then compared for each of the experimental conditions, and the results are presented in the results section. It is also of interest to know whether the proposed method agrees with the rank orderings of the standardized regression coefficients. Similar experiments are carried out to achieve this objective and the results are presented in the next section.

5. Results

The relative importance of independent variables for the first experiment, with no multicollinearity, is obtained by the proposed method.
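The agreement measure used throughout, Spearman's rank correlation, reduces for tie-free rankings to ρ = 1 − 6 Σ d_i² / (n(n² − 1)), where d_i is the difference between the two ranks assigned to variable i. A minimal sketch:

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation between two tie-free rankings:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

print(spearman_rho([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0  (identical rankings)
print(spearman_rho([1, 2, 3, 4], [2, 1, 3, 4]))  # 0.8  (top two swapped)
```

With p = 4 predictors, a single swap of adjacent ranks thus costs 0.2, which gives a feel for the scale of the means reported in Tables 2–4.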
The performance of the proposed method is also compared to the connection weight method for all the designs considered. For comparing the performance of the two methods, the predictor variables were ranked based on their relative importance from both methods for each of the 18 experimental conditions. The degree of similarity between the estimated and true ranked importance of the independent variables was assessed using Spearman's rank correlation coefficient, as discussed in the previous section. As we have considered 3 values for the number of hidden units in the hidden layer, the whole analysis is performed 3 times, corresponding to h = 1, 3 and 6. Means and standard deviations of these rank correlations over 30 replications for each of the 18 data conditions are presented in Table 2 for the three values of hidden units (h = 1, 3, and 6). The rank correlations between the true ranked importance and the estimated ranked importance are obtained for both the proposed and connection weight methods. The means and standard deviations of these rank correlations are presented graphically in Fig. 1(a)–(c) for h = 1, 3 and 6 respectively, for both sets of β coefficients. These figures are error bar charts, where each bar represents the mean for a given experimental condition and its standard deviation is represented by a T shaped error bar on top of the mean bar. From these figures, it is clear that the mean rank correlation coefficient of the proposed method is greater than or equal to that of the connection weight approach for all the experimental designs considered in this study, except for the case of small sample size and high noise. The error bar represents one standard deviation of the mean and is comparatively smaller for the proposed approach; hence the stability of this method is better than that of the connection weight method in almost all the design cases.
It can also be observed from these figures that the advantage of the proposed method over the connection weight approach grows with the number of hidden units used in training the neural network model. When the sample size is large, the proposed method is in close agreement with the ranked importance of the independent variables given by the regression technique.

Fig. 1. Spearman's rank correlation coefficients between the true ranked importance and the estimated ranked importance of the independent variables for the proposed approach and the connection weight method: (a) h = 1, (b) h = 3, (c) h = 6, each for β'β = 30 and β'β = 100.

For the experiment with multicollinear data, the relative importance of the independent variables for the three layer feed forward neural networks is obtained by the proposed IQR method and the connection weight approach for all the designs considered. For comparing the performance of the two methods, the predictor variables were ranked based on their relative importance from the dominance approach for each of the experimental conditions. The
degree of similarity between the rank orderings obtained from both methods for the neural network and the true ranked importance (as obtained from general dominance) of the independent variables was assessed using Spearman's rank correlation coefficient. Means and standard deviations of these rank correlations over 30 replications for each of the experimental conditions are presented in Table 3 for the three values of hidden units (h = 1, 3, and 6). For the case of large sample size, Table 3 indicates that the rankings given by the proposed approach are not affected by the presence of multicollinearity. However, with decreasing sample size, the multicollinearity present in the data seems to weaken the strength of the proposed approach. It can be observed that the number of hidden units has an impact on the performance of both methods. Though the rank correlation coefficients decrease with an increase in the number of hidden units, the performance of the interquartile range method is better than that of the connection weight approach, particularly for medium and large sample sizes. In order to know whether the proposed method agrees with the rank orderings of the standardized regression coefficients, Spearman's rank correlation coefficients for the rank orderings of the proposed method and the connection weight approach with the rank orderings of the standardized β coefficients are obtained. For comparison, the rank correlation coefficient between the rank orderings of general dominance and the standardized regression coefficients is also calculated.

Table 3
Mean and SD of rank correlation coefficients of rank ordering of the two approaches with general dominance.
                        h = 1                                   h = 3                                   h = 6
                        CI=2        CI=10       CI=50           CI=2        CI=10       CI=50           CI=2        CI=10       CI=50
Size    Noise  Method   Mean  SD    Mean  SD    Mean  SD        Mean  SD    Mean  SD    Mean  SD        Mean  SD    Mean  SD    Mean  SD
Small   High   IQR      0.92  0.10  0.54  0.39  0.37  0.52      0.83  0.21  0.61  0.42  0.41  0.57      0.45  0.50  0.51  0.35  0.55  0.37
               CW       0.95  0.09  0.72  0.30  0.73  0.30      0.48  0.50  0.76  0.31  0.64  0.44      0.49  0.49  0.22  0.54  0.51  0.45
        Med    IQR      0.95  0.09  0.56  0.43  0.54  0.36      0.92  0.14  0.50  0.43  0.57  0.43      0.91  0.23  0.51  0.52  0.53  0.43
               CW       0.96  0.08  0.81  0.25  0.60  0.39      0.80  0.26  0.79  0.17  0.68  0.42      0.71  0.43  0.51  0.50  0.56  0.38
        Low    IQR      0.96  0.08  0.57  0.40  0.63  0.40      0.95  0.09  0.55  0.45  0.47  0.48      0.95  0.09  0.70  0.31  0.49  0.45
               CW       0.96  0.08  0.80  0.24  0.79  0.20      0.91  0.14  0.80  0.21  0.63  0.40      0.75  0.31  0.82  0.19  0.47  0.39
Medium  High   IQR      1.00  0.00  0.88  0.11  0.93  0.11      0.97  0.12  0.93  0.11  0.92  0.11      1.00  0.00  0.83  0.16  0.91  0.10
               CW       1.00  0.00  0.90  0.10  0.95  0.10      0.66  0.40  0.87  0.26  0.68  0.35      0.47  0.41  0.75  0.28  0.44  0.59
        Med    IQR      0.99  0.04  0.85  0.21  0.91  0.15      1.00  0.00  0.90  0.14  0.90  0.16      1.00  0.00  0.89  0.15  0.88  0.17
               CW       0.99  0.04  0.89  0.16  0.93  0.10      0.99  0.05  0.93  0.10  0.40  0.60      0.93  0.20  0.88  0.17  0.37  0.55
        Low    IQR      1.00  0.00  0.89  0.16  0.93  0.13      1.00  0.00  0.91  0.11  0.91  0.11      0.99  0.04  0.89  0.19  0.85  0.17
               CW       1.00  0.00  0.91  0.16  0.94  0.11      1.00  0.00  0.90  0.14  0.63  0.42      0.99  0.04  0.87  0.19  0.49  0.54
Large   High   IQR      1.00  0.00  0.99  0.04  0.98  0.06      0.99  0.05  0.99  0.05  0.97  0.08      0.95  0.13  0.95  0.09  0.89  0.23
               CW       1.00  0.00  0.99  0.04  0.99  0.05      0.79  0.35  0.97  0.12  0.64  0.41      0.56  0.43  0.76  0.46  0.48  0.50
        Med    IQR      1.00  0.00  0.99  0.04  1.00  0.00      1.00  0.00  0.97  0.07  0.93  0.11      1.00  0.00  0.99  0.04  0.88  0.17
               CW       1.00  0.00  0.99  0.04  1.00  0.00      1.00  0.00  0.96  0.08  0.69  0.36      1.00  0.00  0.87  0.28  0.43  0.55
        Low    IQR      1.00  0.00  0.99  0.04  0.97  0.07      1.00  0.00  1.00  0.00  0.95  0.10      1.00  0.00  0.99  0.05  0.48  0.38
               CW       1.00  0.00  0.99  0.04  0.97  0.07      1.00  0.00  0.91  0.19  0.61  0.49      1.00  0.00  0.93  0.13  0.29  0.59

Table 4
Mean and SD of rank correlation coefficients of rank ordering of the two
approaches with standardized β-coefficient.

                        h = 1                                   h = 3                                   h = 6
                        CI=2        CI=10       CI=50           CI=2        CI=10       CI=50           CI=2        CI=10       CI=50
Size    Noise  Method   Mean  SD    Mean  SD    Mean  SD        Mean  SD    Mean  SD    Mean  SD        Mean  SD    Mean  SD    Mean  SD
Small   High   IQR      0.97  0.08  0.83  0.21  0.59  0.43      0.85  0.18  0.81  0.21  0.65  0.42      0.47  0.44  0.76  0.23  0.70  0.22
               CW       0.97  0.07  0.98  0.06  0.81  0.23      0.47  0.46  0.78  0.31  0.66  0.40      0.39  0.56  0.21  0.57  0.53  0.45
        Med    IQR      0.98  0.06  0.82  0.23  0.81  0.22      0.97  0.09  0.78  0.35  0.80  0.27      0.93  0.22  0.71  0.37  0.79  0.25
               CW       0.99  0.05  0.96  0.08  0.73  0.36      0.82  0.31  0.94  0.12  0.69  0.46      0.71  0.37  0.61  0.50  0.62  0.45
        Low    IQR      1.00  0.00  0.84  0.19  0.84  0.20      0.99  0.04  0.76  0.29  0.78  0.34      0.98  0.06  0.77  0.26  0.87  0.17
               CW       1.00  0.00  0.97  0.07  0.74  0.35      0.95  0.13  0.93  0.13  0.73  0.35      0.81  0.30  0.85  0.30  0.54  0.39
Medium  High   IQR      1.00  0.00  0.99  0.05  0.99  0.05      0.97  0.12  1.00  0.00  0.99  0.04      1.00  0.00  0.83  0.11  0.96  0.08
               CW       1.00  0.00  0.99  0.04  0.99  0.04      0.66  0.40  0.92  0.26  0.69  0.41      0.47  0.41  0.74  0.29  0.45  0.57
        Med    IQR      1.00  0.00  0.97  0.07  0.98  0.06      1.00  0.00  0.98  0.06  0.99  0.04      1.00  0.00  0.96  0.12  0.97  0.08
               CW       1.00  0.00  0.99  0.04  1.00  0.00      0.99  0.05  1.00  0.00  0.45  0.61      0.93  0.20  0.92  0.13  0.44  0.53
        Low    IQR      1.00  0.00  0.99  0.04  0.99  0.04      1.00  0.00  0.99  0.04  0.98  0.06      0.99  0.04  0.99  0.05  0.96  0.08
               CW       1.00  0.00  0.99  0.04  0.97  0.08      1.00  0.00  0.97  0.09  0.63  0.43      0.99  0.04  0.91  0.23  0.51  0.59
Large   High   IQR      1.00  0.00  1.00  0.00  0.99  0.04      0.99  0.05  1.00  0.00  0.97  0.08      0.95  0.13  0.98  0.06  0.89  0.23
               CW       1.00  0.00  1.00  0.00  1.00  0.00      0.79  0.35  0.99  0.05  0.65  0.38      0.56  0.43  0.79  0.45  0.49  0.47
        Med    IQR      1.00  0.00  1.00  0.00  1.00  0.00      1.00  0.00  1.00  0.00  0.94  0.09      1.00  0.00  0.99  0.04  0.87  0.16
               CW       1.00  0.00  1.00  0.00  1.00  0.00      1.00  0.00  0.99  0.05  0.66  0.40      1.00  0.00  0.87  0.28  0.43  0.58
        Low    IQR      1.00  0.00  1.00  0.00  1.00  0.00      1.00  0.00  1.00  0.00  0.99  0.07      1.00  0.00  1.00  0.00  0.51  0.39
               CW       1.00  0.00  1.00  0.00  1.00  0.00      1.00  0.00  0.91  0.19  0.62  0.52      1.00  0.00  0.95  0.13  0.29  0.60

Means and standard deviations of these rank
correlations over 30 replications for each of the experimental conditions are presented in Table 4 for three values of hidden units (h = 1, 3, and 6). The following observations can be drawn from this table for medium and large sample sizes. The rank orderings given by the IQR method and by standardized regression coefficients are in close agreement with each other. Further, the rank orderings given by standardized regression coefficients largely reproduce those obtained from dominance analysis even when the level of multicollinearity is high. Thus, for medium and large sample sizes, standardized regression coefficients themselves are largely able to provide the relative importance of variables. This supports the use of the IQR method for determining the relative importance of variables even for multicollinear data, provided the sample size is not small.

6. Illustration

The IQR method, along with the connection weight approach, is implemented on a real life data set that was used earlier by Barber [27] and Azen and Budescu [28]. The data set, containing a total of 689 records, was used to determine the effects of parental indicators (measures of mother's and father's parenting styles) on youth outcomes (measures of psychological adjustment). In this study we use a subset of this data to find the relative importance of the parents' parenting styles in predicting the youth outcome. The dependent variable is youth depression (y) and the independent variables (parental indicators) are mother's acceptance (x1), father's acceptance (x2), mother's psychological control (x3) and father's psychological control (x4). The details of this data set can be found in Barber [27]. The correlation among the independent variables is shown in Table 5. The condition index of this data matrix is 3.45, which does not indicate a high multicollinearity problem in the data.

Table 5
Correlation coefficient matrix.

Var    x1        x2        x3       x4
x1     1
x2     0.5394    1
x3     −0.4944   −0.2211   1
x4     −0.2215   −0.4003   0.6184   1

The rankings of all four independent variables from the proposed approach, the connection weight approach, dominance analysis and standardized regression coefficients are shown in Table 6. All the methods are in complete agreement with each other in their rank orderings. This data set corresponds to a medium sample size with a low level of multicollinearity, and the results agree with our findings from the simulation experiment.

Table 6
Rank orderings of independent variables.

Var    IQR     Rank IQR   β       Rank β   CW      Rank CW   DOM    Rank DOM
x1     0.10    3          −0.03   3        −0.04   3         0.05   3
x2     0.32    2          −0.07   2        −0.13   2         0.06   2
x3     0.34    1          0.08    1        0.14    1         0.07   1
x4     0.04    4          −0.01   4        −0.02   4         0.02   4

7. Conclusion

In this study, an attempt is made to overcome the criticism of the neural network being described as a black box approach. A new method is proposed to rank the independent variables in order of their importance in predicting the dependent variable. Ranking is carried out using the interquartile range of the network weights obtained from training the network: the higher the magnitude of the interquartile range of a variable's network weights, the higher the importance of that variable in predicting the dependent variable. This study also establishes, using simulation, the validity of the interpretation of network weights by the proposed method under various levels of sample size, amount of noise and extent of multicollinearity. The performance of the proposed method is also studied for a varying number of nodes in the hidden layer of the network. To assess the effectiveness of the proposed method, its performance is compared with that of the connection weight approach using simulated data sets with various data characteristics. With the increase in the number
of hidden units, the proposed method is seen to perform much better than the connection weight method. For moderate to large sample sizes, the proposed method is demonstrated to be generally better than, or at least as good as, the connection weight method. The method performs reasonably well even for data with multicollinearity, provided the sample size is not small. Further, fluctuations in the rankings over different replications are smaller for the proposed method than for the connection weight approach, and hence the method seems more reliable. The method is illustrated on a real life data set and the results are in line with the simulation findings.

Acknowledgement

We would like to thank Prof. Brian K. Barber for providing us with the data set used in the illustration. We would also like to thank Prof. Razia Azen for providing the SAS file to perform dominance analysis.

References

[1] M.S. Duh, A.M. Walker, J.Z. Ayanian, Epidemiologic interpretation of artificial neural networks, American Journal of Epidemiology 147 (1998) 1112–1122.
[2] J.D. Olden, D.A. Jackson, Illuminating the “black box”: a randomization approach for understanding variable contributions in artificial neural networks, Ecological Modelling 154 (2002) 135–150.
[3] J. Gaudart, B. Giusiano, L. Huiart, Comparison of the performance of multilayer perceptron and linear regression for epidemiological data, Computational Statistics and Data Analysis 44 (2004) 547–570.
[4] S. Papadokonstantakis, A. Lygeros, S.V. Jacobsson, Comparison of recent methods for inference of variable influence in neural networks, Neural Networks 19 (2006) 500–513.
[5] S.J. Kemp, P. Zaradic, F. Hansen, An approach for determining relative input parameter importance and significance in artificial neural networks, Ecological Modelling 204 (2007) 326–334.
[6] H.M.D. Azamathulla, A.A.B. Ghani, N.A. Zakaria, C.C. Kiat, L.C. Siang, Knowledge extraction from trained neural network scour models, Modern Applied Science 2 (4) (2008) 52–62.
[7] A. Guven, M. Gunal, Prediction of local scour downstream of grade-control structures using neural networks, Journal of Hydraulic Engineering 134 (11) (2008) 1656–1660.
[8] A.H. Sung, Ranking importance of input parameters of neural networks, Expert Systems with Applications 15 (1997) 405–411.
[9] M. Gevrey, I. Dimopoulos, S. Lek, Review and comparison of methods to study the contribution of variables in artificial neural network models, Ecological Modelling 160 (2003) 249–264.
[10] J.D. Olden, M.K. Joy, R.G. Death, An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data, Ecological Modelling 178 (2004) 389–397.
[11] M.J. Watts, S.P. Worner, Using artificial neural networks to determine the relative contribution of abiotic factors influencing the establishment of insect pest species, Ecological Informatics 3 (2008) 64–74.
[12] A.A. Afifi, V. Clarke, Computer-Aided Multivariate Analysis, 2nd ed., Van Nostrand Reinhold, New York, 1990.
[13] E.J. Pedhazur, Multiple Regression in Behavioral Research: Explanation and Prediction, 2nd ed., Holt, Rinehart and Winston, New York, 1982.
[14] J.W. Johnson, A heuristic method for estimating the relative weight of predictor variables in multiple regression, Multivariate Behavioral Research 35 (2000) 1–19.
[15] J.M. LeBreton, R.E. Ployhart, R.T. Ladd, A Monte Carlo comparison of relative importance methodologies, Organizational Research Methods 7 (3) (2004) 258–282.
[16] D.V. Budescu, Dominance analysis: a new approach to the problem of relative importance of predictors in multiple regression, Psychological Bulletin 114 (1993) 542–551.
[17] R. Azen, D.V. Budescu, The dominance analysis approach for comparing predictors in multiple regression, Psychological Methods 8 (2003) 129–148.
[18] J.W. Johnson, J.M. LeBreton, History and use of relative importance indices in organizational research, Organizational Research Methods 7 (2004) 238–257.
[19] R.
Sawyer, Sample size and the accuracy of predictions made from multiple regression equations, Journal of Educational Statistics 7 (2) (1982) 91–104.
[20] N.J. Delaney, S. Chatterjee, Use of the bootstrap and cross validation in ridge regression, Journal of Business and Economic Statistics 4 (2) (1986) 255–262.
[21] SAS Institute Inc., Statistical Analysis System, Ver. 9.1, SAS Institute Inc., Cary, NC, 2007.
[22] K. Levenberg, A method for the solution of certain problems in least squares, Quarterly of Applied Mathematics 2 (1944) 164–168.
[23] D.W. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, Journal of the Society for Industrial and Applied Mathematics 11 (1963) 431–441.
[24] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, London, U.K., 1995.
[25] B.M. Wilamowski, S. Iplikci, O. Kaynak, M.O. Efe, An algorithm for fast convergence in training neural networks, Proceedings of the International Joint Conference on Neural Networks 3 (2001) 1778–1782.
[26] M.T. Hagan, M. Menhaj, Training feedforward networks with the Marquardt algorithm, IEEE Transactions on Neural Networks 5 (6) (1994) 989–993.
[27] B.K. Barber, Parental psychological control: revisiting a neglected construct, Child Development 67 (1996) 3296–3319.
[28] R. Azen, D.V. Budescu, Comparing predictors in multivariate regression models: an extension of dominance analysis, Journal of Educational and Behavioral Statistics 31 (2) (2006) 157–180.
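Appendix. As a concrete illustration of the ranking rule used in this study (a larger interquartile range of a variable's input-to-hidden connection weights indicates greater importance), the following minimal Python sketch computes IQR-based ranks from weight matrices. It is an illustration only: the helper name `iqr_importance`, the synthetic weight values, and the choice to pool weights across hidden units and replications are our own bookkeeping assumptions, not the authors' code, which trained feed-forward networks to obtain the weights.

```python
import numpy as np

def iqr_importance(weight_sets):
    """Rank inputs by the interquartile range (IQR) of their input-to-hidden
    connection weights, pooled across hidden units and replications.

    weight_sets: list of (n_inputs, n_hidden) arrays, one per trained
    replicate.  Returns (iqr, ranks), where rank 1 = most important.
    """
    pooled = np.concatenate(weight_sets, axis=1)    # (n_inputs, n_hidden * n_reps)
    q75, q25 = np.percentile(pooled, [75, 25], axis=1)
    iqr = q75 - q25                                 # spread of each input's weights
    order = np.argsort(-iqr)                        # indices sorted by descending IQR
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(iqr) + 1)       # rank 1 = largest IQR
    return iqr, ranks

# Synthetic input-to-hidden weight matrices from two hypothetical training
# replicates (3 inputs, 2 hidden units each).
w1 = np.array([[-2.0, 2.0], [0.1, -0.1], [0.5, -0.5]])
w2 = np.array([[1.5, -1.8], [0.2, 0.0], [-0.4, 0.6]])
iqr, ranks = iqr_importance([w1, w2])
print(ranks)  # the first input has the widest weight spread, hence rank 1
```

The same pooled-weight bookkeeping extends directly to the simulation design of the paper, where ranks from repeated replicates would then be compared against standardized regression coefficients via rank correlation.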