
Applied Soft Computing 11 (2011) 3690–3696
Assessing the contribution of variables in feed forward neural network
Mukta Paliwal, Usha A. Kumar ∗
Shailesh J. Mehta School of Management, Indian Institute of Technology, Powai, Mumbai 400 076, India
Article info
Article history:
Received 29 January 2010
Received in revised form 4 May 2010
Accepted 30 January 2011
Available online 2 March 2011
Keywords:
Network weights
Prediction
Regression
Relative importance
Simulation
Multicollinearity
Abstract
Neural networks are being used as tools for data analysis in a variety of applications. The neural network technique is cited in the literature as a 'black box' approach and is criticized most for the lack of interpretability of the network weights obtained during the model building process. Some attempts have been made in the past to interpret the contributions of explanatory variables in prediction problems using the weights of the neural network. In the present study, a new approach is proposed to interpret the relative importance of independent variables in neural networks, and a comparison with the connection weight approach is presented. The performance of this approach is studied for various data characteristics and is found to be better than a well-known method existing in the literature. An example from behavioral science is also considered to illustrate how the performance of the proposed approach translates to a real-life situation.
© 2011 Elsevier B.V. All rights reserved.
1. Introduction
Recently, neural networks and regression techniques have been used interchangeably for problems of prediction. The regression technique draws its strength mainly from its ability to support inference and to explain how the input variables contribute to the prediction of the dependent variable. The neural network technique, however, is cited in the literature as a 'black box' approach and is criticized most for the lack of interpretability of the network weights obtained during the model building process. This arises from the fact that the internal characteristic of a trained network is a set of numbers that is very difficult to relate back to the application in a meaningful fashion.
The study of the contributions of input variables to neural network models has been attempted by only a few authors. For example, Duh et al. [1] have presented a methodology to understand how an input descriptor is correlated with the output predicted by the network. They have tested this methodology on three datasets and have shown that the results correspond well to the partial least squares interpretation for linear models. Olden and Jackson [2] have reviewed a number of methods (Neural Interpretation Diagram, Garson's algorithm, and sensitivity analysis) and demonstrated the utility of these methods for interpreting neural network connection weights. They have also proposed a randomization procedure for testing the statistical significance of these contributions
in terms of individual connection weights and the overall influence of each of the input variables. This randomization procedure enables the removal of null neural network connections and non-significant input variables and thereby aids the interpretation of the neural network by reducing its complexity. Gaudart et al. [3] have attempted to interpret neural network weights through the role of the empirical variances of weights obtained from bootstrapped samples for feed forward neural networks.
Papadokonstantakis et al. [4] have compared four different methods, namely, information theory (ITSS), the Bayesian framework (ARD), the analysis of the network's weights (GIM) and the sequential omission of variables (SZW), for inferring variable influence in neural networks. They have concluded, on the basis of four simulated data sets and a real-life example, that the SZW/GIM algorithms are in general more robust than ARD. Kemp et al. [5] have also proposed a method for understanding the relative importance of input variables by systematically altering input data patterns, and termed it the holdback input randomization method. They have validated this method using a simulated data set in which the relationship between the input and output parameters was completely known.
Perturbation analysis for determining the order of influence of the elements in the input vector on the output vector is discussed by Azamathulla et al. [6]. The approach is illustrated through neural networks for the prediction of the scour pattern downstream of a flip bucket spillway. The analysis of the results suggests that each variable in the input vector (discharge intensity, head, tail water depth, bed material, lip angle and radius of the bucket) influences the depth of scour in different ways. An attempt was also made by Guven and Gunal [7] to assess the influence of the input parameters on the performance of neural network modeling using sensitivity analysis. They have presented an explicit neural network formulation for predicting local scour downstream of grade control structures.
Some authors have presented reviews and comparisons of various approaches for interpreting the importance and contribution of input variables in neural network models. Sung [8] has compared and analyzed the effectiveness of fuzzy curves, sensitivity analysis and the change of mean square error method for ranking input importance, and concluded that the fuzzy curve method performs better than the other two methods if the training samples are representative. Gevrey et al. [9] have used real-life data from ecology to compare seven different methods that can give the relative contribution of input variables in artificial neural networks. Olden et al. [10] have provided a comparison of nine different methodologies for assessing variable contributions in artificial neural networks using simulated data exhibiting defined numeric relationships between a response variable and a set of predictor variables. The connection weight (CW) method proposed by Olden and Jackson [2] is shown by Olden et al. [10] to outperform the other approaches in quantifying the importance of variables; they have shown that the CW method is the least biased among them. This method has also been used, in comparison with other available methods, for assigning the relative contribution of input variables in predicting the output by Watts and Worner [11].
In the present work, we propose an alternative method to rank the independent variables in order of their importance in predicting the output variable, and compare it with the connection weight approach [2]. The proposed method is based on the interquartile range of the empirical distribution of the network weights obtained from training the network. Monte Carlo simulation is used to generate data sets with different characteristics, varying the amount of noise and the sample size at several levels. Further experiments are carried out to investigate the performance of the proposed approach on data with multicollinearity. This helps in gaining some insight into the performance of the proposed approach for a variety of data characteristics. The proposed approach is also used to obtain the relative importance of predictor variables for a real-life data set in order to illustrate its application.
In the next section, we define the proposed approach along with brief descriptions of the other methods used in this study to obtain the relative importance of variables. In Section 3, the experimental design and data generation procedure are discussed. Details of the data analysis and a discussion of the results are provided in Sections 4 and 5 respectively. Section 6 discusses the real-life example, followed by the conclusion in the last section.
2. Relative importance of independent variables
By the importance of predictor variables, we mean the relative contribution of each variable to the prediction of the dependent variable. In this section, the proposed approach is defined after a brief description of the connection weight method for ranking the importance of the independent variables in predicting the output variable of a neural network. A brief review of the different measures used in multiple regression analysis for finding the relative importance of independent variables is also provided.
2.1. Connection weight method
The connection weight method [2] calculates, for each input node, the sum over all hidden nodes of the products of the raw weight of the connection from the input node to a hidden node and the raw weight of the connection from that hidden node to the output node. The larger the sum for a given input node, the greater the importance of the corresponding input variable. The relative importance of a given input variable is defined as

RI_I = \sum_{H=1}^{h} W_{IH} W_{HO}    (1)

where RI_I is the relative importance of the input variable I, h is the total number of hidden nodes, W_{IH} is the weight of the connection between input node I and hidden node H, and W_{HO} is the weight of the connection between hidden node H and the output node.
This approach is based on estimates of the network weights obtained by training the network only once. These estimates may vary with a change in the initial weights used to start the training process. This aspect is taken into consideration in the proposed method, where the network is trained a number of times, each time starting from random initial weights.
2.2. The proposed method
In the proposed method, the empirical distribution of the network connection weights of the neural network model is obtained by training the network a number of times (say t). Every time the training is carried out, the initial weights for training the network are chosen randomly. The interquartile range of each of the network weights from the input nodes to the hidden nodes is calculated and, for a given input node, averaged over all hidden units. For this reason, the method is referred to as the interquartile range (IQR) method. The relative importance of a given input variable is defined as

RI_I = (1/h) \sum_{H=1}^{h} \mathrm{IQR}(W_{IH})    (2)

where RI_I is the relative importance of the input variable I, the interquartile range IQR(W_{IH}) is the difference between the third and first quartiles of the distribution of the network weight W_{IH} connecting input node I to hidden node H, and h is the total number of hidden nodes in the hidden layer. The larger the value of RI_I for a given input node, the greater the importance of the corresponding input variable.
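A minimal sketch of Eq. (2), under the assumption that the input-to-hidden weights of the t trained networks have been stacked into a single array (a layout of our choosing, not one prescribed by the method):

```python
import numpy as np

def iqr_importance(weight_samples):
    """Relative importance RI_I of Eq. (2).

    weight_samples: array of shape (t, n_inputs, n_hidden) holding the
    input-to-hidden weights W_IH from t trainings of the same network,
    each started from different random initial weights.
    """
    # interquartile range of every weight over the t replications
    q75, q25 = np.percentile(weight_samples, [75, 25], axis=0)
    # average the IQRs over the h hidden nodes: one score per input
    return (q75 - q25).mean(axis=1)
```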
2.3. Relative importance of independent variables in case of regression
Standardized regression coefficients have been suggested as a measure of relative importance by many authors (e.g. [12,13]). For each predictor variable, the standardized regression coefficient is obtained by standardizing the variable to zero mean and unit standard deviation before the multiple regression is carried out. However, when variables are correlated, the confounding influence of the correlations between predictor variables makes standardized regression coefficients uninterpretable in terms of relative importance [14]. Lebreton et al. [15] have suggested exercising caution when interpreting standardized regression coefficients as indicators of relative importance for predictors with even moderate levels of collinearity. Researchers have developed other indices that accurately reflect the contribution of predictor variables to the prediction of a dependent variable when variables are correlated. The two most recent methodologies are dominance analysis [16,17] and Johnson's epsilon (also referred to as relative weights [14]). Johnson and LeBreton [18] have briefly reviewed the history of research on predictor importance in multiple regression and provided a recent review of the literature on different measures of relative importance.
In the present work, we have used the standardized regression coefficient as the measure of relative importance in the case of no multicollinearity, and dominance analysis when multicollinearity is present in the data. Dominance analysis determines the dominance of one predictor over another by comparing their additional contributions across all subset models. Dominance analysis is chosen over other recent measures for the reasons stated in Lebreton et al. [15].
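For reference, the general dominance weights can be sketched as follows: each predictor's weight is its incremental R² contribution, averaged first over all subsets of a given size and then over subset sizes. This is a minimal NumPy illustration of the idea described in [16,17], not the implementation used in this study.

```python
import numpy as np
from itertools import combinations

def r2(X, y, cols):
    """R^2 of the least-squares regression of y on the columns in cols."""
    Z = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ b
    return 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)

def general_dominance(X, y):
    """Average incremental R^2 of each predictor over all subset models."""
    p = X.shape[1]
    w = np.zeros(p)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        by_size = []
        for k in range(p):  # subsets of the other predictors of size k
            gains = [r2(X, y, list(s) + [i]) - r2(X, y, list(s))
                     for s in combinations(others, k)]
            by_size.append(np.mean(gains))
        w[i] = np.mean(by_size)
    return w  # the weights sum to the full-model R^2
```

The fact that the weights sum to the R² of the full model is what makes them interpretable as a decomposition of explained variance.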
3. Data generation
Regression analysis and the neural network model are used to analyze the same data under different experimental conditions in order to understand the importance of independent variables in neural networks as explained by the regression coefficients.

Monte Carlo simulation is used to generate data sets from a linear functional model of the form (3), satisfying all the assumptions of the multiple regression model.
Y_j = \beta_0 + \sum_{i=1}^{p} \beta_i X_{ij} + \varepsilon_j,    j = 1, 2, ..., n    (3)

where Y_j is the dependent variable, the X_{ij} are the p independent variables, generated from a normal distribution with mean zero and variance one, the β_i are the parameters of model (3), and ε_j is the random error component, generated from a normal distribution with mean zero and constant variance σ². The variation in the random noise present in the generated data is measured in terms of the signal-to-noise ratio (SNR), defined as

SNR = \beta'\beta / \sigma^2    (4)
where the squared length of the regression parameter vector, ||β||² = β'β, is the measure of the strength of the signal; the values chosen for this study are 30 and 100. We have used p = 4, with β = {1, 2, 3, 4} and {1, 3, 5, 8} corresponding to the two values of β'β (30 and 100). The three values of SNR used in this study are 1, 9 and 25, referred to as the high, medium and low noise levels respectively. Table 1 shows the error standard deviation σ (the values tabulated are σ, consistent with Eq. (4)) for each combination of SNR and regression vector length β'β.

Table 1
Error standard deviation (σ) for different levels of SNR and β'β.

SNR    β'β = 30    β'β = 100
1      5.5         10
9      1.82        3.33
25     1.1         2
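Under these definitions, one replication of a data condition can be generated as in the sketch below. Python is used purely for illustration, and the intercept β₀ is set to zero as an assumption, since the paper does not state its value.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n, beta, snr):
    """One data set from model (3) at signal-to-noise ratio snr.

    Eq. (4) gives SNR = beta'beta / sigma^2, so the error standard
    deviation is sigma = sqrt(beta'beta / snr).
    """
    beta = np.asarray(beta, dtype=float)
    sigma = np.sqrt(beta @ beta / snr)        # beta'beta = 30, snr = 25 -> 1.1
    X = rng.standard_normal((n, beta.size))   # X_ij ~ N(0, 1)
    y = X @ beta + rng.normal(0.0, sigma, n)  # intercept beta_0 = 0 (assumed)
    return X, y

# e.g. the small-sample, low-noise condition with beta'beta = 30:
X, y = simulate(n=60, beta=[1, 2, 3, 4], snr=25)
```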
A rule of thumb given by Sawyer [19] is used for deciding the sample size (n) to variable ratio. Accordingly, the three values of n chosen for this study are 60, 510 and 1680, considered as small, medium and large sample sizes respectively. Thus the data conditions considered in this study are three levels of sample size, three levels of noise and two sets of regression coefficients, the latter giving two different regression vector lengths. The values considered here for the different parameters, such as noise level and sample size, represent a variety of real-life applications from past research.
Additional data sets with multicollinearity are generated using singular value decomposition, as given in Delaney and Chatterjee [20]. These data sets are then used to conduct additional experiments on the impact of multicollinearity on the interpretation of the relative importance of independent variables in neural networks using the proposed approach and the connection weight method. In order to vary the degree of collinearity in the data, three values of the condition number index (CI), namely 2, 10 and 50, are chosen. These three values are referred to as low, medium and high levels of multicollinearity. The regression vector length β'β is chosen to be 30, with β = {1, 2, 3, 4}.
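One way to impose a target condition index through the singular value decomposition is sketched below. The exact construction of Delaney and Chatterjee [20] may differ in detail, so this should be read as an assumption-laden illustration rather than their procedure.

```python
import numpy as np

def with_condition_index(X, ci):
    """Return a version of X whose condition index (ratio of the largest
    to the smallest singular value) equals ci, by rescaling the singular
    values of X while keeping its singular vectors."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    # map the singular values linearly onto [d_max / ci, d_max]
    d_new = np.interp(d, (d.min(), d.max()), (d.max() / ci, d.max()))
    return U @ np.diag(d_new) @ Vt

rng = np.random.default_rng(2)
X = with_condition_index(rng.standard_normal((510, 4)), ci=50)  # high collinearity
```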
4. Data analysis

Data matrices with the required data characteristics (three
levels of sample size, three levels of random noise, two sets of
regression coefficients) were generated using Monte Carlo simulation and analyzed by both regression analysis and neural network
techniques. We have standardized all the independent variables in
each data set before starting the data analysis. The entire analysis was carried out using the SAS 9.1 software package [21].
4.1. Neural network training and architecture
For the neural network, the proposed approach obtains the empirical distribution of the network weights by training the neural network a number of times (say t), re-initializing the weights each time the training is carried out. The choice of t depends on the data sets under consideration, and for a given situation the appropriate value needs to be determined iteratively. In the present study, initial experiments were carried out to choose an appropriate value of t by training the network with t = 50, 100 and 200. It was observed that the weight distribution corresponding to t = 50 captured the distribution reasonably well, and hence this value was chosen for the experiments.
A three layer feed forward neural network is considered and trained using the Levenberg–Marquardt (LM) algorithm. The LM algorithm [22–25] is specifically designed for the squared error function and is used because of its faster convergence compared to the most commonly used back propagation algorithm. In the last few years, the LM algorithm, taken from the optimization field, has become increasingly popular within the neural network community. It is an advanced non-linear optimization algorithm used to train the network, and it gives a good compromise between the speed of the Newton algorithm and the stability of the steepest descent method. The algorithm works in such a way that when the error is large it approximates gradient descent, whereas as the error gets smaller it becomes the Gauss–Newton method, which is faster and more efficient [26].
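The LM update itself can be summarized in a few lines. The sketch below shows the generic damped Gauss–Newton step for a squared error function; the jacobian and residuals callables are placeholders of ours, not part of the paper.

```python
import numpy as np

def lm_step(w, jacobian, residuals, mu):
    """One Levenberg-Marquardt update for minimizing 0.5 * ||e(w)||^2.

    jacobian(w):  Jacobian J of the residual vector at w, shape (n, m).
    residuals(w): residual vector e at w, shape (n,).
    mu:           damping factor; large mu behaves like gradient descent,
                  small mu like the Gauss-Newton method.
    """
    J = jacobian(w)
    e = residuals(w)
    H = J.T @ J + mu * np.eye(w.size)   # damped Gauss-Newton "Hessian"
    return w - np.linalg.solve(H, J.T @ e)
```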
The three layered feed forward network contains one input layer, one hidden layer and one output layer. The input layer contains 4 nodes corresponding to the 4 independent variables, and the output layer contains one node corresponding to the dependent variable. Initially the neural network is trained using 1 node in the hidden layer. Subsequently, experiments are carried out for larger numbers of hidden nodes (h), namely 3 and 6, in order to study the sensitivity of the proposed approach to the number of hidden units. The hyperbolic tangent activation function is used at the hidden layer and the identity activation function is used at the output layer. The commonly used weight decay regularizer is employed for controlling the neural network training, with different weight decay parameter values selected for different experimental conditions in order to obtain optimum generalization.
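A sketch of the training loop behind the IQR method is given below using scikit-learn's MLPRegressor. Two caveats: scikit-learn does not provide the Levenberg–Marquardt algorithm, so the 'lbfgs' solver stands in for it here, and its alpha parameter plays the role of the weight decay regularizer; the original experiments were run in SAS.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def collect_input_weights(X, y, h=3, t=50, decay=1e-3):
    """Train a 4-h-1 tanh network t times from random initial weights
    and stack the input-to-hidden weight matrices W_IH."""
    samples = []
    for seed in range(t):
        net = MLPRegressor(hidden_layer_sizes=(h,), activation='tanh',
                           solver='lbfgs',      # LM unavailable; see text above
                           alpha=decay,         # weight decay regularizer
                           max_iter=2000, random_state=seed)
        net.fit(X, y)                  # new random initial weights per seed
        samples.append(net.coefs_[0])  # W_IH, shape (n_inputs, h)
    return np.stack(samples)           # shape (t, n_inputs, h)

# feeding the result to the iqr_importance sketch of Section 2.2
# yields the RI scores of Eq. (2).
```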
4.2. Computational aspects
For neural networks, the relative importance of the independent
variables is obtained using the proposed method and the connection weight approach. The predictor variables are ranked based
on their relative importance from the two methods separately for
each of the experimental conditions (2 sets of regression coefficients, 3 noise levels and 3 sample sizes). Regression analysis is
performed and the β coefficients corresponding to the independent variables are estimated using the least squares method for each of the designs. The relative importance of the independent variables is given by the magnitudes of the β coefficients, as the data sets are already standardized. The predictor variables are then ranked based on their relative importance as suggested by the magnitudes of the standardized regression coefficients.

Table 2
Mean and standard deviations of rank correlation coefficients. Entries are mean (SD) over 30 replications.

                        β'β = 30                                β'β = 100
Size    Noise  Method   h = 1        h = 3        h = 6         h = 1        h = 3        h = 6
Small   High   IQR      0.87 (0.14)  0.55 (0.41)  0.31 (0.56)   0.89 (0.14)  0.23 (0.46)  0.45 (0.29)
               CW       0.87 (0.18)  0.27 (0.63)  0.49 (0.48)   0.93 (0.13)  0.53 (0.38)  0.29 (0.58)
        Med    IQR      0.99 (0.04)  0.73 (0.13)  0.80 (0.16)   0.98 (0.06)  0.93 (0.13)  0.80 (0.09)
               CW       1.00 (0.00)  0.56 (0.46)  0.59 (0.41)   0.99 (0.05)  0.66 (0.35)  0.51 (0.52)
        Low    IQR      1.00 (0.00)  0.92 (0.10)  0.86 (0.12)   1.00 (0.00)  0.95 (0.09)  0.94 (0.16)
               CW       1.00 (0.00)  0.72 (0.34)  0.72 (0.39)   1.00 (0.00)  0.80 (0.35)  0.61 (0.40)
Medium  High   IQR      0.99 (0.04)  0.75 (0.17)  0.23 (0.53)   1.00 (0.00)  0.79 (0.07)  0.63 (0.34)
               CW       1.00 (0.00)  0.44 (0.55)  0.49 (0.56)   1.00 (0.00)  0.50 (0.51)  0.51 (0.41)
        Med    IQR      1.00 (0.00)  1.00 (0.00)  0.99 (0.05)   1.00 (0.00)  1.00 (0.00)  0.99 (0.04)
               CW       1.00 (0.00)  0.83 (0.22)  0.67 (0.37)   1.00 (0.00)  0.89 (0.16)  0.62 (0.43)
        Low    IQR      1.00 (0.00)  0.99 (0.04)  0.98 (0.11)   1.00 (0.00)  0.99 (0.04)  0.96 (0.10)
               CW       1.00 (0.00)  0.98 (0.06)  0.91 (0.15)   1.00 (0.00)  0.97 (0.08)  0.93 (0.14)
Large   High   IQR      1.00 (0.00)  0.96 (0.08)  0.93 (0.16)   1.00 (0.00)  0.94 (0.26)  0.97 (0.15)
               CW       1.00 (0.00)  0.60 (0.44)  0.49 (0.42)   1.00 (0.00)  0.39 (0.63)  0.48 (0.53)
        Med    IQR      1.00 (0.00)  0.99 (0.04)  0.97 (0.11)   1.00 (0.00)  1.00 (0.00)  0.99 (0.04)
               CW       1.00 (0.00)  0.93 (0.19)  0.91 (0.15)   1.00 (0.00)  0.97 (0.08)  0.85 (0.18)
        Low    IQR      1.00 (0.00)  1.00 (0.00)  0.99 (0.04)   1.00 (0.00)  1.00 (0.00)  1.00 (0.00)
               CW       1.00 (0.00)  1.00 (0.00)  0.97 (0.12)   1.00 (0.00)  1.00 (0.00)  0.97 (0.07)
In order to account for sampling fluctuations, the entire analysis is performed 30 times. The degree of similarity between the estimated ranked importance and the true ranked importance (as estimated by the standardized regression coefficients) of the independent variables is calculated using Spearman's rank correlation for both methods. The averages of the rank correlation coefficients over the 30 replications from the two methods are then compared for each of the experimental conditions, and the results are presented in the next section.
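The comparison step can be sketched with scipy.stats.spearmanr as follows; the containers holding the per-replication importance scores are hypothetical names.

```python
import numpy as np
from scipy.stats import spearmanr

def rank_agreement(true_scores, estimated_scores):
    """Spearman rank correlation between two importance orderings,
    using magnitudes, as is done for the beta coefficients."""
    rho, _ = spearmanr(np.abs(true_scores), np.abs(estimated_scores))
    return rho

# hypothetical replication loop for one experimental condition:
# rhos = [rank_agreement(std_beta[r], iqr_score[r]) for r in range(30)]
# print(np.mean(rhos), np.std(rhos))
```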
To study the impact of multicollinearity, the following design parameters are considered: three levels of multicollinearity, three levels of noise and three levels of sample size. The relative importance of the independent variables is obtained from the proposed method and the connection weight approach for each of the above mentioned experimental conditions. As the data exhibit multicollinearity, we have used dominance analysis (the general dominance proposed by Azen and Budescu [17]) to determine the relative importance of the predictors from the regression analysis. To examine the degree of agreement among the various importance indices, Spearman's rank correlation is calculated between the rank ordering achieved by each of the importance measures (i.e., the proposed approach and the connection weight approach for neural networks) and the rank ordering achieved by general dominance. The entire analysis is performed 30 times for each of the experimental conditions. The averages of the rank correlation coefficients over the 30 replications obtained from the two methods are then compared for each of the experimental conditions, and the results are presented in the results section.
It is also of interest to know whether the proposed method
agrees with the rank orderings of standardized regression coefficients. Similar experiments are carried out to achieve this objective
and the results are presented in the next section.
5. Results
The relative importance of the independent variables for the first experiment, with no multicollinearity, is obtained by the proposed method. The performance of the proposed method is also compared with that of the connection weight method for all the designs considered. For comparing the performance of the two methods, the predictor variables were ranked based on their relative importance from both methods for each of the 18 experimental conditions. The degree of similarity between the estimated and true ranked importance of the independent variables was assessed using Spearman's rank correlation coefficient, as discussed in the previous section. As we have considered 3 values for the number of hidden units in the hidden layer, the whole analysis is performed 3 times, corresponding to h = 1, 3 and 6. The means and standard deviations of these rank correlations over the 30 replications for each of the 18 data conditions are presented in Table 2 for the three values of hidden units (h = 1, 3, and 6).
The rank correlations between the true ranked importance and the estimated ranked importance are obtained for both the proposed and connection weight methods. The means and standard deviations of these rank correlations are presented graphically in Fig. 1(a)–(c) for h = 1, 3 and 6 respectively, for both sets of β coefficients. The figure is an error bar chart in which each bar represents the mean for a given experimental condition and the standard deviation is represented by a T-shaped error bar on top of the mean bar.
From these figures, it is clear that the mean rank correlation coefficient of the proposed method is greater than or equal to that of the connection weight approach for all the experimental designs considered in this study, except for the case of small sample size and high noise. The error bar represents one standard deviation of the mean and is comparatively smaller for the proposed approach; hence the stability of this method is better than that of the connection weight method in almost all the design cases. It can also be observed from these figures that the proposed method performs increasingly better than the connection weight approach as the number of hidden units used in training the neural network model increases. When the sample size is large, the proposed method is in close agreement with the ranked importance of the independent variables given by the regression technique.
Fig. 1. Spearman's rank correlation coefficients between the true ranked importance and the estimated ranked importance of the independent variables for the proposed approach and the connection weight method: error bar charts of the mean rank correlation (with one standard deviation) for (a) h = 1, (b) h = 3 and (c) h = 6, each shown for β'β = 30 and β'β = 100 across the sample size (Small, Medium, Large) and noise (High, Med, Low) conditions.

For the experiment with multicollinear data, the relative importance of the independent variables for the three layer feed forward neural network is obtained by the proposed IQR method and the connection weight approach for all the designs considered. For comparing the performance of the two methods, the predictor variables were ranked based on their relative importance from the dominance approach for each of the experimental conditions. The degree of similarity between the rank orderings obtained from the two methods for the neural network and the true ranked importance (as obtained from general dominance) of the independent variables was assessed using Spearman's rank correlation coefficient.
Mean and standard deviations of these rank correlations over 30
replications for each of the experimental conditions are presented
in Table 3 for three values of hidden units (h = 1, 3, and 6).
For the case of large sample size, Table 3 indicates that the rankings given by the proposed approach are not affected by the presence of multicollinearity. However, with decreasing sample size, the multicollinearity present in the data seems to weaken the strength of the proposed approach. It can be observed that the number of hidden units has an impact on the performance of both methods. Though the rank correlation coefficients decrease with an increase in the number of hidden units, the performance of the interquartile range method is better than that of the connection weight approach, particularly for the medium and large sample sizes.
In order to know whether the proposed method agrees with the rank orderings of the standardized regression coefficients, Spearman's rank correlation coefficients of the rank orderings of the proposed method and the connection weight approach with the rank orderings of the standardized β coefficients are obtained. For convenience, the rank correlation coefficient between the rank orderings of general dominance and the standardized regression coefficients is also calculated.
Table 3
Mean and SD of rank correlation coefficients of the rank orderings of the two approaches with general dominance. Entries are mean (SD) over 30 replications.

                        h = 1                                     h = 3                                     h = 6
Size    Noise  Method   CI = 2       CI = 10      CI = 50        CI = 2       CI = 10      CI = 50        CI = 2       CI = 10      CI = 50
Small   High   IQR      0.92 (0.10)  0.54 (0.39)  0.37 (0.52)    0.83 (0.21)  0.61 (0.42)  0.41 (0.57)    0.45 (0.50)  0.51 (0.35)  0.55 (0.37)
               CW       0.95 (0.09)  0.72 (0.30)  0.73 (0.30)    0.48 (0.50)  0.76 (0.31)  0.64 (0.44)    0.49 (0.49)  0.22 (0.54)  0.51 (0.45)
        Med    IQR      0.95 (0.09)  0.56 (0.43)  0.54 (0.36)    0.92 (0.14)  0.50 (0.43)  0.57 (0.43)    0.91 (0.23)  0.51 (0.52)  0.53 (0.43)
               CW       0.96 (0.08)  0.81 (0.25)  0.60 (0.39)    0.80 (0.26)  0.79 (0.17)  0.68 (0.42)    0.71 (0.43)  0.51 (0.50)  0.56 (0.38)
        Low    IQR      0.96 (0.08)  0.57 (0.40)  0.63 (0.40)    0.95 (0.09)  0.55 (0.45)  0.47 (0.48)    0.95 (0.09)  0.70 (0.31)  0.49 (0.45)
               CW       0.96 (0.08)  0.80 (0.24)  0.79 (0.20)    0.91 (0.14)  0.80 (0.21)  0.63 (0.40)    0.75 (0.31)  0.82 (0.19)  0.47 (0.39)
Medium  High   IQR      1.00 (0.00)  0.88 (0.11)  0.93 (0.11)    0.97 (0.12)  0.93 (0.11)  0.92 (0.11)    1.00 (0.00)  0.83 (0.16)  0.91 (0.10)
               CW       1.00 (0.00)  0.90 (0.10)  0.95 (0.10)    0.66 (0.40)  0.87 (0.26)  0.68 (0.35)    0.47 (0.41)  0.75 (0.28)  0.44 (0.59)
        Med    IQR      0.99 (0.04)  0.85 (0.21)  0.91 (0.15)    1.00 (0.00)  0.90 (0.14)  0.90 (0.16)    1.00 (0.00)  0.89 (0.15)  0.88 (0.17)
               CW       0.99 (0.04)  0.89 (0.16)  0.93 (0.10)    0.99 (0.05)  0.93 (0.10)  0.40 (0.60)    0.93 (0.20)  0.88 (0.17)  0.37 (0.55)
        Low    IQR      1.00 (0.00)  0.89 (0.16)  0.93 (0.13)    1.00 (0.00)  0.91 (0.11)  0.91 (0.11)    0.99 (0.04)  0.89 (0.19)  0.85 (0.17)
               CW       1.00 (0.00)  0.91 (0.16)  0.94 (0.11)    1.00 (0.00)  0.90 (0.14)  0.63 (0.42)    0.99 (0.04)  0.87 (0.19)  0.49 (0.54)
Large   High   IQR      1.00 (0.00)  0.99 (0.04)  0.98 (0.06)    0.99 (0.05)  0.99 (0.05)  0.97 (0.08)    0.95 (0.13)  0.95 (0.09)  0.89 (0.23)
               CW       1.00 (0.00)  0.99 (0.04)  0.99 (0.05)    0.79 (0.35)  0.97 (0.12)  0.64 (0.41)    0.56 (0.43)  0.76 (0.46)  0.48 (0.50)
        Med    IQR      1.00 (0.00)  0.99 (0.04)  1.00 (0.00)    1.00 (0.00)  0.97 (0.07)  0.93 (0.11)    1.00 (0.00)  0.99 (0.04)  0.88 (0.17)
               CW       1.00 (0.00)  0.99 (0.04)  1.00 (0.00)    1.00 (0.00)  0.96 (0.08)  0.69 (0.36)    1.00 (0.00)  0.87 (0.28)  0.43 (0.55)
        Low    IQR      1.00 (0.00)  0.99 (0.04)  0.97 (0.07)    1.00 (0.00)  1.00 (0.00)  0.95 (0.10)    1.00 (0.00)  0.99 (0.05)  0.48 (0.38)
               CW       1.00 (0.00)  0.99 (0.04)  0.97 (0.07)    1.00 (0.00)  0.91 (0.19)  0.61 (0.49)    1.00 (0.00)  0.93 (0.13)  0.29 (0.59)
Table 4
Mean and SD of rank correlation coefficients of the rank orderings of the two approaches with the standardized β-coefficients. Entries are mean (SD) over 30 replications.

                        h = 1                                     h = 3                                     h = 6
Size    Noise  Method   CI = 2       CI = 10      CI = 50        CI = 2       CI = 10      CI = 50        CI = 2       CI = 10      CI = 50
Small   High   IQR      0.97 (0.08)  0.83 (0.21)  0.59 (0.43)    0.85 (0.18)  0.81 (0.21)  0.65 (0.42)    0.47 (0.44)  0.76 (0.23)  0.70 (0.22)
               CW       0.97 (0.07)  0.98 (0.06)  0.81 (0.23)    0.47 (0.46)  0.78 (0.31)  0.66 (0.40)    0.39 (0.56)  0.21 (0.57)  0.53 (0.45)
        Med    IQR      0.98 (0.06)  0.82 (0.23)  0.81 (0.22)    0.97 (0.09)  0.78 (0.35)  0.80 (0.27)    0.93 (0.22)  0.71 (0.37)  0.79 (0.25)
               CW       0.99 (0.05)  0.96 (0.08)  0.73 (0.36)    0.82 (0.31)  0.94 (0.12)  0.69 (0.46)    0.71 (0.37)  0.61 (0.50)  0.62 (0.45)
        Low    IQR      1.00 (0.00)  0.84 (0.19)  0.84 (0.20)    0.99 (0.04)  0.76 (0.29)  0.78 (0.34)    0.98 (0.06)  0.77 (0.26)  0.87 (0.17)
               CW       1.00 (0.00)  0.97 (0.07)  0.74 (0.35)    0.95 (0.13)  0.93 (0.13)  0.73 (0.35)    0.81 (0.30)  0.85 (0.30)  0.54 (0.39)
Medium  High   IQR      1.00 (0.00)  0.99 (0.05)  0.99 (0.05)    0.97 (0.12)  1.00 (0.00)  0.99 (0.04)    1.00 (0.00)  0.83 (0.11)  0.96 (0.08)
               CW       1.00 (0.00)  0.99 (0.04)  0.99 (0.04)    0.66 (0.40)  0.92 (0.26)  0.69 (0.41)    0.47 (0.41)  0.74 (0.29)  0.45 (0.57)
        Med    IQR      1.00 (0.00)  0.97 (0.07)  0.98 (0.06)    1.00 (0.00)  0.98 (0.06)  0.99 (0.04)    1.00 (0.00)  0.96 (0.12)  0.97 (0.08)
               CW       1.00 (0.00)  0.99 (0.04)  1.00 (0.00)    0.99 (0.05)  1.00 (0.00)  0.45 (0.61)    0.93 (0.20)  0.92 (0.13)  0.44 (0.53)
        Low    IQR      1.00 (0.00)  0.99 (0.04)  0.99 (0.04)    1.00 (0.00)  0.99 (0.04)  0.98 (0.06)    0.99 (0.04)  0.99 (0.05)  0.96 (0.08)
               CW       1.00 (0.00)  0.99 (0.04)  0.97 (0.08)    1.00 (0.00)  0.97 (0.09)  0.63 (0.43)    0.99 (0.04)  0.91 (0.23)  0.51 (0.59)
Large   High   IQR      1.00 (0.00)  1.00 (0.00)  0.99 (0.04)    0.99 (0.05)  1.00 (0.00)  0.97 (0.08)    0.95 (0.13)  0.98 (0.06)  0.89 (0.23)
               CW       1.00 (0.00)  1.00 (0.00)  1.00 (0.00)    0.79 (0.35)  0.99 (0.05)  0.65 (0.38)    0.56 (0.43)  0.79 (0.45)  0.49 (0.47)
        Med    IQR      1.00 (0.00)  1.00 (0.00)  1.00 (0.00)    1.00 (0.00)  1.00 (0.00)  0.94 (0.09)    1.00 (0.00)  0.99 (0.04)  0.87 (0.16)
               CW       1.00 (0.00)  1.00 (0.00)  1.00 (0.00)    1.00 (0.00)  0.99 (0.05)  0.66 (0.40)    1.00 (0.00)  0.87 (0.28)  0.43 (0.58)
        Low    IQR      1.00 (0.00)  1.00 (0.00)  1.00 (0.00)    1.00 (0.00)  1.00 (0.00)  0.99 (0.07)    1.00 (0.00)  1.00 (0.00)  0.51 (0.39)
               CW       1.00 (0.00)  1.00 (0.00)  1.00 (0.00)    1.00 (0.00)  0.91 (0.19)  0.62 (0.52)    1.00 (0.00)  0.95 (0.13)  0.29 (0.60)
Mean and standard deviations of these rank correlations over the 30 replications for each of the experimental conditions are presented in Table 4 for the three values of hidden units (h = 1, 3, and 6). The following observations can be inferred from this table for the case of medium and large sample sizes. It can be clearly seen that the rank orderings given by the IQR method and the standardized regression coefficients are in close agreement with each other. Further, the rank orderings given by the standardized regression coefficients more or less reproduce the rank orderings obtained from dominance analysis even when the level of multicollinearity is high. Thus, for medium and large sample sizes, the standardized regression coefficients themselves are largely able to provide the relative importance of the variables. This supports the use of the IQR method for determining the relative importance of variables even for multicollinear data, provided the sample size is not small.
6. Illustration
The IQR method, along with the connection weight approach, is implemented on a real-life data set used earlier by Barber [27] and Azen and Budescu [28]. The data set, containing a total of 689 records, was used to determine the effects of parental indicators (measures of the mother's and father's parenting styles) on youth outcomes (measures of psychological adjustment). In this study we have used a subset of these data to find the relative importance of the parents' parenting styles in predicting the youth outcome. The dependent variable used in this study is youth depression (y) and the independent variables (parental indicators) are mother's acceptance (x1), father's acceptance (x2), mother's psychological control (x3) and father's psychological control (x4). The details of this data set can be found in Barber [27].
The correlation among the independent variables is shown in
Table 5. The condition index of this data matrix is 3.45, which does not indicate a serious multicollinearity problem in the data.

Table 5
Correlation coefficient matrix.

Var   x1        x2        x3       x4
x1    1
x2    0.5394    1
x3    −0.4944   −0.2211   1
x4    −0.2215   −0.4003   0.6184   1

The rankings of all four independent variables from the proposed approach, the connection weight approach, dominance analysis and the standardized regression coefficients are shown in Table 6. It can be seen that all the methods are in complete agreement with one another in their rank orderings.

Table 6
Rank orderings of independent variables.

Var   IQR    Rank IQR   β       Rank β   CW      Rank CW   DOM    Rank DOM
x1    0.1    3          −0.03   3        −0.04   3         0.05   3
x2    0.32   2          −0.07   2        −0.13   2         0.06   2
x3    0.34   1          0.08    1        0.14    1         0.07   1
x4    0.04   4          −0.01   4        −0.02   4         0.02   4

This data set corresponds to a medium sample size with a low level of multicollinearity, and the results agree with our findings from the simulation experiment.
7. Conclusion
In this study, an attempt is made to overcome the criticism of the neural network being described as a black box approach. A new method is proposed to rank the independent variables in order of their importance in predicting the dependent variable. The ranking is carried out using the interquartile range of the network weights obtained from training the network: the higher the magnitude of the interquartile range of the network weights, the higher the importance of the corresponding independent variable in predicting the dependent variable. This study also establishes, using simulation, the validity of the interpretation of network weights by the proposed method under various levels of sample size, amount of noise and extent of multicollinearity. The performance of the proposed method is also studied for varying numbers of nodes in the hidden layer of the network.

In order to assess the effectiveness of the proposed method, its performance is compared with that of the connection weight approach using simulated data sets having various data characteristics. With an increase in the number of hidden units, the proposed method is seen to perform much better than the connection weight method. For moderate to large sample sizes, the proposed method is demonstrated to be generally better than, or at least as good as, the connection weight method. The method performs reasonably well even for data with multicollinearity, provided the sample size is not small. Further, with the proposed method, the fluctuations in the rankings over different replications are smaller than with the connection weight approach, and hence the method seems more reliable. The method is illustrated on a real-life data set and the results are in line with its simulation counterpart.
Acknowledgement
We would like to thank Prof. Brian K. Barber for providing us with the data set used in the illustration. We would also like to thank Prof. Razia Azen for providing the SAS file to perform dominance analysis.
References
[1] M.S. Duh, A.M. Walker, J.Z. Ayanian, Epidemiologic interpretation of artificial neural networks, American Journal of Epidemiology 147 (1998)
1112–1122.
[2] J.D. Olden, D.A. Jackson, Illuminating the “black box”: a randomization approach
for understanding variable contributions in artificial neural networks, Ecological Modelling 154 (2002) 135–150.
[3] J. Gaudart, B. Giusiano, L. Huiart, Comparison of the performance of multilayer perceptron and linear regression for epidemiological data, Computational
Statistics and Data Analysis 44 (2004) 547–570.
[4] S. Papadokonstantakis, A. Lygeros, S.V. Jacobsson, Comparison of recent methods for inference of variable influence in neural networks, Neural Networks 19
(2006) 500–513.
[5] S.J. Kemp, P. Zaradic, F. Hansen, An approach for determining relative input
parameter importance and significance in artificial neural networks, Ecological
Modelling 204 (2007) 326–334.
[6] H.M.D. Azamathulla, A.A.B. Ghani, N.A. Zakaria, C.C. Kiat, L.C. Siang, Knowledge
extraction from trained neural network scour models, Modern Applied Science
2 (4) (2008) 52–62.
[7] A. Guven, M. Gunal, Prediction of local scour downstream of grade-control
structures using neural networks, Journal of Hydraulic Engineering 134 (11)
(2008) 1656–1660.
[8] A.H. Sung, Ranking importance of input parameters of neural networks, Expert
Systems with Applications 15 (1997) 405–411.
[9] M. Gevrey, I. Dimopoulos, S. Lek, Review and comparison of methods to study
the contribution of variables in artificial neural network models, Ecological
Modelling 160 (2003) 249–264.
[10] J.D. Olden, M.K. Joy, R.G. Death, An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data,
Ecological Modelling 178 (2004) 389–397.
[11] M.J. Watts, S.P. Worner, Using artificial neural networks to determine the relative contribution of abiotic factors influencing the establishment of insect pest species, Ecological Informatics 3 (2008) 64–74.
[12] A.A. Afifi, V. Clarke, Computer-Aided Multivariate Analysis, 2nd ed., Van Nostrand Reinhold, New York, 1990.
[13] E.J. Pedhazur, Multiple Regression in Behavioral Research: Explanation and
Prediction, 2nd ed., Holt, Rinehart and Winston, New York, 1982.
[14] J.W. Johnson, A heuristic method for estimating the relative weight of predictor
variables in multiple regression, Multivariate Behavioral Research 35 (2000)
1–19.
[15] J.M. Lebreton, R.E. Ployhart, R.T. Ladd, A Monte Carlo comparison of relative importance methodologies, Organizational Research Methods 7 (3) (2004)
258–282.
[16] D.V. Budescu, Dominance analysis: a new approach to the problem of relative importance of predictors in multiple regression, Psychological Bulletin 114
(1993) 542–551.
[17] R. Azen, D.V. Budescu, The dominance analysis approach for comparing predictors in multiple regression, Psychological Methods 8 (2003) 129–148.
[18] J.W. Johnson, J.M. LeBreton, History and use of relative importance indices in
organizational research, Organizational Research Methods 7 (2004) 238–257.
[19] R. Sawyer, Sample size and the accuracy of predictions made from multiple
regression equations, Journal of Educational Statistics 7 (2) (1982) 91–104.
[20] N.J. Delaney, S. Chatterjee, Use of the bootstrap and cross validation in ridge regression, Journal of Business and Economic Statistics 4 (2) (1986) 255–262.
[21] SAS Institute Inc., Statistical Analysis System. Ver 9.1., SAS Institute Inc., Cary,
NC, 2007.
[22] K. Levenberg, A method for the solution of certain problems in least squares,
Quarterly Applied Mathematics 2 (1944) 164–168.
[23] D.W. Marquardt, An algorithm for least-squares estimation of nonlinear parameters, Journal of Society of Industrial Mathematics 11 (1963) 431–441.
[24] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford Univ. Press, London, U.K., 1995.
[25] B.M. Wilamowski, S. Iplikci, O. Kaynak, M.O. Efe, An algorithm for fast convergence in training neural networks, in: Proceedings of the International Joint Conference on Neural Networks, vol. 3, 2001, pp. 1778–1782.
[26] M.T. Hagan, M. Menhaj, Training feedforward networks with the Marquardt
algorithm, IEEE Transactions on Neural Networks 5 (6) (1994) 989–993.
[27] B.K. Barber, Parental psychological control: revisiting a neglected construct,
Child Development 67 (1996) 3296–3319.
[28] R. Azen, D.V. Budescu, Comparing predictors in multivariate regression models: an extension of dominance analysis, Journal of Educational and Behavioral
Statistics 31 (2) (2006) 157–180.