Constructing Multilayer Feedforward Neural Networks to
Approximate Nonlinear Functions - Examples and Justifications
Jin-Song Pei
School of Civil Engineering and Environmental Science
University of Oklahoma
Norman, OK 73019
Eric C. Mai
School of Civil Engineering and Environmental Science
Honors College
University of Oklahoma
Norman, OK 73019
ABSTRACT
The paper reports on the continuing development of a heuristic methodology for designing multilayer feedforward neural networks for modeling nonlinear functions in engineering mechanics
applications. In this and the previous studies [10, 15, 16, 13, 12] that this work is built upon,
the authors do not presume to provide a universal method to approximate any arbitrary function;
rather, the focus is on developing a procedure that benefits applications in the
specific domain of engineering mechanics. This goal is fulfilled by utilizing the governing physics
and mathematics of nonlinear functions and the strength of the sigmoidal basis function. A clear
procedure for initializing neural networks to model various nonlinear functions commonly seen in
engineering mechanics is provided to answer questions regarding neural network architecture and
the initial values of weights and biases. Training examples and mathematical insights are presented
to demonstrate the rationality and efficiency of the proposed methodology. Future work is also
identified.
1 OVERVIEW
The motivations and technical challenges of this study were presented by the authors at IMAC
XXIV [12]. The ultimate goal of the authors is to develop a set of detailed guidelines with theoretical
justifications for applying data-driven techniques such as neural networks to engineering applications
based on (1) the mathematical and physical insights of the problem to be modeled and (2) the
capabilities of neural networks in terms of a clear formulation of a linear sum of sigmoidal functions.
The benefits of such an effort are many and include a more constructive approach for neural network
initialization, more reliable training performance, and training results with more validity than could
be obtained otherwise.
To validate and fully develop the proposed neural network initialization methodology, ten types of nonlinear functions appearing in [1, 18] and presented in Fig. 1 are selected as target functions, and an initialization procedure is to be developed for them in this study. These nonlinearities
represent typical functions encountered in the applications of aerospace, mechanical and structural
engineering.
[Figure 1 content: the ten nonlinearity types are I. Linear; II. Cubic stiffness and more; III. Bilinear stiffness and more; IV. Multislope; V. Fractional power; VI. Softening cubic and more; VII. Clearance (dead space); VIII. Hard saturation; IX. Saturation; and X. Stiction. The prototypes indicated alongside them are Prototype 1a, 1b, 1c; Prototype 2; Prototype 3; Prototype 1b+1c; and Prototype 1b+(-2).]
Figure 1: Ten nonlinear functions commonly seen in engineering mechanics applications and the recommended
multilayer feedforward neural network architectures (i.e., prototypes) used to train them. Note that the indicated
relationships are not exhaustive.
Although not entirely arbitrary, the functions to be approximated in this study are not limited
to nonlinear restoring forces as previously studied [10, 15, 16, 13]. Here, the focus is given
to approximating basic nonlinear functions that are widely encountered in engineering mechanics
applications such as those seen in the stress-strain, moment-curvature, and load-displacement relationships, as well as time histories. This study is focused on memoryless and monotonic functions.
Nonlinearities with memory are not treated in this study since they require different types of neural networks (e.g., recurrent neural networks, or multilayer feedforward neural networks with high
dimensional inputs, e.g., [7, 17]). This study will lay a solid foundation for future studies on these
other types of neural networks to build upon. Monotonic nonlinearities are also the focus of this
study. Several existing studies, e.g., [3, 9], have been carried out to analyze strategies for time
history-like nonlinearities with obvious peaks and valleys; however, there is a gap in the literature
on how to approximate ubiquitous monotonic nonlinearities using multilayer feedforward neural
networks.
2 PROPOSED INITIALIZATION PROCEDURE, PROTOTYPES, AND VARIANTS
The proposed initialization methodology was briefly introduced in [12]. A self-contained procedure
has been developed by the authors in [11], of which this paper offers a condensed introduction.
The central drive of this domain-specific neural network initialization methodology is to transform
an otherwise ambiguous trial-and-error-based procedure into a clearly defined near-deterministic
procedure that can be easily understood and executed. For a functional approximation problem as
depicted in Fig. 2(a), this study proposes three cohesive initialization stages, as outlined in Fig. 2(b): Stage I, selecting prototypes; Stage II, selecting variants; and Stage III, deciding the transformation. This is recommended as a typical initialization procedure using a
feedforward neural network with one hidden layer to approximate a nonlinear function. In detail, the
number of hidden nodes can be found in the outcome of Stage I, while the values of the weights and
biases (i.e., the values of IW , b and LW as shown in Fig. 2(a)) can be found through a progressive
and iterative procedure consisting of Stages I to III.
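As a concrete illustration of the quantities in Fig. 2(a), the following minimal sketch (written in Python/NumPy rather than the Matlab Neural Network Toolbox referenced by the authors; the numerical values are illustrative placeholders, not the prototype values tabulated in [11]) assembles a one-hidden-layer sigmoidal network whose weights IW, biases b, and layer weights LW would be set by Stages I to III instead of at random.

```python
import numpy as np

def sigmoid(p):
    # Logistic sigmoid used as the hidden-layer activation.
    return 1.0 / (1.0 + np.exp(-p))

def feedforward(x, IW, b1, LW, b2):
    """One-hidden-layer network: y(x) = LW . sigmoid(IW*x - b1) + b2.

    x  : array of scalar inputs
    IW : input weights, one per hidden node
    b1 : hidden-layer biases (the paper writes p = w*x - b)
    LW : layer weights of the linear output node
    b2 : output bias
    """
    x = np.asarray(x, dtype=float)
    # Hidden-node outputs h_i = sigma(IW_i * x - b1_i); shape (n_samples, n_hidden).
    H = sigmoid(np.outer(x, IW) - b1)
    return H @ LW + b2

# Illustrative (non-prototype) initial values for a three-hidden-node network.
IW = np.array([4.0, 4.0, 4.0])
b1 = np.array([-2.0, 0.0, 2.0])
LW = np.array([0.5, 0.5, 0.5])
b2 = -0.75

x = np.linspace(-1.0, 1.0, 5)
print(feedforward(x, IW, b1, LW, b2))
```

In a prototype-based initialization, the number of hidden nodes (the length of IW) comes out of Stage I, while the numerical values of IW, b1, and LW come out of Stages II and III.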
[Figure 2 content: (a) a one-hidden-layer network with input weights (IW), hidden-layer biases (b1 to bnh), a summation and activation function at each hidden node, layer weights (LW), and an output bias (b2) producing y(x); (b) a flow chart: examine the data to determine dominating features, select the prototype that best corresponds to these features and decide the number of hidden nodes (Stage I), select a variant of the prototype (Stage II), decide proportioning and translation if necessary (Step 1) and scaling if necessary (Step 2) (Stage III), then batch mode training, with decision loops to adjust the transformation, variant, or prototype as needed.]
Figure 2: (a) Universal approximator and notation used in this study. Note that the terms IW , b and LW follow
the notation convention used in the Matlab Neural Network Toolbox [2]. (b) Flow chart to illustrate the proposed
prototype-based initialization procedure.
For the ten types of nonlinear functions specified in Fig. 1, it is recommended that only three
fundamental prototypes be utilized either individually or combinatorially for neural network initialization. This finding reveals the versatility and efficiency of the proposed initialization. The
key elements in this proposed methodology, prototypes and their variants, are predetermined neural
networks that are not obtained from an inverse formulation of training any data sets. Instead, they
are constructed in advance from a forward formulation (based on either the algebraic or geometric
capabilities of linear sums of sigmoidal functions) to capture some dominating features of the nonlinear function to be approximated in the specified applications. The construction of Prototype 2, for
example, was illustrated graphically previously in Fig. 1 of [12]. In the previous work [10, 13, 14, 16]
that this study is built upon, some prototypes were obtained using various linear sums of a few
terms of sigmoidal functions through either algebraic derivations or geometric visualizations. [11]
further depicts how all three prototypes can be obtained, explains why there are numerous variants
for each prototype and gives the values of IW , b and LW for the selected variants. Fig. 3 illustrates
three possible variants for each proposed prototype for a normalized input.
[Figure 3 content: each prototype (1, 2, and 3) is plotted with Variants a, b, and c over normalized input and output ranges of [-1, 1].]
Figure 3: Three variant examples within the proposed three main prototypes.
With the prototypes and their variants prepared in advance, training a neural network to approximate a specific function within the scope for which these forward exercises are formulated can begin with a matching, selecting, and tuning procedure of initialization, which is reflected respectively in the proposed Stages I, II, and III as shown in Fig. 2(b). The concept of prototypes and
their variants is generic and thus should not be restricted to normalized input and output ranges.
In principle, one could determine the values of IW , b and LW based on arbitrary input and output
ranges. This flexibility, however, could cause confusion and inconsistency and needs to be handled
with care for the sake of clarity in implementing the proposed methodology. Having said that, this study chooses to (1) define prototypes and their variants entirely based on normalized
input (x) and output (y(x)) ranges as shown in Fig. 3, and (2) utilize a separate stage, Stage III, to
further transform a selected prototype or its variant for a non-normalized input-output situation.
In detail, one has to adjust the values of weights and biases obtained from Stages I and II according
to the input and output ranges of the training data set as detailed in [11]. This procedure largely
reduces subjective judgements, an approach which is in sharp contrast with commonly seen random
initialization schemes.
3 TRAINING EXAMPLES
In this section, several training examples given previously in [12] are revisited. No errata or retraining is needed. The only purpose is to better illustrate how to precisely follow the procedure
defined in Fig. 2(b) and utilize the prototypes and their variants defined over normalized input and
output as shown in Fig. 3. Also note that the same presentation format of training examples as is
seen in [12] is adopted in this section. Since the Nguyen-Widrow initialization algorithm does
not specify the required number of hidden nodes, this critical piece of information is borrowed from
the proposed initialization methodology whenever the Nguyen-Widrow initialization is used.
3.1 Direct Adoption of Prototypes
The proposed three variants of Prototype 1 are used directly in Fig. 4. No transformations of these
variants are needed since the input is already normalized, and the output has the same order of
magnitude as that of the variants presented in Fig. 3.
[Figure 4 content: panels of initial neural networks, trained neural networks, and training performance (MSE versus epoch, 100 epochs) for the Nguyen-Widrow initialization and for the proposed Prototype 1 variants 1a, 1b, and 1c, over the normalized input range [-1, 1].]
Figure 4: An example of using three variants of Prototype 1 (with three hidden nodes) to train a fractional power function, y = x^(1/3). The target function is in magenta, while those curves in blue with different line thicknesses show four random options using the Nguyen-Widrow initialization [8]. Note that some of the training stopped prematurely.
In Fig. 3 of [12], which was presented at IMAC XXIV, however, a direct adoption of the three
variants of Prototype 2 (as defined in this study) is prevented due to the increased input range
of [−10, 10]. To carry out Step 1 under Stage III, the values of the weights IW can be scaled
down by a factor of 10 (except for the two terms corresponding to the constant term; see [11]
for more details). For non-normalized inputs in general, x̄ = Cx·x, and one can proportion the
derived prototypes and their variants by “stretching” or “squeezing” the function approximated
by the initial neural network along the x-direction in inverse relation to the non-normalized input.
Quantitatively, the transformed value of IW, w̄, is based on wx − b = w̄x̄ − b, where w̄ = (1/Cx)·w.
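As a small worked example of Step 1 (proportioning) under Stage III, the snippet below rescales illustrative input weights by 1/Cx for an input range of [−10, 10] (Cx = 10), so that the proportioned network reproduces over x̄ the shape the prototype had over the normalized x. The weight and bias values are placeholders rather than the variant values of [11], and the exception for the constant-term nodes mentioned above is ignored here.

```python
import numpy as np

def sigmoid(p):
    return 1.0 / (1.0 + np.exp(-p))

def hidden_sum(x, IW, b1, LW, b2):
    # y(x) = LW . sigmoid(IW*x - b1) + b2, as in Fig. 2(a).
    H = sigmoid(np.outer(np.asarray(x, dtype=float), IW) - b1)
    return H @ LW + b2

# Illustrative weights defined over the normalized input x in [-1, 1].
IW = np.array([4.0, 4.0, 4.0])
b1 = np.array([-2.0, 0.0, 2.0])
LW = np.array([0.5, 0.5, 0.5])
b2 = -0.75

Cx = 10.0            # x_bar = Cx * x, i.e., raw input range [-10, 10]
IW_bar = IW / Cx     # w_bar = (1/Cx) * w, so that w*x - b = w_bar*x_bar - b

x = np.linspace(-1.0, 1.0, 7)
x_bar = Cx * x
# The proportioned network evaluated at x_bar matches the prototype at x.
print(np.allclose(hidden_sum(x, IW, b1, LW, b2),
                  hidden_sum(x_bar, IW_bar, b1, LW, b2)))   # True
```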
In addition to these training exercises, target functions such as linear (Nonlinearity Type I),
sine wave over [−π/2, π/2] (Type VI), and hard saturation (Type VIII), shown in Fig. 1, have been trained
successfully using the proposed initialization methodology. In all these exercises, the proposed
prototypes and their variants are either adopted directly or after proportioning in Step 1 of Stage
III.
3.2 Combining Prototypes
The first examples of the usefulness of Prototype 3, shown previously in Figs. 4 and 5 of [12],
exhibit a decomposition idea that is used to handle more complex functions. In Fig. 5(a), the
same swept sine wave form from Figs. 4 and 5 [12] is spatially partitioned into three individual
components/cycles, each of which can be approximated independently using Prototype 3 after some
detailed treatment under Stage III transformation. In particular, the center of each cycle needs to
be captured in the initialization through translation (i.e., adjusting the value of the bias, b), while
the non-normalized input range needs to be taken into account through proportioning (i.e., scaling
the value of the weights, IW ). Both translation and proportioning take place during Step 1 of Stage
III. An illustration of a neural network with six hidden nodes has been presented in Fig. 5(a); the
training results were previously presented in Fig. 5 of [12].
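A minimal sketch of the translation and proportioning arithmetic follows. If a prototype hidden node is defined over a normalized input x_n with p = w*x_n − b, and the component is to occupy a window of half-width Cx centered at c in the raw input (so x_n = (x − c)/Cx), then p = (w/Cx)*x − (b + w*c/Cx); translating therefore adjusts the bias while proportioning scales the weight, as stated above. The weights, centers, and widths below are illustrative placeholders, not the Prototype 3 values of [11] or the segmentation used in [12].

```python
import numpy as np

def place_component(IW, b1, center, half_width):
    """Translate and proportion prototype weights so the component spans
    [center - half_width, center + half_width] in the raw input:
    p = w*x_n - b with x_n = (x - center)/half_width
      = (w/half_width)*x - (b + w*center/half_width)
    """
    IW = np.asarray(IW, dtype=float)
    b1 = np.asarray(b1, dtype=float)
    return IW / half_width, b1 + IW * center / half_width

# Illustrative two-node component defined over x_n in [-1, 1].
IW = np.array([6.0, -6.0])
b1 = np.array([0.0, 0.0])

# Place three copies at assumed cycle centers of a swept sine on [0, 10],
# giving a six-hidden-node initial network (2 nodes per component x 3 components).
segments = [(1.5, 1.5), (4.5, 1.5), (8.0, 2.0)]   # (center, half-width), placeholders
IW_all, b1_all = [], []
for c, hw in segments:
    w_i, b_i = place_component(IW, b1, c, hw)
    IW_all.append(w_i)
    b1_all.append(b_i)

IW_all = np.concatenate(IW_all)   # stacked input weights of the combined network
b1_all = np.concatenate(b1_all)   # stacked hidden-layer biases
print(IW_all)
print(b1_all)
```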
[Figure 5 content: (a) the swept sine over x in [0, 10] shown as the sum of Components 1, 2, and 3; (b) the multi-slope function over x in [-10, 10] shown as the sum of Components 1 and 2.]
Figure 5: Decomposing (a) swept-sine, and (b) multi-slope function into a summation of some components that can be approximated directly with the proposed prototypes.
A multi-slope nonlinearity is approximated to further illustrate the use of a combinatorial prototype. The decomposition idea is illustrated in Fig. 5(b). A step-by-step evolution of several initialization options obtained at Steps 1 and 2 and their training results are presented in Fig. 6.
Note that multiple options exist for the training of this nonlinearity. The presented training results
are not exhaustive; one can utilize the proposed Stage III to further generate and refine
other options. Also note that the legend utilizes the nomenclature defined in [11].
The idea of decomposition is very useful in (1) handling numerous types of nonlinearities that
are more complex than those which can be approximated directly by individual prototypes, and (2)
generalizing the solution from one-variable to two-variable functions, especially when dealing with
two uncoupled variables. For example, a softening Duffing oscillator from [4] is selected where the
force-state mapping can be applied in the formulation and displacement and velocity are uncoupled.
Fig. 7 presents the training results using both the Nguyen-Widrow and the proposed initialization,
respectively, each with two options. It can be seen that the proposed initialization is more successful
than the Nguyen-Widrow algorithm in approximating this function even when only five nodes are
used.
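When the two input variables are uncoupled, as for the displacement and velocity of this oscillator under force-state mapping, the decomposition idea can be encoded by letting each hidden node act on only one of the two inputs, so that the initial network is a sum of a function of displacement and a function of velocity. The sketch below shows one such encoding with illustrative weights; it is not the five-hidden-node initialization actually used for Fig. 7.

```python
import numpy as np

def sigmoid(p):
    return 1.0 / (1.0 + np.exp(-p))

def two_input_net(X, IW, b1, LW, b2):
    """y = LW . sigmoid(IW @ [x, v]^T - b1) + b2 for a two-input network.

    X  : (n_samples, 2) array of [displacement, velocity] pairs
    IW : (n_hidden, 2) input-weight matrix
    """
    H = sigmoid(np.asarray(X, dtype=float) @ np.asarray(IW, dtype=float).T - b1)
    return H @ LW + b2

# Uncoupled initialization: each row of IW touches only one input, so the
# initial network approximates f(x, v) as g(x) + q(v).
IW = np.array([[ 3.0,  0.0],    # displacement-only nodes
               [-3.0,  0.0],
               [ 0.0,  0.1],    # velocity-only nodes
               [ 0.0, -0.1]])
b1 = np.array([1.0, 1.0, 0.0, 0.0])
LW = np.array([2.0, -2.0, 1.0, -1.0])
b2 = 0.0

X = np.array([[-5.0, 20.0], [0.0, 0.0], [5.0, -20.0]])
print(two_input_net(X, IW, b1, LW, b2))
```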
3.3 Approximating Piece-Wise Unsymmetrical Functions
Although the proposed prototypes are derived to approximate symmetrical and smooth nonlinearities, e.g., Fig. 1 in [12], these prototypes have shown the ability to train and converge well to
piece-wise unsymmetrical nonlinearities over the specified input range, as revealed in Fig. 8 of [12].
Approximating these nonlinearities is of great practical significance. First, they represent experimental phenomena that can often be encountered in the practice of engineering mechanics such
as concrete in compression, and clearance (or dead space) joint behavior. Second, these situations
involve the C¹ discontinuity where (1) polynomial fitting normally cannot perform as efficiently and (2) the Fourier series causes nonuniform convergence (the so-called Gibbs phenomenon).
[Figure 6 content: panels of initial neural networks, trained neural networks, and training performance (MSE versus epoch) for the Nguyen-Widrow initialization and for the proposed combinations at Steps 1 and 2 of Stage III; the legend labels the combined prototype variants (e.g., 1b + 2a) using the nomenclature of [11].]
Figure 6: An example of combining Prototypes 1b and 2a to train a multi-slope function. The target function is in magenta, while those curves in blue with different line thicknesses show four random options using the Nguyen-Widrow initialization [8]. Note that both Steps 1 and 2 were used to generate possible options for the initialization.
[Figure 7 content: restoring force surfaces plotted over displacement and velocity, comparing the target and trained neural networks for two Nguyen-Widrow options and two options of the proposed combination (Stage III).]
Figure 7: Training results of a softening Duffing nonlinearity in [4] based on two options using the Nguyen-Widrow algorithm and two other options using the proposed initialization methodology. All four trainings use neural networks with five hidden nodes. The target function is in black.
An idealized function typical for concrete in compression, a parabola joined by a horizontal line
at its vertex, is also approximated. Fig. 8 shows the training results using both the Nguyen-Widrow
algorithm and the proposed initialization methodology. It can be seen that the joint is offset both
horizontally and vertically. The values of the weights and biases, derived from the proposed Stage
III transformation, are detailed in [11]. As in Fig. 6, multiple options for the initialization exist
following the proposed methodology; those presented are merely some possibilities.
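For reference, such an idealized target can be written as a parabola joined to a horizontal line at its vertex, with the joint offset horizontally and vertically from the origin; the particular offsets and curvature below are illustrative placeholders rather than the values used to generate Fig. 8.

```python
import numpy as np

def concrete_like(x, x0=10.0, y0=150.0, a=-1.5):
    """Parabola y0 + a*(x - x0)^2 for x <= x0, joined at its vertex (x0, y0)
    to the horizontal line y = y0; x0, y0, and a are illustrative values."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= x0, y0 + a * (x - x0) ** 2, y0)

x = np.linspace(0.0, 20.0, 5)
print(concrete_like(x))   # rises along the parabola, then stays flat at y0
```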
[Figure 8 content: panels of initial neural networks, trained neural networks, and training performance (MSE versus epoch) for the Nguyen-Widrow initialization and for the proposed Prototype 2a transformed at Steps 2 and 1; the legend labels the transformed variants (e.g., 100 x 2a) using the nomenclature of [11].]
Figure 8: An example of using Prototype 2, Variant a (with four hidden nodes) to approximate an idealized piece-wise unsymmetrical nonlinearity with an offset that is typical for concrete in compression. The target function is in magenta, while those curves in blue with different line thicknesses show four random options using the Nguyen-Widrow initialization [8]. Note that both Steps 2 and 1 were gone through individually to generate three possible options for the initialization.
4 JUSTIFICATIONS AND MATHEMATICAL INSIGHTS
A qualitative justification for applying a prototype-based approach for greater success in neural
network training can be found in the balance between global and local search involved in training.
Ideally, training neural networks in function approximation should belong to the “global search”
category by finding global minima of error functions. However, currently employed training
techniques are normally only “local search” tools [7]. Thus selecting a good initial point for neural
network training is critical since the training process will normally result in trained values that
are still in the neighborhood of their initial values. If domain knowledge or any other insight into
the function to be approximated could be used to influence neural network initialization, then the
training would more likely converge to the global minimum (instead of just a local minimum),
making the trained neural network more accurate and meaningful. This is the guiding philosophy
of the neural network initialization methodology proposed in this paper and its associated previous
work [10, 13, 14, 16].
In addition to the graphical illustration in Fig. 3, a quantitative exercise is further presented in
this study to offer some insights into the construction of Prototypes 2 and 3. For the convenience of
discussion, the sigmoidal function S(p) = 1/(1 + e^(-p)) is denoted equivalently as σ(w, x, b) = 1/(1 + e^(-(wx - b))),
where p = wx − b is for one-variable function approximation. The notation h represents the output
of a hidden node. The superscripts in <> denote the prototype ID, and the subscripts refer to the
serial number of hidden nodes.
Prototype 2: Summing two defined sigmoidal terms. To understand Prototype 2, an explanation can be provided as follows, which is similar to the derivation of the approximation of a cubic power as in [10, 16]:

Two sigmoidal functions are chosen, σ1^<2>(w, x, b) and σ2^<2>(w, x, −b), where b ≠ 0. The Taylor series expansion of both functions at the origin x = 0 to the third power can be written as:

\sigma_1^{<2>} = \frac{1}{1+e^{b}} + \frac{w e^{b}}{(1+e^{b})^{2}}\, x + \frac{1}{2!}\,\frac{w^{2} e^{b}\,(-1+e^{b})}{(1+e^{b})^{3}}\, x^{2} + \frac{1}{3!}\,\frac{w^{3} e^{b}\,(1-4e^{b}+e^{2b})}{(1+e^{b})^{4}}\, x^{3} + \cdots

\sigma_2^{<2>} = \frac{1}{1+e^{-b}} + \frac{w e^{-b}}{(1+e^{-b})^{2}}\, x + \frac{1}{2!}\,\frac{w^{2} e^{-b}\,(-1+e^{-b})}{(1+e^{-b})^{3}}\, x^{2} + \frac{1}{3!}\,\frac{w^{3} e^{-b}\,(1-4e^{-b}+e^{-2b})}{(1+e^{-b})^{4}}\, x^{3} + \cdots

The sum of the above two functions leaves one with the following (referring to [10, 16] for those vanishing terms):

\sigma_1^{<2>} + \sigma_2^{<2>} = 1 + \frac{2 w e^{b}}{(1+e^{b})^{2}}\, x + \frac{1}{3}\,\frac{w^{3} e^{b}\,(1-4e^{b}+e^{2b})}{(1+e^{b})^{4}}\, x^{3} + \cdots     (1)

Prototype 2 and its variants can be considered as h1^<2> + h2^<2> = k^<2> × σ1^<2> + k^<2> × σ2^<2> − k^<2>, where k^<2> is equal to LW. The constant term of −k^<2> can be approximated with no errors using two defined sigmoidal terms [10, 16]. Based on Eq. (1), it can be seen that Prototype 2 and its variants mimic sums of some odd-power terms of x, i.e., hardening types of nonlinearities.
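The algebra behind Eq. (1) can also be checked numerically. The short sketch below compares the sum of the two shifted sigmoids against the constant, linear, and cubic terms of Eq. (1) for small x; the values of w and b are arbitrary illustrations.

```python
import numpy as np

def sigma(w, x, b):
    # sigma(w, x, b) = 1 / (1 + exp(-(w*x - b)))
    return 1.0 / (1.0 + np.exp(-(w * x - b)))

w, b = 2.0, 1.0
x = np.linspace(-0.2, 0.2, 5)           # small x, where the truncated series is accurate

lhs = sigma(w, x, b) + sigma(w, x, -b)  # sigma_1^<2> + sigma_2^<2>
rhs = (1.0
       + 2.0 * w * np.exp(b) / (1.0 + np.exp(b)) ** 2 * x
       + (1.0 / 3.0) * w ** 3 * np.exp(b) * (1.0 - 4.0 * np.exp(b) + np.exp(2.0 * b))
         / (1.0 + np.exp(b)) ** 4 * x ** 3)

print(np.max(np.abs(lhs - rhs)))        # small: only higher-order terms remain
```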
Prototype 3: Subtracting one defined sigmoidal term from the other. Consider two sigmoidal functions that share the same center. Here the center refers to the scaled bias, b0, in a
sigmoidal variable p = wx − b = w(x − b0 ) = wx̄. The rest of the argument can either follow
the idea of using the Taylor series expansion of a sigmoidal function or it can make use of
the central difference method to approximate derivatives of the sigmoidal function as proposed
in [6] and adopted in [5]. The latter is utilized in this study.
Since the following discussion can be conveniently generalized to the case when b0 ≠ 0 by
replacing x with x̄, consider one sigmoidal term with b0 = 0, i.e.,

\sigma(w, x, 0) = \frac{1}{1 + e^{-wx}}     (2)
It can be further derived that:

\frac{\partial \sigma}{\partial w}(w, x, 0) = \frac{x\, e^{-wx}}{(1 + e^{-wx})^{2}}     (3)
[Figure 9 content: x e^(-wx)/(1 + e^(-wx))^2 plotted versus x over [-1, 1] for w = 0.1, 1, 2, 3, 4, 5, and 10.]
Figure 9: Understanding Prototype 3.
This function ∂σ/∂w(w, x, 0) can be plotted versus x as shown in Fig. 9, and can demonstrate
various functional shapes, including antisymmetrical wavy forms, based on the value of w.
Note that the first derivative shown in Eq. (3) can be approximated using the central difference
method as follows:

\frac{\partial \sigma}{\partial w}(w, x, 0) \approx \frac{\sigma(w + \Delta w, x, 0) - \sigma(w - \Delta w, x, 0)}{2\,\Delta w}     (4)
where ∆w is a user-defined value and controls the approximation accuracy.
If one assigns

σ1^<3> = σ(w + ∆w, x, 0)
σ2^<3> = σ(w − ∆w, x, 0)

then h1^<3> + h2^<3> = (k/(2∆w)) σ1^<3> − (k/(2∆w)) σ2^<3> can be used to represent Prototype 3 and its variants. Based on Eq. (4) and Fig. 9, it can be seen that Prototype 3 and its variants mimic antisymmetrical wavy forms. By adjusting the weight w, this linear sum can be dilated into a straight line as a special case of Prototype 3, which has been proven using the Taylor series expansion in [10, 16].
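This construction can likewise be checked numerically: up to the factor k/(2∆w), the difference of the two sigmoids reproduces the analytic derivative of Eq. (3), whose antisymmetrical wavy shape is what Prototype 3 exploits. The values of w and ∆w below are arbitrary illustrations.

```python
import numpy as np

def sigma(w, x, b=0.0):
    return 1.0 / (1.0 + np.exp(-(w * x - b)))

w, dw = 3.0, 0.01
x = np.linspace(-1.0, 1.0, 9)

# Central-difference combination used to build Prototype 3 (Eq. (4)).
central_diff = (sigma(w + dw, x) - sigma(w - dw, x)) / (2.0 * dw)

# Analytic derivative from Eq. (3): d(sigma)/dw = x*exp(-w*x)/(1 + exp(-w*x))^2.
analytic = x * np.exp(-w * x) / (1.0 + np.exp(-w * x)) ** 2

print(np.max(np.abs(central_diff - analytic)))   # small for small dw
```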
5 CONCLUSION
Neural networks can be highly versatile and efficient in adapting to data when approximating nonlinear functions; however, these qualities can be achieved only if neural networks are initialized properly, as constructively verified in this study. A structured and detailed initialization procedure has
been presented as the continuing development of a heuristic prototype-based initialization approach
for multilayer feedforward neural networks proposed in previous studies [10, 15, 16, 13, 12]. A
range of typical nonlinear functions used in engineering mechanics applications has been targeted,
and training performances have been presented and compared with those of neural networks trained
using the Nguyen-Widrow initialization algorithm. The proposed initialization methodology has
shown satisfactory versatility in addition to being a constructive method. Technical challenges have
been identified, and solution strategies have been provided as in [11]. In particular, adding more
nodes in a transparent and rational manner is being pursued by the authors and their co-author.
6 ACKNOWLEDGEMENT
Support from the Junior Faculty Research Program awarded to the first author by Dr. T.H. Lee Williams, the
Vice President for Research at the University of Oklahoma, is greatly appreciated. Funding from
the Undergraduate Research Opportunities Program (UROP) awarded to the second author is also
greatly appreciated.
References
[1] D.E. Adams and R.J. Allemang. Non-linear vibrations class notes, Course 20-263-781, 2000.
[2] M.T. Hagan, H.B. Demuth, and M. Beale. Neural Network Design. PWS Publishing Company,
1995.
[3] A. Lapedes and R. Farber. In Neural Information Processing Systems, D. Anderson (Ed.), American
Institute of Physics, New York, pages 442–456, 1988.
[4] S.F. Masri, J.P. Caffrey, T.K. Caughey, A.W. Smyth, and A.G. Chassiakos. Identification of the
state equation in complex non-linear systems. International Journal of Non-Linear Mechanics,
39:1111–1127, 2004.
[5] A.J. Meade Jr. Regularization of a programmed recurrent artificial neural network. Journal of
Guidance, Control, and Dynamics, 2003.
[6] H.N. Mhaskar. Neural networks for optimal approximation of smooth and analytic functions.
Neural Computation, 8:164–177, 1995.
[7] O. Nelles. Nonlinear System Identification: From Classical Approaches to Neural Networks and
Fuzzy Models, pp. 785. Springer Verlag, 2000.
[8] D. Nguyen and B. Widrow. Improving the learning speed of 2-layer neural networks by choosing
initial values of the adaptive weights. In Proceedings of the IJCNN, volume III, pages 21–26,
July 1990.
[9] S. Osowski. New approach to selection of initial values of weights in neural function approximation. Electronics Letters, 29(3):313–315, 1993.
[10] J.S. Pei. Parametric and Nonparametric Identification of Nonlinear Systems. Ph.D. dissertation,
Columbia University, 2001.
[11] J.S. Pei and E.C. Mai. Constructing multilayer feedforward neural networks to approximate
nonlinear functions in engineering mechanics applications. ASME Journal of Applied Mechanics, 2006. under review.
[12] J.S. Pei and E.C. Mai. Neural network initialization for modeling nonlinear functions in engineering mechanics. In Proceedings of the 24th International Modal Analysis Conference (IMAC
XXIV), 2006.
[13] J.S. Pei and A.W. Smyth. A new approach to design multilayer feedforward neural network
architecture in modeling nonlinear restoring forces: Part i - formulation. ASCE Journal of
Engineering Mechanics, December 2006. to appear.
[14] J.S. Pei and A.W. Smyth. A new approach to design multilayer feedforward neural network
architecture in modeling nonlinear restoring forces: Part ii - applications. ASCE Journal of
Engineering Mechanics, December 2006. to appear.
[15] J.S. Pei, A.W. Smyth, and E.B. Kosmatopoulos. Analysis and modification of volterra/wiener
neural networks for identification of nonlinear hysteretic dynamic systems. Journal of Sound
and Vibration, 275(3-5):693–718, 2004.
[16] J.S. Pei, J.P. Wright, and A.W. Smyth. Mapping polynomial fitting into feedforward neural
networks for modeling nonlinear dynamic systems and beyond. Computer Methods in Applied
Mechanics and Engineering, 194(42-44):4481–4505, 2005.
[17] I.W. Sandberg, J.T. Lo, C.L. Fancourt, J.C. Principe, S. Katagiri, and S. Haykin. Nonlinear
Dynamical Systems: Feedforward Neural Network Perspectives, pp. 256. Wiley-Interscience,
2001.
[18] K. Worden and G.R. Tomlinson. Nonlinearity in Structural Dynamics: Detection, Identification
and Modelling, pp. 680. Institute of Physics Pub, 2001.