Extrapolating Gain-Constrained Neural Networks: Effective Modeling for Nonlinear Control

Bijan Sayyar-Rodsari, Eric Hartman, Edward Plumer, Kadir Liano and Carl Schweiger
Research Department, Pavilion Technologies, Inc., Austin, TX 78758

Figure 1: (A) Hammerstein model: a static nonlinearity acting on the input u, followed by linear dynamics producing the output y. (B) Wiener model: linear dynamics acting on the input u, followed by a static nonlinearity producing the output y.
Abstract— Nonlinear Model Predictive Control (NLMPC) is now a widely accepted control technology in many industrial applications. Since the quality of the model of a physical non-linear process plays a critical role in the successful development, deployment, and maintenance of an NLMPC application, the mathematical representation of such models has been the subject of significant research in both academia and industry.
In this paper, the Extrapolating Gain-Constrained Neural Network (EGCN) is described as a key component of an NLMPC technology that has been in use in more than 100 industrial applications over the past 7 years. Simulation results are presented which compare EGCN models to traditional neural network training methods as well as to the recently proposed Bounded-Derivative Network (BDN). These results highlight the critical advantages of EGCN in nonlinear process modeling for optimization and control applications and underscore the effectiveness of EGCN models in providing guarantees on global gain-bounds without compromising accurate representation of available process data.
I. NON-LINEAR MODELING FOR PROCESS CONTROL
The application of model-based advanced process control technology to nonlinear systems has been the subject
of considerable interest both in industry and academia.
A critical element of all model-based techniques is the
development of suitable process models that utilize both
empirical measurements and a priori information (such as
fundamental process knowledge) [1].
Due to the difficulty of defining general non-linear models, an attractive approach has been to decompose the process into a set of linear dynamics and static nonlinear mappings. Some commonly used structures for modeling non-linear processes are described next.
A. Structures for Non-Linear Process Models
A variety of model structures have been based on the
aforementioned decomposition. Henson and Seborg describe some of these, noting that “probably the best-known
member of this class is the Hammerstein model. Because
of its relatively simple structure, this model has become
increasingly popular as a next-step-beyond-linear-modeling
of chemical processes.” They further note that “if the static
nonlinearity follows the linear dynamics - the resulting
system is called a Wiener model which has also been
considered for process modeling applications” [2]. Figure 1
shows the block structure for Wiener and Hammerstein
models.
Multi-input multi-output (MIMO) variations of the Wiener and Hammerstein models have proven successful in the modeling and control of advanced nonlinear multivariable processes. The Hammerstein model may be represented by the state-space equations

\[
w(t) = G_u(u(t)), \qquad \dot{x}(t) = A\,x(t) + B\,w(t), \qquad y(t) = C\,x(t) \tag{1}
\]

where x(t) ∈ R^{n×1} is the state vector, u(t) ∈ R^{m×1} is the actual input vector to the process, w(t) ∈ R^{µ×1} is the output of the nonlinear static mapping G_u(·) that is the input to the linear dynamic block in the Hammerstein model, and y(t) ∈ R^{r×1} is the actual output vector for the overall nonlinear process [2]. The matrices A, B, and C, and the nonlinear mapping G_u(·) are defined with appropriate dimensions. For a Wiener model, the state-space description is given by

\[
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad v(t) = C\,x(t), \qquad y(t) = G_y(v(t)) \tag{2}
\]

where v(t) ∈ R^{q×1} is the input to the nonlinear static mapping block G_y(·) [2]. The definition for the other variables is similar to that in Equation (1).
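To make the block ordering of (1) and (2) concrete, the following sketch simulates discrete-time analogues of the two structures. The system matrices, the tanh nonlinearity, and the input signal are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Illustrative discrete-time analogues of Eqs. (1) and (2).
# A, B, C and the static nonlinearity are arbitrary assumptions chosen
# only to show the ordering of the static and dynamic blocks.
A = np.array([[0.8, 0.1], [0.0, 0.9]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])

def static_nonlinearity(v):
    # Any smooth static map would do; tanh is just an example.
    return np.tanh(v)

def simulate_hammerstein(u_seq):
    """Static nonlinearity acts on the input, then linear dynamics."""
    x = np.zeros((2, 1))
    y_seq = []
    for u in u_seq:
        w = static_nonlinearity(u)            # w(t) = G_u(u(t))
        x = A @ x + B * w                     # x(t+1) = A x(t) + B w(t)
        y_seq.append((C @ x).item())          # y(t) = C x(t)
    return y_seq

def simulate_wiener(u_seq):
    """Linear dynamics first, then the static nonlinearity on the output."""
    x = np.zeros((2, 1))
    y_seq = []
    for u in u_seq:
        x = A @ x + B * u                     # x(t+1) = A x(t) + B u(t)
        v = (C @ x).item()                    # v(t) = C x(t)
        y_seq.append(float(static_nonlinearity(v)))  # y(t) = G_y(v(t))
    return y_seq

u = np.sin(np.linspace(0, 10, 50))
print(simulate_hammerstein(u)[:3], simulate_wiener(u)[:3])
```

The two simulators differ only in whether the static map is applied before or after the linear state update, which is the essential distinction between the two structures.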
In designing the Process Perfecter®, Pavilion Technologies adopted a parametric approach in which a static nonlinear mapping is used to modify certain parameters of a difference equation. In the simple single-input, single-output case, the equations look like:

\[
y_k = \sum_{i=1}^{N} A_i\, y_{k-i} + \sum_{i=1}^{M} G_i(u_{k-i})\, u_{k-i} \tag{3}
\]
with variable definitions as above. The multiple-input
multiple-output formulation used in practice is described
in more detail in [3]. Note that the nonlinear mapping Gi
is constructed using EGCN.
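As a rough illustration of the parametric structure in (3), the sketch below computes a one-step-ahead prediction with N = M = 2. The autoregressive coefficients and the placeholder gain schedule standing in for the EGCN-based mapping G_i are assumptions for illustration only.

```python
import numpy as np

# One-step-ahead prediction with the SISO parametric structure of Eq. (3):
#   y_k = sum_i A_i * y_{k-i} + sum_i G_i(u_{k-i}) * u_{k-i}
# Coefficients and the placeholder gain schedule are illustrative; in the
# actual technology the G_i mapping is realized by an EGCN model.
A_coeffs = [1.2, -0.35]                     # A_1, A_2 (assumed values)

def gain_schedule(u):
    # Stand-in for the EGCN-based gain mapping G_i(u): smooth and bounded.
    return 2.0 + 1.5 / (1.0 + np.exp(-(u - 5.0)))

def predict_next(y_hist, u_hist):
    """y_hist = [y_{k-1}, y_{k-2}], u_hist = [u_{k-1}, u_{k-2}]."""
    ar_part = sum(a * y for a, y in zip(A_coeffs, y_hist))
    nl_part = sum(gain_schedule(u) * u for u in u_hist)
    return ar_part + nl_part

print(predict_next([1.0, 0.8], [4.0, 3.5]))
```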
The nonlinear mappings Gu , Gy , and Gi critically affect the fidelity of the nonlinear models described by
Eqs. (1), (2), and (3), respectively. The following section
discusses some of the most commonly adopted methodologies for developing these nonlinear mappings.
B. Representation of Non-Linearity
The quality of the non-linear model, regardless of the
structure, depends on the quality of the static non-linear
mapping. The universal approximation property of neural
networks [4] has been the inspiration for a number of
creative approaches to constructing such mappings. Two
such approaches are considered in this paper. The first,
originally described by Hartman [5], is the Extrapolating
Gain-Constrained Neural Network (EGCN) which is discussed in more detail in Section II of this paper. The second
is the Bounded-Derivative Network (BDN) in which the
analytical integral of a sigmoidal neural network¹ is trained
in such a way that the gains are guaranteed to be globally bounded [6]. Both works underscore the importance
of predictable gain behavior in models used for control
applications such as NLMPC.
In introducing bounded derivative network models,
Turner et al. recently stated that “architectures such as
neural networks cannot cope with the extrapolative demands
of predictive control” and that “neural network process
gain predictions can spuriously invert in real-time which
results in them closing valves when they should be opening
them” [6]. In a subsequent work, they explain the reasoning for such behavior by observing that “[the neural
network] nonlinearity is based on a function where the
derivative continuously ‘decays’ to zero away from its
peak activity.” They go on to state that “the issues of
zero gain predictions are intrinsic to neural networks and
cannot be eliminated by an identification algorithm” [7].
However, these observations are only true when applied
to traditional sigmoidal neural networks² that are trained
without consideration of gain bounds. In our view, without specifying model architecture and associated training
methodology, general statements about the applicability of
neural networks to process control are not mathematically
correct. Our simulations demonstrate that such problems are
not an inherent failing of all neural network techniques,
and hence statements such as “until now, existing [universal
approximator] UA technologies (such as neural networks)
have had rudimentary architectural deficiencies that result in
highly inaccurate and unusable process gain predictions” [6]
are at best inaccurate.
The fact that a model must be capable of both capturing the behavior represented by the process data and extrapolating the response surface appropriately has been well understood for a number of years and has been addressed in various ways. As early as ten years ago, Thompson and Kramer observed the need to constrain the contribution of the network outside of the region covered by training data in the context of combining first-principles and neural network structures [1]. In proposing EGCN modeling, Hartman underscored the importance of model gains and their role in nonlinear system modeling and noted that “inaccurate input-output gains (partial derivatives of outputs with respect to inputs) are common in neural network models when
input variables are correlated or when data are incomplete
or inaccurate” and emphasized that “accurate gains are
essential for optimization, control, and other purposes.” He
then argued that “because empirical and physical modeling
methods have largely complementary sets of advantages
and disadvantages, the ability to incorporate available a
priori knowledge into neural networks models can capture
advantages from both modeling methods” [5]. Hartman then
outlined a methodology for constrained training of neural
network models such that the derivative behavior of the
models is consistent with the prior knowledge throughout
the entire anticipated range of the process inputs.

¹This results in another neural network structure with different types of activation functions, including both linear and non-linear forms.
²This term refers to networks in which all nodes in the hidden layer are sigmoidal.
II. EXTRAPOLATING GAIN-CONSTRAINED NEURAL NETWORKS
This section provides an overview of Extrapolating Gain-Constrained Neural Networks (EGCN). The EGCN model is composed of a set of nodes whose structure is illustrated
in Figure 2. In this structure, xo is an affine function of
the node inputs hi and f (xo , ρo ) is a parametric nonlinear
mapping from xo to the output of the node, ho . These nodes
are interconnected in an arbitrary feedforward network as
shown in Figure 3.
The software implementation of Process Perfecter offers
various options for the configuration and training of EGCN
models used within the NLMPC module. These options
allow the user to select the most appropriate architecture
and training methodology for a particular application and
give rise to a family of EGCN-based techniques. Some of
the options relevant to this discussion include:
1) Activation function: While the default activation function in the EGCN model is a traditional sigmoid (“s”-shaped) function, user-defined activation functions [8]
can be specified to provide additional flexibility in
matching the characteristics of the specific problem.
2) Network topology: The topology of the interconnected
nodes can be customized to support configurations
other than the usual cascaded-layer topology. For
example, linear nodes can be included in the hidden
layer to affect global extrapolation behavior as shown
in the simulation experiments.
3) Synthetic extrapolation data-points: The automatic
generation of synthetic data-points can be configured.
These points, which have no target output values,
are defined within the extended operating region to
control interpolation and extrapolation behavior in the
presence of sparse data.
4) Extrapolation logic: Outside of the established operating region of the model, the response surface of
the model can be complemented by principled logic.
For example, the response surface can be linearly
extrapolated over the global input-space using gains
computed by the model at the boundaries of the
operating region.
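The extrapolation-logic option can be pictured with a small sketch: inside an assumed extended operating region the trained model is evaluated directly, and outside it the response surface is continued linearly from the nearest boundary using the model's gain there. The model, model_gain, and region limits below are hypothetical placeholders, not the product's implementation.

```python
import numpy as np

# Sketch of principled extrapolation logic: within [lo, hi] the trained
# model is used directly; outside, the response surface is continued
# linearly using the model gain evaluated at the nearest boundary.
# `model` and `model_gain` are placeholders for a trained prediction
# function and its input-output derivative.
def extrapolating_predict(u, model, model_gain, lo=-5.0, hi=15.0):
    if u < lo:
        return model(lo) + model_gain(lo) * (u - lo)
    if u > hi:
        return model(hi) + model_gain(hi) * (u - hi)
    return model(u)

# Toy usage with an arbitrary smooth stand-in model and its derivative.
f = lambda u: 10.0 * np.tanh(0.3 * u)
df = lambda u: 3.0 / np.cosh(0.3 * u) ** 2
print(extrapolating_predict(20.0, f, df), f(20.0))
```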
Figure 2: Basic building block of a node in an EGCN model: the node inputs h_i are combined through weights w_io and a bias b_o at a summing junction to form x_o, which the parametric non-linear activation f(x_o, ρ_o) maps to the node output h_o.
Figure 3: A typical 2-input/2-output EGCN model (input nodes n1, n2; hidden nodes n3–n6; output nodes n7, n8).
Training an EGCN model may be formulated as a constrained optimization problem, an idea originally presented
by Hartman [5]. In the following presentation, we consider
a simple feedforward neural network with linear input and
output layers, and a single hidden layer as illustrated in
Figure 3. This structure is one of the most widely adopted
universal approximators to date. However, the problem formulation presented here applies regardless of the topology
of the network and the type of the activation functions used.
The output of the k-th output node of this neural network is the k-th output of the nonlinear approximator model. We denote the output of the k-th output node as y_k. Although not required, we assume here that the activation function for this output node is the identity function, i.e. f(x_k, ρ_k) = x_k. Output y_k is then given by:

\[
y_k = x_k, \qquad x_k = \sum_j w_{jk}\, h_j + b_k \tag{4}
\]

where h_j is the output of the j-th hidden node, w_{jk} is the weight from the j-th hidden node to the k-th output node, and b_k is the bias term for the summation at the k-th output node. Using the same fundamental building block of Figure 2 for the hidden nodes of the single hidden layer, the output of the j-th hidden node, h_j, is given by:

\[
h_j = f(x_j, \rho_j), \qquad x_j = \sum_i w_{ij}\, u_i + b_j \tag{5}
\]

where x_j is the input to the nonlinear activation function in the j-th hidden node, w_{ij} is the weight from input u_i to the j-th hidden node, b_j is the bias of the j-th hidden node, and f(x_j, ρ_j) is a parametric nonlinear activation function with parameter vector ρ_j. The input nodes in this simplified example are assumed to be identity functions and hence the inputs to the hidden nodes are the inputs to the network. Training of the EGCN model is now formulated as a constrained optimization problem:
\[
\min_{\Phi}\; \sum_{d}\sum_{k} \left(t_{kd} - y_{kd}\right)^2 \quad \text{s.t.} \quad L_{md} \le G_m\!\left(\Phi,\, u_d,\, y_d,\, \frac{\partial y_{kd}}{\partial u_{id}},\, \ldots\right) \le U_{md} \tag{6}
\]
where the decision vector Φ includes the EGCN’s weights
and the biases as well as any parameters in the activation
functions, k indexes the output variables as before, d
indexes the dataset, tkd is the target output for the EGCN
model, ykd is the predicted output of the EGCN model, and
m indexes the two-sided constraints. The sum-squared-error
objective is minimized while simultaneously satisfying the
set of constraints, which may include constraints at each
data point in the dataset.
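To evaluate constraints such as those in (6), the training procedure needs the network outputs and the input-output gains at candidate points. The sketch below computes both for the single-hidden-layer network of Equations (4) and (5), with the gains obtained by the chain rule. The sigmoid activation and the random weights are illustrative assumptions, not an EGCN implementation.

```python
import numpy as np

# Single-hidden-layer network of Eqs. (4)-(5), plus its input-output gains.
# The weights here are random placeholders; a real EGCN would obtain them
# from the constrained training problem of Eq. (6).
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 2, 10, 1
W_ih = rng.normal(size=(n_in, n_hid))    # w_ij
b_h = rng.normal(size=n_hid)             # b_j
W_ho = rng.normal(size=(n_hid, n_out))   # w_jk
b_o = rng.normal(size=n_out)             # b_k

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_and_gain(u):
    """Return y (Eq. 4) and the gain matrix dy_k/du_i via the chain rule."""
    x_h = u @ W_ih + b_h                  # x_j = sum_i w_ij u_i + b_j
    h = sigmoid(x_h)                      # h_j = f(x_j)
    y = h @ W_ho + b_o                    # y_k = sum_j w_jk h_j + b_k
    dh_dx = h * (1.0 - h)                 # sigmoid derivative f'(x_j)
    # dy_k/du_i = sum_j w_ij * f'(x_j) * w_jk
    gain = W_ih @ (dh_dx[:, None] * W_ho)
    return y, gain

y, gain = forward_and_gain(np.array([0.5, -1.0]))
print(y, gain)   # gain has shape (n_in, n_out)
```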
This optimization-based formulation allows high-quality
models to be created through the imposition of constraints
obtained from a priori process knowledge. As shown in
Equation (6), these imposed constraints may include functions of any or all of the EGCN model’s inputs, outputs,
parameters, and input-output derivatives of any order. Hartman describes in detail how derivative constraints such as
\[
K_{\min} \le \frac{\partial y_k}{\partial u_i} \le K_{\max} \tag{7}
\]
may be handled [5]. He further proposes one method for
handling such constraints in which the constraint-set in
Equation (7) is translated into penalty terms which augment
the cost function of Equation (6). He also considers the
inclusion of a known profile for the gain-bounds over the
operating region which can significantly improve the quality
of the model [5].
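A minimal sketch of the penalty idea, not Hartman's exact formulation, is shown below: each point gain that leaves the band [K_min, K_max] of Equation (7) contributes a one-sided quadratic penalty that is added to the sum-squared-error objective of Equation (6). The stand-in model, gain function, penalty weight, and synthetic-point grid are assumptions for illustration.

```python
import numpy as np

# Generic illustration of translating the gain bounds of Eq. (7) into
# penalty terms added to the sum-squared-error objective of Eq. (6).
# This is not Hartman's exact method; the model/gain helpers, the penalty
# weight, and the synthetic points are placeholders.
K_MIN, K_MAX, LAMBDA = 2.0, 5.0, 10.0

def penalized_objective(predict, gain, u_data, t_data, u_synthetic):
    """predict(u) -> y and gain(u) -> dy/du for the current parameter values."""
    sse = sum((t - predict(u)) ** 2 for u, t in zip(u_data, t_data))
    penalty = 0.0
    # Enforce the gain band on measured points and on synthetic points that
    # cover the interpolated/extrapolated parts of the operating region.
    for u in list(u_data) + list(u_synthetic):
        g = gain(u)
        penalty += max(0.0, K_MIN - g) ** 2 + max(0.0, g - K_MAX) ** 2
    return sse + LAMBDA * penalty

# Toy usage with stand-in model and gain functions.
f = lambda u: 3.0 * u + np.sin(u)
df = lambda u: 3.0 + np.cos(u)
u_meas = np.linspace(-2, 12, 15)
print(penalized_objective(f, df, u_meas, f(u_meas), np.linspace(-5, 15, 21)))
```

An optimizer would minimize this augmented objective over the model parameters; with a hard-constrained solver, the same gain evaluations would instead appear directly as constraint functions.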
An important characteristic of extrapolating gain-constrained training is that the anticipated operating region for EGCN models is not limited to regions over which process data is available. In fact, EGCN models can enforce gain constraints over any extended operating region, whether or not those regions are populated by data. This idea is especially important in control applications, for which correct gain predictions over the operating region are critical.
The extended operating region can be partitioned into three
regions:
1) populated – Input regions populated by the available
process data. Gains are enforced over these regions
simply by enforcing the constraints at each data point
in the dataset.
2) interpolated – Input regions inside of the ranges
spanned by the data but which are unpopulated due
to sparsity.³ Gains are enforced in these regions
by generating synthetic input vectors that uniformly
cover such regions. Target output values are neither
known nor necessary for these synthetic points.
3) extrapolated – Finite input regions outside of the spanned ranges. Gains are also enforced in these regions using synthetic input vectors. Mathematically, there is little difference between the interpolated and extrapolated portions of the extended operating region.

³A common source of this data sparsity is highly correlated inputs.

Figure 4: Description of training and operating regions: dense data covers u ∈ [−2, 12], sparse data covers u ∈ {[−2, −1], [11, 12]}, the extended operating region is u ∈ [−5, 15], and linear extrapolation regions lie beyond it.

For safety, the response surface outside of the extended operating region is defined by deterministic linear extrapolation using gains computed by the model on the boundaries of the operating region.

III. EXPERIMENTAL COMPARISON

In this section, we provide results from a set of model-training experiments. In addition to highlighting the effect of various options associated with Extrapolating Gain-Constrained Neural Network (EGCN) model training, these experiments provide comparisons to both traditional unconstrained neural networks and Bounded-Derivative Networks (BDN). The results of these experiments show that the EGCN methodology produces models that are suitable for control and optimization applications.

Description of Experiments

For consistency, all experiments are based on a common problem formulation. The goal is to model the nonlinear static mapping and the corresponding derivative relationship for a single-input, single-output (SISO) process from a set of data measurements. The operating region and regions of available data are illustrated in Figure 4. Although the methods described are all generalizable to multiple-input, multiple-output (MIMO) processes, the fundamental trade-offs between these techniques are easier to visualize on a SISO problem. For the selected SISO problem, the gain varies across the operating region, is always positive, and approaches a different value to the left and to the right of the extended input operating region.

Each experiment tests a particular combination of data availability, network structure, and training methodology. The experiments were chosen to highlight particular issues that arise with the use of these techniques and fall into three groups: (1) in Experiments A, B, and C we test a traditional sigmoidal network trained without constraints, (2) in Experiments D and E we test a BDN model with global constraint bounds, and (3) in Experiments F and G we test EGCN models using global and point constraints. Results and observations are provided in the remainder of this section. For further consistency, the following experimental conditions were used:
• In each experiment, the base-case neural networks had similar complexity, containing 10 nodes in the hidden layer.
• In solving the constrained optimization problem, a proven commercially available solver was used.
• For “dense dataset” experiments, the networks were trained with data drawn uniformly from the input range u ∈ [−2, 12].
• For “sparse dataset” experiments, the training data was restricted to two disjoint regions u ∈ {[−2, −1], [11, 12]}.
• All trained networks were tested over the range u ∈ [−10, 20], which extends beyond the specified operating region of u ∈ [−5, 15] in order to ascertain extrapolation behavior.
• For EGCN experiments, synthetic data points were added to impose local gain-constraints. In addition, the global inequality constraint 2 ≤ ∂y/∂u ≤ 5 was applied to all data points.
• The parameters of the optimization solver were re-tuned for best-achievable performance in each experiment.
• Results are shown for noise-free training data, as typical noise levels did not alter the qualitative results described.
Experiment A: Sigmoidal Network on Dense Data
Description — A sigmoidal neural network was trained on
the dense dataset with no constraints on the model gain.
Results — The model output response and the corresponding gain profile are shown in Figures 5 and 6 using dense
training region u ∈ [−2, 12]. The figures show that while
the gain prediction over the range of available data is
acceptable, the extrapolation behavior is poor. As expected,
the gains tend toward zero far from the selected operating
region. This is due to the saturating effect of the sigmoidal
nodes in the hidden layer. This result is not necessarily a
negative reflection on this basic training technique. If the
training data is representative of the operational range of
the process, these results are satisfactory.

Figure 5: Experiment A: Output response surface for the unconstrained 10-node neural network trained on dense data.

Figure 6: Experiment A: Gain surface for the model in Figure 5. Although the gain is accurate across the data region, the extrapolation behavior is poor, with the gain dropping off to zero beyond this region. This is a known problem with unconstrained neural network training.
Experiment B: Sigmoidal Network on Sparse Data
Description — A sigmoidal neural network was trained on
the sparse dataset with no constraints on the model gain.
Results — The model output response and the corresponding gain profile are shown in Figures 7 and 8 using
the sparse training region u ∈ {[−2, −1], [11, 12]}. This
example illustrates more graphically the problems that are
encountered when applying data-fitting techniques naively.
Deviation from the expected gain can be observed both
within the span of the available data and within the extended
operating region. This poor performance is obtained because
no consideration was given to the data-sparsity during the
training, either to augment the available data or to control
the degrees-of-freedom exercised by the model.

Figure 7: Experiment B: Output response surface for the unconstrained neural network trained on sparse data.

Figure 8: Experiment B: Gain surface for the model in Figure 7. Undesirable deviations are observed in the gain due to the sparseness of the training data. The asymptotic gains tend to zero beyond the displayed range.
Both this and the previous experiment illustrate commonly known issues involved in training any data-fitting
model, including neural networks. As shown in subsequent
experiments, simple constraining techniques are routinely
used to account for such deficiencies in the data.
Experiment C: Sigmoidal Network With Linear Node on
Dense Data
Description — A sigmoidal neural network, with a linear
(non-saturating) node added to the hidden layer, was trained
on the dense dataset with no constraints on the model gain.
Results — The model output response and the corresponding gain profile for this experiment are shown in
Figures 9 and 10. Within the training region, the results
are comparable to those obtained without the linear node
(Experiment A). Outside the training region, the extrapolated gain asymptotes to a non-zero value of approximately
2.7.
This well-known technique allows a neural network to
model asymptotic gains which are non-zero, even without
imposing any gain-constraints in the training algorithm. Far
from the nominal operating region, the sigmoidal nodes
saturate and have minimal contribution to the gain surface,
and the effect of the linear nodes is dominant. Within the
nominal operating region, the saturating nodes supplement
this with specific non-linear characteristics. Even though
the addition of a simple linear node is not a universal
solution to the gain-extrapolation problem, the example
does contradict the perception that all neural networks must exhibit a multitude of spurious zero-gain regions throughout the input space.

Figure 9: Experiment C: Output response surface for the unconstrained neural network, with an additional linear hidden node, trained on dense data.

Figure 10: Experiment C: Gain surface for the model in Figure 9. Unlike Figure 6, the gains tend toward a non-zero value of 2.7 both to the left and right of the plot.
Experiment D: Bounded-Derivative Network on Dense Data
Description — A BDN network was trained on the dense
dataset while globally imposing the inequality constraints
on the model gain. Additional experiments were performed
in order to investigate the effect of varying the number of
nodes in the hidden layer.
Results — The model output response and the corresponding gain profile for the BDN network experiment are
shown in Figures 11 and 12. The figures confirm that the
training methodology maintained the model gains within
the global constraints [2, 5] both within the training region
and extrapolated over the global input space. This is the
mathematically guaranteed result that Turner et al. have
described [6]. However, the gain profile differs significantly
from the actual process, even in regions where process
data is available. Within the region of available data, the
unconstrained neural network result shown in Figure 6 was
superior. The extrapolated gains for the BDN are within
the global gain bounds but do not match the actual process,
even though information necessary to infer these asymptotic
gains is available in the data.
Reasoning that perhaps the network complexity was not
adequate to capture the process characteristics, the number
of nodes in the hidden layer was varied and the training
experiment repeated. The family of resulting gain curves is
shown in Figure 13. Very little improvement was obtained
from changing network complexity. The inaccurate representation of gains observed in this example is symptomatic
of a fundamental limitation in the way gain bounds are
computed and enforced in the BDN method. In order to
understand this problem, consider the derivation provided
by Turner et al. [6]. Analytic bounds on the gain of model
output y with respect to the model input xk are given by:
\[
\frac{\partial y}{\partial x_k} \in \left[\;\left.\frac{\partial y}{\partial x_k}\right|_{b(1)},\; \left.\frac{\partial y}{\partial x_k}\right|_{b(2)}\;\right] \tag{8}
\]

\[
\left.\frac{\partial y}{\partial x_k}\right|_{b(1)} = \sum_j w^{(6,5)}_{1j}\, w^{(3,2)}_{jk}\, w^{(5,4)}_{jj}\, w^{(2,0)}_{kk} \;-\; \left(\sum_j w^{(6,5)}_{1j}\, w^{(3,2)}_{jk}\, w^{(5,3)}_{jj} + w^{(6,2)}_{1k}\right)
\]

\[
\left.\frac{\partial y}{\partial x_k}\right|_{b(2)} = \sum_j w^{(6,5)}_{1j}\, w^{(3,2)}_{jk}\, w^{(5,4)}_{jj}\, w^{(2,0)}_{kk} \;+\; \left(\sum_j w^{(6,5)}_{1j}\, w^{(3,2)}_{jk}\, w^{(5,3)}_{jj} + w^{(6,2)}_{1k}\right)
\]
These computed bounds are estimates of the actual bounds, as shown in the following nested intervals:

\[
L \;\le\; \left.\frac{\partial y}{\partial x_k}\right|_{b(1)} \;<\; \min\left\{\frac{\partial y}{\partial x_k}\right\} \;\le\; \frac{\partial y}{\partial x_k} \;\le\; \max\left\{\frac{\partial y}{\partial x_k}\right\} \;<\; \left.\frac{\partial y}{\partial x_k}\right|_{b(2)} \;\le\; U \tag{9}
\]
Here, the terms L and U represent the lower and upper constraints imposed on the model-training problem. These
bounds represent the range of gains associated with the
actual process and must be satisfied for the model to be
considered a feasible solution. The values ∂y/∂xk represent
the point-evaluations of the gain at every point in the global
input-space, not just those for which measurements are
available. The intent of the BDN method is to guarantee that
this set of values lies within the interval [L, U ]. The interval
[ min {∂y/∂xk } , max {∂y/∂xk } ] denotes the span of all
such point-evaluations and is the gain-range predicted by the
model. However, this interval cannot be computed in closeformed for a given set of model parameters. As a result,
the BDN method uses the estimates computed by Equation (8) instead. Since these estimates are independent of
the data samples, they are valid over the entire input-space.
However, the inequalities ∂y/∂xk |b(1) < min {∂y/∂xk }
and max {∂y/∂xk } < ∂y/∂xk |b(2) will not be tight.
In other words, although the bounds in Equation (8) are
guaranteed to contain all values of the actual gains, they
span an interval greater than that of the actual gain range.
In constraining these computed estimates, the actual model
gains are, consequently, over-constrained.
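A small numerical experiment illustrates this over-constraining effect for a generic gain of the form g(x) = Σ_j c_j s_j(x), where each s_j lies in (0, 1]: bounding the sum term-by-term, as a data-independent analytic estimate must, yields an interval that is valid everywhere but noticeably wider than the range g actually attains, because the individual terms cannot reach their extremes simultaneously. The coefficients and basis functions below are arbitrary assumptions used only to make the point.

```python
import numpy as np

# Over-constraining illustration: term-wise analytic bounds on a gain
# g(x) = sum_j c_j * s_j(x), with s_j(x) in (0, 1], versus the range the
# gain actually attains over the input space. Coefficients are arbitrary.
rng = np.random.default_rng(1)
c = rng.normal(size=8)
centers = rng.uniform(-5, 15, size=8)

def s(x):
    # Each basis term peaks at its own center and decays away from it.
    return np.exp(-np.abs(x - centers))

xs = np.linspace(-10, 20, 2001)
g = np.array([c @ s(x) for x in xs])        # point-evaluated gains

# Term-wise analytic bounds (data-independent, in the spirit of Eq. (8)):
# each s_j is allowed to take any value in [0, 1] independently.
lower_est = np.sum(np.minimum(c, 0.0))
upper_est = np.sum(np.maximum(c, 0.0))
print("actual gain range :", g.min(), g.max())
print("term-wise estimate:", lower_est, upper_est)   # typically strictly wider
```

Forcing the wider analytic interval to lie inside [L, U] therefore squeezes the attainable gain range into a band narrower than the constraints intended, which is the over-constraining described above.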
Using the BDN methodology, globally verifiable bounds
are achieved at the expense of limiting variations in the
model’s gains necessary to achieve an accurate fit to available process data. In the extreme case, the model and the
resulting model-based controller will become effectively
linear.
Figure 11: Experiment D: Output response surface for the BDN network trained on dense data.

Figure 12: Experiment D: Gain surface for the model in Figure 11. Although the model shows that the global gain bounds are met, the point-gains within the operating region are compromised due to over-constraining.

Figure 13: Experiment D: Gain responses resulting from the variation of the number of nodes in the hidden layer. Over-constraining is an inherent problem with the computation of global bound estimates; changing the complexity of the network topology cannot overcome this limitation.
Experiment E: Bounded-Derivative Network on Sparse
Data
Description — A BDN network was trained on the sparse
dataset while globally imposing the inequality constraints
on the model gain.
Results — The model output response and the corresponding gain profile for the BDN network experiment are shown in Figures 14 and 15. As with the dense-data case in Experiment D, the model gains are globally maintained within the global constraints, but the gain profile is qualitatively different from the actual profile, both in the operating region and in the extrapolated region.

Figure 14: Experiment E: Output response surface for the BDN network trained on sparse data.

Figure 15: Experiment E: Gain surface for the model in Figure 14. With minimal training data, the prediction of the BDN network is further compromised.
Experiment F: EGCN On Dense Data
Description — An EGCN network with the default topology of sigmoidal nodes in the hidden layer was trained on
the dense dataset. The global inequality constraints on the
model gain were imposed over the entire operating region.
In addition, the known endpoint gain-constraints 2 and 3
were enforced at a set of additional points at the boundary
of the extended operating region.
Results — The model output response and the corresponding gain profile for the first of two EGCN experiments are
shown in Figures 16 and 17. In this experiment, training
data covers all of the region u ∈ [−2, 12]. Additional
synthetic data-points with target gains were used in the
extended regions u ∈ [−5, −2] and u ∈ [12, 15]. Global
gain constraints were imposed on all real and synthetic datapoints to ensure bounding within the constraints [2, 5] as
specified in the problem statement.
As in Experiment A (see Figures 5 and 6), an excellent fit is obtained to the measured process data. Unlike that experiment, however, an appropriate gain profile is observed over the entire extended operating region u ∈ [−5, 15]. Beyond this extended operating region, the response surface is linearly extrapolated using the model gains computed at the boundary points −5 and 15. This prudent engineering practice improves the robustness of the overall engineering solution.

Figure 16: Experiment F: Output response surface for the EGCN network trained on dense data.

Figure 17: Experiment F: Gain surface for the model in Figure 16. In addition to meeting the global gain-bounds, the gain predictions are accurate where data is available.
Experiment G: EGCN On Sparse Data
Description — An EGCN network with the default topology of sigmoidal nodes in the hidden layer was trained on
the sparse dataset. The global inequality constraints on the
model gain were imposed throughout the operating region.
In addition, the known endpoint gain-constraints 2 and 3
were enforced at a set of additional points at the boundary
of the extended operating region.
Results — The model output response and the corresponding gain profile for the second EGCN experiment are shown in Figures 18 and 19. In this experiment, training data covers only the sparse data set u ∈ {[−2, −1], [11, 12]}; otherwise the experiment is identical to the previous one. Again, the global behavior of the network conforms to the desired operating constraints and a good match is obtained to available process data. Compared to the actual gain profile, some deviations are observed, especially in resolving the region of peak gain. This is expected since neither process data nor a priori knowledge were available during training to condition this region. As in the previous experiment, predictable global behavior outside of the extended operating region is guaranteed through deterministic linear extrapolation of the response surface.

Figure 18: Experiment G: Output response surface for the EGCN network trained on sparse data.

Figure 19: Experiment G: Gain surface for the model in Figure 18. In the absence of sufficient training data, the gain predictions are not accurate. However, the specified gain-bounds and point-constraints are still met.
IV. CONCLUSIONS

In this article, we have shown that Extrapolating Gain-Constrained Neural Networks (EGCN) are capable of producing high-fidelity models that incorporate information from both a priori process knowledge and measured process data. Gain-constraining techniques used with the EGCN model ensure that appropriate model gains are obtained for any operating region, including regions where no process data is available. Deterministic linear extrapolation of the response surface beyond the anticipated operating region ensures that predictable behavior is guaranteed for any input conditions to the model. Simulation results and supporting mathematical analysis indicate that Bounded-Derivative Networks ensure global gain-bounds at the expense of model quality. In contrast, the simulation results for the EGCN demonstrate that this compromise is not necessary.

REFERENCES
[1] M. Thompson and M. Kramer, “Modeling Chemical Processes Using
Prior Knowledge and Neural Networks,” AIChE Journal, vol. 40,
p. 1328, 1994.
[2] M. Henson and D. Seborg, Nonlinear Process Control. Prentice Hall,
1997.
[3] S. Piche, B. Sayyar-Rodsari, D. Johnson, and M. Gerules, “Nonlinear model predictive control using neural networks,” IEEE Control
Systems Magazine, vol. 20, pp. 53–62, June 2000.
[4] K. Hornik, M. Stinchcombe, and H. White, “Multilayer Feedforward
Networks Are Universal Approximators,” Neural Networks, vol. 2,
pp. 359–366, 1989.
[5] E. Hartman, “Training feedforward neural networks with gain constraints,” Neural Computation, vol. 12, pp. 811–829, April 2000.
[6] P. Turner, J. Guiver, and B. Lines, “Introducing the State Space
Bounded Derivative Network for Commercial Transition Control,”
American Control Conference, June 2003.
[7] P. Turner, M. Devine, and J. Versteeg, “An Essential Evolution in
Nonlinear Polymer Production Control: An Industrial Case Study,”
Plant Automation and Decision Support, September 2003.
[8] W. Duch and N. Jankowski, “Survey of Neural Transfer Functions,”
Neural Computing Surveys, vol. 2, pp. 163–212, 1999.