Operations Research, Vol. 58, No. 4, Part 2 of 2, July–August 2010, pp. 1158–1177. doi 10.1287/opre.1100.0814. © 2010 INFORMS
Monotone Approximation of Decision Problems
Naveed Chehrazi, Thomas A. Weber
Department of Management Science and Engineering, Stanford University, Stanford, California 94305
{[email protected], [email protected]}
Many decision problems exhibit structural properties in the sense that the objective function is a composition of different
component functions that can be identified using empirical data. We consider the approximation of such objective functions,
subject to general monotonicity constraints on the component functions. Using a constrained B-spline approximation,
we provide a data-driven robust optimization method for environments that can be sample-sparse. The method, which
simultaneously identifies and solves the decision problem, is illustrated for the problem of optimal debt settlement in the
credit-card industry.
Subject classifications: B-splines; monotone approximation; nonparametric/semiparametric methods; robust optimization;
sample-sparse environments.
Area of review: Special Issue on Computational Economics.
History: Received September 2008; revision received November 2009; accepted December 2009. Published online in
Articles in Advance June 3, 2010.
1. Introduction

In this paper we study the approximation of decision problems, subject to structural monotonicity constraints, in view of robustly determining optimal actions. A key motivation for imposing monotonicity constraints on the approximation of a decision problem is to combine information from available data with structural insights obtained from a model-based analysis of the problem. Structural monotonicity constraints naturally arise when viewing the objective function as a composition of several "component functions," each of which directly relates to an observable phenomenon that may naturally exhibit a monotone behavior, such as a supply curve that is increasing in price, a demand curve that is decreasing in price, a response curve that is increasing in a certain stimulus, or a production function with a decreasing slope due to decreasing returns to scale.

Our approach has five key features. First, it is semiparametric in the sense that each element of the class of approximating functions (splines of a certain degree and for a certain set of breakpoints or "knots") can be represented by a unique, finite-dimensional parameter vector.1 With this it is possible to approximate any continuous component function arbitrarily closely (and not just functions within a narrow parameterized class). Standard parametric approximation methods do not have this feature. For example, in a classical linear regression one varies parameters so as to select a linear model by minimizing a quadratic cost criterion, which may lead to a significant residual error whenever the actual underlying system is nonlinear. In addition, the method provides a natural way of incorporating monotonicity constraints with respect to first and second derivatives of the component function by translating them into simple restrictions in the parameter space.2

Second, the approach combines the problems of model estimation and optimization by evaluating the approximation error using any user-defined robustness criterion, such as average performance, worst-case performance, expected gain, or competitive ratio. The approach is therefore in the spirit of recent advances in operational statistics (Shanthikumar and Liyanage 2005, Lim et al. 2006, and Besbes et al. 2010), which emphasize the importance of using a metric for the evaluation of an approximation error that can be directly related to the cost of mistakes in the underlying decision. An optimal robust decision is obtained together with an optimal approximation error by solving a pair of nested optimization problems.

Third, the method works equally well in data-rich and data-poor (or sample-sparse) environments, because it treats interpolation and approximation in the same unifying framework. In a sample-sparse environment the intersample behavior of a component function we wish to approximate is not well understood. This may be the case even though the information at a given sample point is extremely accurate. We therefore distinguish between intersample and intrasample uncertainty. Our method deals with both sources of parameter uncertainty jointly by fitting a stochastic process instead of a deterministic function (cf. Figure 1). The resulting stochastic model can be used to design optimal experiments for the acquisition of additional data points, so as to maximize the value of the additional information.

Figure 1. (a) Approximation of response curve in sample-sparse environment. (b) Detail with confidence interval.

Fourth, in fitting the stochastic process, we use information derived from the data as well as the structural insights about the behavior of the component functions. The latter
implies a set of “consistent parameters” (a bounded polyhedron as we show), whereas the former yields a conditional
probability distribution over these consistent parameters.
Fifth, the method produces not only an optimal decision
relative to a given robustness criterion (and a “best approximation error”), but also a probability measure that indicates
the likelihood of optimality for any feasible action. The
latter constitutes a valuable input for sensitivity analysis.
We illustrate the method for the practical problem of
designing optimal settlement offers to collect delinquent
credit-card debt (cf. §5).
1.1. Related Literature
a. Robust Optimization and Identification. Work on
the problem of joint system identification and optimization dates back to Haimes et al. (1971), who formulate
a bicriterion vector-optimization problem that includes a
quadratic error measure for the identification of a finite
number of model parameters based on sample observations and a cost function to evaluate the quality of decisions. As detailed further by Haimes and Wismer (1972),
several approaches to this multicriteria optimization problem are possible. First, one can consider the minimization
of a convex combination of the two criteria (parametric
approach). Second, it is possible to alternate between minimizing each criterion individually and using the solution to
one problem for the other problem. This recursive two-step
approach does not generally converge. Third, by incorporating one of the two criteria as a constraint when optimizing the other criterion, the trade-offs between optimization and identification can be made apparent (ε-criterion
approach). Fourth, one can follow a “robust” procedure by
optimizing a worst-case version of the model, in which the
parameters have been chosen so as to most adversely affect
the optimization result (maximin approach). The last two
approaches resurface in more recent work on robust optimization (see, e.g., Ben-Tal and Nemirovski 1998, 2002; or
Bertsimas and Sim 2004), which studies the dependence of
a mathematical program, consisting of an objective function and a constraint set, on the set of unknown parameters
either with or without a probabilistic structure incorporating a priori information about the likelihood of the different
parameter realizations. It aims at solving robust versions of
the original problems, approximating uncertainty sets, e.g.,
by suitable ellipsoids, to obtain computationally tractable
programs. In that same spirit, our method generates a linearly constrained parameter domain (a bounded polyhedron) that represents the decision maker’s knowledge about
the model structure and which defines the support of the
robustness criterion.
b. Shape-Preserving Approximation. The problem of
“monotone approximation” (i.e., the approximation of a
monotone function f with monotone functions of a certain class) was introduced by Shisha (1965).3 Early efforts
in this area concentrated on bounding the approximation
error by the modulus of continuity of derivatives of f . Such
“Jackson-type” inequalities were established, for example,
by De Vore (1977a, b) for polynomials and splines, respectively, and by Anastassiou and Yu (1992) for wavelets.
Much of the subsequent developments have focused on
the special case where the available data points satisfy
the monotonicity constraints themselves, so that monotone
interpolation becomes feasible. McAllister et al. (1977),
based in part on Passow and Roulier (1977), provide an
algorithm that determines an interpolating shape-preserving
(i.e., increasing, convex) spline. The monotonicity domain
for the parameters of spline polynomials is generally not
simple. For example, in the case of cubic 𝒞¹-splines, the monotonicity domain for the data-normalized coefficients consists of the nontrivial union of a triangle and an ellipsoid
(Fritsch and Carlson 1980). This stands in contrast to the
computationally much simpler polyhedron obtained here
using so-called B-splines (see, e.g., De Boor 1978/2001).4
B-splines have been used by Dierckx (1980) for approximating convex functions, and by Ramsay (1988) for
approximating increasing functions. Kocić and Milovanović
(1997) and Kvasov (2000) survey recent advances in
shape-preserving approximation of data by polynomials
and splines, which include more general shape constraints
such as symmetry, unimodality, or discontinuities. The
extant work on shape-constrained spline interpolation has
produced a multitude of algorithms that yield various
candidates for interpolation functions, depending on the
precise constraints, smoothness requirements, and degrees
of the splines considered. However, for the purpose of
jointly identifying and optimizing a decision model it is
necessary to characterize the class of suitable models,
i.e., by constructing finite-dimensional and computationally simple parameter domains, rather than by pinpointing
particular candidate functions. In that sense, our work is
related to recent papers by Beliakov (2000, 2002), who
examines conditions for the approximation of functions
using least-squares splines subject to shape constraints.5
c. Applications. Shape-preserving approximation techniques have seen only a few applications in economics
and management science.6 Wang and Judd (2000) approximate a two-dimensional value function in a savings allocation problem using two-dimensional shape-constrained
splines. In the statistics literature, there has been work on
the parametric estimation of functions subject to economically motivated shape constraints, such as isotone regression (Robertson et al. 1988) or concave regression (Hanson
and Pledger 1976). This work is principally concerned with
model identification and relies on the assumption that the
chosen class of parametric regressors is appropriate for the
model under consideration. The criteria used to measure
the approximation error (such as mean square error or maximum absolute error) are typically disconnected from the
underlying decision problem. Besbes et al. (2010), motivated in part by this deficiency, develop a hypothesis test
for the significance of a parameterization based on sample data of objective-function values when comparing the
parameter-induced optimum to a nonparametric baseline
estimate of the objective function. In sample-sparse environments, however, parametric methods are of limited use, because the data do not support any reasonable assumptions about the model class. We therefore concentrate
on nonparametric (or semiparametric) methods.
An assumption needed by many of the available nonparametric estimation methods in the literature is that
a “well-diversified” and “sufficiently large” data sample is
available. However, in many applications, such as the problem of designing optimal credit-card settlement offers discussed in §5, adding a new sample point may be fairly
expensive. As a result, a method is needed that performs
well in such sample-sparse environments by systematically
using all available structural information to identify primitives of the problem in view of making good decisions
(and not necessarily to minimize an abstract error criterion). We assume that the structural information is available in the form of monotonicity constraints and require
an entire class of approximating functions (consistent with
these constraints) to remain “in play” until a decision
has been selected, tightly coupling the optimization and
identification.
1.2. Performance Comparison
To illustrate the performance gain that can be achieved
using our method in a sample-sparse environment, we
compare it to three standard nonparametric estimation
procedures: (i) kernel-based, (ii) K-nearest-neighbor, and
(iii) orthogonal-series.7 Each of these methods provides an estimate ŷ of a dependent variable ỹ, conditional on the realization x of an independent variable x̃. The estimated functional relationship, f̂(x) = E[ỹ | x̃ = x], is then typically used in place of the unknown relationship in the decision problem. Another common feature of these methods is that they all seek to replace the true functional relationship y = f(x) by an approximate relationship,

$$\hat f(x) = \sum_{j=1}^{n} \beta_j W_{n,j}(x), \qquad (1)$$

which is obtained as a superposition of n "weight functions," W_{n,j}(·), j ∈ {1, ..., n}. Specifically, in methods (i) and (ii), n is the cardinality of the data sample z = {(x_j, y_j)}_{j=1}^n, it is β_j = y_j, and the weight function is given by

$$W_{n,j}(x) = \frac{\varphi((x - x_j)/\lambda)}{\sum_{l=1}^{n}\varphi((x - x_l)/\lambda)},$$

where φ: ℝ → ℝ₊ is a suitable kernel function. Note that in method (ii) the (variable) scaling factor λ = λ(x) is chosen so as to include the K nearest data points closest to x in (1). Common shapes of φ(·) include rectangular, triangular, and quadratic (Epanechnikov). In method (iii), n is the order of the (truncated) orthogonal series, the W_{n,j}'s are basis functions for the approximation class (e.g., 𝒞⁰[a, b]), and the β_j's are the corresponding Fourier coefficients.
In sample-sparse environments (see, e.g., the application
in §5), methods (ii) and (iii) are unsuitable. In method (ii),
any nontrivial support diameter may lead to excessive averaging, which makes it impossible to obtain function estimates that include higher-frequency components (which
allow for fast variations and thus flexible intersample
behavior in sample-sparse regions). Method (iii) does allow
for higher frequencies, yet in its standard form provides
only a single trend estimate, without necessarily interpolating the data. At least in data-sparse settings, the latter is not useful when the accuracy at the few sample points is fairly high. We also note that the quality of the
approximation is limited by the well-known sampling theorem (Shannon 1949) that relates the maximum frequency
in (1) ("bandwidth" B) to the required distance between two sample points ("Nyquist rate" 1/(2B)). Lastly, standard orthogonal-series methods are at present unsuited for
use in conjunction with monotonicity constraints.8
Consider now method (i), kernel-based estimation.9 Similar to method (ii), the W_{n,j}'s are scaled by a (constant) scaling factor λ > 0 to avoid trivial estimation of the relation y = f(x) between sample points. As before, this introduces, via averaging, an approximation error in the function estimate f̂(x), especially when the (few) available data points are fairly accurate. Furthermore, the function estimate inherits the properties of the kernel functions. Because the latter are typically not monotone, this complicates the consideration of shape constraints. The following example illustrates the performance of our procedure based on B-splines in comparison to method (i).
Numerical Example. For the Epanechnikov kernel, with

$$W_{n,j}(x) = \Bigl(\sum_{l=1}^{n}\Bigl(1 - \Bigl(\frac{x - x_l}{\lambda}\Bigr)^{2}\Bigr)\mathbf{1}_{[x_l-\lambda,\,x_l+\lambda]}(x)\Bigr)^{-1}\Bigl(1 - \Bigl(\frac{x - x_j}{\lambda}\Bigr)^{2}\Bigr)\mathbf{1}_{[x_j-\lambda,\,x_j+\lambda]}(x),$$
given the support diameter 2λ > 0, Figure 2(a) shows the kernel-based estimation procedure applied to a sample-sparse debt-collection problem (cf. §5, Segment 2) to obtain an estimate f̂(x) of a "response rate," where x ∈ [0, 1] corresponds to a settlement rate, and where the data sample z = {(0, 1), (0.49, 0.0996), (0.6, 0.0743), (1, 0.0457)} is given. The decision maker is interested in maximizing the expected revenue g(f̂(x), x) = x·f̂(x), depicted in Figure 2(b). As can be seen, kernel-based estimation incorrectly suggests an optimal settlement rate of 100%, given various reasonable bandwidths, although the data itself suggest that a settlement rate of 49% is superior. Figures 2(c) and 2(d) show the corresponding results for the monotone approximation method used in this paper. As detailed in §5, the method predicts that the optimal average performance is obtained at a settlement rate of 20%. The expected return ("spin") at the optimal settlement rate suggested by our method is about 80% larger than what is achieved by
the kernel-based estimator.10

Figure 2. (a) Approximation of the response-rate function using Epanechnikov kernel. (b) Expected income (spin) using Epanechnikov kernel. (c) Approximation of the response-rate function by a stochastic process using the proposed monotone approximation method. (d) Stochastic process of expected income (spin) using the proposed monotone approximation method.

The performance difference
for a different robustness measure (competitive ratio) is
similarly large. The main driver for the difference is that
the new method considers all functional relationships consistent with the existing knowledge about the structure of
the decision problem, which provides significant additional
information, especially in sample-sparse settings.
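To make the comparison concrete, the following minimal Python sketch implements the kernel-based estimator (method (i)) with an Epanechnikov kernel on the four-point Segment-2 sample quoted above. The bandwidth lam and the evaluation grid are illustrative assumptions (the paper's exact bandwidths are not reproduced here), and the suggested maximizer shifts with the bandwidth, which is precisely the fragility this example highlights.

```python
import numpy as np

# Segment-2 sample from the text: (settlement rate, response rate).
xs = np.array([0.0, 0.49, 0.60, 1.0])
ys = np.array([1.0, 0.0996, 0.0743, 0.0457])

def f_hat(x, lam=0.3):
    """Kernel estimate of the response rate, Epanechnikov kernel with bandwidth lam."""
    u = (x - xs) / lam
    k = np.maximum(1.0 - u**2, 0.0)   # zero outside [x_j - lam, x_j + lam]
    return np.nan if k.sum() == 0.0 else float((k * ys).sum() / k.sum())

grid = np.linspace(0.0, 1.0, 101)
spin = np.array([x * f_hat(x) for x in grid])   # expected income x * f_hat(x)
print("kernel-suggested settlement rate:", grid[np.nanargmax(spin)])
```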
1.3. Outline
The paper is organized as follows. Section 2 introduces
the decision problem, its decomposition into component
functions, and the basic problem of approximating the
component functions subject to structural monotonicity
constraints. Section 3 provides characterizations of the
“monotonicity domain” and the “robust approximation set”
conditional on observed sample data. Spline parameters
in the latter set guarantee that approximations are within
a prespecified approximation-error bound and satisfy the
monotonicity constraints. Section 4 examines informational
issues related to the use of sample data, in terms of both
consistent belief updates and information acquisition by
selecting an optimal experiment. In §5 we analyze a debt-settlement problem as an application of the method in a
sample-sparse setting. Section 6 concludes.
2. The Model
2.1. Decision Problem
We consider the decision problem of finding

$$x^{*} \in \arg\max_{x\in X}\, g(f(x), x), \qquad (2)$$

where X = [a, b] ⊂ ℝ, with −∞ < a < b < ∞, is a given choice set, f: X → ℝ is a component function, and g: ℝ × X → ℝ is a structural objective function. We make the following assumptions on the primitives of the decision problem (2).

A1. g ∈ ℒ⁰, i.e., the structural objective function g is Lipschitz-continuous.

A2. For some r ≥ 1, f ∈ 𝒞ʳ[a, b], i.e., the component function f is r-times continuously differentiable.

We assume that although the structural objective function g is known, an analytic expression for the component function is generally not available. In other words, the decision maker knows how to compute his objective function

$$\Pi(x) = g(f(x), x) \qquad (3)$$

only after f has been identified. For example, if x is the price of a product and f(x) denotes a (downward-sloping) market demand curve, then a revenue-maximizing firm's objective function Π(x) is given by g(f(x), x) = xf(x).

Remark 1. The situation in which the objective function itself is unknown corresponds to the special case where g(f(x), x) = f(x).
Remark 2. The reason for writing the objective function in (2) as a composition of a structural objective function and a component function is that this enables the decision maker to use both his knowledge about properties of the component function and his knowledge about how the component function affects the objective function before committing to a specific feasible action x ∈ X. Using this decomposition, the decision maker can solve the identification problem and the optimization problem jointly. The standard procedure would be to first solve the identification problem to obtain a component-function approximation f̂: X → ℝ. The decision maker's approximate objective function is then

$$\hat\Pi(x) = g(\hat f(x), x),$$

leading to the approximate solution

$$\hat x \in \arg\max_{x\in X}\, g(\hat f(x), x).$$

This procedure, however, ignores the model uncertainty resulting from the approximation of f by f̂ at the optimization step.
Lastly, we assume that the decision maker knows that the component function satisfies a set of monotonicity constraints.

M1. The component function f satisfies a first-order monotonicity constraint of the form

$$\kappa_1 f^{(1)}(x) \ge 0 \quad \forall\, x \in X, \qquad (4)$$

given some real number κ₁.

M2. If r ≥ 2, then the component function f satisfies a second-order monotonicity constraint of the form

$$\kappa_2 f^{(2)}(x) \ge 0 \quad \forall\, x \in X, \qquad (5)$$

given some real number κ₂.

Note that if κ₁ = 0 or κ₂ = 0, then the corresponding constraints (M1 and/or M2) are not active. Without loss of generality, κ₁ and κ₂ can be chosen in {−1, 0, 1}.
2.2. Component-Function Approximation

Data relative to the component function f is available in the form of the sample z = {(x_j, y_j)}_{j=1}^n, containing n noisy observations z_j = (x_j, y_j) with a = x₁ < x₂ < ⋯ < xₙ = b, which are such that

$$y_j = f(x_j) + \varepsilon_j \qquad (6)$$

for all j ∈ {1, ..., n}, where the ε_j are realizations of independent zero-mean random variables ε̃_j.
Remark 3. In applications, multiple observations of the component function evaluated at the same set of x_j-values can sometimes be obtained. If for a given x_j the i.i.d.
sample (y_{j,1}, ..., y_{j,m}) is available, then by the strong law of large numbers ȳ_{j,m} = (y_{j,1} + ⋯ + y_{j,m})/m → f(x_j) almost surely as m → ∞. Furthermore, by the central limit theorem the sample mean ȳ_{j,m} is asymptotically normal.11 Thus, in practice we collapse the vector (y_{j,1}, ..., y_{j,m}) to its sample mean y_j = ȳ_{j,m} and assume that the corresponding sample error is normal, i.e., ε̃_j ∼ N(0, σ_j²), with zero mean and variance

$$\sigma_j^2 = \frac{1}{m(m-1)}\sum_{l=1}^{m}\,(y_{j,l} - \bar y_{j,m})^2.$$
Figure 3. Model primitives in the parameter space.
Given a family of functions s(·; θ): [a, b] → ℝ, indexed by a parameter vector θ in a finite-dimensional parameter space Θ, we henceforth use the approximation error

$$e(\theta, z) = \bigl\|\bigl(w_1(s(x_1;\theta) - y_1), \ldots, w_n(s(x_n;\theta) - y_n)\bigr)\bigr\|_{\infty} = \max_{j\in\{1,\ldots,n\}} w_j\,|s(x_j;\theta) - y_j|, \qquad (7)$$

which represents the (weighted) maximum absolute deviation of s(·; θ) from the sample z, where w = (w₁, ..., wₙ) is a given vector of positive weights.12 The inclusion of weights in the error norm is isomorphic to having different standard deviations for different y_j's (cf. Remark 3). The weight is typically chosen larger at a point with smaller standard deviation, e.g., w_j = 1/σ_j.
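As a concrete illustration of (7), the sketch below evaluates the weighted maximum absolute deviation for a quadratic B-spline s(·; θ) built with SciPy; the degree, knot vector, coefficients, and unit weights are all illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import BSpline

k = 2                                      # spline degree (illustrative)
xi = np.array([0, 0, 0, 0.5, 1, 1, 1.0])   # clamped knot vector on [0, 1]
theta = np.array([1.0, 0.6, 0.3, 0.1])     # coefficient vector, k + m + 1 = 4
s = BSpline(xi, theta, k)                  # the approximating function s(.; theta)

def e(spline, x, y, w):
    """Weighted maximum absolute deviation (7) of the spline from the sample z."""
    return float(np.max(w * np.abs(spline(x) - y)))

x = np.array([0.0, 0.49, 0.6, 1.0])
y = np.array([1.0, 0.0996, 0.0743, 0.0457])
w = np.ones_like(y)                        # in practice, e.g., w_j = 1/sigma_j
print("e(theta, z) =", e(s, x, y, w))
```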
2.3. Robust Decision Making
The set Θ_κ ⊂ Θ containing parameters for functions s(·; θ) that satisfy the monotonicity constraints M1 and M2 for a given κ = (κ₁, κ₂) is referred to as the monotonicity domain. Relative to the family of functions that is considered for the approximation of a component function, the set Θ_κ contains all component-function approximations that satisfy the monotonicity requirements, i.e., all those that could possibly be relevant for the decision problem (2), even in the complete absence of empirical data.

Given a sample z, which contains data on the component function, we can, for any approximation error ε ≥ 0, determine the set Θ̂_κ^ε(z) ⊂ Θ_κ that contains all those parameter vectors θ for which the maximum absolute deviation e(θ, z) does not exceed ε. The set Θ̂_κ^ε(z) is referred to as the robust approximation set (cf. Figure 3). It is used as a starting point for what we term a robust solution to the decision problem (2). More specifically, considering component-function approximations in {s(·; θ)}_{θ∈Θ_κ}, we say that x*(z) is a robust solution to the decision problem (2) (relative to the sample z) if

$$x^{*}(z) \in \arg\max_{x\in X(z)} \mathcal{R}\bigl(g(s(x;\theta), x) \mid \theta \in \Theta_\kappa\bigr), \qquad (8)$$

where

$$X(z) = \bigcup_{\varepsilon\in[\underline\varepsilon,\,\bar\varepsilon]} X_\varepsilon(z) \qquad (9)$$

and

$$X_\varepsilon(z) = \arg\max_{x\in X} \mathcal{R}\bigl(g(s(x;\theta), x) \mid \theta \in \hat\Theta_\kappa^\varepsilon(z),\, z\bigr), \qquad (10)$$

for all ε in the set of admissible approximation errors, [ε̲, ε̄]. The lower bound ε̲ cannot be smaller than ε̲_κ(z), which is the minimum achievable error such that Θ̂_κ^{ε̲_κ(z)}(z) is nonempty (cf. Equation (25) in §3.3), and ε̄ is a maximum acceptable error, specified by the decision maker.13 Given a nonempty measurable subset Θ̂ of the parameter space Θ, the robustness criterion ℛ(g(s(x;θ), x) | θ ∈ Θ̂) is a real-valued functional that evaluates its argument g(s(x;θ), x) at all θ ∈ Θ̂. Following are a number of well-known robustness criteria. Let F(θ | z) be a cumulative distribution function (cdf) that represents the decision maker's beliefs over realizations of θ in Θ̂, conditional on the observed data sample z.14 The following are standard robustness criteria for x ∈ X:

Average Performance:
$$\mathcal{R}_{\mathrm{Avg}}\bigl(g(s(x;\theta),x) \mid \theta\in\hat\Theta, z\bigr) = \int_{\hat\Theta} g(s(x;\theta),x)\, dF(\theta\mid z);$$

Worst-Case Performance:
$$\mathcal{R}_{\mathrm{WC}}\bigl(g(s(x;\theta),x) \mid \theta\in\hat\Theta, z\bigr) = \inf\bigl\{g(s(x;\theta),x) : dF(\theta\mid z) > 0\bigr\};$$

Expected Gain:
$$\mathcal{R}_{\mathrm{EG}}\bigl(g(s(x;\theta),x) \mid \theta\in\hat\Theta, z\bigr) = \int_{\hat\Theta} \Bigl(g(s(x;\theta),x) - \max_{\hat x\in X} g(s(\hat x;\theta),\hat x)\Bigr)\, dF(\theta\mid z);$$

Competitive Ratio:
$$\mathcal{R}_{\mathrm{CR}}\bigl(g(s(x;\theta),x) \mid \theta\in\hat\Theta, z\bigr) = \int_{\hat\Theta} \frac{g(s(x;\theta),x)}{\max_{\hat x\in X} g(s(\hat x;\theta),\hat x)}\, dF(\theta\mid z).$$

Other robustness criteria can be used. All robustness criteria are defined analogously when they are not conditioned on z. Our definition of a robust solution x*(z) to the decision problem (2) in (8)–(10) guarantees that the
decision maker does not pay excessive attention to the data when jointly considering the problems of approximating the objective function and determining a best action. In relation (10) the decision maker determines the set of optimal actions X_ε(z), considering all component-function candidates s(·; θ) that achieve a specified error bound ε relative to the available data sample z. In (9) the decision maker determines the superset of actions that may be optimal, given any allowable error ε ∈ [ε̲, ε̄].15 Finally, in (8) an optimal robust action is determined by testing any potential solution against all parameter realizations θ ∈ Θ_κ, without conditioning on z. The decision-relevant information contained in the data sample z is transmitted from problem (10) to problem (8) via the set X(z). For example, when the amount of data is very large, then because the conditional distribution F(θ | z) in (10) is sharply concentrated around the true parameter value, the set X_ε(z) becomes insensitive to the error ε, so that the cardinality of X(z) is "small."
Remark 4. The robust solution x*(z) in (8)–(10) implies the optimal approximation error

$$\hat\varepsilon^{*}(z) = \inf\bigl\{\varepsilon \in [\underline\varepsilon, \bar\varepsilon] : x^{*}(z) \in X_\varepsilon(z)\bigr\}, \qquad (11)$$

which is the smallest allowable error such that X_ε(z) contains x*(z).
Remark 5. To understand the hierarchical nature of a robust solution to the decision problem (2) that satisfies (8)–(10), it is useful to consider a simple example without monotonicity constraints (i.e., κ = 0). Let s(x; θ) = 2θ − x, where the parameter θ ≥ 0 is unknown, and let g(s(x;θ), x) = s(x;θ)x = (2θ − x)x. Given a sample z = {(x_j, y_j)}_{j=1}^n and ε ≥ ε̲, it is θ ∈ Θ̂_0^ε(z) = {ϑ ∈ Θ_0 : max_{j∈{1,...,n}} |y_j − (2ϑ − x_j)| ≤ ε} if and only if θ₁(ε, z) ≤ θ ≤ θ₂(ε, z), for some appropriate functions θ₁, θ₂. Taking average performance as a robustness criterion, solving the optimization problem (10) yields

$$X_\varepsilon(z) = \arg\max_{x\in X} \int_{\theta_1(\varepsilon,z)}^{\theta_2(\varepsilon,z)} (2\theta - x)x\; dF(\theta\mid z) = \bigl\{E[\tilde\theta \mid \theta \in \hat\Theta_0^\varepsilon(z),\, z]\bigr\},$$

where X = [0, x̄] for some large enough x̄ > 0. The solution to (9) becomes

$$X(z) = \bigl\{E[\tilde\theta \mid \theta_1(\varepsilon,z) \le \theta \le \theta_2(\varepsilon,z),\, z] : \varepsilon \in [\underline\varepsilon, \bar\varepsilon]\bigr\},$$

so that by Equation (8) we obtain as the optimal robust decision x*(z) the point in X(z) that is closest to θ̄ = E[θ̃].
Remark 6. If the hierarchy between problems (8) and (10) is eliminated by replacing the monotonicity domain Θ_κ in (8) by the robust approximation set Θ̂_κ^ε̄(z), then the decision maker can choose ε̄ so as to ignore certain unfavorable parameter values. Using such an "ostrich algorithm," the decision maker may be led to overly optimistic actions when the initially observed sample suggests "good news."16
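To illustrate how the robustness criteria can be evaluated in practice, the sketch below approximates the average-performance and competitive-ratio criteria by Monte Carlo integration against draws from the beliefs F(θ | z), using the toy model of Remark 5 (s(x; θ) = 2θ − x and g(s(x;θ), x) = x·s(x;θ)); the uniform posterior over [0.8, 1.2] is an assumed stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0.8, 1.2, 2000)        # assumed posterior draws of theta
X = np.linspace(0.0, 2.0, 201)             # feasible actions

def g(x, th):
    return (2.0 * th - x) * x              # g(s(x; theta), x) with s = 2*theta - x

payoff = g(X[:, None], theta[None, :])     # one row per action, one column per draw
best = g(theta, theta)                     # max_x g(s(x; theta), x) is attained at x = theta
avg = payoff.mean(axis=1)                  # average-performance criterion
cr = (payoff / best[None, :]).mean(axis=1) # competitive-ratio criterion
print("argmax Avg:", X[np.argmax(avg)], " argmax CR:", X[np.argmax(cr)])
```

With a posterior centered at 1, both criteria select an action near the posterior mean, consistent with the closed-form reasoning of Remark 5.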
3. Monotone Approximation
We now consider the approximation of a twice continuously differentiable real-valued component function f. By M1 and M2, f satisfies the monotonicity constraint

$$\min\{\kappa_1 f^{(1)}(x),\, \kappa_2 f^{(2)}(x)\} \ge 0 \qquad (12)$$

for a given κ = (κ₁, κ₂). The key idea is to select a family of approximating functions s(·; θ), so that the monotonicity constraint (12) can be transformed into a parameter constraint such that

$$\min\{\kappa_1 s^{(1)}(x;\theta),\, \kappa_2 s^{(2)}(x;\theta)\} \ge 0 \;\Longleftrightarrow\; \theta \in \Theta_\kappa, \qquad (13)$$

for an appropriate monotonicity domain Θ_κ ⊆ Θ. The latter is determined explicitly in Corollary 1 below.
3.1. Approximating Functions
The set of approximating functions is the linear space 𝒮_k(ξ) of splines of degree k ≥ 1, defined on the interval [a, b] with an ordered vector of knots ξ ∈ Ξ[a, b], where

$$\Xi[a,b] = \bigl\{\xi = (\xi_0, \ldots, \xi_{m+1}) \in \mathbb{R}^{m+2} : a = \xi_0 < \xi_1 < \cdots < \xi_{m+1} = b \text{ and } m \in \mathbb{N}\bigr\}.$$

Let ξ ∈ Ξ[a, b] and m = dim(ξ) − 2. A spline s ∈ 𝒮_k(ξ) is a function s: [a, b] → ℝ that satisfies the following two conditions.

S1. For all i ∈ {0, ..., m}: s|_{[ξ_i, ξ_{i+1}]} ∈ 𝒫_k, i.e., the restriction of the function s to the interval [ξ_i, ξ_{i+1}] is a polynomial of degree k;

S2. s ∈ 𝒞^{k−1}[a, b], i.e., the function s and its derivatives up to order k − 1 are continuous on [a, b].

Remark 7. (i) The components of the knot vector ξ do not have to be strictly increasing if condition S2 is amended as follows: if ξ_i = ⋯ = ξ_{i+l} for some i, l with l ≤ k, then s is required only to have continuous derivatives up to order k − l − 1 at the point ξ_i. (ii) It is possible to use splines of degree k = 0, i.e., piecewise-constant functions, by dropping condition S2 altogether and replacing the closed intervals [ξ_i, ξ_{i+1}] in condition S1 by the right-open intervals [ξ_i, ξ_{i+1}). The right endpoint s(b) is then either undefined or specified separately (see, e.g., Dierckx 1993).

For any i ∈ {−k, ..., m}, let N_{i,k+1}: [a, b] → ℝ denote a (normalized) B-spline of degree k with knots (ξ_i, ..., ξ_{i+k+1}), where we assume that17

$$\xi_{-k} = \xi_{-k+1} = \cdots = \xi_{-1} = \xi_0 = a \quad\text{and}\quad b = \xi_{m+1} = \xi_{m+2} = \cdots = \xi_{m+k} = \xi_{m+k+1}.$$

It can be shown that every spline s ∈ 𝒮_k(ξ) has a unique B-spline representation as a linear combination of B-splines, in the form

$$s(x; \theta) = N_{k+1}(x)\,\theta \quad\text{for all } x \in [a, b], \qquad (14)$$
where

$$\theta = (\theta_{-k}, \ldots, \theta_m)^{\mathsf T} \in \mathbb{R}^{k+m+1}$$

is the B-spline-coefficient vector, and

$$N_{k+1} = (N_{-k,k+1}, \ldots, N_{m,k+1})$$

is a row vector of k + m + 1 B-splines, forming a basis for the spline space 𝒮_k(ξ).

Remark 8. For our developments below it proves useful to introduce an equivalent representation of s in the parameters θ̂ = L_{k+1}^{-1}θ,

$$s(x; \hat\theta) = \hat N_{k+1}(x)\,\hat\theta = N_{k+1}(x)\,L_{k+1}L_{k+1}^{-1}\theta = s(x; \theta), \qquad (15)$$

where N̂_{k+1} = N_{k+1}L_{k+1}, and L_{k+1} is a (k+m+1) × (k+m+1) lower-triangular matrix with ones as its only nonzero entries, so that

$$L_{k+1}^{-1} = \begin{bmatrix} 1 & & & & 0\\ -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ 0 & & & -1 & 1 \end{bmatrix}. \qquad (16)$$

We recall that splines of any positive order can be used to approximate continuous functions on a compact interval in the sup-norm.

Lemma 1 (Denseness of Splines in 𝒞⁰[a, b]).18 Let k ≥ 0. For any given f ∈ 𝒞⁰[a, b] and ε > 0, there exist ξ ∈ Ξ[a, b] and s ∈ 𝒮_k(ξ) such that sup_{x∈[a,b]} |f(x) − s(x)| < ε.
3.2. Monotonicity Domain
Given an element of the spline space 𝒮_k(ξ) in the representation (14) or (15), we seek to convert the monotonicity constraint (12) (i.e., M1 and M2) into an equivalent parameter constraint of the form θ ∈ Θ_κ, where Θ_κ is an appropriate monotonicity domain.

The first derivative of the ordered basis N_{k+1} can be written, using standard properties of B-spline functions (cf. Appendix B),19 in the form

$$N'_{k+1} = [0,\, N_k]\,\Delta_{k+1}, \qquad (17)$$

where N_k = (N_{-k+1,k}, ..., N_{m,k}) forms an ordered basis for 𝒮_{k−1}(ξ), defined over the same set of knots as N_{k+1}, Δ_{k+1} = D_{k+1}^{-1}L_{k+1}^{-1}, and

$$D_{k+1} = \frac{1}{k}\,\operatorname{diag}\bigl(\xi_0 - \xi_{-k},\; \xi_1 - \xi_{-k+1},\; \ldots,\; \xi_{k+m} - \xi_m\bigr).$$

Similarly, the second derivative of the ordered basis N_{k+1} can be expressed in the form

$$N''_{k+1} = [0,\, 0,\, N_{k-1}]\begin{bmatrix} 0 & 0\\ 0 & \Delta_k \end{bmatrix}\Delta_{k+1}. \qquad (18)$$

In terms of the representation (15), the first and second derivatives of s(·; θ̂) are therefore

$$s^{(1)}(x; \hat\theta) = N^{(1)}_{k+1}(x)\,\hat\theta \qquad (19)$$

and

$$s^{(2)}(x; \hat\theta) = N^{(2)}_{k+1}(x)\,\hat\theta \qquad (20)$$

for all x ∈ [a, b], where

$$N^{(1)}_{k+1} = [0,\, N_k]\,D_{k+1}^{-1} \quad\text{and}\quad N^{(2)}_{k+1} = [0,\, 0,\, N_{k-1}]\,D_k^{-1}L_k^{-1}D_{k+1}^{-1}.$$
Proposition 1 (First-Order Monotonicity). Let k ≥ 0, ξ ∈ Ξ[a, b], and let s ∈ 𝒮_k(ξ) be given in the B-spline representation (15), s = N̂_{k+1}θ̂, for some θ̂ = (θ̂_{−k}, θ̂_{−k+1}, ..., θ̂_m)ᵀ ∈ ℝ^{k+m+1}. Then

(i) s is nondecreasing on [a, b] if

$$\hat\theta_{-} = (\hat\theta_{-k+1}, \ldots, \hat\theta_m)^{\mathsf T} \ge 0; \qquad (21)$$

(ii) if k ∈ {0, 1, 2} and s is nondecreasing on [a, b], then condition (21) must hold.

Part (i) states that the first-order monotonicity constraint M1 is implied by the linear constraint (21). Part (ii) guarantees that for splines of at most second degree this condition is also necessary.

To ensure that the second derivative of a spline s ∈ 𝒮_k(ξ) is nonnegative, it suffices to have I_{k,0}Δ̄_kθ̂₋ ≥ 0, where

$$\bar\Delta_k = \bigl[\operatorname{diag}(\xi_1 - \xi_{-k+1}, \ldots, \xi_{k+m} - \xi_m)\, L_k\bigr]^{-1} \qquad (22)$$

and I_{k,0} is the (k+m) × (k+m) identity matrix whose first element is zero.

Proposition 2 (Second-Order Monotonicity). Let k ≥ 1, ξ ∈ Ξ[a, b], and let s ∈ 𝒮_k(ξ) be given in the B-spline representation (15), s = N̂_{k+1}θ̂, for some θ̂ = (θ̂_{−k}, θ̂_{−k+1}, ..., θ̂_m)ᵀ ∈ ℝ^{k+m+1}. Then

(i) s^{(1)} is nondecreasing on [a, b] if

$$I_{k,0}\,\bar\Delta_k\,\hat\theta_{-} \ge 0; \qquad (23)$$

(ii) if k ∈ {1, 2, 3} and s^{(1)} is nondecreasing on [a, b], then condition (23) must hold.

Condition (23) is equivalent to

$$\frac{\hat\theta_{-k+1}}{\xi_1 - \xi_{-k+1}} \le \frac{\hat\theta_{-k+2}}{\xi_2 - \xi_{-k+2}} \le \cdots \le \frac{\hat\theta_m}{\xi_{k+m} - \xi_m}.$$
Remark 9. Linear inequality conditions (similar to the ones in Propositions 1 and 2) on the B-spline coefficients θ̂ of a spline s ∈ 𝒮_k(ξ) can be obtained to guarantee that a higher-order derivative s^{(r)}, for some r ∈ {1, ..., k}, is nonnegative-valued a.e. on [a, b].

Corollary 1 (Monotonicity Domain). Let k ∈ {1, 2}, ξ ∈ Ξ[a, b], and let s ∈ 𝒮_k(ξ) be given in the B-spline representation (14), s = N_{k+1}θ, for some θ ∈ ℝ^{k+m+1}. For any κ = (κ₁, κ₂) it is

$$\min\{\kappa_1 s^{(1)}(x;\theta),\, \kappa_2 s^{(2)}(x;\theta)\} \ge 0 \;\Longleftrightarrow\; \theta \in \Theta_\kappa = \{\vartheta \in \mathbb{R}^{k+m+1} : M_\kappa\vartheta \ge 0\},$$

where

$$M_\kappa = \begin{bmatrix} \kappa_1\,[0,\; I_k] \\ \kappa_2\,[0,\; I_{k,0}\bar\Delta_k] \end{bmatrix} L_{k+1}^{-1} \in \mathbb{R}^{2(k+m)\times(k+m+1)}$$

and I_k denotes the (k+m) × (k+m) identity matrix.

The last result provides a characterization of the monotonicity domain Θ_κ, combining the insights from the first two propositions. It gives necessary and sufficient parameter restrictions for the corresponding approximating functions to satisfy the monotonicity constraints M1 and M2.20
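For k ∈ {1, 2}, the first-order block of M_κ reduces to a sign condition on the first differences θ̂ = L_{k+1}^{-1}θ of the B-spline coefficients (condition (21)). A minimal membership test, with illustrative dimensions matching the k = 2, m = 4 example used later:

```python
import numpy as np

p = 7                                          # k + m + 1 coefficients (k = 2, m = 4)
L_inv = np.eye(p) - np.eye(p, k=-1)            # the first-difference matrix L^{-1} of (16)

def satisfies_m1(theta, kappa1):
    """Sufficient condition (21): kappa_1 times every first difference is >= 0."""
    theta_hat = L_inv @ theta                  # theta_hat = L^{-1} theta
    return bool(np.all(kappa1 * theta_hat[1:] >= 0))   # theta_hat_{-k} is unconstrained

theta = np.array([1.0, 0.8, 0.5, 0.45, 0.3, 0.2, 0.05])   # decreasing coefficients
print(satisfies_m1(theta, kappa1=-1))          # True: consistent with kappa = (-1, 0)
```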
3.3. Robust Approximation Sets
Given a data sample z = {(x_j, y_j)}_{j=1}^n (of size n ≥ 3) and an allowable approximation error ε, we consider the subset

$$\hat\Theta_\kappa^\varepsilon(z) = \bigl\{\theta \in \Theta_\kappa : e(\theta, z) \le \varepsilon,\; s(\cdot;\theta) \in \mathcal{S}_k(\xi)\bigr\} \qquad (24)$$

of the monotonicity domain Θ_κ, which we term a robust approximation set. This set contains all parameter vectors θ that are compatible with z and ε. Relative to the available data and the allowable approximation error, candidates for robust decisions (i.e., decisions that maximize a robustness criterion as introduced in §2.3) are judged against all parameters in Θ_κ (cf. Equations (8)–(10)).

Proposition 3 (Boundedness). Let k ∈ {1, 2}. For any κ ≠ 0 and nonnegative ε, the robust approximation set Θ̂_κ^ε(z) is a bounded polyhedron.

The last result ensures that no matter what the approximation error ε or the monotonicity constraints on the component function (in terms of κ ≠ 0), the parameter set of interest is a compact subset of a finite-dimensional Euclidean space, allowing the use of standard numerical algorithms when solving the robust decision problem. In the case where κ = 0, e.g., when no structural knowledge about the monotonicity of the component function is available, the robust approximation set Θ̂_κ^ε(z) may become unbounded.

Let ε̲_κ(z) denote the smallest error ε such that Θ̂_κ^ε(z) is nonempty, i.e.,

$$\underline\varepsilon_\kappa(z) = \inf\bigl\{\varepsilon \ge 0 : \hat\Theta_\kappa^\varepsilon(z) \ne \emptyset\bigr\}. \qquad (25)$$

If ε̲_κ(z) = 0, then the approximation error is zero, and interpolation of the sample data is possible.21
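Because both the deviation bound and the monotonicity constraints are linear in θ, the minimum achievable error ε̲_κ(z) in (25) can be computed by a single linear program in the variables (θ, ε). The sketch below does this for the decreasing case κ = (−1, 0) with unit weights, using the Segment-2 data and the knot vector that appear in §5; these numerical choices are illustrative.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.interpolate import BSpline

k = 2
xi = np.array([0, 0, 0, 0.2, 0.4, 0.6, 0.8, 1, 1, 1.0])   # clamped knots
p = len(xi) - k - 1
x_d = np.array([0.0, 0.49, 0.6, 1.0])
y_d = np.array([1.0, 0.0996, 0.0743, 0.0457])

N_bar = np.column_stack([BSpline(xi, np.eye(p)[i], k)(x_d) for i in range(p)])
c = np.r_[np.zeros(p), 1.0]                    # variables (theta, eps); minimize eps
rows, rhs = [], []
for sign in (+1.0, -1.0):                      # sign*(N_bar theta - y) <= eps
    rows.append(np.c_[sign * N_bar, -np.ones(len(x_d))])
    rhs.append(sign * y_d)
mono = np.c_[np.eye(p, k=1)[: p - 1] - np.eye(p)[: p - 1], np.zeros(p - 1)]
rows.append(mono)                              # decreasing: theta_{i+1} - theta_i <= 0
rhs.append(np.zeros(p - 1))
res = linprog(c, A_ub=np.vstack(rows), b_ub=np.concatenate(rhs),
              bounds=[(None, None)] * p + [(0, None)])
print("eps_underbar =", res.fun)               # 0.0 here: interpolation is possible
```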
To determine if for a given ε the robust approximation set is empty or not, we formulate the following "sample monotonicity conditions" in analogy to the original monotonicity conditions M1 and M2.

M̄1. For a given real constant κ₁, the sample z satisfies a "first-order divided-difference condition" of the form

$$\kappa_1\,\delta_j^1(z) \equiv \kappa_1\,\frac{y_{j+1} - y_j}{x_{j+1} - x_j} \ge 0 \qquad (26)$$

for all j ∈ {1, ..., n−1}.

M̄2. For a given real constant κ₂, the sample z satisfies a "second-order divided-difference condition" of the form

$$\kappa_2\,\delta_j^2(z) \equiv \kappa_2\Bigl(\frac{y_{j+1} - y_j}{x_{j+1} - x_j} - \frac{y_j - y_{j-1}}{x_j - x_{j-1}}\Bigr) \ge 0 \qquad (27)$$

for all j ∈ {2, ..., n−1}.

In general, there is no reason to believe that the (noisy) data points in z always satisfy the sample monotonicity conditions in Equations (26)–(27). As long as the approximation error is allowed to be positive, there is no need to satisfy them strictly. The following ε-relaxations, M̄1_ε and M̄2_ε, determine if the sample z can be "adjusted" to a sample ẑ = {(x_i, ŷ_i)}_{i=1}^n such that the adjusted sample ẑ satisfies M̄1 and M̄2, and at the same time the distance between z and ẑ (in the sup-norm) does not exceed ε. Each one of these can be conveniently formulated in terms of a linear program (LP).

M̄1_ε. The value of the LP

$$\min_{(t,\hat y)}\;\sum_{l=1}^{n} t_l \quad\text{s.t.}\quad \begin{cases} -\kappa_1(\hat y_{j+1} - \hat y_j) \le t_j, & 1 \le j \le n-1,\\ 0 \le t_i,\;\; -\varepsilon \le w_i(\hat y_i - y_i) \le \varepsilon, & 1 \le i \le n, \end{cases}$$

is zero, given the constants ε ≥ 0 and κ₁ ∈ {−1, 0, 1}.

M̄2_ε. The value of the LP

$$\min_{(t,\hat y)}\;\sum_{l=1}^{n} t_l \quad\text{s.t.}\quad \begin{cases} -\kappa_2\Bigl(\dfrac{\hat y_{j+1} - \hat y_j}{x_{j+1} - x_j} - \dfrac{\hat y_j - \hat y_{j-1}}{x_j - x_{j-1}}\Bigr) \le t_j, & 2 \le j \le n-1,\\ 0 \le t_i,\;\; -\varepsilon \le w_i(\hat y_i - y_i) \le \varepsilon, & 1 \le i \le n, \end{cases}$$

is zero, given the constants ε ≥ 0 and κ₂ ∈ {−1, 0, 1}.

We say that the sample z satisfies the ε-relaxation of the sample monotonicity condition M̄1 or M̄2 if the optimal value of the corresponding LP vanishes. The sample z satisfies both ε-relaxations jointly if the LPs in M̄1_ε and M̄2_ε admit a common solution (t; ŷ) = (t₁, ..., tₙ; ŷ₁, ..., ŷₙ) and their respective optimal values are both zero. It is now possible to use these conditions to guarantee that the monotone approximation set Θ̂_κ^ε(z) is nonempty.
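The LP in M̄1_ε translates directly into code; in the sketch below the decision variables are (t₁, ..., tₙ, ŷ₁, ..., ŷₙ), and the sample is the Segment-2 response-rate data, for which the optimal value is zero even at ε = 0 because the observed responses are already decreasing (κ₁ = −1).

```python
import numpy as np
from scipy.optimize import linprog

def m1_eps_value(y, w, kappa1, eps):
    """Optimal value of the LP in M1_eps; zero iff the eps-relaxation holds."""
    n = len(y)
    c = np.r_[np.ones(n), np.zeros(n)]       # minimize sum_l t_l
    A = np.zeros((n - 1, 2 * n))             # -kappa1*(yh_{j+1} - yh_j) <= t_j
    for j in range(n - 1):
        A[j, j] = -1.0                       # ... - t_j
        A[j, n + j] = kappa1                 # ... + kappa1 * yh_j
        A[j, n + j + 1] = -kappa1            # ... - kappa1 * yh_{j+1}
    bounds = [(0, None)] * n + [(yj - eps / wj, yj + eps / wj)
                                for yj, wj in zip(y, w)]
    return linprog(c, A_ub=A, b_ub=np.zeros(n - 1), bounds=bounds).fun

y = np.array([1.0, 0.0996, 0.0743, 0.0457])  # Segment-2 response rates
print(m1_eps_value(y, np.ones(4), kappa1=-1.0, eps=0.0))   # 0.0
```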
Proposition 4. Let κ₁ ∈ {−1, 0, 1} and ε ≥ 0 be given. If the sample z satisfies the relaxed sample monotonicity condition M̄1_ε, then there exists ξ ∈ Ξ[a, b] such that Θ̂_κ^ε(z) is nonempty for κ = (κ₁, 0).

Remark 10. (i) Proposition 4 generalizes the main finding in Pruess (1993) for monotone spline interpolation from k = 3 to k ≥ 1. (ii) The proof of our result is constructive in the sense that it provides a simple condition N1 for the knot vector ξ, which can be used directly for a numerical implementation.

Proposition 5. Let κ ∈ {−1, 0, 1}² and ε ≥ 0 be given. If the sample z satisfies the relaxed sample monotonicity conditions M̄1_ε and M̄2_ε jointly, then there exists ξ ∈ Ξ[a, b] such that Θ̂_κ^ε(z) is nonempty.

Remark 11. As for Proposition 4, the proof of Proposition 5 is constructive, and provides an algorithm for finding an appropriate knot vector, in terms of conditions N1 and N2, which together are less restrictive than condition N1 mentioned in Remark 10.
4. Information for Robust Decisions

As pointed out in §2.3, conditional on observing the data sample z = {(x_j, y_j)}_{j=1}^n, an optimal robust decision x*(z), together with an optimal approximation error ε̂*(z) (cf. Remark 4), can be obtained by solving the nested pair of optimization problems (8)–(10). The robustness criterion generally depends on the decision maker's beliefs about the realization of the parameter vector θ. We now investigate two important issues: the belief update conditional on sample data, and the acquisition of new information by designing an optimal experiment.
4.1. Belief Update
The decision maker's prior beliefs about the distribution of the parameter vector θ on the monotonicity domain Θ_κ, given by the cdf F(θ), can be updated using the data sample z. In Remark 12 below we discuss how such beliefs can be obtained in practice. Let the updated beliefs be represented by the cdf F(θ | z). Furthermore, for any θ ∈ Θ_κ, let h(z | θ) be the probability density for the realization of the random data sample z conditional on the parameter vector θ. Because the y_j's are independent given θ, by Remark 3 we have

$$h(z \mid \theta) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\Bigl(-\frac{(y_j - s(x_j;\theta))^2}{2\sigma_j^2}\Bigr).$$

Given F(θ) and h(z | θ), the updated beliefs are obtained by Bayes' rule,

$$dF(\theta \mid z) = \frac{h(z\mid\theta)\,dF(\theta)}{\int_{\Theta_\kappa} h(z\mid\vartheta)\,dF(\vartheta)}. \qquad (28)$$

Remark 12. In practice, the question might arise as to which prior beliefs F(θ) might plausibly be used for determining the posterior distribution in Equation (28). We outline three (in part implicit) approaches.

(i) Uniform prior on Θ_κ. Without any available information, a decision maker might simply assume the (entropy-maximizing) uniform distribution over the monotonicity domain Θ_κ. The updated beliefs F̂(θ | z) over the robust approximation set Θ̂_κ^ε(z) ⊂ Θ_κ are given by

$$\hat F(\theta \mid z) = \mathbf{1}_{\{\theta\in\hat\Theta_\kappa^\varepsilon(z)\}}\;\frac{F(\theta\mid z)}{P(\tilde\theta \in \hat\Theta_\kappa^\varepsilon(z))}. \qquad (29)$$

(ii) Uniform posterior on iso-error sets. If no other information about the structural properties of the component function is available, the decision maker may want to consider all parameter realizations θ that produce a given approximation error ε, i.e., are such that

$$\theta \in \Lambda_\kappa^\varepsilon(z) \equiv \{\vartheta \in \Theta_\kappa : e(\vartheta, z) = \varepsilon\},$$

as equally likely. Therefore, the updated beliefs can be written as

$$dF(\theta\mid z) = \lambda(\theta \mid \varepsilon, z)\,dA(\varepsilon\mid z), \qquad (30)$$

where λ(· | ε, z) is the standard Lebesgue measure over Λ_κ^ε(z),22 and for ε ≥ 0:

$$A(\varepsilon\mid z) = P\Bigl(\max_{j\in\{1,\ldots,n\}} w_j\,|y_j - s(x_j;\tilde\theta)| \le \varepsilon \Bigm| z\Bigr).$$

Assuming that f ∈ 𝒮_k(ξ) (motivated by Lemma 1), the error probability A can be derived from the probability distribution function of ε̃_j = y_j − s(x_j; θ̃). By Remark 3 the independent error terms ε̃_j are normally distributed with mean zero and variance σ_j². Thus, A(ε | z) can be written in the form

$$A(\varepsilon\mid z) = P\Bigl(\,\bigcap_{j=1}^{n}\bigl\{w_j^2\,(y_j - s(x_j;\tilde\theta))^2 \le \varepsilon^2\bigr\} \Bigm| z\Bigr).$$

Note that each w_j²(y_j − s(x_j; θ̃))² in fact follows a scaled χ₁² distribution with mean w_j²σ_j² and variance 2w_j⁴σ_j⁴. The beliefs F̂(θ | z) over Θ̂_κ^ε(z) can be obtained from Equation (29) as in (i).

(iii) Uniform posterior on iso-difference sets. Similar to the last method, the decision maker may consider all parameter realizations θ satisfying

$$\theta \in \hat\Lambda_\kappa^\Delta(z) = \{\vartheta \in \Theta_\kappa : s(x_j;\vartheta) - y_j = \Delta_j,\; 1 \le j \le n\},$$

i.e., located at the same position relative to the sample z, as equally likely. The updated beliefs are then

$$dF(\theta\mid z) = \lambda(\theta\mid\Delta, z)\,dA(\Delta\mid z),$$

where the measure λ is as in (ii), and

$$A(\Delta\mid z) = \prod_{j=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\Bigl(-\frac{\Delta_j^2}{2\sigma_j^2}\Bigr).$$

The beliefs F̂(θ | z) over Θ̂_κ^ε(z) are obtained from Equation (29) as in (i).
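A Monte Carlo sketch of the update (28) for the setting of §5: candidate coefficient vectors are drawn inside the monotonicity domain (decreasing coefficients with s(0; θ) = 1) and weighted by the Gaussian likelihood h(z | θ). The sampler, knot vector, and standard deviations are illustrative assumptions, not the paper's procedure.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(1)
k = 2
xi = np.array([0, 0, 0, 0.2, 0.4, 0.6, 0.8, 1, 1, 1.0])   # clamped knots
x = np.array([0.0, 0.49, 0.6, 1.0])
y = np.array([1.0, 0.0996, 0.0743, 0.0457])
sigma = 0.05 * np.ones(4)                  # assumed sampling standard deviations

thetas, logw = [], []
for _ in range(500):
    # Decreasing coefficients with theta_0 = 1 lie in Theta_kappa for kappa = (-1, 0).
    th = np.r_[1.0, np.sort(rng.uniform(0.0, 1.0, 6))[::-1]]
    resid = (BSpline(xi, th, k)(x) - y) / sigma
    thetas.append(th)
    logw.append(-0.5 * np.sum(resid**2))   # log h(z | theta) up to a constant
w = np.exp(np.array(logw) - np.max(logw))
w /= w.sum()                               # normalized posterior weights
print("posterior-mean coefficients:",
      np.round((w[:, None] * np.array(thetas)).sum(axis=0), 3))
```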
4.2. Information Acquisition
Let us now consider the question of how to best improve the quality of the optimal decision by selecting, from all available experiments, one of maximum value. For this we consider a family of available experiments ℛ = {r̃_α}_{α∈A}, indexed by α in the compact set A ⊂ [0, 1]. Each experiment r̃_α is a random variable with realizations in the (nonempty, measurable) sample space R. Conditional on a true parameter vector θ ∈ Θ_κ, it is distributed with the conditional cdf F(r | α, θ). Let X⁰_ε(z) be the set of "default" decisions maximizing a robustness criterion ℛ (cf. §2.3), so that by Equation (10)

$$x^0_\varepsilon \in X^0_\varepsilon(z) \;\Rightarrow\; \max_{x\in X}\mathcal{R}\bigl(g(s(x;\theta),x)\mid\theta\in\hat\Theta_\kappa^\varepsilon(z), z\bigr) = \mathcal{R}\bigl(g(s(x^0_\varepsilon;\theta),x^0_\varepsilon)\mid\theta\in\hat\Theta_\kappa^\varepsilon(z), z\bigr) \equiv \mathcal{R}^{0*}_\varepsilon(z);$$

and let X^α_ε(r, z) be the set of "contingent" decisions maximizing the robustness criterion conditional on having observed the realization r of the signal r̃_α, so that

$$x^\alpha_\varepsilon(r) \in X^\alpha_\varepsilon(r,z) \;\Rightarrow\; \mathcal{R}^{\alpha*}_\varepsilon(r,z) = \mathcal{R}\bigl(g(s(x^\alpha_\varepsilon(r);\theta),x^\alpha_\varepsilon(r))\mid\theta\in\hat\Theta_\kappa^\varepsilon(z), z, \tilde r_\alpha = r\bigr),$$

where

$$\mathcal{R}^{\alpha*}_\varepsilon(r,z) \equiv \max_{x\in X}\mathcal{R}\bigl(g(s(x;\theta),x)\mid\theta\in\hat\Theta_\kappa^\varepsilon(z), z, \tilde r_\alpha = r\bigr)$$

for all r ∈ R. From Equations (8) and (9) we obtain the corresponding optimal values of the robustness criterion,

$$\mathcal{R}^{0*}(z) = \max_{x\in X^0(z)}\mathcal{R}\bigl(g(s(x;\theta),x)\mid\theta\in\Theta_\kappa\bigr), \qquad \mathcal{R}^{\alpha*}(r,z) = \max_{x\in X^\alpha(r,z)}\mathcal{R}\bigl(g(s(x;\theta),x)\mid\theta\in\Theta_\kappa\bigr),$$

where

$$X^0(z) = \bigcup_{\varepsilon\in[\underline\varepsilon,\bar\varepsilon]} X^0_\varepsilon(z) \quad\text{and}\quad X^\alpha(r,z) = \bigcup_{\varepsilon\in[\underline\varepsilon,\bar\varepsilon]} X^\alpha_\varepsilon(r,z).$$

Remark 4 yields the optimal approximation errors ε̂*_α(r, z) and ε̂*₀(z) corresponding to the contingent and default solutions X^α(r, z) and X⁰(z). The expected value of the experiment r̃_α is then

$$V(\alpha\mid z) = \int_R \bigl[\mathcal{R}^{\alpha*}_{\hat\varepsilon^*_\alpha(r,z)}(r,z) - \mathcal{R}^{0*}_{\hat\varepsilon^*_0(z)}(z)\bigr]\,d\bar F(r\mid\alpha, z),$$

where F̄(r | α, z) is the cdf for the realizations of the signal r̃_α, obtained by

$$\bar F(r\mid\alpha,z) = \int_{\Theta_\kappa} F(r\mid\alpha,\theta,z)\,dF(\theta\mid z) = \int_{\Theta_\kappa} F(r\mid\alpha,\theta)\,dF(\theta\mid z),$$

with the conditional cdf F(θ | z) as in §4.1. An optimal experiment maximizes the value of information over all available experiments. Correspondingly, we refer to finding

$$\alpha^{*}(z) \in \arg\max_{\alpha\in A} V(\alpha\mid z)$$

as the decision maker's (robust) information acquisition problem. The solution to this problem (which always exists as long as V(α | z) is continuous in α, since A is compact) defines an optimal experiment in terms of its expected improvement to the robustness criterion.
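The value of an experiment can be estimated by preposterior Monte Carlo simulation. The sketch below does this in the toy model of Remark 5, with average performance as the robustness criterion and a single experiment at a fixed α returning a noisy observation of s(α; θ); all distributional choices (posterior draws, noise level) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = rng.uniform(0.8, 1.2, 3000)       # draws representing F(theta | z)
X = np.linspace(0.0, 2.0, 201)            # feasible actions
g = lambda x, th: (2.0 * th - x) * x      # objective from Remark 5

def best_avg(weights):
    """Action maximizing weighted-average performance, and its value."""
    avg = (g(X[:, None], theta[None, :]) * weights[None, :]).sum(axis=1)
    return X[np.argmax(avg)], avg.max()

x0, v0 = best_avg(np.full(theta.size, 1.0 / theta.size))  # default, no experiment
alpha, noise, V = 1.0, 0.05, 0.0
for th_true in rng.choice(theta, size=200):   # outer expectation over signals
    r = (2.0 * th_true - alpha) + noise * rng.standard_normal()
    like = np.exp(-0.5 * ((2.0 * theta - alpha) - r) ** 2 / noise**2)
    V += (best_avg(like / like.sum())[1] - v0) / 200.0
print(f"default action {x0:.2f}, estimated value of the experiment V = {V:.4f}")
```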
5. Application: Optimal Debt-Settlement Offers
In October 2009, the Federal Reserve Statistical Release
(G.19) reported the 2008 U.S. total consumer credit outstanding in excess of 2.5 trillion dollars, of which about
40% is revolving debt such as unpaid credit-card balances.
The debt-collection sector of the credit-card industry, which
is concerned with the retrieval of outstanding loans from
“delinquent” accounts (with an outstanding repayment significantly past the agreed due dates), can be viewed as a
multibillion dollar industry in and of itself. A credit-card
issuer’s debt-collection process typically consists of different collection phases that vary among the various account
segments (e.g., according to the holder’s FICO score, the
outstanding balance, the account age, or business versus
personal use). A common feature across all segments and
collection phases is that a large fraction (typically on the order of 40%–50%) of all delinquent accounts fail to repay
any of their debt. Moreover, a delinquent account holder’s
assets tend to “decay” due to financial distress, bankruptcy
costs, and claims by other lenders, which implies declining prospects for collecting the outstanding amount as time
passes. Hence, credit-card issuers increasingly look favorably upon the option of extending early settlement offers
to delinquent account holders. The design of optimal settlement offers can be interpreted as a revenue management
problem, in which a bank sets the settlement rate so as
to maximize the product of collected amount (corresponding to rate times outstanding balance) and a response rate
(describing the likelihood of payback).
Optimization Model. We consider two segments of
accounts. Each segment is modeled in the aggregate with
the total outstanding balance and hidden characteristics
such as willingness and ability to pay off the balance.
These segment-specific characteristics are summarized by a "type" vector θ, which from the bank's point of view is distributed on an appropriate type space Θ. The type vector describes how a segment's average response rate, f(x) ≈ s(x; θ) ∈ [0, 1], decreases in the settlement rate x ∈ [0, 1], where s(0; θ) = 1. For a given data sample z = {(x_j, y_j)}_{j=1}^n and an allowable approximation error ε, the relevant set of types is Θ̂_κ^ε(z) ⊂ Θ_κ, where κ = (−1, 0). The bank's goal is to maximize its aggregate payoff g(s(x; θ), x) = x·s(x; θ), using an appropriate robustness criterion (cf. §2.3).
The Data Set. Data obtained from a major credit-card company, covering the time between 2005 and 2007,
includes full financial information for different types of
credit-card accounts, segmented by attributes such as FICO
score, line of credit, outstanding balance, or mortgage status. We focus on two segments of a charge-card product
consisting of about 12,000 and 33,000 similar accounts,23
respectively, each subdivided into three smaller groups. In
the settlement-offer experiment two groups were offered a
reduced settlement rate, and the remaining group served
Table 1. Summary statistics as a function of settlement rate: (a) sample size; (b) response rate.

(a) Sample size
Settlement rate    0.49    0.60    1
Segment 1          2320    1964    8244
Segment 2          5561    5032    22710

(b) Response rate
Settlement rate    0.49    0.60    1
Segment 1 (%)      4.74    4.36    2.87
Segment 2 (%)      9.96    7.43    4.57
as control. The response of an account holder in a given
group is considered positive if the demanded portion of the
outstanding balance is fully paid in a timely manner. Averaging the individual responses in each group (cf. Remark 3)
yields the response rates in Table 1(b). Due to the costliness of settlement-offer trials, and a potentially negative
reputation effect, the available experiments are sparse in X,
with only n = 4 data points available.
Numerical Analysis. The response-rate samples in Table 1 satisfy M̄1_ε and M̄2_ε for κ = (−1, 0), so that by Proposition 5 there exists a knot vector ξ such that the robust approximation set Θ̂_κ^ε(z) is nonempty for any ε ≥ 0. Indeed, for k = 2, the vector ξ = (0, 0.2, 0.4, 0.6, 0.8, 1) guarantees that ε̲_κ(z) = 0 in Equation (25), i.e., interpolation is possible. Moreover, by Proposition 3 the set Θ̂_κ^ε(z) is a bounded polyhedron, described by the following set of linear inequalities:

$$|W(\bar N_{k+1}\theta - Y)| \le \varepsilon\,\mathbf{1}, \qquad M_\kappa\theta \ge 0,$$

where N̄_{k+1} and M_κ are specified in the proof of Proposition 3 and Remark 9, respectively, Y = (y₁, ..., yₙ)ᵀ is a data vector, and W = diag(w₁, ..., wₙ) is a diagonal matrix of positive weights.24 For any ε ≥ 0, the elements of Θ̂_κ^ε(z) can be obtained as the convex hull of its vertices (or "vertex solutions"). Figure 4 depicts the vertex solutions as well as their associated response and spin curves for the two account segments, when ε = 0. Figure 5 shows both the response-rate and spin distributions, based on a uniform posterior on iso-difference sets (cf. Remark 12(iii)), inside a 99.9% confidence region (corresponding to ε̄).
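A sketch of how vertex solutions of this polyhedron (cf. Figure 4) can be generated numerically: any linear program over the polyhedron attains its optimum at a vertex, so optimizing random linear objectives traces out vertex solutions. The construction uses the Segment-2 data with ε = 0; this enumeration-by-random-objectives device is an illustration, not necessarily the paper's exact procedure.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.interpolate import BSpline

k = 2
xi = np.array([0, 0, 0, 0.2, 0.4, 0.6, 0.8, 1, 1, 1.0])
p = len(xi) - k - 1                                 # 7 coefficients
x_d = np.array([0.0, 0.49, 0.6, 1.0])               # Segment-2 data
y_d = np.array([1.0, 0.0996, 0.0743, 0.0457])
eps = 0.0                                           # interpolation is feasible here

# Collocation matrix N_bar of (A1): entry (j, i) = N_{i,k+1}(x_j).
N_bar = np.column_stack([BSpline(xi, np.eye(p)[i], k)(x_d) for i in range(p)])
# Band: N_bar theta <= y + eps, -N_bar theta <= -(y - eps);
# monotonicity for kappa = (-1, 0): theta_{i+1} - theta_i <= 0.
D = np.eye(p, k=1)[: p - 1] - np.eye(p)[: p - 1]
A_ub = np.vstack([N_bar, -N_bar, D])
b_ub = np.r_[y_d + eps, -(y_d - eps), np.zeros(p - 1)]

rng = np.random.default_rng(4)
vertices = []
for _ in range(20):                                 # random objective directions
    res = linprog(rng.standard_normal(p), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, 1)] * p)              # response rates lie in [0, 1]
    if res.success:
        vertices.append(np.round(res.x, 6))
print("distinct vertex solutions found:", len(np.unique(vertices, axis=0)))
```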
Optimal Robust Decisions. We choose ε̄ ∈ {3%, 4.2%} so as to provide 99.9% confidence (with the smaller of the two values for Segment 2; cf. Figure 1(b)), and then solve Equations (8)–(10) to determine an optimal robust decision with respect to average performance and competitive ratio as criteria. Figure 6 shows the robustness criteria used in Equation (10) for ε = ε̄. As illustrated, Segment 1's expected spin (respectively, competitive ratio) is maximized at a settlement rate of 18% (respectively, 17%). For Segment 2 both robustness criteria are maximized at a settlement rate of 20%. Interpreting the competitive ratio, the expected income at the optimal settlement rate is on average about 90% of the highest achievable repayment, in either segment.

The solution to the optimization problem (10) is depicted in Figure 8(a) for Segment 2 when ε ∈ [0, ε̄] and ε̄ = 3%. The optimal decision x*(z) is unique and does not vary for ε ∈ [0, ε̄], due to the quality of the available data. This implies that the decision maker can be quite confident in the quality of x*(z) = 20% as an optimal robust settlement offer.25 Indeed, this offer increases the bank's expected payoff by at least 60% over the best available settlement-rate experiment. Figure 8(b) shows the average-performance robustness criteria in problems (8) and (10) together for Segment 2. It can be seen that without any data, a settlement rate of about 65% becomes optimal. Interestingly, this rate is consistent with one of the bank's initial experiments of offering to settle at a rate of 60%.
Distribution of Optimal Robust Decisions. The probability measure dF_x for the distribution of the maximizer x(θ) ∈ arg max_{x∈X} g(s(x;θ), x), for θ ∈ Θ̂_κ^ε(z), can be obtained from the measure dF(θ | z) in §4.1. Because g(s(x;θ), x) is continuous in θ, the support of dF_x is compact. For ε = ε̄, and given a uniform posterior on iso-difference sets (cf. Remark 12(iii)), Figure 7 illustrates the distribution of settlement offers that are optimal for some parameter realizations.
Information Acquisition. Figure 9 compares the expected spin if another data point at a different settlement rate were available for either segment. As an example of how to interpret the curves, Figure 9(b) shows that if we knew the response rate for a 15% settlement offer for Segment 2, then we could expect a 0.3% spin increase. The outstanding aggregate balance of the accounts under consideration amounts to more than $100 million; a 0.3% spin increase therefore has a significant absolute impact on the bank's collection revenues. Using these curves, the solution to the information acquisition problem in §4.2 (for A = [0, 1] and r̃_α as the response to offers at the rate α) for Segment 1 (Segment 2) is to conduct an additional settlement experiment at the rate of 20% (15%).
Figure 4. (a) Vertices of Θ̂⁰₍₋₁,₀₎(z) with response rates and associated expected income (spin) for Segment 1. (b) Vertices of Θ̂⁰₍₋₁,₀₎(z) with response rates and associated expected income (spin) for Segment 2.

Figure 5. (a) Distribution of response rates for Segment 2 (with ε = ε̄). (b) Distribution of spins for Segment 2 (with ε = ε̄).

Figure 6. Expected spin and competitive ratio for settlement offers to Segment 1 and Segment 2 ((a)/(b)).

Figure 7. Probability density of optimal settlement offer to Segment 1 (a) and Segment 2 (b).

Figure 8. (a) Maximum expected spin and optimal settlement offer as a function of the allowable sample error ε (for Segment 2). (b) Robustness criteria in Equation (8) (maximized at x*), and in Equation (10) with information obtained from the data sample z (maximized at x*(z)).

Figure 9. (a) Expected spin with and without extra information at different settlement rates for Segment 1. (b) Expected spin with and without extra information at different settlement rates for Segment 2.

6. Conclusion

The proposed robust approximation method has been shown to yield good results in sample-sparse environments. Our approach has two key features that aim at making maximum use of the available data points, as few as they may be. First, the method allows the direct use of structural knowledge about the problem, in the form of first- and second-order monotonicity constraints. It therefore supplements the data sample with information about the shape
of the unknown functional relationship. In contrast to most
of the extant literature on shape-preserving approximation,
we do not constrain the data in order to guarantee the existence of shape-preserving approximations (see, e.g., Fritsch
and Carlson 1980), but characterize the parameter domain
consistent with the shape constraints as a function of the
available data. For any given model error relative to the
data, the resulting robust approximation set as a subset of
the monotonicity domain is a bounded polyhedron.
The second key feature of the method is that it considers the shape-preserving approximation problem embedded in a decision problem. By linking the optimization
and identification problems in Equations (8)–(10), an action
and an approximation error (cf. Remark 4) are obtained
simultaneously. All models consistent with the structural
knowledge are considered up to the very point when the
decision maker finally commits to an optimal robust action.
An interesting consequence of this jointness is that striving for the smallest possible approximation error may not be optimal, given the goal of determining optimal robust decisions that perform well despite possible variations of the model specification. The relevant trade-off is between the
model specification. The relevant trade-off is between the
increased robustness from considering a larger set of models when relaxing the approximation error, and an improved
estimate of the objective function in the decision problem (2) when decreasing the approximation error. The practical application in §5 confirms that it is not always optimal
to minimize the approximation error, which underlines the
importance of the joint consideration of data and decisions.
These findings relate to recent advances in operational
statistics that seek to tie the error metric for approximation to the objective function. A contribution of this paper
is to combine the use of structural knowledge about the
problem (monotonicity constraints) with joint robust identification and decision making.
Appendix A. Proofs
Proof of Proposition 1
Let $k \geq 0$, $\tau \in [a,b]^{m+2}$, and let $s \in S_k(\tau)$ be given in the B-spline representation (15), so that $s = N_{k+1}\hat{\theta}$.

(i) By Equation (19), because
$$N_{k+1}^{(1)} = (0, N_k)\,D_{k+1}^{-1} = \left(0,\ \frac{k\,N_{-k+1,k}}{\tau_1 - \tau_{-k+1}},\ \dots,\ \frac{k\,N_{m,k}}{\tau_{m+k} - \tau_m}\right),$$
condition (21) is clearly sufficient for the nonnegativity of $s^{(1)}$.

(ii) We now show that for $k \in \{0, 1, 2\}$, condition (21) is also necessary for $s = N_{k+1}\hat{\theta}$ to be nondecreasing. Let $s \in S_0(\tau)$. This means that $s$ is a piecewise-constant function. Using Equation (14), we can write $s$ as
$$s = N_1\theta,$$
where $\theta = (\theta_0, \dots, \theta_m)$ and $N_1 = (1_{[\tau_0,\tau_1)}, 1_{[\tau_1,\tau_2)}, \dots, 1_{[\tau_m,\tau_{m+1}]})$ is an ordered basis for $S_0(\tau)$, which partitions unity over $[a,b]$. Having $\hat{\theta} \geq 0$ is equivalent to $\theta_0 \leq \theta_1 \leq \cdots \leq \theta_m$, which is a necessary condition for $s$ to be nondecreasing.

Let $s \in S_1(\tau)$. Using Equation (19), it is $s^{(1)} = (0, N_1)\,D_2^{-1}\hat{\theta}$, where $D_2^{-1}$ is a diagonal matrix with positive entries. Therefore, $s^{(1)} \geq 0$ if and only if $\hat{\theta} \geq 0$.

Now let $s \in S_2(\tau)$. Equation (19) implies that
$$s^{(1)} = (0, N_2)\,D_3^{-1}\hat{\theta},$$
where the components of $N_2$ are given by
$$N_{-1,2}(x) = \begin{cases} 1 - \dfrac{x - \tau_0}{\tau_1 - \tau_0}, & x \in [\tau_0, \tau_1),\\ 0, & \text{otherwise,} \end{cases}$$
for $i = -1$;
$$N_{i,2}(x) = \begin{cases} \dfrac{x - \tau_i}{\tau_{i+1} - \tau_i}, & x \in [\tau_i, \tau_{i+1}),\\ 1 - \dfrac{x - \tau_{i+1}}{\tau_{i+2} - \tau_{i+1}}, & x \in [\tau_{i+1}, \tau_{i+2}),\\ 0, & \text{otherwise,} \end{cases}$$
for $i \in \{0, \dots, m-1\}$; and
$$N_{m,2}(x) = \begin{cases} \dfrac{x - \tau_m}{\tau_{m+1} - \tau_m}, & x \in [\tau_m, \tau_{m+1}],\\ 0, & \text{otherwise,} \end{cases}$$
for $i = m$. Therefore, at $x = \tau_{i+1}$, $i \in \{-1, \dots, m\}$, we have $s^{(1)}(\tau_{i+1}) = 2\hat{\theta}_i/(\tau_{i+2} - \tau_i)$, which yields the claim, i.e., $s^{(1)}(x) \geq 0$ implies that $\hat{\theta} \geq 0$. This completes our proof.
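As a quick numerical illustration of Proposition 1 (a minimal sketch built on SciPy's standard B-spline routines, not code from the paper): a spline whose coefficient vector is nondecreasing, so that its differenced coefficients are nonnegative as in condition (21), has a nonnegative first derivative on $[a,b]$.

    import numpy as np
    from scipy.interpolate import BSpline

    a, b, k = 0.0, 1.0, 3                        # interval and spline degree
    interior = np.array([0.3, 0.6])              # interior knots
    t = np.concatenate(([a] * (k + 1), interior, [b] * (k + 1)))  # clamped knot vector

    theta = np.array([0.0, 0.1, 0.4, 0.4, 0.9, 1.0])  # nondecreasing coefficients
    assert np.all(np.diff(theta) >= 0)                # differenced coefficients are >= 0

    s = BSpline(t, theta, k)
    x = np.linspace(a, b, 1001)
    # Property 4 of Appendix B underlies s.derivative(); the derivative is nonnegative.
    assert np.all(s.derivative()(x) >= -1e-12)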
Proof of Proposition 2
The proof follows directly from the proof of Proposition 1.

Proof of Proposition 3
Using Corollary 1, for any given $\varepsilon \geq 0$, $\phi \in \Phi$, $\tau \in [a,b]^{m+2}$, and $z = (x_j, y_j)_{j=1}^n$, the set $\Theta^{\varepsilon}_{\phi}(z)$ is characterized by the inequalities
$$\bigl|W(\bar{N}_{k+1}\theta - Y)\bigr| \leq \varepsilon\,\mathbf{1}, \qquad M_{\phi}\,\theta \geq 0,$$
where $W = \operatorname{diag}(w_1, \dots, w_n)$ is the (diagonal) matrix of weights, $Y^T = (y_1, \dots, y_n)$ is the data vector, $\theta^T = (\theta_{-k}, \dots, \theta_m)$ is the parameter vector, and
$$\bar{N}_{k+1} = \begin{pmatrix} N_{-k,k+1}(x_1) & \cdots & N_{m,k+1}(x_1)\\ \vdots & \ddots & \vdots\\ N_{-k,k+1}(x_n) & \cdots & N_{m,k+1}(x_n) \end{pmatrix}. \qquad (A1)$$
First note that $\Theta^{\varepsilon}_{\phi}(z)$ is linearly constrained, and thus a polyhedron. We now show that $\Theta^{\varepsilon}_{\phi}(z)$ is bounded. For this we assume, without any loss of generality, that $w_1 = w_2 = \cdots = w_n = 1$. Thus, if $\theta \in \Theta^{\varepsilon}_{\phi}(z)$ and $\phi \neq 0$, then the constant
$$M = \max\left\{\frac{y_n - y_1 + 2\varepsilon}{b - a},\ \frac{y_2 - y_1 + 2\varepsilon}{x_2 - a},\ \frac{y_n - y_{n-1} + 2\varepsilon}{b - x_{n-1}}\right\}\cdot(b - a) + \max\{|y_1|, |y_n|\}$$
bounds $s(\cdot\,;\theta)$ uniformly, i.e., is such that $|s(x;\theta)| \leq M$ for all $x \in X$. The fact that $M$ does not depend on $\theta$ immediately implies that $\Theta^{\varepsilon}_{\phi}(z)$ is bounded.
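The boundedness argument lends itself to a direct numerical check. The sketch below (our construction, with made-up data, unit weights, and a piecewise-linear spline) assembles the polyhedron, computes the constant $M$ from the proof, and verifies $|s(x;\theta)| \leq M$ at extreme points generated by linear programs with random objectives.

    import numpy as np
    from scipy.interpolate import BSpline
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    a, b, k = 0.0, 1.0, 1
    t = np.concatenate(([a] * (k + 1), [0.3, 0.6], [b] * (k + 1)))
    p = len(t) - k - 1
    xs = np.array([a, 0.45, b])               # here x_1 = a and x_n = b
    ys = np.array([0.2, 0.5, 0.8])
    eps = 0.1

    Nbar = BSpline.design_matrix(xs, t, k).toarray()   # the matrix (A1)
    D = np.diff(np.eye(p), axis=0)                     # monotonicity constraints
    A_ub = np.vstack([Nbar, -Nbar, -D])
    b_ub = np.concatenate([ys + eps, eps - ys, np.zeros(p - 1)])

    # The uniform bound from the proof (unit weights):
    M = max((ys[-1] - ys[0] + 2 * eps) / (b - a),
            (ys[1] - ys[0] + 2 * eps) / (xs[1] - a),
            (ys[-1] - ys[-2] + 2 * eps) / (b - xs[-2])) * (b - a) \
        + max(abs(ys[0]), abs(ys[-1]))

    grid = np.linspace(a, b, 201)
    G = BSpline.design_matrix(grid, t, k).toarray()
    for _ in range(50):                   # extreme points via random LP objectives
        res = linprog(rng.normal(size=p), A_ub=A_ub, b_ub=b_ub,
                      bounds=[(None, None)] * p)
        assert res.status == 0 and np.max(np.abs(G @ res.x)) <= M + 1e-9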
Proof of Proposition 4
We can restrict attention to the case where $\phi_1 \geq 0$, because the proof is symmetric for $\phi_1 < 0$. Without loss of generality, one can assume that the $y_j$-values of the sample $z = (x_j, y_j)_{j=1}^n$ are part of a solution to the LP in $(\widetilde{M}1)$, so that the sample monotonicity condition $(\widetilde{M}1)$ holds. We now show that the following condition on the knot vector $\tau \in [a,b]^{m+2}$ implies that the robust approximation set $\Theta^{\varepsilon}_{\phi}(z)$ is nonempty.

N1. The knot vector $\tau = (\tau_0, \tau_1, \dots, \tau_{m+1}) \in [a,b]^{m+2}$ is such that there is at most one data point in the interior of the support of each $N_{i,k+1}$, for all $i \in \{-k, \dots, m\}$.

By Proposition 1 it is enough to show that if a knot vector $\tau$ satisfies N1, then there exists a $\hat{\theta}$ such that
$$\bar{N}_{k+1}L_k\hat{\theta} = Y, \qquad \hat{\theta} \geq 0, \qquad (A2)$$
with $Y = (y_1, \dots, y_n)^T$ and $\bar{N}_{k+1}$ as in Equation (A1). This solution can be constructed as follows. First, examine the matrix $\bar{N}_{k+1}L_k$: because $N_{k+1}$ forms a basis that partitions unity on $[a,b]$, i.e., $N_{k+1}(1, \dots, 1)^T = 1_{[a,b]}$, each row of $\bar{N}_{k+1}L_k$ has a structure similar to a cumulative distribution function, except that it starts from 1 and decreases to 0, i.e.,
$$\bar{N}_{k+1}L_k = \begin{pmatrix} 1 & \times & \cdots & \times & 0 & 0 & \cdots\\ 1 & \cdots & 1 & \times & \cdots & \times & 0 & \cdots\\ & & & \cdots & & & & \end{pmatrix},$$
where $\times$ denotes entries between 0 and 1, decreasing from left to right. The first relation in Equation (A2) therefore becomes
$$\begin{pmatrix} 1 & \times & \cdots & \times & 0 & 0 & \cdots\\ 1 & \cdots & 1 & \times & \cdots & \times & 0 & \cdots\\ & & & \cdots & & & & \end{pmatrix}\begin{pmatrix} \hat{\theta}_{-k}\\ \vdots\\ \hat{\theta}_m \end{pmatrix} = \begin{pmatrix} y_1\\ \vdots\\ y_n \end{pmatrix}. \qquad (A3)$$
If we set $\hat{\theta}_{-k} = y_1$, then $\hat{\theta}_i \geq 0$ for $i \in \{-k+1, \dots, m\}$ (obtained from the second relation in Equation (A2)) implies that $\hat{\theta}_i = 0$ whenever $N_{i,k+1}(x_1) \neq 0$. By suppressing the columns corresponding to $\hat{\theta}_{-k}$ and the zero $\hat{\theta}_i$'s and then subtracting $(y_1, y_1, \dots, y_1)^T$ from both sides, we obtain (dropping the first row) an equation of exactly the same structure as Equation (A3). We can therefore repeat this procedure, similar to Gauss elimination, and thus determine all components of $\hat{\theta}$. Note that the right-hand side of Equation (A3) after any step is nonnegative as a consequence of $(\widetilde{M}1)$. This in turn implies that $\hat{\theta} \geq 0$, so that a solution to Equation (A2) has been obtained.
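The elimination sweep in this proof translates directly into an algorithm. The following sketch is our rendering (the function and variable names are ours, not the paper's): given the matrix $C = \bar{N}_{k+1}L_k$ for knots satisfying condition N1 and data satisfying $(\widetilde{M}1)$, it recovers a nonnegative $\hat{\theta}$ solving Equation (A2), and hence a nondecreasing interpolating spline.

    import numpy as np
    from scipy.interpolate import BSpline

    def solve_monotone_system(C, y, tol=1e-12):
        """Sweep from the proof of Proposition 4: solve C @ theta_hat = y with
        theta_hat >= 0, where each row of C starts at 1 and decreases to 0."""
        n, p = C.shape
        theta_hat, rhs = np.zeros(p), y.astype(float).copy()
        rows, cols = list(range(n)), list(range(p))
        while rows and cols:
            r, c = rows[0], cols[0]
            theta_hat[c] = rhs[r] / C[r, c]                  # leading entry is 1
            zeroed = [j for j in cols[1:] if C[r, j] > tol]  # these coefficients vanish
            rhs[rows] -= theta_hat[c] * C[rows, c]           # subtract y_1 from both sides
            rows, cols = rows[1:], [j for j in cols[1:] if j not in zeroed]
        return theta_hat

    # Example with k = 1 and knots satisfying N1 (at most one data point in the
    # interior of each basis function's support).
    a, b, k = 0.0, 1.0, 1
    t = np.concatenate(([a] * (k + 1), [0.3, 0.4, 0.6, 0.7], [b] * (k + 1)))
    p = len(t) - k - 1
    L = np.tril(np.ones((p, p)))                             # cumulative-sum matrix L_k
    xs = np.array([0.2, 0.5, 0.8])
    ys = np.array([0.1, 0.4, 0.9])
    C = BSpline.design_matrix(xs, t, k).toarray() @ L
    theta = L @ solve_monotone_system(C, ys)
    s = BSpline(t, theta, k)
    print(np.allclose(s(xs), ys), np.all(np.diff(theta) >= 0))   # True True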
Proof of Proposition 5
Figure A.1. Construction in the proof of Proposition 5.
Analogous to the proof of Proposition 4, we can, without any loss of generality, assume that the sample $z = (x_j, y_j)_{j=1}^n$ satisfies the sample monotonicity conditions $(\widetilde{M}1)$ and $(\widetilde{M}2)$. Furthermore, we assume that $\phi_2 = 1$. For $\phi_2 = 0$, the result specializes to Proposition 4. The proof proceeds symmetrically for $\phi_2 = -1$. Note first that in the following construction condition $(\widetilde{M}1)$ implies condition (M1), provided that (M2) holds. Therefore, we need only to show that there exists $\tau \in [a,b]^{m+2}$ such that the system
$$\bar{N}_{k+1}L_k\hat{\theta} = Y, \qquad \begin{pmatrix} I_k & 0\\ 0 & \Delta_k \end{pmatrix}\hat{\theta} \geq 0, \qquad (A4)$$
where $Y = (y_1, \dots, y_n)^T$ and $\bar{N}_{k+1}$ is as in Equation (A1), possesses a solution. Without loss of generality, we assume that $n$ is odd,$^{26}$ that $x_1 \in [\tau_0, \tau_1)$, and that $(x_0, y_0)$ with $x_0 < \tau_0$ is such that the augmented sample $\hat{z} = (x_j, y_j)_{j=0}^n$ satisfies $(\widetilde{M}1)$ and $(\widetilde{M}2)$. Furthermore, for all $j \in \{1, \dots, \lfloor n/2 \rfloor\}$ we denote by $(\hat{x}_j, \hat{y}_j)$ the intersection point of the straight line through the points $(x_{2j-2}, y_{2j-2})$ and $(x_{2j-1}, y_{2j-1})$ with the straight line through the points $(x_{2j}, y_{2j})$ and $(x_{2j+1}, y_{2j+1})$. First, we show that there exists $\tau \in [a,b]^{m+2}$ such that Equation (A4) has a solution for $k = 1$, i.e., there exists a piecewise-linear function $s_1(x;\theta)$, with breakpoints in $\{a = \tau_0, \tau_1, \dots, \tau_m, \tau_{m+1} = b\}$, that passes through $(x_j, y_j)$, $j \in \{1, \dots, n\}$, and is consistent with (M1) and (M2). Then we derive a sufficient condition on $\tau$ that makes it possible to smooth the piecewise-linear function $s_1(x;\theta)$ so as to obtain an element of $S_k(\tau)$. Consider the case where $k = 1$.

N1. The knot vector $\tau = (\tau_0, \tau_1, \dots, \tau_{m+1}) \in [a,b]^{m+2}$ satisfies the following two conditions: (i) $k + 1$ (or more) knots are placed between $x_{2j-1}$ and $x_{2j}$, for all $j \in \{1, \dots, \lfloor n/2 \rfloor\}$; and (ii) $(\hat{x}_j, \hat{y}_j)$ lies between $\tau_{\ell_j}$ and $\tau_{u_j}$, where $\ell_j = \min\{i\colon \tau_i \geq x_{2j-1}\}$ and $u_j = \max\{i\colon \tau_i \leq x_{2j}\}$.

Under condition N1, the parameter vector $\theta$ specified hereafter defines a piecewise-linear function $s_1(x;\theta) = N_2(x)\,\theta$ that satisfies Equation (A4):
$$\theta_i = \frac{y_{2j-1} - y_{2j-2}}{x_{2j-1} - x_{2j-2}}\,(\tau_{i+1} - x_{2j-1}) + y_{2j-1}, \quad \text{where } \hat{x}_{j-1} \leq \tau_i \leq \hat{x}_j,$$
for $(i,j) \in \{-1, \dots, m\} \times \{1, \dots, \lfloor n/2 \rfloor + 1\}$, setting $\hat{x}_0 = a$ and $\hat{x}_{\lfloor n/2 \rfloor + 1} = b$. The intuition behind this construction is shown in Figure A.1 (for $n = 5$).
Now let $k > 1$; we show that there exists a spline $s_k(x;\vartheta) \in S_k(\tau)$ that is composed of the line segments connecting $(x_{2j-2}, y_{2j-2})$ with $(x_{2j-1}, y_{2j-1})$, for all $j \in \{1, \dots, \lfloor n/2 \rfloor + 1\}$. Hence, we need only to show that there is a spline in $S_k(\hat{\tau}_j)$ with $\hat{\tau}_j = (\tau_{\ell_j}, \dots, \tau_{u_j})$, for $j \in \{1, \dots, \lfloor n/2 \rfloor\}$, satisfying the following requirements. First, it connects the line segments $s_1(x;\theta)\vert_{[\tau_{u_{j-1}}, \tau_{\ell_j}]}$ and $s_1(x;\theta)\vert_{[\tau_{u_j}, \tau_{\ell_{j+1}}]}$, where $\tau_{u_0} = \tau_0$ and $\tau_{\ell_{\lfloor n/2 \rfloor + 1}} = \tau_{m+1}$. Second, it has $k - 1$ continuous derivatives at the points $\tau_{\ell_j}$ and $\tau_{u_j}$.

Using the B-spline representation of the second derivative of $s_k(x;\vartheta)$ in $S_{k-2}(\hat{\tau}_j)$, the above two requirements become
$$\int_{\tau_{\ell_j}}^{\tau_{u_j}} s_k^{(2)}(\xi;\vartheta)\,d\xi = \frac{y_{2j+1} - y_{2j}}{x_{2j+1} - x_{2j}} - \frac{y_{2j-1} - y_{2j-2}}{x_{2j-1} - x_{2j-2}}, \qquad (A5)$$
$$\int_{\tau_{\ell_j}}^{\tau_{u_j}}\!\!\int_{\tau_{\ell_j}}^{x} s_k^{(2)}(\xi;\vartheta)\,d\xi\,dx = s_1(\tau_{u_j};\theta) - s_1(\tau_{\ell_j};\theta) - \frac{y_{2j-1} - y_{2j-2}}{x_{2j-1} - x_{2j-2}}\,(\tau_{u_j} - \tau_{\ell_j}), \qquad (A6)$$
$$\frac{d^i}{dx^i}s_k^{(2)}(x;\vartheta)\Big\vert_{\tau_{\ell_j}} = \frac{d^i}{dx^i}s_k^{(2)}(x;\vartheta)\Big\vert_{\tau_{u_j}} = 0, \qquad 0 \leq i < k - 2, \qquad (A7)$$
$$s_k^{(2)}(x;\vartheta) \geq 0 \qquad \forall\, x \in [\tau_{\ell_j}, \tau_{u_j}], \qquad (A8)$$
where $s_k^{(2)}(x;\vartheta) = \sum_{l=-k+2}^{k-1} N_{l,k-1}(x)\,\vartheta_l$. Equation (A7) forces $\vartheta_{-k+2}, \dots, \vartheta_{-1}$ and $\vartheta_2, \dots, \vartheta_{k-1}$ to zero. Therefore, $s_k^{(2)}(x;\vartheta)$ can be written in the form
$$s_k^{(2)}(x;\vartheta) = N_{0,k-1}(x)\,\vartheta_0 + N_{1,k-1}(x)\,\vartheta_1.$$
Using the basic properties of B-splines (cf. Appendix B), we can write Equations (A5) and (A6) as
$$\frac{\tau_{u_j-1} - \tau_{\ell_j}}{k-1}\,\vartheta_0 + \frac{\tau_{u_j} - \tau_{\ell_j+1}}{k-1}\,\vartheta_1 = \delta_{2j}(z) - \delta_{2j-1}(z) \qquad (A9)$$
and
$$\frac{\tau_{u_j-1} - \tau_{\ell_j}}{k-1}\,A\,\vartheta_0 + \frac{\tau_{u_j} - \tau_{\ell_j+1}}{k-1}\,B\,\vartheta_1 = \underbrace{\frac{s_1(\tau_{u_j};\theta) - s_1(\tau_{\ell_j};\theta)}{\tau_{u_j} - \tau_{\ell_j}}}_{\equiv\,\delta_{u_j\ell_j}(s_1)} -\ \delta_{2j-1}(z), \qquad (A10)$$
where
$$A = \sum_{m'=0}^{k-2}\frac{1}{(k-1-m')(k-m')}\sum_{i=0}^{k-1-m'}\frac{\tau_{\ell_j+m'+i} - \tau_{\ell_j+m'}}{\tau_{u_j} - \tau_{\ell_j}} + \frac{\tau_{u_j} - \tau_{u_j-1}}{\tau_{u_j} - \tau_{\ell_j}}$$
and
$$B = \sum_{m'=0}^{k-2}\frac{1}{(k-1-m')(k-m')}\sum_{i=0}^{k-1-m'}\frac{\tau_{\ell_j+m'+i+1} - \tau_{\ell_j+m'+1}}{\tau_{u_j} - \tau_{\ell_j}}.$$
Note that $0 \leq \delta_{u_j\ell_j}(s_1) - \delta_{2j-1}(z) \leq \delta_{2j}(z) - \delta_{2j-1}(z)$, because $s_1$ is convex. Furthermore, there always exists a knot vector $\tau$ that satisfies N1 and the following condition.

N2. The knot vector $\tau = (\tau_0, \dots, \tau_{m+1}) \in [a,b]^{m+2}$ is such that
$$B \leq \frac{\delta_{u_j\ell_j}(s_1) - \delta_{2j-1}(z)}{\delta_{2j}(z) - \delta_{2j-1}(z)} \leq A, \qquad j \in \{1, \dots, \lfloor n/2 \rfloor\}.$$
This guarantees the existence of a nonnegative solution $(\vartheta_0, \vartheta_1)$ to Equations (A9)–(A10) and completes our proof.
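The geometric step in the piecewise-linear part of this construction is straightforward to compute. The sketch below (ours; the sample values are made up, with $n = 5$ and one augmented point $(x_0, y_0)$) returns the chord intersection points $(\hat{x}_j, \hat{y}_j)$, each of which condition N1(ii) requires to lie between the knots $\tau_{\ell_j}$ and $\tau_{u_j}$.

    import numpy as np

    def chord_intersections(x, y):
        """For 0-indexed arrays of even length (augmented point included), intersect
        each chord through two consecutive points with the next such chord."""
        pts = []
        for i in range(0, len(x) - 3, 2):
            m1 = (y[i + 1] - y[i]) / (x[i + 1] - x[i])          # left chord slope
            m2 = (y[i + 3] - y[i + 2]) / (x[i + 3] - x[i + 2])  # right chord slope
            x_hat = (y[i + 2] - m2 * x[i + 2] - y[i + 1] + m1 * x[i + 1]) / (m1 - m2)
            pts.append((x_hat, y[i + 1] + m1 * (x_hat - x[i + 1])))
        return pts

    # Augmented, increasing, convex sample (x_0, y_0), ..., (x_5, y_5), n = 5.
    x = np.array([-0.2, 0.0, 0.2, 0.4, 0.7, 1.0])
    y = np.array([0.05, 0.10, 0.18, 0.32, 0.62, 1.10])
    print(chord_intersections(x, y))    # two intersection points, one per chord pair

For data with strictly increasing chord slopes the denominator $m_1 - m_2$ never vanishes, so each intersection point is well defined.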
Appendix B. Basic Properties of B-Splines
For a more complete list of properties, see, e.g., Dierckx (1993).

1. Positivity: $N_{i,k+1}(x) \geq 0$ for all $x$.

2. Local support: $N_{i,k+1}(x) = 0$ if $x \notin [\tau_i, \tau_{i+k+1})$.

3. Boundary values: $N_{i,k+1}^{(l)}(\tau_i) = N_{i,k+1}^{(l)}(\tau_{i+k+1}) = 0$ for $l \in \{0, \dots, k-1\}$.

4. Derivative of a B-spline:
$$N_{i,k+1}'(x) = k\left(\frac{N_{i,k}(x)}{\tau_{i+k} - \tau_i} - \frac{N_{i+1,k}(x)}{\tau_{i+k+1} - \tau_{i+1}}\right).$$
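Property 4 is easy to confirm numerically; the snippet below (a sketch using SciPy's basis_element, not code from the paper) evaluates both sides of the identity for a single basis element.

    import numpy as np
    from scipy.interpolate import BSpline

    t = np.array([0.0, 0.2, 0.5, 0.7, 1.0])      # knots tau_i, ..., tau_{i+k+1}
    k = len(t) - 2                                # degree of N_{i,k+1} (here k = 3)
    N = BSpline.basis_element(t, extrapolate=False)         # N_{i,k+1}
    Nl = BSpline.basis_element(t[:-1], extrapolate=False)   # N_{i,k}
    Nr = BSpline.basis_element(t[1:], extrapolate=False)    # N_{i+1,k}

    x = np.linspace(t[0], t[-1], 501, endpoint=False)
    lhs = N.derivative()(x)
    rhs = k * (np.nan_to_num(Nl(x)) / (t[k] - t[0])         # zero outside supports
               - np.nan_to_num(Nr(x)) / (t[-1] - t[1]))
    print(np.allclose(lhs, rhs))                  # True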
Endnotes
1. For a general discussion of semiparametric models and
their identification, see, e.g., Powell (1994). The standard
econometric approach revolves around estimation, whereas
our method is fundamentally aimed at solving decision
problems robustly.
2. Constraints on the sign of higher-order derivatives can
be imposed in a similar manner.
3. This work is preceded by Wolibner (1951), who
establishes the existence of monotonic interpolating
polynomials.
4. The definition of B-splines dates back to Curry and
Schoenberg (1947).
5. From a technical perspective, we relax the almost ubiquitous assumption in extant results on shape-preserving
spline approximation that the knots be located at the data
points. The restriction of intersample behavior implied by
this assumption artificially constrains the space of feasible models. In view of achieving robustness, being able to
place knots at any location in the choice set is key.
6. See, e.g., Judd (1999, Chapters 6 and 12) for a brief
overview of shape-preserving numerical methods in economics.
7. See, e.g., Härdle and Linton (1994) and Pagan and Ullah
(1999) for surveys of nonparametric estimation techniques.
8. The B-spline approximation method presented in this paper can, in the limit, be interpreted as an orthogonal-series method, for the B-spline polynomials defined on an interval $[a,b]$ form an orthogonal basis of $C^0([a,b])$ when the maximum distance between two knots tends to zero and/or the degree of the B-splines tends to infinity. However, it allows for shape constraints and does not suffer from most other drawbacks of truncated orthogonal series.
9. Kernel-based estimation was introduced by Rosenblatt
(1956) and refined by Nadaraya (1964) and Watson (1964).
10. Due to the distortion caused by each sample on its
neighborhood in kernel-based estimation, there is a significant error in the spin at the sample points.
11. This holds, no matter from which distributions the $y_{j,m}$'s are obtained, as long as Lindeberg's condition (which guarantees that no one $y_{j,m}$ outweighs the other samples) is satisfied.
12. The results in this paper can be transferred to other error norms. By the well-known norm equivalence on $\mathbb{R}^n$, for any (weighted) error $\hat{e}(\theta, z) = \|(w_1(s(x_1;\theta) - y_1), \dots, w_n(s(x_n;\theta) - y_n))\|'$, which uses a norm $\|\cdot\|'$ instead of the maximum norm $\|\cdot\|_\infty$, there exist positive constants $\underline{F}, \bar{F}$ such that $\underline{F}\,e(\theta, z) \leq \hat{e}(\theta, z) \leq \bar{F}\,e(\theta, z)$ for any admissible $\theta$ and $z$.
13. Because by assumption A1 the structural objective
function is Lipschitz, an allowable error on the objective
function implies a maximum acceptable error for the component function.
14. Remark 12 in §4.1 provides several ways for obtaining
such beliefs.
15. By the maximum theorem (Berge 1959) the set $X(z)$ is compact as long as $\bar{\varepsilon}$ is finite, which by the Weierstrass theorem (Bertsekas 1995) guarantees that (8) has a solution.
16. Unsurprisingly, the ostrich bias becomes stronger if we allow the decision maker to first observe the data and then to choose $\bar{\varepsilon}$. For example, if, in the setting of Remark 5, it is impossible that $\theta$ is greater than some given upper bound $\theta_0 > 0$, and the decision maker observes the single data point $(x, 2\theta_0 - x)$, then it is best for him to choose $\bar{\varepsilon} = 0$, effectively ignoring the fact that with probability one the actual $\theta$ is smaller than $\theta_0$.
17. On the relevant interval $[a,b]$, the splines generated by $\{N_{i,k+1}\}$, $i \in \{-k, \dots, m\}$, are independent of the choice of the additional knots $\tau_{-k}, \dots, \tau_{-1}$ and $\tau_{m+2}, \dots, \tau_{m+k+1}$, as long as they are such that $\tau_{-k} \leq \tau_{-k+1} \leq \cdots \leq \tau_{-1} \leq \tau_0 = a$ and $b = \tau_{m+1} \leq \tau_{m+2} \leq \cdots \leq \tau_{m+k} \leq \tau_{m+k+1}$.
18. The proof of this result follows from Theorem 6 of
De Boor (1978/2001, p. 149).
19. To ensure that all terms in Equation (17) are well defined, we adopt the convention that $0 \cdot \infty = \infty \cdot 0 = 0$. Such expressions may occur if we have coinciding knots, as, for example, at the boundary of $[a,b]$.
20. As De Boor (1978/2001) points out, increasing the
number of knots is a desirable substitute for increasing
the degree k of an approximating polynomial, especially
because the computational complexity increases sharply
in $k$ (creating substantial difficulties for $k \geq 20$).
21. Given the sample $z$, the constant $\bar{\varepsilon}(z)$ in Equation (25) always exists as a finite nonnegative number. Indeed, it is sufficient to approximate the data by a constant spline, e.g., equal to $\bar{y} = (y_1 + \cdots + y_n)/n$, and set $\bar{\varepsilon} = \max_{j\in\{1,\dots,n\}}|\bar{y} - y_j|$.
22. If $\theta \notin \Theta^{\bar{\varepsilon}}_{\phi}(z)$, then $\mu(\theta \mid z) = 0$. If $\Theta^{\bar{\varepsilon}}_{\phi}(z) = \emptyset$, then $\mu(\cdot \mid z) \equiv 0$.
23. A charge-card balance becomes payable in full at the
end of a payment period, contrary to a lending card, which
allows for partial repayment at the due date and a debt
rollover to the next payment period.
24. Based on the sample variances, for Segment 1 we choose $W = \operatorname{diag}(0.4714, 1, 0.5)$, and for Segment 2 it is $W = \operatorname{diag}(0.7071, 1, 0.7071)$. Note that the variance for the first (trivial) data point $(0, 1)$ vanishes, because all accounts accept zero repayment; this point is imposed as an interpolation constraint on the response curve.
25. This statement requires that despite the significant
intersample belief uncertainty the decision maker is not
subject to “ambiguity aversion” (Ellsberg 1961). The plausibility of this assumption is strengthened by the fact
that the simple, one-dimensional settlement-rate decision
is usually taken in a “noncomparative context” (Fox and
Tversky 1995).
26. If $n$ is even, one can extend the sample by an arbitrary data point consistent with $(\widetilde{M}1)$ and $(\widetilde{M}2)$.
Acknowledgments
This research was supported in part by American Express
Company and a Stanford Presidential Faculty Grant.
The authors thank an anonymous reviewer, Ashish Goel,
David Luenberger, Kenneth Judd, Narut Sereewattanawoot,
and Yinyu Ye, as well as participants of the 2009
INFORMS Annual Meeting in San Diego, California, and
the 64th European Meeting of the Econometric Society in
Barcelona, Spain, for helpful discussions and comments.
References
Anastassiou, G. A., X. M. Yu. 1992. Monotone and probabilistic wavelet
approximation. Stochastic Anal. Appl. 10(3) 251–264.
Beliakov, G. 2000. Shape preserving approximation using least squares
splines. Approximation Theory Its Appl. 16(4) 80–98.
Beliakov, G. 2002. Monotone approximation of aggregation operators using least-squares splines. Internat. J. Uncertainty, Fuzziness,
Knowledge-Based Systems 10(6) 659–676.
Ben-Tal, A., A. Nemirovski. 1998. Robust convex optimization. Math.
Oper. Res. 23(4) 769–805.
Ben-Tal, A., A. Nemirovski. 2002. Robust optimization—Methodology
and applications. Math. Programming 92(3) 453–480.
Berge, C. 1959. Espaces Topologiques et Fonctions Multivoques. Dunod,
Paris. [English translation: 1963, Topological Spaces, Oliver and
Boyd, Edinburgh, UK. Reprinted: 1997, by Dover Publications,
Mineola, NY.]
Bertsekas, D. P. 1995. Nonlinear Programming. Athena Scientific,
Belmont, MA.
Bertsimas, D., M. Sim. 2004. The price of robustness. Oper. Res. 52(1)
35–53.
Besbes, O., R. Phillips, A. Zeevi. 2010. Testing the validity of a demand
model: An operations perspective. Manufacturing Service Oper. Management 12(1) 162–183.
Curry, H. B., I. J. Schoenberg. 1947. On spline distributions and their limits: The Pólya distribution functions. Bull. Amer. Math. Soc. 53(11)
1114.
De Boor, C. 1978/2001. A Practical Guide to Splines. Springer, New York.
De Vore, R. A. 1977a. Monotone approximation by splines. SIAM J. Math.
Anal. 8(5) 891–905.
De Vore, R. A. 1977b. Monotone approximation by polynomials. SIAM J.
Math. Anal. 8(5) 906–921.
Dierckx, P. 1980. An algorithm for cubic spline fitting with convexity
constraints. Computing 24(4) 349–371.
Dierckx, P. 1993. Curve and Surface Fitting with Splines. Oxford University Press, Oxford, UK.
Ellsberg, D. 1961. Risk, ambiguity, and the Savage axioms. Quart. J.
Econom. 75(4) 643–669.
Fox, C. R., A. Tversky. 1995. Ambiguity aversion and comparative ignorance. Quart. J. Econom. 110(3) 585–603.
Fritsch, F. N., R. E. Carlson. 1980. Monotone piecewise cubic interpolation. SIAM J. Numer. Anal. 17(2) 238–246.
Haimes, Y. Y., D. A. Wismer. 1972. A computational approach to the combined problem of optimization and parameter identification. Automatica 8(3) 337–347.
Haimes, Y. Y., L. S. Lasdon, D. A. Wismer. 1971. On a bicriterion formulation of the problems of integrated system identification and system
optimization. IEEE Trans. Systems, Man, Cybernetics 1(3) 296–297.
Hanson, D., G. Pledger. 1976. Consistency in concave regression. Ann.
Statist. 4(6) 1038–1050.
Härdle, W., O. Linton. 1994. Applied nonparametric methods. R. F. Engle,
D. L. McFadden, eds. Handbook of Econometrics, Vol. 4. Elsevier
Science, Amsterdam, 2295–2339.
Judd, K. L. 1999. Numerical Methods in Economics. MIT Press, Cambridge, MA.
Kocić, L. M., G. V. Milovanović. 1997. Shape preserving approximations
by polynomials and splines. Comput. Math. Appl. 33(11) 59–97.
Kvasov, B. I. 2000. Methods of Shape-Preserving Spline Approximation.
World Scientific, London.
Lim, A. E. B., J. G. Shanthikumar, Z. J. M. Shen. 2006. Model uncertainty,
robust optimization, and learning. Tutorials in Operations Research.
INFORMS, Hanover, MD, 66–94.
McAllister, D. F., E. Passow, J. A. Roulier. 1977. Algorithms for computing shape preserving spline interpolations to data. Math. Comput.
31(139) 717–725.
Nadaraya, E. 1964. On estimating regression. Theory Probab. Its Appl.
9 141–142.
Pagan, A., A. Ullah. 1999. Nonparametric Econometrics. Cambridge University Press, Cambridge, UK.
Passow, E., J. A. Roulier. 1977. Monotone and convex spline interpolation.
SIAM J. Numer. Anal. 14(5) 904–909.
Powell, J. L. 1994. Estimation of semiparametric models. R. F. Engle,
D. L. McFadden, eds. Handbook of Econometrics, Vol. 4. Elsevier
Science, Amsterdam, 2443–2521.
Pruess, S. 1993. Shape preserving $C^2$ cubic spline interpolation. IMA J.
Numer. Anal. 13(4) 493–507.
Ramsay, J. O. 1988. Monotone regression splines in action. Statist. Sci.
3(4) 425–461.
Robertson, T., F. T. Wright, R. L. Dykstra. 1988. Order Restricted Statistical Inference. Wiley, New York.
Rosenblatt, M. 1956. Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. 27 642–669.
Shannon, C. E. 1949. Communication in the presence of noise. Proc.
Institute Radio Engineers 37(1) 10–21. [Reprinted in: 1998, Proc.
IEEE 86(2) 447–457.]
Shanthikumar, J. G., L. H. Liyanage. 2005. A practical inventory control
policy using operational statistics. Oper. Res. Lett. 33(4) 341–348.
Shisha, O. 1965. Monotone approximation. Pacific J. Math. 15(2) 667–671.
Wang, S.-P., K. L. Judd. 2000. Solving a savings allocation problem by
numerical dynamic programming with shape-preserving interpolation. Comput. Oper. Res. 27(5) 399–408.
Watson, G. 1964. Smooth regression analysis. Sankhya, Ser. A 26
359–372.
Wolibner, W. 1951. Sur un polynôme d’interpolation. Colloquium Mathematicum 2 136–137.