Minimax Lower Bound and Optimal Estimation of
Convex Functions in the Sup-norm
Teresa M. Lebair, Jinglai Shen, and Xiao Wang
Abstract—Estimation of convex functions finds broad applications in science and engineering; however, the convex shape
constraint complicates the asymptotic performance analysis of
such estimators. This paper is devoted to the minimax optimal
estimation of univariate convex functions in a given Hölder class.
Particularly, a minimax lower bound in the supremum norm (or
simply sup-norm) is established by constructing a novel family
of piecewise quadratic convex functions in the Hölder class. This
result, along with a recent result on the minimax upper bound,
gives rise to the optimal rate of convergence for the minimax sup-norm risk of convex functions with the Hölder order between
one and two. The present paper provides the first rigorous
justification of the optimal minimax risk for convex estimation
on the entire interval of interest in the sup-norm.
I. INTRODUCTION
There has been an increasing interest in nonparametric
estimation of shape constrained functions (e.g., monotone or
convex functions) in estimation theory, system identification,
and systems and control [4], [10], [18], [21], [23], [28], driven
by various applications in science and engineering. Examples
include reliability engineering, biomedical research, finance,
and astronomy. The goal of shape constrained estimation is
to develop an estimator that preserves a pre-specified shape
property of an underlying true function, e.g., the monotone or
convex property. Such estimators are challenging to analyze
due to the inequality shape constraints, which lead to nonsmooth conditions in estimator characterization and complicate
asymptotic performance analysis.
Progress has been made toward developing and analyzing
shape constrained estimators in estimation theory. For example, estimators that preserve the monotone property have been
extensively studied, e.g., [18], [21], [22], [28]. In the realm of
convex (or concave) estimation, earlier research is focused on
the least squares approach: the least squares convex estimator
is studied and is shown to be consistent in the interior of
the interval of interest [8]. The pointwise rate of convergence
for the least squares convex estimator is developed in [14]
and pointwise asymptotic distributions are characterized in
[5]. In the area of systems and control, related results include
constrained control theoretic splines [4], the moving horizon
approach for constrained estimation [19], and the constrained
optimal control approach [20], [23]. Nevertheless, these papers
do not carry out the asymptotic analysis.
Teresa M. Lebair and Jinglai Shen are with the Department of Mathematics
and Statistics, University of Maryland, Baltimore County, MD 21250, U.S.A.
E-mail: {teresa.lebair, shenj}@umbc.edu.
Xiao Wang is with the Department of Statistics, Purdue University, West
Lafayette, IN 47907, U.S.A. E-mail: [email protected].
Given a function class Σ, several key questions arise when
evaluating the asymptotic performance of estimators over Σ:
(1) What is the potentially “best” rate of convergence of
estimators (in terms of sample size) uniformly on Σ?
(2) Is the “best” rate of convergence in (1) strict on Σ for
any permissible (e.g., shape preserved) estimator?
These questions form critical research issues in minimax
theory of nonparametric estimation [9], [10], [16], [27]. In
particular, the first question pertains to the minimax upper
bound on Σ, while the second is closely related to the minimax
lower bound on Σ [15]. For unconstrained estimation, the
above questions have been satisfactorily addressed for both
the Sobolev and Hölder classes under the L2-norm and sup-norm (i.e., L∞-norm); see [3], [9], [13], [15], [16], [27] and
references therein. This has led to well known optimal rates
of convergence over unconstrained function classes (cf. (2) of
Section II). However, if shape constraints are imposed, then
the minimax analysis becomes more complicated and fewer
results have been reported, particularly when the sup-norm
is considered. It is worth mentioning that a shape constraint
generally does not improve the unconstrained optimal rate of
convergence [11], and it is believed that the same optimal rate
holds on a constrained function class although no rigorous
justification has been given for general shape constraints.
The minimax analysis of convex estimation in the L2-norm has recently been obtained in [1], [6], [7]. Unlike the L2-norm, the sup-norm characterizes the worst-case performance of
an estimator and is another widely studied norm in estimation
theory [27]. The present paper is devoted to minimax optimal
estimation of univariate convex functions on [0, 1] in a Hölder
class with Hölder order r ∈ (1, 2] under the sup-norm.
The recent paper [29, Theorem 4.1] establishes a minimax
upper bound for this function class using (convex) shape
constrained B-spline estimators. However, the minimax lower
bound for this function class remains open for the following
reasons. Based upon information theoretical results on distance
between probability measures in minimax lower bound theory,
it is known that establishing a minimax lower bound amounts
to constructing a family of functions (or hypotheses) from a
function class satisfying a suitable sup-norm separation order
and a small total L2 -distance order [27, Section 2]. While it
is conceived that there exist many such families, the convex
shape constraint considerably limits the choice of a feasible
family under the sup-norm, especially when a higher order
shape constraint is imposed (recalling that roughly speaking,
the convex constraint places a second order constraint on a
function). Therefore, great care needs to be taken to meet both
the order conditions and shape constraints.
To overcome the shape constraint induced difficulties, new
techniques are proposed for developing the minimax lower
bound in this paper, which forms a key contribution of the
present paper. Specifically, we construct a family of piecewise
quadratic convex functions (whose derivatives are increasing
and piecewise linear); see Section III. These functions overlap
on most of the interval [0, 1], except on certain small subintervals. By careful selection of the slopes of the derivatives
of these functions and the length of non-overlapping subintervals, we show that the constructed convex functions satisfy the desired order conditions, thus leading to the minimax
lower bound. To the best of our knowledge, this construction
is the first of its kind for minimax convex estimation. The proposed construction process also sheds light on minimax lower
bounds for monotone and higher order derivative constraints.
The paper is organized as follows. In Section II, we present
the main results of the paper. Section III establishes the
minimax lower bound for the rate of convergence under the
sup-norm. Concluding remarks are given in Section IV.
II. PROBLEM FORMULATION AND MAIN RESULT
Consider the convex estimation problem:
$$y_i = f(x_i) + \sigma \varepsilon_i, \qquad i = 1, \ldots, n, \qquad (1)$$
where f : [0, 1] → R is an underlying convex true function, σ is a positive constant, the εi's are independent standard normal errors, and xi = i/n, i = 1, ..., n, are the equally spaced
design points. Let f′(·) denote the derivative of f. Let
$$C := \Big\{ f : [0,1] \to \mathbb{R} \;\Big|\; \big(f'(x_1) - f'(x_2)\big)\cdot(x_1 - x_2) \ge 0 \text{ for almost all } x_1, x_2 \in [0,1] \Big\}$$
be the collection of continuous convex functions which are differentiable (almost everywhere) on [0, 1], and let H_L^r be the Hölder class with the Hölder exponent (or order) r ∈ (1, 2] and the Hölder constant L > 0, namely,
$$H_L^r := \Big\{ f : [0,1] \to \mathbb{R} \;\Big|\; |f'(x_1) - f'(x_2)| \le L|x_1 - x_2|^{\gamma} \ \ \forall\, x_1, x_2 \in [0,1] \Big\},$$
where γ := r − 1 ∈ (0, 1]. Furthermore, let CH(r, L) := C ∩ H_L^r be the collection of functions in both C and H_L^r.
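As an illustration (not part of the original development), the sampling model (1) is easy to reproduce numerically. The following Python sketch generates one sample of size n; the convex function f(x) = x² (which belongs to CH(2, L) for any L ≥ 2) and the noise level σ = 0.1 are arbitrary illustrative choices.

import numpy as np

# Sketch of model (1): y_i = f(x_i) + sigma * eps_i with x_i = i/n and
# iid standard normal errors eps_i; f and sigma are illustrative choices.
def sample_model(f, n, sigma=0.1, seed=None):
    rng = np.random.default_rng(seed)
    x = np.arange(1, n + 1) / n          # equally spaced design points i/n
    y = f(x) + sigma * rng.standard_normal(n)
    return x, y

x, y = sample_model(lambda t: t**2, n=500)   # f(t) = t**2 is convex on [0, 1]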
Given a function g : [0, 1] → R, its supremum norm (or simply sup-norm) is given by ‖g‖∞ := sup_{x∈[0,1]} |g(x)|. For estimation of unconstrained functions over H_L^r, it is known that for a fixed order r, there exists an estimator attaining the optimal rate of convergence (in terms of sample size n) over H_L^r in the sup-norm [26], [27]. In fact, the minimax sup-norm risk on H_L^r has the asymptotic order [27, Corollary 2.5]
$$\inf_{\hat f}\, \sup_{f \in H_L^r} \mathbb{E}\, \|\hat f - f\|_\infty \asymp L^{\frac{1}{2r+1}} \left(\frac{\sigma^2 \log n}{n}\right)^{\frac{r}{2r+1}}, \qquad (2)$$
where f̂ denotes an estimator of a true function f, E(·) is the expectation operator, inf_{f̂} denotes the infimum over all estimators on [0, 1], and, for two positive sequences (an) and (bn), an ≍ bn means that an/bn is bounded by two positive constants from below and above for all n sufficiently large.
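For a quick numerical sense of the order in (2) (an illustrative aside; the constants L = σ = 1 are placeholders), one may tabulate the rate for a few sample sizes:

import numpy as np

# Evaluate L**(1/(2r+1)) * (sigma**2 * log(n) / n)**(r/(2r+1)) from (2);
# here r = 2, so the exponent r/(2r+1) equals 2/5.
L, sigma, r = 1.0, 1.0, 2.0
for n in (10**2, 10**4, 10**6):
    rate = L**(1 / (2*r + 1)) * (sigma**2 * np.log(n) / n)**(r / (2*r + 1))
    print(n, rate)   # the rate decays slowly because of the log n factor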
The goal of this paper is to establish the same asymptotic
minimax rate on CH (r, L) with r ∈ (1, 2]. Specifically, the
main result of this paper is presented in the following theorem.
Theorem II.1. Let r ∈ (1, 2]. Then there exists a positive
constant C0 such that
$$\inf_{\hat f}\, \sup_{f \in C_H(r, L)} \mathbb{E}\, \|\hat f - f\|_\infty \asymp C_0 \left(\frac{\log n}{n}\right)^{\frac{r}{2r+1}}, \qquad (3)$$
where inf_{f̂} denotes the infimum over all convex estimators on [0, 1].
To prove this theorem, we first consider the minimax upper bound achieved by the convex B-spline estimator f̂B shown in [29, Theorem 4.1]. In particular, by choosing a suitable number of knots for the convex B-spline estimator, there exists a positive constant Cr depending only on r ∈ (1, 2] such that
$$\sup_{f \in C_H(r, L)} \mathbb{E}\, \|\hat f_B - f\|_\infty \le C_r\, L^{\frac{1}{2r+1}} \left(\frac{\sigma^2 \log n}{n}\right)^{\frac{r}{2r+1}}, \qquad \forall\, n,$$
where L > 0 is the Hölder constant. This gives the minimax upper bound over CH (r, L). In the next section (cf.
Section III), we further establish the minimax lower bound of
the optimal rate of convergence via a construction procedure
motivated by [27, Theorem 2.10]. Combining these results
yields the desired optimal rate in (3).
It is worth noting that under the L2-norm and suitable assumptions, the minimax rate of order n^{−2/5} has been established
for a slightly different class of convex functions in [17]. This
rate is faster than the obtained minimax rate of convergence
in the sup-norm, since the sup-norm characterizes worst-case
performance of a convex estimator.
Remark II.1. When r > 2, under certain assumptions on
CH (r, L), the convex estimation is asymptotically close to
unconstrained estimation; see the minimax upper bound in [29,
Theorem 4.2]. Its minimax lower bound can be established
using unconstrained minimax lower bound techniques in [27].
III. MINIMAX LOWER BOUND OF CONVEX ESTIMATORS
In this section, we show the minimax lower bound for
convex estimation of functions in CH (r, L) with r ∈ (1, 2]
in the sup-norm. The key idea of developing such a lower
bound for nonparametric estimators relies on tools for distance
of multiple probability measures or hypotheses [2]; see [27,
Section 2] for detailed discussions. It follows from minimax
theory (e.g., [27, Theorem 2.5]) that establishing a minimax
lower bound over the function class CH (r, L) in the sup-norm
boils down to the construction of a family of functions (or
hypotheses) fj,n , j = 0, 1, . . . , Mn satisfying the following
three conditions:
(C1) each fj,n ∈ CH (r, L), j = 0, 1, . . . , Mn ;
(C2) whenever j ≠ k, ‖fj,n − fk,n‖∞ ≥ 2sn > 0, where sn ≍ (log n/n)^{r/(2r+1)};
(C3) there exists a fixed constant c0 ∈ (0, 1/8) such that for all n sufficiently large,
$$\frac{1}{M_n} \sum_{j=1}^{M_n} K(P_j, P_0) \le c_0 \log(M_n),$$
where Pj denotes the distribution of (Yj,1, ..., Yj,n), Yj,i = fj,n(Xi) + ξi, i = 1, ..., n, Xi = i/n, the ξi are iid random variables, and K(P, Q) denotes the Kullback divergence between the two probability measures P and Q [12], i.e.,
$$K(P, Q) := \begin{cases} \displaystyle \int \log \frac{dP}{dQ}\, dP, & \text{if } P \ll Q, \\[4pt] +\infty, & \text{otherwise.} \end{cases}$$
In addition, we assume that there exists a constant p∗ > 0 (independent of n and fj,n) such that K(Pj, P0) ≤ p∗ Σ_{i=1}^n (fj,n(Xi) − f0,n(Xi))². This assumption holds true if the iid random variables ξi ∼ N(0, σ²) (cf. [27, (2.36)] or [27, Section 2.5, Assumption B]). Hence, the convex estimation problem defined in (1) satisfies this assumption.
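For instance, in the Gaussian case one can take p∗ = 1/(2σ²): when the ξi are iid N(0, σ²), the Kullback divergence between the product measures is available in closed form,
$$K(P_j, P_0) = \sum_{i=1}^n K\Big(\mathcal{N}\big(f_{j,n}(X_i), \sigma^2\big),\, \mathcal{N}\big(f_{0,n}(X_i), \sigma^2\big)\Big) = \frac{1}{2\sigma^2} \sum_{i=1}^n \big(f_{j,n}(X_i) - f_{0,n}(X_i)\big)^2,$$
so the assumed inequality holds with equality.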
In other words, once a family of functions {fj,n} satisfying the above three conditions is constructed, the following minimax lower bound over CH(r, L) holds:
$$\liminf_{n \to \infty}\, \inf_{\hat f_n}\, \sup_{f \in C_H(r, L)} \left(\frac{n}{\log n}\right)^{\frac{r}{2r+1}} \mathbb{E}\big(\|\hat f_n - f\|_\infty\big) \ge c \qquad (4)$$
for some constant c > 0 depending on r, L, and p∗ only, where inf_{f̂n} denotes the infimum over all convex estimators on [0, 1]. In view of this, the goal of this section is to construct a family of suitable functions fj,n satisfying (C1)–(C3).
A. Construction of the Desired Functions fj,n
Consider the function class CH (r, L) with r ∈ (1, 2] and
L > 0, and fix c0 ∈ (0, 1/8). Given a sample size n, let Kn
be a positive number depending on n, whose order of n will
be specified below. We construct the desired functions fj,n in
two separate cases:
Case 1: γ := r − 1 ∈ (0, 1). Let
$$\bar L := \min\left\{\frac{L}{4},\; \sqrt{\frac{c_0 \gamma}{12 p_*}}\right\},$$
where p∗ > 0 is defined above. We shall define the functions fj,n, j = 0, 1, 2, ..., ⌊Kn^γ⌋ as follows, where ⌊·⌋ denotes the floor function. First we define the auxiliary functions ḡj,n for j = 0, 1, ..., ⌊Kn^γ⌋. For i = 0, 1, 2, ..., let
$$\bar g_{0,n}(x) := \begin{cases} 2i\bar L K_n^{-\gamma} + \bar L K_n^{1-\gamma}\big(x - \tfrac{i}{K_n^{\gamma}}\big), & \text{if } x \in \big[\tfrac{i}{K_n^{\gamma}},\, \tfrac{i}{K_n^{\gamma}} + \tfrac{1}{K_n}\big); \\[4pt] (2i+1)\bar L K_n^{-\gamma}, & \text{if } x \in \big[\tfrac{i}{K_n^{\gamma}} + \tfrac{1}{K_n},\, \tfrac{i}{K_n^{\gamma}} + \tfrac{3}{K_n}\big); \\[4pt] (2i+1)\bar L K_n^{-\gamma} + \bar L K_n^{1-\gamma}\big(x - \tfrac{i}{K_n^{\gamma}} - \tfrac{3}{K_n}\big), & \text{if } x \in \big[\tfrac{i}{K_n^{\gamma}} + \tfrac{3}{K_n},\, \tfrac{i}{K_n^{\gamma}} + \tfrac{4}{K_n}\big); \\[4pt] 2(i+1)\bar L K_n^{-\gamma}, & \text{if } x \in \big[\tfrac{i}{K_n^{\gamma}} + \tfrac{4}{K_n},\, \tfrac{i+1}{K_n^{\gamma}}\big). \end{cases}$$
For each j = 1, 2, ..., ⌊Kn^γ⌋, let ḡj,n = ḡ0,n everywhere except on [(j−1)/Kn^γ, j/Kn^γ), on which ḡj,n is defined as follows:
$$\bar g_{j,n}(x) := \begin{cases} 2(j-1)\bar L K_n^{-\gamma}, & \text{if } x \in \big[\tfrac{j-1}{K_n^{\gamma}},\, \tfrac{j-1}{K_n^{\gamma}} + \tfrac{1}{K_n}\big); \\[4pt] 2(j-1)\bar L K_n^{-\gamma} + \bar L K_n^{1-\gamma}\big(x - \tfrac{j-1}{K_n^{\gamma}} - \tfrac{1}{K_n}\big), & \text{if } x \in \big[\tfrac{j-1}{K_n^{\gamma}} + \tfrac{1}{K_n},\, \tfrac{j-1}{K_n^{\gamma}} + \tfrac{3}{K_n}\big); \\[4pt] 2j\bar L K_n^{-\gamma}, & \text{if } x \in \big[\tfrac{j-1}{K_n^{\gamma}} + \tfrac{3}{K_n},\, \tfrac{j}{K_n^{\gamma}}\big). \end{cases}$$
For each j = 0, 1, 2, ..., ⌊Kn^γ⌋, let gj,n denote the restriction of ḡj,n to [0, 1].

[Fig. 1 (not reproduced): plot of the gj,n's, j = 0, 1, 2, 3, near the origin when γ ∈ (0, 1).]
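One possible numerical realization of the Case 1 construction is sketched below in Python (the vectorization is our choice; the piecewise formulas follow the two displays above, and the code assumes Kn is large enough that 4/Kn ≤ 1/Kn^γ).

import numpy as np

def g0(x, Kn, Lbar, gamma):
    # \bar g_{0,n}: on each block [i/Kn^gamma, (i+1)/Kn^gamma), rise for 1/Kn,
    # stay flat for 2/Kn, rise again for 1/Kn, then stay flat to the block end.
    w = Kn**(-gamma)                      # block width 1/Kn^gamma
    i = np.floor(x / w)
    t = x - i * w                         # position within the current block
    base = 2 * i * Lbar * w               # value at the left end of the block
    slope = Lbar * Kn**(1 - gamma)
    return np.where(t < 1/Kn, base + slope * t,
           np.where(t < 3/Kn, base + Lbar * w,
           np.where(t < 4/Kn, base + Lbar * w + slope * (t - 3/Kn),
                    base + 2 * Lbar * w)))

def gj(x, j, Kn, Lbar, gamma):
    # \bar g_{j,n}: equals \bar g_{0,n} except on [(j-1)/Kn^gamma, j/Kn^gamma),
    # where the two rises are replaced by one delayed rise of length 2/Kn.
    w = Kn**(-gamma)
    t = x - (j - 1) * w
    base = 2 * (j - 1) * Lbar * w
    slope = Lbar * Kn**(1 - gamma)
    mod = np.where(t < 1/Kn, base,
          np.where(t < 3/Kn, base + slope * (t - 1/Kn),
                   base + 2 * Lbar * w))
    return np.where((t >= 0) & (t < w), mod, g0(x, Kn, Lbar, gamma))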
Case 2: γ = 1. In this case, choose
$$\bar L := \min\left\{L,\; \sqrt{\frac{c_0}{12 p_*}}\right\},$$
and define for i = 0, 1, 2, ...,
$$\bar g_{0,n}(x) := \begin{cases} \bar L\, \tfrac{2i}{K_n} + \bar L\big(x - \tfrac{4i}{K_n}\big), & \text{if } x \in \big[\tfrac{4i}{K_n},\, \tfrac{4i+1}{K_n}\big); \\[4pt] \bar L\, \tfrac{2i+1}{K_n}, & \text{if } x \in \big[\tfrac{4i+1}{K_n},\, \tfrac{4i+3}{K_n}\big); \\[4pt] \bar L\, \tfrac{2i+1}{K_n} + \bar L\big(x - \tfrac{4i+3}{K_n}\big), & \text{if } x \in \big[\tfrac{4i+3}{K_n},\, \tfrac{4(i+1)}{K_n}\big). \end{cases}$$
Also define ḡj,n = ḡ0,n everywhere except on [4(j−1)/Kn, 4j/Kn), on which
$$\bar g_{j,n}(x) := \begin{cases} \bar L\, \tfrac{2(j-1)}{K_n}, & \text{if } x \in \big[\tfrac{4(j-1)}{K_n},\, \tfrac{4j-3}{K_n}\big); \\[4pt] \bar L\, \tfrac{2(j-1)}{K_n} + \bar L\big(x - \tfrac{4j-3}{K_n}\big), & \text{if } x \in \big[\tfrac{4j-3}{K_n},\, \tfrac{4j-1}{K_n}\big); \\[4pt] \bar L\, \tfrac{2j}{K_n}, & \text{if } x \in \big[\tfrac{4j-1}{K_n},\, \tfrac{4j}{K_n}\big), \end{cases}$$
for j = 1, 2, ..., ⌊Kn⌋. Again, for each j, we let gj,n denote the restriction of ḡj,n to [0, 1].

[Fig. 2 (not reproduced): plot of the gj,n's, j = 0, 1, 2, 3, near the origin when γ = 1, with marks at 4/Kn, 8/Kn, 12/Kn.]
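A corresponding sketch for the Case 2 (γ = 1) functions, with blocks of width 4/Kn and slope L̄ (again an illustrative implementation, not part of the proof):

import numpy as np

def g0_case2(x, Kn, Lbar):
    # Case 2 analogue of \bar g_{0,n}: rise-flat-rise on each [4i/Kn, 4(i+1)/Kn).
    i = np.floor(x * Kn / 4)
    t = x - 4 * i / Kn
    base = 2 * i * Lbar / Kn
    return np.where(t < 1/Kn, base + Lbar * t,
           np.where(t < 3/Kn, base + Lbar / Kn,
                    base + Lbar / Kn + Lbar * (t - 3/Kn)))

def gj_case2(x, j, Kn, Lbar):
    # \bar g_{j,n} for gamma = 1: single delayed rise on [4(j-1)/Kn, 4j/Kn).
    t = x - 4 * (j - 1) / Kn
    base = 2 * (j - 1) * Lbar / Kn
    mod = np.where(t < 1/Kn, base,
          np.where(t < 3/Kn, base + Lbar * (t - 1/Kn),
                   base + 2 * Lbar / Kn))
    return np.where((t >= 0) & (t < 4/Kn), mod, g0_case2(x, Kn, Lbar))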
The plots of the functions gj,n, j = 0, 1, 2, ..., ⌊Kn^γ⌋ near the origin constructed above are displayed in Figures 1 and 2 for Case 1 and Case 2, respectively. Note that in these plots, g0,n often obstructs the view of the other gj,n's, but if j ≥ 1, then gj,n never obstructs the view of any other function.

Finally, in both cases, for each j = 0, 1, 2, ..., ⌊Kn^γ⌋, define
$$f_{j,n}(x) := \int_0^x g_{j,n}(t)\, dt, \qquad x \in [0, 1]. \qquad (5)$$
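Numerically, definition (5) amounts to cumulative integration of gj,n. The small Python sketch below (our illustration; the trapezoidal rule is an arbitrary quadrature choice) builds fj,n from the derivative sketches given earlier.

import numpy as np

def f_j(xgrid, gvals):
    # f_{j,n}(x) = int_0^x g_{j,n}(t) dt, approximated on a fine grid by the
    # trapezoidal rule; gvals are the values of g_{j,n} on xgrid.
    incr = 0.5 * (gvals[1:] + gvals[:-1]) * np.diff(xgrid)
    return np.concatenate([[0.0], np.cumsum(incr)])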
We present the following theorem for the above construction, whose proof is given in Section III-B.

Theorem III.1. Consider the function class CH(r, L) with r ∈ (1, 2], L > 0, and γ := r − 1. Let Kn = (n/log n)^{1/(2r+1)} and Mn := ⌊Kn^γ⌋. Then the functions fj,n, j = 0, 1, ..., Mn constructed in (5) satisfy conditions (C1)–(C3). Specifically, for all n sufficiently large,
(1) each fj,n ∈ CH(r, L);
(2) for all j, k ∈ {0, 1, ..., Mn} with j ≠ k, ‖fj,n − fk,n‖∞ = 2sn, where sn ≍ (log n/n)^{r/(2r+1)};
(3) (1/Mn) Σ_{j=1}^{Mn} K(Pj, P0) ≤ c0 log(Mn).

This theorem, together with the similar argument in [27, Theorem 2.5], leads to the lower bound of the minimax risk of convex estimation in (4).

B. Proof of Theorem III.1

Proof: We consider the two cases: γ ∈ (0, 1), and γ = 1.

Case 1: γ ∈ (0, 1). For all n (and Kn) sufficiently large, the following properties of the gj,n's can be easily verified with the help of Figure 1: for any x, y ∈ [0, 1],

(i) if 0 < |x − y| ≤ 4/Kn, then
$$\max_j \frac{|g_{j,n}(x) - g_{j,n}(y)|}{|x - y|} \;\le\; \frac{|g_{1,n}(x) - g_{1,n}(y)|}{|x - y|}\bigg|_{x = K_n^{-1},\, y = 2K_n^{-1}} \;\le\; \bar L K_n^{1-\gamma};$$

(ii) if 4/Kn < |x − y| ≤ 1/Kn^γ, then
$$\max_j\, |g_{j,n}(x) - g_{j,n}(y)| \;\le\; |g_{0,n}(x) - g_{0,n}(y)|\,\big|_{x = 0,\, y = 4K_n^{-1}} \;\le\; 2\bar L K_n^{-\gamma};$$

(iii) if 1/Kn^γ < |x − y| ≤ 1, then, letting x < y without loss of generality and writing y = qKn^{−γ} + s(x, y) for some q ∈ ℕ and 0 ≤ s(x, y) < Kn^{−γ}, it can be shown that
$$\max_j \frac{|g_{j,n}(x) - g_{j,n}(y)|}{|x - y|} \le \frac{|g_{1,n}(x) - g_{1,n}(y)|}{|x - y|}\bigg|_{x = K_n^{-1},\, y = qK_n^{-\gamma} + 3K_n^{-1}} \le \frac{2(q+1)\bar L K_n^{-\gamma}}{qK_n^{-\gamma} + 2K_n^{-1}} \le \frac{2(q+1)\bar L K_n^{-\gamma}}{qK_n^{-\gamma}} \le 4\bar L \le L.$$

Along with these properties, we show that the three conditions hold as follows:

(1) Obviously, each gj,n is nondecreasing, and hence each fj,n is convex. Furthermore, to show that each function fj,n is in the Hölder class H_L^r, we consider the following three cases:

(1.1) 0 < |x − y| ≤ 4/Kn. Then, by (i), we have
$$\frac{|f'_{j,n}(x) - f'_{j,n}(y)|}{|x - y|^{\gamma}} = \frac{|f'_{j,n}(x) - f'_{j,n}(y)|}{|x - y|}\, |x - y|^{1-\gamma} \le \bar L K_n^{1-\gamma} \left(\frac{4}{K_n}\right)^{1-\gamma} \le 4\bar L \le L.$$

(1.2) 4/Kn < |x − y| ≤ 1/Kn^γ. Then, by (ii), we have
$$\frac{|f'_{j,n}(x) - f'_{j,n}(y)|}{|x - y|^{\gamma}} \le \frac{2\bar L K_n^{-\gamma}}{|x - y|^{\gamma}} \le 2\bar L K_n^{-\gamma} \left(\frac{4}{K_n}\right)^{-\gamma} \le 2\bar L \le L.$$

(1.3) 1/Kn^γ < |x − y| ≤ 1. By (iii), we obtain
$$\frac{|f'_{j,n}(x) - f'_{j,n}(y)|}{|x - y|^{\gamma}} \le \frac{|f'_{j,n}(x) - f'_{j,n}(y)|}{|x - y|} \le L.$$

This shows that condition (1) holds.

(2) Let j, k ∈ {0, 1, ..., Mn} with j < k without loss of generality. It follows from the definitions of gj,n and fj,n and Figure 1 that
(2.1) if j = 0, then f′j,n(x) = f′k,n(x) for all x ∈ [0, 1] except on the set
$$S_{0k} := \Big[\frac{k-1}{K_n^{\gamma}},\, \frac{k-1}{K_n^{\gamma}} + \frac{4}{K_n}\Big) \,\Big\backslash\, \Big\{\frac{k-1}{K_n^{\gamma}} + \frac{2}{K_n}\Big\};$$
(2.2) if j ≥ 1, then f′j,n(x) = f′k,n(x) for all x ∈ [0, 1] except on the set
$$S_{jk} := \bigg(\Big[\frac{j-1}{K_n^{\gamma}},\, \frac{j-1}{K_n^{\gamma}} + \frac{4}{K_n}\Big) \Big\backslash \Big\{\frac{j-1}{K_n^{\gamma}} + \frac{2}{K_n}\Big\}\bigg) \cup \bigg(\Big[\frac{k-1}{K_n^{\gamma}},\, \frac{k-1}{K_n^{\gamma}} + \frac{4}{K_n}\Big) \Big\backslash \Big\{\frac{k-1}{K_n^{\gamma}} + \frac{2}{K_n}\Big\}\bigg).$$

Hence, the set of critical points of fj,n − fk,n is [0, 1] \ Sjk. Furthermore, in view of the piecewise linearity of the gj,n's, it is easy to see that
(a) f′j,n(x) − f′k,n(x) = gj,n(x) − gk,n(x) > 0 for all x ∈ ((k−1)Kn^{−γ} + 1/Kn, (k−1)Kn^{−γ} + 2/Kn), and f′j,n(x) − f′k,n(x) < 0 for all x ∈ ((k−1)Kn^{−γ} + 2/Kn, (k−1)Kn^{−γ} + 3/Kn);
(b) for case (2.2), f′j,n(x) − f′k,n(x) < 0 for all x ∈ ((j−1)Kn^{−γ} + 1/Kn, (j−1)Kn^{−γ} + 2/Kn), and f′j,n(x) − f′k,n(x) > 0 for all x ∈ ((j−1)Kn^{−γ} + 2/Kn, (j−1)Kn^{−γ} + 3/Kn);
(c) g′j,n(x) − g′k,n(x) = 0 for all x ∈ [0, 1] \ Sjk except x = (k−1)Kn^{−γ} + 2/Kn and x = (j−1)Kn^{−γ} + 2/Kn (if j ≥ 1).

Moreover, fj,n(x) = fk,n(x) for x = 0, 1. This shows that |fj,n(x) − fk,n(x)| achieves a local maximum at x∗ = (k−1)Kn^{−γ} + 2/Kn and/or z∗ := (j−1)Kn^{−γ} + 2/Kn (the latter holds only if j ≥ 1). Due to the symmetry of the non-overlapping regions of gj,n and gk,n, we have ‖fj,n − fk,n‖∞ = |fj,n(x∗) − fk,n(x∗)| = |fj,n(z∗) − fk,n(z∗)|.
Furthermore, it can be verified that fj,n(x) = fk,n(x) at x = (k−1)Kn^{−γ}. Therefore,
$$\begin{aligned}
\|f_{j,n} - f_{k,n}\|_\infty &= \Big|\, f_{j,n}\Big((k-1)K_n^{-\gamma} + \frac{2}{K_n}\Big) - f_{k,n}\Big((k-1)K_n^{-\gamma} + \frac{2}{K_n}\Big) \Big| \\
&= (f_{j,n} - f_{k,n})\Big((k-1)K_n^{-\gamma} + \frac{2}{K_n}\Big) - (f_{j,n} - f_{k,n})\big((k-1)K_n^{-\gamma}\big) \\
&= \int_{(k-1)K_n^{-\gamma}}^{(k-1)K_n^{-\gamma} + 2/K_n} \big(g_{j,n}(t) - g_{k,n}(t)\big)\, dt \\
&= 2 \int_{(k-1)K_n^{-\gamma}}^{(k-1)K_n^{-\gamma} + 1/K_n} \bar L K_n^{1-\gamma}\big(t - (k-1)K_n^{-\gamma}\big)\, dt \\
&= 2 \int_0^{1/K_n} \bar L K_n^{1-\gamma}\, t\, dt = \bar L K_n^{-(1+\gamma)} = \bar L K_n^{-r} = 2 s_n,
\end{aligned}$$
where sn := L̄Kn^{−r}/2 = (L̄/2)(log n/n)^{r/(2r+1)}, and thus condition (2) holds.

(3) To show this condition, we first collect a few results about the fj,n's to be used later:

(3.1) For each j = 1, 2, ..., Mn,
$$\int_0^1 \big(f_{j,n}(x) - f_{0,n}(x)\big)^2 dx = \int_0^1 \big(f_{1,n}(x) - f_{0,n}(x)\big)^2 dx.$$

(3.2) Let hj := (fj,n − f0,n)² for j = 1, ..., Mn. Since Xi = i/n, it follows from the analysis of numerical integration and condition (2) that for each j,
$$\left| \int_0^1 h_j(x)\, dx - \frac{1}{n} \sum_{i=1}^n h_j(X_i) \right| \le \frac{\max_{x \in [0,1]} |h'_j(x)|}{2n} \le \frac{\|g_{j,n} - g_{0,n}\|_\infty \cdot \|f_{j,n} - f_{0,n}\|_\infty}{n} \le \frac{2 \bar L^2}{n K_n^{1+2\gamma}}.$$

(3.3) The following holds:
$$\begin{aligned}
\int_0^1 \big(f_{1,n}(x) - f_{0,n}(x)\big)^2 dx &= 2 \int_0^{2/K_n} \big(f_{1,n}(x) - f_{0,n}(x)\big)^2 dx \\
&= 2 \int_0^{1/K_n} \left(\frac{\bar L K_n^{1-\gamma} x^2}{2}\right)^2 dx + 2 \int_{1/K_n}^{2/K_n} \left(\frac{\bar L K_n^{-(1+\gamma)}}{2} + \bar L K_n^{-\gamma}\Big(x - \frac{1}{K_n}\Big) - \frac{\bar L K_n^{1-\gamma}}{2}\Big(x - \frac{1}{K_n}\Big)^2\right)^2 dx \\
&= 2 \bar L^2 K_n^{-2\gamma-3}\left(\frac{1}{20} + \frac{43}{60}\right) = \frac{2\bar L^2}{K_n^{2r+1}} \cdot \frac{46}{60}.
\end{aligned}$$

In light of the above results, we have for each j = 1, ..., Mn,
$$\begin{aligned}
K(P_j, P_0) &\le p_* \sum_{i=1}^n \big(f_{j,n}(X_i) - f_{0,n}(X_i)\big)^2 \\
&\le p_* \left( n \int_0^1 \big(f_{j,n}(x) - f_{0,n}(x)\big)^2 dx + \frac{2\bar L^2}{K_n^{1+2\gamma}} \right) \\
&= p_* \left( n \int_0^1 \big(f_{1,n}(x) - f_{0,n}(x)\big)^2 dx + \frac{2\bar L^2}{K_n^{1+2\gamma}} \right) \\
&\le p_* \left( \frac{2 n \bar L^2}{K_n^{2r+1}} \cdot \frac{46}{60} + \frac{2\bar L^2}{K_n^{1+2\gamma}} \right) \le \frac{c_0 \gamma}{6} \log(n)
\end{aligned}$$
for all n sufficiently large, where the last inequality follows from the definition of L̄ and the order of Kn. Consequently,
$$\frac{1}{M_n} \sum_{j=1}^{M_n} K(P_j, P_0) \le \frac{c_0 \gamma}{6} \log(n).$$

Finally, since γ ∈ (0, 1), we have, for all n sufficiently large,
$$\log M_n \ge 0.9\gamma \log K_n = 0.9\gamma \log\left( \Big(\frac{n}{\log n}\Big)^{\frac{1}{2r+1}} \right) = \frac{0.9\gamma}{2\gamma + 3} \log\left(\frac{n}{\log n}\right) \ge \frac{\gamma}{6} \log n,$$
so that (c0γ/6) log n ≤ c0 log Mn. This establishes condition (3) and hence completes the proof for Case 1.
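As a sanity check (ours, not part of the proof), the constant 1/20 + 43/60 = 46/60 in (3.3) can be confirmed numerically with the sketches introduced earlier; Kn, L̄, and γ below are arbitrary test values.

import numpy as np

# Compare the numerical L2 distance with 2 * Lbar**2 * Kn**(-2*gamma - 3) * 46/60,
# reusing g0, gj, and f_j from the earlier sketches.
Kn, Lbar, gamma = 50.0, 0.3, 0.5
xs = np.linspace(0.0, 1.0, 2_000_001)
f0 = f_j(xs, g0(xs, Kn, Lbar, gamma))
f1 = f_j(xs, gj(xs, 1, Kn, Lbar, gamma))
lhs = np.trapz((f1 - f0)**2, xs)
rhs = 2 * Lbar**2 * Kn**(-2*gamma - 3) * (1/20 + 43/60)
print(lhs, rhs)   # the two values should agree up to discretization error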
Case 2: γ = 1. We show that the following three conditions hold in a similar manner as in Case 1:

(1) Clearly, each fj,n is convex on [0, 1]. Further, it is easy to show via the definition of gj,n and Figure 2 that for any 0 ≤ x < y ≤ 1, |gj,n(x) − gj,n(y)|/|x − y| ≤ L̄ ≤ L for each j = 0, 1, ..., ⌊Kn⌋. This thus implies that each fj,n ∈ CH(r, L), leading to condition (1).

(2) Let 0 ≤ j < k ≤ Mn := ⌊Kn⌋. It follows from a similar argument as in (2) of Case 1 that ‖fj,n − fk,n‖∞ is achieved at x∗ = (4(k−1)+2)/Kn and fj,n(x) = fk,n(x) at x = 4(k−1)/Kn. Therefore, we have
$$\begin{aligned}
\|f_{j,n} - f_{k,n}\|_\infty &= \Big|\, f_{j,n}\Big(\frac{4(k-1)+2}{K_n}\Big) - f_{k,n}\Big(\frac{4(k-1)+2}{K_n}\Big) \Big| \\
&= (f_{j,n} - f_{k,n})\Big(\frac{4(k-1)+2}{K_n}\Big) - (f_{j,n} - f_{k,n})\Big(\frac{4(k-1)}{K_n}\Big) \\
&= \int_{4(k-1)/K_n}^{(4(k-1)+2)/K_n} \big(g_{j,n}(t) - g_{k,n}(t)\big)\, dt \\
&= 2 \int_{4(k-1)/K_n}^{(4(k-1)+1)/K_n} \bar L \Big(t - \frac{4(k-1)}{K_n}\Big)\, dt = 2 \int_0^{1/K_n} \bar L\, t\, dt = \bar L K_n^{-2} = 2 s_n,
\end{aligned}$$
where sn := L̄Kn^{−2}/2 = (L̄/2)(log n/n)^{2/5} (for r = 2), and thus condition (2) holds for γ = 1.
(3) First of all, it is easy to see that the conditions in (3.1) and (3.2) in Case 1 remain valid for γ = 1. To show the condition in (3.3) for γ = 1, we have
$$\begin{aligned}
\int_0^1 \big(f_{1,n}(x) - f_{0,n}(x)\big)^2 dx &= 2 \int_0^{2/K_n} \big(f_{1,n}(x) - f_{0,n}(x)\big)^2 dx \\
&= 2 \int_0^{1/K_n} \left(\frac{\bar L x^2}{2}\right)^2 dx + 2 \int_{1/K_n}^{2/K_n} \left(\frac{\bar L}{2 K_n} + \bar L\Big(x - \frac{1}{K_n}\Big) - \frac{\bar L}{2}\Big(x - \frac{1}{K_n}\Big)^2\right)^2 dx \\
&= 2 \bar L^2 K_n^{-5} \left(\frac{1}{20} + \frac{43}{60}\right).
\end{aligned}$$
By using these results and a similar argument as in (3) of Case 1, we have, for all n sufficiently large, K(Pj, P0) ≤ (c0/6) log(n) for each j = 1, ..., Mn, so that
$$\frac{1}{M_n} \sum_{j=1}^{M_n} K(P_j, P_0) \le \frac{c_0}{6} \log(n).$$
Again, in view of
$$\log M_n \ge 0.9 \log K_n = 0.9 \log\left( \Big(\frac{n}{\log n}\Big)^{\frac{1}{5}} \right) = \frac{0.9}{5} \log\left(\frac{n}{\log n}\right) \ge \frac{1}{6} \log n$$
for all n sufficiently large, we obtain condition (3). This completes the proof for Case 2.
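Similarly, the sup-norm separation in condition (2) for γ = 1 can be checked numerically against 2sn = L̄Kn^{−2} (again a sketch with arbitrary test values, reusing g0_case2, gj_case2, and f_j from the earlier sketches):

import numpy as np

Kn, Lbar = 40.0, 0.2
xs = np.linspace(0.0, 1.0, 1_000_001)
f0 = f_j(xs, g0_case2(xs, Kn, Lbar))
f2 = f_j(xs, gj_case2(xs, 2, Kn, Lbar))
print(np.max(np.abs(f2 - f0)), Lbar * Kn**(-2))  # both should equal 2*s_n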
IV. CONCLUDING REMARKS

This paper has established the minimax lower bound and the optimal rate of convergence for convex estimators under the sup-norm. The results developed in this paper shed light on further research on shape constrained minimax theory, and can be extended to general derivative constraints.

Acknowledgements. The authors thank the reviewers and the associate editor for their comments and suggestions.

REFERENCES

[1] P. Bellec and A. Tsybakov. Sharp oracle bounds for monotone and convex regression through aggregation. arXiv preprint, arXiv:1506.08724, 2015.
[2] T. Cover and J. Thomas. Elements of Information Theory. Wiley, 2005.
[3] D. Donoho. Asymptotic minimax risk for sup-norm loss: solution via optimal recovery. Probability Theory and Related Fields, Vol. 99, pp. 145–170, 1994.
[4] M. Egerstedt and C. Martin. Control Theoretic Splines. Princeton University Press, 2010.
[5] P. Groeneboom, G. Jongbloed, and J. Wellner. Estimation of a convex function: Characterizations and asymptotic theory. Annals of Statistics, Vol. 29, pp. 1653–1698, 2001.
[6] A. Guntuboyina and B. Sen. Covering numbers for convex functions. IEEE Trans. on Information Theory, Vol. 59(4), pp. 1957–1965, 2013.
[7] A. Guntuboyina and B. Sen. Global risk bounds and adaptation in univariate convex regression. Probability Theory and Related Fields, Vol. 163(1-2), pp. 379–411, 2015.
[8] P. Hanson and G. Pledger. Consistency in concave regression. Annals of Statistics, Vol. 4, pp. 1038–1050, 1976.
[9] A. Juditsky and A. Nazin. Information lower bounds for stochastic adaptive tracking problem under nonparametric uncertainty. Proc. of the 36th IEEE Conf. Decision and Control, pp. 3476–3477, San Diego, CA, 1997.
[10] A. Juditsky and A. Nazin. On minimax approach to nonparametric adaptive control. International Journal of Adaptive Control and Signal Processing, Vol. 15(2), pp. 153–168, 2001.
[11] J. Kiefer. Optimum rates for non-parametric density and regression estimates under order restrictions. In: Kallianpur, G., Krishnaiah, P. R., Ghosh, J. K. (Eds.), Statistics and Probability. North-Holland, Amsterdam, pp. 419–428, 1982.
[12] S. Kullback. A lower bound for discrimination information in terms of variation. IEEE Trans. on Information Theory, Vol. 13, pp. 126–127, 1967.
[13] O. Lepski and A. Tsybakov. Asymptotically exact nonparametric hypothesis testing in sup-norm and at a fixed point. Probability Theory and Related Fields, Vol. 117, pp. 17–48, 2000.
[14] E. Mammen. Nonparametric regression under qualitative smoothness assumptions. Annals of Statistics, Vol. 19, pp. 741–759, 1991.
[15] A. Nazin and V. Katkovnik. Minimax lower bound for time-varying frequency estimation of harmonic signal. IEEE Trans. on Signal Processing, Vol. 46(12), pp. 3235–3245, 1998.
[16] A. Nemirovski. Topics in Non-parametric Statistics. Lectures on Probability Theory and Statistics, Lecture Notes in Mathematics, Vol. 1738, Springer-Verlag, Berlin, 2000.
[17] A. Nemirovski, B. Polyak, and A. Tsybakov. Convergence rate of nonparametric estimates of maximum-likelihood type. Problems of Information Transmission, Vol. 21(4), pp. 17–33, 1985.
[18] J. Pal and M. Woodroofe. Large sample properties of shape restricted regression estimators with smoothness adjustments. Statistica Sinica, Vol. 17, pp. 1601–1616, 2007.
[19] C. Rao, J. Rawlings, and D. Mayne. Constrained state estimation for nonlinear discrete-time systems: Stability and moving horizon approximations. IEEE Trans. on Automatic Control, Vol. 48(2), pp. 246–258, 2003.
[20] J. Shen and T. M. Lebair. Shape restricted smoothing splines via constrained optimal control and nonsmooth Newton's methods. Automatica, Vol. 53, pp. 216–224, 2015.
[21] J. Shen and X. Wang. Estimation of shape constrained functions in dynamical systems and its application to genetic networks. Proc. of American Control Conf., pp. 5948–5953, 2010.
[22] J. Shen and X. Wang. Estimation of monotone functions via P-splines: A constrained dynamical optimization approach. SIAM J. on Control and Optimization, Vol. 49(2), pp. 646–671, 2011.
[23] J. Shen and X. Wang. A constrained optimal control approach to smoothing splines. Proc. of the 50th IEEE Conf. Decision and Control, pp. 1729–1734, Orlando, FL, 2011.
[24] J. Shen and X. Wang. Convex regression via penalized splines: a complementarity approach. Proc. of 2012 American Control Conference, pp. 332–337, Montreal, Canada, 2012.
[25] S. Sun, M. Egerstedt, and C. Martin. Control theoretic smoothing splines. IEEE Trans. on Automatic Control, Vol. 45(12), pp. 2271–2279, 2000.
[26] C. Stone. Optimal rate of convergence for nonparametric regression. Annals of Statistics, Vol. 10, pp. 1040–1053, 1982.
[27] A. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2010.
[28] X. Wang and J. Shen. A class of grouped Brunk estimators and penalized spline estimators for monotone regression. Biometrika, Vol. 97(3), pp. 585–601, 2010.
[29] X. Wang and J. Shen. Uniform convergence and rate adaptive estimation of convex functions via constrained optimization. SIAM Journal on Control and Optimization, Vol. 51(4), pp. 2753–2787, 2013.