Coordinate descent algorithms for nonconvex optimization

Patrick Breheny
University of Kentucky
March 23, 2010

Outline:
- Introduction
- Coordinate descent algorithms for SCAD and MCP
- Convexity and convergence
- Statistical properties

High-dimensional studies

Studies in modern biology and medicine often obtain a large number of potential predictors for each individual:
- Gene expression studies record the abundance of tens of thousands of gene transcripts
- Genetic association studies determine an individual's genotype at hundreds of thousands of genetic markers
- Proteomics, metabolomics, etc. are similar

Penalized regression

To study the joint effect of these biological features on a clinical outcome, or to use the features to predict outcomes in future subjects, we need some sort of model that can overcome the curse of dimensionality
Penalized regression models, which impose a constraint/penalty/prior on the regression coefficients, have proven successful in such applications
In such models, we seek to minimize

    Q(β) = L(X, y, β) + \sum_{j=1}^{p} P_λ(β_j),

where L(X, y, β) is the usual loss function of the regression model, P_λ(·) is a penalty function, and λ is a regularization parameter that controls the balance between fit and penalty
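To make the objective concrete, here is a minimal Python sketch for squared-error loss with the lasso penalty plugged in as P_λ; the function names and the (1/2n) loss scaling are illustrative choices, not taken from the slides:

```python
import numpy as np

def lasso_penalty(beta, lam):
    """Lasso penalty: P_lambda(beta_j) = lambda * |beta_j| for each coordinate."""
    return lam * np.abs(beta)

def objective(X, y, beta, lam, penalty=lasso_penalty):
    """Q(beta) = L(X, y, beta) + sum_j P_lambda(beta_j),
    using squared-error loss L = (1/2n) * ||y - X beta||^2 (one common scaling)."""
    n = len(y)
    loss = np.sum((y - X @ beta) ** 2) / (2 * n)
    return loss + np.sum(penalty(beta, lam))
```
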
Penalties

[Figure: the lasso, MCP, and SCAD penalties P (top row) and their derivatives P′ (bottom row), plotted against β with axis marks at −γλ, −λ, 0, λ, and γλ; the lasso derivative is constant at λ, while the MCP and SCAD derivatives taper to 0 by |β| = γλ.]

Univariate solutions

The effect of these penalties is most intuitive in the univariate case, where solutions have a closed form
Letting z = x′y/n denote the simple linear regression solution, the lasso solution is S(z, λ), where S is the soft-thresholding operator (Donoho and Johnstone 1994):

    S(z, λ) = \begin{cases} z − λ & \text{if } z > λ \\ 0 & \text{if } |z| ≤ λ \\ z + λ & \text{if } z < −λ \end{cases}
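A one-line sketch of the soft-thresholding operator (the vectorized form is an implementation convenience, not from the slides):

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lambda): shrink z toward 0 by lam,
    setting it exactly to 0 when |z| <= lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
```

For example, soft_threshold(3.0, 1.0) returns 2.0, while soft_threshold(0.5, 1.0) returns 0.0.
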
MCP and SCAD solutions

For MCP (γ > 1),

    β̂ = \begin{cases} \dfrac{S(z, λ)}{1 − 1/γ} & \text{if } |z| ≤ γλ \\ z & \text{if } |z| > γλ \end{cases}

For SCAD (γ > 2),

    β̂ = \begin{cases} S(z, λ) & \text{if } |z| ≤ 2λ \\ \dfrac{S(z, γλ/(γ−1))}{1 − 1/(γ−1)} & \text{if } 2λ < |z| ≤ γλ \\ z & \text{if } |z| > γλ \end{cases}

Note that, for MCP and SCAD, β̂ is equal to the ordinary least squares estimator (and therefore unbiased) when |z| > γλ
This leads to attractive asymptotic properties for both methods
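The closed forms above translate directly into code; this is a hedged sketch for scalar inputs, where the default γ values (3 for MCP, 3.7 for SCAD) are common choices in the literature rather than values taken from these slides:

```python
def soft_threshold(z, lam):
    """S(z, lambda) for scalar z."""
    sign = 1.0 if z > 0 else -1.0
    return sign * max(abs(z) - lam, 0.0)

def mcp_univariate(z, lam, gamma=3.0):
    """Univariate MCP solution (requires gamma > 1)."""
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z  # OLS solution: no shrinkage beyond gamma * lambda

def scad_univariate(z, lam, gamma=3.7):
    """Univariate SCAD solution (requires gamma > 2)."""
    if abs(z) <= 2.0 * lam:
        return soft_threshold(z, lam)
    if abs(z) <= gamma * lam:
        return soft_threshold(z, gamma * lam / (gamma - 1.0)) / (1.0 - 1.0 / (gamma - 1.0))
    return z  # OLS solution beyond gamma * lambda
```
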
Optimization

In higher dimensions, however, closed-form solutions are unavailable, so optimization algorithms must be developed
For the lasso, the LARS algorithm (Efron et al. 2004) is a remarkably efficient method
For MCP and SCAD, optimization is complicated by the fact that neither penalty is convex
One approach that has been proposed for nonconvex penalties (Zou and Li 2008) is to make a local linear approximation (LLA) to the penalty, thereby yielding an objective function that can be optimized using LARS; a sketch of the LLA idea follows
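As a rough illustration of LLA (not the authors' implementation): the penalty is linearized at the current estimate, giving a weighted-lasso subproblem with weights w_j = P′_λ(|β_j^(m)|). For brevity this sketch solves that subproblem by cyclic soft-thresholding rather than LARS, uses the MCP derivative P′_λ(t) = (λ − t/γ)₊, and assumes the columns of X are standardized so that x_j′x_j = n.

```python
import numpy as np

def mcp_derivative(t, lam, gamma):
    """MCP penalty derivative P'_lambda(t) = (lambda - t/gamma)_+ for t >= 0."""
    return np.maximum(lam - t / gamma, 0.0)

def lla_iteration(X, y, beta, lam, gamma=3.0, n_sweeps=50):
    """One LLA step: linearize the MCP penalty at `beta`, then (approximately)
    solve the resulting weighted lasso by cyclic coordinate-wise soft-thresholding."""
    n, p = X.shape
    w = mcp_derivative(np.abs(beta), lam, gamma)  # fixed weights for this LLA step
    b = beta.copy()
    r = y - X @ b  # current residuals
    for _ in range(n_sweeps):
        for j in range(p):
            z = X[:, j] @ r / n + b[j]  # regress partial residuals on x_j
            b_new = np.sign(z) * max(abs(z) - w[j], 0.0)  # soft-threshold at w_j
            r -= (b_new - b[j]) * X[:, j]
            b[j] = b_new
    return b
```
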
Coordinate descent algorithms

More recently, coordinate descent algorithms have been shown to be competitive with the LARS algorithm, particularly in high dimensions (Friedman et al. 2007; Wu and Lange 2008)
Today's talk will:
- Discuss the development of coordinate descent (CD) algorithms to fit MCP and SCAD models
- Investigate convergence and convexity for these algorithms
- Compare the efficiency of the LLA and CD algorithms
- Explore the statistical properties of SCAD, MCP, and lasso via simulation and real data

Coordinate descent approach

Coordinate descent algorithms optimize a target function with respect to a single parameter at a time, cycling through all parameters until convergence is reached
The key idea is to use the closed-form univariate solutions, but regress x_j's partial residuals r_{−j} = y − X_{−j} β_{−j} on x_j to obtain coordinate-wise minima
The algorithm:
(1) Calculate z = n^{−1} x_j′ r_{−j} = n^{−1} x_j′ r + β_j^{(m)}
(2) Update β_j^{(m+1)} with the univariate solution β̂
(3) Update r ← r − (β_j^{(m+1)} − β_j^{(m)}) x_j
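A minimal sketch of the full algorithm for linear regression with MCP, assuming standardized columns (x_j′x_j = n); the stopping rule and default settings are illustrative:

```python
import numpy as np

def cd_mcp(X, y, lam, gamma=3.0, max_iter=1000, tol=1e-8):
    """Cyclic coordinate descent for least squares + MCP.
    Assumes the columns of X are standardized so that x_j' x_j = n."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()  # residuals for the initial beta = 0
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # (1) partial-residual solution: z = x_j' r / n + beta_j
            z = X[:, j] @ r / n + beta[j]
            # (2) univariate MCP update
            s = np.sign(z) * max(abs(z) - lam, 0.0)  # soft-threshold S(z, lam)
            b_new = s / (1.0 - 1.0 / gamma) if abs(z) <= gamma * lam else z
            # (3) update residuals so r always equals y - X beta
            r -= (b_new - beta[j]) * X[:, j]
            max_change = max(max_change, abs(b_new - beta[j]))
            beta[j] = b_new
        if max_change < tol:
            break
    return beta
```

Because step (3) keeps the residuals current, each coordinate update costs only O(n), which is what makes the approach attractive in high dimensions.
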
Efficiency

Time to fit the entire path of solutions (100 replications):

[Figure: median time in seconds (log scale, roughly 2^−5 to 2^5) versus number of covariates (2^2 to 2^7) for CD and LLA, in four panels: linear ρ = 0, linear ρ = 0.9, logistic ρ = 0, logistic ρ = 0.9.]

Convergence of the coordinate descent algorithms

Proposition
Let {β^(m)} denote the sequence of coefficients produced at each iteration of the coordinate descent algorithm for SCAD and MCP. For all m = 0, 1, 2, . . . ,

    Q(β^(m+1)) ≤ Q(β^(m)).

The sequence is therefore guaranteed to converge to a stationary point β* of the algorithm. Furthermore, any stationary point β* is both a local minimum and a global coordinate-wise minimum of Q.
Convexity

For many reasons (speed of convergence, convergence to a global minimum instead of a local minimum, less variability), convex objective functions are desirable
Because their penalty functions are nonconvex, the objective functions of SCAD and MCP are not necessarily convex
However, they are not necessarily nonconvex either: the convexity of the loss function may overcome the nonconvexity of the penalty to produce a convex objective function
In fact, this will happen if the maximum concavity of the penalty does not exceed the minimum eigenvalue of X′X (Zhang 2007); a numerical sketch of this check follows
The efficiency simulations presented earlier were restricted to this scenario; in them, the LLA and CD algorithms converged to the same path of solutions (the global minimum) in all 100 replications
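A hedged numerical sketch of this check: the maximum concavity is 1/γ for MCP and 1/(γ−1) for SCAD, and with standardized columns and the (1/2n) loss scaling the relevant eigenvalue is that of X′X/n (the exact constant depends on the scaling convention, which the slides do not spell out):

```python
import numpy as np

def globally_convex(X, gamma, penalty="MCP"):
    """Compare the minimum eigenvalue of X'X/n with the penalty's maximum
    concavity: 1/gamma for MCP, 1/(gamma - 1) for SCAD."""
    n = X.shape[0]
    eig_min = np.linalg.eigvalsh(X.T @ X / n).min()
    max_concavity = 1.0 / gamma if penalty == "MCP" else 1.0 / (gamma - 1.0)
    return eig_min > max_concavity
```
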
Local convexity

For p > n, the minimum eigenvalue of X′X is zero; global convexity is not possible for MCP and SCAD
However, global convexity is not necessary in high-dimensional settings where the number of nonzero coefficients is much less than p
In such settings, we propose a diagnostic that measures whether the objective function is convex in the local region of the parameter space that contains these sparse solutions
These diagnostics are based on the minimum eigenvalue of X′_A X_A, where X_A is a modified design matrix containing only those columns for which β_j ≠ 0; a sketch follows
These ideas can be extended to logistic regression also, although I will skip the details
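The same eigenvalue check, restricted to the active columns; comparing against the penalty's maximum concavity is my inference from the global condition above, so treat this as an illustrative sketch rather than the authors' exact diagnostic:

```python
import numpy as np

def locally_convex(X, beta, gamma, penalty="MCP"):
    """Apply the eigenvalue check to X_A, the columns of X with beta_j != 0."""
    n = X.shape[0]
    active = beta != 0
    if not active.any():
        return True  # no active variables: the restricted problem is trivially convex
    X_A = X[:, active]
    eig_min = np.linalg.eigvalsh(X_A.T @ X_A / n).min()
    max_concavity = 1.0 / gamma if penalty == "MCP" else 1.0 / (gamma - 1.0)
    return eig_min > max_concavity
```
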
Local convexity diagnostics

[Figure: estimated coefficient paths β̂ plotted against λ for two examples (λ decreasing from about 1.5 to 0.5 and from about 1.2 to 0.4), illustrating the local convexity diagnostic along the solution path.]

Simulation results: Estimation

[Figure: relative MSE (0.0 to 1.5) for lasso, MCP, and SCAD plotted against the true coefficient value β (0.0 to 2.0), in six panels: Linear 30, Linear 90, Linear 500, Logistic 30, Logistic 90, Logistic 500.]

Simulation results: Prediction

[Figure: relative MSPE (0.0 to 1.5) for lasso, MCP, and SCAD plotted against β (0.0 to 2.0), in the same six Linear/Logistic panels.]

Simulation results: Variable selection

[Figure: false discovery rate (0.4 to 1.0) for lasso, MCP, and SCAD plotted against β (0.0 to 2.0), in the same six Linear/Logistic panels.]

Genetic association data

We also applied these methods to a case/control genetic association study of age-related macular degeneration
The data set consisted of 800 subjects (400 cases, 400 controls) and genotype calls for 532 genetic markers
Obtaining a path of solutions takes ≈ 17 minutes with LLA; ≈ 2.5 seconds with coordinate descent
Results:

    Penalty   Model size   CV error
    Lasso     103          40.9%
    MCP       7            39.4%
    SCAD      25           40.6%

Conclusions

There are settings in which nonconvex penalties clearly outperform the lasso
People have been reluctant to embrace nonconvex penalties due to:
- Concerns about local minima
- A lack of efficient algorithms
- A lack of publicly available software

Conclusions (cont'd)

This work is an attempt to resolve some of those concerns, by:
- Applying coordinate descent algorithms to nonconvex optimization problems
- Studying the convexity of these optimization problems and providing a practical diagnostic for locally convex solutions
- Providing a publicly available R package for computing MCP and SCAD solutions: ncvreg