Big Data Techniques Applied to Very Short-term
Wind Power Forecasting
Ricardo Bessa
Senior Researcher
Center for Power and Energy Systems, INESC TEC, Portugal
Joint work with Laura Cavalcante and Marisa Reis
EWEA Technology Workshop: Wind Power Forecasting 2015
1-2 October 2015, Leuven, Belgium
Introduction
Statistical Framework
Case Study and Numerical Results
Introduction
Vector Autoregression (VAR) models can be applied to combine
wind power time series distributed in space
Two important requirements for a practical implementation
Reduce the number of non-null coefficients
Low computational time in large datasets
This work provides the following original contributions
Explores a set of sparse structures for the VAR model
Applies the alternating direction method of multipliers
(ADMM) to estimate the VAR coefficients
Explores parallel computing
Linear Time Series Models
Lasso-VAR Model and variants
Solving Lasso-VAR with ADMM algorithm
Autoregressive Model
Univariate model: uses past observations from the same time
series
AR(p) - Autoregressive Model of order p
→ forecasts the variable $y_t$ given the past p values
$$y_t = c + b_1 y_{t-1} + b_2 y_{t-2} + \cdots + b_p y_{t-p} + \varepsilon_t$$
VAR(p) - Vector Autoregressive Model of order p
→ forecasts the vector of k variables
$$Y_t = (Y_{1,t}, Y_{2,t}, \ldots, Y_{k,t})$$
$$Y_t = c + B_1 Y_{t-1} + B_2 Y_{t-2} + \cdots + B_p Y_{t-p} + u_t$$
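The VAR(p) equation above can be written in the matrix form Y ≈ BZ that the later slides use. The sketch below is one possible illustration, not the authors' implementation: it stacks an intercept row and the p lagged vectors into Z and fits B by ordinary least squares (no sparsity yet). The function names and the (T, k) layout of `series` are assumptions made for this example.

```python
import numpy as np

def build_var_matrices(series, p):
    """Build Y (k x (T-p)) and Z ((k*p+1) x (T-p)) from a (T, k) array.

    Each column of Z holds a 1 (for the intercept c) followed by
    Y_{t-1}, ..., Y_{t-p}; the matching column of Y holds Y_t.
    """
    T, k = series.shape
    Y = series[p:].T
    lags = [series[p - l:T - l].T for l in range(1, p + 1)]
    Z = np.vstack([np.ones((1, T - p))] + lags)
    return Y, Z

def fit_var_least_squares(series, p):
    """Least-squares estimate of c and B_1..B_p, i.e. min ||Y - B Z||_F."""
    k = series.shape[1]
    Y, Z = build_var_matrices(series, p)
    # Solve Z^T B^T ~= Y^T in the least-squares sense.
    B_T, *_ = np.linalg.lstsq(Z.T, Y.T, rcond=None)
    B = B_T.T
    c = B[:, 0]
    lag_blocks = [B[:, 1 + l * k: 1 + (l + 1) * k] for l in range(p)]
    return c, lag_blocks
```

With k wind farms and p lags, B has k(kp + 1) entries, which is why a sparse structure becomes attractive as k grows.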
Least Absolute Shrinkage and Selection Operator
(LASSO)-VAR Model
The Lasso-VAR estimation minimizes the residual sum of
squares subject to an L1 constraint
$$\tfrac{1}{2} \|Y - BZ\|_F^2 \quad \text{s.t.} \quad \|B\|_1 \le t$$
Equivalently, it can be defined in the Lagrangian form as
$$\tfrac{1}{2} \|Y - BZ\|_F^2 + \lambda \|B\|_1,$$
where $\|X\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$, $\|X\|_F^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} |x_{ij}|^2$ is the
squared Frobenius norm and the regularization parameter $\lambda \ge 0$ is
inversely related to $t$
Fits the regression model and simultaneously performs variable
selection by shrinking regression coefficients to zero
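A small sketch of why the L1 term produces exact zeros: its proximal operator is elementwise soft-thresholding, which sets every entry with magnitude below the threshold to zero and shrinks the rest. The helper names are chosen for this illustration only; `lasso_var_objective` simply evaluates the Lagrangian form given above.

```python
import numpy as np

def lasso_var_objective(Y, B, Z, lam):
    """Evaluate 0.5 * ||Y - B Z||_F^2 + lam * ||B||_1 (entrywise L1 norm)."""
    residual = Y - B @ Z
    return 0.5 * np.linalg.norm(residual, "fro") ** 2 + lam * np.abs(B).sum()

def soft_threshold(A, kappa):
    """Elementwise soft-thresholding: the proximal operator of kappa * ||.||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - kappa, 0.0)

# Entries with magnitude <= kappa become exactly zero; the others shrink by kappa.
coefficients = np.array([[3.0, -0.5], [0.2, -2.0]])
shrunk = soft_threshold(coefficients, 1.0)
```

Here `shrunk` keeps only the two large coefficients (reduced in magnitude by 1) and zeroes the two small ones, which is the variable-selection effect described on the slide.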
Lasso-VAR Model: Extensions and Generalizations
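Lasso Extensions      Penalty
Row Lasso             $\lambda \|B_i\|_1$
Matricial Lasso       $\lambda \|B\|_1$
Lag Lasso             $\lambda \sum_{l=1}^{p} \|B_l\|_1$
Group Lasso           $\lambda \sum_{i \ne j} \|((B_1)_{ij}, \ldots, (B_p)_{ij})\|_2$
Sparse Group Lasso    $(1-\alpha)\lambda \sum_{l=1}^{p} \|B_l\|_F + \alpha\lambda \|B\|_1$
[Illustration column: one figure per variant]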
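To make the penalties in the table concrete, the sketch below evaluates each of them for a list of lag matrices B_1, ..., B_p. It is an illustration under assumptions: the function name, the dictionary keys, the `row` argument (which row i the Row Lasso term refers to) and the default `alpha` are all choices made for this example, and the intercept is left out.

```python
import numpy as np

def lasso_var_penalties(B_lags, lam, alpha=0.5, row=0):
    """Evaluate the penalty of each Lasso-VAR variant for lag matrices B_1..B_p."""
    stack = np.stack(B_lags)                 # shape (p, k, k)
    B = np.hstack(B_lags)                    # shape (k, k*p), B = [B_1 ... B_p]
    l1_norm = np.abs(B).sum()
    # l2 norm across lags of the coefficients linking series j to series i.
    across_lags = np.sqrt((stack ** 2).sum(axis=0))
    off_diagonal = ~np.eye(stack.shape[1], dtype=bool)
    frobenius_sum = sum(np.linalg.norm(B_l, "fro") for B_l in B_lags)
    return {
        "row": lam * np.abs(B[row]).sum(),
        "matricial": lam * l1_norm,
        "lag": lam * sum(np.abs(B_l).sum() for B_l in B_lags),
        "group": lam * across_lags[off_diagonal].sum(),
        "sparse_group": (1 - alpha) * lam * frobenius_sum + alpha * lam * l1_norm,
    }

B1 = np.array([[1.0, -2.0], [0.0, 3.0]])
B2 = np.array([[0.0, 2.0], [4.0, 0.0]])
values = lasso_var_penalties([B1, B2], lam=1.0, alpha=0.5, row=0)
```

The group term treats all lags of one off-diagonal pair (i, j) as a unit, so a whole neighbouring wind farm can be switched off for a given target farm.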
Parameter Estimation and the ADMM Algorithm
The goal is to estimate the sparse matrix of coefficients with a
simple and powerful algorithm
ADMM framework has several advantages
Combines the problem separability offered by the dual ascent
method with the convergence properties of the method of
multipliers
Convex problems with nondifferentiable terms (such as the
LASSO penalty) can be easily addressed
Parallel Optimization: break up large datasets into blocks and
carry out the optimization over each block
ADMM Algorithm
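Lasso-VAR:
$$\text{minimize} \quad \tfrac{1}{2}\|Y - BZ\|_F^2 + \lambda\|B\|_1$$
ADMM problem form:
$$\text{minimize} \quad \underbrace{\tfrac{1}{2}\|Y - BZ\|_F^2}_{f(B)} + \underbrace{\lambda\|H\|_1}_{g(H)} \quad \text{s.t.} \quad B - H = 0$$
Augmented Lagrangian
$$L_\rho(B, H, W) = \tfrac{1}{2}\|Y - BZ\|_F^2 + \lambda\|H\|_1 + \operatorname{tr}\big(W^T (B - H)\big) + \tfrac{\rho}{2}\|B - H\|_F^2$$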
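A minimal sketch of how the augmented Lagrangian can be minimized by alternating over B and H, assuming the standard scaled form of ADMM (U = W/ρ) and the Matricial Lasso penalty. The function name, the default `rho` and the residual-based stopping rule are assumptions for this illustration; the slides do not specify these details.

```python
import numpy as np

def soft_threshold(A, kappa):
    """Elementwise soft-thresholding (proximal operator of kappa * ||.||_1)."""
    return np.sign(A) * np.maximum(np.abs(A) - kappa, 0.0)

def lasso_var_admm(Y, Z, lam, rho=1.0, tol=1e-3, max_iter=1000):
    """Scaled-form ADMM for min 0.5*||Y - B Z||_F^2 + lam*||H||_1 s.t. B = H."""
    k, m = Y.shape[0], Z.shape[0]
    H = np.zeros((k, m))
    U = np.zeros((k, m))                     # scaled dual variable, U = W / rho
    YZt = Y @ Z.T
    G = Z @ Z.T + rho * np.eye(m)            # symmetric positive definite
    for _ in range(max_iter):
        # B-update (ridge-like): B G = Y Z^T + rho (H - U).
        B = np.linalg.solve(G, (YZt + rho * (H - U)).T).T
        # H-update: soft-thresholding of B + U with threshold lam / rho.
        H_old = H
        H = soft_threshold(B + U, lam / rho)
        # Dual update accumulates the constraint violation B - H.
        U = U + B - H
        primal_residual = np.linalg.norm(B - H)
        dual_residual = rho * np.linalg.norm(H - H_old)
        if primal_residual < tol and dual_residual < tol:
            break
    return H
```

Returning H rather than B gives coefficients that are exactly zero wherever the threshold was not exceeded. With `lam=0` the iteration reduces to ordinary least squares.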
Parallel Computing
The goal is to split data and use ADMM to solve the problem
in a distributed manner (with N objective terms)
The data matrix Z is partitioned into N blocks $Z_1, Z_2, \ldots, Z_N$ in one of two ways
→ Split data across features and use
ADMM sharing problem
→ Split data across examples and use
ADMM consensus optimization
ADMM and Parallel Computing
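Splitting Across Examples
$$\min \ \sum_{i=1}^{N} \underbrace{\tfrac{1}{2}\|Y_i - B_i Z_i\|_F^2}_{f_i(B_i)} + \underbrace{\lambda\|B_i\|_1}_{g(B_i)}$$
$$\min \ \sum_{i=1}^{N} f_i(B_i) + g(H) \quad \text{s.t.} \quad B_i - H = 0$$
$$B_i^{k+1} := \arg\min_{B_i} \ f_i(B_i) + \tfrac{\rho}{2}\|B_i - H^k + U_i^k\|_F^2$$
$$H^{k+1} := \arg\min_{H} \ g(H) + \tfrac{N\rho}{2}\|H - \bar{B}^{k+1} - \bar{U}^k\|_F^2$$
$$U_i^{k+1} := U_i^k + B_i^{k+1} - H^{k+1}$$
Splitting Across Features
$$\min \ \underbrace{\tfrac{1}{2}\Big\|Y - \sum_{i=1}^{N} B_i Z_i\Big\|_F^2}_{g(\sum_{i=1}^{N} B_i Z_i)} + \sum_{i=1}^{N} \underbrace{\lambda\|B_i\|_1}_{f_i(B_i)}$$
$$\min \ \sum_{i=1}^{N} f_i(B_i) + g\Big(\sum_{i=1}^{N} H_i\Big) \quad \text{s.t.} \quad B_i Z_i - H_i = 0$$
$$B_i^{k+1} := \arg\min_{B_i} \ f_i(B_i) + \tfrac{\rho}{2}\|B_i Z_i - H_i^k + U_i^k\|_F^2$$
$$H^{k+1} := \arg\min_{H} \ g\Big(\sum_{i=1}^{N} H_i\Big) + \tfrac{\rho}{2}\sum_{i=1}^{N}\|H_i - U_i^k - B_i^{k+1} Z_i\|_F^2$$
$$U_i^{k+1} := U_i^k + B_i^{k+1} Z_i - H_i^{k+1}$$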
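The splitting-across-examples updates can be sketched as follows, assuming the blocks are groups of columns (time steps) of Y and Z and that the bars denote averages over the N blocks. This is an illustrative sketch rather than the authors' code: the function name, `n_blocks`, the default `rho` and the stopping rule are assumptions, and the per-block B-updates are run in a plain loop here, although they are independent and could be dispatched to separate cores.

```python
import numpy as np

def soft_threshold(A, kappa):
    """Elementwise soft-thresholding (proximal operator of kappa * ||.||_1)."""
    return np.sign(A) * np.maximum(np.abs(A) - kappa, 0.0)

def consensus_lasso_var(Y, Z, lam, n_blocks=4, rho=1.0, tol=1e-3, max_iter=2000):
    """Consensus ADMM for Lasso-VAR with the examples split into n_blocks."""
    Y_blocks = np.array_split(Y, n_blocks, axis=1)
    Z_blocks = np.array_split(Z, n_blocks, axis=1)
    k, m = Y.shape[0], Z.shape[0]
    H = np.zeros((k, m))
    U = [np.zeros((k, m)) for _ in range(n_blocks)]
    # Per-block quantities that never change across iterations.
    YZt = [Yi @ Zi.T for Yi, Zi in zip(Y_blocks, Z_blocks)]
    G = [Zi @ Zi.T + rho * np.eye(m) for Zi in Z_blocks]
    for _ in range(max_iter):
        # Local B_i-updates: independent across blocks (parallelizable).
        B = [np.linalg.solve(Gi, (Ai + rho * (H - Ui)).T).T
             for Gi, Ai, Ui in zip(G, YZt, U)]
        # Global H-update: soft-threshold the average of B_i + U_i.
        H_old = H
        average = np.mean(B, axis=0) + np.mean(U, axis=0)
        H = soft_threshold(average, lam / (n_blocks * rho))
        # Local dual updates.
        U = [Ui + Bi - H for Ui, Bi in zip(U, B)]
        primal_residual = np.sqrt(sum(np.linalg.norm(Bi - H) ** 2 for Bi in B))
        dual_residual = rho * np.sqrt(n_blocks) * np.linalg.norm(H - H_old)
        if primal_residual < tol and dual_residual < tol:
            break
    return H
```

Only the small k × m matrices B_i and U_i need to be exchanged between workers in each iteration; the data blocks Y_i and Z_i stay local.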
Description
Numerical Results
Conclusions
Case Study description
Apply ADMM algorithm to several LASSO-VAR(2) variants in
order to produce wind power forecasts from 1 to 6 hours ahead
Dataset
68 wind farms (same control area)
Training period: 9 months
Test period: 3 months
Time resolution: 1 hour
LASSO and ADMM parameters estimated by 5-fold
cross-validation
Calculate the improvement in terms of Root Mean Squared
Error (RMSE) compared to an Autoregression model - AR(2)
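The improvement metric can be sketched as below. The slides do not state the formula, so this assumes the usual definition: the relative RMSE reduction with respect to the AR reference, in percent. Function and argument names are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observations and forecasts."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(errors ** 2)))

def improvement_over_reference(y_true, y_model, y_reference):
    """Assumed definition: 100 * (RMSE_ref - RMSE_model) / RMSE_ref."""
    rmse_reference = rmse(y_true, y_reference)
    return 100.0 * (rmse_reference - rmse(y_true, y_model)) / rmse_reference
```

Under this definition a positive value means the model beats the AR reference and a negative value means it is worse, which matches the sign convention of the plots that follow.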
RMSE Improvement over AR results
Wind Farm with best improvement
[Figure: Improvement over AR (%), axis ticks from 7 to 13, versus Time Horizon (h), 1 to 6. Series: Row L-V, Matricial L-V, Lag L-V, Group L-V, Sparse L-V, No Sparsity]
RMSE Improvement over AR results
Wind Farm with intermediate improvement
[Figure: Improvement over AR (%), axis ticks from 4 to 9, versus Time Horizon (h), 1 to 6. Series: Row L-V, Matricial L-V, Lag L-V, Group L-V, Sparse L-V, No Sparsity]
RMSE Improvement over AR results
Wind Farm with worst improvement
[Figure: Improvement over AR (%), axis ticks from −8 to 2, versus Time Horizon (h), 1 to 6. Series: Row L-V, Matricial L-V, Lag L-V, Group L-V, Sparse L-V, No Sparsity]
Number of wind farms with negative improvement (average over the time horizon): 3
Number of wind farms with negative improvement in at least one lead time: 13
Group LASSO does not have negative improvement in the first two lead times
RMSE Improvement over AR results
Global
[Figure: Improvement over AR (%), axis ticks from 2 to 7, versus Time Horizon (h), 1 to 6. Series: Row L-V, Matricial L-V, Lag L-V, Group L-V, Sparse L-V, No Sparsity]
Running Time
Lasso Extensions    Not distributed    Distributed over Examples
Row Lasso           5.3                1.6
Matricial Lasso     1.6                0.5
Lag Lasso           1.1                0.4
Group Lasso         7.8                1.1
Sparse Lasso        11                 5.5
Table: Running time (in seconds), with the data divided across the cores of an 8-core i7 processor
The same tolerance (1e-3) was used for the ADMM
The error results for each LASSO extension are very similar
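The speed-up implied by the running-time table can be read off with a few lines of arithmetic. The sketch below only re-uses the reported times; the dictionary layout and variable names are assumptions for illustration.

```python
# Reported running times in seconds: (not distributed, distributed over examples).
running_times = {
    "Row Lasso": (5.3, 1.6),
    "Matricial Lasso": (1.6, 0.5),
    "Lag Lasso": (1.1, 0.4),
    "Group Lasso": (7.8, 1.1),
    "Sparse Lasso": (11.0, 5.5),
}

# Speed-up = serial time / distributed time for each Lasso extension.
speedups = {
    name: serial / distributed
    for name, (serial, distributed) in running_times.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster when distributed over examples")
```

On these figures the gain ranges from about 2x (Sparse Lasso) to about 7x (Group Lasso).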
Final Remarks and Future Work
The adequate choice of a sparse structure can improve the
forecast skill of the VAR model
The case-study results indicate that
Information from selected distributed time series can reduce
the forecast error compared to an AR model
The Group LASSO-VAR model achieves the highest global
improvement and the Lag LASSO-VAR model provides the
lowest improvement (mainly for the first lead times)
Future Work
Explore more complex sparse structures
Extend the statistical model to the probabilistic forecast
framework
Apply this framework to other smart grid related problems
Acknowledgements
This work was carried out in the framework of the SusCity project
(“MITP-TB/CS/0026/2013”) financed by national funds through
Fundação para a Ciência e a Tecnologia (FCT), Portugal.