Bottom-up Estimation and Top-down Prediction in Multi-level Models: Solar Energy Prediction Combining Information from Multiple Sources

Jae-Kwang Kim
Department of Statistics, Iowa State University
Ross-Royall Symposium: Johns Hopkins University
Feb 26, 2016
Collaborators
- Youngdeok Hwang (IBM Research)
- Siyuan Lu (IBM Research)
Outline
Overview
- Introduction
- Modeling approach
- Application: Solar Energy Prediction
- Conclusion
Mountain Climbing for Problem Solving!
[Diagram: a mountain path with Math Problem, Stat Problem, and Real Problem on the problem side and Math Solution, Stat Solution, and Real Solution on the solution side]
We need a map (abstraction) to move from problem to solution!
Real Problem: Solar Energy Prediction
Introduction
- Solar electricity is projected to supply 14% of total electricity demand in the contiguous U.S. by 2030, and 27% by 2050.
IBM Solar Forecasting
Figure: Sky camera for short-term forecasting (located at Watson)
- Research program funded by the U.S. Department of Energy's SunShot Initiative.
Monitoring Network
- Global Horizontal Irradiance (GHI): the total amount of shortwave radiation received from above by a horizontal surface.
- GHI measurements are collected every 15 minutes from 1,528 sensor units.
Weather Models
- GHI predictions are available from two widely used weather models: the North American Mesoscale Forecast System (NAM) and the Short-Range Ensemble Forecast (SREF).
- We want to combine the GHI measurements with the weather model outcomes to obtain the solar energy prediction.
Statistical Model: Basic setup
Model
- The population is divided into $H$ exhaustive and non-overlapping groups, where group $h$ has $n_h$ units, for $h = 1, \ldots, H$.
- For group $h$, $n_h$ units are selected for measurement.
- From the $i$-th unit of group $h$, the measurements and their associated covariates, $(y_{hij}, x_{hij})$, are available for $j = 1, \ldots, n_{hi}$.
Multi-level Model
- Consider the level one and level two models,
$$
y_{hi} \sim f_1(y_{hi} \mid x_{hi}; \theta_{hi}), \qquad
\theta_{hi} \sim f_2(\theta_{hi} \mid z_{hi}; \zeta_h),
$$
  where
  - $y_{hi} = (y_{hi1}, \ldots, y_{hin_{hi}})^\top$: observations at unit $(hi)$.
  - $x_{hi} = (x_{hi1}^\top, \ldots, x_{hin_{hi}}^\top)^\top$: covariates associated with unit $(hi)$ (= two weather model outcomes).
  - $z_{hi}$: unit-specific covariate.
- Note that $\theta_{hi}$ is a parameter in the level 1 model, but a random variable (latent variable) in the level 2 model.
- We can build a level 3 model on $\zeta_h$ if necessary:
$$
\zeta_h \sim f_3(\zeta_h \mid q_h; \alpha).
$$
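To make the nesting concrete, here is a minimal simulation sketch of one group under a simplified two-level model. The simplifications are assumptions made only for illustration and are not from the talk: $\theta_{hi}$ is scalar, $f_1$ is a normal linear model through the origin, $f_2$ is $N(\beta_h, \tau^2)$ with no $z_{hi}$ covariate, and the name `simulate_group` and all numeric values are hypothetical.

```python
import random


def simulate_group(n_units, n_obs, beta, tau, sigma, seed=0):
    """Simulate one group h from a simplified two-level model (illustrative).

    Level 2 (assumed):  theta_i ~ N(beta, tau^2)
    Level 1 (assumed):  y_ij = x_ij * theta_i + e_ij,  e_ij ~ N(0, sigma^2)
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n_units):
        # Latent unit-level parameter: random at level 2, fixed at level 1.
        theta = rng.gauss(beta, tau)
        # Hypothetical covariate values standing in for a weather model output.
        x = [rng.uniform(0.5, 1.5) for _ in range(n_obs)]
        y = [xj * theta + rng.gauss(0.0, sigma) for xj in x]
        units.append({"theta": theta, "x": x, "y": y})
    return units


group = simulate_group(n_units=5, n_obs=20, beta=1.0, tau=0.2, sigma=0.1)
print(len(group), len(group[0]["y"]))
```

Each dictionary plays the role of one unit $(hi)$: `theta` is never observed in practice, only `x` and `y` are.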
Data Structure Under Two-level Model
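[Diagram: a tree with $\zeta_h$ at the root; $f_2$ links $\zeta_h$ to $\theta_{h1}$, $\theta_{h2}$, $\theta_{h3}$; $f_1$ links each $\theta_{hi}$ to its observations $y_{hi1}, \ldots, y_{hin_i}$]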
Why Multi-level Models?
1. To reflect the reality: To allow for structural heterogeneity
(=variety in big data) across areas.
2. To borrow strength: we need to predict the locations with
no direct measurement.
Real Problems Become Statistical Problems!
1. Parameter estimation
2. Prediction
3. Uncertainty quantification
Bayesian method using MCMC computation is a useful tool.
Classical Solutions Do Not Necessarily Work in Reality!
1. No single data file exists, as the data are stored in the cloud (Hadoop Distributed File System).
2. Micro-level data are not always available to the analyst, for confidentiality and security reasons.
3. The classical solution, based on MCMC algorithms, is time-consuming, and its computational cost can be huge for big data.
This is a typical big data problem.
Solution
New Solution: Divide-and-Conquer Approach
- Three steps for parameter estimation at each level:
  1. Summarization: Find a summary (= measurement) for the latent variable to obtain the sampling error model.
  2. Combining: Combine the sampling error model and the latent variable model.
  3. Learning: Estimate the parameters from the summary data.
- Apply the three steps to the level two model, and then repeat them for the level three model.
Modeling Structure
[Diagram: Sites 1, 2, and 3 each have a sensor and a storage holding individual data; a Level 1 unit summary is produced at each site's storage, and the summaries feed a group storage that produces the Level 2 summary]
Summarization
- Find a measurement for $\theta_{hi}$.
- For each unit, treat $(x_{hi}, y_{hi})$ as a single data set and obtain the best estimator $\hat\theta_{hi}$ of $\theta_{hi}$, treating $\theta_{hi}$ as a fixed parameter.
- Obtain the sampling distribution of $\hat\theta_{hi}$ as a function of $\theta_{hi}$:
$$
\hat\theta_{hi} \sim g_1(\hat\theta_{hi} \mid \theta_{hi}).
$$
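The summarization step can be sketched for a single unit as follows. This is only an illustration under assumptions not made in the talk: $\theta_{hi}$ is scalar, the level 1 model is a normal linear regression through the origin, and `summarize_unit` is a hypothetical name. The point is that each site reduces its raw data to two numbers, an estimate and its estimated variance, which is all the next level needs.

```python
def summarize_unit(x, y):
    """Reduce one unit's data to (theta_hat, v_hat).

    Assumed level 1 model: y_j = x_j * theta + e_j with constant error variance.
    theta_hat is the least-squares estimate; v_hat is its estimated sampling
    variance, so that approximately theta_hat ~ N(theta, v_hat).
    """
    sxx = sum(xj * xj for xj in x)
    sxy = sum(xj * yj for xj, yj in zip(x, y))
    theta_hat = sxy / sxx
    residuals = [yj - xj * theta_hat for xj, yj in zip(x, y)]
    # One regression coefficient is estimated, hence n - 1 degrees of freedom.
    s2 = sum(r * r for r in residuals) / (len(y) - 1)
    v_hat = s2 / sxx
    return theta_hat, v_hat


print(summarize_unit([1.0, 1.0], [1.0, 3.0]))  # (2.0, 1.0)
```

Because only `(theta_hat, v_hat)` leaves the site, the micro-level data never have to be pooled into a single file.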
Summarization Step under Two-Level Model Structure
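[Diagram: $\zeta_h$ at the root; $f_2$ links $\zeta_h$ to each $\theta_{hi}$ ($i = 1, 2, 3$); $g_1$ links each $\theta_{hi}$ to its summary $\hat\theta_{hi}$]
$g_1(\hat\theta_{hi} \mid \theta_{hi})$: Sampling error model, $\hat\theta_{hi} \sim N(\theta_{hi}, \hat V(\hat\theta_{hi}))$.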
Combining
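- The marginal distribution of $\hat\theta_{hi}$ is
$$
m_2(\hat\theta_{hi} \mid z_{hi}; \zeta_h) = \int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h) \, d\theta_{hi}, \tag{1}
$$
  which combines $g_1(\hat\theta_{hi} \mid \theta_{hi})$ and $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$ via the latent variable $\theta_{hi}$.
- Also, the prediction model for the latent variable $\theta_{hi}$ is obtained by using Bayes' theorem:
$$
p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h) = \frac{g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)}{\int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h) \, d\theta_{hi}}. \tag{2}
$$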
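When both $g_1$ and $f_2$ are normal, the integral in (1) and the ratio in (2) have closed forms. The sketch below shows the scalar case, assuming $g_1 = N(\theta, v)$ and $f_2 = N(\beta, \tau^2)$ with no $z_{hi}$ covariate; these choices and the function names are illustrative assumptions, not the general models of the talk.

```python
import math


def marginal_logpdf(theta_hat, v, beta, tau2):
    """log m2(theta_hat): under the normal assumptions, theta_hat ~ N(beta, v + tau2)."""
    var = v + tau2
    return -0.5 * (math.log(2.0 * math.pi * var) + (theta_hat - beta) ** 2 / var)


def posterior(theta_hat, v, beta, tau2):
    """Mean and variance of p2(theta | theta_hat) under the normal assumptions."""
    w = tau2 / (tau2 + v)  # weight placed on the direct estimate theta_hat
    mean = w * theta_hat + (1.0 - w) * beta
    var = w * v  # equals v * tau2 / (v + tau2)
    return mean, var


print(posterior(2.0, 1.0, 0.0, 1.0))  # (1.0, 0.5)
```

A noisy summary (large `v`) is pulled strongly toward the group mean `beta`, while a precise one is left almost unchanged.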
Combining Step
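[Diagram: $\zeta_h$ is linked to $\theta_{hi}$ by $f_2$ and $\theta_{hi}$ to $\hat\theta_{hi}$ by $g_1$; $m_2$ links $\zeta_h$ directly to $\hat\theta_{hi}$, and $p_2$ points back to $\theta_{hi}$]
Sampling error model ($g_1$) + Latent variable model ($f_2$) ⇒ Marginal model ($m_2$), Prediction model ($p_2$)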
Learning
- The level two model can be learned by the EM algorithm: at the $t$-th iteration, we update $\zeta_h$ by solving
$$
\hat\zeta_h^{(t+1)} \leftarrow \arg\max_{\zeta_h} \sum_{i=1}^{n_h} E_{p_2}\left\{ \log f_2(\theta_{hi} \mid z_{hi}; \zeta_h) \mid \hat\theta_{hi}; \hat\zeta_h^{(t)} \right\},
$$
  where the conditional expectation is taken with respect to the prediction model $p_2$ in (2) evaluated at $\hat\zeta_h^{(t)}$, and $\hat\zeta_h^{(t)}$ denotes the estimate of $\zeta_h$ at the $t$-th iteration of the EM algorithm.
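For the normal-normal case the EM update has an explicit form. The following is a minimal sketch assuming scalar $\theta_{hi}$, $g_1 = N(\theta_{hi}, v_{hi})$ with known $v_{hi}$, and $f_2 = N(\beta_h, \tau_h^2)$, so that $\zeta_h = (\beta_h, \tau_h^2)$; the name `em_level2` and the fixed iteration count are illustrative choices.

```python
def em_level2(theta_hats, vs, n_iter=200):
    """EM for zeta = (beta, tau2) from unit summaries (theta_hat_i, v_i).

    E-step: posterior mean and variance of each theta_i under the current zeta.
    M-step: maximize the expected log density of N(beta, tau2).
    """
    n = len(theta_hats)
    beta = sum(theta_hats) / n
    tau2 = max(sum((t - beta) ** 2 for t in theta_hats) / n, 1e-8)
    for _ in range(n_iter):
        # E-step: moments of p2(theta_i | theta_hat_i; beta, tau2).
        post = []
        for t, v in zip(theta_hats, vs):
            w = tau2 / (tau2 + v)
            post.append((w * t + (1.0 - w) * beta, w * v))
        # M-step: closed-form maximizers for the normal latent variable model.
        beta = sum(m for m, _ in post) / n
        tau2 = sum((m - beta) ** 2 + s for m, s in post) / n
    return beta, tau2


beta_hat, tau2_hat = em_level2([0.0, 2.0, 4.0, 6.0], [1.0, 1.0, 1.0, 1.0])
print(beta_hat, tau2_hat)
```

Only the summaries enter the algorithm, so the cost per iteration grows with the number of units in the group rather than with the number of raw observations.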
Learning Using EM Algorithm
[Diagram: an EM cycle linking $\hat\theta_{hi}$, $\theta_{hi}$, $z_{hi}$, and $\hat\zeta_h$ through the E-step and the M-step]
Bayesian Interpretation
- Prediction model (2) can be written as
$$
p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h) \propto g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h).
$$
- Here, $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$ can be treated as a prior distribution, and $p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h)$ is a posterior distribution that incorporates the observation of $\hat\theta_{hi}$.
- Using $g_1(\hat\theta_{hi} \mid \theta_{hi})$ instead of the full likelihood simplifies the computation (Approximate Bayesian Computation).
Extension to Three Level Model
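| Model   | Measurement (data summary)                                      | Parameter       | Latent variable                                       |
|---------|-----------------------------------------------------------------|-----------------|-------------------------------------------------------|
| Level 1 | $y_{hi} = (y_{hi1}, \cdots, y_{hin})$                           | $\theta_{hi}$   |                                                       |
| Level 2 | $\hat\theta_h = (\hat\theta_{h1}, \cdots, \hat\theta_{hn_h})$   | $\zeta_h$       | $\theta_h = (\theta_{h1}, \cdots, \theta_{hn_h})$     |
| Level 3 | $\hat\zeta = (\hat\zeta_1, \cdots, \hat\zeta_H)$                | $\alpha$        | $\zeta = (\zeta_1, \cdots, \zeta_H)$                  |

We can apply the same three steps to the level three model.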
Bottom-up Estimation
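| Level | Latent variable model                 | Sampling error model                                    | Parameter estimation |
|-------|---------------------------------------|---------------------------------------------------------|----------------------|
| 3     | $f_3(\zeta_h \mid q_h; \alpha)$       | $\hat\zeta_h \sim g_2(\hat\zeta_h \mid \zeta_h)$        | $\hat\alpha = \arg\max_{\alpha} \sum_{h=1}^{H} \log \int g_2(\hat\zeta_h \mid \zeta_h) f_3(\zeta_h \mid q_h; \alpha) \, d\zeta_h$ |
| 2     | $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$ | $\hat\theta_{hi} \sim g_1(\hat\theta_{hi} \mid \theta_{hi})$ | $\hat\zeta_h = \arg\max_{\zeta_h} \sum_{i=1}^{n_h} \log \int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h) \, d\theta_{hi}$ |
| 1     | $f_1(y_{hij} \mid x_{hij}; \theta_{hi})$ |                                                      | $\hat\theta_{hi} = \arg\max_{\theta_{hi}} \sum_{j=1}^{n_{hi}} \log f_1(y_{hij} \mid x_{hij}; \theta_{hi})$ |

Figure: An illustration of the bottom-up approach to parameter estimation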
Prediction
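- Our goal is to predict unobserved $y_{hij}$ values from the above models using the parameter estimates.
- The best prediction for $y_{hij}$ is
$$
\hat y_{hij}^{*} = E_{p_3}\left[ E_{p_2}\left\{ E_{f_1}(y_{hij} \mid x_{hij}, \theta_{hi}) \mid \hat\theta_{hi}; \zeta_h \right\} \mid \hat\zeta_h; \hat\alpha \right],
$$
  where
$$
p_3(\zeta_h \mid \hat\zeta_h, \hat\alpha) = \frac{g_2(\hat\zeta_h \mid \zeta_h) f_3(\zeta_h \mid q_h; \hat\alpha)}{\int g_2(\hat\zeta_h \mid \zeta_h) f_3(\zeta_h \mid q_h; \hat\alpha) \, d\zeta_h}
$$
  and
$$
p_2(\theta_{hi} \mid \hat\theta_{hi}, \zeta_h) = \frac{g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)}{\int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h) \, d\theta_{hi}}.
$$
- The prediction is made in a top-down manner.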
Prediction: Top-down Prediction
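[Diagram: $\hat\alpha$ at the top; $\zeta_1^{*}$, $\zeta_2^{*}$, $\zeta_3^{*}$ are generated through $p_3$; $\theta_{1i}^{*}$, $\theta_{2i}^{*}$, $\theta_{3i}^{*}$ are generated through $p_2$]
Predict $y_{hij}$ using $f_1(y_{hij} \mid x_{hij}; \theta_{hi}^{*})$.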
Prediction: Top-down Prediction
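| Level | Latent        | Prediction model                                      | Best prediction |
|-------|---------------|-------------------------------------------------------|-----------------|
| 3     | $\zeta_h$     | $p_3(\zeta_h \mid \hat\zeta_h; \hat\alpha)$           | $\zeta_h^{*} \sim p_3(\zeta_h \mid \hat\zeta_h; \hat\alpha)$ |
| 2     | $\theta_{hi}$ | $p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h)$      | $\theta_{hi}^{*} \sim p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h^{*})$ |
| 1     | $y_{hij}$     | $f_1(y_{hij} \mid x_{hij}; \theta_{hi})$              | $y_{hij}^{*} \sim f_1(y_{hij} \mid x_{hij}, \theta_{hi}^{*})$ |

Figure: Top-down approach to prediction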
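The top-down scheme can be approximated by Monte Carlo. The sketch below is a simplified scalar version under assumptions that go beyond the talk: $\zeta_h$ is reduced to a group mean $\beta_h$ with known between-unit variance `tau2`, all sampling error and latent variable models are normal, and the level 1 mean is $x\theta$. All function and argument names are hypothetical. Passing `theta_hat=None` mimics a site with no direct measurement, where $\theta^{*}$ is drawn from the latent variable model instead of $p_2$.

```python
import random


def normal_posterior(est, est_var, prior_mean, prior_var):
    """Posterior mean and variance when est ~ N(latent, est_var) and latent ~ N(prior_mean, prior_var)."""
    w = prior_var / (prior_var + est_var)
    return w * est + (1.0 - w) * prior_mean, w * est_var


def predict_top_down(x_new, theta_hat, v, beta_hat, u, mu, omega2, tau2,
                     n_draws=5000, seed=0):
    """Monte Carlo approximation of the nested expectation for one new x value."""
    rng = random.Random(seed)
    m3, s3 = normal_posterior(beta_hat, u, mu, omega2)  # p3 for the group mean
    total = 0.0
    for _ in range(n_draws):
        beta_star = rng.gauss(m3, s3 ** 0.5)  # level 3 draw
        if theta_hat is None:
            # No summary at this site: fall back on the latent variable model.
            theta_star = rng.gauss(beta_star, tau2 ** 0.5)
        else:
            m2, s2 = normal_posterior(theta_hat, v, beta_star, tau2)  # p2
            theta_star = rng.gauss(m2, s2 ** 0.5)  # level 2 draw
        total += x_new * theta_star  # level 1 mean under the assumed linear model
    return total / n_draws


print(predict_top_down(2.0, 2.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0))
```

With the values in the usage line, the exact nested expectation is 2.5, so the printed value should be close to that up to Monte Carlo error.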
Case study: Application to Solar Energy Prediction
- We use 15 days of data (12/01/2014 – 12/15/2014) for the analysis.
- The states are organized into 12 groups.
- The number of sites in each group, $m_h$, varies between 37 and 321.
Application
Grouping Scheme
- Pooling data from nearby sites.
- Can incorporate complex structure such as distribution zones.
Application: Site Level
- First assume that
$$
y_{hij} = x_{hij}\theta_{hi} + \epsilon_{hij}, \qquad \epsilon_{hij} \sim t(0, \sigma_{hi}^2, \nu_{hi}),
$$
  where $\sigma_{hi}^2$ is the scale parameter and $\nu_{hi}$ is the degrees of freedom, and
$$
\hat\theta_{hi} \mid \theta_{hi} \sim N(\theta_{hi}, V_{hi}),
$$
  where $V_{hi} = V(\hat\theta_{hi})$.
- The degrees of freedom are assumed to be unknown and are estimated by the method of Lange et al. (1989).
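To show why $t$ errors help at the site level, here is a sketch of a standard EM (iteratively reweighted least squares) fit for a scalar regression through the origin with $t$ errors. It is a simplification and not the procedure used in the talk: the degrees of freedom `nu` are held fixed at an arbitrary value rather than estimated, and `fit_t_regression` is a hypothetical name.

```python
def fit_t_regression(x, y, nu=4.0, n_iter=100):
    """EM fit of y_j = x_j * theta + e_j with e_j ~ t(0, sigma2, nu), nu fixed.

    E-step: weights (nu + 1) / (nu + r_j^2 / sigma2) downweight large residuals.
    M-step: weighted least squares for theta, weighted residual mean for sigma2.
    """
    n = len(y)
    # Start from ordinary least squares.
    theta = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    sigma2 = max(sum((b - a * theta) ** 2 for a, b in zip(x, y)) / n, 1e-12)
    for _ in range(n_iter):
        w = [(nu + 1.0) / (nu + (b - a * theta) ** 2 / sigma2) for a, b in zip(x, y)]
        theta = (sum(wi * a * b for wi, a, b in zip(w, x, y))
                 / sum(wi * a * a for wi, a in zip(w, x)))
        sigma2 = max(sum(wi * (b - a * theta) ** 2 for wi, a, b in zip(w, x, y)) / n, 1e-12)
    return theta, sigma2


x = [1.0] * 10
y = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 10.0]  # last value is an outlier
theta_t, sigma2_t = fit_t_regression(x, y)
theta_ols = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
print(theta_t, theta_ols)
```

In this toy data set the least-squares estimate is dragged toward the outlier, while the $t$-based estimate stays near the bulk of the observations.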
Three Level Model
- Assume the level 2 model
$$
\theta_{hi} \sim N(\beta_h, \Sigma_h),
$$
  and $\zeta_h = (\beta_h, \Sigma_h)$.
- Similarly, the level 3 model is
$$
\zeta_h \sim N(\mu, \Sigma),
$$
  and $\alpha = (\mu, \Sigma)$.
Comparison
- We compared the performance of the multi-level approach with three other modeling methods:
  - Site-by-site model: fit a different model for each individual site
  - Group-by-group model: fit a different model for each group
  - One global model: fit a single common model for all sites using the aggregate data
- To evaluate the prediction accuracy, we randomly selected 70% of the data to fit the model and tested on the remaining 30%.
MSPE Comparison
- We compare the accuracy by Mean Squared Prediction Error (MSPE), $N_T^{-1} \sum (y_{hij} - \hat y_{hij})^2$, where the $\hat y_{hij}$ are obtained from four different methods and $N_T$ is the size of the test data set.

| Method       | MSPE  | SD    |
|--------------|-------|-------|
| Multi level  | 0.297 | 0.601 |
| Site model   | 0.298 | 0.609 |
| Group model  | 0.406 | 0.803 |
| Global model | 0.383 | 0.791 |

Table: Accuracy comparison of the different modeling methods
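The evaluation protocol, a random 70/30 split followed by MSPE on the held-out part, can be sketched as below. The helper names and the seeding scheme are illustrative assumptions; the talk does not say how the split was implemented.

```python
import random


def train_test_split(records, train_frac=0.7, seed=0):
    """Randomly split records into a training part and a test part."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def mspe(y_true, y_pred):
    """Mean squared prediction error over the test set."""
    errors = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


train, test = train_test_split(range(10))
print(len(train), len(test))               # 7 3
print(mspe([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Each competing method would be fitted on `train` only and scored with `mspe` on `test`, so the four MSPE values are comparable.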
Comparison in Detail (nhi ≤ 100 vs > 100)
[Figure: Mean squared error by sample size (<100 vs >100) for four methods: Multilevel, Site Model, Group Model, Global Model]
Discussion
- Motivated by a real problem: a solar energy forecasting system has been developed.
- We used a multi-level model approach to address the practical issues.
- There are more issues to be investigated:
  - Spatial modeling
  - Estimation of group structure
  - Preferential sampling of sites
  - ...
- The proposed method is promising for handling big data.