Bottom-up Estimation and Top-down Prediction for Multi-level Models: Solar Energy Prediction Combining Information from Multiple Sources

Jae-Kwang Kim
Department of Statistics, Iowa State University
Ross-Royall Symposium: Johns Hopkins University
Feb 26, 2016

Collaborators
- Youngdeok Hwang (IBM Research)
- Siyuan Lu (IBM Research)

Outline
- Introduction
- Modeling approach
- Application: Solar Energy Prediction
- Conclusion

Mountain Climbing for Problem Solving!
[Diagram labels: Math Problem, Stat Problem, Real Problem; Math Solution, Stat Solution, Real Solution]
We need a map (abstraction) to move from problem to solution!

Real Problem: Solar Energy Prediction
- Solar electricity is now projected to supply 14% of the total demand of the contiguous U.S. by 2030, and 27% by 2050.

IBM Solar Forecasting
Figure: Sky camera for short-term forecasting (located at Watson)
- Research program funded by the U.S. Department of Energy's SunShot Initiative.

Monitoring Network
- Global Horizontal Irradiance (GHI): the total amount of shortwave radiation received from above by a horizontal surface.
- GHI measurements are collected every 15 minutes from 1,528 sensor units.

Weather Models
- Predictions of GHI come from two widely used weather models: the North American Mesoscale Forecast System (NAM) and the Short-Range Ensemble Forecast (SREF).
- We want to combine GHI measurements with the weather model outcomes to obtain the solar energy prediction.

Statistical Model: Basic setup
- Population is divided into $H$ exhaustive and non-overlapping groups, where group $h$ has $n_h$ units, for $h = 1, \ldots, H$.
- For group $h$, $n_h$ units are selected for measurement.
- From the $i$-th unit of group $h$, the measurements and their associated covariates, $(y_{hij}, x_{hij})$, are available for $j = 1, \ldots, n_{hi}$.

Multi-level Model
- Consider level one and level two models,
    $$y_{hi} \sim f_1(y_{hi} \mid x_{hi}; \theta_{hi}), \qquad \theta_{hi} \sim f_2(\theta_{hi} \mid z_{hi}; \zeta_h),$$
  where
  - $y_{hi} = (y_{hi1}, \ldots, y_{hin_{hi}})^\top$: observations at unit $(hi)$.
  - $x_{hi} = (x_{hi1}^\top, \ldots, x_{hin_{hi}}^\top)^\top$: covariates associated with unit $(hi)$ (= two weather model outcomes).
  - $z_{hi}$: unit-specific covariate.
- Note that $\theta_{hi}$ is a parameter in the level 1 model, but a random variable (latent variable) in the level 2 model.
- We can build a level 3 model on $\zeta_h$ if necessary:
    $$\zeta_h \sim f_3(\zeta_h \mid q_h; \alpha).$$

Data Structure Under Two-level Model
[Diagram: $\zeta_h$ generates $\theta_{h1}, \theta_{h2}, \theta_{h3}$ through $f_2$; each $\theta_{hi}$ generates its observations $y_{hi1}, \ldots, y_{hin_i}$ through $f_1$.]

Why Multi-level Models?
1. To reflect reality: to allow for structural heterogeneity (= variety in big data) across areas.
2. To borrow strength: we need to predict at locations with no direct measurement.

Real Problems Become Statistical Problems!
1. Parameter estimation
2. Prediction
3. Uncertainty quantification
The Bayesian method using MCMC computation is a useful tool.

Classical Solutions Do Not Necessarily Work in Reality!
1. No single data file exists, as the data are stored in the cloud (Hadoop Distributed File System).
2. Micro-level data are not always available to the analyst for confidentiality and security reasons.
3. The classical solution, based on the MCMC algorithm, is time consuming and the computational cost can be huge for big data.
This is a typical big data problem.

New Solution: Divide-and-Conquer Approach
- Three steps for parameter estimation in each level:
  1. Summarization: Find a summary (= measurement) for the latent variable to obtain the sampling error model.
  2. Combine: Combine the sampling error model and the latent variable model.
  3. Learning: Estimate the parameters from the summary data.
- Apply the three steps in the level two model, and then repeat them in the level three model.
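
As a minimal sketch of the three steps above, the following Python code works through a deliberately simplified scalar case. None of the specifics come from the talk: it assumes a single covariate, normal level-one errors, a level-two model $\theta_{hi} \sim N(\beta, \tau^2)$ with no $z_{hi}$, and simulated data. The function names `summarize`, `posterior`, and `em_level2` are hypothetical.

```python
import random


def summarize(x, y):
    """Step 1 (summarization): least-squares slope and its estimated variance."""
    sxx = sum(xi * xi for xi in x)
    theta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    rss = sum((yi - theta_hat * xi) ** 2 for xi, yi in zip(x, y))
    v_hat = rss / (len(x) - 1) / sxx
    return theta_hat, v_hat


def posterior(theta_hat, v_hat, beta, tau2):
    """Step 2 (combine): normal sampling error model times normal latent model.

    Returns the mean and variance of theta given theta_hat.
    """
    var = 1.0 / (1.0 / v_hat + 1.0 / tau2)
    mean = var * (theta_hat / v_hat + beta / tau2)
    return mean, var


def em_level2(summaries, n_iter=200):
    """Step 3 (learning): EM for (beta, tau2) using only the unit summaries."""
    beta = sum(t for t, _ in summaries) / len(summaries)
    tau2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior mean and variance of each latent theta
        post = [posterior(t, v, beta, tau2) for t, v in summaries]
        # M-step: maximize the expected level-two log-likelihood
        beta = sum(m for m, _ in post) / len(post)
        tau2 = sum((m - beta) ** 2 + s for m, s in post) / len(post)
        tau2 = max(tau2, 1e-12)  # guard against division by zero
    return beta, tau2


# Simulated group (assumed values): 200 units, 50 observations each.
rng = random.Random(0)
summaries = []
for _ in range(200):
    theta = rng.gauss(2.0, 0.5)  # latent unit-level slope
    x = [rng.uniform(0.5, 1.5) for _ in range(50)]
    y = [theta * xi + rng.gauss(0.0, 0.3) for xi in x]
    summaries.append(summarize(x, y))

beta_hat, tau2_hat = em_level2(summaries)
print(beta_hat, tau2_hat)
```

Only the pairs `(theta_hat, v_hat)` leave each unit, so `em_level2` never needs the micro-level data.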

Modeling Structure
[Diagram labels: Site 1, Site 2, Site 3, each with Sensor and Storage (individual data, Level 1 unit summary); Group Storage (Level 2 summary)]

Summarization
- Find a measurement for $\theta_{hi}$.
- For each unit, treat $(x_{hi}, y_{hi})$ as a single data set to obtain the best estimator $\hat\theta_{hi}$ of $\theta_{hi}$ by treating $\theta_{hi}$ as a fixed parameter.
- Obtain the sampling distribution of $\hat\theta_{hi}$ as a function of $\theta_{hi}$:
    $$\hat\theta_{hi} \sim g_1(\hat\theta_{hi} \mid \theta_{hi}).$$

Summarization Step under Two-Level Model Structure
[Diagram: $\zeta_h$ generates $\theta_{h1}, \theta_{h2}, \theta_{h3}$ through $f_2$; each $\theta_{hi}$ generates $\hat\theta_{hi}$ through $g_1$.]
$g_1(\hat\theta_{hi} \mid \theta_{hi})$: Sampling error model, $\hat\theta_{hi} \sim N(\theta_{hi}, \hat V(\hat\theta_{hi}))$.

Combining
- The marginal distribution of $\hat\theta_{hi}$ is
    $$m_2(\hat\theta_{hi} \mid z_{hi}; \zeta_h) = \int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h) \, d\theta_{hi}, \qquad (1)$$
  which combines $g_1(\hat\theta_{hi} \mid \theta_{hi})$ and $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$ via the latent variable $\theta_{hi}$.
- Also, the prediction model for the latent variable $\theta_{hi}$ is obtained by using Bayes' theorem:
    $$p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h) = \frac{g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)}{\int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h) \, d\theta_{hi}}. \qquad (2)$$

Combining Step
[Diagram: $\zeta_h$, $\theta_{hi}$, and $\hat\theta_{hi}$ linked by $f_2$, $g_1$, $m_2$, and $p_2$.]
Sampling error model ($g_1$) + Latent variable model ($f_2$) ⇒ Marginal model ($m_2$), Prediction model ($p_2$)

Learning
- The level two model can be learned by the EM algorithm: at the $t$-th iteration, we update $\zeta_h$ by solving
    $$\hat\zeta_h^{(t+1)} \leftarrow \arg\max_{\zeta_h} \sum_{i=1}^{n_h} E_{p_2}\left\{ \log f_2(\theta_{hi} \mid z_{hi}; \zeta_h) \mid \hat\theta_{hi}; \hat\zeta_h^{(t)} \right\},$$
  where the conditional expectation is taken with respect to the prediction model $p_2$ in (2) evaluated at $\hat\zeta_h^{(t)}$, and $\hat\zeta_h^{(t)}$ denotes the $t$-th iterate of the EM algorithm.

Learning Using EM Algorithm
[Diagram: EM iteration linking $\theta_{hi}$ (E-step) and $\hat\zeta_h$ (M-step), with inputs $\hat\theta_{hi}$ and $z_{hi}$.]

Bayesian Interpretation
- Prediction model (2) can be written as
    $$p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h) \propto g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h).$$
- Here, $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$ can be treated as a prior distribution and $p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h)$ is a posterior distribution that incorporates the observation of $\hat\theta_{hi}$.
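- Use of $g_1(\hat\theta_{hi} \mid \theta_{hi})$ instead of the full likelihood simplifies the computation (Approximate Bayesian Computation).

Extension to Three Level Model

  Model     Measurement (data summary)                                         Parameter        Latent variable
  Level 1   $y_{hi} = (y_{hi1}, \cdots, y_{hin})$                              $\theta_{hi}$
  Level 2   $\hat\theta_h = (\hat\theta_{h1}, \cdots, \hat\theta_{hn_h})$      $\zeta_h$        $\theta = (\theta_{h1}, \cdots, \theta_{hn_h})$
  Level 3   $\hat\zeta = (\hat\zeta_1, \cdots, \hat\zeta_H)$                   $\alpha$         $\zeta = (\zeta_1, \cdots, \zeta_H)$

We can apply the same three steps to the level three model.

Bottom-up Estimation

  Level 3
    Latent variable model: $f_3(\zeta_h \mid q_h; \alpha)$
    Sampling error model:  $\hat\zeta_h \sim g_2(\hat\zeta_h \mid \zeta_h)$
    Parameter estimation:  $\hat\alpha = \arg\max_{\alpha} \sum_{h=1}^{H} \log \int g_2(\hat\zeta_h \mid \zeta_h) f_3(\zeta_h \mid q_h; \alpha) \, d\zeta_h$
  Level 2
    Latent variable model: $f_2(\theta_{hi} \mid z_{hi}; \zeta_h)$
    Sampling error model:  $\hat\theta_{hi} \sim g_1(\hat\theta_{hi} \mid \theta_{hi})$
    Parameter estimation:  $\hat\zeta_h = \arg\max_{\zeta_h} \sum_{i=1}^{n_h} \log \int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h) \, d\theta_{hi}$
  Level 1
    Model:                 $f_1(y_{hij} \mid x_{hij}; \theta_{hi})$
    Parameter estimation:  $\hat\theta_{hi} = \arg\max_{\theta_{hi}} \sum_{j=1}^{n_{hi}} \log f_1(y_{hij} \mid x_{hij}; \theta_{hi})$

Figure: An illustration of the bottom-up approach to parameter estimation

Prediction
- Our goal is to predict unobserved $y_{hij}$ values from the above models using the parameter estimates.
- The best prediction for $y_{hij}$ is
    $$\hat y^*_{hij} = E_{p_3}\left[ E_{p_2}\left\{ E_{f_1}(y_{hij} \mid x_{hij}, \theta_{hi}) \mid \hat\theta_{hi}; \zeta_h \right\} \mid \hat\zeta_h; \hat\alpha \right],$$
  where
    $$p_3(\zeta_h \mid \hat\zeta_h; \hat\alpha) = \frac{g_2(\hat\zeta_h \mid \zeta_h) f_3(\zeta_h \mid q_h; \hat\alpha)}{\int g_2(\hat\zeta_h \mid \zeta_h) f_3(\zeta_h \mid q_h; \hat\alpha) \, d\zeta_h}$$
  and
    $$p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h) = \frac{g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h)}{\int g_1(\hat\theta_{hi} \mid \theta_{hi}) f_2(\theta_{hi} \mid z_{hi}; \zeta_h) \, d\theta_{hi}}.$$
- The prediction is made in a top-down manner.

Prediction: Top-down Prediction
[Diagram: from $\hat\alpha$, draw $\zeta^*_1, \zeta^*_2, \zeta^*_3$ through $p_3$; from each $\zeta^*_h$, draw $\theta^*_{hi}$ through $p_2$.]
Predict $y_{hij}$ using $f_1(y_{hij} \mid x_{hij}; \theta^*_{hi})$.

Prediction: Top-down Prediction

  Level   Latent          Prediction model                                      Best prediction
  3       $\zeta_h$       $p_3(\zeta_h \mid \hat\zeta_h; \hat\alpha)$           $\zeta^*_h \sim p_3(\zeta_h \mid \hat\zeta_h; \hat\alpha)$
  2       $\theta_{hi}$   $p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta_h)$      $\theta^*_{hi} \sim p_2(\theta_{hi} \mid \hat\theta_{hi}; \zeta^*_h)$
  1       $y_{hij}$       $f_1(y_{hij} \mid x_{hij}; \theta_{hi})$              $y^*_{hij} \sim f_1(y_{hij} \mid x_{hij}, \theta^*_{hi})$

Figure: Top-down approach to prediction

Case study: Application to Solar Energy Prediction
- We use 15 days of data (12/01/2014 – 12/15/2014) for analysis.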
- The states are organized into 12 groups.
- The number of sites in each group, $m_h$, varies between 37 and 321.

Grouping Scheme
- Pooling data from nearby sites.
- Can incorporate complex structure such as distribution zones.

Application: Site Level
- First assume that
    $$y_{hij} = x_{hij}\theta_{hi} + \epsilon_{hij}, \qquad \epsilon_{hij} \sim t(0, \sigma^2_{hi}, \nu_{hi}),$$
  where $\sigma^2_{hi}$ is the scale parameter and $\nu_{hi}$ is the degrees of freedom, and
    $$\hat\theta_{hi} \mid \theta_{hi} \sim N(\theta_{hi}, V_{hi}),$$
  where $V_{hi} = V(\hat\theta_{hi})$.
- The degrees of freedom are assumed to be unknown and are estimated by the method of Lange et al. (1989).

Three Level Model
- Assume the level 2 model $\theta_{hi} \sim N(\beta_h, \Sigma_h)$, with $\zeta_h = (\beta_h, \Sigma_h)$.
- Similarly, the level 3 model is $\zeta_h \sim N(\mu, \Sigma)$, with $\alpha = (\mu, \Sigma)$.

Comparison
- We compared the performance of the multi-level approach with three other modeling methods:
  - Site-by-site model: fit a different model for each individual site
  - Group-by-group model: fit a different model for each group
  - One global model: fit a single common model for all sites using the aggregate data
- To evaluate the prediction accuracy, we randomly selected 70% of the data to fit the model and tested on the remaining 30%.

MSPE Comparison
- We compare the accuracy by Mean Squared Prediction Error (MSPE), $N_T^{-1} \sum (y_{hij} - \hat y_{hij})^2$, where the $\hat y_{hij}$ are obtained from the four different methods and $N_T$ is the size of the test data set.

  Method         MSPE    SD
  Multi level    0.297   0.601
  Site model     0.298   0.609
  Group model    0.406   0.803
  Global model   0.383   0.791

Table: Accuracy comparison of the different modeling methods

Comparison in Detail ($n_{hi} \le 100$ vs $> 100$)
[Figure: Mean squared error by sample size (<100 vs >100) for the four methods: Multilevel, Site Model, Group Model, Global Model.]

Discussion
- Motivated by a real problem: a solar energy forecasting system has been developed.
- We used a multi-level model approach to address the practical issues.
- There are more issues to be investigated:
  - Spatial modeling
  - Estimation of group structure
  - Preferential sampling of sites
  - ...
- The proposed method is promising for handling big data.
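
The top-down prediction scheme from the Prediction slides (draw $\zeta^*_h$ from $p_3$, then $\theta^*_{hi}$ from $p_2$, then $y^*_{hij}$ from $f_1$) can be sketched by Monte Carlo in a simplified scalar setting. This is an illustration under assumptions that are not in the talk: every model is normal, $\zeta_h$ is reduced to a group mean `beta` with the level-two variance `tau2` treated as known, and all numeric inputs are made up. The names `normal_posterior` and `predict_top_down` are hypothetical.

```python
import random


def normal_posterior(estimate, est_var, prior_mean, prior_var):
    """Posterior mean and variance for a normal estimate with a normal prior."""
    var = 1.0 / (1.0 / est_var + 1.0 / prior_var)
    mean = var * (estimate / est_var + prior_mean / prior_var)
    return mean, var


def predict_top_down(x_new, theta_hat, v_theta, beta_hat, v_beta,
                     mu_hat, omega2, tau2, sigma2, n_draws=5000, seed=0):
    """Average of y* draws obtained by sampling level 3, then 2, then 1."""
    rng = random.Random(seed)
    # p3 does not depend on lower levels, so compute it once.
    m3, s3 = normal_posterior(beta_hat, v_beta, mu_hat, omega2)
    total = 0.0
    for _ in range(n_draws):
        beta_star = rng.gauss(m3, s3 ** 0.5)              # level 3: zeta* ~ p3
        m2, s2 = normal_posterior(theta_hat, v_theta, beta_star, tau2)
        theta_star = rng.gauss(m2, s2 ** 0.5)             # level 2: theta* ~ p2
        total += rng.gauss(x_new * theta_star, sigma2 ** 0.5)  # level 1: y* ~ f1
    return total / n_draws


# Made-up summaries for one unit and its group.
pred = predict_top_down(
    x_new=1.0,
    theta_hat=2.0, v_theta=0.01,   # unit summary and its sampling variance
    beta_hat=1.8, v_beta=0.02,     # group summary and its sampling variance
    mu_hat=1.5, omega2=0.5,        # level-three estimates (alpha)
    tau2=0.25, sigma2=0.09,        # level-two and level-one variances
)
print(pred)
```

In this all-normal case the nested expectation has a closed form, so the Monte Carlo average is only a check; sampling becomes useful when $f_1$ or $f_2$ is not conjugate, as with the $t$-distributed errors used in the application.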