Statistical Downscaling and Modelling Using Sparse Variable Selection Methods
Climate Adaptation Flagship
Aloke Phatak, Harri Kiiveri, Carmen Chan, Bryson Bates & Steve Charles
11th IMSC, 12–16 July 2010

Outline
• Why variable selection?
• Rapid Variable Elimination (RaVE)
• Examples
  • I. Rainfall occurrence
    • Sparse logistic regression
    • RaVE as a 'pre-filter'
  • II. Variable selection for extremes
• Future work

Why Variable Selection?
• In constructing empirical models of climatic variables (e.g., rainfall, temperature), we may have some idea of the drivers of the response of interest, but often we do not.
• Variable selection in statistical downscaling and modelling methods has typically relied on 'expert knowledge', model-selection criteria, and trial-and-error:
  • NHMM – Hughes et al. (1999); Kirshner (2005)
  • GLM – Chandler and Wheater (2002)
  • Regression models (SDSM) – Wilby and Dawson (2007); Hessami et al. (2008)
  • BHM for extremes – Palmer et al. (2010)
• These approaches can generally consider only a 'small' number of potential variables.
• It would be useful to have automatic variable selection methods for selecting a parsimonious set of explanatory variables from a potentially large set of, e.g., gridded variables.
• Little work has been done on automatic variable selection for extreme values.
• As always, keep in mind the limitations of models built from observational data.

Rapid Variable Elimination (RaVE)
• Platforms for generating high-dimensional data have led to situations where the number of observations, n, is much less than the number of variables, p. Selecting a small set of explanatory variables that explains the response of interest is therefore very challenging.
• Conventional methods such as best-subset selection tend to be inefficient, unstable, and slow (Breiman, 1996).
• Tibshirani (1996): seminal paper on the implicit variable selection method known as the LASSO (least absolute shrinkage and selection operator).
• For linear regression, the LASSO boils down to a penalized least-squares procedure:
  β̂_lasso = argmin_β { ||y − Xβ||² + λ Σ_j |β_j| }
• NB: the ridge estimator arises from:
  β̂_ridge = argmin_β { ||y − Xβ||² + λ Σ_j β_j² }

Rapid Variable Elimination (RaVE)
• The LASSO has a Bayesian interpretation, and that led to the use of Bayesian hierarchical priors for the vector of coefficients.
• In RaVE, the prior captures the assumption that although there may be many more variables than observations, the 'true' number of effective parameters (non-zero coefficients) is actually very small.
• The prior is a Normal–Gamma prior (Kiiveri, H.K. (2008). BMC Bioinformatics, 9:195); an illustrative sketch of its assumed form is given below.
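A minimal sketch of the sparsity assumption behind the Normal–Gamma prior, assuming the hierarchy β_j | σ_j² ~ N(0, σ_j²) with σ_j² ~ Gamma(shape k, scale b) independently for each coefficient; this parameterization is an assumption based on Kiiveri (2008), not stated explicitly in the slides. Drawing coefficients from the assumed prior for a few values of k shows how smaller k concentrates more prior mass near zero, i.e., encodes sparser models.

```python
# Illustration only (not the RaVE implementation): draws from an assumed
# Normal-Gamma prior, beta_j | s2_j ~ N(0, s2_j), s2_j ~ Gamma(shape=k, scale=b),
# to show how a smaller shape k concentrates prior mass near zero (sparser models).
import numpy as np

rng = np.random.default_rng(0)

def draw_prior(k, b, size=100_000):
    """Monte Carlo draws of one coefficient under the assumed Normal-Gamma prior."""
    s2 = rng.gamma(shape=k, scale=b, size=size)      # per-coefficient prior variance
    return rng.normal(loc=0.0, scale=np.sqrt(s2))    # coefficient given its variance

for k in (1.0, 0.1):                                 # k = 1 gives a Laplace-type prior
    beta = draw_prior(k=k, b=1.0)
    near_zero = np.mean(np.abs(beta) < 0.05)         # mass in a small neighbourhood of 0
    print(f"k = {k:>4}: P(|beta| < 0.05) ~ {near_zero:.3f}, sd = {beta.std():.3f}")
```

With k = 1 the prior variance is exponentially distributed and the marginal prior on each coefficient is the double-exponential (Laplace) prior underlying the LASSO; smaller k places more mass exactly around zero, which is the behaviour the 'sparser models' statement on the next slide refers to.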
Rapid Variable Elimination (RaVE)
• RaVE includes the LASSO as a special case (shape parameter k = 1), and for k < 1 it yields sparser models.
• Estimation: the posterior of β (the vector of parameters of primary interest), φ (the vector of parameters of secondary interest), and σ² (the vector of prior variances), given the data y, is
  p(β, φ, σ² | y) ∝ p(y | β, φ) p(β | σ²) p(σ²).
• By treating σ² as missing data, we use an EM algorithm to maximize the log posterior and obtain maximum a posteriori (MAP) estimates of β and φ, given values of the hyperparameters.
• The approach can be used for a wide variety of models.
• NB: for some recent work putting regularization into a fully Bayesian framework and comparing it with penalized likelihood, see:
  • Kyung et al. (2010). Bayesian Analysis, 5 (2), 369–412
  • Fahrmeir et al. (2010). Stat. Comput., 20 (2), 203–219
  • Griffin and Brown (2010). Bayesian Analysis, 5 (1), 171–188

Example I – Rainfall Occurrence
• Half-year (MJJASO) rainfall records from stations in South Australia, 1958–2006.
• Atmospheric data:
  • NCEP–NCAR reanalysis data at 2.5° × 2.5° resolution across a 7 × 8 grid
  • 7 potential predictor variables in each grid box: SLP, and HGT and DTD at 500, 700 and 850 hPa
  • Total of 392 (7 × 8 × 7) potential predictors
• Strategy:
  • Site-by-site logistic regression
  • Model-building data: 1986–2006; test data: 1958–1985
  • n-fold cross-validation over a grid of k and b values
• Assessment: reliability plots, ROC curves; interannual performance and wet- and dry-spell length frequencies based on simulations.

Example I – Study Area
Example I – Selecting Hyperparameters
Example I – Selected Variables (Station 2)
Example I – Performance on Test Set (Station 2)
Example I – Comparison With NHMM (Station 2)
(figure slides)

Example I – Summary of Results
• For all stations, RaVE selected variables in expected regions that have sensible interpretations.
• 11–18 variables were selected, with slight differences between stations.
• Results are comparable to the NHMM, sometimes better.
• Single-site, not multi-site!
• Extensions:
  • Multi-site modelling
  • Interpretation would be easier if spatially contiguous regions of variables were selected
  • RaVE has also been used as a 'pre-filter' for selecting variables for an NHMM – results comparable, slightly better
  • Holy grail – apply a sparsity prior to the NHMM?

Variable Selection for Extreme Values
• If we have a series of block maxima and they do not change over time, we can estimate the parameters of the GEV distribution by, say, maximum likelihood, to obtain estimates (μ̂, σ̂, ξ̂).
• If, however, some of these parameters change over time, we have to postulate and then fit a model for this change.
• So, in modelling the location parameter of a GEV distribution, we write a linear predictor such as
  μ_t = β₀ + x_tᵀβ,
  where x_t is the vector of potential predictors at time t.
• RaVE can be used to select variables in the linear predictor – this requires the first and second derivatives of the log-likelihood with respect to the linear predictor (a minimal sketch of the model form follows below).
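As an illustration of the model form just described (not of RaVE's estimation itself), the sketch below fits a GEV distribution whose location parameter is a linear function of a few candidate predictors, with constant scale and shape, by direct numerical maximum likelihood. The synthetic data, parameter names, and the use of scipy's genextreme are my own illustrative choices, not taken from the slides.

```python
# Sketch (assumptions noted above): GEV log-likelihood with a covariate-dependent
# location mu_t = b0 + x_t' b, constant scale and shape, maximized numerically.
# RaVE itself would add the Normal-Gamma prior and EM-based MAP estimation on top.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import genextreme

def neg_log_lik(params, y, X):
    """Negative GEV log-likelihood; params = [log_scale, shape, b0, b1, ..., bp]."""
    log_scale, shape = params[0], params[1]
    beta = params[2:]
    mu = beta[0] + X @ beta[1:]                 # linear predictor for the location
    # scipy's genextreme shape c is the negative of the usual climatological xi
    return -np.sum(genextreme.logpdf(y, c=-shape, loc=mu, scale=np.exp(log_scale)))

# Synthetic example: 47 block maxima, 3 candidate predictors (only the first matters)
rng = np.random.default_rng(1)
n, p = 47, 3
X = rng.normal(size=(n, p))
true_mu = 30.0 + 5.0 * X[:, 0]
y = genextreme.rvs(c=-0.1, loc=true_mu, scale=8.0, size=n, random_state=rng)

x0 = np.concatenate(([np.log(y.std()), 0.1], [y.mean()], np.zeros(p)))
fit = minimize(neg_log_lik, x0, args=(y, X), method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-6, "fatol": 1e-6})
print("estimated slope coefficients:", np.round(fit.x[3:], 2))
```

In the application reported on the following slides, plain maximum likelihood would be hopeless (n = 47, p = 1980); RaVE's MAP estimation, using the first and second derivatives of this log-likelihood with respect to the linear predictor, drives most of the candidate coefficients exactly to zero.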
Example II
• Extreme rainfall in north-west Western Australia (NWWA): is it changing over time, and can we find a stable relationship with a small set of predictors?
  • Exploratory; use the selected predictor(s) in more sophisticated models, ...
• Wet-season (NDJFMA) rainfall records from 19 stations in the Kimberley and Pilbara, 1958–2007.
• Atmospheric data:
  • NCEP–NCAR reanalysis data at 2.5° × 2.5° resolution across an 11 × 9 grid
  • 20 potential predictor variables in each grid box: T, DTD, GPH, SH, and N–S and E–W components of wind speed at 3 pressure levels; plus MSLP and TT, measured on the day corresponding to the maximum rainfall
  • n = 47, p = 1980
• Strategy:
  • Diagnostic plots to determine whether the extremes are changing
  • Variable selection using RaVE for the location-parameter model, with constant scale and shape parameters

Example II – Smoothing of Block Maxima
Station 1 (Kimberley): NDJFMA maxima with smoothed location parameter (method of Davison and Ramesh, 2000). (figure slide)

Example II – Selecting Hyperparameters
• RaVE depends on two hyperparameters, k and b:
  • Where there is plenty of data, some form of cross-validation can be used.
  • Here, we carry out variable selection over a grid of k and b values and then use diagnostics to assess over-fitting.
• With n = 47 and p = 1980, how many variables would it be sensible to fit?
  • Rule of thumb: at least five observations for every parameter fitted (Huber, 1980), so no more than 5–8.
  • With RaVE, selecting more than about 6–8 variables results in severe over-fitting.
• Results are generally insensitive to the value of b, but very sensitive to k.

Example II – Selected Variables (Station 1)
Station 1 (Kimberley): 3 variables selected – DTD at 850 hPa and SH at 700 hPa. Coefficients are significant. (figure slide)

Example II – Estimated Location (Station 1)
Station 1 (Kimberley): estimated location (not the mean!) with pointwise 95% CI; constant scale and shape. (figure slide)

Summary
• Demonstrated proof of principle: fast variable selection for extreme values when n << p.
• Sensible results were obtained:
  • Picking variables at random does not yield significant coefficients; nor does using, e.g., ENSO.
• Much more work to be done:
  • Block maxima are wasteful – r-largest order statistics, point-process likelihood
  • Multi-site models – dependency networks based on sparse regression
  • Interpretability – we would expect regions of variables to influence the outcome; modify the prior to force contiguous regions to be selected
    • Fused LASSO (Tibshirani et al., 2005) – additional constraints
    • Bayesian fused LASSO – Kyung et al. (2010)
  • Diagnostics – selection of the hyperparameters k and b; goodness-of-fit measures

Mathematics, Informatics and Statistics
Aloke Phatak
Phone: +61 8 9333 6184
Email: [email protected]
Web: www.csiro.au/cmis

Thank you

Contact Us
Phone: 1300 363 400 or +61 3 9545 2176
Email: [email protected]
Web: www.csiro.au