A VALIDATOR'S GUIDE TO MODEL RISK MANAGEMENT
BEST PRACTICES FOR COST-EFFECTIVE REGULATORY COMPLIANCE
Release 1.0, January 2017

TABLE OF CONTENTS

MODEL VALIDATION: IS THIS SPREADSHEET A MODEL?
VALIDATING VENDOR MODELS – SPECIAL CONSIDERATIONS
PREPARING FOR MODEL VALIDATION – IDEAS FOR MODEL OWNERS
4 QUESTIONS TO ASK WHEN DETERMINING MODEL VALIDATION SCOPE
PERFORMANCE TESTING: BENCHMARKING VS BACKTESTING
VALIDATING MODEL INPUTS – HOW MUCH IS ENOUGH?
CONTRIBUTORS
ABOUT RISKSPAN

MODEL VALIDATION: IS THIS SPREADSHEET A MODEL?

As model validators, we frequently find ourselves in the middle of debates between spreadsheet owners and enterprise risk managers over the question of whether a particular computing tool rises to the level of a "model." To the uninitiated, the semantic question, "Is this spreadsheet a model?" may appear to be largely academic and inconsequential. But its ramifications are significant, and getting the answer right is of critical importance to model owners, to enterprise risk managers, and to regulators.

STAKEHOLDERS OF MODEL VALIDATION

In the most important respects, the incentives of these stakeholder groups are aligned. Everybody has an interest in knowing that the spreadsheet in question is functioning as it should and producing accurate and meaningful outputs. Appropriate steps should be taken to ensure that every computing tool does this, regardless of whether it is ultimately deemed a model.

But classifying something as a model carries with it important consequences related to cost and productivity, as well as overall model risk management. It is here that incentives begin to diverge. Owners and users of spreadsheets in particular are generally inclined to classify them as simple applications or end-user computing (EUC) tools whose reliability can (and ought to) be ascertained using testing measures that do not rise to the level of the formal model validation procedures required by regulators.[1] These formal procedures can be both expensive for the institution and onerous for the model owner. Models require meticulous documentation of their approach, economic and financial theory, and code. Painstaking statistical analysis is frequently required to generate the necessary developmental evidence, and further cost is then incurred to validate all of it.

Enterprise risk managers and regulators, who do not necessarily feel these added costs and burdens, may be inclined to err on the side of classifying spreadsheets as models "just to be on the safe side." But incurring unnecessary costs is not a prudent course of action for a financial institution (or any institution), and producing more model validation reports than necessary can have other unintended, negative consequences. Model validations pull model owners away from their everyday work, adversely affecting productivity and, sometimes, quality of work. Virtually every model validation report identifies issues that must be reviewed and addressed by management. Too many unnecessary reports containing findings that are comparatively unimportant can bury enterprise risk managers and distract them from the most urgent findings.

[1] In the United States, most model validations are governed by one of the following sets of guidelines: 1) OCC 2011-12 (institutions regulated by the OCC), 2) FRB SR 11-7 (institutions regulated by the Federal Reserve), and 3) FHFA Advisory Bulletin 2013-07 (Fannie Mae, Freddie Mac, and the Federal Home Loan Banks). These documents have much in common, and the OCC and FRB guidelines are identical to one another.
DEFINITION OF A MODEL

So what, then, are the most important considerations in determining which spreadsheets are in fact models that should be subject to formal validation procedures? OCC and FRB guidance on model risk management defines a model as follows:[2]

A quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates.

The same guidance refers to models as having three components:

1. An information input component, which delivers assumptions and data to the model
2. A processing component, which transforms inputs into estimates
3. A reporting component, which translates the estimates into useful business information

This definition and guidance leave managers with some latitude. Financial institutions employ many applications that apply mathematical concepts to defined inputs in order to generate outputs. But the existence of inputs, outputs, and mathematical concepts alone does not necessarily justify classifying a spreadsheet as a model.

Note that the regulatory definition of a model includes the concept of quantitative estimates. The term quantitative estimate implies a level of uncertainty about the outputs. If an application generates outputs about which there is little or no uncertainty, then one can argue that the output is not a quantitative estimate but, rather, simply a defined arithmetic result. While quantitative estimates typically result from arithmetic processes, not every defined arithmetic result is a quantitative estimate.

For example, a spreadsheet that sums the known balances of ten bank accounts as of a given date, even if it is supplied by automated feeds and performs the summations in a completely lights-out process, likely would not rise to the level of a model requiring validation because it is performing a simple arithmetic function; it is not generating a quantitative estimate.[3] In contrast, a spreadsheet that projects what the sum of the same ten bank balances will be as of a given future date (based on assumptions about interest rates, expected deposits, and decay rates, for example) generates quantitative estimates and would therefore qualify as a model requiring validation. Management and regulators would want comfort that the assumptions used by this spreadsheet model are reasonable and that they are being applied and computed appropriately.

[2] See footnote 1.
[3] Management would nevertheless want assurance that such an application was functioning correctly. This, however, can be achieved via means less intrusive than a formal model validation process, such as conventional auditing, SOX reviews, or EUC quality gates.
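To make the distinction concrete, here is a minimal Python sketch (all figures and function names are hypothetical) contrasting the two spreadsheets described above: the first function returns a defined arithmetic result, while the second layers assumptions on top of the arithmetic and therefore produces a quantitative estimate.

```python
# Minimal sketch (hypothetical figures) contrasting a defined arithmetic
# result with a quantitative estimate. Names and assumptions are
# illustrative, not any particular institution's spreadsheet logic.

def current_total(balances):
    """Defined arithmetic result: the sum of known balances.

    There is only one defensible answer, so this is not a model."""
    return sum(balances)


def projected_total(balances, annual_rate, monthly_deposit, decay_rate,
                    months=12):
    """Quantitative estimate: projects the future total using assumed
    interest, deposit, and decay rates. Different reasonable assumptions
    yield different answers, which is what makes this a model."""
    total = sum(balances)
    for _ in range(months):
        total *= 1 + annual_rate / 12   # assumed interest accrual
        total += monthly_deposit        # assumed new deposits
        total *= 1 - decay_rate         # assumed monthly attrition
    return total


balances = [12_500.00, 8_200.00, 30_000.00]          # known as of today
print(current_total(balances))                       # exact and verifiable
print(projected_total(balances, 0.02, 500.0, 0.01))  # an estimate
```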
IS THIS SPREADSHEET A MODEL?

We have found the following questions to be particularly enlightening in helping our clients determine whether a spreadsheet should be classified as 1) a model that transforms inputs into quantitative estimates or 2) a non-model spreadsheet that generates defined arithmetic results.

QUESTION 1: DOES THE SPREADSHEET PRODUCE A DEMONSTRABLY "RIGHT" ANSWER?

A related question is whether benchmarking yields results that are comparable, as opposed to exactly the same. If spreadsheets designed by ten different people can reasonably be expected to produce precisely the same result (because there is only one generally accepted way of calculating it), then the result probably does not qualify as a quantitative estimate and the spreadsheet probably should not be classified as a model.

Example 1 (Non-Model): Mortgage amortization calculator. Ten different applications would be expected to transform the same loan amount, interest rate, and term information into precisely the same amortization table. A spreadsheet that differed from this expectation would be considered "wrong." We would not consider this output to be a quantitative estimate and would be inclined to classify such a spreadsheet as something other than a model.

Example 2 (Model): Spreadsheet projecting the expected UPB of a mortgage portfolio in 12 months. Such a spreadsheet would likely need to apply and incorporate prepayment and default assumptions. Different spreadsheets could compute and apply these assumptions differently, without any one necessarily being recognized as "wrong." We would consider the resulting UPB projections to be quantitative estimates and would be likely to classify such a spreadsheet as a model.

Note that the spreadsheets in both examples tell their users what a loan balance will be in the future. But only the second example layers economic assumptions on top of its basic arithmetic calculations. Economic assumptions can be subjected to verification after the fact, which relates to our second question:

QUESTION 2: CAN THE SPREADSHEET'S OUTPUT BE BACK-TESTED?

Another way of stating this question would be, "Is back-testing required to gauge the accuracy of the spreadsheet's outputs?" This is a fairly unmistakable indicator of a forward-looking quantitative estimate. A spreadsheet that generates forward-looking estimates is almost certainly a model and should be subjected to formal model validation. Back-testing would not be of any particular value in our first (non-model) example, above, as the spreadsheet is simply calculating a schedule. In our second (model) example, however, back-testing would be an invaluable tool for judging the reliability of the prepayment and default assumptions driving the balance projection.

QUESTION 3: IS THE SPREADSHEET SIMPLY APPLYING A DEFINED SET OF BUSINESS RULES?

Spreadsheets are sometimes used to automate the application of defined business rules in order to arrive at a prescribed course of action. This question is a corollary to the first question about whether the spreadsheet produces output that is, by definition, "correct." Examples of business-rule calculators are spreadsheets that determine a borrower's eligibility for a particular loan product or loss mitigation program. Such spreadsheets are also used to determine how much of a haircut to apply to various collateral types based on defined rules. These spreadsheets do not generate quantitative estimates, and we would not consider them models subject to formal regulatory validation.
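As an illustration, here is a minimal sketch of a business-rule calculator of the kind described above; the eligibility thresholds are invented for illustration, not any actual program's criteria. Every input combination maps to exactly one prescribed outcome, so the output is a defined result rather than a quantitative estimate.

```python
# Minimal sketch of a business-rule calculator; thresholds are invented
# for illustration, not any actual program's criteria.

def loan_program_eligibility(fico, ltv, dti):
    """Applies defined business rules. Every input combination maps to
    exactly one prescribed outcome, so no quantitative estimate (and
    hence no model, in the regulatory sense) is involved."""
    if fico >= 680 and ltv <= 0.80 and dti <= 0.43:
        return "eligible"
    if fico >= 620 and ltv <= 0.95 and dti <= 0.45:
        return "eligible with mortgage insurance"
    return "ineligible"


print(loan_program_eligibility(fico=700, ltv=0.75, dti=0.40))  # eligible
```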
SHOULD I VALIDATE THIS SPREADSHEET?

All spreadsheets that perform calculations should be subject to review. Any spreadsheet that produces incorrect or otherwise unreliable outputs should not be used until its errors are corrected. Formal model validation procedures, however, should be reserved for spreadsheets that meet certain criteria. Subjecting non-model spreadsheets to model validation unnecessarily drives up costs and dilutes the findings of bona fide model validations by cluttering enterprise risk management's radar with an unwieldy number of formal issues requiring tracking and resolution.

Spreadsheets should be classified as models (and validated as such) when they produce forward-looking estimates that can be back-tested. This excludes simple calculators that do not rely on economic assumptions, as well as applications that merely apply business rules to produce outputs that can be definitively identified, before the fact, as "right" or "wrong." We believe that the systematic application of these principles will alleviate much of the tension between spreadsheet owners, enterprise risk managers, and regulators as they work together to identify those spreadsheets that should be subject to formal model validation.

VALIDATING VENDOR MODELS – SPECIAL CONSIDERATIONS

Many of the models we validate on behalf of our clients are developed and maintained by third-party vendors. These validations present a number of complexities that are less commonly encountered when validating "home-grown" models. These often include:

1. Inability to interview the model developer
2. Inability to review the model code
3. Inadequate documentation
4. Lack of developmental evidence and data sets
5. Lack of transparency into the impact of custom settings

Notwithstanding these challenges, the OCC's Supervisory Guidance on Model Risk Management (OCC 2011-12) specifies that "Vendor products should nevertheless be incorporated into a bank's broader model risk management framework following the same principles as applied to in-house models, although the process may be somewhat modified." The extent of these modifications depends on the complexity of the model and the cooperation afforded by the model's vendor. We have found the following general principles and practices to be useful.

VALIDATING VENDOR MODELS

Vendor documentation is not a substitute for model documentation. Documentation provided by model vendors typically includes user guides and other materials designed to help users navigate applications and make sense of outputs. These documents are written for a diverse group of model users and are not designed to identify and address particular model capabilities specific to the purpose and portfolio of an individual bank. A bank's model documentation package should delve into its specific implementation of the model, as well as the following:

- Discussion of the model's purpose and specific application, including business and functional requirements achieved by the model
- Discussion of model theory and approach, including algorithms, calculations, formulas, functions, and programming
- Description of the model's structure
- Identification of model limitations and weaknesses
- Comprehensive list of inputs and assumptions, including their sources
- Comprehensive list of outputs and reports and how they are used, including downstream systems that rely on them
- Description of testing (benchmarking and back-testing)

Because documentation provided by the vendor is likely to include few, if any, of these items, it falls to the model owner (at the bank) to generate this documentation. While some of these items (specific algorithms, calculations, formulas, and programming, for example) are likely to be deemed proprietary and will not be disclosed by the vendor, most of these components are obtainable and should be requested and documented. Model documentation should also clearly lay out all model settings (e.g., knobs) and the justification for the use of (or departure from) vendor default settings.
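One lightweight way to meet this last expectation is a settings inventory maintained alongside the model documentation. The sketch below is only illustrative; all setting names and values are hypothetical, not an actual vendor's parameters.

```python
# Hypothetical settings inventory for a vendor model. All setting names
# and values are invented for illustration.

model_settings = [
    {
        "setting": "prepay_tuning_multiplier",
        "vendor_default": 1.00,
        "value_in_use": 1.15,
        "justification": "Calibrated to the bank's own 2014-2016 "
                         "prepayment experience; see back-test memo.",
    },
    {
        "setting": "severity_curve",
        "vendor_default": "national",
        "value_in_use": "national",
        "justification": "Vendor default retained; portfolio is "
                         "geographically diversified.",
    },
]

# Flag departures from vendor defaults so each one is documented.
for s in model_settings:
    flag = "DEPARTURE" if s["value_in_use"] != s["vendor_default"] else "default"
    print(f"{s['setting']} [{flag}]: {s['justification']}")
```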
Testing results should be requested of the vendor. OCC 2011-12 states that "Banks should expect vendors to conduct ongoing performance monitoring and outcomes analysis, with disclosure to their clients, and to make appropriate modifications and updates over time." Many vendors publish the results of their own internal testing of the model. For example, a prepayment model vendor is likely to publish back-testing results comparing the model's forecasts for certain loan cohorts against actual, observed prepayments. An automated valuation model (AVM) vendor might publish the results of testing comparing the property values it generates against sales data. If a model's vendor does not publish this information, model validators should request it and document the response in the model validation report. Where available, this information should be obtained and incorporated into the model validation process, along with a discussion of its applicability to the data the bank is modeling. Model validators should attempt to replicate the results of these studies, where feasible, and use them to enhance their own independent benchmarking and back-testing activities.

Developmental evidence should be requested of the vendor. OCC 2011-12 directs banks to "require the vendor to provide developmental evidence explaining the product components, design, and intended use." This should be incorporated into the bank's model documentation. Where feasible, model validators should also ask model vendors to provide information about the data sets that were used to develop and test the model.

Contingency plans should be maintained. OCC 2011-12 cites the importance of a bank's having "as much knowledge in-house as possible, in case the vendor or the bank terminates the contract for any reason, or if the vendor is no longer in business. Banks should have contingency plans for instances when the vendor model is no longer available or cannot be supported by the vendor." For simple applications whose inner workings are well understood and replicable, a contingency plan may be as simple as a Microsoft Excel workbook. This requirement can pose a significant challenge, however, for banks that purchase off-the-shelf asset-liability and market risk models and do not have the in-house expertise to quickly and adequately replicate these models' complex computations. Situations such as this argue for the implementation of reliable challenger models, which not only assist in meeting benchmarking requirements but can also function as a contingency backup.

Consult the model risk management group when procuring any application that might be classified as a "model." In a perfect world, model validation considerations would be contemplated as part of the procurement process. An agreement to provide developmental evidence, testing results, and cooperation with future model validation efforts would ideally figure into the negotiations before the purchase of any application is finalized. Unfortunately, our experience has shown that banks often acquire what they think of as a simple third-party application, only to be informed after the fact, by either a regulator or the model risk management group, that they have in fact purchased a model requiring validation. A model vendor, particularly one not inclined to think of its product as a "model," may not be as responsive to requests for development and testing data after the sale if those items were not made a condition of the sale.
It is therefore prudent for procurement departments to maintain open lines of communication with model risk management groups so that the right questions can be asked and requirements established prior to acquisition.

PREPARING FOR MODEL VALIDATION – IDEAS FOR MODEL OWNERS

Though not its intent, model validation can be disruptive to model owners and others seeking to carry out their day-to-day work. We have performed enough model validations over the past decade to have learned how cumbersome the process can be for business unit model owners and others we inconvenience with what at times must feel like an endless barrage of touch-point meetings, documentation requests, and other questions relating to modeling inputs, outputs, and procedures. We recognize that the only thing these business units did to deserve this inconvenience was to devise or procure a methodology for systematically improving how something gets estimated.

In some cases, the business owner of an application tagged for validation may view it simply as a calculator or other tool, and not as a "model." And in some cases we agree with the business owner. But in every case, the system under review has been designated as a model requiring validation either by an independent risk management department within the institution or (worse) by a regulator, and so the validation project must be completed. As with so many things in life, when it comes to model validation preparation, an ounce of prevention goes a long way. Here are some ideas model owners might consider for making their next model validation a little less stressful.

OVERALL MODEL DOCUMENTATION

Among the first questions we ask at the beginning of a model validation is whether the model has been validated before. In reality, however, we can make a fairly reliable guess about the model's validation history simply by reading the model owner's documentation. A comprehensive set of documentation that clearly articulates the model's purpose, its inputs' sources, how it works, what happens to the outputs, and how the outputs are monitored is an almost sure sign that the model in question has been validated multiple times. In contrast, it is generally apparent that a model is being validated for the first time when our initial request for documentation yields one or more of the following:

- An 800-page user guide from the model's vendor, but no internally developed documentation or procedures
- Incomplete (or absent) lists of model inputs, with little or no discussion of how inputs and assumptions are obtained, verified, or used in the model
- No discussion of the model's limitations
- Perfunctory monitoring procedures, such as, "The outputs are reviewed by an analyst for reasonableness"
- Vague (or absent) descriptions of the model's outputs and how they are used
- Change logs with just one or two entries

No one likes to write model documentation, and there never seems to be enough time to do it. Compounding this challenge is the fact that model validations frequently seem to occur at the most inopportune moments for model owners. A bank's DFAST models, for example, often undergo validation while the business owners who use them are busy preparing the bank's DFAST submission. This is not the best time to be tweaking documentation and assembling data for validators. Documentation would ideally be prepared during periods of lower operational stress.
Model owners can accomplish this by anticipating and staying ahead of requests from model risk management, independently generating documentation for all their models that satisfies the following basic criteria:

- Identifies the model's purpose, including its business and functional requirements, and who is responsible for using and maintaining the model
- Comprehensively lists and justifies the model's inputs and assumptions
- Describes the model's overall theory and approach, i.e., how the model goes about transforming the inputs and assumptions into reliable outputs (including VBA or other computer code if the model was developed in house)
- Lays out the developmental evidence supporting the model
- Identifies the limitations of the model
- Explains how the model is controlled: who can access it, who can change it, and what approvals are required for different types of changes
- Comprehensively identifies and describes the model's outputs, how they are used, and how they are tested

Any investment of time beforehand to incorporate the items above into the model's documentation will pay dividends when the model validation begins. Being able to simply hand this information over to the validators will likely save model owners hours of attending follow-up meetings and fielding requests. Additional suggestions for getting the model's inputs and outputs in order follow below.

MODEL INPUTS

All of the model's inputs and assumptions need to be explicitly spelled out, along with their relevance to the model, their source(s), and any processes used to determine their reliability. Simply emailing an Excel file containing the model and referring the validator to the 'Inputs' tab is probably going to result in more meetings, more questions, and more time siphoned out of the model owner's workday by the validation team. A useful approach for consolidating inputs and assumptions that might be scattered around different areas of the model involves the creation of a simple table that captures everything a validator is likely to ask about each of the model's inputs and assumptions:

| Input/Assumption | Location (screen/tab) | Source | Purpose | How Verified |
|---|---|---|---|---|
| 2-Yr/10-Yr Swap Rates | 'YldCrv' tab | Bloomberg | Forecast secondary mortgage rates | Weekly spot check of two random values against TradeWeb |
| CPR Curve | Prepayment screen | AD-Co | Forecast prepayments | Back-test study provided by vendor |
| Input 2 | … | … | … | … |
| Assumption 2 | … | … | … | … |

Systematically capturing all of the model's inputs and assumptions in this way enables the validators to quickly take inventory of what needs to be tested without having to subject the model owner to a time-consuming battery of questions designed to make sure they haven't missed anything.

MODEL OUTPUTS

Being prepared to explain to the validator all the model's outputs individually, and how each is used in reporting and downstream applications, greatly facilitates the validation process. Accounting for all the uses of every output becomes more complicated when outputs are used outside the business unit, including as inputs to another model. At the discretion of the institution's model risk management group, it may be sufficient to limit this exercise to uses within the model owner's purview and to reports provided to management. As with inputs, this can be facilitated by a table:

| Output | Location (screen/tab) | Report(s) containing it | Output Purpose | Benchmarking/Back-testing Procedure |
|---|---|---|---|---|
| 1-day VaR | 'Output' tab | Daily VaR Report | Capture potential daily loss with 99% confidence | Back-testing of VaR vs. actual P/L over 12-month look-back |
| Market Value | Position screen | Portfolio Summary | Report current portfolio value | Benchmarked monthly against pricing service |
| Option-Adjusted Spread | Risk Metrics screen | Risk Report | Discount rate determination | Benchmarked semi-annually against Bloomberg |
| Duration | Risk Metrics screen | Rate Shock Summary | Measure asset sensitivity to interest rate changes | Benchmarked semi-annually against Bloomberg |
| Convexity | Risk Metrics screen | Rate Shock Summary | Measure asset sensitivity to interest rate changes | Benchmarked semi-annually against Bloomberg |
| … | … | … | … | … |
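As a concrete illustration of the first row in the table above, the following sketch counts VaR exceptions over a look-back window. The P/L and VaR figures are hypothetical, and a production back-test would typically supplement the raw exception count with a statistical test such as Kupiec's proportion-of-failures test.

```python
# Minimal sketch of a 99% one-day VaR back-test by exception counting.
# The P/L and VaR series are hypothetical.

def var_exceptions(daily_pnl, daily_var):
    """Count days on which the realized loss exceeded the reported VaR.

    daily_pnl: realized profit/loss per day (losses are negative)
    daily_var: the 99% one-day VaR for that day (a positive loss amount)
    """
    return sum(1 for pnl, var in zip(daily_pnl, daily_var) if -pnl > var)


daily_pnl = [-1.2, 0.8, -0.3, -2.9, 1.1, -0.6]  # $MM, hypothetical
daily_var = [2.5, 2.5, 2.4, 2.6, 2.5, 2.4]      # $MM, hypothetical

n = len(daily_pnl)
breaches = var_exceptions(daily_pnl, daily_var)
print(f"{breaches} exception(s) in {n} days; ~{0.01 * n:.2f} expected at 99%")
```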
Outputs that directly impact financial statements are especially important. Model validators are likely to give these outputs particular scrutiny, and model owners would do well to be prepared to explain not only how such outputs are computed and verified, but also how the audit trails surrounding them are maintained. To the extent that outputs are subjected to regular benchmarking, back-testing, or sensitivity analyses, the results of these exercises should be gathered as well.

A SERIES OF SMALL INVESTMENTS

A model owner might look at these suggestions and conclude that they amount to a lot of work just to get ready for a model validation. We agree. Bear in mind, however, that the model validator is almost certain to ask for these things at some point during the validation, when, chances are, the model owner would rather have the flexibility to do her real job. Making a series of small time investments to assemble these items well in advance of the validators' arrival will not only make the validation more tolerable for model owners, but will likely improve the overall modeling process as well.

4 QUESTIONS TO ASK WHEN DETERMINING MODEL VALIDATION SCOPE

Model risk management is a necessary undertaking for which model owners must prepare on a regular basis. Model risk managers frequently struggle to strike an appropriate cost-benefit balance in determining whether a model requires validation, how frequently a model needs to be validated, and how detailed subsequent and interim model validations need to be. The extent to which a model must be validated is a decision that affects many stakeholders in terms of both time and dollars. Everyone has an interest in knowing that models are reliable, but bringing the time and expense of a full model validation to bear on every model, every year is seldom warranted. Under what circumstances will a limited-scope validation do, and what should that validation look like? We have identified four considerations that can inform your decision on whether a full-scope model validation is necessary:

1. What about the model has changed since the last full-scope validation?
2. How have market conditions changed since the last validation?
3. How mission-critical is the model?
4. How often have manual overrides of model output been necessary?

WHAT CONSTITUTES A MODEL VALIDATION

Comprehensive model validations[1] consist of three main components: conceptual soundness, ongoing monitoring and benchmarking, and outcomes analysis and back-testing.[2] A comprehensive validation encompassing all these areas is usually required when a model is first put into use. Any validation that does not fully address all three of these areas is by definition a limited-scope validation.
[1] In the United States, most model validations are governed by the following sets of guidelines: 1) OCC 2011-12 (institutions regulated by the OCC) and 2) FRB SR 11-7 (institutions regulated by the Federal Reserve). These guidelines are effectively identical to one another. Model validations at government-sponsored enterprises, including Fannie Mae, Freddie Mac, and the Federal Home Loan Banks, are governed by FHFA Advisory Bulletin 2013-07, which, while different from the OCC and Fed guidance, shares many of the same underlying principles.

[2] Comprehensive validations of 'black box' models developed and maintained by third-party vendors are therefore problematic because the mathematical code and formulas are not typically available for review; in many cases a validator can only hypothesize the cause-and-effect relationships between the inputs and outputs based on a reading of the model's documentation.

Ideally, regular comprehensive validations are supplemented by limited-scope validations and outcomes analyses on an ongoing, interim basis to ensure that the model performs as expected.

KEY CONSIDERATIONS FOR MODEL VALIDATION

There is no one-size-fits-all test for determining when a comprehensive validation is necessary and when a limited-scope review will suffice. Beyond the obvious time and cost considerations, model validation managers would benefit from asking themselves at least four questions in making this determination:

QUESTION 1: WHAT ABOUT THE MODEL HAS CHANGED SINCE THE LAST FULL-SCOPE VALIDATION?

Many models layer economic assumptions on top of arithmetic equations. Most models consist of three principal components:

1. Inputs (assumptions and data)
2. Processing (the underlying mathematics and code that transform inputs into estimates)
3. Output reporting (processes that translate estimates into useful information)

Changes to either of the first two components are more likely to require a comprehensive validation than changes to the third. A change that materially impacts how the model output is computed, either by changing the inputs that drive the calculation or by changing the calculations themselves, is more likely to merit a comprehensive review than a change that merely affects how the model's outputs are interpreted.

For example, say a model assigns a credit rating to a bank's counterparties on a 100-point scale, and the requirements the bank establishes for a counterparty are driven by how the model rates it. Say the bank lends to counterparties that score between 90 and 100 with no restrictions, between 80 and 89 with pledged collateral, and between 70 and 79 with delivered collateral, and does not lend to counterparties scoring below 70. Consider two possible changes to the model:

1. A change in the model's calculations that results in what used to be a 65 now being a 79.
2. A change in the grading scale that results in a counterparty rated 65 now being deemed creditworthy.

While the second change impacts only the interpretation of model output and may require a limited-scope validation to determine whether the amended grading scale is defensible, the first change is almost certain to require that the validator go deeper 'under the hood' to verify that the model is working as intended. Assuming that the inputs did not change, the first type of change may be the result of changes to assumptions (e.g., weighting schemes) or simply the correction of a calculation error. The second is a change to the reporting component, for which a comparison of the model's forecasts to those of challenger models, along with back-testing against historical data, may be sufficient for validation.
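A minimal sketch of this example (with hypothetical weights and thresholds) helps locate where each type of change occurs: the first change alters the processing component that computes the score, while the second alters only the scale that interprets it.

```python
# Minimal sketch of the counterparty example above; the weights and
# financial inputs are hypothetical.

def score(financials, weights):
    """Processing component: transforms inputs into a 0-100 score.
    Changing `weights` changes the calculation itself and would likely
    trigger a full-scope validation."""
    return sum(weights[k] * financials[k] for k in weights)


def lending_terms(points, scale):
    """Reporting component: interprets the score. Changing `scale` may
    require only a limited-scope review of the new scale's defensibility."""
    for floor, terms in scale:
        if points >= floor:
            return terms
    return "do not lend"


scale = [(90, "no restrictions"),
         (80, "pledged collateral"),
         (70, "delivered collateral")]
weights = {"capital_ratio": 40, "liquidity": 35, "earnings": 25}
financials = {"capital_ratio": 0.9, "liquidity": 0.8, "earnings": 0.7}

points = score(financials, weights)                # 81.5
print(points, "->", lending_terms(points, scale))  # pledged collateral
```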
Not every change that affects model outputs necessarily requires a full-scope validation. The insertion of recently updated economic forecasts into a recently validated model may require only a limited set of tests to demonstrate that changes in the model estimates are consistent with the new economic forecast inputs. The magnitude of the impact on output also matters: a change to several input parameters that materially moves model output is more likely to require a full validation.

QUESTION 2: HOW HAVE MARKET CONDITIONS CHANGED SINCE THE LAST VALIDATION?

Even models that do not change at all require periodic, full-scope validations when macroeconomic conditions or other external factors call one or more of the model's underlying assumptions into question. The 2008 global financial crisis is a perfect example. Mortgage credit and prepayment models prior to 2008 were built on assumptions that appeared reasonable and plausible based on market observations up to that point. Statistical models based solely on historical data before, during, or after the crisis are likely to require full-scope validations as their underlying datasets are expanded to capture a more comprehensive array of observed economic scenarios.

It does not always have to be bad news in the economy that instigates model changes requiring full-scope validations. The federal funds rate has been hovering near zero since the end of 2008. With a period of gradual and sustained recovery potentially on the horizon, many models are beginning to incorporate rising interest rates into their current forecasts. These foreseeable model adjustments will likely require more comprehensive validations geared toward verifying that model outputs are appropriately sensitive to the revised interest rate assumptions.

QUESTION 3: HOW MISSION-CRITICAL IS THE MODEL?

The more vital a model's outputs are to financial statements or mission-critical business decisions, the greater the need for frequent and detailed third-party validations. Model risk is amplified when model outputs inform reports provided to investors, regulators, or compliance authorities. Particular care should be taken when deciding whether to only partially validate models with such high-stakes outputs. Models whose outputs are used for internal strategic planning are also important. That said, some models are more critical to a bank's long-term success than others. Ensuring the accuracy of the risk algorithms used for DFAST stress testing is more imperative than ensuring the accuracy of a model that predicts wait times in a customer service queue. Consequently, DFAST models, regardless of their complexity, are likely to require more frequent full-scope validations than models whose results undergo less scrutiny.

QUESTION 4: HOW OFTEN HAVE MANUAL OVERRIDES OF MODEL OUTPUT BEEN NECESSARY?

A final issue to consider is the use of manual overrides of the model's output. In cases where expert opinion is permitted to supersede model outputs on a regular basis, more frequent full-scope validations may be necessary in order to determine whether the model is performing as intended.
Counterparty credit scoring models, cited in our earlier example, are frequently subject to manual overrides by human underwriters to account for new or other qualitative information that cannot be processed by the model. The decision of whether to revise or re-estimate a model is frequently a function of how often such overrides are required and what their magnitude tends to be. Models whose outputs are frequently overridden should be subjected to more frequent full-scope validations. And models that are revised as a result of numerous overrides should also likely be fully validated, particularly when the revision includes significant changes to input variables and their respective weightings.

FULL OR PARTIAL MODEL VALIDATION

Model risk managers need to perform a delicate balancing act to ensure that an enterprise's models are sufficiently validated while keeping to a budget and not overly burdening model owners. In many cases, limited-scope validations are the most efficient means to this end. Such validations allow for the continuous monitoring of model performance without bringing in a Ph.D. with a full team of experts to opine on a model whose conceptual approach, inputs, assumptions, and controls have not changed since its last full-scope validation. While gray areas abound and the question of full versus partial validation needs to be addressed on a case-by-case basis, the four basic considerations outlined above can inform and facilitate the decision. Incorporating these considerations into your model risk management policy will greatly simplify the decision of how detailed your next model validation needs to be. An informed decision to perform a partial model validation can ultimately save your business the time and expense of executing a full model validation.

PERFORMANCE TESTING: BENCHMARKING VS BACKTESTING

When someone asks you what a model validation is, what is the first thing you think of? If you are like most, you immediately think of performance metrics: the quantitative indicators that tell you not only whether the model is working as intended, but also how its performance and accuracy hold up over time and compare to alternatives. Performance testing is the core of any model validation and generally consists of the following components:

- Benchmarking
- Back-testing
- Sensitivity analysis
- Stress testing

Sensitivity analysis and stress testing, while critical to any model validation's performance testing, will be covered in a future article. This post focuses on the relative virtues of benchmarking versus back-testing, seeking to define what each is, when and how each should be used, and how to make the best use of the results of each.

BENCHMARKING

Benchmarking compares the model being validated to some other model or metric. The type of benchmark employed will vary, as all model validation performance testing does, with the nature, use, and type of model being validated. Because of the performance information it provides, benchmarking should be employed in some form whenever a suitable benchmark can be found.

CHOOSING A BENCHMARK

Choosing what kind of benchmark to use within a model validation can sometimes be a daunting task. Like all testing within a model validation, the kind of benchmark to use depends on the type of model being tested.
Benchmarking takes many forms and may entail comparing the model's outputs to:

- The model's previous version
- An externally produced model
- A model built by the validator
- Other models and methodologies considered by the model developers but not chosen
- Industry best practice
- Thresholds and expectations of the model's performance

One of the most common benchmarking approaches is to compare a new model's outputs to those of the version it is replacing. It remains very common throughout the industry for models to be replaced due to deteriorating performance, a change in risk appetite, new regulatory guidance, the need to capture new variables, or the availability of new sets of information. In these cases, it is important not only to document but also to demonstrate that the new model performs better and does not exhibit the issues that triggered the old model's replacement.

Another common benchmarking approach compares the model's outputs to those of an external "challenger" model (or one built by the validator) that serves the same objective using the same data. This approach is likely to return more apt output comparisons than benchmarking against an older version, which may be out of date, because the challenger model is developed and updated with the same data as the champion model.

Another benchmark set that can be used for model validation comprises other models or methodologies reviewed by the model developers as possibilities for the model being validated but ultimately not used. As a best practice, model developers should always list any alternative methodologies, theories, or data omitted from the model's final version. Additionally, model validators should always leverage their experience and understanding of current industry best practices, along with any analysis previously completed on similar models. Model validators can then use these alternatives as benchmarks for the model being validated.

Model validators have multiple, distinct ways to incorporate benchmarking into their analysis. The use of the different types of benchmarking discussed here should be based on the type of model, its objective, and the validator's best judgment. If a model cannot reasonably be benchmarked, the validator should record why not and discuss the resulting limitations of the validation.

BACK-TESTING

Back-testing is used to measure model outcomes. Here, instead of measuring performance by comparison to another model, the validator specifically measures whether the model is both working as intended and accurate. Back-testing can take many forms, depending on the model's objective. As with benchmarking, back-testing should be part of every full-scope model validation to the extent possible.

WHAT BACK-TESTS TO PERFORM

As a form of outcomes analysis, back-testing provides quantitative metrics that measure the performance of a model's forecasts, the accuracy of its estimates, or its ability to rank-order risk. For instance, if a model produces forecasts for a given variable, back-testing involves comparing the model's forecast values against actual outcomes, thus indicating its accuracy. A related function of back-testing evaluates the ability of a given model to adequately measure risk. This risk can take any of several forms, from the probability that a given borrower will default to the likelihood of a large loss on a given trading day.
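For the forecasting case, a back-test can be as simple as comparing forecast values against realized outcomes and summarizing the error. The sketch below uses hypothetical CPR figures.

```python
# Minimal back-test sketch for a forecasting model: compare forecasts to
# realized outcomes and summarize the error. All values are hypothetical.

def back_test(forecasts, actuals):
    errors = [f - a for f, a in zip(forecasts, actuals)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n  # mean absolute error
    bias = sum(errors) / n                 # signed average error
    return mae, bias


forecast_cpr = [6.1, 6.4, 7.0, 7.3]  # model forecasts (% CPR)
actual_cpr = [5.8, 6.6, 7.4, 7.9]    # observed outcomes (% CPR)

mae, bias = back_test(forecast_cpr, actual_cpr)
print(f"MAE: {mae:.2f}; bias: {bias:+.2f}")
# A persistent negative bias (under-forecasting) would merit follow-up
# even if the MAE looks acceptable.
```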
To back-test a model's ability to capture risk exposure, it is important first to collect the right data. To back-test a probability of default model, for example, data would need to be collected containing cases where borrowers actually defaulted in order to test the model's predictions.

Back-testing models that assign borrowers to various risk levels necessitates some special considerations. Back-testing these and other models that seek to rank-order risk involves looking at the model's performance history and examining its ability to rank and order the risk accurately. This can involve analyzing both Type I (false positive) and Type II (false negative) statistical errors against the true positive and true negative rates for a given model. Common statistical tests used for this type of back-testing analysis include, but are not limited to, the Kolmogorov-Smirnov (KS) statistic, the Brier score, and the Receiver Operating Characteristic (ROC) curve.

BENCHMARKING VS. BACKTESTING

Back-testing measures a model's outcomes and accuracy against real-world observations, while benchmarking measures those outcomes against those of other models or metrics. Some overlap exists when benchmarking includes comparing how well different models' outputs back-test against real-world observations. This overlap sometimes leads people to conclude, mistakenly, that model validations can rely on just one method. In reality, back-testing and benchmarking should ideally be performed together in order to bring their individual benefits to bear in evaluating the model's overall performance. The decision, optimally, should not be whether to create a benchmark or to perform back-testing; rather, it should be what form both benchmarking and back-testing should take.

While benchmarking and back-testing are complementary exercises that should not be viewed as mutually exclusive, their outcomes sometimes appear to produce conflicting results. What should a model validator do, for example, if the model appears to back-test well against real-world observations but does not benchmark particularly well against similar models' outputs? What about a model that returns results similar to those of benchmark models but does not back-test well? In the first scenario, the model owner can derive a measure of comfort from the knowledge that the model performs well in hindsight. But the owner also runs the very real risk of being "out on an island" if the model turns out to be wrong. The second scenario affords the comfort of company in the model's projections. But what if the models are all wrong together?

Scenarios in which benchmarking and back-testing do not produce complementary results are not common, but they do happen. In these situations, it becomes incumbent on model validators to determine whether back-testing results should trump benchmarking results (or vice versa), or whether they should simply temper one another. The course to take may be dictated by circumstances. For example, a model validator may conclude that macroeconomic indicators are changing to the point that a model that back-tests favorably is not an advisable tool because it is not tuned to expected forward-looking conditions. This could explain why a model that back-tests favorably remains a benchmarking outlier: the benchmark models may be taking into account what the subject model is missing.
On the other hand, there are scenarios in which it is reasonable to conclude that back-testing results trump benchmarking results. After all, most firms would rather have an accurate model than one that lines up with all the others.

As this discussion shows, benchmarking and back-testing can produce distinct or similar metrics depending on the model being validated. While those differences or similarities can sometimes be significant, both benchmarking and back-testing provide critical, complementary information about a model's overall performance. So when approaching a model validation and determining its scope, the choice should be what form of benchmarking and back-testing needs to be done, rather than whether one needs to be performed instead of the other.

VALIDATING MODEL INPUTS – HOW MUCH IS ENOUGH?

In some respects, the OCC 2011-12/SR 11-7 mandate to verify model inputs could not be more straightforward: "Process verification … includes verifying that internal and external data inputs continue to be accurate, complete, consistent with model purpose and design, and of the highest quality available." From a logical perspective, this requirement is unambiguous and non-controversial. After all, the reliability of a model's outputs can be no better than the quality of its inputs. From a functional perspective, however, it raises practical questions about how much work needs to be done in order to consider a particular input "verified."

Take the example of a Housing Price Index (HPI) input assumption. Suppose the modeler obtains the HPI assumption from the bank's finance department, which purchases it from an analytics firm. What is the model validator's responsibility? Is it sufficient to verify that the HPI input matches the data of the finance department that supplied it? If not, is it enough to verify that the finance department's HPI data matches the data provided by its analytics vendor? If not, is it necessary to validate the analytics firm's model for generating HPI assumptions?

It depends. Just as model risk increases with greater model complexity, higher uncertainty about inputs and assumptions, broader use, and larger potential impact, input risk increases with input complexity and uncertainty. The risk of any specific input also rises as model outputs become increasingly sensitive to it.

VALIDATING MODEL INPUTS: BEST PRACTICES

So how much validation of model inputs is enough? As with the management of other risks, the level of validation or control should be dictated by the magnitude and impact of the risk. Like so much else in model validation, no one-size-fits-all approach applies to determining the appropriate level of validation of model inputs and assumptions. In addition to cost-benefit considerations, model validators should weigh at least four factors in mitigating the risk that input and assumption errors lead to inaccurate outputs:

- Complexity of inputs
- Manual manipulation of inputs from the source system prior to input into the model
- Reliability of source systems
- Relative importance of the input to the model's outputs (i.e., sensitivity)

CONSIDERATION 1: COMPLEXITY OF INPUTS

The greater the complexity of the model's inputs and assumptions, the greater the risk of errors.
For example, a complex yield curve with multiple data points is inherently subject to greater risk of inaccuracy than a binary "yes/no" input. In general, the more complex an input is, the more scrutiny it requires, and the "further back" a validator should look to verify its origin and reasonableness.

CONSIDERATION 2: MANUAL MANIPULATION OF INPUTS FROM THE SOURCE SYSTEM PRIOR TO INPUT INTO THE MODEL

Input data often requires modification from the source system to facilitate entry into the model. More handling and manual modification increases the likelihood of error. For example, if a position input is manually copied from Bloomberg and then manually reformatted to enable uploading to the model, there is a greater likelihood of error than if the position input is extracted automatically via an API. The accuracy of the input should be verified in either case, but the more manual handling and manipulation of data that occurs, the more comprehensive the testing should be. In this example, more comprehensive testing would likely take the form of a larger sample size. In addition, the controls over the processes used to extract, transform, and load data from a source system into the model affect the risk of error. More mature and effective controls, including automation and reconciliation, decrease the likelihood of error and therefore likely call for lighter verification procedures.

CONSIDERATION 3: RELIABILITY OF SOURCE SYSTEMS

More mature and stable source systems generally produce more consistently reliable results. Conversely, newer systems and those that have produced erroneous results increase the risk of error. The results of previous validations of inputs, whether from prior model validations or from third parties, including internal audit and compliance, can be used as an indicator of the reliability of information from source systems and the magnitude of input risk. The greater the number of issues identified, the greater the risk, and the deeper a validator should drill into the fundamental sources of the data.

CONSIDERATION 4: OUTPUT SENSITIVITY TO INPUTS

No matter how reliable an input's source system is deemed to be, or how much manual manipulation an input is subjected to, perhaps the most important consideration is the individual input's power to affect the model's outputs. Returning to our original example, if a 50 percent change in the HPI assumption has only a negligible impact on the model's outputs, then a quick verification against the report supplied by the finance department may be sufficient. If, however, the model's outputs are extremely sensitive to even small shifts in the HPI assumption, then additional testing is likely warranted, perhaps even including a validation of the analytics vendor's HPI model (along with all of its inputs).

A COST-EFFECTIVE MODEL INPUT VALIDATION STRATEGY

When it comes to verifying model inputs, there is no theoretical limit to the lengths to which a model validator can go. Model risk managers, who do not have unlimited time or budgets, would benefit from applying practical limits to validation procedures, using a risk-based approach to determine the most cost-effective strategies for ensuring that models are sufficiently validated. Applying the considerations listed above on a case-by-case basis will help validators appropriately define and scope model input reviews in a manner commensurate with sound risk management principles.
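Returning to the HPI example, a minimal sensitivity check might shock the input and compare the proportional impact on the output. The loss function below is a hypothetical stand-in for the model under validation, with invented figures; the larger the measured impact, the deeper the input verification should go.

```python
# Minimal sensitivity sketch: shock the HPI assumption and measure the
# impact on output. The loss function is a hypothetical stand-in for the
# model under validation, not an actual credit model.

def expected_loss(hpi_growth, balance=100.0):
    """Illustrative only: losses decline as home-price growth rises."""
    base_loss_rate = 0.02
    return balance * max(base_loss_rate - 0.1 * hpi_growth, 0.0)


base = expected_loss(hpi_growth=0.03)
shocked = expected_loss(hpi_growth=0.03 * 1.5)  # 50% shock to the input

impact = abs(shocked - base) / base
print(f"Output moves {impact:.0%} for a 50% shock to the HPI assumption")
# A negligible impact may justify a light check against the finance
# department's report; a large one argues for tracing the input all the
# way back to the vendor's HPI model.
```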
CONTRIBUTORS

Tim Willis is an experienced model validator and mortgage industry analyst with extensive experience consulting for financial institutions of various sizes. He has managed projects validating many different financial models and has developed business requirements for a range of technology solutions throughout the mortgage asset value chain. Tim is the author of numerous mortgage industry benchmarking and market research studies and regularly oversees consulting engagements for banks, loan servicers, the GSEs, and U.S. Government agencies. He has held previous positions at First American Corporation, Fannie Mae, and KPMG Consulting. Tim holds a Master of Business Administration from George Washington University and a Bachelor of Arts from Brigham Young University.

Chris Welsh has five years of experience in the financial services industry, including two years in mortgage banking. His areas of expertise include complex problem solving, quantitative analytics, and model validation. Prior to joining RiskSpan, Chris was an analyst with the Global Index Group at NASDAQ OMX, where his responsibilities included the design and modeling of custom index funds. He has also authored and published blogs, economic research reports, and industry pieces. He is proficient in Microsoft Excel (VBA) and R programming and has recently completed a base programming certification in SAS. He also has over 10 years of engineering experience, including project management duties. Chris holds dual master's degrees in Banking & Financial Services Management from Boston University and Physics from Clark University (Worcester, MA), as well as a bachelor's degree in Physics/Mathematics from Wheeling (WV) Jesuit College.

Steve Sloan, Director, CPA, CIA, CISA, CIDA, has extensive experience in the professional practices of risk management and internal audit, collaborating with management and audit committees to design and implement the infrastructure needed to obtain the required assurances over risk and controls. He prescribes a disciplined approach, aligning stakeholders' expectations with leading practices, to maximize the return on investment in risk functions. Steve holds a Bachelor of Science from Pennsylvania State University and holds multiple certifications.

Nick Young is a Quantitative Modeling Manager at RiskSpan with more than eight years of experience as a quantitative analyst and economist. At RiskSpan, Nick performs model development, validation, and governance on a wide variety of models, including those used for capital planning, reserve/impairment, credit origination and management, and CCAR/DFAST stress testing. Prior to joining RiskSpan, Nick worked at U.S. Bank, where he was a senior model risk manager with governance, model validation, and model development responsibilities for the Bank's Basel II Pillar 1 A-IRB models, Pillar II ICAAP models, credit origination and account management models, market risk value-at-risk (VaR) models, reserve and impairment models, and CCAR stress testing. He has a PhD (ABD) in economics from American University, an MA in economics from American University, and dual BAs in economics and political science, magna cum laude, with history and music minors, from the University of Oklahoma.

ABOUT RISKSPAN

Founded by industry veterans, RiskSpan offers end-to-end solutions for data management, risk management analytics, and visualization on a highly secure, fast, and fully scalable platform that has earned the trust of the industry's largest firms.
Combining the strengths of subject matter experts, quantitative analysts, and technologists, the RiskSpan platform integrates a range of data sets, both structured and unstructured, with off-the-shelf analytical tools to provide you with powerful insights and a competitive advantage. We empower our clients to make accurate, data-driven decisions based on facts and analysis. We make data beautiful.

CONTACT

Email Shawn Eliav at [email protected] to be connected with our experts.