Harnessing Traditional and Alternative Credit Data

WHITEPAPER
Harnessing Traditional and
Alternative Credit Data:
Credit Optics 5.0
March 2013
Table of Contents
Introduction
The Advantages of the ID Network
Building Credit Optics 5.0
  Performance Data
  Population Segmentation
  Modeling Algorithm
  Model Training
  Variable Selection
Model Performance Results
Conclusion
Introduction
Lenders and service providers are once again focusing on controlled growth and adjusting
to a lending environment that has forever changed. Today’s regulatory and competitive
pressures make it more important than ever for credit providers to make informed, effective
risk assessments. To make the most attractive yet profitable offers to increasingly in-demand
consumers, it is no longer sufficient to use only traditional credit history data. Conventional
credit scores have been shown to provide a limited view of consumer behavior and its
associated risk. To deliver smart, targeted offers for credit and services, organizations need
comprehensive and current visibility into consumer risk, which is attainable only through the
combination of traditional and alternative forms of credit data. Credit Optics® from ID Analytics®
combines the power of traditional and alternative data to develop optimal credit decisions,
allowing organizations to grow their portfolios while controlling exposure to risk.
Credit Optics is an FCRA-compliant alternative credit score that provides the powerful,
differentiated insights organizations require to develop a more complete picture of an
individual’s creditworthiness across the entire customer lifecycle. The score is designed to
function either as a standalone credit score or as a ‘plus one’ score capable of meaningfully
enhancing the strength of existing custom and traditional credit scores. The newest version of
the Credit Optics score leverages proprietary cross-industry event and performance data to
provide a significant leap forward in predictive power, while continuing to be uncorrelated with
traditional scores – ensuring it contributes additive benefit when inserted into existing credit
strategies and policies.
This paper discusses how the newest and most powerful version of Credit Optics was built,
including the analytic development process, model performance, stability, and impact of
performance for ID Analytics clients. Credit Optics has been deployed across several industries
including financial services, auto lending, subprime lending, and telecommunications. Using
Credit Optics in combination with traditional credit scores, clients can achieve improvements
in all of the credit decisions that affect a lender’s bottom line, including prescreen, approval,
credit line, and account management decisions.
HARNESSING TRADITIONAL AND ALTERNATIVE CREDIT DATA: CREDIT OPTICS 5.0
The Advantages of the ID Network
The unique risk perspective of Credit Optics is driven by the ID Network®, a repository of
consumer behavior data from a wider range of industries than other leading sources. A real-time,
cross-industry compilation of consumer risk information, the ID Network enables
ID Analytics to deliver reliable, high-resolution visibility into how a consumer behaves across
industries over time. By combining the traditional and alternative credit data found in the
ID Network with world-class analytics, Credit Optics is able to deliver a more accurate
assessment of credit risk for more consumers than traditional credit scores alone.
There is no question that traditional
data assets used to create credit
scores provide value in assessing a
consumer’s credit risk. Usage data,
payment behavior and delinquency
information on lending products
provide valuable indicators into the
likelihood of future consumer behavior
on loans and services.
What makes Credit Optics special is
the addition of alternative data not
typically provided in a credit score.
Alternative data assets provide a
broader view of a consumer’s credit
behavior beyond financial services
accounts into other industries and
payment vehicle types. This broader
view, when input into advanced
analytic models, provides a more
robust prediction of a consumer’s creditworthiness. Credit Optics includes usage and payment
behavior on wireless phones, utilities and cable, as well as use and payment behavior on payday
loans and other sub-prime lending vehicles. Credit Optics also includes data on demand
deposit accounts and alternative payment data, which can indicate a preference for
non-traditional lending institutions.
Credit Optics®: A Complete Picture of the Consumer

Traditional Data
  Mortgages: Homeownership and overall debt load
  Credit Card: Utilization and responsible debt use
  Auto Loans: Overall debt load

Alternative Information
  Wireless / Cable / Utility: Monthly obligation and responsibility
  Alternative Payments: Indication of preference for non-traditional lending institutions
  Payday / Subprime Lending: Indication of financial instability
  Checking / Savings (DDA): Responsible use of assets

The use of both traditionally predictive and alternative credit data enables Credit Optics
to deliver a unique and predictive new perspective on consumer credit risk that is highly
effective when used as a ‘plus one’ score in addition to an existing credit bureau score or
custom credit score.
Building Credit Optics 5.0
Credit Optics has been developed similarly for use in the bankcard, auto lending, and
telecommunications industries. This whitepaper addresses the bankcard version of Credit
Optics 5.0, which was developed specifically for use in prescreen, acquisition, and portfolio
management credit decisions. To develop this model, ID Analytics employed a modeling
technique known as LogitBoost, an extension of the popular AdaBoost method. This method
uses a series of small segmentation trees that negate the need for upfront population
segmentation. The training population contained bankcard inquiries that resulted in
active trades.
The timeframe selected for the model development sample included the first quarter of 2008
through the second quarter of 2011. This time period is important because it covers a rapidly
changing credit environment containing elements of a deep recession as well as its ensuing
recovery. The development sample included a representative population of over 9.4 million
inquiries that resulted in active trades and combined eight different data sources. Having such
a broad sample of data helps deliver the most robust model possible, which provides high
performance during varying economic conditions.
Characteristics for Credit Optics were calculated based on a view of an applicant’s information
from the ID Network and were thus not restricted to financial trades. The layers and complexity
of the data and interactions available through the ID Network mean that many thousands of
variables are generated as candidates for the final model. These data were combined with the many
traditional credit bureau-type attributes, also from the time of application. These may include
elements such as number of bankcard lines open, utilization, and highest credit limit. The unique
insight available through the combination of traditional and alternative credit data leads to a
credit score that performs better than either approach alone.
a. Performance Data
Credit Optics uses a ‘bad’ tag consistent with industry standards: ninety days past due (DPD)
within twelve months of the date of inquiry. Charge-offs, bankruptcies and collections are all
also included in the definition of bad. ‘Good’ accounts were defined as accounts with no more
than thirty days past due within a twelve-month time window. All other accounts were excluded
from the modeling population as ‘indeterminate.’
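
The tagging rules above can be written down compactly. The following is a minimal sketch under stated assumptions, not ID Analytics' implementation: the `AccountPerformance` record and its field names are hypothetical, and the worst delinquency is assumed to be measured over the twelve months following the inquiry.

```python
from dataclasses import dataclass


@dataclass
class AccountPerformance:
    """Hypothetical twelve-month performance summary for one account."""
    max_days_past_due: int        # worst delinquency within twelve months of inquiry
    charged_off: bool = False
    bankrupt: bool = False
    in_collections: bool = False


def performance_tag(acct: AccountPerformance) -> str:
    """Return 'bad', 'good', or 'indeterminate' following the definitions in the text."""
    if acct.charged_off or acct.bankrupt or acct.in_collections:
        return "bad"
    if acct.max_days_past_due >= 90:
        return "bad"
    if acct.max_days_past_due <= 30:
        return "good"
    # Anything in between (for example 60 DPD) is left out of the modeling population.
    return "indeterminate"
```

Under these rules an account that peaked at sixty days past due is neither good nor bad, so it is dropped rather than forced into one of the two classes.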
The 9.4 million inquiry sample included portfolios with vastly different bad rates, drawn from
across the full spectrum of creditworthiness. The portfolios included in the final consortium had bad rates
ranging from below 1% to over 10%.
b. Population Segmentation
Testing the robustness of the model warranted dividing the population into three distinct time
periods. The early holdout set and the late holdout set were complete ‘out-of-time’ samples
not used in the development of the model. The early holdout set was selected from Q1 of 2008,
representative of applications made during a declining credit landscape. The late holdout set
was selected from Q2 of 2011.
The remaining applications consisted of the model development set; this was subdivided into
various training, testing and holdout sets as shown in Figure 1. All candidate models were
evaluated independently on the three sets: early out-of-time, late out-of-time and in-time
holdout.
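
As an illustration of the out-of-time split described above, the sketch below assigns an inquiry to a sample by its date. It assumes, for simplicity, that the whole of Q1 2008 and the whole of Q2 2011 are held out; the paper says only that the holdout sets were "selected from" those quarters, and the function and label names are hypothetical.

```python
from datetime import date

# Assumed quarter boundaries for the two out-of-time holdout sets.
EARLY_HOLDOUT = (date(2008, 1, 1), date(2008, 3, 31))   # Q1 2008
LATE_HOLDOUT = (date(2011, 4, 1), date(2011, 6, 30))    # Q2 2011


def assign_sample(inquiry_date: date) -> str:
    """Label an inquiry as early holdout, late holdout, or model development."""
    if EARLY_HOLDOUT[0] <= inquiry_date <= EARLY_HOLDOUT[1]:
        return "early_holdout"
    if LATE_HOLDOUT[0] <= inquiry_date <= LATE_HOLDOUT[1]:
        return "late_holdout"
    if EARLY_HOLDOUT[1] < inquiry_date < LATE_HOLDOUT[0]:
        return "model_development"
    return "out_of_window"   # outside Q1 2008 through Q2 2011
```

Splitting on time rather than at random means the two holdout sets test the model on credit conditions it did not see during training.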
The only non-time-dependent dimension of segmentation was dividing the population into
‘hit’ versus ‘no-hit’ segments based on the applicant’s presence or lack of traditional credit
relationships. Where an account was deemed a ‘hit’, traditional credit variables were used to
enhance the ID Network-derived variables. In the ‘no-hit’ group, the ID Network variables derived from
non-traditional financial relationships were used. Cell phone and payday loan
information from the U.S. consumer base are examples of such non-traditional data. While these
‘no-hits’ would typically be underserved by traditional credit scores, Credit Optics 5.0 provides a
very reasonable determination as to how these applications would perform.
Figure 1
Credit Optics 5.0: Model Development Sample (‘000s)

SEGMENTATION             HIT      NO HIT   TOTAL
Early Holdout            2,529    279      2,808
Model Development Set    5,534    531      6,065
Late Holdout             528      35       563
Total                    8,591    845      9,436

c. Modeling Algorithm
The foundational algorithm used to train the Credit Optics 5.0 model is known as LogitBoost,
which is an extension of the popular AdaBoost modeling method. This method uses a log
likelihood loss based on a logit representation of the probability of bad as a function of the
optimized quantity. The key feature of the method is that the solution of the optimization
problem is represented as a sum of simple classifiers (in this case, regression trees). Each
classifier in the sequence is determined by an optimization of the loss function with a prior
determined by the sum of the previous classifiers, which effectively reweight the examples,
focusing on those that were poorly classified. This weighting is not ad hoc; rather it derives
directly from the optimization of the log likelihood loss function in the presence of a prior.
The distinctive characteristics of LogitBoost relative to AdaBoost are the use of the log
likelihood rather than exponential loss function, as well as implementing a Newton update
with each iteration, which provides a more robust approach to the optimal solution. The final
algorithm contained a set of proprietary modifications to the LogitBoost approach that added
quantitative improvement to the model.
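
To make the description above concrete, here is a minimal sketch of the textbook two-class LogitBoost procedure (Friedman, Hastie and Tibshirani) using one-split regression trees. It is not the proprietary Credit Optics algorithm and includes none of the modifications mentioned above; the function names, the clipping constants, and the use of stumps rather than larger trees are all illustrative assumptions. Each round computes weights w = p(1 - p) and a working response z = (y - p) / w, fits a tree to z by weighted least squares (the Newton step), and adds half of it to the running score.

```python
import math


def stump_value(stump, row):
    """Evaluate a one-split regression tree on a single row."""
    feature, threshold, left, right = stump
    return left if row[feature] <= threshold else right


def fit_stump(X, z, w):
    """Weighted least-squares fit of a one-split regression tree to the response z."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            wz = {True: 0.0, False: 0.0}   # sum of w*z on each side of the split
            ww = {True: 0.0, False: 0.0}   # sum of w on each side of the split
            for row, zi, wi in zip(X, z, w):
                side = row[j] <= thr
                wz[side] += wi * zi
                ww[side] += wi
            left, right = wz[True] / ww[True], wz[False] / ww[False]
            err = sum(wi * (zi - (left if row[j] <= thr else right)) ** 2
                      for row, zi, wi in zip(X, z, w))
            if best is None or err < best[0]:
                best = (err, (j, thr, left, right))
    return best[1]


def logitboost(X, y, rounds=20):
    """y[i] is 1 for a 'bad' account and 0 for a 'good' one. Returns a list of stumps."""
    F = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        p = [1 / (1 + math.exp(-2 * f)) for f in F]           # current P(bad)
        w = [max(pi * (1 - pi), 1e-6) for pi in p]            # Newton weights, floored
        z = [max(-4.0, min(4.0, (yi - pi) / wi))              # working response, clipped
             for yi, pi, wi in zip(y, p, w)]
        stump = fit_stump(X, z, w)
        stumps.append(stump)
        F = [f + 0.5 * stump_value(stump, row) for f, row in zip(F, X)]
    return stumps


def predict_bad_probability(stumps, row):
    """Sum the simple classifiers and map the total through the logit link."""
    F = sum(0.5 * stump_value(s, row) for s in stumps)
    return 1 / (1 + math.exp(-2 * F))


# Toy data: the first feature separates bads from goods, the second is noise.
X = [[0.1, 1], [0.2, 0], [0.3, 1], [0.4, 0], [0.6, 1], [0.7, 0], [0.8, 1], [0.9, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = logitboost(X, y, rounds=20)
```

Because the weights p(1 - p) are largest where the current model is least certain, each new tree concentrates on the examples the previous trees classified poorly, which is the reweighting behavior described in the text.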
d. Model Training
Many combinations of parameters were explored in building the latest Credit Optics model.
These parameters included differential bad sampling, time-decay weighting, scoring
initialization, adjusting model parameters and using novel proprietary transformations of the
data. Pre-segmentation strategies were also tested; however, a model using two segments, hit
versus no-hit, as described above, was found to be the best performing and most robust.
Each model was tested on four different training datasets, selected from the ‘model
development set’ time period and including several hundred thousand accounts chosen
from each of the different datasets. The models were also compared to other models built on
custom versions of the datasets and were shown to be comparable in terms of performance.
Candidate variables were considered in several rounds of testing, to ensure that the model
included a set of variables that contributed significantly to the model’s performance and to
ensure that only variables that did not appear to have an interaction with the calendar or credit
landscape at the time were selected – this step was critical in ensuring a model that was the
most robust. The final candidate model was the one that performed best on a population-weighted
average across the datasets on the held-out population.
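
A population-weighted comparison of candidate models can be sketched as follows. The paper does not name the metric that was averaged, so maximum KS is assumed here, and the candidate names, account counts, and KS values are invented purely for illustration.

```python
def population_weighted_average(results):
    """results: list of (n_accounts, metric) pairs, one per held-out dataset."""
    total = sum(n for n, _ in results)
    return sum(n * metric for n, metric in results) / total


# Hypothetical held-out results: (accounts, KS) per dataset for two candidates.
candidates = {
    "model_a": [(500_000, 44.0), (300_000, 38.0), (200_000, 29.0)],
    "model_b": [(500_000, 42.0), (300_000, 40.0), (200_000, 33.0)],
}

best = max(candidates, key=lambda name: population_weighted_average(candidates[name]))
```

In this made-up example model_a is stronger on the largest dataset, but model_b has the higher weighted average (39.6 against 39.2) and would be selected.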
e. Variable Selection
The model’s performance across time periods and portfolios was enhanced by a careful
selection of the variables to be included in the final model. For variable selection, ID Analytics
employs a backward selection methodology with a model-driven performance metric. In
addition to the typical scrutiny applied to any variables to be included in credit models, each
candidate variable was rigorously tested to see if there was any disparate effect with regard
to credit regime, that is, whether a variable had a different effect on the score produced
during the recession than at other times. Figure 2 provides an example of one such
variable, which was excluded from the model. In this example, the relative ratio of ‘bads’ and
‘goods’ is dependent on time (much higher during the recessionary period), which could
adversely affect the model’s performance over time. A number of such variables were excluded.
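
One simple way to screen a candidate variable for this kind of time dependence is to compare its mean among bads with its mean among goods, year by year, and flag the variable if that ratio drifts. The sketch below is an assumed illustration of the idea, not the test ID Analytics applied: the function names and the 25% tolerance are hypothetical, and every year is assumed to contain both goods and bads with a nonzero mean.

```python
from collections import defaultdict


def bad_good_ratio_by_year(records):
    """records: iterable of (year, is_bad, value). Returns {year: mean_bad / mean_good}."""
    sums = defaultdict(lambda: {True: [0.0, 0], False: [0.0, 0]})
    for year, is_bad, value in records:
        cell = sums[year][bool(is_bad)]
        cell[0] += value      # running total of the variable
        cell[1] += 1          # account count
    ratios = {}
    for year, cells in sorted(sums.items()):
        mean_bad = cells[True][0] / cells[True][1]
        mean_good = cells[False][0] / cells[False][1]
        ratios[year] = mean_bad / mean_good
    return ratios


def is_time_stable(ratios, tolerance=0.25):
    """True if the bad/good ratio varies by no more than `tolerance` (relative) across years."""
    lo, hi = min(ratios.values()), max(ratios.values())
    return (hi - lo) / lo <= tolerance
```

A variable whose ratio is far higher in recession years than afterwards would fail this check and be dropped from the candidate list.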
Figure 2
Mean Number of Auto Trades
[Line chart: mean number of auto trades (vertical axis, 0 to 1) by year (horizontal axis, 2008 to 2011), plotted separately for Good Accounts and Bad Accounts.]
4. Model Performance Results
The overall model performance, considering all hits and no-hits over the entire holdout set
(much of it out-of-time), reached a maximum overall KS of 39. Broken down by credit class,
the model has KS values ranging from 27 to 46, with good performance in all
three time periods, as shown in Figure 3.
Figure 3
Credit Optics 5.0: Maximum KS

CREDIT CLASS    EARLY HOLD-OUT    IN TIME    LATE HOLD-OUT
Prime           45                46         38
Near-Prime      41                39         34
Sub-Prime       28                31         27

It’s worth noting that the KS values shown within each credit class are lower than the maximum
KS values observed on files spanning all credit classes. The lower KS values within credit
classes are expected, as the KS statistic will typically decrease as populations become more
homogeneous.
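
For readers less familiar with the statistic, the maximum KS is the largest gap between the cumulative score distributions of bad and good accounts, reported here on a 0 to 100 scale. The following is a minimal sketch of that calculation; the function name is hypothetical and tied scores are grouped so the two distributions are compared only at distinct score values.

```python
def max_ks(scores, is_bad):
    """Maximum KS (0 to 100) between the score distributions of bads and goods."""
    n_bad = sum(1 for b in is_bad if b)
    n_good = len(is_bad) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("need at least one good and one bad account")
    pairs = sorted(zip(scores, is_bad))
    cum_bad = cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume all accounts sharing this score before comparing the two CDFs.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return 100.0 * best
```

A score that separates bads from goods perfectly yields 100, and one that carries no information yields 0. Restricting the population to a single credit class narrows the range of risk present, which is why within-class values tend to be lower.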
While different credit classes exhibit different levels of performance, they were consistently
within a percent or two of a custom model generated using a specific dataset/time period
combination. There was no significant bias in performance by credit class. As seen below,
prime portfolios have a KS in the mid-40s, a near-prime portfolio performs in the low-40s, and a
mid-subprime portfolio performs in the low-30s. The more subprime dataset (as determined by overall
bad rate) had generally lower performance, but this was a function of the greater number
of no-hits present in that portfolio. Indeed, this specific portfolio had a no-hit rate of thirty
percent, against less than ten percent for the entire population.
Figure 4 breaks down the Figure 3 results to show performance on the “hit” and “no-hit”
populations:
Figure 4
Credit Optics 5.0: Maximum KS

SEGMENT            MAXIMUM KS
Hit, Prime         46
Hit, Near-Prime    39
Hit, Sub-Prime     39
No Hit             31

Perhaps as important as the model’s overall predictive strength is the lift that it provides when
used in conjunction with a traditional credit score as a ‘plus one’. Figure 5 demonstrates the lift
provided to a top 10 bankcard issuer that tested Credit Optics both as a standalone credit score
and as a ‘plus one’ with its existing credit score. Credit Optics, when used as a ‘plus one’, provided
a significant lift over using either Credit Optics or the traditional credit score as a standalone risk
assessment tool.
Figure 5
Credit Optics 5.0: KS Results for a Top 10 Issuer

MODEL                                MAXIMUM KS
Traditional Score                    42
Traditional Score + Credit Optics    53

While the bureau score the issuer was using performs well, combining it with Credit Optics
makes a powerful difference. Figure 6 shows additional detail illustrating how Credit Optics
provides meaningful new information capable of significantly improving an organization’s risk
decisions.
Figure 6
Credit Optics 5.0: Maximum KS
Rows: Traditional Risk Scores. Columns: Credit Optics groups, from High Risk (1) to Low Risk (10).

TRADITIONAL RISK SCORE   HIGH RISK   2      3      4      5      6      7      8      9      LOW RISK   TOTAL
Below 595                6.1%        3.9%   3.6%   3.5%   3.2%   3.1%   3.2%   3.0%   2.3%   1.3%       3.3%
595 – 654                5.2%        4.1%   3.9%   3.8%   2.6%   2.5%   2.1%   2.0%   1.7%   1.5%       2.9%
655 – 694                4.7%        3.0%   2.5%   1.8%   1.8%   1.6%   1.5%   1.4%   1.0%   1.2%       2.1%
695 – 719                4.4%        2.0%   1.6%   1.5%   1.4%   1.0%   1.0%   0.8%   0.8%   0.5%       1.5%
720 – 734                3.3%        1.5%   1.2%   1.0%   0.8%   0.7%   0.7%   0.5%   0.5%   0.5%       1.0%
735 – 749                2.5%        1.0%   0.8%   0.8%   0.6%   0.5%   0.6%   0.3%   0.5%   0.2%       0.8%
750 – 759                1.6%        0.8%   0.7%   0.5%   0.3%   0.5%   0.3%   0.3%   0.2%   0.2%       0.6%
760 – 769                1.2%        0.6%   0.5%   0.3%   0.3%   0.2%   0.3%   0.2%   0.1%   0.1%       0.5%
770 – 785                1.0%        0.3%   0.5%   0.5%   0.3%   0.2%   0.2%   0.2%   0.2%   0.1%       0.3%
Above 785                0.8%        0.3%   0.2%   0.2%   0.1%   0.1%   0.1%   0.1%   0.1%   0.1%       0.2%
Total                    4.3%        2.9%   1.6%   1.3%   0.9%   0.7%   0.6%   0.5%   0.3%   0.2%       1.3%

Accounts with similar performance and vastly different Risk Scores
As seen above, Credit Optics and the traditional risk score are not highly correlated (correlation
coefficient = 0.642). This means that Credit Optics has additive power to differentiate high risk
consumers who score low risk with a traditional credit score (important for risk assessment) and
low risk consumers who score high risk with a traditional credit score (important for expanding
growth populations).
Figure 6 illustrates the ability of the Credit Optics score to separate risk within traditional score
bands. This can be seen by examining each row, where a population that a traditional score
treats as homogeneous is further segmented into more granular populations
of varying risk as identified by the Credit Optics score. Across rows one can discern many
examples similar to the one highlighted where the overlay of the Credit Optics risk assessment
identifies populations of similar risk but significantly different traditional scores. In each of these
instances, Credit Optics is providing the issuer with refined insight into consumer credit risk
required to make more informed, profitable lending decisions.
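
A dual-score matrix of this kind can be assembled by crossing traditional score bands with Credit Optics groups and computing a rate in each cell. The sketch below assumes the cells are bad rates, which Figure 6 does not state explicitly, and assumes integer traditional scores; the band edges follow the row labels of Figure 6, while the function names and input layout are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

# Lower bounds of every band above the first, taken from the Figure 6 row labels.
BAND_EDGES = [595, 655, 695, 720, 735, 750, 760, 770, 786]
BAND_LABELS = ["Below 595", "595-654", "655-694", "695-719", "720-734",
               "735-749", "750-759", "760-769", "770-785", "Above 785"]


def score_band(traditional_score):
    """Map an integer traditional score to its Figure 6 band label."""
    return BAND_LABELS[bisect_right(BAND_EDGES, traditional_score)]


def bad_rate_matrix(accounts):
    """accounts: iterable of (traditional_score, optics_group, is_bad).

    Returns {(band_label, optics_group): bad_rate} for every populated cell.
    """
    counts = defaultdict(lambda: [0, 0])   # (band, group) -> [bads, total]
    for traditional_score, optics_group, is_bad in accounts:
        cell = counts[(score_band(traditional_score), optics_group)]
        cell[0] += int(bool(is_bad))
        cell[1] += 1
    return {key: bads / total for key, (bads, total) in counts.items()}
```

Reading along one band of the resulting matrix shows how much the second score spreads out a population that the first score treats as a single group.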
Further evidence of the robust performance of Credit Optics is that the model’s odds remain
consistent through time. Figure 7 demonstrates the log-odds ratio by decile of the entire
scored population, ordered by score (from high to low) using an equal binning method. A log-odds
of two (base 10) corresponds to an odds ratio of 100 (very low probability of going bad as per the
definition), which is consistent with being given a low score. What’s important to note is that
these lines are reasonably consistent across time.
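
The log-odds by decile can be computed as sketched below. Base-10 logarithms are assumed because a log-odds of two is equated with odds of 100; the odds are taken as goods to bads; and the 0.5 added to each count is an assumed smoothing choice to avoid taking the logarithm of zero. The function name is hypothetical.

```python
import math


def log_odds_by_bin(scores, is_bad, n_bins=10):
    """Order accounts by score (high to low), cut into equal-sized bins,
    and return log10(goods / bads) for each bin."""
    ranked = sorted(zip(scores, is_bad), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    result = []
    for k in range(n_bins):
        chunk = ranked[k * n // n_bins:(k + 1) * n // n_bins]
        bads = sum(1 for _, bad in chunk if bad)
        goods = len(chunk) - bads
        # Add 0.5 to both counts so a bin with no bads (or no goods) stays finite.
        result.append(math.log10((goods + 0.5) / (bads + 0.5)))
    return result
```

Running this separately for each year and overlaying the resulting curves gives the kind of comparison shown in Figures 7 and 8: if the curves sit close together, a given score implies similar odds regardless of when the account was scored.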
Figure 7
Credit Optics Log-Odds Ratio by Population Decile: Prime Portfolio
[Line chart: log-odds ratio (vertical axis, -0.5 to 2) by Credit Optics decile (horizontal axis, Low Risk to High Risk), with one line each for 2009, 2010, and 2011.]
Observing the results at the portfolio level, as seen in Figure 8 for the subprime portfolio, the
lines are even more consistent. The odds are slightly worse at each decile with respect to the
overall population, especially in later deciles, which would be expected for a subprime portfolio.
The decrease in odds by decile is consistent through time, showing slightly improving overall
odds between 2008 and 2010, which reflects the credit environment.
Figure 8
Credit Optics Log-Odds Ratio by Population Decile: SubPrime Portfolio
[Line chart: log-odds ratio (vertical axis) by Credit Optics decile (horizontal axis, Low Risk to High Risk), with one line each for 2008, 2009, and 2010.]
Conclusion
As lenders, telecommunications providers and utility companies focus on acquiring and
retaining highly profitable consumer relationships while controlling credit risk and complying
with all regulatory requirements, there is a need to achieve a new level of visibility into a
consumer’s credit profile. Traditional credit scores cannot provide a complete view of the
consumer’s credit history due to the limited data used to calculate these scores. To identify,
acquire and cultivate the right consumers for credit and service offers, organizations need more
complete, up-to-date access into a consumer’s risk assessment. This is achieved through the
unique combination of traditional and alternative credit data.
Credit Optics is designed to accurately predict credit risk on its own, boost the power of
traditional credit scores, and provide actionable intelligence on the emerging market. Using
the newest version of Credit Optics, clients can achieve increased revenue and decreased
credit losses through improvements in prescreen, approval, credit line, pricing, and portfolio
management decisions. Credit Optics is built to be FCRA compliant, while also being
transparent and consumer-friendly, enabling organizations to incorporate Credit Optics into
current credit risk strategies with complete confidence.
For more information on how Credit Optics can help your company confidently make more
informed, profitable lending decisions, contact us today at [email protected],
858-312-6200, or visit www.idanalytics.com
www.idanalytics.com
© 2014 ID Analytics. All rights reserved.