Machine learning techniques for credit risk modeling in practice
Balaton Attila
OTP Bank, Analysis and Modeling Department
2017.02.23.

“Machine learning is the subfield of computer science that gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959). Evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data…”*
* Wikipedia

New, complex databases needed new modeling tools
• Data sources: internal databases, GIRINFO, utility companies, retailers, social networks
• Complex, large database to be analyzed: a wide-ranging dataset about customer behavior
• Machine learning: recognises connections between different variables
• Powerful models, better GINI

Why Machine Learning: to mine new, large complex datasets
[Figure: “The actual phenomenon” compared with the fits of “Machine Learning” and “Traditional Stats”; points labelled GOOD and BAD]

• Description
  - Traditional stats: fit a predetermined (linear, quadratic, logarithmic) function to the data
  - Machine learning: the algorithms do not use a predetermined function, so they can build a model that fits the data closely
• Self-learning
  - Traditional stats: not available
  - Machine learning: possible to some extent (variable weights can be changed automatically); regular expert supervision needed
• Dataset and complexity
  - Traditional stats: adequate for well-structured databases; cannot handle complex, poorly structured datasets
  - Machine learning: works well with small or poorly structured datasets; recognizes complex patterns
• Interpretation of results
  - Traditional stats: the results and the effect of explanatory variables are easy to interpret
  - Machine learning: model interpretation requires expertise
• Hardware capacity
  - Traditional stats: less computationally intensive
  - Machine learning: demands more computational power

Need of interpretability
[Figure: need for interpretability (“Black box”, well comprehensible, completely understandable) against forecast timeframe (sec, min, hour, day, week, month, year)]

New risk models: a sub-project of the Bank's Digital Strategy
Timeline: Jun. 2016, Dec. 2016, Jun. 2017, Dec. 2017
• OBRU, OBR application models
• OBR application models
• OTP HU application models
• Internal development of at least two scorecards with AMM
• Internal development of a new scorecard using AMM techniques
• Trainings for subsidiaries about the alternative AMM techniques
• Regular trainings about AMM techniques
• AMM belongs to the folklore during model development
• Subsidiaries test AMM techniques
• External teams support the validation process
• Establishing a Python server
• Involving the OTP HU Big Data environment
• Weblog and detailed transactional data for fraud prevention

Machine Learning in practice
Machine learning techniques cannot replace the whole “classic” model development lifecycle.
We tested the following machine learning techniques:
• Random forest: the “average” of a large number of random trees. Did not perform well for scorecards.
• Support Vector Machine: the goal is to find the hyperplane with optimal separating power. The extended version with kernel functions had capacity issues, so it was used only as a supporting algorithm combined with regression.
• Neural network: cannot be used for real-time decision making.
• Boosting techniques: supervised repetition of “weak classifiers” (decision trees, regression, …) leads to a strong classifier.

Main types of boosting:
• AdaBoost: underweights well-classified and overweights misclassified elements in every round
• LogitBoost: special type of AdaBoost where the loss function is logistic
• Gradient Boosting Trees: special type of LogitBoost; the loss function is decreased along the gradient

Random Forest
[Figure]

AdaBoost
• Base classifiers: $C_1, C_2, \dots, C_T$
• In step $m$, find the best $C_m$ in a predefined class using the weights $w_i$
• Error rate, where $\delta(\cdot)$ is 1 if its argument is true and 0 otherwise:
  $\varepsilon_m = \sum_{i=1}^{N} w_i^{(m)} \, \delta\big(C_m(x_i) \neq y_i\big)$
• Importance of a classifier:
  $\alpha_m = \frac{1}{2} \ln\left(\frac{1 - \varepsilon_m}{\varepsilon_m}\right)$
• Weight update:
  $w_i^{(m+1)} = \frac{w_i^{(m)}}{Z_m} \times \begin{cases} \exp(-\alpha_m) & \text{if } C_m(x_i) = y_i \\ \exp(\alpha_m) & \text{if } C_m(x_i) \neq y_i \end{cases}$
  where $Z_m$ ensures $\sum_i w_i^{(m+1)} = 1$
• Classification:
  $C^*(x) = \arg\max_y \sum_{m=1}^{T} \alpha_m \, \delta\big(C_m(x) = y\big)$

AdaBoost example
[Figures: AdaBoost example from “A Tutorial on Boosting” by Yoav Freund and Rob Schapire]
Exercise: compute $\varepsilon$, $\alpha$ and the weights of the instances in Round 2.

Main AdaBoost idea and a new idea
• “Shortcomings” are identified by high-weight data points
• The new model (e.g. stump) is fit irrespective of previous predictions
• In the next iteration, learn just the residual of the present model $F(x)$:
• Fit a model to $(x_1, y_1 - F(x_1)), (x_2, y_2 - F(x_2)), \dots, (x_n, y_n - F(x_n))$
• Regression, no longer classification!

LogitBoost: The additive logistic regression model
• Logistic regression learns a linear combination of classifiers for the “log odds-ratio”:
  $\frac{1}{2} \log \frac{P(y = 1 \mid x)}{P(y = -1 \mid x)} = F(x) = \sum_{m=1}^{M} f^{(m)}(x)$
• The logit transformation guarantees that for any $F(x)$, $p(x)$ is a probability in $[0, 1]$. Inverting, we get:
  $p(x) := P(y = 1 \mid x) = \frac{1}{1 + e^{-2F(x)}}$
• A function of the real label, $y - p(x)$, will be the instance weight

LogitBoost Algorithm
Step 1: Initialization
  committee function: $F^{(0)}(x) = 0$
  initial probabilities: $p^{(0)}(x) = \frac{1}{2}$
Step 2: LogitBoost iterations; for $m = 1, 2, \dots, M$ repeat:
A. Fitting the weak learner:
  1. Compute the working response and weights for $i = 1, \dots, n$ (here $y_i$ is the label coded as 0/1):
     $w_i^{(m)} = p^{(m-1)}(x_i) \big(1 - p^{(m-1)}(x_i)\big)$
     $z_i^{(m)} = \frac{y_i - p^{(m-1)}(x_i)}{w_i^{(m)}}$
  2. Fit a stump $f^{(m)}$ by weighted regression:
     $f^{(m)} = \arg\min_f \sum_{i=1}^{n} w_i^{(m)} \big(z_i^{(m)} - f(x_i)\big)^2$
  Optimization is no longer for error rate but for (root) mean squared error (RMSE).
B. Updating and classifier output:
     $F^{(m)}(x_i) = F^{(m-1)}(x_i) + \frac{1}{2} f^{(m)}(x_i)$
     $C^{(m)}(x_i) = \operatorname{sign}\big(F^{(m)}(x_i)\big)$
     $p^{(m)}(x_i) = \frac{1}{1 + e^{-2F^{(m)}(x_i)}}$

Regression Tree, Regression Stump
Regression stump example: TIME_FROM_FIRST_SEND <= 25.5
  - true: 0.14627427718697633
  - false: -0.14621897522944

Gradient Boosting
As in LogitBoost:
• Iterative prediction $y^*_m$
• Residuals: $y^*_{m+1} = y^*_m + h(x)$, where $h(x)$ is a simple regressor, e.g. stump, shallow tree
New idea: optimize by gradient descent
• If we minimize the mean squared error to the true values $y$, averaged over the training data, the derivative of $(y^* - y)^2$ can be computed and is proportional to the error $y^* - y$. Taking the negative derivative with respect to the current prediction:
  $-\frac{\partial}{\partial F_j(x_i)} \big(y_i - F_j(x_i)\big)^2 = 2 \big(y_i - F_j(x_i)\big) = 2 \cdot \text{residual}$
Well… this is just LogitBoost without the logistic loss function.

Other loss functions
• Squared loss is overly sensitive to outliers
• Absolute loss is more robust to outliers, but its derivative is discontinuous at zero:
  $L(y, F) = |y - F(x)|$
• Huber loss:
  $L(y, F) = \begin{cases} \frac{1}{2} (y - F)^2 & \text{if } |y - F| \le \delta \\ \delta \big(|y - F| - \delta/2\big) & \text{if } |y - F| > \delta \end{cases}$
  Negative gradient:
  $-g(x_i) = -\frac{\partial L\big(y, F(x_i)\big)}{\partial F(x_i)} = \begin{cases} y - F(x_i) & \text{if } |y - F(x_i)| \le \delta \\ \delta \, \operatorname{sign}\big(y - F(x_i)\big) & \text{if } |y - F(x_i)| > \delta \end{cases}$

Two main Gradient Boosting loss functions
• Deviance (as in logistic regression): $\frac{1}{1 + e^{-F(x)}}$
• Exponential (as in AdaBoost): $\exp\big(-y F(x)\big)$

Python server
OTP Python server (2016Q2)
• Red Hat Linux with Anaconda
• Jupyter notebook
  - More effective than localhost environments
  - Background mode
  - Web application

Application credit fraud prevention
SAS Fraud Solution – Logical system architecture
System architecture and working logic of OTPHU's fraud system
[Figure: hybrid approach (Social Network Analysis, Predictive Modeling, Anomaly Detection, Automated Business Rules). Inputs: application data, historical data, behavioural data. Alert generation process: entity matching, network building, scenario scores, fraud scoring, scenario analysis, network analysis, list of suspicious applications. Outcomes: approval, rejection, manual investigation.]

Thank you for your attention!
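As a supplement to the AdaBoost slides, the error rate, classifier importance and weight update can be traced for a single round with a few lines of Python. This is only a minimal sketch: the function name `adaboost_round` and the ten-instance toy data are assumptions made for illustration and do not come from the presentation or from the Freund and Schapire example.

```python
import math


def adaboost_round(weights, predictions, labels):
    """One AdaBoost round (hypothetical helper, not from the slides).

    weights     -- current instance weights w_i^(m), assumed to sum to 1
    predictions -- outputs C_m(x_i) of the weak classifier chosen in this round
    labels      -- true labels y_i
    Returns (error rate, classifier importance, updated weights).
    """
    # Error rate: total weight of the misclassified instances.
    eps = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    if not 0.0 < eps < 1.0:
        raise ValueError("importance is undefined for an error rate of 0 or 1")
    # Importance of the classifier: alpha = 1/2 * ln((1 - eps) / eps).
    alpha = 0.5 * math.log((1.0 - eps) / eps)
    # Shrink the weights of correct instances, grow those of misclassified ones.
    unnormalised = [
        w * math.exp(-alpha if p == y else alpha)
        for w, p, y in zip(weights, predictions, labels)
    ]
    # Z rescales the new weights so that they sum to 1 again.
    z = sum(unnormalised)
    return eps, alpha, [w / z for w in unnormalised]


# Toy data (assumed): 10 equally weighted instances, 3 of them misclassified.
labels = [1, 1, 1, -1, -1, -1, 1, 1, -1, -1]
predictions = [1, 1, 1, -1, -1, -1, -1, -1, 1, -1]
weights = [0.1] * 10
eps, alpha, new_weights = adaboost_round(weights, predictions, labels)
```

In this toy round the error rate is 0.3, and after the update the three misclassified instances carry half of the total weight (1/6 each), while each correctly classified instance drops to 1/14. This is the sense in which the next weak classifier is pushed towards the "shortcomings" of the current one.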
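The gradient boosting slides (fit a simple regressor to the residuals, or more generally to the negative gradient of the loss) can likewise be illustrated with a rough sketch. Everything named here is an assumption for illustration: `fit_stump`, `huber_negative_gradient`, `gradient_boost`, the `learning_rate` shrinkage parameter and the toy data are not from the presentation. The sketch also simplifies the method by using the least-squares stump values directly instead of re-optimising the leaf values for the Huber loss.

```python
def fit_stump(xs, targets):
    """Least-squares regression stump on one feature.

    Returns (threshold, value if x <= threshold, value if x > threshold).
    Assumes xs contains at least two distinct values.
    """
    best = None
    points = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2.0
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def huber_negative_gradient(y, f, delta):
    """Negative gradient of the Huber loss: the residual clipped to [-delta, delta]."""
    residual = y - f
    if abs(residual) <= delta:
        return residual
    return delta if residual > 0 else -delta


def gradient_boost(xs, ys, rounds=50, learning_rate=0.5, delta=1.0):
    """Additive model of stumps fitted to Huber pseudo-residuals; returns predict(x)."""
    stumps = []

    def predict(x):
        return sum(learning_rate * (left if x <= threshold else right)
                   for threshold, left, right in stumps)

    for _ in range(rounds):
        # Pseudo-residuals: negative gradient of the loss at the current model.
        pseudo = [huber_negative_gradient(y, predict(x), delta)
                  for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, pseudo))
    return predict


# Toy data (assumed): a step function of a single feature.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = gradient_boost(xs, ys)
```

With `delta=1.0` all residuals of this toy dataset stay in the quadratic part of the Huber loss, so the procedure behaves like boosting on plain residuals; the clipping only matters once a residual exceeds `delta`, which is how the Huber loss limits the influence of outliers.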