Maximize AUC in Default Prediction: Modeling and Blending

Liang Sun, Tomonori Honda, Vesselin Diev, Gregory Gancarz, Jeong-Yoon Lee, Ying Liu, Mona Mahmoudi, Raghav Mathur, Shahin Rahman, Steve Wickert, Xugang Ye, Hang Zhang

Appearing in Proceedings of the 1st Technical and Analytical Conference of Opera Solutions, San Diego, USA, 2012.

Abstract

In this paper we present models and blending algorithms to maximize the Area Under the Receiver Operating Characteristic (ROC) curve in default prediction. We summarize all techniques and algorithms we applied in the Give Me Some Credit competition so that future users can benefit from our experience in this competition, including feature creation algorithms, single model construction, and blending algorithms. In particular, we highlight the following aspects: (i) the different feature creation methods we explored, (ii) the diverse packages utilized for single model building and the parameter tuning for each model, (iii) the different blending algorithms, and (iv) additional postprocessing methods.

1. Introduction

The recent Give Me Some Credit (GMSC) competition[1] organized by Kaggle focused on predicting the probability that bank customers would experience financial distress in the next two years. This is a very interesting classification problem with limited (11) original features and a low target rate (about 5%), where the target represents customers who actually defaulted. The approaches developed for this competition are applicable not only to the credit and risk community, but also to other industries with similar classification problems.

[1] http://www.kaggle.com/c/GiveMeSomeCredit

In this paper, we summarize all techniques and algorithms we applied in the GMSC competition, including feature creation, single models, and blending models. In particular, we would like to highlight (i) the different feature creation methods, (ii) the diverse packages and algorithms utilized for single model building, (iii) the different blending algorithms, and (iv) additional postprocessing of these models.

During the GMSC competition, several data sets were created. One of the main differences between these data sets was the handling of missing values and outliers. Additionally, many data sets transformed the original features into weight-of-evidence variables. In particular, a novel and effective variable, the 2D weight of evidence, was created to capture the underlying information. Based on the created data sets, many different single models were developed, e.g., random forest, gradient boosting decision tree, alternating decision tree, logistic regression, artificial neural networks, support vector machine, elastic net, and k-nearest neighbors. In addition, we applied residual postprocessing to improve the performance of single models. For example, the residual of the Gradient Boosted Machine (GBM) model was successfully modeled using a random forest model, which improved performance.

Many different blending algorithms were explored during the GMSC competition. The transformation that converts the probability of default into a rank tends to improve the performance of blending algorithms when the evaluation criterion is AUC. We designed a class of statistical aggregation algorithms to combine a set of predictions. In order to effectively utilize the public leaderboard score of each individual prediction, we further propose rank-based oracle blending.
Compared with the statistical aggregation algorithms, in rank-based oracle blending the weight of each single model is a function of its public AUC score, so that better models receive larger weights.

Note that blending is typically the last step in creating final predictions. However, we can further utilize the blending result to improve the performance of single models and the overall performance. One approach we followed was to use predictions from blending as new targets for single models. This can help remove target outliers and improve performance. Another approach we followed was to determine the problematic population (the segment of data which does not rank order well) using the blended results and build separate models for this problematic population. Note that the original blended prediction can be a new key feature for this problematic segment. This works best when we again build many different single models for this problematic segment. A key requirement for implementing this approach is that predictions on the training, validation and test data sets should be available for each single model.

2. Evaluation Criterion: Area Under the ROC Curve

The evaluation criterion for the GMSC competition was the Area Under the ROC Curve (AUC). The Receiver Operating Characteristic (ROC) curve of a decision function f plots the true positive rate on the y-axis against the false positive rate on the x-axis. Thus the ROC curve characterizes every possible trade-off between false positives and true positives that can be achieved by comparing f(x) to a threshold. Note that the ROC curve is a 2-dimensional measure of classification performance, and the AUC is a scalar measure that summarizes the ROC curve. Formally, the AUC is equal to the probability that the decision function f assigns a higher value to a randomly drawn positive example x^+ than to a randomly drawn negative example x^-, i.e.,

    AUC(f) = \Pr\big(f(x^+) > f(x^-)\big).                                         (1)

Theoretically, the AUC refers to the true distribution of positive and negative instances, but it is usually estimated from samples. It can be shown that the normalized Wilcoxon-Mann-Whitney statistic gives the maximum likelihood estimate of the true AUC given n^+ positive and n^- negative examples (Yan et al., 2003):

    \widehat{AUC}(f) = \frac{1}{n^+ n^-} \sum_{i=1}^{n^+} \sum_{j=1}^{n^-} \mathbf{1}_{f(x_i^+) > f(x_j^-)},      (2)

where \mathbf{1}_{f(x_i^+) > f(x_j^-)} is the indicator function, which is 1 if f(x_i^+) > f(x_j^-) and 0 otherwise. In fact, the two sums in Eq. (2) iterate over all pairs of positive and negative examples. Each pair that satisfies f(x_i^+) > f(x_j^-) contributes 1/(n^+ n^-) to the overall AUC estimate. As a result, maximizing AUC is equivalent to maximizing the number of pairs satisfying f(x^+) > f(x^-). Note that the number of all pairs is O(n^2), where n is the sample size of the training data set. We will discuss below how to focus attention on the problematic population, which for the GMSC competition happens to be the top 20% of people with the highest blended scores.

The AUC can also be calculated in a few other ways, e.g., by numerically integrating the ROC curve. Another alternative is to compute it via the Wilcoxon rank-sum test as follows:

    \widehat{AUC}(f) = \frac{U_1 - n^+(n^+ + 1)/2}{n^+ n^-},                        (3)

where U_1 is the sum of the ranks of the members of the positive class (the members who defaulted). This AUC equation helps us to blend models in the ranking space rather than the probability space.
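As a concrete illustration of Eq. (3), the following sketch (our own illustration, not code from the competition; the function name and the toy scores are hypothetical) estimates the AUC from rank sums. The same rank representation is what is later exploited when blending in the ranking space.

```python
import numpy as np
from scipy.stats import rankdata

def auc_from_ranks(scores, labels):
    """Estimate AUC via the Wilcoxon rank-sum formula of Eq. (3)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    ranks = rankdata(scores)          # average ranks handle ties
    u1 = ranks[labels == 1].sum()     # sum of ranks of the positive (default) class
    return (u1 - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

# Tiny usage example with made-up scores: 3 of the 4 positive/negative pairs
# are ordered correctly, so the estimate is 0.75.
print(auc_from_ranks([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```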
3. Data Description and Feature Creation

3.1. Raw Data Description

There were 10 raw feature variables and 1 target variable. The raw feature variables were the following: RevolvingUtilizationOfUnsecuredLines, Age, NumberOfTime30-59DaysPastDueNotWorse, DebtRatio, MonthlyIncome, NumberOfOpenCreditLinesAndLoans, NumberOfTimes90DaysLate, NumberRealEstateLoansOrLines, NumberOfTime60-89DaysPastDueNotWorse, and NumberOfDependents. The meaning of these variables should be self-explanatory. Note that there were some missing values in the variables DebtRatio and MonthlyIncome. There were also some obvious outliers in a few variables.

3.2. Data Imputation and Feature Creation

During the GMSC competition, at least 12 different data sets were created, and the main differences were in the handling of missing values and outliers. In some data sets, new derived features such as the weight of evidence were also created. In this subsection, we focus on the imputation of missing values and the new variable 2D weight of evidence.

In a few data sets that utilized binning, missing values were assigned separate bins. In some data sets, the missing values of the MonthlyIncome and NumberOfDependents variables were imputed with their median values. In other data sets, the missing values of MonthlyIncome were imputed by regressing on the remaining variables. Additionally, in some data sets, special attention was given to handling the missing values in the variable MonthlyIncome and the unreasonably huge values in the variable DebtRatio that almost always accompanied the missing values in MonthlyIncome. It is speculated that the huge values in DebtRatio were the actual monthly debt rather than a debt ratio, since a reasonable DebtRatio should be less than or close to 1. Thus, it is reasonable to assume that when MonthlyIncome was missing, the actual MonthlyDebt was recorded as the DebtRatio, since MonthlyIncome is used as the denominator in computing DebtRatio. As a result, for records where MonthlyIncome was not missing, the MonthlyDebt was calculated by

    MonthlyDebt = MonthlyIncome \times DebtRatio.                                  (4)

For records with missing MonthlyIncome, the MonthlyIncome was set to 0 and the MonthlyDebt took the original value in DebtRatio. This demonstrates the usefulness of spending time analyzing the raw data.

One of the common transformations of the original variables is the weight of evidence. Many of the generated data sets contain the weight of evidence for each original variable separately after binning it. Additionally, one data set utilized moving bins rather than static bins, determining a local weight of evidence using a fixed percentile window around each individual member's variable values.

In the credit and risk community, it is well known that high credit line utilization and a high number of days past due often lead to serious delinquency. Therefore, a 2D weight of evidence on these two variables could be a predictive variable. A new data set was created with this 2D weight of evidence as the variable dlqUtil. First, the three variables NumberOfTimes90DaysLate, NumberOfTime60-89DaysPastDueNotWorse, and NumberOfTime30-59DaysPastDueNotWorse, which are related to delinquency, were added up to create the new variable dlq. Then dlq was crossed with the variable RevolvingUtilizationOfUnsecuredLines to produce the risk table. The 2D risk table is presented in Table 1. It can be observed that the risk increases as the number of times past due increases. The risk also increases as the credit line utilization increases.

Table 1. Risk table crossing total times past due (dlq) and credit line utilization (Util).

    Util \ Dlq       0          1          2          3+
    0              0.01430    0.05628    0.18868    0.38095
    0 - 0.1        0.00873    0.05012    0.15085    0.28483
    0.1 - 0.3      0.01773    0.07252    0.13501    0.25829
    0.3 - 0.6      0.03625    0.10145    0.18249    0.34615
    0.6 - 0.8      0.06380    0.13140    0.26316    0.42464
    0.8 - 0.9      0.08148    0.18105    0.26346    0.43797
    0.9 - 0.95     0.09560    0.18083    0.32273    0.48795
    0.95 - 1       0.08108    0.24508    0.32388    0.51308
    1+             0.13738    0.35062    0.40761    0.61748

One interesting feature is that people with strictly zero utilization of their credit line have higher risk than those with slight utilization (0-0.1). This is a well-known phenomenon in the credit and risk community and is thought to arise from the fact that people with no credit utilization are inexperienced with financial management and tend to be higher risk.
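A hedged sketch of the dlqUtil construction is given below; the target column name, the exact weight-of-evidence convention, and the helper names are our own assumptions, while the bin edges simply mirror Table 1.

```python
import numpy as np
import pandas as pd

def add_dlq_util_woe(df, target="SeriousDlqin2yrs"):
    # Total number of times past due, capped at 3 to form the 0 / 1 / 2 / 3+ bins.
    dlq = (df["NumberOfTimes90DaysLate"]
           + df["NumberOfTime60-89DaysPastDueNotWorse"]
           + df["NumberOfTime30-59DaysPastDueNotWorse"]).clip(upper=3)
    # Utilization bins as in Table 1; the (-inf, 0] bin isolates the strictly-zero group.
    edges = [-np.inf, 0, 0.1, 0.3, 0.6, 0.8, 0.9, 0.95, 1, np.inf]
    util = pd.cut(df["RevolvingUtilizationOfUnsecuredLines"], edges)
    cells = pd.DataFrame({"dlq": dlq, "util": util, "y": df[target]})
    # Cell-level default rate: this reproduces the risk values shown in Table 1.
    rate = cells.groupby(["dlq", "util"], observed=True)["y"].transform("mean")
    # One common weight-of-evidence convention: cell log-odds minus overall log-odds.
    overall = cells["y"].mean()
    out = df.copy()
    out["dlqUtil"] = (np.log((rate + 1e-6) / (1 - rate + 1e-6))
                      - np.log(overall / (1 - overall)))
    return out
```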
4. Single Models

Based on the data sets created in Section 3.2, numerous single models were built that investigate the data from different perspectives. The single models investigated in this competition include:

1. Linear regression and its variants, e.g., Ordinary Least Squares (OLS) and the Elastic Net (Zou & Hastie, 2005).
2. Logistic Regression.
3. Decision tree based algorithms, such as Random Forest, Gradient Boosting Tree, and Alternating Decision Tree.
4. Artificial Neural Networks, e.g., Multi-Layer Neural Networks (MLNs) and the Restricted Boltzmann Machine (RBM).
5. Classifiers based on Bayesian statistics, e.g., the Naive Bayes classifier.
6. Support Vector Machine (SVM).
7. k-Nearest Neighbors (kNN).

Note that this is not a complete list, but it outlines the most important single models attempted in this competition. Since different models have different assumptions, some data sets will work better with one model than with others. For example, decision tree based algorithms can handle outliers better than neural network based algorithms, but they tend to overfit during model training.

In general, the best performing single models were decision tree based models such as Gradient Boosting Tree, Random Forest, Alternating Decision Tree, and REPTree. The next best algorithm was Artificial Neural Networks. These models were followed by Naive Bayes, SVM, Elastic Net, and Logistic Regression, which produced models with similar accuracy. These results imply that the data set may still contain outliers even after extensive outlier treatment, and also that there may be significant coupling among the features that must be exploited by the model.
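As noted earlier, blending requires predictions on the training, validation and test sets from every single model. A hedged sketch of such a harness, using scikit-learn implementations of a few of the model families listed above rather than the exact packages used in the competition, is shown below.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

def fit_single_models(X_train, y_train, X_test):
    # Hypothetical model zoo; settings are illustrative, not the tuned values.
    models = {
        "rf": RandomForestClassifier(n_estimators=500),
        "gbm": GradientBoostingClassifier(n_estimators=500, learning_rate=0.01),
        "logit": LogisticRegression(max_iter=1000),
        "knn": KNeighborsClassifier(n_neighbors=100),
    }
    train_preds, test_preds = {}, {}
    for name, model in models.items():
        # Out-of-fold probabilities on the training set, kept for later blending.
        oof = cross_val_predict(model, X_train, y_train, cv=5,
                                method="predict_proba")[:, 1]
        model.fit(X_train, y_train)
        train_preds[name] = oof
        test_preds[name] = model.predict_proba(X_test)[:, 1]
    return train_preds, test_preds
```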
As an example of a single model building effort, we will demonstrate the best single model, which is a gradient boosted decision tree with residual postprocessing by random forest, and another method (SVMperf) that optimizes AUC directly.

4.1. Gradient Boosted Decision Tree with Residual Postprocessing by Random Forest

Gradient boosted decision tree (Friedman, 2001) is a machine learning technique for regression problems which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods, and it generalizes them by allowing optimization of an arbitrary differentiable loss function. The gradient boosting method can also be used for classification problems by reducing them to regression with a suitable loss function. In this approach, the residual of the GBM is further modeled using a random forest (Breiman, 2001).

Specifically, a GBM model was built first to predict the labels of the members in the training set, i.e.,

    Y_{Train} = f^{GBM}(X_{Train}) + \epsilon_{Train},

where \epsilon_{Train} is the residual from the GBM model. The parameters of the GBM model were as follows: 1500 trees, shrinkage parameter 0.01, depth of each tree 14, and minimal size of each leaf node 10. In the second stage, a random forest (RF) model was built to model the residuals \epsilon_{Train}. Random forest (or random forests) is an ensemble classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. The RF model we built can be denoted as

    \epsilon_{Train} = f^{RF}(X_{Train}) + \epsilon'_{Train}.

The parameters of the RF model were as follows: 3 variables in each tree, 500 trees, and minimal leaf node size 300. Thus, the prediction on the test set was

    \hat{Y}_{Test} = \hat{Y}^{GBM}_{Test} + \hat{\epsilon}^{RF}_{Test},

where \hat{Y}^{GBM}_{Test} = f^{GBM}(X_{Test}) and \hat{\epsilon}^{RF}_{Test} = f^{RF}(X_{Test}).

4.2. Support Vector Machine

A variation of SVM called SVMperf (Joachims, 2006) maximizes the AUC directly by minimizing 1-AUC as its loss function. A typical SVM loss function based on minimizing the error rate is given by

    \min_{w,\, \zeta_i \ge 0} \; \frac{1}{2} w^T w + \frac{C}{n} \sum_{i=1}^{n} \zeta_i                     (5)

    such that  y_i (w^T x_i) \ge 1 - \zeta_i, \quad \forall i \in \{1, \ldots, n\},                          (6)

where w is the weight vector, \zeta_i are the slack variables indicating the amount of error we are allowing, and C is a complexity factor that balances the complexity of the model.

A loss function based on minimizing 1-AUC (i.e., maximizing AUC) is

    \min_{w,\, \zeta_{i,j} \ge 0} \; \frac{1}{2} w^T w + \frac{C}{n^+ n^-} \sum_{i=1}^{n^+} \sum_{j=1}^{n^-} \zeta_{i,j}      (7)

    such that  (w^T x_i) - (w^T x_j) \ge 1 - \zeta_{i,j}, \quad \forall i \in \{1, \ldots, n^+\}, \; \forall j \in \{1, \ldots, n^-\},      (8)

i.e., for all pairs (i, j) with y_i > y_j. One can see that these constraints relate directly to the pairwise AUC estimate in Eq. (2). The problem is that in this formulation we have O(n^2) such constraints, one for each pair of observations. Joachims (2006) proposed an approximation algorithm (the cutting plane algorithm) that reduces the number of constraints to O(n), thus making the overall algorithm complexity O(n log n), driven by a sorting operation. This algorithm starts with an empty set of constraints W and adds the most violated constraint in each iteration.
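As a rough illustration of the two-stage model of Section 4.1, the sketch below uses scikit-learn estimators in place of the packages actually used in the competition; the hyperparameters mirror those reported above, and treating max_features as the "3 variables" setting is our assumption.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

def fit_gbm_plus_rf_residual(X_train, y_train):
    # Stage 1: GBM fit on the 0/1 default target, treated as a regression problem.
    gbm = GradientBoostingRegressor(n_estimators=1500, learning_rate=0.01,
                                    max_depth=14, min_samples_leaf=10)
    gbm.fit(X_train, y_train)
    # Stage 2: RF fit on the residual of the GBM scores.
    residual = y_train - gbm.predict(X_train)
    rf = RandomForestRegressor(n_estimators=500, max_features=3,
                               min_samples_leaf=300)
    rf.fit(X_train, residual)
    return gbm, rf

def predict_gbm_plus_rf(gbm, rf, X_test):
    # Final score: GBM prediction plus the RF estimate of its residual.
    return gbm.predict(X_test) + rf.predict(X_test)
```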
5. Blending Algorithms

Note that the performance measure in the GMSC competition was AUC, which is different from the RMSE criterion widely used in many competitions such as the Heritage Health Prize and the Netflix Prize. There are broadly two types of blending algorithms: one that focuses on blending similar models and another that focuses on blending a wide variety of different models. For blending similar models, we successfully implemented (1) genetic algorithms to boost the performance of artificial neural networks and (2) a voting method to boost the Naive Bayes classifier and a particle filter. These approaches are designed to search through a particular model space to optimize the AUC directly (see the GMSC technical report (Diev et al., 2012)). In the rest of this section, we discuss the implemented blending algorithms, including genetic algorithms using populations of feedforward neural networks, statistical aggregation of predictions, and oracle blending that incorporates the AUC scores on the public leaderboard as feedback.

5.1. Genetic Algorithms Using Populations of Neural Networks

Artificial neural networks trained using backpropagation (Rumelhart et al., 1986) typically seek to reduce the mean squared error between actual and expected values during training. The metric for the GMSC competition, however, was AUC. Training by backpropagation normally produces a neural network model with good AUC as well, but these two performance metrics are not completely aligned. For competitions such as the GMSC, where even incremental gains are important, we may be able to boost the performance of neural nets by optimizing AUC directly.

Genetic Algorithms (GAs) are inspired by evolution and natural selection in biology (see, for example, (Mitchell, 1996)). We can evolve populations of neural nets. The great advantage of doing so is that we may use AUC directly as the fitness function. We encoded each neural net population member as a single linear chromosome of real-valued numbers representing the ordered weights and biases of each network unit. In the simplest approach, all population members maintain the same network topology (inputs, numbers of hidden units, and connectivity), though topological variations can be accommodated as well.

The driving mechanism for fitness increases in the population is the creation of new population members that are fitter than any of their parents. The population represents a pool of genetic diversity from which newly born entities can draw, and we used both single-point mutations and multi-point crossovers to generate new genetic diversity. We evaluated several different mutation schemes, and found the most effective for this competition to be mutation by an amount proportional to the current value of the allele and a randomly selected lognormal mutation factor. This scheme avoids large disruptions in the scale of alleles, while exploring occasional larger deviations than Gaussian perturbations would provide. We explored a range of combinations of the tunable parameters of the GA runs, including population size, number of hidden units, crossover rate, mutation rate, and various other parameters. Results for some representative runs are shown in Table 2.

Table 2. GMSC results for sample GA runs using neural networks.

    GA Run     Pop. Size    Public AUC    Private AUC
    garoc 7    500          0.862817      0.868301
    garoc 8    800          0.862783      0.868267
    garoc 9    1000         0.862814      0.868182

5.2. Statistical Aggregation

The basic idea behind statistical aggregation is the so-called wisdom of crowds (Mitchell, 1997). Statistical aggregation works best when predictions from different models produce random, unbiased errors by looking at the data from different perspectives. Ideally, the error will be reduced in proportion to the square root of the number of predictions. The statistical aggregation methods we applied include: (1) mean blending, (2) median blending, and (3) expected value blending. These aggregation algorithms can be performed in both (a) the probability space, which uses the original predictions from each single model, and (b) the ranking space, which converts the probabilities in each prediction into rank order. Since AUC is a function of ranking rather than probability, it is reasonable to perform the aggregation in the ranking space.

We compared the mean, median, and expected value blending algorithms in both the probability space and the ranking space; the results are summarized in Table 3. It can be observed that aggregation in the ranking space usually achieves better AUC performance. Furthermore, expected value aggregation considers the covariance of the predictions, but tends to underperform compared to mean and median blending. One possible reason is that the Gaussian distribution assumption is too strong to hold for real data.
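A minimal sketch of mean and median blending in the ranking space (our own code, with hypothetical array shapes) is given below; since AUC depends only on the ordering, the blended ranks can be used directly as the final score.

```python
import numpy as np
from scipy.stats import rankdata

def blend_in_rank_space(pred_matrix, how="mean"):
    """pred_matrix: shape (n_models, n_samples) of predicted default probabilities."""
    ranks = np.vstack([rankdata(p) for p in pred_matrix])  # rank within each model
    if how == "mean":
        blended = ranks.mean(axis=0)
    elif how == "median":
        blended = np.median(ranks, axis=0)
    else:
        raise ValueError("how must be 'mean' or 'median'")
    # Only the ordering matters for AUC, so the blended ranks can be submitted
    # directly (or rescaled to [0, 1] if probabilities are required).
    return blended
```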
Table 3. Comparison of AUC for statistical aggregation blending algorithms.

    Num.                  Probability Space         Ranking Space
    Models   Alg.         Public     Private        Public     Private
    32       EV           0.8614     0.8678         0.8621     0.8678
    32       Mean         0.8620     0.8679         0.8621     0.8680
    32       Median       0.8622     0.8680         0.8622     0.8681
    63       EV           0.8609     0.8670         0.8610     0.8664
    63       Mean         0.8626     0.8684         0.8628     0.8686
    63       Median       0.8624     0.8681         0.8628     0.8684

5.3. Rank Based Oracle Blending

Rank-based oracle blending is an extension of the statistical aggregation algorithms. Compared with the statistical aggregation algorithms, the weight of each single model is a function of its public AUC score, so that better models have larger weights in rank-based oracle blending. Note that we assume that the public AUC score reflects the private AUC score accurately (which may not be true if the populations underlying the public and private scores have different distributions). Formally, given N predictions \{\vec{p}_i, i = 1, \ldots, N\}, the mathematical formulation of the rank-based oracle is given in Eq. (9):

    R_f(j) = \frac{\sum_i w(AUC_i)\, R(\vec{p}_i(j))}{\sum_i w(AUC_i)},            (9)

where R is the operator that maps a prediction from the probability space to the ranking space (in ascending order), R_f is the final ranking, AUC_i is model i's public AUC score, and w is a weighting function of AUC_i. Note that the final ranking is normalized by dividing by the sum of the weights. Specifically, the following weighting functions were explored:

    w_a(AUC_i) = (AUC_i - \overline{AUC}),                                         (10)
    w_b(AUC_i) = (AUC_i - \overline{AUC})^2,                                       (11)
    w_c(AUC_i) = (AUC_i - \overline{AUC})^4,                                       (12)
    w_d(AUC_i) = R(AUC_i),                                                         (13)
    w_e(AUC_i) = \frac{1}{N - R(AUC_i) + 1},                                       (14)

where \overline{AUC} is a constant that should be optimized for the particular problem. However, most of these weighting formulas are not robust to mediocre predictions and require prescreening to reduce the noise. Interestingly, w_e, which is motivated by the weighting of the modified Borda count used in Nauru, is significantly more robust to mediocre predictions and correlated predictions.
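The following sketch (again our own, with a toy example) implements the rank-based oracle blend of Eq. (9) using the Nauru-style weight w_e of Eq. (14).

```python
import numpy as np
from scipy.stats import rankdata

def rank_oracle_blend(pred_matrix, public_aucs):
    """pred_matrix: (n_models, n_samples); public_aucs: length n_models."""
    public_aucs = np.asarray(public_aucs, dtype=float)
    n_models = len(public_aucs)
    auc_rank = rankdata(public_aucs)                 # ascending: 1 = lowest AUC
    weights = 1.0 / (n_models - auc_rank + 1.0)      # w_e: best model gets weight 1
    ranks = np.vstack([rankdata(p) for p in pred_matrix])
    return (weights[:, None] * ranks).sum(axis=0) / weights.sum()

# Toy usage with three hypothetical models and their public AUC scores.
preds = np.array([[0.1, 0.7, 0.4], [0.2, 0.6, 0.5], [0.3, 0.9, 0.1]])
print(rank_oracle_blend(preds, public_aucs=[0.860, 0.862, 0.855]))
```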
6. Boosting Model Performance by Using Blended Results

Blending is not the last step in our modeling process. In our modeling, the blended results are further utilized to improve the performance of single models. Specifically, we attempted two approaches: 1) use the blended result as a new target and retrain the single models, which can remove target outliers and lead to better single models; 2) isolate the problematic population using the blended results.

6.1. Single Model Trained on Blending AUC Scores

In this approach, after training several single models and building the blending model, the target on the training data set is replaced with the prediction of the blending model, and the new target is used to retrain single models. The effectiveness of this approach is confirmed by our empirical results. Specifically, we trained two models on the jl-WOE (jl-11) data set. Both models used gradient boosting trees as the first stage, followed by postprocessing that trains a random forest model on the residual between the GBM scores and the true target. The difference between these two models is that different targets were used at the GBM stage: one used the true target and the other used the blended results as the target. The AUC scores are summarized in Table 4.

Table 4. Comparison of AUC between two models using different targets.

    Target             Training    Validation    Test
    True target        0.89169     0.86199       0.85961
    Blending target    0.88925     0.86369       0.86099

6.2. Top 20% Segmentation

Another approach that successfully boosted the blended model is to segment out the problematic population first and then build new models on this segment. In the GMSC competition, we focused on the top 20% of the population most likely to default. The intuition behind this approach is that classifying 80% of the population as most likely NOT to default is much easier than classifying the remaining 20% of the population that is ranked near the borderline. Note that this cutoff will be different for different problems and is highly dependent on the problem details.

Formally, we assume that \{\vec{p}_i^{trn}, i = 1, \ldots, N\}, \{\vec{p}_i^{vld}, i = 1, \ldots, N\}, and \{\vec{p}_i^{tst}, i = 1, \ldots, N\} are the sets of training, validation, and test predictions from the N single models, respectively. Note that this approach requires the predictions on the training data set for all single models. For the N predictions on the training, validation and test data sets, we aggregate them using either rank-based oracle blending or statistical aggregation. Specifically, we applied weighted mean oracle blending with the inverse of the rank of the test AUC score as the weight, i.e., w_e in Eq. (14), to aggregate them. Note that the same weight was used for a given single model in the training, validation and test set aggregations. Based on the aggregated results, we can identify the top 20% of the population (most likely to default). We denote the top 20% of the population in the training, validation, and test data sets as ID_{slct}^{trn}, ID_{slct}^{vld}, and ID_{slct}^{tst}, respectively, and denote X^{trn}, X^{vld}, and X^{tst} as the features of these three sets. Thus, we can build new models for the top 20% population using X^{trn}(ID_{slct}^{trn}), X^{vld}(ID_{slct}^{vld}), and X^{tst}(ID_{slct}^{tst}).

An interesting observation is that the weighted mean ranking and the weighted standard deviation of the ranking from the original blended single models (which utilized the whole population rather than the top 20%) can be used as new features. The addition of these two features resulted in a significant improvement of AUC on the top 20% population for many models, such as logistic regression, SVM, neural networks, and random forest.
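Under our own naming and helper assumptions (the 20% cutoff and the two ranking-derived features follow the description above; the data structures are hypothetical), a minimal sketch of this segmentation step might look as follows. The returned mask can be applied identically to the training, validation, and test sets, so that the segment-level models and the original blend can later be recombined.

```python
import numpy as np
from scipy.stats import rankdata

def segment_top20(pred_matrix, weights, X, y=None):
    """pred_matrix: (n_models, n_samples) predictions; weights: per-model blend weights."""
    ranks = np.vstack([rankdata(p) for p in pred_matrix])
    w = np.asarray(weights, dtype=float)
    mean_rank = (w[:, None] * ranks).sum(axis=0) / w.sum()            # weighted mean ranking
    var_rank = (w[:, None] * (ranks - mean_rank) ** 2).sum(axis=0) / w.sum()
    std_rank = np.sqrt(var_rank)                                      # weighted std of the ranking
    top = mean_rank >= np.quantile(mean_rank, 0.8)                    # top 20% most likely to default
    X_seg = np.column_stack([X[top], mean_rank[top], std_rank[top]])  # two new features appended
    y_seg = None if y is None else y[top]
    return X_seg, y_seg, top

# A segment-level model (e.g. scikit-learn's LogisticRegression) can then be
# refit on X_seg, y_seg for the training portion of the selected population.
```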
We summarize the performance comparison before and after adding these two features for several variants of logistic regression and SVM in Table 5. It can be observed in Table 5 that SVM and logistic regression are boosted from an AUC score of about 0.68 (which is worse than the AUC score achieved on the top 20% of the population by the blended model built on the whole population) to about 0.75.

Table 5. AUC scores of different algorithms for the top 20% of the population on the validation data set, before and after adding the two new features based on the mean and standard deviation of the rankings.

    Algorithm                              Before      After
    ℓ2-regularized logistic regression     0.68127     0.75247
    ℓ2-regularized ℓ2-loss SVM             0.68106     0.74899
    ℓ2-regularized ℓ1-loss SVM             0.68036     0.72437
    ℓ1-regularized ℓ2-loss SVM             0.67620     0.75036
    ℓ1-regularized logistic regression     0.67767     0.75261

We performed median blending on the 8 best-performing models for the top 20% population and combined that with the oracle blended prediction for the bottom 80% of the population. The AUC was improved from 0.86165 to 0.86204.

7. Conclusions

We have summarized some of the selected contributions that we provided during the GMSC competition. For feature creation, we focused on the significance of imputing missing values as well as on weight-of-evidence transformations, including the 2D weight of evidence. In the single model section, we demonstrated the variety of models that can be built, and we showed how to boost single model performance with residual postprocessing by describing the best single model. We elaborated on our exploration of different blending algorithms, both for similar models and for predictions from different models. Finally, we discussed our investigation of utilizing blended results to boost performance further. Together, these techniques provide a coherent guide for future competitions, and we believe these approaches can be refined further to improve our model performance in the future.

7.1. Software and Data

All algorithms have been implemented and are available on SVN[2].

Acknowledgments

These algorithms were developed during the Give Me Some Credit (GMSC) competition. We would like to thank everyone who contributed to the GMSC competition, including: Jacob Spoelstra, Jenny Zhang, Abhikesh Nag, Dan Nabutovsky, Kevin Chen, William Roberts, Kimi Minnick, Jason Lu, Michael Kennedy, Michael Alton, Yonghui Chen and Yun Wang.

References

Breiman, L. Random Forests. Machine Learning, 45(1):5–32, 2001.

Diev, V., Gancarz, G., Honda, T., Lee, J. Y., Liu, Y., Mahmoudi, M., Mathur, R., Rahman, S., Sun, L., Wickert, S., Ye, X., and Zhang, H. Technical Report for the Give Me Some Credit Competition. Technical Report, Opera Solutions, San Diego, CA, January 2012.

Friedman, J. H. Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5):1189–1232, 2001.

Joachims, T. Training Linear SVMs in Linear Time. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), 2006.

Mitchell, M. An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA, USA, 1996.

Mitchell, T. M. Machine Learning. McGraw-Hill, New York, 1997.

Rumelhart, D. E., Hinton, G. E., and Williams, R. J. Learning Representations by Back-Propagating Errors. Nature, 323:533–536, 1986.

Yan, L., Dodier, R. H., Mozer, M., and Wolniewicz, R. H. Optimizing Classifier Performance via an Approximation to the Wilcoxon-Mann-Whitney Statistic. In Proceedings of the 20th International Conference on Machine Learning, pp. 848–855, 2003.

Zou, H. and Hastie, T. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society: Series B, 67(2):301–320, 2005.

[2] http://subversion/repo/GMSC