Credit Card Applicants’ Credibility Prediction with Decision Tree
Dan Xiao
Jerry Yang

Agenda
– Project Goal
– Project Domains
– Project Method
– Implementation
– Conclusions
– Experience

Project Goal
The objective of this project is to build a decision tree that predicts the classification (good or bad) of credit card applicants.

Project Domains
There are around 4000 records.
There are 15 attributes:
– Credit_Card_Debt
– Highest_Credit_Card_APR
– Monthly_Car_Pmt
– Monthly_Income
– Monthly_Mortage
– Martial_Status

Project Domains
Attributes continued:
– No_of_Credit_Cards
– Year_of_Employment
– Citizenship
– Home_Ownership
– Accounts
– Sex
– Race
– Results

Project Method
Classification and Prediction
– Construct a model (Decision Tree) with the training data
– Apply the testing data to the model (Decision Tree) to predict the applicants’ credibility

Implementation
Software
– WEKA
  • Classifier
  • Filter
Learning Schemes
– ZeroR
– OneR
– M5
– J48

Implementation (Continued)
Relevance Analysis
Data Cleaning
– File conversion
– Missing data
– Outlier

Implementation (Continued)
Testing Dataset
– File conversion
– Data testing
  • Percentage split
  • Supplied test set
  • Cross-validation

Percentage split
Split Ratio   Confidence   Number of Leaves   Number of Nodes   Correct Classification
25%           0.25         30                 54                94.77%
25%           0.15         3                  5                 94.77%

Supplied test set
Confidence   Number of Leaves   Number of Nodes   Correct Classification
0.15         3                  5                 94.67%
0.25         30                 54                96.08%

Cross-validation
Fold Number   Confidence   Number of Leaves   Number of Nodes   Correct Classification
5             0.15         3                  5                 94.28%
10            0.15         3                  5                 94.21%
15            0.15         3                  5                 94.46%

Cross-validation (Continued)
Fold Number   Confidence   Number of Leaves   Number of Nodes   Correct Classification
5             0.25         30                 54                94.28%
10            0.25         30                 54                94.21%
15            0.25         30                 54                94.46%

Conclusions
– We are satisfied with the accuracy (correct classification).
– Cross-validation works well when the dataset is small.
– The pruned trees differ greatly in size between confidence 0.15 (3 leaves, 5 nodes) and confidence 0.25 (30 leaves, 54 nodes).

Future Work
– Entropy-based discretization can reduce overfitting.
– Accuracy is not yet satisfactory on a big dataset.
– Data bias cannot be avoided completely.

Experience
– Original dataset
– New dataset
– Results

Thank You
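
The data-testing modes listed under Implementation (percentage split and cross-validation) can be illustrated with a minimal Python sketch. This is not the WEKA implementation used in the project: the function names, the `train_fraction` parameter, and the toy labels are assumptions made for illustration, the folds here are not stratified, and the stand-in classifier simply predicts the majority class, which is what a ZeroR-style baseline does.

```python
import random
from collections import Counter


def percentage_split(records, train_fraction, seed=0):
    """Shuffle the records and cut them into (training, testing) parts.

    train_fraction is a hypothetical parameter name, not a WEKA option.
    """
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_fold_indices(n_records, k):
    """Yield (train_idx, test_idx) pairs; each record is tested exactly once."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_records // k + (1 if i < n_records % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_records))
        yield train_idx, test_idx
        start += size


def majority_class(labels):
    """Baseline 'model': the most frequent class in the training labels."""
    return Counter(labels).most_common(1)[0][0]


def cross_validated_accuracy(labels, k):
    """Fraction of records classified correctly by the baseline over k folds."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        guess = majority_class([labels[i] for i in train_idx])
        correct += sum(labels[i] == guess for i in test_idx)
    return correct / len(labels)


# Toy labels, made up for illustration only (not the project's dataset).
toy_labels = ["good"] * 8 + ["bad"] * 2
toy_accuracy = cross_validated_accuracy(toy_labels, k=5)
```

Replacing `majority_class` with a real decision-tree learner would give the kind of "Correct Classification" figure reported in the tables; the baseline is only there to keep the sketch self-contained.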