Credit Card Applicants’ Credibility Prediction with Decision Tree


Dan Xiao
Jerry Yang
Agenda

• Project Goal
• Project Domains
• Project Method
• Implementation
• Conclusions
• Experience
Project Goal

• The objective of this project is to build a decision tree to predict the classification (good or bad) of credit card applicants.
Project Domains

• There are around 4,000 records
• There are 15 attributes
  – Credit_Card_Debt
  – Highest_Credit_Card_APR
  – Monthly_Car_Pmt
  – Monthly_Income
  – Monthly_Mortage
  – Martial_Status
Project Domains (Continued)

• Attributes (continued)
  – No_of_Credit_Cards
  – Year_of_Employment
  – Citizenship
  – Home_Ownership
  – Accounts
  – Sex
  – Race
  – Results (the class to predict: good or bad)
Project Method

• Classification and prediction
  – Construct a model (decision tree) with the training data
  – Apply the testing data to the model (decision tree) to predict the applicants’ credibility (a sketch of this workflow follows below)
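A minimal sketch of this train-then-predict workflow using the WEKA Java API is shown below. The ARFF file names (credit_train.arff, credit_test.arff) and the assumption that the class attribute (Results) comes last are placeholders for illustration, not our exact files.

import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainAndPredict {
    public static void main(String[] args) throws Exception {
        // Load the training data (file name is a placeholder)
        Instances train = new DataSource("credit_train.arff").getDataSet();
        train.setClassIndex(train.numAttributes() - 1);   // assume Results is the last attribute

        // Construct the model: a J48 (C4.5) decision tree
        J48 tree = new J48();
        tree.buildClassifier(train);

        // Apply the testing data to the model to predict each applicant's credibility
        Instances test = new DataSource("credit_test.arff").getDataSet();
        test.setClassIndex(test.numAttributes() - 1);
        for (int i = 0; i < test.numInstances(); i++) {
            Instance applicant = test.instance(i);
            double label = tree.classifyInstance(applicant);
            System.out.println("Applicant " + i + " -> "
                    + test.classAttribute().value((int) label));   // "good" or "bad"
        }
    }
}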
Implementation

• Software: WEKA
• Classifier
• Filter
• Learning schemes (compared in the sketch below)
  – ZeroR
  – OneR
  – M5
  – J48
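A rough sketch of trying these learning schemes through the WEKA Java API (credit.arff is a placeholder file name; M5 is omitted here because it targets numeric prediction rather than a good/bad class):

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.rules.OneR;
import weka.classifiers.rules.ZeroR;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CompareSchemes {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("credit.arff").getDataSet();   // placeholder name
        data.setClassIndex(data.numAttributes() - 1);

        // Baseline rules (ZeroR, OneR) versus the J48 decision tree
        Classifier[] schemes = { new ZeroR(), new OneR(), new J48() };
        for (Classifier scheme : schemes) {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(scheme, data, 10, new Random(1));
            System.out.printf("%-5s accuracy: %.2f%%%n",
                    scheme.getClass().getSimpleName(), eval.pctCorrect());
        }
    }
}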
Implementation (Continued)

• Relevance analysis
• Data cleaning (sketched below)
  – File conversion
  – Missing data
  – Outliers
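A sketch of these cleaning steps, assuming the raw records arrive as a CSV file (credit.csv is a placeholder name). Missing values are filled here with WEKA's ReplaceMissingValues filter; outlier handling is not shown.

import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.ReplaceMissingValues;

public class CleanData {
    public static void main(String[] args) throws Exception {
        // File conversion: CSV -> ARFF (file names are placeholders)
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("credit.csv"));
        Instances raw = loader.getDataSet();

        // Missing data: fill numeric attributes with means, nominal ones with modes
        ReplaceMissingValues fill = new ReplaceMissingValues();
        fill.setInputFormat(raw);
        Instances clean = Filter.useFilter(raw, fill);

        // Save the cleaned data in WEKA's ARFF format
        ArffSaver saver = new ArffSaver();
        saver.setInstances(clean);
        saver.setFile(new File("credit.arff"));
        saver.writeBatch();
    }
}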
Implementation (Continued)

• Testing dataset
  – File conversion
  – Data testing (see the sketch below)
    • Percentage split
    • Supplied test set
    • Cross-validation
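The three testing options map onto WEKA's Evaluation class roughly as sketched below. The file names, the random seed, and the 25% training share are assumptions chosen to mirror the settings reported in the result tables that follow.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TestingModes {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("credit.arff").getDataSet();   // placeholder name
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();
        tree.setConfidenceFactor(0.25f);   // pruning confidence, as in the result tables

        // 1. Percentage split: train on 25% of the data, test on the remainder
        Instances shuffled = new Instances(data);
        shuffled.randomize(new Random(1));
        int trainSize = (int) Math.round(shuffled.numInstances() * 0.25);
        Instances train = new Instances(shuffled, 0, trainSize);
        Instances holdout = new Instances(shuffled, trainSize, shuffled.numInstances() - trainSize);
        tree.buildClassifier(train);
        Evaluation split = new Evaluation(train);
        split.evaluateModel(tree, holdout);
        System.out.printf("Percentage split:  %.2f%%%n", split.pctCorrect());

        // 2. Supplied test set: a separate ARFF file (placeholder name)
        Instances supplied = new DataSource("credit_test.arff").getDataSet();
        supplied.setClassIndex(supplied.numAttributes() - 1);
        Evaluation onSupplied = new Evaluation(train);
        onSupplied.evaluateModel(tree, supplied);
        System.out.printf("Supplied test set: %.2f%%%n", onSupplied.pctCorrect());

        // 3. Cross-validation: 10 folds over the full dataset
        Evaluation cv = new Evaluation(data);
        cv.crossValidateModel(tree, data, 10, new Random(1));
        System.out.printf("Cross-validation:  %.2f%%%n", cv.pctCorrect());
    }
}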
Percentage split

Split Ratio | Confidence | Number of Leaves | Number of Nodes | Correct Classification
25%         | 0.25       | 30               | 54              | 94.77%
25%         | 0.15       | 3                | 5               | 94.77%
Supplied test set

Confidence | Number of Leaves | Number of Nodes | Correct Classification
0.15       | 3                | 5               | 94.67%
0.25       | 30               | 54              | 96.08%
Cross-validation

Fold Number | Confidence | Number of Leaves | Number of Nodes | Correct Classification
5           | 0.15       | 3                | 5               | 94.28%
10          | 0.15       | 3                | 5               | 94.21%
15          | 0.15       | 3                | 5               | 94.46%
Cross-validation (Continued)

Fold Number | Confidence | Number of Leaves | Number of Nodes | Correct Classification
5           | 0.25       | 30               | 54              | 94.28%
10          | 0.25       | 30               | 54              | 94.21%
15          | 0.25       | 30               | 54              | 94.46%
Conclusions

• We are satisfied with the accuracy (correct classification rate).
• Cross-validation is a good choice when the dataset is small.
• The pruned tree models differ greatly between confidence factors 0.15 and 0.25 (a sketch of this comparison follows below).
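A sketch of how the pruning comparison above could be reproduced with the WEKA Java API; the file name is a placeholder, and the leaf and node counts are read from the trained J48 model.

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PruningEffect {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("credit.arff").getDataSet();   // placeholder name
        data.setClassIndex(data.numAttributes() - 1);

        // A lower confidence factor prunes more aggressively, giving a smaller tree
        for (float cf : new float[] { 0.15f, 0.25f }) {
            J48 tree = new J48();
            tree.setConfidenceFactor(cf);
            tree.buildClassifier(data);
            System.out.printf("confidence %.2f: %.0f leaves, %.0f nodes%n",
                    cf, tree.measureNumLeaves(), tree.measureTreeSize());
        }
    }
}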
Future Work

• Entropy-based discretization can reduce overfitting (sketched after this slide).
• Accuracy is not satisfactory on larger datasets.
• Data bias cannot be avoided completely.
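Entropy-based discretization is available in WEKA as the supervised Discretize filter (Fayyad-Irani MDL). The sketch below assumes a placeholder file name and simply rebuilds the tree on the discretized data.

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.attribute.Discretize;

public class DiscretizeThenTree {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("credit.arff").getDataSet();   // placeholder name
        data.setClassIndex(data.numAttributes() - 1);

        // Entropy-based (MDL) discretization of the numeric attributes
        Discretize disc = new Discretize();
        disc.setInputFormat(data);
        Instances discretized = Filter.useFilter(data, disc);

        // Rebuild the decision tree on the discretized data
        J48 tree = new J48();
        tree.buildClassifier(discretized);
        System.out.println(tree);   // print the resulting (ideally simpler) tree
    }
}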
Experience

• Original dataset
• New dataset
• Results

Thank You