Predicting Academic Performance of University Students

BIT 5534: Applied Business Analytics & Business Intelligence
Team 4: Michael Cerney, Chris Kopinski, and Chris Stewart
Agenda
1. Business Problem
2. Data Understanding
3. Data Preparation
4. Modeling (Ordinary Least Squares + Decision Tree)
5. Results
6. Conclusion
Business Problem
Topic: University student attrition and retention
Business Problem: Identify characteristics/attributes of success or failure for university students
Measure of Success: GPA performance
University vs High School GPA?
School            University GPA   High School GPA
Business          2.52             3.04
Engineering       2.46             3.03
Liberal Arts      2.42             3.02
Sciences          2.51             2.99
Social Sciences   2.36             3.00
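The gap between the two GPA columns can be worked out directly from the table. The short sketch below copies the ten figures from the slide and computes the drop per school; the variable names are illustrative and not part of the original analysis.

```python
# GPA figures copied from the table above: (university GPA, high school GPA).
gpa_by_school = {
    "Business": (2.52, 3.04),
    "Engineering": (2.46, 3.03),
    "Liberal Arts": (2.42, 3.02),
    "Sciences": (2.51, 2.99),
    "Social Sciences": (2.36, 3.00),
}

# Drop from high school GPA to university GPA, rounded to two decimals.
gpa_drop = {
    school: round(high_school - university, 2)
    for school, (university, high_school) in gpa_by_school.items()
}

# List the schools from the largest drop to the smallest.
for school, drop in sorted(gpa_drop.items(), key=lambda item: item[1], reverse=True):
    print(f"{school:<16} {drop:.2f}")
```

Every school shows a drop of roughly half a grade point, which is the pattern the business problem sets out to explain.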
What’s to be gained?
1. Universities can have more successful student recruitment and selection
2. Universities can create programs that encourage the success of current students
Miles from home? Accommodations? Part-time work hours?
Data Understanding
• Dataset acquired from the JMP textbook website
• Attributes are student-centric:
  • Ex. GPA, College, Accommodations, etc.
• GPA identified as the dependent variable
• Threshold for academic success is based on GPA; no explicit cutoff was defined
Sample of Variable Dictionary
Attribute Name               Variable Type   Attribute Description
GPA                          Continuous      GPA while attending university
Miles from Home              Continuous      Distance from campus
Accomodations_Dorm           Continuous      Student lives in dorm
Attends Office Hours_Never   Continuous      Student never attends office hours
College_Business             Continuous      Student majors in Business
Class_Freshmen               Continuous      Student is a freshman
Data Preparation
Data Consolidation
• Data Selection:
  • Academic variables of interest identified
Data Cleaning
• Missing Values Report:
  • No reported values missing
• Outlier Detection Report:
  • No reported outliers
Data Transformation
• Dummy variable creation:
  • College
  • Attends Office Hours
  • Accommodations
  • Class
Data Reduction
• Removed variable Return
• Removed 23 records that included the Return attribute
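The dummy variable step can be illustrated with a small sketch. The slides do not show how the team built the indicators (the work appears to have been done in JMP), so the code below is only an assumed equivalent in plain Python. The student records are invented; the column naming pattern (`College_Business`) and the five College levels follow the slides.

```python
# Hypothetical records; only the column name and level labels come from the slides.
students = [
    {"GPA": 2.9, "College": "Business"},
    {"GPA": 2.4, "College": "Engineering"},
    {"GPA": 2.6, "College": "Sciences"},
]

COLLEGE_LEVELS = ["Business", "Engineering", "Liberal Arts", "Sciences", "Social Sciences"]


def add_dummies(record, column, levels):
    """Return a copy of record with one 0/1 indicator per level of column."""
    encoded = {key: value for key, value in record.items() if key != column}
    for level in levels:
        encoded[f"{column}_{level}"] = 1 if record[column] == level else 0
    return encoded


encoded_students = [add_dummies(s, "College", COLLEGE_LEVELS) for s in students]
print(encoded_students[0])
```

When indicators like these feed a regression with an intercept, one level per attribute is normally left out as the reference category so the columns are not perfectly collinear.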
Modeling – Ordinary Least Squares
OLS Model
Independent Variable Estimates
• Statistically significant and positively correlated with GPA (p < 0.05):
  • College_Business, College_Sciences, College_Engineering
• Statistically significant and negatively correlated with GPA (p < 0.05):
  • Accomodations_Off-campus
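To make the meaning of a dummy variable estimate concrete, the sketch below fits a one-predictor OLS model by hand. It is not the team's model, which used many predictors; the GPA values are invented for illustration. With a single 0/1 predictor, the fitted slope equals the difference between the two group means, and the intercept equals the mean of the reference group.

```python
from statistics import mean

# Hypothetical GPAs, invented for illustration only.
business_flag = [1, 1, 1, 0, 0, 0, 0]  # 1 = student is in College_Business
gpa = [2.7, 2.5, 2.6, 2.3, 2.5, 2.4, 2.4]


def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = simple_ols(business_flag, gpa)

# Group means for comparison with the fitted coefficients.
business_mean = mean([g for g, b in zip(gpa, business_flag) if b == 1])
other_mean = mean([g for g, b in zip(gpa, business_flag) if b == 0])

print(f"intercept={intercept:.3f}, slope={slope:.3f}")
print(f"difference in group means={business_mean - other_mean:.3f}")
```

A positive estimate for a variable such as College_Business therefore reads as a higher expected GPA for that group relative to the reference group, holding the other predictors fixed in the full model.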
Modeling – Decision Tree
Decision Tree contains 7 splits and 1077 records:
1. First split = yes/no for Business School
   a. Business school in general has a higher GPA (2.56 vs 2.44)
2. Second split = Business School + yes/no on off-campus accommodations
   a. Off campus has a higher GPA (2.62 vs 2.51)
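The way a regression tree picks its first yes/no split can be sketched as follows. The slides do not state which splitting criterion the software used, so this example assumes a common stand-in: choose the split that leaves the smallest total sum of squared errors (SSE) within the two resulting groups. The records are invented; only the two variable names come from the slides.

```python
from statistics import mean

# Hypothetical records, invented for illustration: (is_business, off_campus, gpa).
records = [
    (1, 1, 2.7), (1, 1, 2.6), (1, 0, 2.5), (1, 0, 2.5),
    (0, 1, 2.4), (0, 0, 2.3), (0, 1, 2.2), (0, 0, 2.3),
]
# Column position of each candidate yes/no feature in a record.
FEATURES = {"College_Business": 0, "Accomodations_Off-campus": 1}


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    centre = mean(values)
    return sum((v - centre) ** 2 for v in values)


def best_split(rows):
    """Return the yes/no feature whose split leaves the smallest total SSE."""
    scores = {}
    for name, col in FEATURES.items():
        yes = [r[2] for r in rows if r[col] == 1]
        no = [r[2] for r in rows if r[col] == 0]
        scores[name] = sse(yes) + sse(no)
    return min(scores, key=scores.get), scores


first_split, scores = best_split(records)
print(first_split, scores)
```

Applying the same function again to only the rows on one side of the first split gives the next level of the tree, which is how a second split such as off-campus accommodations within the Business School arises.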
Results – K-Fold Cross-Validation
• K-Fold Cross-Validation Technique
  • Dataset was partitioned into five subsets (215 records each)
  • Each training set contains 80% of the dataset, formed from a unique combination of four subsets
  • Each validation set contains 20% of the dataset (1 unique subset)
Training Sample   Subsets    Validation Sample   Subset
1                 A,B,C,D    1                   E
2                 B,C,D,E    2                   A
3                 C,D,E,A    3                   B
4                 D,E,A,B    4                   C
5                 E,A,B,C    5                   D
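The rotation of subsets A to E can be reproduced with a few lines of code. This is a sketch, not the team's procedure: it assumes the 1077 records quoted on the decision tree slide are the ones being partitioned, and the shuffling seed is arbitrary.

```python
import random

N_RECORDS = 1077  # record count quoted on the decision tree slide (assumed to apply here)
FOLD_NAMES = "ABCDE"

# Shuffle record indices with a fixed seed, then deal them into five subsets.
indices = list(range(N_RECORDS))
random.Random(0).shuffle(indices)
folds = {name: indices[i::5] for i, name in enumerate(FOLD_NAMES)}

# Rotate the subsets: four for training, the remaining one for validation.
plan = []
for k in range(5):
    train_names = [FOLD_NAMES[(k + j) % 5] for j in range(4)]
    valid_name = FOLD_NAMES[(k + 4) % 5]
    plan.append((train_names, valid_name))

# Print sample number, training subsets, validation subset, and the two sizes.
for sample, (train_names, valid_name) in enumerate(plan, start=1):
    n_train = sum(len(folds[name]) for name in train_names)
    print(sample, ",".join(train_names), valid_name, n_train, len(folds[valid_name]))
```

Because 1077 is not divisible by five, the subsets in this sketch hold 215 or 216 records each; every record still appears in exactly one validation set.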
Results
OLS Model Results
• Confirmed the statistically significant variables (Business, Engineering, Sciences)
• Living off campus showed statistical significance for higher GPA scores
• In 5 out of 5 training sets, Attends Office Hours_Never was associated with lower GPA scores (average p < 0.1)
Decision Tree Model Results
• Students enrolled in the colleges of Business and Sciences who lived off campus maintained a higher GPA than those who lived on campus
• Students enrolled in the college of Engineering who lived on campus performed better than those who lived off campus
Conclusion
• Encourage on-campus students to attend office hours
• Obtain new perspectives by interviewing students and staff to further investigate low GPA scores
• Promote mentorship program
• Increase GPA scores and lower attrition rates