Predicting Real-Time Percent Enrollment Increase
__________________
Mark Hamner
Texas Woman’s University
Department of Mathematics and Computer Science
Preet Ahluwalia
Credit Risk Analyst-AmeriCredit
Texas Woman’s University
Denton . Dallas . Houston
Year 2005 Facts
Total Enrollment: 11,344
  Undergrad: 6,266
  Graduate (Masters): 4,369
  Doctoral: 709
Campus Enrollment
  Denton: 9,157
  Dallas: 921
  Houston: 1,266
Female: 10,368
Male: 976
59 academic programs (19 doctoral)
Outline
Problem Definition: Predicting Student Enrollment at Time ‘t’ Using Historical Data
1. Enrollment Process - For Newly Enrolled
2. The Predictive Problem
3. Logistic Prediction Model
   a. Data Issues and Programming Solutions
4. Quadratic Prediction Model
   a. Exploratory Analysis to Identify Patterns
5. Combine for Overall Prediction: Results
Enrollment
• Enrollment predictions can be broken into two fundamental pieces:
   • Newly Enrolled Students
   • Re-Enrolling/Continuing Students
• The focus of this paper is the prediction of Newly Enrolled students.
New Students: Enrollment Process
All Prospective Students
→ Applicants (FTIC, Transfer, Graduate, Others)
→ Admitted to TWU
→ New 12th Day Enrolled
Idea Behind Enrollment Prediction at Time = t
Enrollment Prediction at Time ‘t’
Let Time = t denote the prediction date
For applicants before time t, we will have data
For applicants after time t (denoted by t’), we will not have data
[Timeline: Begin Prediction → Predict at Time = t → Fall 12th Day]
Total Enrollment = Enroll_t + Enroll_t’
Weekly Partition of Prediction Interval
The prediction interval will be broken up into weekly Intervals
The diagram below illustrates prediction at Week = 5
At Week = 5 we have 35 more days of applicant data than at Week = 0
[Timeline: Week 0 → Predict at Week 5 → Week 17]
Total Enroll = Enroll_t + Enroll_t’
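The weekly partition can be sketched as follows. This is only an illustration: the slides give no calendar dates, so the start date `START` and the helper names `week_index` and `has_data_at` are hypothetical.

```python
from datetime import date

def week_index(application_date: date, start_date: date) -> int:
    """Return the week bucket (0, 1, 2, ...) an application date falls in.

    Week 0 covers everything received on or before start_date.
    """
    days_after_start = (application_date - start_date).days
    if days_after_start <= 0:
        return 0
    # Days 1-7 after the start belong to week 1, days 8-14 to week 2, etc.
    return (days_after_start + 6) // 7

# Hypothetical start of the prediction interval (not from the slides).
START = date(2004, 5, 1)

def has_data_at(application_date: date, t: int, start_date: date = START) -> bool:
    """True if the applicant is already in the data when predicting at week t."""
    return week_index(application_date, start_date) <= t

# An application received 35 days after the start falls in week 5.
print(week_index(date(2004, 6, 5), START))
```

Applicants for whom `has_data_at` is true contribute to Enroll_t; the rest belong to Enroll_t’.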
Enroll_t
$P_t = \{1, 2, \ldots, N_t\}$: finite set of applicants at week = t
For each $k \in P_t$, enrollment is a dichotomous response variable $y_k$:
$y_k = 1$ (student enrolled), $y_k = 0$ (student did not enroll)
Enrollment of all applicants at week = t:
$$\text{Enroll\_t} = \sum_{k=1}^{N_t} y_k$$
Model Dichotomous Variable
For each $y_k$, $k \in P_t$,
let $\theta_k$ represent the probability that $y_k = 1$
There exists applicant information for each individual:
$x_k = (x_{1k}, x_{2k}, \ldots, x_{pk}) = (\text{Distance}_k, \text{SAT}_k, \ldots, \text{Major\_Ratio}_k)$
Use Logistic Regression to model $\theta_k$
Logistic Regression Model
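• The probability of student k enrolling is
$$\theta_k(x_k) = \frac{e^{L_k}}{1 + e^{L_k}}$$
$$L_k = \beta_0 + \beta_1\,\text{Distance}_k + \beta_2\,\text{SAT}_k + \cdots + \beta_p\,\text{Major\_Ratio}_k$$
where $\text{Distance}_k, \text{SAT}_k, \ldots, \text{Major\_Ratio}_k$ are the predictor variables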
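A minimal sketch of the logistic probability for a single applicant. The coefficients and the applicant's values below are made up for illustration; they are not estimates from the study.

```python
import math

def linear_predictor(beta0, betas, x):
    """L_k = beta0 + sum over j of beta_j * x_jk for one applicant."""
    return beta0 + sum(b * v for b, v in zip(betas, x))

def enroll_probability(beta0, betas, x):
    """theta_k(x_k) = e^{L_k} / (1 + e^{L_k})."""
    L = linear_predictor(beta0, betas, x)
    # Two algebraically equal forms, chosen to avoid overflow in exp().
    if L >= 0:
        return 1.0 / (1.0 + math.exp(-L))
    e = math.exp(L)
    return e / (1.0 + e)

# Hypothetical coefficients and one applicant (Distance, SAT, Major_Ratio).
beta0 = -1.0
betas = [-0.01, 0.001, 2.0]
applicant = [30.0, 1000.0, 0.25]

theta = enroll_probability(beta0, betas, applicant)
print(f"theta_k = {theta:.4f}")
```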
Predict Enroll_t
Let Y be the random vector of responses:
$$Y = (y_1, y_2, \ldots, y_{N_t})'$$
Thus,
$$\text{Enroll\_t} = \mathbf{1}'Y = \sum_{k=1}^{N_t} y_k$$
Note: $\mathbf{1}$ is an $N_t \times 1$ vector of ones
Estimated Enroll_t is …
$$\hat{T}_t = E(\text{Enroll\_t}) = \mathbf{1}'E(Y) = \mathbf{1}'\begin{pmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_{N_t}) \end{pmatrix} = \sum_{k=1}^{N_t} \theta_k(x_k)$$
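Since each $E(y_k) = \theta_k(x_k)$, the estimate of Enroll_t is just the sum of the fitted probabilities. A small sketch, with hypothetical values of the linear predictor $L_k$ standing in for a fitted model:

```python
import math

def enroll_probability(L):
    """theta_k as a function of the linear predictor L_k."""
    return 1.0 / (1.0 + math.exp(-L))

def expected_enroll(linear_predictors):
    """T_hat_t: sum of theta_k over all applicants known at week t."""
    return sum(enroll_probability(L) for L in linear_predictors)

# Hypothetical linear predictors for four applicants (not real data).
L_values = [0.0, 0.0, 2.0, -2.0]

t_hat = expected_enroll(L_values)
print(f"Estimated Enroll_t = {t_hat:.2f}")
```

The estimate is generally not a whole number; it is an expected count, not a count of applicants classified as enrolling.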
Logistic Model
• Predictor variables: Distance, DOB, Major_Ratio, SAT_M,
SAT_V, Gender, Personal, etc.
• What variables will get picked for model building?
Programming and Variable Selection
Use SAS to create possibly significant variables and dummy code categorical variables
SAS Programming: Exploratory and Variable Creation (Example: Major_Ratio, Ethnic, etc.)
→ Start with Saturated Model
→ Backward Selection: Drop Predictor?
   Yes: drop the predictor and repeat Backward Selection
   No: Stop (Fitted Model)
Slightly different variables are selected for: FTIC, Transfer, and Graduate.
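The backward-selection loop in the flowchart can be sketched generically. The slides do not state the drop criterion, so the p-value threshold here is an assumption, and `p_values_for` is a hypothetical callback standing in for refitting the logistic model (done in SAS in the original work). The p-values in `placeholder_p_values` are invented placeholders, not results.

```python
def backward_select(variables, p_values_for, threshold=0.05):
    """Backward elimination starting from the saturated model.

    variables:    candidate predictor names (the saturated model).
    p_values_for: callable taking a list of names and returning
                  {name: p_value} for a model fitted on those names.
    threshold:    a predictor is dropped if its p-value exceeds this
                  (assumed criterion, not taken from the slides).
    """
    selected = list(variables)
    while selected:
        p_values = p_values_for(selected)
        worst = max(selected, key=lambda name: p_values[name])
        if p_values[worst] <= threshold:
            break  # no predictor left to drop: stop with the fitted model
        selected.remove(worst)  # drop the weakest predictor and refit
    return selected

def placeholder_p_values(names):
    """Stand-in for a real model fit; the values are invented."""
    fixed = {"Distance": 0.01, "Gender": 0.03, "DOB": 0.40, "SAT_M": 0.20}
    return {name: fixed[name] for name in names}

kept = backward_select(["Distance", "Gender", "DOB", "SAT_M"], placeholder_p_values)
print(kept)
```

Running the same loop separately on FTIC, Transfer, and Graduate applicants would allow each group to end with its own set of predictors.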
FTIC Variable Selection
Variable Name              Variable Type  Variable Description
Twelve                     Response       1 if enrolled; 0 otherwise
Distance♦                  Explanatory    Continuous variable
SAT_M, SAT_V, ACT          Explanatory    Continuous variable; SAT Math score, SAT Verbal score, ACT score
Give ACT♦                  Explanatory    1 if score provided; 0 otherwise
Program Ratio♦             Explanatory    Continuous variable
Major Ratio♦               Explanatory    Continuous variable
Date of Birth              Explanatory    Continuous variable
Gender♦                    Explanatory    1 if female; 0 if male
Apply Early♦               Explanatory    1 if applied before January 1; 0 otherwise
E1, E2, E3, E4, E5, E6, E7 Explanatory    Dummy variables for Ethnicity
Personal♦                  Explanatory    Discrete variable; number of key information items available for a student
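A sketch of how the indicator variables in the table could be coded for one applicant. The original coding was done in SAS; the function name, the ethnicity labels "A" to "G", and the example applicant are hypothetical. Whether one ethnicity level serves as a reference category is not stated on the slide, so this sketch simply creates all seven dummies.

```python
from datetime import date

# Hypothetical category labels; the slides only name the dummies E1..E7.
ETHNICITY_LEVELS = ["A", "B", "C", "D", "E", "F", "G"]

def encode_applicant(gender, ethnicity, apply_date, act_score, cutoff):
    """Build the 0/1 indicator variables for one applicant record."""
    row = {
        "Gender": 1 if gender == "F" else 0,             # 1 if female
        "Apply_Early": 1 if apply_date < cutoff else 0,  # before January 1
        "Give_ACT": 1 if act_score is not None else 0,   # score provided
    }
    # One dummy per ethnicity level: E1, E2, ..., E7.
    for i, level in enumerate(ETHNICITY_LEVELS, start=1):
        row[f"E{i}"] = 1 if ethnicity == level else 0
    return row

# Hypothetical applicant: female, ethnicity level "C", no ACT score.
row = encode_applicant("F", "C", date(2003, 11, 15), None, cutoff=date(2004, 1, 1))
print(row)
```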
Case Study-Logistic Model Prediction
Applicant data for 2003 to predict 2004 FTIC by weekly time intervals
FTIC 2004 Enrollment
Week  Total Apply  Predict Enroll_t  Actual Enroll  Off  % Off
0     1,877        608               578            30   1.6%
1     1,896        615               584            31   1.6%
2     1,930        623               594            29   1.5%
3     1,951        632               606            26   1.3%
4     1,975        638               613            25   1.3%
5     1,994        644               620            24   1.2%
6     2,005        647               623            24   1.2%
7     2,026        655               634            21   1.0%
8     2,039        659               638            21   1.0%
9     2,058        665               647            18   0.9%
10    2,065        667               650            17   0.8%
11    2,081        669               653            16   0.8%
12    2,097        673               661            12   0.6%
13    2,111        678               668            10   0.5%
14    2,118        680               671            9    0.4%
15    2,122        681               673            8    0.4%
16    2,123        681               674            7    0.3%
17    2,146        690               687            3    0.1%
• The Logistic Model does not predict after week = t
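The error columns of the FTIC 2004 case study can be recomputed from the Total Apply, Predict Enroll_t, and Actual Enroll columns. Off is the predicted minus the actual value. The slides do not define % Off; the numbers are consistent with Off divided by Total Apply, which is the assumption used in this sketch.

```python
# Columns from the FTIC 2004 case study, weeks 0 through 17.
total_apply = [1877, 1896, 1930, 1951, 1975, 1994, 2005, 2026, 2039,
               2058, 2065, 2081, 2097, 2111, 2118, 2122, 2123, 2146]
predicted = [608, 615, 623, 632, 638, 644, 647, 655, 659,
             665, 667, 669, 673, 678, 680, 681, 681, 690]
actual = [578, 584, 594, 606, 613, 620, 623, 634, 638,
          647, 650, 653, 661, 668, 671, 673, 674, 687]

# Off: predicted Enroll_t minus actual enrollment among applicants at week t.
off = [p - a for p, a in zip(predicted, actual)]

# % Off: assumed here to be Off as a percentage of Total Apply.
pct_off = [round(100 * o / n, 1) for o, n in zip(off, total_apply)]

for week, (o, pct) in enumerate(zip(off, pct_off)):
    print(f"week {week:2d}: off {o:2d}, {pct:.1f}% of applicants")
```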
Enrollment after Week = t
• Total Enrollment = Enroll_t + Enroll_t’
• At any week = t, we need to predict Enroll_t’
• Identify historical relationships that may be helpful
Applicant Versus Enrolled by Year
• Both applications and enrollment have been increasing
• Notice enrollment yield is decreasing
[Figure: Total Apply and Enroll by Year (1999 to 2005), with %Enroll on a Percent axis]
Year     1999   2000   2001   2002   2003   2004   2005
%Enroll  66.4%  48.1%  46.7%  43.5%  41.7%  39.4%  34.6%
Is the % increase in enrollment matching the % increase in apply?
Applicant Yield By Strata
Enrollment yield from applicant data is decreasing for each stratum
[Figure: % Applicants Enroll by Year (1999 to 2005) for Graduate, FTIC, and Transfer]
How does this affect yearly increase in enrollment?
Percent Increase
Applicant Vs. Enrolled
• Applicant increase is not a viable indicator of enrollment increase
[Figure: % Increase Apply and % Increase Enroll by Year (2000 to 2005)]
Year               2000    2001   2002   2003   2004   2005
% Increase Apply   102.7%  6.9%   31.7%  16.4%  9.2%   13.4%
% Increase Enroll  46.8%   3.9%   22.7%  11.5%  3.3%   -0.4%
• What patterns are reliable to model?
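The link between the two charts can be checked with a little arithmetic. If yield is enrolled divided by applicants, then the ratio of one year's yield to the previous year's equals (1 + enroll increase) / (1 + apply increase). The sketch below assumes the %Enroll labels run 1999 to 2005 in order and that the bar labels pair up by year as listed in `increase_pct`; both pairings are read off the chart labels rather than stated on the slides.

```python
# %Enroll (yield) by year, as read from the chart labels.
yield_pct = {1999: 66.4, 2000: 48.1, 2001: 46.7, 2002: 43.5,
             2003: 41.7, 2004: 39.4, 2005: 34.6}

# Assumed pairing of the bar labels: (% increase apply, % increase enroll).
increase_pct = {2000: (102.7, 46.8), 2001: (6.9, 3.9), 2002: (31.7, 22.7),
                2003: (16.4, 11.5), 2004: (9.2, 3.3), 2005: (13.4, -0.4)}

def implied_yield_ratio(apply_inc, enroll_inc):
    """Year-over-year yield ratio implied by the two percent increases."""
    return (1 + enroll_inc / 100) / (1 + apply_inc / 100)

gaps = {}
for year, (apply_inc, enroll_inc) in sorted(increase_pct.items()):
    observed = yield_pct[year] / yield_pct[year - 1]
    implied = implied_yield_ratio(apply_inc, enroll_inc)
    gaps[year] = abs(observed - implied)
    print(f"{year}: observed ratio {observed:.3f}, implied ratio {implied:.3f}")
```

Every ratio is below 1: applications grew faster than enrollment in each year, which is why the applicant increase alone does not indicate the enrollment increase.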
Cumulative FTIC Enrollment by Week
• Notice the parallel lines, which implies equal slopes!
• At any week = t, we can relate Enroll_t to Total Enrollment (Week = 17)
[Figure: Cumulative Enroll by Week (0 to 17), one line per year, 1999 to 2005]
• Thus, (Total Enroll – Enroll_t) should be very similar from year to year
Relationship Between Enrollment & Total Enrollment
• By definition, (Total Enroll – Enroll_t) = Enroll_t’
[Figure: Enroll_t' by Week (0 to 17)]
• Model Enroll_t’ and smooth out the consistent patterns by week
Enroll_t’ Model
• Use 2003 Enroll_t’ Model to predict Enroll_t’ for 2004
[Figure: 2003 FTIC Enroll_t' by Week]
Estimate of Enroll_t’ ($R^2 = 0.9857$):
$$\hat{T}_{t'} = 0.1961\,\text{week}^2 - 10.514\,\text{week} + 130.7$$
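The fitted quadratic can be evaluated week by week. The sketch below uses the coefficients quoted in the estimate; the function name is an illustrative choice. Rounded to whole students, the values agree with the predicted Enroll_t' column used for 2004.

```python
def predict_enroll_after(week):
    """Quadratic estimate of Enroll_t' (enrollment from applicants after week t)."""
    return 0.1961 * week ** 2 - 10.514 * week + 130.7

# Predictions for weeks 0 through 17, rounded to whole students.
predictions = [round(predict_enroll_after(week)) for week in range(18)]

for week, value in enumerate(predictions):
    print(f"week {week:2d}: predicted Enroll_t' = {value}")
```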
Predict 2004 Enroll_t’
FTIC 2004 Enrollment
Week  Total Apply  Actual Enroll  Actual Enroll_t'  Predict Enroll_t'  Off
0     1,877        578            109               131                22
1     1,896        584            103               120                17
2     1,930        594            93                110                17
3     1,951        606            81                101                20
4     1,975        613            74                92                 18
5     1,994        620            67                83                 16
6     2,005        623            64                75                 11
7     2,026        634            53                67                 14
8     2,039        638            49                59                 10
9     2,058        647            40                52                 12
10    2,065        650            37                45                 8
11    2,081        653            34                39                 5
12    2,097        661            26                33                 7
13    2,111        668            19                27                 8
14    2,118        671            16                22                 6
15    2,122        673            14                17                 3
16    2,123        674            13                13                 0
17    2,146        687            0                 9                  9
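The actual Enroll_t' values follow from the definition Enroll_t' = Total Enroll - Enroll_t, using the final 2004 FTIC enrollment of 687. A short sketch that rebuilds the actual Enroll_t' and Off columns; the mean absolute error is an added summary, not a figure from the slides.

```python
FINAL_2004 = 687  # actual FTIC enrollment at week 17

# Actual enrollment among applicants known at each week (weeks 0 to 17).
actual_enroll = [578, 584, 594, 606, 613, 620, 623, 634, 638,
                 647, 650, 653, 661, 668, 671, 673, 674, 687]
# Predicted Enroll_t' from the 2003 quadratic model.
predicted_after = [131, 120, 110, 101, 92, 83, 75, 67, 59,
                   52, 45, 39, 33, 27, 22, 17, 13, 9]

# Actual Enroll_t': students who enrolled but had not yet applied by week t.
actual_after = [FINAL_2004 - a for a in actual_enroll]

# Off: predicted minus actual Enroll_t'.
off = [p - a for p, a in zip(predicted_after, actual_after)]

# Added summary (not on the slides): average size of the miss.
mean_abs_off = sum(abs(o) for o in off) / len(off)
print(f"mean absolute error: {mean_abs_off:.1f} students")
```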
Predict 2004 FTIC Total Enroll
Total Enrollment = Enroll_t + Enroll_t’
Note: 2004 FTIC Actual Total is 687
2004 FTIC Predict
Week  Total Apply  Enroll_t  Enroll_t'  Total  Off
0     1,877        608       131        739    52
1     1,896        615       120        736    49
2     1,930        623       110        733    46
3     1,951        632       101        733    46
4     1,975        638       92         729    42
5     1,994        644       83         727    40
6     2,005        647       75         721    34
7     2,026        655       67         722    35
8     2,039        659       59         718    31
9     2,058        665       52         717    30
10    2,065        667       45         712    25
11    2,081        669       39         708    21
12    2,097        673       33         706    19
13    2,111        678       27         705    18
14    2,118        680       22         702    15
15    2,122        681       17         698    11
16    2,123        681       13         694    7
17    2,146        690       9          698    11
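Combining the two pieces is a single addition per week. The sketch below adds the logistic estimate of Enroll_t to the quadratic estimate of Enroll_t' and compares the result with the Total column. The sums can differ from the slide's totals by one student in some weeks, presumably because the slide rounds its inputs differently; that explanation is an assumption.

```python
ACTUAL_2004 = 687  # actual 2004 FTIC total

def predict_enroll_after(week):
    """Quadratic estimate of Enroll_t' fitted on 2003 data."""
    return 0.1961 * week ** 2 - 10.514 * week + 130.7

# Logistic estimates of Enroll_t for weeks 0 through 17.
enroll_t = [608, 615, 623, 632, 638, 644, 647, 655, 659,
            665, 667, 669, 673, 678, 680, 681, 681, 690]
# Total column as shown on the slide, for comparison.
slide_total = [739, 736, 733, 733, 729, 727, 721, 722, 718,
               717, 712, 708, 706, 705, 702, 698, 694, 698]

# Total Enrollment = Enroll_t + Enroll_t'
total = [round(e + predict_enroll_after(week)) for week, e in enumerate(enroll_t)]

# Off: predicted total minus the actual total.
off = [t - ACTUAL_2004 for t in total]

for week, (t, s, o) in enumerate(zip(total, slide_total, off)):
    print(f"week {week:2d}: total {t} (slide {s}), off {o}")
```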
Predict 2005 FTIC Total Enroll
Total Enrollment = Enroll_t + Enroll_t’
Note: 2005 FTIC Actual Total is 765
2005 FTIC Predict
Week  Total Apply  Enroll_t  Enroll_t'  Total  Off
0     2,328        668       109        777    12
1     2,357        675       100        775    10
2     2,390        687       92         779    14
3     2,409        690       84         774    9
4     2,432        696       76         772    7
5     2,444        697       69         766    1
6     2,480        707       61         768    3
7     2,497        712       55         767    2
8     2,521        716       48         764    -1
9     2,534        719       42         761    -4
10    2,549        722       36         758    -7
11    2,564        727       31         758    -7
12    2,583        732       26         757    -8
13    2,595        736       21         756    -9
14    2,606        739       16         755    -10
15    2,611        740       12         752    -13
16    2,617        742       8          750    -15
17    2,652        755       5          760    -5
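One way to express these weekly predictions as a percent enrollment increase is to compare each predicted 2005 total with the actual 2004 FTIC total of 687. The slides do not show this calculation, so the sketch below is an illustration rather than the authors' reported method.

```python
ACTUAL_2004 = 687  # actual 2004 FTIC total
ACTUAL_2005 = 765  # actual 2005 FTIC total

# Predicted 2005 FTIC totals for weeks 0 through 17.
predicted_total_2005 = [777, 775, 779, 774, 772, 766, 768, 767, 764,
                        761, 758, 758, 757, 756, 755, 752, 750, 760]

def pct_increase(new, base):
    """Percent change from base to new."""
    return 100 * (new - base) / base

actual_increase = pct_increase(ACTUAL_2005, ACTUAL_2004)
predicted_increase = [pct_increase(t, ACTUAL_2004) for t in predicted_total_2005]

print(f"actual increase: {actual_increase:.1f}%")
for week, pct in enumerate(predicted_increase):
    print(f"week {week:2d}: predicted increase {pct:.1f}%")
```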
- END -
Thank you!
Any Questions?