International Conference on Management and Information Systems
September 18-20, 2015
Fitting of Logistic Regression Model for Prediction of Likelihood of India
Winning or Losing in Cricket Match
Monalisha Pattnaik
[email protected]
Utkal University, Bhubaneswar
Anima Bag
[email protected]
C.V. Raman Group of Institutions, Janla
The present study predicts the likelihood of India winning or losing a One Day International (ODI) cricket
match against Australia by fitting a logistic regression model. According to the ICC ODI Championship
rating dated 7th August 2015, India holds 2nd position with 5875 points and a rating of 115 from
51 matches. Data from actual recent matches, with five independent variables and one binary dependent
variable, are used throughout to illustrate the application of mathematical and statistical principles
to a practical problem in one-day international cricket.
Keywords: Prediction, ODI, Logistic regression model, Cricket, Binary logistic
1. Introduction
The International Cricket Council (ICC) ODI Championship is an international One Day cricket competition run
by the ICC. A One Day International (ODI) is a form of limited overs cricket, in which each team faces a fixed
number of overs, usually fifty. The first ODI was played on 5 January 1971 between Australia and England at
the Melbourne Cricket Ground, and ODIs were played in white kits with a red ball. One Day International
matches are also called Limited Overs Internationals (LOI) and are a late twentieth-century development.
According to ICC ODI championship rating, dated 7th August 2015, India holds 2nd position with 5875 points
and 115 rating by playing 51 matches.
Cricket is a hugely popular sport around the world. An estimated three billion people are cricket fans, a figure
exceeded only by soccer, which has an estimated 3.5 billion fans. In recent years cricket's governing body,
the International Cricket Council (ICC), has sought to make cricket even more popular. One strategy the ICC
has adopted to achieve this is to introduce Twenty 20 (T20), a shorter format of the game, with the
intention of making cricket a faster, more exciting spectacle that might attract a new audience.
One-day cricket, a shortened version of the normal game of cricket, is played between two teams of
11 players each on an approximately oval field with semi-major and semi-minor axes of approximately 70 m and
55 m, respectively. It has a central 'pitch' along the major axis approximately 20 m long.
To predict the likelihood of India winning or losing an ODI match against Australia, a logistic regression model
is fitted. Data are collected from 15 recent matches, with 5 independent variables and 1 dependent variable
indicating whether India wins or loses.
2. Logistic Regression Analysis
Logistic regression is generally preferred when the dependent variable has only two categories. Logistic
regression fits an S-shaped curve to the data. This curve ensures two things: first, that the predicted
values always lie between 0 and 1 and, second, that the predicted values correspond to the probability of Y
being 1 (win) rather than 0 (lose) in the present study. To achieve this, a regression is first performed on a
transformed value of Y, called the logit function. The equation is:
$$\operatorname{logit}(Y) = \ln(\text{odds}) = \ln\left(\frac{p}{1-p}\right) = a + b_1 X_1 + b_2 X_2$$
where $p$ is the probability that Y = 1 and odds refers to the odds of Y being equal to 1, that is, $p/(1-p)$.
In the above equation, $a$, $b_1$ and $b_2$ are constants and $X_1$ and $X_2$ are independent variables.
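As a minimal sketch of the logit transformation described above, the following Python snippet converts between probabilities, odds and log-odds. The function names `odds`, `logit` and `inverse_logit` are illustrative and do not come from the original study. It shows why any value of the linear predictor maps back to a probability between 0 and 1.

```python
import math

def odds(p):
    """Odds of Y = 1 for probability p, i.e. p / (1 - p)."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds (logit) of probability p."""
    return math.log(odds(p))

def inverse_logit(z):
    """Map a real-valued linear predictor z back to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # this branch avoids overflow for very negative z
    return e / (1.0 + e)

# A probability of 0.8 corresponds to odds of about 4 to 1.
p = 0.8
z = logit(p)
print(round(odds(p), 3), round(z, 3), round(inverse_logit(z), 3))
```

Applying `inverse_logit` to $a + b_1 X_1 + b_2 X_2$ gives the S-shaped curve of predicted probabilities.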
Case Analysis-1
We wish to predict the likelihood of India winning a One Day International match against Australia. Data were
collected from 15 recent matches on the following variables.
• Viratscore is the score of Virat Kohli in the match. Since Kohli's batting is seen as instrumental to India's
chances, we wish to see whether his score in fact has an impact on India's victories.
• Does batting first help or hinder? The variable Batfirst is coded as 1 if India bats first, otherwise 0.
• Taking early wickets helps, so say the experts. Wicket10 is the number of Australian wickets to fall in
the first 10 overs of their batting.
• The scores of opener-1 and opener-2 are taken as indicators of whether India wins or loses the match. The
variables are Scoreopener-1 and Scoreopener-2.
• Finally, Indiawin is the dependent variable: 1 = victory, 0 = loss.
Table 1 shows the input data of 15 recent ODI matches against Australia.
Table 1 Input Data of India vs Australia
| Sl. No. | Date | Place | Virat Kohli Score | Batting First | Wickets Taken in First 10 Overs | Score of Opener-1 | Score of Opener-2 | India Win |
|---|---|---|---|---|---|---|---|---|
| 1 | 26.05.2015 | Sydney | 1 | 0 | 1 | 34 | 45 | 0 |
| 2 | 18.01.2015 | Melbourne | 9 | 1 | 2 | 138 | 2 | 0 |
| 3 | 02.11.2013 | Bangalore | 3 | 1 | 1 | 209 | 60 | 1 |
| 4 | 30.10.2013 | Nagpur | 115 | 0 | 1 | 79 | 100 | 1 |
| 5 | 19.10.2013 | Mohali | 68 | 1 | 0 | 11 | 8 | 0 |
| 6 | 16.10.2013 | Jaipur | 100 | 0 | 0 | 141 | 95 | 1 |
| 7 | 13.10.2013 | Pune | 61 | 0 | 0 | 7 | 42 | 0 |
| 8 | 26.02.2012 | Sydney | 21 | 0 | 2 | 5 | 14 | 0 |
| 9 | 19.02.2012 | Brisbane | 12 | 0 | 0 | 5 | 3 | 0 |
| 10 | 12.02.2012 | Adelaide Oval | 18 | 0 | 2 | 92 | 20 | 1 |
| 11 | 05.02.2012 | Melbourne | 31 | 0 | 2 | 5 | 2 | 0 |
| 12 | 24.05.2011 | Ahmadabad | 24 | 0 | 1 | 15 | 53 | 1 |
| 13 | 20.10.2010 | Visakhapatnam | 118 | 0 | 1 | 0 | 15 | 1 |
| 14 | 02.11.2009 | Mohali | 10 | 0 | 2 | 30 | 40 | 0 |
| 15 | 25.10.2009 | Vadodara | 30 | 0 | 1 | 13 | 14 | 0 |
Objective
To predict the likelihood of India winning a One Day International match against Australia.
Hypothesis
H0: There is no significant difference between the observed value and the model prediction.
H1: There is a significant difference between the observed value and the model prediction.
Interpretation
Table 2 shows the classification of India winning or losing an ODI match against Australia. It shows that
the overall correct classification rate of the model is 100%, which indicates that the logistic
regression model fits the 15 observed matches. The model classifies wins and losses equally well
(100% correct in each group).
The Hosmer and Lemeshow test, a chi-square goodness-of-fit test, assesses “how well the model fits”.
Table 3 shows that the P-value is 1.0, which is greater than 0.05, so we do not reject the
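null hypothesis at the 5% level of significance. It indicates that there is no significant difference between the
observed value and the predicted value of the model. In other words, the model fits well.
Table 4 shows the logistic regression model with one dependent categorical variable (Y, India win) and five
independent variables ($X_1$: Virat Kohli's score, $X_2$: batting first (categorical variable, entered as
BATFIRST(1)), $X_3$: number of wickets taken in the first 10 overs, $X_4$: score of opener-1 and $X_5$: score of
opener-2). All five estimated coefficients are positive, so in the fitted model India winning is directly related
to all five independent variables, although none of the coefficients is significant at the 5% level
(Sig. ≥ .990 in Table 4). The multivariate logistic regression model, with coefficients taken from column B of
Table 4, is:
$$\ln\left(\frac{p}{1-p}\right) = -439.502 + 2.651 X_1 + 7.659 X_2 + 50.146 X_3 + 2.027 X_4 + 5.719 X_5$$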
Table 5 shows the predicted probabilities and classification of India winning and losing in ODI matches against
Australia. Figure 1 shows the observed groups and predicted probabilities for the given data. Figure 2
shows a graphical representation of the logistic regression S-curve of the predicted value against one
independent variable, Virat Kohli's score.
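Table 2 Classification Table
| Observed | Predicted INDIAWIN .00 | Predicted INDIAWIN 1.00 | Percentage Correct |
|---|---|---|---|
| Step 1, INDIAWIN .00 | 9 | 0 | 100.0 |
| Step 1, INDIAWIN 1.00 | 0 | 6 | 100.0 |
| Step 1, Overall Percentage | | | 100.0 |
a. The cut value is .500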
Table 3 Hosmer and Lemeshow Test
| Step | Chi-square | df | Sig. |
|---|---|---|---|
| 1 | .000 | 5 | 1.000 |
Table 4 Variables in Logistic Regression Model
| Step 1a | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|
| VIRATSCOR | 2.651 | 217.169 | .000 | 1 | .990 | 14.171 |
| BATFIRST(1) | 7.659 | 183830.908 | .000 | 1 | 1.000 | 2119.718 |
| WKTBTENOV | 50.146 | 4775.057 | .000 | 1 | .992 | 5999877122161316000000.000 |
| SCOOPNR | 2.027 | 175.255 | .000 | 1 | .991 | 7.588 |
| SCOROPEN | 5.719 | 475.773 | .000 | 1 | .990 | 304.734 |
| Constant | -439.502 | 186743.350 | .000 | 1 | .998 | .000 |
a. Variable(s) entered on step 1: VIRATSCOR, BATFIRST, WKTBTENOV, SCOOPNR, SCOROPEN.
Table 5 True Value and Predicted Value of Likelihood of India Winning an ODI Match against Australia
| Sl. No. | Kohli Score | Batfirst | Wickets Taken in First 10 Overs | Score of Opener-1 | Score of Opener-2 | India Win | Predicted Probability | Predicted Value |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 34 | 45 | 0 | .00000 | 0.00 |
| 2 | 9 | 1 | 2 | 138 | 2 | 0 | .00000 | 0.00 |
| 3 | 3 | 1 | 1 | 209 | 60 | 1 | 1.00000 | 1.00 |
| 4 | 115 | 0 | 1 | 79 | 100 | 1 | 1.00000 | 1.00 |
| 5 | 68 | 1 | 0 | 11 | 8 | 0 | .00000 | 0.00 |
| 6 | 100 | 0 | 0 | 141 | 95 | 1 | 1.00000 | 1.00 |
| 7 | 61 | 0 | 0 | 7 | 42 | 0 | .00000 | 0.00 |
| 8 | 21 | 0 | 2 | 5 | 14 | 0 | .00000 | 0.00 |
| 9 | 12 | 0 | 0 | 5 | 3 | 0 | .00000 | 0.00 |
| 10 | 18 | 0 | 2 | 92 | 20 | 1 | 1.00000 | 1.00 |
| 11 | 31 | 0 | 2 | 5 | 2 | 0 | .00000 | 0.00 |
| 12 | 24 | 0 | 1 | 15 | 53 | 1 | 1.00000 | 1.00 |
| 13 | 118 | 0 | 1 | 0 | 15 | 1 | 1.00000 | 1.00 |
| 14 | 10 | 0 | 2 | 30 | 40 | 0 | .00000 | 0.00 |
| 15 | 30 | 0 | 1 | 13 | 14 | 0 | .00000 | 0.00 |
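The predicted values in Table 5 can be checked against the coefficients in column B of Table 4. The sketch below is illustrative only: the names `B`, `TABLE1` and `win_probability` are not from the original study, and it assumes that BATFIRST(1) is the indicator for Batfirst = 0 (the statistical package's reference-category coding is not stated in the paper). Under that assumption every linear predictor is far from zero, so each probability rounds to .00000 or 1.00000.

```python
import math

# Column B of Table 4, in the order: constant, VIRATSCOR, BATFIRST(1),
# WKTBTENOV, SCOOPNR (opener-1), SCOROPEN (opener-2).
B = (-439.502, 2.651, 7.659, 50.146, 2.027, 5.719)

# Table 1 rows: (Kohli score, Batfirst, wickets in first 10 overs,
#                score of opener-1, score of opener-2, India win)
TABLE1 = [
    (1, 0, 1, 34, 45, 0), (9, 1, 2, 138, 2, 0), (3, 1, 1, 209, 60, 1),
    (115, 0, 1, 79, 100, 1), (68, 1, 0, 11, 8, 0), (100, 0, 0, 141, 95, 1),
    (61, 0, 0, 7, 42, 0), (21, 0, 2, 5, 14, 0), (12, 0, 0, 5, 3, 0),
    (18, 0, 2, 92, 20, 1), (31, 0, 2, 5, 2, 0), (24, 0, 1, 15, 53, 1),
    (118, 0, 1, 0, 15, 1), (10, 0, 2, 30, 40, 0), (30, 0, 1, 13, 14, 0),
]

def inverse_logit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def win_probability(kohli, batfirst, wickets, opener1, opener2):
    batfirst_1 = 1 - batfirst  # ASSUMPTION: BATFIRST(1) = 1 when Batfirst = 0
    z = (B[0] + B[1] * kohli + B[2] * batfirst_1 + B[3] * wickets
         + B[4] * opener1 + B[5] * opener2)
    return inverse_logit(z)

CUT = 0.5  # cut value used in the classification table
probs = [win_probability(*row[:5]) for row in TABLE1]
predicted = [1 if p >= CUT else 0 for p in probs]
observed = [row[5] for row in TABLE1]
correct = sum(1 for p, o in zip(predicted, observed) if p == o)
print(f"{correct} of {len(TABLE1)} matches classified correctly")
```

The classification at the .5 cut value does not depend on the coding assumption, because the BATFIRST(1) term is small relative to the other terms in every row.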
[Figure 1: text plot of observed groups and predicted probabilities, step number 1. Horizontal axis: predicted probability of membership for 1.00, from 0 to 1; vertical axis: frequency. Cases observed as 0 are plotted at predicted probability 0 and cases observed as 1 at predicted probability 1. The cut value is .50. Symbols: 0 = .00, 1 = 1.00. Each symbol represents 1 case.]
Figure 1 Observed Groups and Predicted Probabilities Classification
[Figure 2: chart titled "India Winning vs Kohli Score". Horizontal axis: Kohli Scoring (0 to 150); vertical axis: India Winning (-0.2 to 1.2). Series: predictedvalue, with a polynomial trend line, Poly. (predictedvalue).]
Figure 2 Logistic Regression Curve of Predicted Value and Virat Kohli Scoring
Case Analysis-2
We wish to predict the likelihood of India winning a One Day International match against Australia. Data were
collected from 15 recent matches on the following variables.
• Viratscore is the score of Virat Kohli in the match. Since Kohli's batting is seen as instrumental to India's
chances, we wish to see whether his score in fact has an impact on India's victories.
• Does batting first help or hinder? The variable Batfirst is coded as 1 if India bats first, otherwise 0.
• Taking early wickets helps, so say the experts. Wicket10 is the number of Australian wickets to fall in
the first 10 overs of their batting.
• Finally, Indiawin is the dependent variable: 1 = victory, 0 = loss.
Table 6 shows the input data of 15 recent ODI matches against Australia.
Objective
To predict the likelihood of India winning a One Day International match against Australia.
Hypothesis
H0: There is no significant difference between the observed value and the model prediction.
H1: There is a significant difference between the observed value and the model prediction.
Interpretation
Table 7 shows the classification of India winning or losing an ODI match against Australia. It shows that
the overall correct classification rate of the model is 73.3%, which indicates that the logistic
regression model fits reasonably well. The model classifies India's losses (88.9% correct) better than
India's wins (50.0% correct).
The Hosmer and Lemeshow test, a chi-square goodness-of-fit test, assesses “how well the model fits”.
Table 8 shows that the P-value is 0.058, which is greater than 0.05, so we do not reject the
null hypothesis at the 5% level of significance. It indicates that there is no significant difference between the
observed value and the predicted value of the model. In other words, the model fits well.
Table 9 shows the logistic regression model with one dependent categorical variable (Y, India win) and three
independent variables ($X_1$: Virat Kohli's score, $X_2$: batting first, entered as BATFIRST(1), and
$X_3$: number of wickets taken in the first 10 overs). The coefficients of $X_1$ and $X_3$ are positive, so India
winning is directly related to these two independent variables, whereas the coefficient of BATFIRST(1) is
negative. None of the three coefficients is significant at the 5% level (Sig. = .126, .875 and .539 in Table 9).
The multivariate logistic regression model, with coefficients taken from column B of Table 9, is:
$$\ln\left(\frac{p}{1-p}\right) = -2.163 + 0.032 X_1 - 0.234 X_2 + 0.555 X_3$$
Table 10 shows the predicted probabilities and classification of India winning and losing in ODI matches
against Australia. Figure 3 shows the observed groups and predicted probabilities for the given data.
Table 6 Input Data of India vs Australia
| Sl. No. | Date | Place | Virat Kohli Score | Batting First | Wickets Taken in First 10 Overs | India Win |
|---|---|---|---|---|---|---|
| 1 | 26.05.2015 | Sydney | 1 | 0 | 1 | 0 |
| 2 | 18.01.2015 | Melbourne | 9 | 1 | 2 | 0 |
| 3 | 02.11.2013 | Bangalore | 3 | 1 | 1 | 1 |
| 4 | 30.10.2013 | Nagpur | 115 | 0 | 1 | 1 |
| 5 | 19.10.2013 | Mohali | 68 | 1 | 0 | 0 |
| 6 | 16.10.2013 | Jaipur | 100 | 0 | 0 | 1 |
| 7 | 13.10.2013 | Pune | 61 | 0 | 0 | 0 |
| 8 | 26.02.2012 | Sydney | 21 | 0 | 2 | 0 |
| 9 | 19.02.2012 | Brisbane | 12 | 0 | 0 | 0 |
| 10 | 12.02.2012 | Adelaide Oval | 18 | 0 | 2 | 1 |
| 11 | 05.02.2012 | Melbourne | 31 | 0 | 2 | 0 |
| 12 | 24.05.2011 | Ahmadabad | 24 | 0 | 1 | 1 |
| 13 | 20.10.2010 | Visakhapatnam | 118 | 0 | 1 | 1 |
| 14 | 02.11.2009 | Mohali | 10 | 0 | 2 | 0 |
| 15 | 25.10.2009 | Vadodara | 30 | 0 | 1 | 0 |
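The coefficients reported for this three-variable model can be approximately re-derived from the Table 6 data by maximum likelihood. The following is a minimal sketch using Newton-Raphson iterations in plain Python. It is not the software routine used in the study, the names `fit_logistic` and `solve` are illustrative, and it assumes that the BATFIRST(1) term is the indicator for Batfirst = 0, a coding choice made here because it is the one that matches the sign of the reported coefficient.

```python
import math

# Table 6 rows: (Kohli score, Batfirst, wickets in first 10 overs, India win)
TABLE6 = [
    (1, 0, 1, 0), (9, 1, 2, 0), (3, 1, 1, 1), (115, 0, 1, 1), (68, 1, 0, 0),
    (100, 0, 0, 1), (61, 0, 0, 0), (21, 0, 2, 0), (12, 0, 0, 0), (18, 0, 2, 1),
    (31, 0, 2, 0), (24, 0, 1, 1), (118, 0, 1, 1), (10, 0, 2, 0), (30, 0, 1, 0),
]

def inverse_logit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_logistic(design, y, max_iter=25, tol=1e-10):
    """Newton-Raphson maximum-likelihood fit, starting from zero coefficients."""
    k = len(design[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [inverse_logit(sum(b * x for b, x in zip(beta, row))) for row in design]
        # Score vector X'(y - p) and information matrix X'WX, W = diag(p(1 - p)).
        score = [sum((yi - pi) * row[j] for row, yi, pi in zip(design, y, p))
                 for j in range(k)]
        info = [[sum(pi * (1.0 - pi) * row[i] * row[j] for row, pi in zip(design, p))
                 for j in range(k)] for i in range(k)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Design columns: constant, Kohli score, BATFIRST(1) (ASSUMED = 1 - Batfirst), wickets.
design = [[1.0, kohli, 1 - batfirst, wickets] for kohli, batfirst, wickets, _ in TABLE6]
y = [row[3] for row in TABLE6]
constant, b_kohli, b_batfirst_1, b_wickets = fit_logistic(design, y)
print(round(constant, 3), round(b_kohli, 3), round(b_batfirst_1, 3), round(b_wickets, 3))
```

In practice a statistics library would normally be used for the fit. The hand-written solver is kept here only so that the sketch has no dependencies.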
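Table 7 Classification Table
| Observed | Predicted INDIAWIN .00 | Predicted INDIAWIN 1.00 | Percentage Correct |
|---|---|---|---|
| Step 1, INDIAWIN .00 | 8 | 1 | 88.9 |
| Step 1, INDIAWIN 1.00 | 3 | 3 | 50.0 |
| Step 1, Overall Percentage | | | 73.3 |
a. The cut value is .500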
Table 8 Hosmer and Lemeshow Test
| Step | Chi-square | df | Sig. |
|---|---|---|---|
| 1 | 12.203 | 6 | .058 |
Table 9 Variables in Logistic Regression Model
Variables in the Equation
| Step 1a | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|
| KOHLISCOR | .032 | .021 | 2.346 | 1 | .126 | 1.032 |
| BATFIRST(1) | -.234 | 1.489 | .025 | 1 | .875 | .792 |
| WKTTEN | .555 | .903 | .377 | 1 | .539 | 1.741 |
| Constant | -2.163 | 1.856 | 1.358 | 1 | .244 | .115 |
a. Variable(s) entered on step 1: KOHLISCOR, BATFIRST, WKTTEN.
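The columns of Table 9 are linked by standard formulas: the Wald statistic is $(B/\text{S.E.})^2$, Sig. is the upper-tail probability of a chi-square distribution with 1 degree of freedom, and Exp(B) is $e^{B}$, the odds ratio for a one-unit increase in the variable. The sketch below recomputes them from the rounded table entries. The helper names are illustrative, and small differences from the printed values are expected because B and S.E. are given to three decimals only.

```python
import math

# (variable, B, S.E., reported Wald, reported Sig., reported Exp(B)) from Table 9
TABLE9 = [
    ("KOHLISCOR", 0.032, 0.021, 2.346, 0.126, 1.032),
    ("BATFIRST(1)", -0.234, 1.489, 0.025, 0.875, 0.792),
    ("WKTTEN", 0.555, 0.903, 0.377, 0.539, 1.741),
    ("Constant", -2.163, 1.856, 1.358, 0.244, 0.115),
]

def wald(b, se):
    """Wald chi-square statistic for a single coefficient."""
    return (b / se) ** 2

def chi2_sf_1df(x):
    """P(chi-square with 1 df > x), via the complementary error function."""
    return math.erfc(math.sqrt(x / 2.0))

def odds_ratio(b):
    """Exp(B): multiplicative change in the odds per unit increase."""
    return math.exp(b)

for name, b, se, wald_rep, sig_rep, exp_rep in TABLE9:
    w = wald(b, se)
    print(f"{name:12s} Wald={w:.3f} Sig.={chi2_sf_1df(w):.3f} Exp(B)={odds_ratio(b):.3f}")
```

For example, Exp(B) of about 1.03 for KOHLISCOR means that each additional run scored by Kohli multiplies the fitted odds of an India win by roughly 1.03, holding the other variables fixed.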
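Table 10 True Value and Predicted Value of Likelihood of India Winning an ODI Match against Australia
| Sl. No. | Kohli Score | Batfirst | Wickets Taken in First 10 Overs | India Win | Predicted Probability | Predicted Value |
|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 0 | .14068 | 0.00 |
| 2 | 9 | 1 | 2 | 0 | .31735 | 0.00 |
| 3 | 3 | 1 | 1 | 1 | .18060 | 0.00 |
| 4 | 115 | 0 | 1 | 1 | .86176 | 1.00 |
| 5 | 68 | 1 | 0 | 0 | .50205 | 1.00 |
| 6 | 100 | 0 | 0 | 1 | .68919 | 1.00 |
| 7 | 61 | 0 | 0 | 0 | .38964 | 0.00 |
| 8 | 21 | 0 | 2 | 0 | .35060 | 0.00 |
| 9 | 12 | 0 | 0 | 0 | .11782 | 0.00 |
| 10 | 18 | 0 | 2 | 1 | .32912 | 0.00 |
| 11 | 31 | 0 | 2 | 0 | .42626 | 0.00 |
| 12 | 24 | 0 | 1 | 1 | .25439 | 0.00 |
| 13 | 118 | 0 | 1 | 1 | .87278 | 1.00 |
| 14 | 10 | 0 | 2 | 0 | .27536 | 0.00 |
| 15 | 30 | 0 | 1 | 0 | .29239 | 0.00 |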
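The predicted probabilities in Table 10 and the classification counts in Table 7 can be approximately reproduced from the coefficients in Table 9. The sketch below is illustrative (the names `B`, `TABLE10` and `win_probability` are not from the original study) and rests on one assumption: BATFIRST(1) is treated as the indicator for Batfirst = 0, since that coding reproduces the reported probabilities to within rounding of the coefficients.

```python
import math

# Column B of Table 9.
B = {"constant": -2.163, "kohli": 0.032, "batfirst_1": -0.234, "wickets": 0.555}

# Table 10 rows: (Kohli score, Batfirst, wickets in first 10 overs,
#                 India win, predicted probability as reported)
TABLE10 = [
    (1, 0, 1, 0, 0.14068), (9, 1, 2, 0, 0.31735), (3, 1, 1, 1, 0.18060),
    (115, 0, 1, 1, 0.86176), (68, 1, 0, 0, 0.50205), (100, 0, 0, 1, 0.68919),
    (61, 0, 0, 0, 0.38964), (21, 0, 2, 0, 0.35060), (12, 0, 0, 0, 0.11782),
    (18, 0, 2, 1, 0.32912), (31, 0, 2, 0, 0.42626), (24, 0, 1, 1, 0.25439),
    (118, 0, 1, 1, 0.87278), (10, 0, 2, 0, 0.27536), (30, 0, 1, 0, 0.29239),
]

def inverse_logit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def win_probability(kohli, batfirst, wickets):
    batfirst_1 = 1 - batfirst  # ASSUMPTION: BATFIRST(1) = 1 when Batfirst = 0
    z = (B["constant"] + B["kohli"] * kohli
         + B["batfirst_1"] * batfirst_1 + B["wickets"] * wickets)
    return inverse_logit(z)

CUT = 0.5
counts = [[0, 0], [0, 0]]  # counts[observed][predicted]
max_gap = 0.0              # largest difference from the reported probability
for kohli, batfirst, wickets, win, reported in TABLE10:
    p = win_probability(kohli, batfirst, wickets)
    max_gap = max(max_gap, abs(p - reported))
    counts[win][1 if p >= CUT else 0] += 1

overall = 100.0 * (counts[0][0] + counts[1][1]) / len(TABLE10)
print(counts, round(overall, 1), round(max_gap, 4))
```

Match 5 is the one observed loss classified as a win: its predicted probability lies only just above the .5 cut value.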
[Figure 3: text plot of observed groups and predicted probabilities, step number 1. Horizontal axis: predicted probability of membership for 1.00, from 0 to 1; vertical axis: frequency (0 to 4). The cut value is .50. Symbols: 0 = .00, 1 = 1.00. Each symbol represents .25 cases.]
Figure 3 Observed Groups and Predicted Probabilities Classification
3. Conclusion
The present study focuses on predicting the likelihood of India winning or losing a One Day International
(ODI) cricket match against Australia by fitting a logistic regression model. Two case analyses have been
carried out for 15 recent ODI matches: one with five independent variables and one binary dependent variable,
and the other with three independent variables and one binary dependent variable. The results, which predict
whether India wins or loses an ODI match against Australia, illustrate the application of mathematical and
statistical principles to a practical problem in one-day international cricket.
ISBN 978-1-943295-00-5