Alejandro Perez Student number: st09002131 School of sport

CARDIFF SCHOOL OF SPORT
DEGREE OF BACHELOR OF SCIENCE
(HONOURS)
SPORTS COACHING
A COMPARISON OF PREDICTIVE MODELS AND
HOW EFFECTIVELY THEY PREDICT FORM BASED
ON THE 2011 RUGBY WORLD CUP
ALEJANDRO PEREZ
ST09002131
Cardiff Metropolitan University
Prifysgol Fetropolitan Caerdydd
Certificate of student
I certify that the whole of this work is the result of my individual effort, that all quotations
from books and journals have been acknowledged, and that the word count given below is a
true and accurate record of the words contained (omitting contents pages, acknowledgements,
indexes, figures, reference list and appendices).
Word count:
8,577 Words
Signed:
Date:
Thursday, 13 July 2017
Certificate of Dissertation Tutor responsible
I am satisfied that this work is the result of the student’s own effort.
I have received a dissertation verification file from this student
Signed:
Date:
Notes:
The University owns the right to reprint all or part of this document.
Contents
Acknowledgements
Abstract
Chapter 1 Introduction
1.1 Background
1.2 Aim of Research
1.3 Rationale for Research
1.4 Statement of Hypothesis
1.5 De-Limitations
1.6 Limitations
1.7 Definition of Terms
Chapter 2 Literature Review
2.1 Uncertainty in Sport Performance
2.2 Factors affecting outcomes of Rugby Matches
2.3 Predictive Modelling techniques
Chapter 3 Methods
3.1 Data sources
3.2 Models
3.2.1 Independent Variables
3.2.2 Dependent Variable
3.2.3 The Relation between the Variables
3.2.4 Linear Regression Models
3.2.5 Predictions vs. Rugby World Cup 2011
Chapter 4 Results
4.1 Expected Score Difference
4.2 Accuracy of Scores
4.3 Accuracy of Predictions
4.4 Predicted Final Stages
4.5 Actual Scores vs. Predictions
Chapter 5 Discussion
5.1 Comparison in Studies
5.2 Upsets
5.3 Accuracy of Predictive Models
5.3.1 Teams Underscoring
5.3.2 Teams Meeting Expectations
5.3.3 Teams Over Scoring
5.4 Form
5.5 Limitation of Models
5.6 Comparison to other sports
Chapter 6 Conclusion
6.1 Summary of Findings
6.2 Hypothesis Acceptance or Rejection
6.3 Future Research
References
List of Tables
TABLE 3-1 SAMPLE SIZE
TABLE 4-1 DIFFERENCE IN POINTS SCORED DURING THE RUGBY WORLD CUP 2011
TABLE 4-2 ACCURACY OF PREDICTED SCORES IN THE RUGBY WORLD CUP 2011
TABLE 4-3 ACCURACY OF CORRECT PREDICTIONS IN THE RUGBY WORLD CUP 2011
TABLE 4-4 COMPARISON OF POINTS SCORED PER MATCH
TABLE 5-1 COMPARISON IN ACCURACY
TABLE 5-2 UPSETS DURING THE RUGBY WORLD CUP 2011
List of Figures
FIGURE 2-1 % OF HIGHER RANKED PLAYER TO WIN THREE DIFFERENT TYPES OF TENNIS SETS
FIGURE 4-1 POINT MARGIN REQUIRED FOR CORRECT PREDICTIONS
FIGURE 4-2 "ENTER" LINEAR REGRESSION KNOCK OUT STAGE
FIGURE 4-3 "STEPWISE" LINEAR REGRESSION KNOCK OUT STAGE
FIGURE 5-1 MATCHES WITH UPSETS
FIGURE 5-2 COMPARISON OF MODELS TO ACTUAL POINT DIFFERENCE
FIGURE 5-3 FORM OF TEAMS IN RWC 2011
Acknowledgements
I’d like to express my sincere thanks to the people that helped me during the process of
creating this dissertation. In particular, I’d like to thank Professor Peter O’Donoghue for the
endless hours and his knowledge in the field of Predicting Performance in Sports.
Abstract
Purpose: The purpose of this study will be to successfully predict results based on specific
variables. Using the Rugby World Cup 2011 as a comparison tool, the study will determine
which linear regression models are more accurate. It will then be able to determine which
teams are performing above their predicted ability, and which teams are inconsistent with
their form.
Method: Data collected over the past five years will provide over 470 data points from which
various linear regression models will be created to predict the outcome of the games in the
Rugby World Cup 2011.
Results: The different types of regression model produce varying levels of accuracy in their
predictions, with the most accurate model predicting 89.58% of outcomes correctly. The
results indicate that teams like Fiji are underperforming greatly, while
Wales are scoring 15 points a game more than expected.
Conclusion: Predictive models should be created using a large number of data points, and the
number of variables should increase to improve accuracy in predictions. The models
themselves can be used to determine how well teams performed in a tournament.
Key Words: Predicting Performance, Rugby World Cup, Linear Regression, Form, Upset
Chapter 1 Introduction
1.1 Background
Internationally, Rugby Union is renowned as a winter sport, ranked second in participation only to
soccer (Bathgate et al., 2002). The legend of William Webb Ellis, who is credited with first
picking up the football and running with it, has doggedly survived countless revisionist
theories since that day at Rugby School in 1823 (IRB: Player Charter, 2012). Rugby Union’s
appeal to the athletes who participate in it has a great deal to do with the spirit shown by
William Webb Ellis on that day, and the fact that the game is played within both the letter and
the spirit of its Laws keeps participation growing. The responsibility for ensuring that
this happens lies not with one individual; it involves coaches, captains, players and referees. The
Object of the Game is that two teams, each of fifteen players, observing fair play, according to
the Laws and in a sporting spirit should, by carrying, passing, kicking and grounding the ball,
score as many points as possible. The development of this concept led to the creation of the
first Rugby World Cup in 1987.
The inaugural Rugby World Cup was held in New Zealand and Australia in 1987, with co-hosts
New Zealand winning the first Webb Ellis Cup. In 1991 Australia was the winner at Twickenham;
South Africa, as hosts, was the winner in 1995; in 1999 Australia won the Cup for a second time
at the Millennium Stadium in Cardiff. 2003 saw the first northern hemisphere winners, England,
in Sydney, but the trophy returned south in 2007 when South Africa became the second team to
win two Rugby World Cups with their victory in the Stade de France, Paris. The Rugby World Cup
(RWC) is the financial engine driving the development of the game worldwide. The revenues
from the RWC provide the IRB with funds to be distributed to the Member Unions to aid and assist
them in the expansion and development of the game. RWC 2007 attracted 2.2 million ticket sales,
1.8 million website hits, and record television viewing figures through broadcast exposure via
238 channels around the world. The cumulative TV audience was estimated at 4.2 billion. From
the quarter-final stage onwards, the matches were completely sold out and the qualifying pool
stages set a new record for average crowd attendance at 48,500. The RWC has now become
established as one of the most important sporting events, behind only the Olympics and the FIFA
World Cup.
This level of influence on audiences and nations, combined with the fact that the sport turned
professional in 1995, has increased the pressure felt by teams travelling to the events.
Competitive team sports are a source of unpredictability and uncertainty for all players and
coaches, which raises a major question: how can we reduce the uncertainty inevitably faced by
players and coaches in all performance contexts (Passos et al., 2008)? Even the prior
knowledge that players may acquire about their opponents is never enough to solve the problems
that emerge during sub-phases of games like Rugby Union. This is a key issue in every team
sport: players never know with 100% certainty what their opponents are going to do at every
moment of the game. This need for knowledge has increased the demand for a more scientific
approach to exploring the different elements of the game of rugby union (Nicholas, 1997; Duthie
et al., 2003; James et al., 2005).
1.2 Aim of Research
The aim of this study is to create a prediction model that will successfully predict the
outcome of a majority of the matches during a tournament. Specifically, for this study, the Rugby
World Cup 2011 will be used as a comparison to determine the accuracy of the models created. To
do this, the study will widen the research in predictive performance analysis using different
models and an increased number of data sources. It shall seek to determine how accurate certain
models are, and subsequently use these models to determine how teams are performing. This
identifies which teams are overperforming and which teams are not achieving their expected
potential.
1.3 Rationale for Research
Due to the expansion in the area of performance analysis over the past decade, the ability to
prepare for matches and specific opposition has increased. The ability to determine the outcome
of a match before it occurs gives teams more time to prepare, and more information on their
opponent. The area of predicting performance is relatively unexplored, with O’Donoghue and
Croucher leading the field in this research.
1.4 Statement of Hypothesis
1. Null hypothesis – the predictive models will not be able to predict a significant number of
results due to the number of variables used.
Alternative hypothesis – the predictive models will be able to predict a significant number
of results with the number of variables used.
2. Null hypothesis – the models will not be able to determine the form specific teams are in,
using the points each team scores to determine how they are performing.
Alternative hypothesis – the models will be able to determine the form specific teams are in,
based on the points each team scores.
1.5 De-Limitations
Men’s Rugby – the study shall concentrate on male rugby teams that compete in the full
15-man version of the game.
Rugby World Cup – the study shall concentrate on teams that have qualified for the
Rugby World Cup 2011.
Time Period – the study focuses on the matches played in two complete Rugby World Cup
cycles, where a cycle consists of the two years preceding a tournament. Data
collection starts on 1st January 2005, two years before the 2007 Rugby World Cup.
Data – data collection shall be from the official International Rugby Board site, which provides
accurate information on the matches over the time period necessary.
1.6 Limitations
Accuracy of IRB – because the majority of the data is collected from one
source, the accuracy of the data is vitally important, and is completely reliant on the
accuracy of the IRB records.
Human Error – due to the large number of samples, human error might occur in the
processing of data. To overcome this, equations will be written to increase the
automation of the process.
Location of Matches – because the matches are played all over the world, the measured
distance between the capital city of a country and the final destination might
vary between different maps. To overcome this, the same site will be used to determine all
the distances travelled.
1.7 Definition of Terms
‘IRB’ – refers to the International Rugby Board
‘RWC’ – refers to the Rugby World Cup
‘Ranking Points’ – refers to the system the IRB uses to rank teams in relation to
each other
‘Distance Travelled’ – refers to the distance between a country’s capital city and the
capital city of the country the match is being played in
Chapter 2 Literature Review
2.1 Uncertainty in Sport Performance
Statistics such as the number of team wins or the height of an athlete are readily available before
a match or a competition. So how do upsets occur in sport if all the data is available to analyse?
That is the nature of sport: the unpredictability that anything can happen. Creating
models that can predict the outcome of a match therefore requires a vast amount of data to be
accurate. Even with all of the data, one must accept that upsets do occur in sport every once in a
while; otherwise there would be no excitement, no need to watch sports, and no need to gamble on
sports.
Gambling and sports have almost always been inseparable. Sports betting is part of the wider
gambling industry and can be defined as the general activity of predicting sports results by
making a wager on the outcome of a sporting event (Koning and van Velzen, 2009). In fact,
some sports derive their very existence from the popularity of their associated betting market (e.g.,
horse racing). Horse racing was the cornerstone of sports betting because of how the
sport is set up, but the sports betting market has expanded in recent years to take
account of a growing demand for the opportunity to gamble on the outcome of a wide range of
sporting events (Colantuoni et al., 2010). Most importantly in these hard times, it is a sector that is
continuing to grow. According to Global Betting and Gaming Consultants, global online gambling
yield will increase by 25% over the next three years, with a significant proportion coming
from global interest in football betting (Barr-Smith, 2010). The ability to predict sport
performance is therefore vastly important, as evidenced by the number of betting agencies currently
present in the sport market (Stefani, 1998).
Producing an effective model that can create adequate predictions has significant
implications for performance analysis. Performance analysis is a science that is
concerned with actual sport performance rather than with activities undertaken in the laboratory.
The implications of effective prediction models are therefore large, because of the effect that
extra preparation can have on the actual performance of an athlete or team (O’Donoghue, 2010a).
Taking this into consideration, it must be stated that certain sports are easier to predict than others.
Tennis lends itself to explaining the chances of an upset occurring in a game. An upset is where
the stronger side (according to ranking or form) is defeated by the weaker side within the match.
Figure 2-1 shows how the probability of an upset changes with the number of sets
played.
[Figure 2-1: line chart. Horizontal axis: form of higher ranked player (0.5 to 1). Vertical axis: % of form player to win (0.5 to 1). Series: 1 set from 1, 2 sets from 3, 3 sets from 5.]
Figure 2-1 % of higher ranked player to win three different types of tennis sets
This figure describes the probability of a favourite winning a match played over 1 set, the best
of 3 sets or the best of 5 sets. For example, if player A is rated as favourite with a
probability of 0.7 of winning any set against the lower ranked player, the lower ranked player
will win a one-set match 0.3 of the time. As more sets are played, the chance of the lower
ranked player winning 2 out of 3 sets reduces to 0.216. The opportunity for an upset is reduced
even more when the match is played to the best of 5 sets, where the lower ranked player would
win only 0.163 of the time. This relates to Croucher’s (1998) study, which showed a direct
relationship between points won in a game and the probability of winning the set and then the
match. This demonstrates that upsets have a higher chance of occurring in sports that are low
scoring, because the more points are scored, the greater the chance that the favourite will end up
with the victory.
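The figures quoted above can be reproduced with a short calculation. The following is a minimal sketch, assuming that every set is won independently with the same probability; the function name match_win_probability is introduced here purely for illustration and does not come from the cited studies.

```python
from math import comb


def match_win_probability(p_set: float, sets_to_win: int) -> float:
    """Probability of winning a first-to-`sets_to_win` match when each set
    is won independently with probability `p_set` (an assumption)."""
    q = 1.0 - p_set
    # The winner always takes the final set. Before that set they hold
    # sets_to_win - 1 wins and the opponent holds k wins, k = 0 .. sets_to_win - 1.
    return sum(
        comb(sets_to_win - 1 + k, k) * p_set ** sets_to_win * q ** k
        for k in range(sets_to_win)
    )


underdog = 0.3  # lower ranked player's chance of winning any single set
for label, needed in (("1 set from 1", 1), ("2 sets from 3", 2), ("3 sets from 5", 3)):
    upset = match_win_probability(underdog, needed)
    print(f"{label}: upset probability = {upset:.3f}")
```

With a set-winning probability of 0.3 for the lower ranked player, the sketch returns 0.300, 0.216 and 0.163 for the three match formats, matching the values discussed above.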
2.2 Factors affecting outcomes of Rugby Matches
Rugby is generally considered to be a relatively high scoring sport, with an
average of 39.69 ± 26.38 points for the higher ranked team and an average of 14.56 ± 9.62 points
for the lower ranked team during the 2007 Rugby World Cup. Therefore an
upset, where the stronger side (according to ranking) is defeated by the weaker side within the
match, is usually very unlikely. This gives prediction models a high likelihood of predicting the
winners of matches while using just the relative quality of the performers (O’Donoghue et al.,
2008) and home advantage (Courneya and Carron, 1992; Nevill et al., 2002; Carron et al., 2005).
This is evidenced by previous studies of international rugby union tournaments. The
2004 study by O’Donoghue and Williams found that the mean prediction accuracy of artificial
neural networks was 35.3 (88.1%) of the 40 pool matches and that of statistical based methods
was 37.7 (94.3%) of the 40 pool matches. Taking this into consideration, it must be mentioned
that once the models attempt to predict the outcome of the knockout stages of an international
sporting tournament, the mistakes that upsets cause at the group stage will propagate
through the predictions and create unrealistic results by the time the later stages of the
tournament are reached.
2.3 Predictive Modelling techniques
There have been a handful of studies producing prediction models based on previous sporting
events. These studies have used a variety of prediction techniques including linear regression,
discriminant function analysis, logistic regression, artificial neural networks, simulation and
qualitative techniques (O’Donoghue et al., 2008). For such studies to be carried out
correctly, it is important that measurement issues such as objectivity, reliability and validity are
analysed so that the models are applied to the correct set of data.
However, even when all measurement issues are properly considered, it is important to note
that the accuracy of most predictive models depends on several factors, including whether the
assumptions of the modelling techniques have been satisfied by the data used (Manly, 1994;
Tabachnick and Fidell, 1996; Ntoumanis, 2001).
There are two stages involved in successfully using modelling techniques. The first stage of the
process involves fitting a model to previous cases using independent variables such as rest days
(O’Donoghue, 2010b), distance from home (Courneya and Carron, 1992; Nevill et al., 2002;
Carron et al., 2005) and ranking points (O’Donoghue et al., 2008). The score recorded when each
match took place is the dependent variable in the study. In the second stage, the model is applied
to cases where the independent variables are known but the dependent variable is not, and the
model provides a predicted value based on the previous set of data. These statistical modelling
techniques have assumptions that should be satisfied by the data used to develop the models
(Ntoumanis, 2001; Manly, 2005; Tabachnick and Fidell, 2007). However, taking into consideration
the previous studies undertaken in predictive modelling, it can be argued that satisfying the
assumptions of the modelling techniques does not always produce the most accurate forecasts of
the actual outcomes of matches (O’Donoghue, 2012).
There have been six studies undertaken in predicting performance that examined how satisfying the
statistical assumptions affected the overall predictions (O’Donoghue and Williams, 2004;
O’Donoghue, 2005; O’Donoghue, 2006; O’Donoghue, 2009; O’Donoghue, 2010a). Of these six
studies, four determined that violating the assumptions gave more accurate predictions than when
the assumptions were satisfied. The Euro 2008 study (O’Donoghue, 2009) found that satisfying
the assumptions provided a more accurate set of results than violating the assumptions.
Nevertheless, O’Donoghue (2012) states that the difference between the results obtained
when the assumptions are violated and those obtained when the assumptions are
satisfied might not be sufficient to justify the process of transforming the data variables. When
trying to transform the data to satisfy the assumptions of statistical modelling techniques,
different mathematical processes can be used. The problem arises due to the nature of the subject
that is being predicted. Sport by definition has a large number of unpredictable
variables, and at certain times extreme scores occur, such as the 2007 match between
New Zealand and Portugal, which ended with a score difference of +95 points for the All Blacks.
When scores like these occur, it is necessary to transform the data so that it can satisfy the
necessary assumptions. The transformations could be cube transformations, square root
transformations, logarithmic transformations or any mathematical transformation that provides an
approximately normal distribution. It must be noted that at certain times, no matter how many
transformations are applied, there are too many outliers present, and even removing these outliers
will only create more outliers in the data. This eventually leads to a point where a significant
number of data points have been removed as outliers, which removes the model’s ability to predict
upsets (O’Donoghue, 2009).
The modelling technique that will be used to produce the predictions is multiple linear
regression. Two different types of model will be created to produce four different sets of
predictions. The main difference between the two types of model is that one will be created
using the “Enter” method and the other using the “Stepwise” method. These are both ways to
produce a linear regression model. To create the four final sets of predictions, these models
will be applied both to the raw, untransformed data and to the data that has been transformed to
satisfy the assumptions of multiple linear regression. Statistically there are certain assumptions
that one must fulfil to successfully create a predictive model.
The assumptions of multiple linear regression
(Ntoumanis, 2001; Allison, 1999; Field, 2009; O’Donoghue, 2012; Newell et al., 2010)
• There should be at least 20 cases for each independent variable.
• Linear regression assumes that the relationship between any independent variable and the
dependent variable is linear.
• There must be no outliers in individual independent variables, the dependent variable or
residuals. As well as considering outliers within individual variables, we also need to
check multivariate outliers. Distance measures such as Mahalanobis distances can be used
to identify outliers within the multivariate space.
• Multicollinearity should be avoided in the independent variables. This means that no pair
of independent variables should be highly correlated (the absolute values of r should be
less than 0.9).
• Residuals should be independent, homoscedastic and normally distributed. Rather than
testing the distribution of the residuals for different subranges of each independent
variable, the predicted value for the dependent variable is used. Therefore we test that
there is little correlation between the predicted value of the dependent variable and the
absolute residual values to show homoscedasticity. Independence can be checked using
the correlation between the residuals and a variable representing the order of
measurement of the cases. Normality of the residuals can be tested using z-scores for
skewness and kurtosis which should both be between -1.96 and +1.96.
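Several of the checks listed above can be automated. The sketch below is illustrative only and is not the procedure used in this study: the function names are hypothetical, the standard errors of skewness and kurtosis use the large-sample approximations sqrt(6/n) and sqrt(24/n) rather than the exact formulas a statistics package would report, and the linearity and Mahalanobis distance checks are omitted.

```python
import numpy as np


def z_skew_kurtosis(residuals):
    """Return (z_skewness, z_kurtosis) for a set of residuals.

    Assumption: standard errors are approximated by sqrt(6/n) and sqrt(24/n).
    """
    r = np.asarray(residuals, dtype=float)
    n = r.size
    d = r - r.mean()
    m2 = np.mean(d ** 2)
    skewness = np.mean(d ** 3) / m2 ** 1.5
    excess_kurtosis = np.mean(d ** 4) / m2 ** 2 - 3.0
    return skewness / np.sqrt(6.0 / n), excess_kurtosis / np.sqrt(24.0 / n)


def check_assumptions(X, predicted, residuals):
    """Summarise the sample size, multicollinearity and residual checks."""
    X = np.asarray(X, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    n_cases, n_vars = X.shape

    # Largest absolute correlation between any pair of independent variables.
    if n_vars > 1:
        corr = np.corrcoef(X, rowvar=False)
        max_r = float(np.abs(corr[~np.eye(n_vars, dtype=bool)]).max())
    else:
        max_r = 0.0

    z_skew, z_kurt = z_skew_kurtosis(residuals)
    order = np.arange(n_cases)  # stands in for the order of measurement
    return {
        "enough_cases": bool(n_cases >= 20 * n_vars),
        "max_abs_r": max_r,
        "no_multicollinearity": bool(max_r < 0.9),
        # Homoscedasticity: this correlation should be close to zero.
        "r_predicted_vs_abs_residual": float(
            np.corrcoef(predicted, np.abs(residuals))[0, 1]
        ),
        # Independence: this correlation should be close to zero.
        "r_order_vs_residual": float(np.corrcoef(order, residuals)[0, 1]),
        "residuals_normal": bool(abs(z_skew) < 1.96 and abs(z_kurt) < 1.96),
    }
```

The list above does not state a numerical threshold for "little correlation", so the sketch reports the two residual correlations rather than passing or failing them.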
Because the models are based on over 470 previous data points,
they will produce results descriptive enough to determine whether a team is performing to
its ability and potential, or whether it is underscoring and upsets are occurring. Using these
predictions and the actual results that occurred, one will be able to statistically analyse certain
factors. The form of teams can be analysed by comparing the actual point difference to the
predicted point difference. This could provide insight into which teams are overperforming
and which teams are underperforming. Therefore, these predictive models can provide data to
determine the form of the teams participating in the 2011 Rugby World Cup.
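The comparison of actual and predicted point differences can be expressed as a simple calculation. The sketch below assumes form is summarised as the mean of (actual point difference minus predicted point difference) per team; this summary, the function name form_table and the example values are hypothetical illustrations, not results from this study.

```python
def form_table(matches):
    """Mean of (actual - predicted) point difference for each team.

    `matches` is an iterable of (team, actual_diff, predicted_diff) tuples.
    A positive value suggests a team is overperforming relative to the model,
    a negative value suggests it is underperforming.
    """
    totals = {}
    for team, actual_diff, predicted_diff in matches:
        running_sum, count = totals.get(team, (0.0, 0))
        totals[team] = (running_sum + (actual_diff - predicted_diff), count + 1)
    return {team: s / n for team, (s, n) in totals.items()}


# Invented values for two placeholder teams, purely for illustration.
example = [
    ("Team A", 20, 5),
    ("Team A", 10, -5),
    ("Team B", -3, 7),
]
print(form_table(example))
```

For the invented values above, Team A averages 15 points per match better than predicted and Team B averages 10 points worse.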
Building on the previous studies that have investigated the effect of satisfying assumptions, this
study will examine both the effectiveness of satisfying the assumptions and how precise the
models that are created actually are. With these predictive models one could determine the form
teams are competing at during a tournament. Therefore the models should be able to determine
which teams are more reliable and which teams struggle to reach the expected form.
Chapter 3 Methods
3.1 Data sources
To correctly create prediction models there is an order of events that must take place. The first
step is to accurately collect the correct data in order to create a complete data source. One must
first understand what final information is wanted, and then decide what
information is needed. In this study, where the final goal is to determine how reliable certain teams
are based on the predictive models, one must collect data pertaining to three things: a team’s
score in previous matches, a team’s ranking points, and the distance of the match location from
home. Once a complete spreadsheet has been created, one can then move on to produce the actual
linear regression models: first the ones that violate the assumptions, because the data does not
need to be modified, and second the regressions that need the data to be transformed to satisfy
the assumptions. This process will produce the models that create the data necessary to answer
the research question.
It is unclear which factors influence the final result most significantly. Some factors, such as
distance, may be less significant than others, such as the quality of the opposition, the number
of rest days or even the referee at the event. The final model features all the Tier 1 rugby
nations and every game they have played since January 1st, 2005, providing 470 matches for
analysis. As can be seen in Table 3-1, New Zealand is the team that appears most often, and 23
teams in total are used in this study. The two factors analyzed in this study are the
International Rugby Board (IRB) World Ranking Points and the distance travelled to the match.
Table 3-1 Sample Size
Number of Teams            23
Number of Matches          470
Team with Most Matches     New Zealand (94)
Team with Least Matches    Namibia (4)
The IRB World Rankings are calculated using a 'Points Exchange' system. Every international team
has a rating, usually between 0 and 100. The better sides in the world usually have a rating of
around 85-90 points or more, while new countries start on 40 points. Points are exchanged when two
teams play each other: the number of points the winning team gains is the number of points the
losing team loses. The number of points exchanged depends on the strength of each team and the
margin of victory, and it takes home advantage into consideration. The IRB gives the home team a
three-point “handicap”, reflecting the importance of home advantage.
To determine how far a team has travelled to a match, the great-circle distance between the
capital city of the team’s country and the capital city of the host nation is calculated. This
information is obtained from an internet-based distance calculator (Indonesia, 2006).
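As an illustration of the distance calculation, the following is a minimal sketch of a great-circle distance using the haversine formula. It is not the calculator cited above; the mean Earth radius of 6371 km and the approximate capital coordinates are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km; inputs are in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate capital coordinates, for illustration only.
WELLINGTON = (-41.29, 174.78)
CARDIFF = (51.48, -3.18)

wales_to_new_zealand = great_circle_km(*CARDIFF, *WELLINGTON)
print(f"Cardiff to Wellington: {wales_to_new_zealand:.0f} km")
```

The host nation's own distance is zero under this definition, which is how home advantage enters the Distδ variable.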
When deciding which factors to use as independent variables, the number of rest days each team
has was considered, because it has been determined to be of significant importance in other
studies that have attempted to predict previous Rugby World Cups (O’Donoghue and Williams, 2004;
O’Donoghue, 2005; O’Donoghue, 2006; O’Donoghue, 2009; O’Donoghue, 2010a). This author accepts that
the number of rest days available is important, but because of how the predictive model is
created, the rest days between certain matches are not significant enough to include. For example,
the Rugby World Cup commences with 4 pools of 5 teams playing a round robin, so rest periods vary
from a shortest of 4 days to a longest of 11 days. The mean rest period is 7 days, with a lower
quartile of 5.75 days and an upper quartile of 8 days. The average rest period at the Rugby World
Cup is therefore shorter than the usual rest period at the 6 Nations, which is typically 10.5
days with a lower quartile of 7 days, equal to the mean rest period at the RWC. The rest days
were therefore treated as secondary to having over twice the number of data points used in the
previous studies.
The data for the independent variables must be collected before any statistical interpretation can
be undertaken, so an Excel spreadsheet was created to gather the data. Once the data were
collected, a more advanced statistical analysis tool, IBM SPSS Statistics 19, was used. This
allows the large amount of data to be analyzed and provides the specific mathematical analysis
that is necessary to create the models.
The first step when producing predictive models is to consider the assumptions, one of which is
that all of the independent variables are normally distributed (Field, 2009). A
Kolmogorov-Smirnov test was used to determine whether the independent variables were normally
distributed. With p < 0.05, it was determined that the variables were not normally distributed,
so the data must be transformed by means of square root or logarithmic transformations until they
are normally distributed (Nevill, 2000).
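The test-then-transform step can be sketched as follows. This is not the SPSS procedure: it computes the Kolmogorov-Smirnov distance between a sample and a normal distribution fitted to it, and compares it with the classical large-sample 5% critical value of 1.358/√n, which is an assumption made here for simplicity (SPSS applies a Lilliefors correction when the mean and standard deviation are estimated from the data). The skewed sample is synthetic.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the data."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def looks_normal(values, coefficient=1.358):
    """True when the K-S distance is below the assumed 5% critical value."""
    return ks_statistic_normal(values) < coefficient / math.sqrt(len(values))


# Synthetic, strongly right-skewed sample (exponentiated normal quantiles).
skewed = [math.exp(NormalDist().inv_cdf((i + 0.5) / 200)) for i in range(200)]
transformed = [math.log(v) for v in skewed]  # logarithmic transformation

print(looks_normal(skewed), looks_normal(transformed))
```

In this synthetic case the logarithmic transformation is the one that restores normality; with real data the square root and logarithm would each be tried and re-tested, as described above.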
When the data are available both in the original form, which violates the assumptions, and in
transformed form, the regression methods can be run. SPSS provides three different types of
regression method: Hierarchical, Enter and Stepwise. The Hierarchical method allows the researcher
to determine the order in which the data are entered into the model. The Enter method forces all
predictors into the model simultaneously. Lastly, the Stepwise method determines which of the
supplied variables to use, and in this study it therefore provides a model based on only one
variable.
3.2 Models
3.2.1 Independent Variables
The models used two independent variables, which were chosen on the basis of previous research.
The data were arranged so that the higher-ranked of the two teams in each match was considered
Team A. The IRB World Rankings were recorded as they stood before the start of the international
break allocated for the matches, leading to the following independent variables.
The difference in World Ranking Points:
Rankδ: higher-ranked team’s value – lower-ranked team’s value.
The difference in distance travelled to the tournament:
Distδ: higher-ranked team’s value – lower-ranked team’s value.
3.2.2 Dependent Variable
The dependent variable was the points difference, Pδ, between the higher-ranked team in a match
and the lower-ranked team. If the higher-ranked team won the match, this variable was positive;
if the match was an upset, the value was negative; and if the match was a draw, the value was 0.
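The three variables can be sketched for a single match as follows. The `Match` class and its field names are hypothetical; the ranking points and score are those of Australia v Ireland on 17-Sep-11 as reported in this study, while the two travel distances are placeholder values, not the distances used in the data set.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """One fixture, with Team A defined as the higher-ranked side."""
    rank_a: float     # ranking points of the higher-ranked team
    rank_b: float     # ranking points of the lower-ranked team
    dist_a_km: float  # distance travelled by Team A
    dist_b_km: float  # distance travelled by Team B
    score_a: int
    score_b: int

    def rank_delta(self) -> float:
        return self.rank_a - self.rank_b

    def dist_delta(self) -> float:
        return self.dist_a_km - self.dist_b_km

    def points_delta(self) -> int:
        # Positive: higher-ranked team won; negative: upset; zero: draw.
        return self.score_a - self.score_b


# Australia (88.84) v Ireland (78.50), final score 6-15; distances are placeholders.
example = Match(rank_a=88.84, rank_b=78.50, dist_a_km=2000.0, dist_b_km=18000.0,
                score_a=6, score_b=15)
print(round(example.rank_delta(), 2), example.dist_delta(), example.points_delta())
```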
3.2.3 The Relation between the Variables
Before the models were created, a Pearson correlation was run to understand the relationships in
the data and their effect on the models and on each other. The difference in distance is
negatively correlated with the score of the match, with a coefficient of r = -0.133. This means
that the further from home a team plays, the more points it gives away, which coincides with the
assumptions previously discussed (Ntoumanis, 2001; Allison, 1999; Field, 2009; O’Donoghue,
2012; Newell et al., 2010). The more significant of the two variables is the difference in
ranking points, with a coefficient of r = 0.551, which means that the final score is influenced
more by which team is better according to ranking points than by the distance travelled. Using
R2, one can state how much of the score difference each of the two variables accounts for: the
distance from home accounts for only 1.77%, while the difference in ranking points between the
two teams accounts for 30.36%.
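The step from each correlation coefficient to its share of the score difference is simply the square of r. A minimal sketch follows; the function names are illustrative, and the Pearson function is included only to show how r itself would be obtained from paired data.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def variance_explained_pct(r):
    """R squared expressed as a percentage."""
    return 100 * r ** 2


dist_share = variance_explained_pct(-0.133)  # distance difference
rank_share = variance_explained_pct(0.551)   # ranking points difference
print(f"{dist_share:.2f}% {rank_share:.2f}%")
```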
3.2.4 Linear Regression Models
Model A: “Enter” method with assumptions violated
The model was created using all 471 rugby matches in the data set, with both the distance from
home and the ranking points entered in their raw form without any transformation or removal of
outliers. Running the linear regression with the “Enter” method produced the following model:
Pδ = -2.458 + 2.207 Rankδ - 0.000269 Distδ
The main reason why the data violated the assumptions is that both Rankδ and Pδ contained
outliers. The Pδ data were leptokurtic (ZKurt = +2.714), while Distδ was normal. In practical
terms, the distance coefficient means that for every 10,000 km a team travelled it lost 2.69
points in the final score. The residuals for this model were leptokurtic (ZKurt = +1.517) and
were determined not to be normal by a Kolmogorov-Smirnov test (p = 0.004). The correlation
between the predicted values for Pδ and the residual values was r = 0.271, which meant that
homoscedasticity of the data could be assumed. The residuals had a mean±SD of 1.974±18.21.
Model B: “Stepwise” method with assumptions violated
The model was created using all 471 rugby matches in the data set; however, only the more
influential of the two independent variables was used. The data were left untransformed, and the
regression was run with the “Stepwise” method, producing the following model.
Pδ = -2.31 + 2.202 Rankδ
This model violated the assumptions of linear regression because Rankδ contained outliers. The
residuals for this model were leptokurtic (ZKurt = +1.468) and were determined not to be normal
by a Kolmogorov-Smirnov test (p = 0.002). The residuals had a mean±SD of -2.385±18.42. The
correlation between the predicted values for Pδ and the residual values was r = 0.291, meaning
that homoscedasticity of the data could be assumed.
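Models A and B can be applied directly to a fixture, as in the following minimal sketch. The coefficients are the ones reported above; the function names and the example inputs are illustrative only.

```python
def predict_model_a(rank_delta, dist_delta_km):
    """Model A ("Enter", raw data): predicted points difference."""
    return -2.458 + 2.207 * rank_delta - 0.000269 * dist_delta_km


def predict_model_b(rank_delta):
    """Model B ("Stepwise", raw data): predicted points difference."""
    return -2.31 + 2.202 * rank_delta


def predicted_winner(p_delta):
    """Read the sign of the predicted points difference as an outcome."""
    if p_delta > 0:
        return "higher-ranked team"
    if p_delta < 0:
        return "lower-ranked team"
    return "draw"


# Illustrative fixture: 10 ranking points apart, equal travel.
print(predict_model_a(10.0, 0.0), predict_model_b(10.0))
print(predicted_winner(predict_model_b(1.0)))
```

Because both intercepts are negative, the higher-ranked team is only predicted to win once the ranking gap is large enough: under Model B this requires Rankδ to exceed 2.31 / 2.202, or roughly 1.05 ranking points.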
Model C: “Enter” method with assumptions satisfied
To satisfy the assumptions, the ranking points data had to be modified so that no outliers
remained. To do this, the ranking points were transformed by taking the square root. This
transformation allowed the data to satisfy the assumptions.
Pδ = -1.573 + 1.577 Rankδ - 0.00565 Distδ
The residuals for this model were leptokurtic (ZKurt = +1.729) and were determined not to be
normal by a Kolmogorov-Smirnov test (p = 0.005). The residuals had a mean±SD of -1.469±18.66.
The correlation between the predicted values for Pδ and the residual values was r = 0.292,
meaning that homoscedasticity of the data could be assumed.
Model D: “Stepwise” method with assumptions satisfied
The last model created used the same transformations to satisfy the assumptions creating the
following model.
Pδ = -1.539 + 1.560 Rankδ
The residuals for this model were leptokurtic (ZKurt = +1.661) and were determined not to be
normal by a Kolmogorov-Smirnov test (p = 0.002). The residuals had a mean±SD of -2.258±18.92.
The correlation between the predicted values for Pδ and the residual values was r = 0.312,
meaning that homoscedasticity of the data could be assumed.
3.2.5 Predictions vs. Rugby World Cup 2011
Once the models were created, the previously collected Distδ and Rankδ data were inserted into
the models, producing predictions of who would win each match and by how many points. To
determine how accurate the models actually are, these predictions must then be compared with the
actual results of the Rugby World Cup.
Chapter 4 Results
4.1 Expected Score Difference
Once the models have been created and compared with the actual results of the tournament, certain
patterns begin to appear that indicate model accuracy. Taking into consideration all the points
scored during the tournament, it was important to see how close each model came to predicting the
overall points scored per game. It is clear that the models under-predict the number of points
scored during the tournament. It is particularly important to understand which models most
accurately predict the actual points scored in a game. Table 4-1 shows that the models that
violate the assumptions come within 3 points per game of the actual value. The closest model is
only 119 points off the total number of points scored during the RWC 2011, which was 1063 points.
Table 4-1 Difference in points scored during the Rugby World Cup 2011
                                                    Enter                      Stepwise
                                              Violating    Satisfying    Violating    Satisfying
                                              Assumptions  Assumptions   Assumptions  Assumptions
Expected Score: Points predicted per Game        19.66        13.24         19.30         2.98
Actual Score: Actual Point difference per Game   22.15 (all models)
Difference                                       -2.49        -8.90         -2.85       -19.17
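The link between the per-game figures in Table 4-1 and the tournament totals can be checked with a short calculation. This is a sketch that assumes the totals are obtained by multiplying the per-game averages by the 48 matches of the tournament and rounding to whole points; the dictionary keys are illustrative labels.

```python
MATCHES = 48  # matches played at RWC 2011
ACTUAL_PER_GAME = 22.15

PREDICTED_PER_GAME = {
    "Enter violating": 19.66,
    "Enter satisfying": 13.24,
    "Stepwise violating": 19.30,
    "Stepwise satisfying": 2.98,
}


def per_game_gap(model):
    """Predicted minus actual points per game (negative means under-prediction)."""
    return PREDICTED_PER_GAME[model] - ACTUAL_PER_GAME


def tournament_shortfall(model):
    """Whole-point gap between the actual and predicted tournament totals."""
    actual_total = round(ACTUAL_PER_GAME * MATCHES)
    predicted_total = round(PREDICTED_PER_GAME[model] * MATCHES)
    return actual_total - predicted_total


for name in PREDICTED_PER_GAME:
    print(name, round(per_game_gap(name), 2), tournament_shortfall(name))
```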
4.2 Accuracy of Scores
Taking this into consideration, one can then assess how accurate the predictions are and how
close they came to the correct result. It is important to note that rugby is a sport in which a
lot of points are scored, so predicting a match to within one score, whether a penalty, a try or
a converted try, is very difficult. The data show that the best model predicts the score to
within 3 points in 25.0% of the matches, and as the margin widens to a converted try (7 points),
the share of matches correctly predicted rises to as much as 39.6%. If the data are extrapolated,
all the matches will be predicted within a margin of 20 points or less, and the model created
using the Enter method and satisfying the assumptions would reach all the correct scores within
13 points.
Table 4-2 Accuracy of Predicted Scores in the Rugby World Cup 2011
                                    Enter                      Stepwise
Accuracy of Score             Violating    Satisfying    Violating    Satisfying
                              Assumptions  Assumptions   Assumptions  Assumptions
Matches Within 3 Points          25.0%        16.7%         18.8%        20.8%
Matches Within 5 Points          31.3%        27.1%         33.3%        22.9%
Matches Within 7 Points          35.4%        37.5%         39.6%        29.2%
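The percentages in Table 4-2 are shares of matches whose predicted points difference falls within a given margin of the actual one. A minimal sketch of that count follows; the four predicted and actual values are hypothetical and are not taken from the tournament data.

```python
def share_within(predicted, actual, margin):
    """Percentage of matches whose prediction is within `margin` points of the result."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= margin)
    return 100 * hits / len(actual)


# Hypothetical predicted and actual points differences for four matches.
predicted = [19.6, 4.1, -3.0, 30.2]
actual = [22, 12, -1, 9]

for margin in (3, 5, 7):
    print(margin, share_within(predicted, actual, margin))
```

With 48 matches in the tournament, the 25.0% reported for the first model corresponds to 12 matches predicted to within 3 points.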
4.3 Accuracy of Predictions
Having considered the accuracy of the scores, it is important to understand how accurately the
models determine the correct winner of each match. A win is a win no matter by how many points,
so if a model correctly determines who won the match, it counts as an accurate prediction. With
this in mind, Table 4-3 shows that the overall accuracy of the models is extremely high: the best
model correctly predicts the outcome of 89.58% of the matches, and when the margin is widened to
within a converted try this rises to 91.67%. This is a very high number of correct results, but
to understand the data completely, one must look at the results that were wrongly predicted. Of
the 8 games that were incorrectly predicted, 5 were within the 7-point bracket, 1 was a draw and
the other two had a score difference of more than 7 points. Because the models’ output is a score
difference, none of them was able to predict a draw as an outcome.
Table 4-3 Accuracy of Correct predictions in the Rugby World Cup 2011
                                               Enter                      Stepwise
Accuracy of Predictions                  Violating    Satisfying    Violating    Satisfying
                                         Assumptions  Assumptions   Assumptions  Assumptions
Outcome Correctly Predicted                 89.58%       83.33%        85.42%       85.42%
Outcome Correctly Predicted ± 3 Points      89.58%       87.50%        89.58%       89.58%
Outcome Correctly Predicted ± 5 Points      89.58%       89.58%        89.58%       89.58%
Outcome Correctly Predicted ± 7 Points      91.67%       89.58%        89.58%       89.58%
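The outcome accuracy can be sketched as a sign comparison between the predicted and actual points difference. The match counts used below (43, 40, 41 and 44 out of 48) are not stated in the text; they are the counts implied by the percentages in Table 4-3, assuming 48 matches.

```python
TOTAL_MATCHES = 48


def outcome_correct(predicted_delta, actual_delta):
    """True when prediction and result favour the same team.

    A drawn match (actual_delta == 0) can never count as correct, because the
    product of the two differences is then zero.
    """
    return predicted_delta * actual_delta > 0


def accuracy_pct(correct, total=TOTAL_MATCHES):
    return round(100 * correct / total, 2)


print(outcome_correct(12.4, 3), outcome_correct(12.4, -9), outcome_correct(1.5, 0))
print([accuracy_pct(c) for c in (43, 40, 41, 44)])
```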
By taking the information in Table 4-3 one can expand on the information provided and
determine how big of a margin would be necessary for the models to predict 100% of the
matches.
[Figure: % of Correct Predictions (0% to 100%) plotted against Points Required (0 to 20) for the
four models: Enter Violating Assumptions, Enter Satisfying Assumptions, Stepwise Violating
Assumptions and Stepwise Satisfying Assumptions.]
Figure 4-1 Point Margin Required for Correct Predictions
Figure 4-1 shows the margin of points required for the models to be completely accurate. Although
it is probably not practical to rely on a model that needs a margin of ±17 points to determine
all the correct scores, as is the case for the “Enter” Violating Assumptions model, two models do
reach 100% of correct predictions within ±10 points. This is notable because it provides a good
overview of how close and accurate the models are.
4.4 Predicted Final Stages
The prediction models make it possible to construct the knock out stages of the tournament.
Because the predictions were created before the tournament, it is important to understand that
these knock out stages are what the models predicted at the earliest stage of the tournament.
Q.Final 1: WA New Zealand v RU B Argentina; winner New Zealand
Q.Final 2: WD South Africa v RU C Ireland; winner South Africa
Q.Final 3: WB England v RU A France; winner England
Q.Final 4: WC Australia v RU D Wales; winner Australia
S.Final 1: New Zealand v South Africa; winner New Zealand
S.Final 2: England v Australia; winner Australia
Final: New Zealand v Australia
1st New Zealand, 2nd Australia, 3rd South Africa, 4th England
Figure 4-2 "Enter" Linear Regression Knock Out Stage
Figure 4-2 shows the complete knock out stage on a win-only basis; the score by which the teams
won is not included in the figure. Both predictive models that used the “Enter” technique
produced the same knock out stage, with each of the rounds being won by the team with the higher
ranking.
Figure 4-3 shows the models using the “Stepwise” technique. The model that violated the
assumptions and the model that satisfied the assumptions both produced the same knock out stage.
The main difference from the “Enter” models is that England beats Australia in Semi Final 2. This
is interesting because Australia is both the higher-ranked team and the team that has travelled
the least; therefore, this predicted result is an upset.
Q.Final 1: WA New Zealand v RU B Argentina; winner New Zealand
Q.Final 2: WD South Africa v RU C Ireland; winner South Africa
Q.Final 3: WB England v RU A France; winner England
Q.Final 4: WC Australia v RU D Wales; winner Australia
S.Final 1: New Zealand v South Africa; winner New Zealand
S.Final 2: England v Australia; winner England
Final: New Zealand v England
1st New Zealand, 2nd England, 3rd South Africa, 4th Australia
Figure 4-3 "Stepwise" Linear Regression Knock Out Stage
When interpreting the predicted knock out stages, several factors can be taken into
consideration. The winner of the tournament is always the same: New Zealand wins in all four
prediction models. This is understandable, because the models are constructed using both ranking
points and distance travelled, and New Zealand was both ranked number one in the world and
hosting the tournament, making it the clear favourite. The next important factor is that none of
the four predictions had France beating England. Reflecting France’s unpredictability, its
ranking points started at 83.78 before the tournament, dropped to 79.72 and finished at 84.79
after the final, a total movement of 9.13 points over the tournament (a fall of 4.06 followed by
a rise of 5.07). Another factor to consider is that neither Ireland nor Wales was able to beat
its southern hemisphere rival in any of the predictions. Therefore, even though all of the
predictions correctly determined the eventual winner, the factors that are worth analysing are
those that occurred in the matches before the final.
4.5 Actual Scores vs. Predictions
To see how teams performed in the tournament, and whether they over performed or underperformed,
the points being scored can be used as a guideline. As Table 4-4 shows, teams such as Ireland,
Australia, Italy and Scotland are very reliable and are scoring as expected. It also shows that
teams like Fiji and Namibia are underperforming, while teams such as Wales are over performing
significantly. The table also indicates how accurate each model was for specific teams, and
therefore which teams were easier to predict. Ireland, for example, averaged 17.80 points a game,
and the most accurate of the models predicted this to within 0.04 points a game.
Table 4-4 Comparison of Points Scored per Match
                Actual Point            Enter                          Stepwise
Team            Difference     Violating      Satisfying      Violating      Satisfying
                               Assumptions    Assumptions     Assumptions    Assumptions
Argentina          5.40        2.489±16.31    2.162±11.19     2.361±15.04    0.233±2.578
Australia         16.57        20.41±29.97    16.46±25.21     19.47±27.76    2.583±3.497
Canada           -21.50       -16.05±17.28   -12.23±13.62    -16.45±16.06   -2.445±2.222
England           19.20        12.681±15.36   7.896±9.962     13.38±15.50    2.083±2.382
Fiji             -27.00       -5.998±15.12   -2.035±9.004    -8.356±15.77   -1.350±2.532
France             5.00       -1.342±19.95   -2.059±12.76    -0.043±19.90   -0.025±3.332
Georgia          -10.50       -12.33±12.01   -7.541±6.882    -12.39±12.03   -2.025±2.136
Ireland           17.80        17.84±15.00    11.87±10.05     17.39±14.89    2.683±2.280
Italy             -0.75       -0.417±18.86   -2.062±13.92     0.694±17.67    0.1641±2.827
Japan            -28.75       -15.90±16.13   -10.91±12.43    -14.94±16.07   -2.011±2.338
Namibia          -55.50       -32.80±20.49   -22.92±14.75    -31.94±20.69   -4.476±2.615
New Zealand       32.71        28.60±12.38    21.80±11.01     25.83±12.55    3.909±1.287
Romania          -31.25       -24.33±15.34   -15.98±10.56    -23.92±15.19   -3.719±2.201
Russia           -34.75       -30.72±23.25   -23.16±20.17    -29.91±22.23   -4.105±2.705
Samoa             10.50       -2.178±11.91    0.144±7.045    -4.398±12.28   -0.908±2.158
Scotland           3.50        7.887±11.40    4.743±6.830     7.566±11.70    1.209±1.985
South Africa      28.00        21.67±19.28    13.94±13.95     22.09±19.85    3.122±2.597
Tonga             -4.50       -5.50±16.94    -5.011±11.30    -3.941±17.70   -0.436±2.562
USA              -21.00       -22.36±20.16   -15.64±14.51    -22.59±20.08   -3.045±2.721
Wales             22.00        6.138±17.09    3.257±10.53     8.444±17.18    1.207±2.654
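To identify which model was closest for a given team, the absolute gap between the actual point difference and each model's mean prediction can be compared. The following is a minimal sketch using three rows of Table 4-4 (means only, without the standard deviations); the data structure and function name are illustrative.

```python
MODELS = ("Enter violating", "Enter satisfying",
          "Stepwise violating", "Stepwise satisfying")

# team: (actual point difference, mean prediction of each model in MODELS order)
TABLE_4_4 = {
    "Ireland": (17.80, (17.84, 11.87, 17.39, 2.683)),
    "New Zealand": (32.71, (28.60, 21.80, 25.83, 3.909)),
    "Wales": (22.00, (6.138, 3.257, 8.444, 1.207)),
}


def closest_model(team):
    """Return the model with the smallest absolute gap, and that gap."""
    actual, predictions = TABLE_4_4[team]
    gaps = [abs(actual - p) for p in predictions]
    best = min(range(len(MODELS)), key=gaps.__getitem__)
    return MODELS[best], gaps[best]


for team in TABLE_4_4:
    model, gap = closest_model(team)
    print(f"{team}: {model}, gap {gap:.2f}")
```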
Chapter 5 Discussion
5.1 Comparison in Studies
The process used to create the models in this investigation is similar to that of previous
studies undertaken in predicting performance (O’Donoghue and Williams, 2004; O’Donoghue,
2005; O’Donoghue, 2006; O’Donoghue, 2009; O’Donoghue, 2010a). The main difference from the
previous studies is the size of the data sample: the previous studies were limited to the games
played at Rugby World Cups. The latest study by O’Donoghue (2012) had 232 matches over 25 years,
whereas the current study has roughly double the number of data points (202% of that sample) and
reduces the period of time to 5 years. Having more data from a shorter period of time increases
the value of the data, because it reflects more closely how the teams actually perform. This is
part of why the accuracy of results in this study is generally higher than in previous studies.
Table 5-1 Comparison in Accuracy
                                                          Enter                      Stepwise
Accuracy of Predictions                             Violating    Satisfying    Violating    Satisfying
                                                    Assumptions  Assumptions   Assumptions  Assumptions
Outcome Correctly Predicted in O'Donoghue (2011)       75.19%       74.88%        76.58%       76.50%
Outcome Correctly Predicted in this study              89.58%       83.33%        85.42%       85.42%
Increase in accuracy                                   19.14%       11.29%        11.54%       11.66%
The scores show a mean increase in accuracy of 13.41%, which is directly related to the period
of time the data was taken from. It must be noted that the previous study included rest days
between matches, an extra independent variable that increases the accuracy of the overall
predictive model. Because of the data sample used for this study, this was not a possibility.
Further analysis of Table 5-1 shows that the biggest difference between the two studies is for
the “Enter” method violating assumptions. This is due to the number of upsets that occurred
during the Rugby World Cup.
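The "Increase in accuracy" row of Table 5-1 is consistent with a relative increase, that is, the gain divided by the earlier accuracy, rather than a difference in percentage points. A minimal sketch of that calculation, using the rounded percentages from the table (so the last digit may differ slightly from the published row), is:

```python
def relative_increase_pct(previous, current):
    """Gain in accuracy as a percentage of the previous accuracy."""
    return 100 * (current - previous) / previous


# (previous study, this study) accuracy per model, from Table 5-1.
TABLE_5_1 = {
    "Enter violating": (75.19, 89.58),
    "Enter satisfying": (74.88, 83.33),
    "Stepwise violating": (76.58, 85.42),
    "Stepwise satisfying": (76.50, 85.42),
}

for name, (previous, current) in TABLE_5_1.items():
    print(name, round(relative_increase_pct(previous, current), 2))
```

Expressed in percentage points instead, the gain for the first model would be 89.58 - 75.19 = 14.39.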
5.2 Upsets
Out of the 48 matches played at the RWC 2011, five can be considered upsets. To be considered an
upset, a match must have the following characteristics:
Lower-ranked team wins
Winning margin of at least 5 points
Although there are only two characteristics, such results occur extremely rarely in a sport as
high scoring as rugby. A couple of other matches come close to fulfilling these requirements,
but because of the small margin in points, such as Australia beating South Africa by two points
or France beating Wales by one point, they are not considered upsets. Table 5-2 shows the five
matches that were upsets in the Rugby World Cup 2011.
Table 5-2 Upsets During the Rugby World Cup 2011
Date        Ranking Points Team A   Team A      Score     Team B   Ranking Points Team B
14-Sep-11          72.48            Tonga       20 - 25   Canada          71.56
17-Sep-11          88.84            Australia    6 - 15   Ireland         78.50
01-Oct-11          83.78            France      14 - 19   Tonga           72.48
08-Oct-11          84.54            England     12 - 19   France          79.72
08-Oct-11          83.14            Ireland     10 - 22   Wales           80.73
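The upset criteria can be written as a simple rule and checked against Table 5-2. This sketch assumes a threshold of 5 points or more, which is what the five listed matches imply; Team A is the higher-ranked team in every row, and the function name is illustrative.

```python
UPSET_MARGIN = 5  # assumed minimum winning margin for an upset


def is_upset(higher_ranked_score, lower_ranked_score, min_margin=UPSET_MARGIN):
    """True when the lower-ranked team wins by at least `min_margin` points."""
    return lower_ranked_score - higher_ranked_score >= min_margin


# (higher-ranked team, its score, lower-ranked team, its score) from Table 5-2.
TABLE_5_2 = [
    ("Tonga", 20, "Canada", 25),
    ("Australia", 6, "Ireland", 15),
    ("France", 14, "Tonga", 19),
    ("England", 12, "France", 19),
    ("Ireland", 10, "Wales", 22),
]

for team_a, score_a, team_b, score_b in TABLE_5_2:
    print(team_a, "v", team_b, is_upset(score_a, score_b))
```

A lower-ranked team winning by one or two points, as in the near misses described above, returns False under this rule.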
These upsets affected the tournament in various ways. Tonga was involved in two upsets during the
group stage that could have changed the final outcome of the tournament. Tonga lost to Canada in
their second game of the tournament, and then went on to upset the eventual finalist, France, in
a match that none of the models was able to predict. Had Tonga beaten Canada, it would have gone
through to the quarter finals instead of France. This did not occur, so the tournament proceeded
as expected, with the first two seeds of each group going through to the knock out stage. What
did occur in Group C was Ireland upsetting Australia, beating them convincingly by nine points,
which created a knock out draw divided into Northern Hemisphere teams on one side and Southern
Hemisphere teams on the other. The last two upsets occurred in the Quarter Finals, where only one
of the four matches was won by the higher-ranked team: England and Ireland lost to France and
Wales, respectively, and went home earlier than they would have hoped. Figure 5-1 provides a
visual representation of how unpredictable the final scores actually were.
[Figure: Actual Score Difference and Predicted Score Difference for the five upset matches;
vertical axis "Points Scored", from -15 to 30; data labels 33.4, 23.2, 15.2, 7.6 and 14.9.]
Figure 5-1 Matches with Upsets
The greater the distance between the two sets of data, the bigger the upset; the biggest upset
was therefore Australia’s loss to Ireland. Had everything gone as predicted, the knock out stages
would have looked different, with Australia and New Zealand on opposite sides of the draw, which
would have produced a different set of results.
5.3 Accuracy of Predictive Models
There were a total of four models produced that provided predictions to the actual outcome of the
RWC 2011. As stated previously in Table 5-1, certain models were able to predict the score of
each team with greater accuracy than others. Figure 5-2 shows how accurately the models were
able to predict the score difference. To discuss the accuracy of the models, the teams will be
broken into three groups based on where their predictions are, in comparison to the actual point
difference.
5.3.1 Teams Underscoring
This category of teams is characterised by the fact that the models predicted them to score more
points than they actually did. To fall into this category, a team must fall short of its
prediction by more than a try (5 points) a game. The main reasons they are underperforming are
most likely directly related to the two independent variables that this study is based on, but
there are a couple of other variables that are worth discussing. The teams that are underscoring
are:
Fiji -32.99
Namibia -22.7
Japan -12.85
Romania -6.92
Canada -5.45
The main characteristic of the teams in this category is that they are all ranked outside the
top 10 in the world. This means that there is less data on their actual performances than for
the higher-ranked teams. It also means that competitive matches against the higher-ranked teams
are less common, which widens the gap in the quality of play and makes it easier for the
higher-ranked teams to beat them. The lack of weekly rugby in nations where quality leagues are
not in place also leads to lower fitness levels; the opposition will therefore usually score a
higher number of tries in the last quarter of the match.
5.3.2 Teams Meeting Expectations
In this category, teams are doing what is expected of them, scoring within ±5 points a game of
their prediction, and thus either keeping the opposition to within a try or scoring the number
of tries that was expected of them. Teams in this category come from across the rankings. These
teams are:
Scotland -4.39
Russia -4.03
Australia -3.84
Italy -0.33
Ireland -0.04
Tonga +1
USA +1.36
Georgia +1.83
Argentina +2.91
France +3.66
New Zealand +4.11
It must be mentioned that even though Australia came third in the tournament, they still did not
perform as well as expected. This might be because the style of rugby they usually play could
not be sustained for prolonged periods during the later knock out stages of the World Cup. New
Zealand, on the other hand, was beating teams by the margins expected of them. Even though their
overall point difference was brought down in the final, which they won by only one point, they
were still able to keep a positive record. The three teams outside the top 10 in the world that
performed better than expected provide a clear statement that progress is being made in the
development of the lesser rugby nations.
5.3.3 Teams Over Scoring
These teams are scoring more than a try a game more than predicted. The main reason they might
be scoring this many points is that they all had group stage matches in which they had
significant victories over their opposition. The teams over scoring are:
South Africa +6.33
England +6.52
Samoa +8.32
Wales +15.86
England and South Africa both scored more than expected because they both won their groups,
putting in big scoring results in the process. Samoa’s case is somewhat more interesting,
because they did not make it out of their group but still managed to score 8.32 points a game
more than expected. The case of Wales is quite impressive, because they far exceed the model,
scoring 15.86 points a game more than predicted. The main reason their scoring average was so
high is that not only did they score in every game, but in the games they won, they won by a
combined margin of +159 points, while the three games they lost were lost by a combined margin
of only -5 points.
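The three categories in this section follow from a single threshold of one try (5 points) a game applied to the gap between actual and predicted scoring. A minimal sketch of that rule, applied to some of the gaps listed in this section, is shown below; the function and category labels are illustrative.

```python
TRY_POINTS = 5.0  # threshold of one try a game


def scoring_category(gap, threshold=TRY_POINTS):
    """Classify a team by its actual-minus-predicted points a game."""
    if gap < -threshold:
        return "underscoring"
    if gap > threshold:
        return "over scoring"
    return "meeting expectations"


# Gaps as listed in sections 5.3.1 to 5.3.3.
GAPS = {
    "Canada": -5.45,
    "Scotland": -4.39,
    "Ireland": -0.04,
    "New Zealand": 4.11,
    "South Africa": 6.33,
    "Wales": 15.86,
}

for team, gap in GAPS.items():
    print(team, scoring_category(gap))
```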
[Figure: per-team chart of Actual Point Difference against the "Enter" and "Stepwise" model predictions, each with assumptions violated and satisfied; axis from -60 to 40 points.]
Figure 5-2 Comparison of Models to Actual Point Difference
5.4 Form
To completely understand what these models are providing, the form of teams must be analysed.
The models provide data from which a final difference in score can be extrapolated, which in turn
provides an outcome for the entire tournament. As stated earlier, none of the models predicted the
outcomes of all the results correctly, but what the models did successfully recreate was the number
of points teams were expected to score. Comparing this with the points they actually scored shows
that the closer these two pieces of information are to each other, the closer the teams are to
their predicted form. Figure 5-3 shows how accurately the models predicted the form teams were in.
The nearer the two sets of data are to each other, the closer that team is to the form that was
predicted before the tournament. It also identifies teams that are performing above their
predicted form, as in the case of Wales, whose predicted score is far below the form they were
actually in, or the opposite, such as Fiji, whose predicted form was far above the form they were
actually in. Although previous studies of rugby union profiling have successfully assessed
performance scores and relative form (Bracewell, 2003; Jones et al., 2008), this study provides
quantifiable data on how close to form the teams actually were.
[Figure: Actual Point Difference and Predicted Point Difference for each team; axis "Points Scored", from -60 to 40.]
Figure 5-3 Form of Teams in RWC 2011
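The idea of "closeness to form" described in this section can be sketched as the gap between actual and predicted point difference for each team. The example below is a minimal illustration under stated assumptions: the team names and values are placeholders and are not results from this study, and the helper names `form_gap` and `mean_absolute_gap` are hypothetical.

```python
def form_gap(actual, predicted):
    """Actual minus predicted point difference for each team.

    Positive values mean a team performed above its predicted form,
    negative values mean it performed below it.
    """
    return {team: actual[team] - predicted[team] for team in actual}


def mean_absolute_gap(gaps):
    """Average size of the gap across teams, ignoring direction."""
    return sum(abs(gap) for gap in gaps.values()) / len(gaps)


# Placeholder values for illustration only (not data from the study).
actual = {"Team A": 22.0, "Team B": -10.0, "Team C": 3.0}
predicted = {"Team A": 6.0, "Team B": -2.0, "Team C": 4.0}

gaps = form_gap(actual, predicted)
for team, gap in gaps.items():
    print(f"{team}: {gap:+.1f}")
print(f"Mean absolute gap: {mean_absolute_gap(gaps):.2f}")
```

A smaller mean absolute gap would indicate that, across all teams, the model was closer to the form actually shown in the tournament.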
5.5 Limitation of Models
As accurate as the models are, there are certain variables that can never be accounted for. This
study focused on two variables with a high number of data sources, whereas previous studies
limited the data sources in order to add a further variable, rest days (O’Donoghue and Williams,
2004; O’Donoghue, 2005; O’Donoghue, 2006; O’Donoghue, 2009; O’Donoghue, 2010a). This study
shows that a higher number of data points has provided an increase in the accuracy of results.
Other factors that could increase the accuracy of the models would be the ability to retain
possession, the amount of ball kicked away, or the effectiveness of the set piece (Carter, 1996).
However, including them would complicate the modelling process because each team would need a
specific variable to quantify the qualities of its game. These are still quantifiable
characteristics that can be given a numerical value and included in the modelling process, unlike
other characteristics which may give the model more accuracy but would be nearly impossible to
measure, such as team cohesion and how players interact with each other (Bull et al., 2005;
Gucciardi et al., 2008; Thelwell et al., 2005). Mental toughness, the ability to generally cope
better than opponents with the demands of a particular sport, is also an important psychological
aspect that could be used to produce more accurate models (Jones et al., 2002). Other factors such
as sleep, crowd distribution and referee bias could also be included (Nevill et al., 1996; Walters
and Lovell, 2002).
This study focuses on predictive models that produce the point difference in matches. Other
studies use their predictive model to simulate 1000, 5000 or even 10,000 matches and report the
percentage of simulations in which matches have specific outcomes (O’Donoghue, 2006). Using this
method, one can make deeper predictions and estimate the actual chance that a specific team has
of winning the tournament or of successfully making it out of their group.
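The simulation approach described above can be sketched for a single match. This is a minimal illustration and not the procedure used in the cited studies: it assumes that the actual point difference is normally distributed around the model's predicted point difference, with a spread (`residual_sd`) that would in practice be estimated from the regression residuals. The function name, parameter names and example values are all hypothetical.

```python
import random


def simulate_match(predicted_diff, residual_sd, n_sims=10_000, seed=0):
    """Estimate win/draw/loss proportions for one match by simulation.

    predicted_diff: point difference predicted by the model (team A minus team B).
    residual_sd:    assumed standard deviation of the prediction error.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    wins = draws = 0
    for _ in range(n_sims):
        # Draw one simulated point difference and round to whole points.
        diff = round(rng.gauss(predicted_diff, residual_sd))
        if diff > 0:
            wins += 1
        elif diff == 0:
            draws += 1
    losses = n_sims - wins - draws
    return {"win": wins / n_sims, "draw": draws / n_sims, "loss": losses / n_sims}


# Illustrative values only: a team predicted to win by 10 points,
# with an assumed prediction error of 12 points.
outcome = simulate_match(predicted_diff=10.0, residual_sd=12.0)
print(outcome)
```

Repeating this for every fixture in a simulated tournament, and counting how often each team qualifies or wins, would give the percentages of occurrence described above.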
5.6 Comparison to other sports
This modelling technique lends itself extremely well to this type of team sport. For the model to
work consistently and accurately, it needs high scoring games so that a pattern can be predicted.
In studies where football has been used as an example, the accuracy drops due to the higher chance
of upsets. Other high scoring invasion sports such as basketball or Australian football might
provide a good background for further research. At the same time, other sports such as tennis or
cricket do not lend themselves as well to predictive modelling.
Chapter 6 Conclusion
6.1 Summary of Findings
From this study, it is evident that certain models are more accurate than others in predicting the
outcome of a tournament. The fact that they accurately predict approximately 89% of the results
indicates that these models can be used to predict tournament outcomes; the matches that are not
correctly predicted provide interesting information because they are upsets. The form the teams
are in, based on the predicted results, gives a clear idea of how specific teams are performing.
Finally, there are many possibilities for how the data can be used to create further information
and to provide quantifiable data on form.
6.2 Hypothesis Acceptance or Rejection
Following the research and results of this study, both null hypotheses can be rejected, as:
1. The predictive models were able to predict a significant number of results given the number
of variables in the study.
2. The models were able to predict the form teams were in compared to how they actually
performed.
6.3 Future Research
A natural progression of the research conducted would be to apply it to the next RWC. This study
could also be extended to other tournaments, such as the yearly tournaments played in the northern
and southern hemispheres. The main expansion for this research is to increase the number of
independent variables in the study, which would increase the probability of predicting the upsets
that occur during the tournament.
References
Allison, P. (1999). Multiple Regression, Oakland: Pine Forge Press.
Barr-Smith, A. (2010). Sports Betting: United Kingdom. International Sports Law Review
Pandektis, 3/4, 155-158.
Bathgate, A., Best, J.P., Craig, G. and Jamieson, M. (2002). A prospective study of injuries to
elite Australian rugby union players. British Journal of Sports Medicine, 36(4), 265-269.
Bracewell, P. J. (2003). Monitoring meaningful rugby ratings. Journal of Sports Sciences, 21,
611–620.
Bull, S., Shambrook, C., James, W., and Brooks, J. (2005). Towards an understanding of mental
toughness in elite English cricketers. Journal of Applied Sport Psychology, 17, 209-227.
Carron, A.V., Loughead, T.M. and Bray, S.R. (2005). The home advantage in sport
competitions: Courneya and Carron’s (1992) conceptual framework a decade later.
Journal of Sports Sciences, 23, 395–407.
Carter, A. (1996). Time and motion analysis and heart rate monitoring of a back-row forward in
first class rugby union football. In Notational Analysis of Sport, I and II (edited by M.
Hughes), pp. 145-160.
Colantuoni, L., Novazio, C., Izar, A. and Pozzi, M. (2010). Betting in Sport. International Sports
Law Review Pandektis, 8, 3/4, 281-293
Courneya, K.S. and Carron, A.V. (1992). The home advantage in sports competitions: a literature
review. Journal of Sport and Exercise Psychology, 14, 13-27.
Croucher, J.S. (1998). Developing strategies in tennis. In J. Bennett (Ed.). Statistics in sport,
157-171, London: Arnold.
Duthie, G., Pyne, D. and Hooper, S. (2003). Applied physiology and game analysis of rugby
union. Journal of Sport Medicine, 33(13), 973-991.
Field, A. (2009). Discovering Statistics using SPSS, 3rd Edition, London: Sage Publications Ltd
Gucciardi, D. F., Gordon, S., and Dimmock, J. A. (2008). Towards an understanding of mental
toughness in Australian football. Journal of Applied Sport Psychology, 20, 261–281.
Indonesia (2006). Distance [on-line]. www.indo.com/distance [accessed October 2011].
IRB (2012). Player Charter [on-line]. http://www.irb.com/aboutirb/organisation/index.html
[accessed February 2012].
IRB (2012). World Ranking Explanation [on-line].
http://www.irb.com/rankings/explain/index.html [accessed February 2012].
James, N., Mellalieu, S., Jones, D. and Nicholas, M. P. (2005). The development of
position-specific performance indicators in professional rugby union. Journal of Sports
Sciences, 23(1),
Jones, G., Hanton, S., and Connaughton, D. (2002). What is this thing called mental toughness?
An investigation of elite sport performers. Journal of Applied Sport Psychology,
14, 205-218.
Jones, N., James, N. and Mellalieu, S. (2008). An objective method for depicting team
performance in elite professional rugby union. Journal of Sports Sciences, 26(7), 691-700.
Koning, R, and van Velzen, B., (2009). Betting Exchanges: The Future of Sports Betting?
International Journal of Sport Finance, 4, 1, 42-62
Manly, B.F.J. (1994). Multivariate statistical methods: a primer, 2nd Edition, London: Chapman
Hall.
Nevill, A.M., Balmer, N.J. and Williams, A.M. (2002). Can crowd reactions influence decisions
in favour of the home side? In Science and Football IV, Edited by Spinks, W., Reilly, T.
and Murphy, A. (London: Routledge), 308-319.
Nevill, A.M., Newell, S.M. and Gale, S. (1996). Factors associated with home advantage in
English and Scottish soccer matches, Journal of Sports Sciences, 14, 181-186.
Nicholas, C.W. (1997). Anthropometric and physiological characteristics of rugby union football
players. Sports Medicine, 23(6), 375-396.
Ntoumanis, N. (2001). A step-by-step guide to SPSS for sport and exercise studies, London:
Routledge.
O’Donoghue, P.G. (2006). The effectiveness of satisfying the assumptions of predictive
modelling techniques: an exercise in predicting the FIFA World Cup 2006, International
Journal of Computing Science in Sport(e), 5(2), 5-16.
O’Donoghue, P.G. (2009). Predictions of the 2007 Rugby World Cup and Euro 2008, 3rd
International Workshop of the International Society of Performance Analysis of Sport,
Lincoln, UK, 6th-7th April 2009.
O’Donoghue, P.G. (2010a). Research methods for sports performance analysis, London:
Routledge.
O’Donoghue, P.G. (2010b). The effectiveness of satisfying the assumptions of predictive
modelling techniques: an exercise in predicting the FIFA World Cup 2010, International
Journal of Computer Science in Sport, 9(3), 15-27.
O’Donoghue, P.G. (2012). The Assumptions Strike Back! A comparison of prediction models
for the 2011 Rugby World Cup,
O’Donoghue, P.G. and Williams, J. (2004), An evaluation of human and computer-based
predictions of the 2003 rugby union world cup, International Journal of Computer
Science in Sport (e), 3(1), 5-22.
O'Donoghue, P.G., Dubitzky, W., Lopes, P., Berrar, D., Lagan, K., Hassan, D., Bairner, A. and
Darby, P., (2004). An Evaluation of quantitative and qualitative methods of predicting the
2002 FIFA World Cup, Journal of Sports Sciences, 22, 513-514.
Passos, P., Araújo, D., Davids, K. and Shuttleworth, R. (2008). Manipulating Constraints to
Train Decision Making in Rugby Union, International Journal of Sports Science and
Coaching, 3(1), 125-140.
Stefani, R. (1998). Predicting Outcomes. In Statistics in Sport, (Edited by Bennett, J.), London:
Arnold, pp. 249-275.
Tabachnick, B.G. and Fidell, L.S. (2007). Using multivariate statistics, 5th edn, New York:
Harper Collins.
Thelwell, R., Weston, N., and Greenlees, I. (2005). Defining and understanding mental toughness
in soccer. Journal of Applied Sport Psychology, 17, 326-332
Walters, A. and Lovell, G. (2002). An examination of the homefield advantage in a professional
English soccer team from a psychological standpoint, Football Studies, 5, 46-59.