CARDIFF SCHOOL OF SPORT
DEGREE OF BACHELOR OF SCIENCE (HONOURS)
SPORTS COACHING

A COMPARISON OF PREDICTIVE MODELS AND HOW EFFECTIVELY THEY PREDICT FORM BASED ON THE 2011 RUGBY WORLD CUP

NAME: ALEJANDRO PEREZ
STUDENT NUMBER: ST09002131

SCHOOL OF SPORT
Cardiff Metropolitan University
Prifysgol Fetropolitan Caerdydd

Certificate of student
I certify that the whole of this work is the result of my individual effort, that all quotations from books and journals have been acknowledged, and that the word count given below is a true and accurate record of the words contained (omitting contents pages, acknowledgements, indexes, figures, reference list and appendices).
Word count: 8,577 words
Signed:
Date: Thursday, 13 July 2017

Certificate of Dissertation Tutor responsible
I am satisfied that this work is the result of the student’s own effort. I have received a dissertation verification file from this student.
Signed:
Date:

Notes: The University owns the right to reprint all or part of this document.

Contents
Acknowledgements
Abstract
Chapter 1 Introduction
1.1 Background
1.2 Aim of Research
1.3 Rationale for Research
1.4 Statement of Hypothesis
1.5 De-Limitations
1.6 Limitations
1.7 Definition of Terms
Chapter 2 Literature Review
2.1 Uncertainty in Sport Performance
2.2 Factors affecting outcomes of Rugby Matches
2.3 Predictive Modelling techniques
Chapter 3 Methods
3.1 Data sources
3.2 Models
3.2.1 Independent Variables
3.2.2 Dependent Variable
3.2.3 The Relation between the Variables
3.2.4 Linear Regression Models
3.2.5 Predictions vs. Rugby World Cup 2011
Chapter 4 Results
4.1 Expected Score Difference
4.2 Accuracy of Scores
4.3 Accuracy of Predictions
4.4 Predicted Final Stages
4.5 Actual Scores vs. Predictions
Chapter 5 Discussion
5.1 Comparison in Studies
5.2 Upsets
5.3 Accuracy of Predictive Models
5.3.1 Teams Underscoring
5.3.2 Teams Meeting Expectations
5.3.3 Teams Over Scoring
5.4 Form
5.5 Limitation of Models
5.6 Comparison to other sports
Chapter 6 Conclusion
6.1 Summary of Findings
6.2 Hypothesis Acceptance or Rejection
6.3 Future Research
References

List of Tables
TABLE 3-1 SAMPLE SIZE
TABLE 4-1 DIFFERENCE IN POINTS SCORED DURING THE RUGBY WORLD CUP 2011
TABLE 4-2 ACCURACY OF PREDICTED SCORES IN THE RUGBY WORLD CUP 2011
TABLE 4-3 ACCURACY OF CORRECT PREDICTIONS IN THE RUGBY WORLD CUP 2011
TABLE 4-4 COMPARISON OF POINTS SCORED PER MATCH
TABLE 5-1 COMPARISON IN ACCURACY
TABLE 5-2 UPSETS DURING THE RUGBY WORLD CUP 2011

List of Figures
FIGURE 2-1 % OF HIGHER RANKED PLAYER TO WIN THREE DIFFERENT TYPES OF TENNIS SETS
FIGURE 4-1 POINT MARGIN REQUIRED FOR CORRECT PREDICTIONS
FIGURE 4-2 "ENTER" LINEAR REGRESSION KNOCK OUT STAGE
FIGURE 4-3 "STEPWISE" LINEAR REGRESSION KNOCK OUT STAGE
FIGURE 5-1 MATCHES WITH UPSETS
FIGURE 5-2 COMPARISON OF MODELS TO ACTUAL POINT DIFFERENCE
FIGURE 5-3 FORM OF TEAMS IN RWC 2011

Acknowledgements
I’d like to express my sincere thanks to the people that helped me during the process of creating this dissertation.
In particular, I’d like to thank Professor Peter O’Donoghue for the endless hours and his knowledge in the field of Predicting Performance in Sports.

Abstract
Purpose: The purpose of this study is to predict results successfully based on specific variables. Using the Rugby World Cup 2011 as a comparison tool, the study will determine which linear regression models are more accurate. It will then be able to determine which teams are performing above their predicted ability, and which teams are inconsistent with their form.
Method: Data collected over the past five years will provide over 470 data points from which various linear regression models will be created to predict the outcome of the games in the Rugby World Cup 2011.
Results: The different types of regression models produce varying levels of accuracy in the predictions, with the most accurate model predicting 89.58% of outcomes correctly. The models indicate that teams like Fiji are underperforming greatly, while Wales are scoring 15 points a game more than expected.
Conclusion: Predictive models should be created using a large number of data points, and the number of variables should increase to improve accuracy in predictions. The models themselves can be used to determine how well teams performed in a tournament.
Key Words: Predicting Performance, Rugby World Cup, Linear Regression, Form, Upset

Chapter 1 Introduction

1.1 Background
Internationally, Rugby Union is renowned as a winter sport, ranked second in size after soccer (Bathgate et al., 2002). The legend of William Webb Ellis, who is credited with first picking up the football and running with it, has doggedly survived the countless revisionist theories since that day at Rugby School in 1823 (IRB: Player Charter, 2012). Rugby Union’s appeal to the athletes who participate in it has a great deal to do with the spirit shown by William Webb Ellis on that day, and the fact that the game is played both within the letter and within the spirit of its Laws keeps participation growing. The responsibility for ensuring that this happens lies not with one individual; it involves coaches, captains, players and referees. The Object of the Game is that two teams, each of fifteen players, observing fair play, according to the Laws and in a sporting spirit should, by carrying, passing, kicking and grounding the ball, score as many points as possible.
The development of this concept led to the creation of the Rugby World Cup. The inaugural tournament was held in New Zealand and Australia in 1987, with co-hosts New Zealand winning the first Webb Ellis Cup. In 1991 Australia was the winner at Twickenham; South Africa, as hosts, was the winner in 1995; in 1999 Australia won the Cup for a second time at the Millennium Stadium in Cardiff.
2003 saw the first northern hemisphere winners, England, in Sydney, but the trophy returned south in 2007 when South Africa became the second team to win two Rugby World Cups with their victory in the Stade de France, Paris.
Rugby World Cup (RWC) is the financial engine driving the development of the game world-wide. The revenues from RWC provide the IRB with funds to be distributed to the Member Unions to aid and assist them in the expansion and development of the game. RWC 2007 attracted 2.2 million ticket sales, 1.8 million website hits, and record television viewing figures through broadcast exposure via 238 channels around the world. The cumulative TV audience was estimated at 4.2 billion. From the quarter-final stage onwards, the matches were completely sold out and the qualifying pool stages set a new record for average crowd attendance at 48,500. RWC has now become established as one of the most important sporting events behind the Olympics and the FIFA World Cup. This much influence on audiences and nations, combined with the fact that the sport turned professional in 1995, has increased the pressure now felt by teams travelling to the events.
Competitive team sports are a source of unpredictability and uncertainty for all players and coaches. This raises a major question: how can we reduce the uncertainty inevitably faced by players and coaches in all performance contexts (Passos et al., 2008)? Even the previous knowledge that players may acquire about their opponents is never enough to solve the problems that emerge during sub-phases of games like Rugby Union. This is a key issue in every team sport: players never know with 100% certainty what their opponents are going to do at every moment of the game. This need for knowledge has increased the need for a more scientific approach to explore the different elements in the game of rugby union (Nicholas, 1997; Duthie et al., 2003; James et al., 2005).

1.2 Aim of Research
The aim of this study is to create a prediction model that will successfully predict the outcome of a majority of the matches during a tournament. Specifically, the Rugby World Cup 2011 will be used as a comparison to determine the accuracy of the models created. To do this, the study will widen the research in predictive performance analysis by using different models and an increased number of data sources. It shall seek to determine how accurate certain models are, and subsequently use these models to determine how teams are performing. This identifies which teams are over performing and which teams are not achieving their expected potential.

1.3 Rationale for Research
Due to the expansion in the area of performance analysis over the past decade, the ability to prepare for matches and specific opposition has increased. The ability to determine the outcome of a match before it occurs gives teams more time to prepare, and more information on their opponent. The area of predicting performance is relatively untouched, with O’Donoghue and Croucher leading the field in this research.

1.4 Statement of Hypothesis
1. Null hypothesis: the predictive models will not be able to predict a significant number of results due to the number of variables used.
Alternative hypothesis: the predictive models will be able to predict a significant number of results with the number of variables used.
2. Null hypothesis: the models will not be able to predict the form specific teams are in, using the points each team is scoring to determine how they are performing.
Alternative hypothesis: the models will be able to predict the form specific teams are in, based on the points each team is scoring.

1.5 De-Limitations
Men’s Rugby: the study shall concentrate on male rugby teams that compete in the full 15-man version of the game.
Rugby World Cup: the study shall concentrate on teams that have qualified for the Rugby World Cup 2011.
Time Period: the study focuses on the matches played in two complete Rugby World Cup cycles, where a cycle consists of the two years preceding a tournament. Data collection starts on 1st January 2005, two years before the 2007 Rugby World Cup.
Data: data collection shall be from the official International Rugby Board site, providing accurate information on the matches over the time period necessary.

1.6 Limitations
Accuracy of IRB: because the majority of the data is collected from one source, the accuracy of the data is vastly important, and is completely reliant on the accuracy of the IRB records.
Human Error: due to the large number of samples, human error might occur in the processing of data. To overcome this, equations will be written up to increase the automation of the process.
Location of Matches: because the matches are played all over the world, the distance travelled between the capital city of a country and the final destination might vary between different maps. To overcome this, the same site will be used to determine all the distances travelled.

1.7 Definition of Terms
‘IRB’: refers to the International Rugby Board.
‘RWC’: refers to the Rugby World Cup.
‘Ranking Points’: refers to the system the IRB uses to rank each team’s position in relation to the others.
‘Distance Travelled’: refers to the distance travelled between a country’s capital city and the capital city of the country the match is being played in.

Chapter 2 Literature Review

2.1 Uncertainty in Sport Performance
Statistics such as the number of team wins or the height of an athlete are readily available before a match or a competition. So how do upsets occur in sport if all the data is available to analyse? That is the nature of sport: the unpredictability that anything can happen. Therefore, models that predict the outcome of a match require a vast amount of data to be accurate. Even with all of the data, one must accept that upsets do occur in sports every once in a while; otherwise there would be no excitement, no need to watch sports, and no need to gamble on sports.
Gambling and sports have almost always been inseparable. Sport betting is part of the wider gambling industry and it can be defined as the general activity of predicting sports results by making a wager on the outcome of a sporting event (Koning and van Velzen, 2009). In fact, some sports derive their very existence from the popularity of their associated betting market (e.g., horse racing). Horse racing was the cornerstone of sports betting because of how the sport is set up, but the sports betting market has since expanded to meet a growing demand for the opportunity to gamble on the outcome of a wide range of sporting events (Colantuoni et al., 2010). Most importantly in these hard times, it is a sector that is continuing to grow. According to Global Betting and Gaming Consultants, global online gambling yield will increase by 25% over the next three years, with a significant proportion coming from global interest in football betting (Barr-Smith, 2010). Therefore the ability to predict sport performance is vastly important, as evidenced by the number of betting agencies currently present in the sport market (Stefani, 1998).
Producing an effective model that creates adequate predictions has significant implications for performance analysis. Performance analysis is a science concerned with actual sport performance rather than with activities undertaken in the laboratory. The implications of effective prediction models are therefore rather large, due to the effect that extra preparation can have on the actual performance of an athlete or team (O’Donoghue, 2010a).
Taking this into consideration, it must be stated that certain sports are easier to predict than others. Tennis lends itself to explaining the chances of an upset occurring in a game. An upset is where the stronger side (according to ranking or form) is defeated by the weaker side within the match. Figure 2-1 shows the probability of an upset occurring over different numbers of sets.
[Figure 2-1 % of higher ranked player to win three different types of tennis sets. Horizontal axis: form of higher ranked player (0.5 to 1). Vertical axis: % of form player to win (0.5 to 1). Series: 1 set from 1, 2 sets from 3, 3 sets from 5.]
This figure describes the probability of a favourite player winning either 1 set, the best of 3 sets or the best of 5 sets. For example, if player A is predicted to be the favourite with a probability of 0.7 over the lower ranked player, the lower ranked player would win a single set 0.3 of the time. As more sets are played the chance of an upset falls: the lower ranked player would win 2 out of 3 sets only 0.216 of the time. The opportunity for an upset is reduced even more when the match is played to the best of 5 sets, where the lower ranked player would win only 0.163 of the time. This relates to Croucher’s (1998) study, which showed a direct relationship between points won in a game and the probability of winning the set and then the match.
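The upset probabilities quoted for Figure 2-1 can be reproduced with a short calculation. The following is a minimal sketch, assuming that every set is independent and that a player's chance of winning a set is constant throughout the match; the function name `match_win_probability` is hypothetical and is not taken from Croucher (1998).

```python
from math import comb


def match_win_probability(p_set: float, sets_to_win: int) -> float:
    """Probability of winning a match played as first to `sets_to_win` sets.

    Assumes each set is independent and is won with probability `p_set`.
    """
    q = 1.0 - p_set
    # The player wins the final set, having lost k sets somewhere before it.
    return sum(
        comb(sets_to_win - 1 + k, k) * p_set ** sets_to_win * q ** k
        for k in range(sets_to_win)
    )


# Lower ranked player wins any single set 0.3 of the time.
underdog = 0.3
for label, sets_needed in (("1 set from 1", 1), ("2 sets from 3", 2), ("3 sets from 5", 3)):
    chance = match_win_probability(underdog, sets_needed)
    print(f"{label}: lower ranked player wins {chance:.3f} of the time")
```

Under these assumptions the chance of an upset shrinks as the match gets longer, matching the values of 0.3, 0.216 and 0.163 given in the text.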
The relationship between match length and upsets demonstrates that upsets have a higher chance of occurring in low scoring sports, because the more points there are to be won, the more likely the favourite is to end up with the victory.

2.2 Factors affecting outcomes of Rugby Matches
Rugby is generally considered to be a relatively high scoring sport, with an average of 39.69 ± 26.38 points scored by the higher ranked team and an average of 14.56 ± 9.62 points scored by the lower ranked team during the 2007 Rugby World Cup. Therefore an upset, where the stronger side (according to ranking) is defeated by the weaker side within the match, is usually very unlikely. This gives prediction models a high likelihood of predicting the winners of matches while using just the relative quality of the performers (O’Donoghue et al., 2008) and home advantage (Courneya and Carron, 1992; Nevill et al., 2002; Carron et al., 2005). This is evidenced by previous studies of international rugby union tournaments. The 2004 study by O’Donoghue and Williams found that the mean prediction accuracy of artificial neural networks was 35.3 (88.1%) of the 40 pool matches and that of statistical based methods was 37.7 (94.3%) of the 40 pool matches. Taking this into consideration, it must be mentioned that once the models attempt to predict the outcome of the knockout stages of an international sporting tournament, the mistakes that upsets cause at the group stage will propagate through the study and will create unrealistic results when the later stages of the tournament are reached.

2.3 Predictive Modelling techniques
There have been a handful of studies producing prediction models based on previous sporting events. These studies have used a variety of prediction techniques including linear regression, discriminant function analysis, logistic regression, artificial neural networks, simulation and qualitative techniques (O’Donoghue et al., 2008).
For these studies to have been produced correctly, it is important that measurement issues such as objectivity, reliability and validity are analysed so that the models are applied to a sound set of data. However, even when all measurement issues have been properly considered, it is important to note that most predictive models depend on several factors, including whether the assumptions of the modelling techniques have been satisfied by the data used (Manly, 1994; Tabachnick and Fidell, 1996; Ntoumanis, 2001).
There are two stages involved in successfully using modelling techniques. The first stage involves fitting a model to previous cases using independent variables such as rest days (O’Donoghue, 2010b), distance from home (Courneya and Carron, 1992; Nevill et al., 2002; Carron et al., 2005) and ranking points (O’Donoghue et al., 2008). The scores recorded when those events took place are the dependent variable. In the second stage, the model is applied to cases where the independent variables are known but the dependent variable is not, and the model provides a predicted value based on the previous set of data.
These statistical modelling techniques have assumptions that should be satisfied by the data used to develop the models (Ntoumanis, 2001; Manly, 2005; Tabachnick and Fidell, 2007). However, taking into consideration the previous studies undertaken in predictive modelling, it can be argued that satisfying the assumptions of the modelling techniques does not always produce the most accurate forecasts of the actual outcomes of matches (O’Donoghue, 2012). There have been six studies of predicting performance and of how satisfying the statistical assumptions affected the overall predictions (O’Donoghue and Williams, 2004; O’Donoghue, 2005; O’Donoghue, 2006; O’Donoghue, 2009; O’Donoghue, 2010a).
Out of these six studies, four determined that violating the assumptions gave more accurate predictions than satisfying them. The Euro 2008 study (O’Donoghue, 2009) found that satisfying the assumptions provided a more accurate set of results than violating them. Nevertheless, O’Donoghue (2012) states that the difference between the results obtained when the assumptions are violated and those obtained when the assumptions are satisfied might not be sufficient to justify the process of transforming the data variables.
When trying to transform the data to satisfy the assumptions of statistical modelling techniques, different mathematical processes can be used. The problem arises from the nature of the subject being predicted. Sport by definition has a large number of unpredictable variables, and at times extreme scores occur, such as the 2007 match between New Zealand and Portugal, which ended with a score difference of +95 points for the All Blacks. When scores like these occur, it is necessary to transform the data so that it can satisfy the necessary assumptions. The transformations could be cubed transformations, square root transformations, logarithmic transformations or any mathematical transformation that provides a normal distribution. It must be noted that at certain times, no matter how many transformations are applied, there are too many outliers present, and even removing these outliers will only create more outliers in the data. This eventually leads to a point where a significant number of data points have been removed as outliers, which removes the model’s ability to predict upsets (O’Donoghue, 2009).
The modelling technique that will be used to produce the predictions is multiple linear regression. Two different types of models will be created to produce four different sets of predictions.
The main difference between the two types of models is that one will be created using the “Enter” method and the other using the “Stepwise” method. These are both ways to produce a linear regression model. To create the four final sets of predictions, these models will be applied both to the raw, untransformed data and to the data that has been transformed to satisfy the assumptions of multiple linear regression.
Statistically there are certain assumptions that one must fulfil to successfully create a predictive model. The assumptions of multiple linear regression (Ntoumanis, 2001; Allison, 1999; Field, 2009; O’Donoghue, 2012; Newell et al., 2010) are as follows:
There should be at least 20 cases for each independent variable.
Linear regression assumes that the relationship between any independent variable and the dependent variable is linear.
There must be no outliers in individual independent variables, the dependent variable or the residuals. As well as considering outliers within individual variables, we also need to check for multivariate outliers. Distance measures such as Mahalanobis distances can be used to identify outliers within the multivariate space.
Multicollinearity should be avoided in the independent variables. This means that no pair of independent variables should be highly correlated (the absolute values of r should be less than 0.9).
Residuals should be independent, homoscedastic and normally distributed. Rather than testing the distribution of the residuals for different subranges of each independent variable, the predicted value for the dependent variable is used. Therefore we test that there is little correlation between the predicted value of the dependent variable and the absolute residual values to show homoscedasticity. Independence can be checked using the correlation between the residuals and a variable representing the order of measurement of the cases.
Normality of the residuals can be tested using z-scores for skewness and kurtosis, which should both be between -1.96 and +1.96.
Because the models are based on over 470 previous data points, they will produce results descriptive enough to determine whether a team is performing to its ability and potential, or whether it is underscoring and upsets are occurring. Using these predictions and the actual results that occurred, one will be able to statistically analyse certain factors. The form of teams can be analysed by comparing the actual point difference to the predicted point difference. This could provide insight into which teams are over performing and which teams are underperforming. Therefore, these predictive models can provide data to determine the form of the teams participating in the 2011 Rugby World Cup.
Building on the previous studies that investigated the effect of satisfying assumptions, this study will examine both the effectiveness of satisfying the assumptions and the precision of the models that are created. With these predictive models one could determine the form at which teams are competing during a tournament. Therefore the models should be able to determine which teams are more reliable and which teams struggle to reach the expected form.

Chapter 3 Methods

3.1 Data sources
To correctly create prediction models there is an order of events that must take place. The first step is to accurately collect the correct data in order to create a complete data source. One must first understand what final information is wanted, and then decide what information is needed to obtain it. In this study, where the final goal is to determine how reliable certain teams are based on the predictive models, one must collect data pertaining to three things: a team’s score in previous matches, a team’s ranking points, and the distance of the match location from home. Once a complete spreadsheet has been created, one can then move on to produce the actual linear regression models: firstly the ones that violate the assumptions, because the data does not need to be modified, and secondly the regressions that need the data to be transformed to satisfy the assumptions. This process will produce the models that create the data necessary to answer the research question.
It is unclear which factors influence the final result most significantly. Some factors such as distance can be less significant than factors such as the quality of the opposition, the number of rest days or even the referee at the event. The final model will feature all the Tier 1 rugby nations and every game they have played since January 1st, 2005, providing 470 matches for analysis. As can be seen in Table 3-1, New Zealand is the team that appears the most, while there are 23 teams in total being used in this study. For this study, the two relevant factors that will be analysed will be the International Rugby Board World Ranking Points and the distance travelled to the match.
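The distance variable is defined in this study as the distance between two capital cities. As an illustration only, a great-circle distance of this kind could be computed with the haversine formula. This is a minimal sketch and not the calculator used in the study: the function name `great_circle_km`, the Earth radius constant and the approximate coordinates are all assumptions introduced here.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate capital city coordinates (illustrative values, not the study's data).
CARDIFF = (51.48, -3.18)
WELLINGTON = (-41.29, 174.78)

distance = great_circle_km(*CARDIFF, *WELLINGTON)
print(f"Cardiff to Wellington: about {distance:.0f} km")
```

Using one function for every pair of capitals mirrors the study's stated aim of taking all distances from a single source, so that any systematic error is at least applied consistently.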
Table 3-1 Sample Size

| Measure | Value |
|---|---|
| Number of Teams | 23 |
| Number of Matches | 470 |
| Team with Most Matches | New Zealand (94) |
| Team with Least Matches | Namibia (4) |

The IRB World Rankings are calculated using a 'Points Exchange' system. Every international team has a rating, usually between 0 and 100. The better sides in the world usually have a rating of around 85-90+ points, with new countries starting on 40 points. The exchange of points occurs when two teams play each other. The number of points the winning team gains is equal to the number of points the losing team loses, creating a direct exchange. The number of points exchanged depends on the relative strength of each team and the margin of victory, and it takes home advantage into consideration. The IRB applies a three point “handicap” to the home team, which reflects the importance of home advantage.

To determine how far a team has travelled to a match, a great circle between the country’s capital city and the capital city of the host nation will be drawn and the distance will be calculated. This information will be gained from an internet based distance calculator (Indonesia, 2006).

When determining which factors to use as Independent Variables, the number of rest days each team has was considered, since other studies that have attempted to predict previous Rugby World Cups have found it to be of significant importance (O’Donoghue and Williams, 2004; O’Donoghue, 2005; O’Donoghue, 2006; O’Donoghue, 2009; O’Donoghue, 2010a). The author acknowledges that the number of rest days available is important, but due to how the predictive model will be created, the rest days between certain matches will not be significant enough.
For example, the Rugby World Cup commences with 4 pools of 5 teams in a round robin system, so the rest days vary from the shortest rest period of 4 days to the longest of 11 days. The mean is 7 rest days, with a lower quartile of 5.75 days and an upper quartile of 8 days. The average rest period for the Rugby World Cup is still shorter than the usual rest period that teams get at the 6 Nations, which is usually 10.5 days, with a lower quartile of 7 days, equal to the mean rest period for the RWC. Rest days were therefore considered of secondary value compared with having over twice the number of data points that the previous studies have had.

The data for the Independent Variables must be collected before any statistical interpretation can be undertaken, and thus an Excel spreadsheet was created to gather the data. Once the data is collected, a more advanced statistical analysis tool such as IBM SPSS Statistics 19 is then utilized. This allows the large amount of data to be analyzed and provides the specific mathematical analysis that is necessary to create the models.

The first step when producing predictive models is considering the assumptions, one of them being that all of the independent variables are normally distributed (Field, 2009). To test this, a Kolmogorov-Smirnov test was used to determine whether the independent variables were normally distributed. With p < 0.05 it was determined that the variables were not normally distributed, and thus the data must be transformed by means of square root or logarithmic transformation until it is normally distributed (Nevill, 2000). When the data is available both in its original form (violating the assumptions) and in transformed form, one can start the regression methods.
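The normality screen and square-root transformation described above might be sketched as follows. Note that this sketch uses skewness and kurtosis z-scores with the ±1.96 limits, not the Kolmogorov-Smirnov test that the study ran in SPSS, and the data are hypothetical values chosen only to show a right-skewed variable becoming symmetric after transformation. All function names are illustrative.

```python
import math


def skew_kurt_z(values):
    """Return (z_skewness, z_kurtosis) from the usual sample estimators."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    s3 = sum(((v - mean) / sd) ** 3 for v in values)
    s4 = sum(((v - mean) / sd) ** 4 for v in values)
    # Bias-corrected sample skewness and excess kurtosis.
    skew = n / ((n - 1) * (n - 2)) * s3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * s4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    # Standard errors of the two estimators.
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt


def looks_normal(values, limit=1.96):
    """True when both z-scores fall inside +/- limit."""
    z_skew, z_kurt = skew_kurt_z(values)
    return abs(z_skew) < limit and abs(z_kurt) < limit


# Hypothetical right-skewed values standing in for a raw variable.
raw = [float(k * k) for k in range(1, 31)]
transformed = [math.sqrt(v) for v in raw]

z_skew_raw, _ = skew_kurt_z(raw)
z_skew_transformed, _ = skew_kurt_z(transformed)
```

In this toy case the raw values have a positive skewness z-score, and the square root removes the skew entirely. Real ranking-point differences would not transform this cleanly, which is why the check has to be repeated after each transformation.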
SPSS provides three different types of regression models: the Hierarchical, Enter, and Stepwise models. The Hierarchical model allows a researcher to determine the order in which the data is entered into the model. The Enter model is one in which all predictors are forced into the model simultaneously. Lastly, the Stepwise model selects which of the supplied variables to retain, which in this study resulted in models based on only one variable.

3.2 Models

3.2.1 Independent Variables

The models used two independent variables, which were chosen based on previous research. The data was arranged so that the higher ranked of the two teams in a match was considered Team A. The IRB World Rankings were taken as they stood before the start of the international window allocated for the matches, leading to the following independent variables.

The difference in World Ranking Points: Rankδ = higher ranked team’s value - lower ranked team’s value.
The difference in distance travelled to the tournament: Distδ = higher ranked team’s value - lower ranked team’s value.

3.2.2 Dependent Variable

The dependent variable was the points difference, Pδ, between the higher ranked team in a match and the lower ranked team. If the higher ranked team won the match this variable would be positive; if the match was an upset the value would be negative; and if the match was a draw the value would be 0.

3.2.3 The Relation between the Variables

Before the models were created, a Pearson correlation was run to understand the relationships in the data and their effect on the models and on each other. The difference in distance is negatively correlated with the score difference of the match, with a coefficient of r = -0.133.
This means that the further from home a team plays, the more points they will give away, which coincides with the assumptions previously discussed (Ntoumanis, 2001; Allison, 1999; Field, 2009; O’Donoghue, 2012; Newell et al., 2010). The more significant of the two variables is the difference in ranking points, which has a coefficient of r = 0.551, meaning that the final score is affected more by which team is better according to ranking points than by the distance travelled. Using R², one can state how much of the variation in the score difference each variable accounts for: the distance from home accounts for only 1.77%, while the difference in ranking points accounts for 30.36%.

3.2.4 Linear Regression Models

Model A: “Enter” method with assumptions violated

The model was created using all 471 rugby matches in the data set, with both the distance from home and the ranking points entered in their raw form without any transformation or removal of outliers. Running the linear regression with the “Enter” method produced the following model:

Pδ = -2.458 + 2.207 Rankδ - 0.000269 Distδ

The main reason why the data violated the assumptions is that both Rankδ and Pδ contained outliers. The Pδ data was leptokurtic (ZKurt = +2.714) while Distδ was normal. In practical terms, the model implies that for every 10,000 km a team travelled they lost 2.69 points in the final score difference. The residuals for this model were leptokurtic (ZKurt = +1.517) and were determined not to be normal by a Kolmogorov-Smirnov test (p = 0.004). The correlation between the predicted values for Pδ and the residual values was r = 0.271, which was taken to mean that the homoscedasticity of the data could be assumed. The residuals had a mean±SD of 1.974±18.21.
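The arithmetic behind Model A can be checked with a short sketch. It uses only the coefficients and correlation values quoted above; the function names and the example fixture are hypothetical, and Distδ is assumed to be measured in kilometres, as the 10,000 km interpretation suggests.

```python
def model_a(rank_diff, dist_diff_km):
    """Predicted point difference from the Model A coefficients quoted above."""
    return -2.458 + 2.207 * rank_diff - 0.000269 * dist_diff_km


def shared_variance_pct(r):
    """Percentage of variance shared by two variables with Pearson r."""
    return 100 * r ** 2


# Hypothetical fixture: the higher ranked team is 10 ranking points better
# and has travelled 5,000 km further than its opponent.
example = model_a(10.0, 5000.0)

# Effect of 10,000 km of extra travel with ranking difference held fixed.
travel_effect = model_a(0.0, 10000.0) - model_a(0.0, 0.0)
```

Squaring r = -0.133 and r = 0.551 reproduces the 1.77% and 30.36% figures, and the travel effect reproduces the loss of 2.69 points per 10,000 km.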
Model B: “Stepwise” method with assumptions violated

The model was created using all 471 rugby matches in the data set; however, only the more influential of the two independent variables was used. The data was left untransformed, and the regression was run under the “Stepwise” method, producing the following model:

Pδ = -2.31 + 2.202 Rankδ

This model violated the assumptions of linear regression because Rankδ contained outliers. The residuals for this model were leptokurtic (ZKurt = +1.468) and were determined not to be normal by a Kolmogorov-Smirnov test (p = 0.002). The residuals in this model had a mean±SD of -2.385±18.42. The correlation between the predicted values for Pδ and the residual values was r = 0.291, which was taken to mean that the homoscedasticity of the data could be assumed.

Model C: “Enter” method with assumptions satisfied

To satisfy the assumptions, the ranking points data had to be modified so that no outliers existed. To do this, the ranking points were transformed by taking the square root, which provided the transformation necessary for the data to satisfy the assumptions.

Pδ = -1.573 + 1.577 Rankδ - 0.00565 Distδ

The residuals for this model were leptokurtic (ZKurt = +1.729) and were determined not to be normal by a Kolmogorov-Smirnov test (p = 0.005). The residuals in this model had a mean±SD of -1.469±18.66. The correlation between the predicted values for Pδ and the residual values was r = 0.292, which was taken to mean that the homoscedasticity of the data could be assumed.

Model D: “Stepwise” method with assumptions satisfied

The last model created used the same transformations to satisfy the assumptions, creating the following model.
Pδ = -1.539 + 1.560 Rankδ

The residuals for this model were leptokurtic (ZKurt = +1.661) and were determined not to be normal by a Kolmogorov-Smirnov test (p = 0.002). The residuals in this model had a mean±SD of -2.258±18.92. The correlation between the predicted values for Pδ and the residual values was r = 0.312, which was taken to mean that the homoscedasticity of the data could be assumed.

3.2.5 Predictions vs. Rugby World Cup 2011

Once the models were created, the data that had been previously collected for Distδ and Rankδ was inserted into the models, creating actual predictions of who would win each match and by how many points. To determine how accurate the models actually are, the predictions must then be compared to the actual results of the Rugby World Cup.

Chapter 4 Results

4.1 Expected Score Difference

Once the models have been created and compared to the actual results of the tournament, certain patterns and answers begin to appear, indicating model accuracy. Taking into consideration all the points scored during the tournament, it was important to see how close each model was to predicting the overall points scored per game. It is clear that the models under predict the number of points that were scored during the tournament.
It is particularly important to understand which models most accurately predict the actual points scored in a game. Table 4-1 shows that the models that violate the assumptions are within 3 points per game of the actual point difference. It is noted that the closest model is only 119 points off the total number of points scored during the RWC 2011, which totalled 1063 points.

Table 4-1 Difference in points scored during the Rugby World Cup 2011

| Expected Score, Actual Score, Difference | Enter, Violating Assumptions | Enter, Satisfying Assumptions | Stepwise, Violating Assumptions | Stepwise, Satisfying Assumptions |
|---|---|---|---|---|
| Points predicted per Game | 19.66 | 13.24 | 19.30 | 2.98 |
| Actual Point difference per Game | 22.15 | 22.15 | 22.15 | 22.15 |
| Difference | -2.49 | -8.90 | -2.85 | -19.17 |

4.2 Accuracy of Scores

Taking this into consideration, one can start to understand how accurate the predictions are and how close they were to the correct result. It is important to note that rugby is a sport in which a lot of points are scored, so predicting a result to within one score, whether a penalty, a try or a converted try, is very difficult. The data shows that the most accurate model predicts the point difference in a game to within 3 points in 25.0% of the matches, and as the margin increases, the share of matches predicted to within a converted try rises to 39.6%. If the data is extrapolated, all the matches will be predicted within a margin of 20 points or less, and in the case of the model that was created using the Enter method and satisfying assumptions, it would reach all the correct scores within 13 points.
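The "within N points" measure used in this section can be sketched as below. The function name and the four example matches are hypothetical and are not tournament data; the sketch only shows how the percentage grows as the allowed margin widens.

```python
def share_within(predicted, actual, margin):
    """Percentage of matches where |predicted - actual| <= margin."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= margin)
    return 100 * hits / len(actual)


# Hypothetical point differences for four matches (not tournament data).
predicted = [12.0, 30.5, -4.0, 8.0]
actual = [15, 10, -9, 8]

# Penalty (3), try (5) and converted try (7) margins.
by_margin = {m: share_within(predicted, actual, m) for m in (3, 5, 7)}
```

With these made-up values two of four predictions fall within a penalty and three of four within a try, so the share can only stay level or rise as the margin increases.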
Table 4-2 Accuracy of Predicted Scores in the Rugby World Cup 2011

| Accuracy of Score | Enter, Violating Assumptions | Enter, Satisfying Assumptions | Stepwise, Violating Assumptions | Stepwise, Satisfying Assumptions |
|---|---|---|---|---|
| Matches Within 3 Points | 25.0% | 16.7% | 18.8% | 20.8% |
| Matches Within 5 Points | 31.3% | 27.1% | 33.3% | 22.9% |
| Matches Within 7 Points | 35.4% | 37.5% | 39.6% | 29.2% |

4.3 Accuracy of Predictions

After taking into consideration the accuracy of the scores, it is important to understand how accurately the predictions determine the correct winner of the match. A win is a win no matter by how many points; so, if a prediction correctly determines who won the match, it counts as an accurate prediction. With this in mind, Table 4-3 shows that the overall accuracy of the models is very high, with the best model correctly predicting the outcome of 89.58% of matches, and when the margin is widened to within a converted try, this goes up to 91.67%. This is a very high number of correct results, but to completely understand the data, one must look at the results that were wrongly predicted. Out of the 8 games that were incorrectly predicted, 5 were within the 7 point bracket, 1 was a draw and the other two had a score difference above the 7 point bracket. The models’ output was a score difference, so none of them were able to predict a draw as an outcome.
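A minimal sketch of the win-only accuracy measure described above is given here. It assumes, as in section 3.2.2, that a positive point difference means the higher ranked team won. Counting draws and exactly-zero predictions as misses is an assumption of this sketch, and the example values are hypothetical.

```python
def outcome_correct(predicted_diff, actual_diff):
    """True when the predicted winner matches the actual winner.

    Draws (actual_diff == 0) and exactly-zero predictions are counted as
    misses, since a regression output is practically never exactly zero.
    """
    if actual_diff == 0 or predicted_diff == 0:
        return False
    return (predicted_diff > 0) == (actual_diff > 0)


def outcome_accuracy(predicted, actual):
    """Percentage of matches whose winner was predicted correctly."""
    hits = sum(outcome_correct(p, a) for p, a in zip(predicted, actual))
    return 100 * hits / len(actual)


# Hypothetical matches: correct win, upset, correctly predicted upset, draw.
predicted = [12.0, 3.5, -2.0, 6.0]
actual = [20, -5, -7, 0]

accuracy = outcome_accuracy(predicted, actual)
```

The size of the prediction error does not matter here: the first match counts as correct even though the margin is 8 points out, while the drawn match can never be scored as correct.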
Table 4-3 Accuracy of Correct predictions in the Rugby World Cup 2011

| Accuracy of Predictions | Enter, Violating Assumptions | Enter, Satisfying Assumptions | Stepwise, Violating Assumptions | Stepwise, Satisfying Assumptions |
|---|---|---|---|---|
| Outcome Correctly Predicted | 89.58% | 83.33% | 85.42% | 85.42% |
| Outcome Correctly Predicted ± 3 Points | 89.58% | 87.50% | 89.58% | 89.58% |
| Outcome Correctly Predicted ± 5 Points | 89.58% | 89.58% | 89.58% | 89.58% |
| Outcome Correctly Predicted ± 7 Points | 91.67% | 89.58% | 89.58% | 89.58% |

By taking the information in Table 4-3 one can expand on the information provided and determine how big a margin would be necessary for the models to predict 100% of the matches.

[Figure 4-1 Point Margin Required for Correct Predictions. x-axis: Points Required (0 to 20); y-axis: % of Correct Predictions (0% to 100%); series: Enter Violating Assumptions, Enter Satisfying Assumptions, Stepwise Violating Assumptions, Stepwise Satisfying Assumptions.]

Figure 4-1 shows the margin of points required for the models to be completely accurate. Even though it is probably not practical to rely on a model that needs a margin of ±17 points to capture all the correct scores, as is the case for the “Enter” Violating Assumptions model, one can see that two models do reach 100% of correct predictions within ±10 points. This is quite significant because it provides a good overview of how close and accurate the models are.

4.4 Predicted Final Stages

The prediction models make it possible to construct the predicted knock out stages of the tournament. Because the predictions were created before the tournament, it is important to understand that these knock out stages are what the models predicted at the earliest stage of the tournament.
Q.Final 1: WA New Zealand v RU B Argentina
Q.Final 2: WD South Africa v RU C Ireland
Q.Final 3: WB England v RU A France
Q.Final 4: WC Australia v RU D Wales
S.Final 1: New Zealand v South Africa
S.Final 2: England v Australia
Final: New Zealand v Australia
1st New Zealand, 2nd Australia, 3rd South Africa, 4th England

Figure 4-2 "Enter" Linear Regression Knock Out Stage

Figure 4-2 shows the complete knock out stage, but is done on a win only basis; thus, the actual score by which the teams won is not included in the figure. It also shows how both predictive models that used the “Enter” technique came out with the same final knock out stage, with each of the rounds being won by the team with the higher ranking. Figure 4-3 shows the models using the “Stepwise” technique. The model that violated the assumptions and the model that satisfied the assumptions both came out with the same final knock out stage. The main difference between the “Enter” and “Stepwise” predictions is that England beats Australia in Semi Final 2. This is interesting because Australia is both the higher ranked team and the team that has travelled the least; therefore, this result is an upset.

Q.Final 1: WA New Zealand v RU B Argentina
Q.Final 2: WD South Africa v RU C Ireland
Q.Final 3: WB England v RU A France
Q.Final 4: WC Australia v RU D Wales
S.Final 1: New Zealand v South Africa
S.Final 2: England v Australia
Final: New Zealand v England
1st New Zealand, 2nd England, 3rd South Africa, 4th Australia

Figure 4-3 "Stepwise" Linear Regression Knock Out Stage

When interpreting the predicted knock out stages, one can take a couple of factors into consideration.
The winner of the tournament is always the same: New Zealand wins the tournament in all four prediction models. This is understandable because the models are constructed using both ranking points and distance travelled, and New Zealand was both number one in the world and hosting the tournament, making them clear favourites to win. The next important factor is that none of the four predictions had France beating England. Reflecting their unpredictability, the French ranking points changed from 83.78 before the start of the tournament, dropping to 79.72 and finishing at 84.79 after the final. Over the period of the tournament, France’s ranking points therefore moved through a total of 9.13 points (a fall of 4.06 followed by a rise of 5.07). Another factor to consider is that neither Ireland nor Wales was able to beat their southern hemisphere rivals in any of the predictions. Therefore, even though all of the predictions correctly determined the tournament winner, the factors that are worth analysing are those that occurred in the matches before the final.

4.5 Actual Scores vs. Predictions

To see how certain teams performed in the tournament, and whether they over performed or underperformed, one can use the points being scored as a guideline to evaluate their performance. As Table 4-4 Comparison of Points Scored per Match shows, teams such as Ireland, Australia, Italy and Scotland are very reliable teams that are scoring as expected. It also shows how certain teams like Fiji and Namibia are underperforming and teams such as Wales are over performing significantly. The table also shows how accurate each model was for specific teams, indicating which teams were easier to predict. As stated, teams such as Ireland had a point difference of 17.80 points a game. The most accurate of the models predicted their scoring pattern to within 0.04 points a game.
Table 4-4 Comparison of Points Scored per Match

| Team | Actual Point Difference | Enter, Violating Assumptions | Enter, Satisfying Assumptions | Stepwise, Violating Assumptions | Stepwise, Satisfying Assumptions |
|---|---|---|---|---|---|
| Argentina | 5.40 | 2.489±16.31 | 2.162±11.19 | 2.361±15.04 | 0.233±2.578 |
| Australia | 16.57 | 20.41±29.97 | 16.46±25.21 | 19.47±27.76 | 2.583±3.497 |
| Canada | -21.50 | -16.05±17.28 | -12.23±13.62 | -16.45±16.06 | -2.445±2.222 |
| England | 19.20 | 12.681±15.36 | 7.896±9.962 | 13.38±15.50 | 2.083±2.382 |
| Fiji | -27.00 | -5.998±15.12 | -2.035±9.004 | -8.356±15.77 | -1.350±2.532 |
| France | 5.00 | -1.342±19.95 | -2.059±12.76 | -0.043±19.90 | -0.025±3.332 |
| Georgia | -10.50 | -12.33±12.01 | -7.541±6.882 | -12.39±12.03 | -2.025±2.136 |
| Ireland | 17.80 | 17.84±15.00 | 11.87±10.05 | 17.39±14.89 | 2.683±2.280 |
| Italy | -0.75 | -0.417±18.86 | -2.062±13.92 | 0.694±17.67 | 0.1641±2.827 |
| Japan | -28.75 | -15.90±16.13 | -10.91±12.43 | -14.94±16.07 | -2.011±2.338 |
| Namibia | -55.50 | -32.80±20.49 | -22.92±14.75 | -31.94±20.69 | -4.476±2.615 |
| New Zealand | 32.71 | 28.60±12.38 | 21.80±11.01 | 25.83±12.55 | 3.909±1.287 |
| Romania | -31.25 | -24.33±15.34 | -15.98±10.56 | -23.92±15.19 | -3.719±2.201 |
| Russia | -34.75 | -30.72±23.25 | -23.16±20.17 | -29.91±22.23 | -4.105±2.705 |
| Samoa | 10.50 | -2.178±11.91 | 0.144±7.045 | -4.398±12.28 | -0.908±2.158 |
| Scotland | 3.50 | 7.887±11.40 | 4.743±6.830 | 7.566±11.70 | 1.209±1.985 |
| South Africa | 28.00 | 21.67±19.28 | 13.94±13.95 | 22.09±19.85 | 3.122±2.597 |
| Tonga | -4.50 | -5.50±16.94 | -5.011±11.30 | -3.941±17.70 | -0.436±2.562 |
| USA | -21.00 | -22.36±20.16 | -15.64±14.51 | -22.59±20.08 | -3.045±2.721 |
| Wales | 22.00 | 6.138±17.09 | 3.257±10.53 | 8.444±17.18 | 1.207±2.654 |
Chapter 5 Discussion

5.1 Comparison in Studies

The process that has been used to create the models in this investigation is similar to previous studies undertaken in predicting performance (O’Donoghue and Williams, 2004; O’Donoghue, 2005; O’Donoghue, 2006; O’Donoghue, 2009; O’Donoghue, 2010a). The main difference from the previous studies is the size of the data sample; the previous studies were limited to the number of games that have occurred during Rugby World Cups. The latest study by O’Donoghue (2012) had 232 matches over 25 years, while the current study roughly doubles the number of data points (470 matches, about 202% of that sample) and reduces the period of time to 5 years. This larger amount of data over a shorter period of time increases the value of the data, because recent matches correlate more closely with how the teams actually perform. This is part of why the accuracy of results in this study is generally higher than in previous studies.
Table 5-1 Comparison in Accuracy

| Accuracy of Predictions | Enter, Violating Assumptions | Enter, Satisfying Assumptions | Stepwise, Violating Assumptions | Stepwise, Satisfying Assumptions |
|---|---|---|---|---|
| Outcome Correctly Predicted in O'Donoghue (2011) | 75.19% | 74.88% | 76.58% | 76.50% |
| Outcome Correctly Predicted in this study | 89.58% | 83.33% | 85.42% | 85.42% |
| Increase in accuracy | 19.14% | 11.29% | 11.54% | 11.66% |

The scores show a mean relative increase in accuracy of 13.41% across the four models, which is likely related to the period of time the data was taken from. It must be noted that the previous study included rest days between matches, which provides an extra independent variable that would increase the accuracy of the overall predictive model. Due to the data sample that was used for this study, this was not a possibility. Further analysis of Table 5-1 Comparison in Accuracy shows that the biggest difference between predictive models is in the “Enter” method while violating assumptions. This is due to the number of upsets that occurred during the Rugby World Cup.

5.2 Upsets

Out of the 48 matches that occurred at the RWC 2011, there are five matches that can be considered upsets. To be considered an upset, a match must have the following characteristics:

Lower ranked team to win
Winning margin of at least 5 points

Even though there are only two characteristics, this occurs extremely rarely in a sport that is as high scoring as rugby. There are a couple of other matches that come close to fulfilling these requirements, but due to the small margin in points, such as Australia beating South Africa by two points or France beating Wales by one point, they will not be considered upsets. Table 5-2 shows the five matches that were the upsets in the Rugby World Cup 2011.
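The two upset criteria can be expressed as a small predicate. This is only a sketch: the function name is hypothetical, and the margin threshold is taken as inclusive (5 points or more), an assumption chosen so that the 5-point results listed in Table 5-2 qualify. The example values are the ranking points and scores shown in that table.

```python
def is_upset(rank_a, score_a, rank_b, score_b, min_margin=5):
    """True when the lower ranked team wins by at least min_margin points."""
    if score_a == score_b or rank_a == rank_b:
        return False  # draws and equal rankings cannot be upsets
    lower_ranked_won = (score_a > score_b) == (rank_a < rank_b)
    return lower_ranked_won and abs(score_a - score_b) >= min_margin


# Rows from Table 5-2: (ranking points A, score A, ranking points B, score B).
tonga_v_canada = is_upset(72.48, 20, 71.56, 25)
ireland_v_wales = is_upset(83.14, 10, 80.73, 22)

# Hypothetical narrow result: lower ranked side wins by only two points.
narrow_win = is_upset(85.0, 9, 80.0, 11)
```

Under this rule the first two matches are flagged as upsets and the narrow win is not, mirroring the way close results were excluded from the upset list.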
Table 5-2 Upsets During the Rugby World Cup 2011

| Date | Ranking Points Team A | Team A | Score | Team B | Ranking Points Team B |
|---|---|---|---|---|---|
| 14-Sep-11 | 72.48 | Tonga | 20 - 25 | Canada | 71.56 |
| 17-Sep-11 | 88.84 | Australia | 6 - 15 | Ireland | 78.50 |
| 01-Oct-11 | 83.78 | France | 14 - 19 | Tonga | 72.48 |
| 08-Oct-11 | 84.54 | England | 12 - 19 | France | 79.72 |
| 08-Oct-11 | 83.14 | Ireland | 10 - 22 | Wales | 80.73 |

These upsets affected the tournament in various ways. Tonga participated in two upsets during the group stage that could have affected the final outcome of the tournament. Tonga lost to Canada in their second game of the tournament, and then went on to upset the eventual finalist, France, in a match that none of the models were able to predict. Had Tonga beaten Canada, it would have made it through to the quarter finals instead of France. This did not occur, and thus the tournament proceeded as expected, with the first two seeds of each group going through to the final round. What did occur in Group C was Ireland upsetting Australia, beating them convincingly by nine points, to create a knock out draw that was divided into Northern Hemisphere teams on one side and Southern Hemisphere teams on the other. The last two upsets both occurred in the Quarter Finals, a round in which only one of the four matches was won by the higher ranked team. England and Ireland lost to France and Wales, respectively, sending them home earlier than the higher ranked teams would have hoped. To show how unpredictable the final scores actually were, Figure 5-1 provides a visual representation of the data.

[Figure 5-1 Matches with Upsets. y-axis: Points Scored; x-axis: upset match; series: Actual Score Difference, Predicted Score Difference.]

The further the distance between the two sets of data, the bigger the upset was, leading to the biggest upset being Australia’s loss to Ireland.
Had everything gone as planned, the knock out stages would have had a different look, with Australia and New Zealand on opposite sides of the draw, which would have provided a different set of results.

5.3 Accuracy of Predictive Models

A total of four models were produced that provided predictions of the actual outcome of the RWC 2011. As stated previously in Table 5-1, certain models were able to predict the score of each team with greater accuracy than others. Figure 5-2 shows how accurately the models were able to predict the score difference. To discuss the accuracy of the models, the teams will be broken into three groups based on where their predictions are in comparison to the actual point difference.

5.3.1 Teams Underscoring

This category of teams is characterised by the fact that the models had predicted them to score more points than they actually did. To fall into this category, teams must fall short of their prediction by more than a try. The main reasons they are underperforming are most likely directly related to the two independent variables that this study is based on, but there are a couple of other variables that are worth discussing. The teams that are underscoring are:

Fiji -32.99
Namibia -22.7
Japan -12.85
Romania -6.92
Canada -5.45

The main characteristic of the teams in this category is that they are all ranked outside the top 10 in the world. This means that the amount of data on their actual performances is reduced in comparison to the higher ranked teams. It also means that competitive matches against the higher ranked teams are less common, which creates a bigger gap in the quality of play and makes it easier for the higher ranked teams to beat them. The lack of weekly rugby for these nations, where quality leagues are not in place, will also lead to lower fitness levels; therefore, the opposition will usually score a higher number of tries in the last quarter of the match.
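The three-way grouping used throughout section 5.3 compares each team’s actual point difference per game with its predicted value, using a try as the band width. A minimal sketch is given below, assuming a try is worth 5 points and using the actual and “Enter” Violating Assumptions values from Table 4-4. The function and category labels are illustrative only.

```python
TRY_POINTS = 5  # value of an unconverted try, used as the band width


def form_category(actual_diff, predicted_diff, band=TRY_POINTS):
    """Classify a team by how far its actual point difference is from prediction."""
    gap = actual_diff - predicted_diff
    if gap < -band:
        return "underscoring"
    if gap > band:
        return "over scoring"
    return "meeting expectations"


# (actual, predicted) per game, taken from Table 4-4.
namibia = form_category(-55.50, -32.80)
ireland = form_category(17.80, 17.84)
wales = form_category(22.00, 6.138)
```

Namibia falls more than a try short of its prediction, Ireland is almost exactly on it, and Wales exceeds it by well over two converted tries, which matches the groups these three teams are placed in.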
5.3.2 Teams Meeting Expectations

In this category, teams are doing what is expected of them, scoring within ±5 points a game of their prediction, and thus either keeping the opposition to within a try or scoring the number of tries that was expected of them. Teams in this category are spread across the rankings. These teams are:

Scotland -4.39
Russia -4.03
Australia -3.84
Ireland -0.04
Tonga +1
USA +1.36
Georgia +1.83
Argentina +2.91
Italy -0.33
France +3.66
New Zealand +4.11

It must be mentioned that even though Australia did come third in the overall tournament, they still did not perform as well as expected. This might be due to the fact that the style of rugby they usually play was not achievable for prolonged periods during the later knock out stages of the World Cup. On the other hand, New Zealand was beating teams by what was expected of them in the tournament. Even though their overall point difference was brought down in the final, when they only managed to win by one point, they were still able to keep a positive record. The three teams outside the top 10 in the world that performed better than expected provide a clear statement that progress is being made in the development of the lesser rugby nations.

5.3.3 Teams Over Scoring

These teams are scoring more than a try a game more than was predicted. The main reason they might be scoring this number of points is that they all had group stage matches where they had significant victories over their opposition. The teams over scoring are:

South Africa +6.33
England +6.52
Samoa +8.32
Wales +15.86

The fact that both England and South Africa scored more than expected is due to the fact that they both won their groups, putting in big scoring results in the process.
Samoa’s case is somewhat more interesting because they did not make it out of their group, but did manage to score 8.32 points a game more than expected. The case of Wales is quite impressive, because they break their model by scoring 15.86 points a game more than was predicted. The main reason their scoring average was so high is that not only did they score in every game, but the games they won were won by a combined margin of +159 points, while the three games they lost were lost by a combined margin of only 5 points.

[Figure 5-2 Comparison of Models to Actual Point Difference. Point difference per game (scale from -60 to 40) for each of the 20 teams; series: Actual Point Difference, "Enter" Violating Assumptions, "Enter" Satisfying Assumptions, "Stepwise" Violating Assumptions, "Stepwise" Satisfying Assumptions.]

5.4 Form

To completely understand what these models are providing, the form of teams must be analysed. The models provide data so that a final difference in score can be extrapolated. This provides an outcome for the entire tournament. As stated earlier, none of the models predicted all the correct outcomes, but what the models did successfully recreate was the number of points teams were expected to score. By comparing this with the points they actually scored, one can see that the closer these two pieces of information are to each other, the closer the teams are to their predicted form. Figure 5-3 shows how accurately the models predicted the form teams were in. The nearer both sets of data get to each other, the closer that team is to the form that was predicted before the tournament.
Figure 5-3 also identifies teams that are performing above their predicted form, as in the case of Wales, whose predicted score is far below the form they were actually in, and teams performing below it, such as Fiji, whose predicted form was far above the form they were actually in. Although previous studies of rugby union profiling have successfully assessed performance scores and relative form (Bracewell, 2003; Jones et al., 2008), this study provides quantifiable data on how close to form the teams actually were.

[Figure 5-3: Form of Teams in RWC 2011. Vertical axis: Points Scored, from -60 to 40. Series: Actual Point Difference; Predicted Point Difference.]

5.5 Limitation of Models

As accurate as the models are, there are certain variables that can never be accounted for. This study focused on two variables with a high number of data sources, whereas previous studies limited the data sources in order to include a further variable, rest days (O’Donoghue and Williams, 2004; O’Donoghue, 2005; O’Donoghue, 2006; O’Donoghue, 2009; O’Donoghue, 2010a). This study shows that a higher number of data points has increased the accuracy of the results.

Other factors that could increase the accuracy of the models are the ability to retain possession, the amount of ball kicked away, and the effectiveness of the set piece (Carter, 1996). However, including them would complicate the modelling process, because each team would need a specific variable to quantify these qualities of its game. They are nevertheless quantifiable characteristics that can be given a numerical value and included in the modelling process. Other characteristics might give the model more accuracy but would be nearly impossible to measure, such as team cohesion and how the players interact with each other (Bull et al., 2005; Gucciardi et al., 2008; Thelwell et al., 2005).
Mental toughness, the ability to cope better than one's opponents with the demands of a particular sport (Jones et al., 2002), is also an important psychological aspect that could be used to produce more accurate models. Other factors such as sleep, crowd distribution and referee bias could also be included (Nevill et al., 1996; Walters and Lovell, 2002).

This study focuses on predictive models that produce the point difference in matches. Other studies use their predictive model to simulate 1,000, 5,000 or even 10,000 matches and report the percentage of simulated matches that produce each specific outcome (O’Donoghue, 2006). With this method one can obtain deeper predictions, such as the actual chance that a specific team has of winning the tournament or of making it out of its group.

5.6 Comparison to other sports

This modelling technique lends itself extremely well to this type of team sport. To work consistently and accurately, the model needs high-scoring games so that a pattern can be predicted. In studies where football has been used as an example, the accuracy drops because of the higher chance of upsets. Other high-scoring invasion sports such as basketball or Australian football might provide a good basis for further research. Sports such as tennis or cricket, by contrast, do not lend themselves as well to predictive modelling.
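The simulation approach mentioned at the end of Section 5.5 can be sketched as follows. This is a minimal illustration and not the procedure used in the cited studies: it assumes that a regression model supplies a predicted point difference for a match and that the model's errors are normally distributed with a known standard deviation. The function name `simulate_match` and the input values in the usage example are hypothetical.

```python
import random


def simulate_match(predicted_diff, residual_sd, n_sims=10_000, seed=None):
    """Estimate outcome percentages for one match by simulation.

    predicted_diff: the model's predicted point difference (team A minus team B).
    residual_sd:    assumed standard deviation of the model's residuals.
    """
    rng = random.Random(seed)
    wins = draws = losses = 0
    for _ in range(n_sims):
        # Simulated margin = prediction + normally distributed error,
        # rounded because real margins are whole points.
        margin = round(rng.gauss(predicted_diff, residual_sd))
        if margin > 0:
            wins += 1
        elif margin < 0:
            losses += 1
        else:
            draws += 1
    return {
        "win": 100.0 * wins / n_sims,
        "draw": 100.0 * draws / n_sims,
        "loss": 100.0 * losses / n_sims,
    }


if __name__ == "__main__":
    # Hypothetical inputs: a 7-point predicted margin and a 15-point residual SD.
    for n in (1_000, 5_000, 10_000):
        result = simulate_match(7.0, 15.0, n_sims=n, seed=2011)
        print(n, {outcome: round(pct, 1) for outcome, pct in result.items()})
```

Repeating this for every fixture in a pool, and then for the knockout rounds, would give the percentage of simulated tournaments in which a team qualifies from its group or wins the competition.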
Chapter 6 Conclusion

6.1 Summary of Findings

From this study, it is evident that certain models are more accurate than others in predicting the outcome of a tournament. The fact that the models accurately predict approximately 89% of the results indicates that they can be used to predict tournament outcomes; the matches that are not correctly predicted provide interesting information because they are upsets. The form the teams are in, based on the predicted results, gives a clear idea of how specific teams are performing. Finally, there are many possibilities for how the data could be used to create further information and to provide quantifiable data on form.

6.2 Hypothesis Acceptance or Rejection

Following the research and results of this study, both null hypotheses can be rejected, as:

1. The predictive models were able to predict a significant number of results with the number of variables used in the study.
2. The models were able to predict the form teams were in compared to how they actually performed.

6.3 Future Research

A natural progression of the research conducted would be to apply it to the next RWC. This study could also be extended to other tournaments, such as the yearly tournaments played in the northern and southern hemispheres. The main expansion for this research is to increase the number of independent variables in the study, which would increase the probability of predicting the upsets that occur during the tournament.

References

Allison, P. (1999). Multiple Regression, Oakland: Pine Forge Press.

Barr-Smith, A. (2010). Sports Betting: United Kingdom. International Sports Law Review Pandektis, 3/4, 155-158.

Bathgate, A., Best, J.P., Craig, G. and Jamieson, M. (2002). A prospective study of injuries to elite Australian rugby union players. British Journal of Sports Medicine, 36(4), 265-269.

Bracewell, P.J. (2003). Monitoring meaningful rugby ratings. Journal of Sports Sciences, 21, 611-620.

Bull, S., Shambrook, C., James, W. and Brooks, J. (2005). Towards an understanding of mental toughness in elite English cricketers. Journal of Applied Sport Psychology, 17, 209-227.

Carron, A.V., Loughead, T.M. and Bray, S.R. (2005). The home advantage in sport competitions: Courneya and Carron’s (1992) conceptual framework a decade later. Journal of Sports Sciences, 23, 395-407.

Carter, A. (1996). Time and motion analysis and heart rate monitoring of a back-row forward in first class rugby union football. In Notational Analysis of Sport, I and II (edited by M. Hughes), pp. 145-160.

Colantuoni, L., Novazio, C., Izar, A. and Pozzi, M. (2010). Betting in Sport. International Sports Law Review Pandektis, 8, 3/4, 281-293.

Courneya, K.S. and Carron, A.V. (1992). The home advantage in sports competitions: a literature review. Journal of Sport and Exercise Psychology, 14, 13-27.

Croucher, J.S. (1998). Developing strategies in tennis. In Statistics in Sport (edited by J. Bennett), pp. 157-171, London: Arnold.

Duthie, G., Pyne, D. and Hooper, S. (2003). Applied physiology and game analysis of rugby union. Sports Medicine, 33(13), 973-991.

Field, A. (2009). Discovering Statistics using SPSS, 3rd Edition, London: Sage Publications Ltd.

Gucciardi, D.F., Gordon, S. and Dimmock, J.A. (2008). Towards an understanding of mental toughness in Australian football. Journal of Applied Sport Psychology, 20, 261-281.

Indonesia (2006). Distance [on-line]. www.indo.com/distance [accessed October 2011].

IRB (2012). Player Charter [on-line]. http://www.irb.com/aboutirb/organisation/index.html [accessed February 2012].

IRB (2012). World Ranking Explanation [on-line]. http://www.irb.com/rankings/explain/index.html [accessed February 2012].

James, N., Mellalieu, S., Jones, D. and Nicholas, M.P. (2005). The development of position-specific performance indicators in professional rugby union. Journal of Sports Sciences, 23(1).

Jones, G., Hanton, S. and Connaughton, D. (2002). What is this thing called mental toughness? An investigation of elite sport performers. Journal of Applied Sport Psychology, 14, 205-218.

Jones, N., James, N. and Mellalieu, S. (2008). An objective method for depicting team performance in elite professional rugby union. Journal of Sports Sciences, 26(7), 691-700.

Koning, R. and van Velzen, B. (2009). Betting Exchanges: The Future of Sports Betting? International Journal of Sport Finance, 4(1), 42-62.

Manly, B.F.J. (1994). Multivariate statistical methods: a primer, 2nd Edition, London: Chapman Hall.

Nevill, A.M., Balmer, N.J. and Williams, A.M. (2002). Can crowd reactions influence decisions in favour of the home side? In Science and Football IV (edited by W. Spinks, T. Reilly and A. Murphy), pp. 308-319, London: Routledge.

Nevill, A.M., Newell, S.M. and Gale, S. (1996). Factors associated with home advantage in English and Scottish soccer matches. Journal of Sports Sciences, 14, 181-186.

Nicholas, C.W. (1997). Anthropometric and physiological characteristics of rugby union football players. Sports Medicine, 23(6), 375-396.

Ntoumanis, N. (2001). A step-by-step guide to SPSS for sport and exercise studies, London: Routledge.

O’Donoghue, P.G. (2006). The effectiveness of satisfying the assumptions of predictive modelling techniques: an exercise in predicting the FIFA World Cup 2006. International Journal of Computer Science in Sport (e), 5(2), 5-16.

O’Donoghue, P.G. (2009). Predictions of the 2007 Rugby World Cup and Euro 2008. 3rd International Workshop of the International Society of Performance Analysis of Sport, Lincoln, UK, 6th-7th April 2009.

O’Donoghue, P.G. (2010a). Research methods for sports performance analysis, London: Routledge.

O’Donoghue, P.G. (2010b). The effectiveness of satisfying the assumptions of predictive modelling techniques: an exercise in predicting the FIFA World Cup 2010. International Journal of Computer Science in Sport, 9(3), 15-27.

O’Donoghue, P.G. (2012). The Assumptions Strike Back! A comparison of prediction models for the 2011 Rugby World Cup,

O’Donoghue, P.G. and Williams, J. (2004). An evaluation of human and computer-based predictions of the 2003 rugby union world cup. International Journal of Computer Science in Sport (e), 3(1), 5-22.

O’Donoghue, P.G., Dubitzky, W., Lopes, P., Berrar, D., Lagan, K., Hassan, D., Bairner, A. and Darby, P. (2004). An evaluation of quantitative and qualitative methods of predicting the 2002 FIFA World Cup. Journal of Sports Sciences, 22, 513-514.

Passos, P., Araújo, D., Davids, K. and Shuttleworth, R. (2008). Manipulating Constraints to Train Decision Making in Rugby Union. International Journal of Sports Science and Coaching, 3(1), 125-140.

Stefani, R. (1998). Predicting Outcomes. In Statistics in Sport (edited by J. Bennett), pp. 249-275, London: Arnold.

Tabachnick, B.G. and Fidell, L.S. (2007). Using multivariate statistics, 5th edn, New York: Harper Collins.

Thelwell, R., Weston, N. and Greenlees, I. (2005). Defining and understanding mental toughness in soccer. Journal of Applied Sport Psychology, 17, 326-332.

Walters, A. and Lovell, G. (2002). An examination of the homefield advantage in a professional English soccer team from a psychological standpoint. Football Studies, 5, 46-59.