A Probabilistic Model for Optimal Soccer Shots

Kenneth T. Co

May 16, 2016

Acknowledgements

I would like to thank the Woodrow Wilson Undergraduate Research Fellowship Program for this amazing opportunity to do undergraduate research. A huge thank you to Ami Cox for her unending support and another to Professor John Wierman for his guidance with my research. I would also like to thank my friends and family for being there to support me.

Contents

1 Introduction
  1.1 Background & Motivation
  1.2 Paper by Vars
2 Assumptions
  2.1 Situation
  2.2 Player
  2.3 Goalkeeper
3 Probabilistic Model
  3.1 Implementation
  3.2 Data & Results
    3.2.1 λ Values
    3.2.2 Optimal Shot Values
  3.3 Extending to Two Dimensions
4 Discussion & Conclusion
  4.1 Overview
  4.2 Conclusion & Summary

Chapter 1

Introduction

1.1 Background & Motivation

Despite the amount of practice and work players put into practicing soccer, they will still make shots that deviate from their intended target. No one can kick a ball with perfect accuracy every time.
If they could, their best strategy would be to aim for the corners of the goal, just inside the posts, every time, as those are the hardest places for the goalkeeper to reach. However, professional players do not have perfect accuracy and consistency. It is therefore important to take this into account when developing a strategy to score goals.

My paper aims to develop a simple probabilistic framework that allows a quantitative approach to determining where players should aim their shots in general game situations. Building a successful model will enable the analysis of open play situations in soccer in purely quantitative and theoretical terms.

1.2 Paper by Vars

Summary

Vars (2008) tried to formulate an optimal shooting strategy for soccer. He created a one-dimensional model to explain, in theory, where a player should aim his shot to get the best possible chance of scoring. He then compared his results with actual scoring percentages of several players, taking note of the players’ shooting strategies.

Vars’ model uses a normal distribution to represent the player’s shot distribution, with the mean being the point at which the shot is aimed. He then blocks off part of the goal area to represent where a goalkeeper could save. His reasoning for the normal distribution is that competent players do not miss more on one side than the other, and proficient players, bar the rare miskick, shoot the ball in the general direction of their intended shot. Therefore it is reasonable to assume that the base frequency distribution of the shot is symmetric and thickest at the center. This leads him to postulate that the shot distribution is approximately normal.

Shortcomings

Vars acknowledges that his model leaves out several important factors. He discusses the effects of these factors on the chances of scoring a goal. That discussion is left for the end of this paper. For now, here is a summary of the five factors.
The first factor is that goalkeepers are not perfect walls, so blocking out parts of the goal is unrealistic. He suggests varying the value of the save percentage rather than making it an absolute 100%. The second factor is that his one-dimensional model simplifies a goal mouth that is actually two-dimensional. The third factor is the positioning of the player. Here he mentions two relevant components: the player’s distance from the goal and the player’s angle from the center of the goal. The fourth factor is the time it takes for the ball to reach the goal. This is greatly affected by the speed of the shot and the distance of the player from the goal. The fifth factor is the goalkeeper’s behavior. A critical element of shooting strategy is tricking the goalkeeper into diving the wrong way. He notes, however, that this game theory component of shooting is less important in the ordinary run of play, where players have only a split second to make decisions.

Improvements

My paper expands and builds on the ideas behind Vars’ theoretical model. Although he analyzes concrete data in detail, my paper does not focus on that analysis, since it lies outside his theoretical framework. My model uses a normal distribution as the base frequency distribution for the player’s shots, as Vars’ does.

My paper builds a more rigorous and formal description of the probabilistic model. I describe the various factors, such as the goalkeeper and the player, with appropriate probability distributions, and then calculate the optimal spots for shot placement.

Vars’ discussion of the important factors his model left out is used as a starting point for improving his idea. Aside from the factors he mentioned, I also add some of my own and expound on how these factors can affect the model and ultimately the strategy behind scoring a goal.

Chapter 2

Assumptions

2.1 Situation

This chapter details the basic assumptions I make for the model.
First, there are three terms used often in this paper: shot distribution, which refers to the probability distribution of the player’s shot; save distribution, which refers to the probability distribution of the goalkeeper’s save; and optimal shot, which refers to the point where the player maximizes his chance of scoring if he aims his shot there.

The model considers a generalized open play situation where there is a player facing a goalkeeper defending a goal area. These are the only three components. It is assumed that

1. A player is at a distance from the goal and is at a reasonably wide angle in front of the goal.
2. The goalkeeper positions himself in the center of the player’s line of sight between the goal posts.
3. This is in open play where the player is about to shoot.
4. The player and goalkeeper have no “stronger side” when shooting or saving respectively. That is, they do not shoot or save better to their left than their right or vice-versa.
5. The situation is quick and the shot is made in a split second.
6. There are no obstructions between the player and goal aside from the goalkeeper.

The first and third assumptions are made because that is the general scenario considered. The second assumption follows because the goalkeeper would like to maximize his chances of reaching a shot anywhere on the goal. So it is best for the goalkeeper to place himself in the middle of the player’s line of sight with respect to the goal. The fourth, fifth, and sixth assumptions are simplifying ones.

The fourth assumption is to simplify computations. If the player has a stronger side, then clearly that side would be preferred. However, if the goalkeeper has a stronger side, he may position himself differently. There is also an interaction when the player’s stronger side matches the goalkeeper’s stronger side. This will complicate computations.
If the player’s stronger side matches the goalkeeper’s weaker side, the choice for the player is obvious. However, the goalkeeper will know this and make decisions differently. The calculations are more convoluted for these types of situations. That is why removing the stronger side and making the additional fifth assumption eliminates the game theory element from the model. The split second assumption makes the actions of the player and goalkeeper more reliant on muscle memory and instinct than on decision making. If the reader is interested in game theory, there is extensive literature available on its application in soccer and, in particular, its application to penalty kicks.

It is also important to identify some assumptions that are not made. These include

• How the player is handling the ball: he could be running onto the ball, or jogging while receiving the ball.
• How the player strikes the ball: the power, spin, and flight of the ball.
• What the exact position of the player is relative to the goal.
• Which part of the body the player strikes the ball with.

The player can be receiving the ball, running onto the ball, or already holding the ball before making the shot. These various situations may have an impact on the player’s overall shot accuracy. When players strike the ball, they can vary its movement in different ways. Doing so may make it easier or more difficult for the goalkeeper to make the save, while changing the amount of control the players themselves have over where the ball will go. In soccer, the player can strike the ball with any part of his body except the arms and hands. There could be variations in where the ball goes depending on whether it is struck with the player’s left foot, right foot, forehead, chest, or any other body part.

The model does not consider all these complexities just yet. These and other factors are discussed further in Chapter 4.
To summarize, the situation that is modeled is a simple open play situation with three components: a player facing a goalkeeper who is defending a goal area.

2.2 Player

The fourth assumption made in Section 2.1, that the player does not shoot better on one side than the other, implies that the shot distribution of the player is necessarily symmetric in its horizontal component.

For decent players it makes sense that they shoot the ball in the general direction of their intended target. Therefore it is reasonable to assume that the base frequency distribution of the shot is thickest at the center. Because the type of shot and the body part with which the player makes contact with the ball are not distinguished, it is reasonable to combine these probabilities and assume that the aggregate distribution is approximately a Normal distribution.

To account for mis-kicks, deflections, and outlier random events, a Cauchy distribution is added, as it is a standard model for random direction. This is motivated by Buffon’s Needle Problem: if a needle is thrown down at a random angle, the point where its extended line hits a fixed line in the plane has a Cauchy distribution.

To summarize, there is a mixture model for the shot distribution. The base is a Normal distribution to represent the actual shot of the player, and there is an added Cauchy distribution to represent the probability of mis-kicks.

Player’s Skill

Better players are more consistent and have less variance in their shots. Thus the skill level of the player is inversely related to the variance of their shot distribution. Aside from the variance, worse players may be more susceptible to mis-kicks. Hence the proportion of the Cauchy distribution can be increased to reflect this for lower skilled players.

2.3 Goalkeeper

The goalkeeper is assumed to have a reasonably high chance of saving shots within his reach.
This save percentage will drop off for regions further away from his starting position at the center of the goal.

The fourth assumption made in Section 2.1, that the goalkeeper does not save better on one side than the other, implies that the save distribution of the goalkeeper is necessarily symmetric in its horizontal component.

Goalkeeper’s Skill

The goalkeeper is assumed to operate at a high and consistent level. To find the optimal shot, randomness and inconsistency from the goalkeeper should not be a factor. Optimizing the chance to score a goal should come down to the superior shot placement of the player, rather than the incompetence of the goalkeeper. Therefore there must be a near 100% chance for the goalkeeper to save shots near his starting position. The drop-off of the save probability for areas further from the center should not be too drastic.

Goal Area

Shots that go into the designated goal area are considered to be goals. Shots outside the goal area are not goals (misses). These misses can be thought of as shots with a 100% chance of being saved.

Chapter 3

Probabilistic Model

3.1 Implementation

The one-dimensional model is a good basis. The model was implemented in MATLAB version R2015b. The three components of the model (the player, the goalkeeper, and the goal area) are represented by the shot distribution, save distribution, and goal area respectively. These are the parameters of the one-dimensional model.

Goal Area

The goal area is the interval [−1, 1].

Shot Distribution

The shot distribution is the mixture λN + (1 − λ)C, where N is a Normal distribution and C is a Cauchy distribution. N has mean µ ∈ [−1, 1]; the player aims at the point µ within the goal interval [−1, 1]. N has variance σ², which decreases as the skill level of the player increases. The skill level of the player, s, ranges from 0 to 1. The variance is computed such that 100·s% of N lies within the interval [−1, 1] when µ = 0 (the player aims straight down the center).
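The calibration of the spread from the skill level can be written out explicitly. The following is a short derivation under assumed notation that does not appear in the original: σ is taken to be the standard deviation of N, and Φ is the standard Normal distribution function.

```latex
P(-1 \le N \le 1 \mid \mu = 0)
  = 2\,\Phi\!\left(\frac{1}{\sigma}\right) - 1 = s
\quad\Longrightarrow\quad
\sigma = \frac{1}{\Phi^{-1}\!\left(\frac{1+s}{2}\right)}
```

For example, s = 0.80 gives σ ≈ 1/1.2816 ≈ 0.78, a variance of roughly 0.61.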
The value of λ ∈ [0, 1] represents the strength of the Normal component. Lower values of λ imply that the Cauchy component has a greater effect, meaning an increase in the influence of mis-kicks, deflections, and such.

Save Distribution

The probability of the goalkeeper saving at point x ∈ [−1, 1] is given by 1 − x². This makes the save probability 100% at x = 0, dropping off gradually to 0% at x = ±1.

Optimal Point

The chance of scoring a goal at point x ∈ [−1, 1] is computed to be

p(x) = (λN + (1 − λ)C) · (1 − (1 − x²)) = (λN + (1 − λ)C) · x²

Let t ∈ [−1, 1] be the optimal shot, that is, the point the player should aim at to maximize his chances of scoring a goal. This is found by computing p(x) for all x ∈ [−1, 1]. We then choose t such that p(t) = max_{x ∈ [−1, 1]} p(x). The chance of scoring a goal at t is the highest chance of scoring a goal for that player.

3.2 Data & Results

3.2.1 λ Values

[Figure: Optimal shot for different values of λ. For these figures we have Optimal Shot t vs. λ for the first row and Optimal Shot Probability p(t) vs. λ for the second row. For each column λ has the values 0, 0.5, and 1 for the first, second, and third columns respectively.]

One of the questions raised when modeling was whether or not random variables for external factors had a noticeable effect on the optimal shot location. The external factors could include mis-kicks, deflections, and outlier random events.

The results show that the strength of the Cauchy distribution has no effect on the location of the optimal shot unless it replaces the base distribution entirely at λ = 0. As λ decreases, the growing Cauchy component only lowers the probability of scoring a goal. This phenomenon could be explained by the Central Limit Theorem, where the sum of sufficiently many random variables becomes approximately normal. Hence, external random factors on the player’s shot could be accounted for by simply increasing the variance of the base Normal distribution. For further analysis we can fix λ = 1.
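The model of Section 3.1 can be sketched in code. The sketch below is written in Python, not the MATLAB of the original implementation, and it rests on several assumptions that the text does not state: the chance of scoring when aiming at µ is read as the integral of p(x) over the goal interval [−1, 1]; σ is taken as the standard deviation of N, calibrated so that a fraction s of N lies in [−1, 1] when µ = 0; the Cauchy component is centered at µ with a hypothetical scale parameter `gamma`; and all function names are illustrative.

```python
import math
from statistics import NormalDist


def sigma_from_skill(s):
    """Standard deviation such that a fraction s of N(0, sigma^2) lies in [-1, 1].

    Requires 0 < s < 1 (the inverse CDF is undefined at s = 1).
    """
    return 1.0 / NormalDist().inv_cdf((1.0 + s) / 2.0)


def shot_density(x, mu, sigma, lam, gamma):
    """Mixture density lam * Normal + (1 - lam) * Cauchy, both centered at mu."""
    normal = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    cauchy = gamma / (math.pi * (gamma ** 2 + (x - mu) ** 2))  # gamma: assumed scale
    return lam * normal + (1.0 - lam) * cauchy


def goal_probability(mu, s, lam=1.0, gamma=0.5, n=1000):
    """Midpoint-rule integral of shot density times x^2 over the goal [-1, 1].

    x^2 is the chance that the keeper fails to save at x, i.e. 1 - (1 - x^2).
    """
    sigma = sigma_from_skill(s)
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        total += shot_density(x, mu, sigma, lam, gamma) * x * x
    return total * h


def optimal_shot(s, lam=1.0, gamma=0.5, steps=200):
    """Grid search over aim points in [0, 1]; by symmetry -t is equally good."""
    candidates = [i / steps for i in range(steps + 1)]
    best = max(candidates, key=lambda mu: goal_probability(mu, s, lam, gamma))
    return best, goal_probability(best, s, lam, gamma)


if __name__ == "__main__":
    for skill in (0.80, 0.90, 0.99):
        t, p = optimal_shot(skill)
        print(f"s = {skill:.2f}: aim at t = {t:.3f}, goal probability = {p:.4f}")
```

Under this reading, with λ = 1, `goal_probability(0.0, 0.80)` evaluates to about 0.213, in line with the first row of the table in Section 3.2.2. The grid step of 0.005 limits how precisely the optimal shot is located.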
3.2.2 Optimal Shot Values

For skill levels s < 0.83 it was found that the optimal point is at t = 0, so players should aim down the center of the goal when they are at this skill level or lower. As s increases, for values ≥ 0.83, the optimal shot t moves away from the center at a fast rate and then plateaus at about 0.77 as s approaches 1.

As s increases, the optimal shot probability p(t) increases as expected, going from 21.3% at s = 0.80 to 30.8% at s = 0.99.

Skill Level s    Optimal Shot t    Goal Probability p(t)
0.80             0                 0.213215
0.81             0                 0.213654
0.82             0                 0.213920
0.83             0.089             0.214002
0.84             0.280             0.214249
0.85             0.378             0.214904
0.86             0.448             0.215971
0.87             0.503             0.217463
0.88             0.548             0.219398
0.89             0.586             0.221804
0.90             0.618             0.224721
0.91             0.645             0.228206
0.92             0.669             0.232338
0.93             0.689             0.237227
0.94             0.706             0.243038
0.95             0.721             0.250022
0.96             0.734             0.258588
0.97             0.745             0.269483
0.98             0.755             0.284303
0.99             0.765             0.307808

Table: Optimal shot and the highest goal probability for each skill level.

[Figure] This figure shows the optimal shot probability p(t) as skill level s increases. As expected, the chance of scoring increases as the skill level increases. It achieves the largest probability, p(t) ≈ 0.31, as s approaches 1.

[Figure] This figure shows the distance of the optimal shot t from the center as skill level s increases. The optimal shot t is at the center for skill levels s ≤ 0.82, but it moves outward for s ≥ 0.83, with t ≈ 0.77 as s approaches 1.

[Figure] This figure shows the goal probabilities (y-axis) for each skill level at each point on the goal area (x-axis). Note that the center of the x-axis represents the center of the goal (x = 0). The different colors correspond to different skill levels: light blue (s = 0.80), green (s = 0.85), purple (s = 0.90), yellow (s = 0.95), orange (s = 0.975), and blue (s = 0.99).

To summarize, the optimal shot location for high-skilled players is ≈ 0.77 of the distance from the center of the goal to either post, while for low-skilled players it is best to aim down the center of the goal.
This conclusion reflects our previous discussion and data.

3.3 Extending to Two Dimensions

One of the most obvious improvements that can be made is to extend the model into two dimensions. Because it is a vital step in making the model more realistic, this discussion deserves its own section.

To extend the current model to two dimensions, the goal area could be extended to cover the interval [−1, 1] on the x-axis and [0, 1] on the y-axis. For the shot and save distributions, the following ideas have to be considered.

Shot Distribution

The goal area can be divided into two components, the horizontal and vertical components. These are the x- and y-axes respectively. This subsection focuses on the possible candidates for the shot distribution on the vertical axis (y-axis).

There is no compelling reason for the horizontal and vertical shot distributions to be dependent. Just as in kinematics one can treat the x- and y-axes as independent systems, the same can be done for the player’s shot distribution. When the probabilities are computed on each axis, they can then be multiplied to get the goal-scoring chance at that point in two dimensions.

It is reasonable to have different distributions for the horizontal and vertical aspects of the shot, as the forces acting on each are not the same. Gravity is a factor for the vertical component of the shot. Additionally, the ball can only go from the ground up. The horizontal component does not have a force such as gravity acting upon it, and the ball is free to go left or right. This justifies differences between the distributions used.

In soccer, the horizontal direction of the shot when kicking a ball is easy to control. In general, where the player points his body is where the ball will go. Relative to that, the height of the shot is much more difficult to control, and so lends itself to higher volatility.
To add to that, gravity is a large external force that acts on the vertical component of the ball, making it harder to control. This justifies a wider spread for the vertical shot distribution than the horizontal one.

Possible candidates for the vertical shot distribution are one-sided distributions. Modifying the Normal distribution for this is also possible.

Save Distribution

Because of human anatomy, it is natural for a goalkeeper’s horizontal and vertical save distributions to be dependent. For example, it is much easier for goalkeepers to block shots within the reach of their arms than shots that are reachable only with their legs. This is because it is easier to be precise with the arms than the legs.

To achieve the desired probability distribution, a piecewise approach is the best option. For heights from the waist up to neck level, there can be very high chances of saving near the center with a gradual drop-off further from the center. From the waist down and above head level, the chances of saving near the center can be high, but they should drop off quickly further from the center.

Chapter 4

Discussion & Conclusion

4.1 Overview

This section goes over all the other factors that should be considered in future iterations of the model. The discussions include

• what these factors are,
• their effects on the open play scenario, and
• how each can be incorporated into the model.

Positioning

The positions of all the components (player, goalkeeper, and goal area) with respect to each other are a big factor in the player’s chance of scoring a goal. We use the player as our reference point. That is, everything from here on is taken from the perspective or viewpoint of the player. Angles and distances will be relative to the player.

For positioning, there are two components for the player: his distance from the goal and his angle from the center of the goal.
For distance, if the player gets closer to the goal, then this is equivalent to tightening the player’s shot distribution and increasing the drop-off of the goalkeeper’s save distribution. This is because the player will be nearer, allowing a more accurate shot, and the player’s shot will also reach the goal faster, giving the goalkeeper less time to react. This can be done by decreasing the variance of the player’s shot distribution and making the goalkeeper’s save distribution decrease faster the further it is from the center. The opposite effect can be applied if the player is farther from the goal. If the goalkeeper is closer to the player, this increases the goalkeeper’s base save probability and decreases the rate at which it drops off further from the center.

For the angle, the widest view the player can have of the goal is when he is centered with respect to the goal. The tighter the angle, the thinner the goal will appear in front of the player. This can be achieved by decreasing the x-axis component of the goal area. So at a wide angle, the goal area will be the interval [−1, 1]. At a tight angle, the goal area will appear as [−k, k] for some 0 < k < 1.

However, these changes do not capture all the nuance involved with positioning. Consider the situation where the player is at a tight angle from goal with the right post closer than the left post, all from the player’s perspective. In theory, because the left post is farther, the goalkeeper will have more time to react to shots to that side than shots to the right side. Knowing this, the goalkeeper could choose to anticipate shots to the right side or not. The player will be aware of this as well. This element of game theory is apparent. To add to this, since there is more space to the left of the player, the player can curve the ball around and behind the goalkeeper, giving a higher chance of scoring a goal.
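The two positioning adjustments described above, a narrower apparent goal [−k, k] and a tighter or wider shot distribution, can be illustrated with a small Python sketch. Everything specific here is an assumption made for illustration and not part of the original model: the parameter names `k` and `spread`, the choice to fix λ = 1, the reading of the scoring chance as an integral over the goal interval, and leaving the save distribution 1 − x² unchanged (the text suggests it should also depend on distance, but gives no functional form).

```python
from statistics import NormalDist


def goal_probability(mu, s, k=1.0, spread=1.0, n=2000):
    """Chance of scoring when aiming at mu, for a goal that appears as [-k, k].

    s      : skill level in (0, 1); a fraction s of shots aimed at 0 lands in [-1, 1]
    k      : apparent half-width of the goal (hypothetical angle parameter)
    spread : multiplier on the shot's standard deviation
             (hypothetical distance parameter; < 1 means closer to goal)
    """
    sigma = spread / NormalDist().inv_cdf((1.0 + s) / 2.0)
    shot = NormalDist(mu, sigma)
    h = 2.0 * k / n
    total = 0.0
    for i in range(n):
        x = -k + (i + 0.5) * h
        # shot density times the chance the keeper fails to save, 1 - (1 - x^2)
        total += shot.pdf(x) * x * x
    return total * h


if __name__ == "__main__":
    print("wide angle :", goal_probability(0.0, 0.80, k=1.0))
    print("tight angle:", goal_probability(0.0, 0.80, k=0.6))
    print("closer shot:", goal_probability(0.7, 0.95, spread=0.8))
```

With this setup, shrinking k can only lower the scoring chance for a fixed aim point, since part of the original goal interval is removed from the integral. The effect of `spread` depends on where the player aims, which is the kind of interaction a fuller treatment of positioning would need to explore.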
Type of Shot

This factor includes how hard the ball is struck, how the ball is struck, and which part of the body it is struck with. The latter two factors are much more difficult to implement, so we will focus only on the first factor.

How hard the ball is struck will affect the speed of the shot. A stronger shot gives the goalkeeper less time to react and therefore a smaller chance of saving, but it is more difficult to control, making the shot less accurate. Thus, to implement this, stronger shots would decrease the goalkeeper’s save distribution while increasing the variance of the player’s shot distribution.

Symmetry & Asymmetry

Several other symmetric and asymmetric distributions can be tried for the player’s shot distribution, in place of the Normal distribution, to see their effects and how they fit.

Surfaces on the human body are naturally rugged, however small the asymmetry may be. So one could consider an asymmetric model for the shot distribution. If we only consider a specific foot of the player, its shot distribution will always be asymmetric.

However, a counterargument in favor of a symmetric model could be made by considering both of the player’s feet. If the shot distribution of each foot is asymmetric, but they happen to be mirror images of each other, then putting them together gives a symmetric distribution. Whether a player is left-footed, right-footed, or ambidextrous is another, more complicated matter.

Asymmetry can also be considered for a goalkeeper, who may have a “better side” that he saves better from. However, that information may not always be available to the player who is shooting, so it is reasonable to have the player assume a symmetric save distribution when deciding where best to place his shot.

Additional Players

Increasing the number of moving players, such as attackers and defenders, has a significant impact on where to shoot. There will be multiple moving parts and they all have complex interactions.
The actual implementation will be significantly more difficult.

Complex Factors

The final set of factors includes the more subtle and complex factors. These are, in general, very difficult to measure or implement. The physical factors include air resistance, turf condition, fitness of the players, and others. The psychological factors could include mind games (game theory), mental composure of the players, teamwork, attitude, and many others.

4.2 Conclusion & Summary

In our one-dimensional model, it was concluded that the optimal shot location for high-skilled players is approximately 0.77 of the distance from the center of the goal to either post, while for low-skilled players it is best to aim down the center of the goal.

Recall the five factors Vars acknowledged that he left out of his model. These factors are (1) the goalkeeper not being a perfect wall, (2) the goal area being two-dimensional, (3) the positioning of the players, (4) the time it takes for the ball to reach the goal, and (5) deceiving the goalkeeper.

Our one-dimensional model addresses the first factor. Implementation details for the second and third factors were also discussed. However, there were no concrete suggestions for the more complex fourth and fifth factors.

Overall, the first improvement we recommend is to extend the model into two dimensions. After that, positioning should be next. Adding these will greatly improve the model and bring it closer to real life.

Bibliography

[1] Vars, Fredrick E. Missing well: optimal targeting of soccer shots. The University of Alabama, 2008. Retrieved from http://ssrn.com/abstract=1268872