A Probabilistic Model for Optimal Soccer Shots

Kenneth T. Co
May 16, 2016
Acknowledgements
I would like to thank the Woodrow Wilson Undergraduate Research Fellowship Program
for this amazing opportunity to do undergraduate research. A huge thank you to Ami Cox
for her unending support and another to Professor John Wierman for his guidance with my
research. I would also like to thank my friends and family for being there to support me.
Contents
1 Introduction
1.1 Background & Motivation
1.2 Paper by Vars
2 Assumptions
2.1 Situation
2.2 Player
2.3 Goalkeeper
3 Probabilistic Model
3.1 Implementation
3.2 Data & Results
3.2.1 λ Values
3.2.2 Optimal Shot Values
3.3 Extending to Two Dimensions
4 Discussion & Conclusion
4.1 Overview
4.2 Conclusion & Summary
Chapter 1
Introduction
1.1 Background & Motivation
Despite the amount of work players put into practicing soccer, they will still make shots that deviate from their intended target. No one can kick a ball with perfect accuracy every time. If they could, their best strategy would be to always aim for the corners of the goal just inside the posts, as those are the hardest places for the goalkeeper to reach. However, professional players do not have perfect accuracy and consistency, so it is important to take this into account when developing a strategy to score goals.
My paper aims to develop a simple probabilistic framework that allows a quantitative approach to determining where players should aim their shots in general game situations. A successful model would enable the analysis of open-play situations in soccer in purely quantitative and theoretical terms.
1.2 Paper by Vars
Summary
Vars (2008) tried to formulate an optimal shooting strategy for soccer. He created a one-dimensional model to explain, in theory, where a player should aim his shot to get the best possible chance of scoring. He then compared his results with the actual scoring percentages of several players, taking note of the players’ shooting strategies.
Vars’ model uses a normal distribution to represent the player’s shot distribution, with the mean at the point where the shot is aimed. He then blocks off part of the goal area to represent the region where a goalkeeper could make a save.
His reasoning for the normal distribution is that competent players do not miss more on one side than the other, and that proficient players, bar the rare miskick, shoot the ball in the general direction of their intended target. It is therefore reasonable to assume that the base frequency distribution of the shot is symmetric and thickest at the center, which leads him to postulate that the shot distribution is approximately normal.
Shortcomings
Vars acknowledges that his model leaves out several important factors and discusses their effects on the chances of scoring a goal. That discussion is revisited at the end of this paper; for now, the five factors are summarized below.
The first factor is that goalkeepers are not perfect walls, so blocking out parts of the goal is
unrealistic. He suggests varying the value of the save percentage rather than making it an
absolute 100%.
The second factor is that his one-dimensional model simplifies a goal mouth that is actually two-dimensional.
The third factor is the positioning of the player. With this third factor, he mentions two
relevant components: the player’s distance from the goal and the player’s angle from the
center of the goal.
The fourth factor is the time it takes for the ball to reach the goal. This is greatly affected
by the speed of the shot and the distance of the player from the goal.
The final factor is the goalkeeper’s behavior. A critical element of shooting strategy is tricking the goalkeeper into diving the wrong way. However, Vars notes that this game-theoretic component of shooting is less important in the ordinary run of play, where players have only seconds to make split-second decisions.
Improvements
My paper expands and builds on the ideas behind Vars’ theoretical model. Although Vars also analyzes concrete data in detail, this paper does not, since that analysis lies outside his theoretical framework.
My model uses a normal distribution as the base frequency distribution for the player’s shots, as Vars does. My paper builds a more rigorous and formal description of the probabilistic model: I describe the various factors, such as the goalkeeper and the player, with appropriate probability distributions, and then calculate the optimal spots for shot placement.
Vars’ discussion of the important factors his model left out is used as a starting point for improving on his idea. Aside from the factors he mentioned, I also add some of my own and expound on how these factors can affect the model and, ultimately, the strategy behind scoring a goal.
Chapter 2
Assumptions
2.1 Situation
This chapter details the basic assumptions I make for the model.
Three terms are used often in this paper: shot distribution, which refers to the probability distribution of the player’s shot; save distribution, which refers to the probability distribution of the goalkeeper’s save; and optimal shot, which refers to the point the player should aim at to maximize his chance of scoring.
The model considers a generalized open-play situation in which a player faces a goalkeeper defending a goal area. The player, the goalkeeper, and the goal area are the only three components. It is assumed that
1. The player is at a distance from the goal and at a sufficiently wide angle in front of it.
2. The goalkeeper positions himself in the center of the player’s line of sight between
the goal posts.
3. This is in open play where the player is about to shoot.
4. The player and goalkeeper have no “stronger side” when shooting or saving respectively.
That is, they do not shoot or save better to their left than their right or vice-versa.
5. The situation is quick and the shot is made in a split second.
6. There are no obstructions between the player and goal aside from the goalkeeper.
The first and third assumptions are made because they describe the general scenario under consideration. The second assumption follows because the goalkeeper wants to maximize his chances of reaching a shot anywhere on the goal, so it is best for him to place himself in the middle of the player’s line of sight with respect to the goal.
The fourth, fifth, and sixth assumptions are simplifying ones.
The fourth assumption simplifies computations. If the player has a stronger side, then clearly that side would be preferred. If the goalkeeper has a stronger side, however, he may position himself differently. There is also an interaction when the player’s stronger side matches the goalkeeper’s stronger side, which would complicate computations further.
If the player’s stronger side matches the goalkeeper’s weaker side, the choice for the player is obvious. However, the goalkeeper will know this and make decisions differently. Removing this possibility, together with the additional fifth assumption, eliminates the game theory element from the model, since the calculations are much more convoluted for these types of situations. The split-second assumption makes the actions of the player and goalkeeper depend on muscle memory and instinct rather than on deliberate decision making.
Readers interested in game theory will find extensive literature on its application to soccer, in particular to penalty kicks.
It is also important to identify some assumptions that are not made. These include
• How the player is handling the ball - he could be running onto the ball, or jogging
while receiving the ball.
• How the player strikes the ball - the power, spin, and flight of the ball.
• What the exact position of the player is relative to the goal.
• Which part of the body the player strikes the ball with.
The player can be receiving the ball, running onto the ball, or already holding the ball before
making the shot. These various situations may have an impact on the player’s overall shot
accuracy.
When the player strikes the ball, he can vary its movement in different ways. Doing so may make the save easier or more difficult for the goalkeeper, while also changing how much control the player has over where the ball will go.
In soccer, the player can strike the ball with any part of his body except the arms and hands. Where the ball goes could vary depending on whether it was struck with the player’s left foot, right foot, forehead, chest, or any other body part.
The model does not consider all these complexities just yet. These and other factors are
discussed further in Chapter 4.
To summarize, the modeled situation is a simple open-play scenario with three components: a player facing a goalkeeper who defends a goal area.
2.2 Player
The fourth assumption made in Section 2.1, that the player does not shoot better on one
side than the other, implies that the shot distribution of the player is necessarily symmetric
in its horizontal component.
For decent players, it makes sense that they shoot the ball in the general direction of their intended target. It is therefore reasonable to assume that the base frequency distribution of the shot is thickest at the center.
Because the type of shot and the body part with which the player strikes the ball are not distinguished, it is reasonable to combine these probabilities and assume that the aggregate distribution is approximately Normal.
To account for mis-kicks, deflections, and other random outlier events, a Cauchy distribution is added, as it is a standard model for a random direction. This is motivated by Buffon’s Needle Problem: if a line through a fixed point is drawn at a uniformly random angle, the point where it meets a fixed line in the plane follows a Cauchy distribution.
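The random-angle construction can be checked numerically. The following Python sketch (an illustration, not the paper's MATLAB code) draws a line through the point (0, 1) at an angle θ chosen uniformly from (−π/2, π/2) and records where it meets the x-axis, namely x = tan θ. For a standard Cauchy variable exactly half of the mass lies in [−1, 1], which corresponds to |θ| ≤ π/4, so the simulated fraction should be close to 0.5. The function name, seed, and sample size are arbitrary choices.

```python
import math
import random


def cauchy_from_random_angle(rng):
    """Where a randomly oriented line through (0, 1) meets the x-axis.

    A line through (0, 1) at angle theta from the vertical, with theta uniform
    on (-pi/2, pi/2), crosses y = 0 at x = tan(theta): a standard Cauchy variate.
    """
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    return math.tan(theta)


rng = random.Random(2016)
samples = [cauchy_from_random_angle(rng) for _ in range(100_000)]

# For a standard Cauchy, P(-1 <= X <= 1) = P(|theta| <= pi/4) = 1/2.
frac_inside = sum(-1.0 <= x <= 1.0 for x in samples) / len(samples)
print(f"fraction of crossings in [-1, 1]: {frac_inside:.3f}")
```

The heavy tails show up directly: a few crossings land very far from the origin, which is why this component is used for mis-kicks and deflections.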
To summarize, there is a mixture model for the shot distribution. The base is a Normal
distribution to represent the actual shot of the player and there is an added Cauchy distribution to represent the probability of mis-kicks.
Player’s Skill
Better players are more consistent and have less variance in their shots. Thus a higher skill level corresponds to a smaller variance in the player’s shot distribution.
Aside from the variance, worse players may be more susceptible to mis-kicks. Hence the
proportion of the Cauchy distribution can be increased to reflect this for lower skilled players.
2.3 Goalkeeper
The goalkeeper is assumed to have a reasonably high chance of saving goals within his reach.
This save percentage will drop off for regions further away from his starting position at the
center of the goal.
The fourth assumption made in Section 2.1, that the goalkeeper does not save better on
one side than the other, implies that the save distribution of the goalkeeper is necessarily
symmetric in its horizontal component.
Goalkeeper’s Skill
The goalkeeper is assumed to operate at a high and consistent level. To find the optimal
shot, randomness and inconsistency from the goalkeeper should not be a factor. Optimizing
the chance to score a goal should come down to the superior shot placement of the player,
rather than the incompetence of the goalkeeper.
Therefore there must be a near 100% chance for the goalkeeper to save shots near his starting
position. The drop-off of the save probability for areas further from the center should not
be too drastic.
Goal Area
Shots that go into the designated goal area are considered goals. Shots outside the goal area are not goals (misses); these misses can be thought of as shots with a 100% chance of being saved.
Chapter 3
Probabilistic Model
3.1 Implementation
The one-dimensional model is a good starting point. It was implemented in MATLAB version R2015b. The three components of the model (the player, the goalkeeper, and the goal area) are represented by the shot distribution, the save distribution, and the goal area, respectively. These are the parameters of the one-dimensional model.
Goal Area
The goal area is the interval [−1, 1].
Shot Distribution
The shot distribution is the mixture λN + (1 − λ)C, where N is a Normal random variable and C is a Cauchy random variable; that is, the shot follows N with probability λ and C with probability 1 − λ.
N has mean µ ∈ [−1, 1]: the player aims at the point µ within the goal interval [−1, 1].
N has variance σ², which decreases as the skill level of the player increases.
The skill level s of the player ranges from 0 to 1. The variance is chosen so that 100·s% of N lies within the interval [−1, 1] when µ = 0 (the player aims straight down the center).
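The skill-to-variance rule above has a closed form: with µ = 0, P(−1 ≤ N ≤ 1) = 2Φ(1/σ) − 1 = s, where Φ is the standard Normal CDF, so σ = 1/Φ⁻¹((1 + s)/2). A minimal Python sketch using the standard library's statistics.NormalDist (the original MATLAB implementation is not reproduced here); the function name sigma_from_skill is hypothetical. The endpoints s = 0 and s = 1 are excluded because they correspond to σ = ∞ and σ = 0.

```python
from statistics import NormalDist


def sigma_from_skill(s):
    """Standard deviation sigma such that 100*s% of N(0, sigma^2) lies in [-1, 1].

    P(-1 <= N <= 1) = 2*Phi(1/sigma) - 1 = s  =>  sigma = 1 / Phi^{-1}((1 + s) / 2).
    """
    if not 0.0 < s < 1.0:
        raise ValueError("skill level s must lie strictly between 0 and 1")
    return 1.0 / NormalDist().inv_cdf((1.0 + s) / 2.0)


for s in (0.80, 0.90, 0.99):
    sigma = sigma_from_skill(s)
    n = NormalDist(0.0, sigma)
    coverage = n.cdf(1.0) - n.cdf(-1.0)  # should reproduce s
    print(f"s = {s:.2f}: sigma = {sigma:.4f}, coverage of [-1, 1] = {coverage:.4f}")
```

For example, s = 0.99 gives σ ≈ 0.388, so a near-perfect player's aim error is a little under a fifth of the goal width.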
The value of λ ∈ [0, 1] represents the strength of the Normal random variable. Lower values
of λ imply that the Cauchy variable has a greater effect, meaning an increase in the influence
of mis-kicks, deflections, and such.
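One way to read the mixture λN + (1 − λ)C is as a two-stage draw: with probability λ the shot comes from N, otherwise from C. The Python sketch below samples shots this way. The text does not specify the location or scale of C, so centering C at the aim point µ with scale σ is a hypothetical choice, flagged in the code; sample_shot is likewise a hypothetical name.

```python
import math
import random
from statistics import NormalDist


def sample_shot(mu, sigma, lam, rng, cauchy_scale=None):
    """Draw one landing point from the mixture lam*N + (1 - lam)*C.

    With probability lam the shot comes from N(mu, sigma^2); otherwise from a
    Cauchy component. Centring the Cauchy at mu with scale sigma (unless
    cauchy_scale is given) is a hypothetical choice; the text does not fix it.
    """
    if rng.random() < lam:
        return rng.gauss(mu, sigma)
    gamma = sigma if cauchy_scale is None else cauchy_scale
    # Cauchy variate from a uniformly random angle in [-pi/2, pi/2)
    return mu + gamma * math.tan(math.pi * (rng.random() - 0.5))


rng = random.Random(16)
sigma = 1.0 / NormalDist().inv_cdf((1.0 + 0.9) / 2.0)  # skill s = 0.9
on_target_by_lambda = {}
for lam in (1.0, 0.5):
    shots = [sample_shot(0.0, sigma, lam, rng) for _ in range(50_000)]
    on_target_by_lambda[lam] = sum(-1.0 <= x <= 1.0 for x in shots) / len(shots)
    print(f"lambda = {lam}: fraction on target = {on_target_by_lambda[lam]:.3f}")
```

With s = 0.9 and λ = 1, roughly 90% of shots should land in [−1, 1]; lowering λ sends more shots wide because the Cauchy component has heavy tails.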
Save Distribution
The probability of the goalkeeper saving a shot at point x ∈ [−1, 1] is 1 − x². This gives a 100% save probability at x = 0 that drops off gradually to 0% at x = ±1.
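Combined with the convention of Section 2.3 that misses count as certain saves, the save distribution and the resulting chance that a shot arriving at x beats the goalkeeper can be written as two small functions. This is an illustrative Python sketch with hypothetical names; inside the goal the scoring chance is 1 − (1 − x²) = x².

```python
def save_prob(x):
    """Goalkeeper's save probability for a shot arriving at point x.

    Inside the goal interval [-1, 1] this is 1 - x^2 (certain save at the
    centre, no save at the posts). Shots outside the goal are misses, treated
    as saved with probability 1 (Section 2.3).
    """
    if -1.0 <= x <= 1.0:
        return 1.0 - x * x
    return 1.0


def score_prob_at(x):
    """Probability that a shot arriving at x becomes a goal: 1 - save_prob(x)."""
    return 1.0 - save_prob(x)


for x in (0.0, 0.5, 0.9, 1.0, 1.2):
    print(f"x = {x:4.1f}: save = {save_prob(x):.2f}, goal = {score_prob_at(x):.2f}")
```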
Optimal Point
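The chance of scoring a goal when aiming at µ ∈ [−1, 1] is obtained by weighting each landing point x in the goal by the probability that the goalkeeper fails to save there, 1 − (1 − x²) = x², and integrating over the goal interval:
$$p(\mu) = \int_{-1}^{1} \bigl[\lambda f_N(x) + (1-\lambda) f_C(x)\bigr]\,\bigl(1 - (1 - x^2)\bigr)\,dx = \int_{-1}^{1} \bigl[\lambda f_N(x) + (1-\lambda) f_C(x)\bigr]\,x^2\,dx,$$
where f_N and f_C are the densities of N (with mean µ) and C. Shots landing outside [−1, 1] are misses and contribute nothing.
Let t ∈ [−1, 1] be the optimal shot, that is, the point the player should aim at to maximize his chances of scoring a goal. It is found by computing p(µ) for all µ ∈ [−1, 1] and choosing t such that p(t) = max_{µ∈[−1,1]} p(µ).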
The chance of scoring a goal at t is the highest chance of scoring a goal for that player.
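The search for t can be sketched in Python. For the Normal component, the integral of x² against the N(µ, σ²) density over [−1, 1] has a closed form in terms of truncated moments of the standard Normal, and the Cauchy integral has an elementary antiderivative, so no numerical quadrature is needed. Assumptions not fixed by the text: the Cauchy component is centered at µ with scale σ, the grid step is 0.001, and only µ ∈ [0, 1] is searched because, by the symmetry assumption, −t scores as well as t. This is a sketch with hypothetical function names, not the original MATLAB implementation.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sigma_from_skill(s):
    """sigma such that 100*s% of N(0, sigma^2) lies in the goal interval [-1, 1]."""
    return 1.0 / STD_NORMAL.inv_cdf((1.0 + s) / 2.0)


def normal_part(mu, sigma):
    """Integral over [-1, 1] of x^2 times the N(mu, sigma^2) density, in closed form.

    Substituting x = mu + sigma*z reduces it to truncated moments of Z ~ N(0, 1)
    over [a, b]: E[1], E[Z] and E[Z^2].
    """
    a, b = (-1.0 - mu) / sigma, (1.0 - mu) / sigma
    pa, pb = STD_NORMAL.pdf(a), STD_NORMAL.pdf(b)
    m0 = STD_NORMAL.cdf(b) - STD_NORMAL.cdf(a)
    m1 = pa - pb
    m2 = m0 + a * pa - b * pb
    return mu * mu * m0 + 2.0 * mu * sigma * m1 + sigma * sigma * m2


def cauchy_part(mu, gamma):
    """Integral over [-1, 1] of x^2 times the Cauchy(mu, gamma) density, in closed form."""
    def antiderivative(u):  # u = (x - mu) / gamma
        return ((mu * mu - gamma * gamma) * math.atan(u)
                + mu * gamma * math.log1p(u * u)
                + gamma * gamma * u) / math.pi
    return antiderivative((1.0 - mu) / gamma) - antiderivative((-1.0 - mu) / gamma)


def goal_prob(mu, s, lam=1.0):
    """p(mu): chance of scoring when aiming at mu, weighting each x by x^2.

    Hypothetical: the Cauchy component is centred at mu with scale sigma.
    """
    sigma = sigma_from_skill(s)
    p = lam * normal_part(mu, sigma)
    if lam < 1.0:
        p += (1.0 - lam) * cauchy_part(mu, sigma)
    return p


def optimal_shot(s, lam=1.0, steps=1000):
    """Grid search for t on [0, 1]; by symmetry, -t scores equally well."""
    grid = [i / steps for i in range(steps + 1)]
    t = max(grid, key=lambda mu: goal_prob(mu, s, lam))
    return t, goal_prob(t, s, lam)


for s in (0.80, 0.85, 0.90, 0.99):
    t, p = optimal_shot(s)
    print(f"s = {s:.2f}: t = {t:.3f}, p(t) = {p:.6f}")
```

With λ = 1 the sketch is meant to reproduce the table in Section 3.2.2; for instance, it should return t = 0 with p(t) ≈ 0.2132 at s = 0.80, and t ≈ 0.765 with p(t) ≈ 0.308 at s = 0.99.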
3.2 Data & Results
3.2.1 λ Values
Figure: Optimal shot for different values of λ. The first row plots the optimal shot t and the second row the optimal shot probability p(t); the first, second, and third columns correspond to λ = 0, 0.5, and 1, respectively.
One of the questions raised when modeling was whether or not random variables for external
factors had a noticeable effect on the optimal shot location. The external factors could
include mis-kicks, deflections, and outlier random events.
The results show that the strength of the Cauchy distribution has no effect on the location of the optimal shot unless it replaces the base distribution entirely at λ = 0. As λ decreases, that is, as the weight of the Cauchy distribution grows, it only lowers the probability of scoring a goal. This phenomenon could be explained by the Central Limit Theorem, under which the sum of sufficiently many random variables becomes approximately normal.
Hence, external random factors on the player’s shot could be accounted for by simply increasing the variance in the base Normal distribution. For further analysis we can fix λ = 1.
3.2.2 Optimal Shot Values
For skill levels s < 0.83, the optimal point is at t = 0, so players at these skill levels should aim down the center of the goal. For s ≥ 0.83, the optimal shot t moves rapidly away from the center and then levels off, approaching 0.77 as s approaches 1.
As s increases, the optimal shot probability p(t) increases as expected, from 21.3% at s = 0.80 to about 30.8% at s = 0.99.
Skill level s   Optimal shot t   Goal probability p(t)
0.80            0                0.213215
0.81            0                0.213654
0.82            0                0.213920
0.83            0.089            0.214002
0.84            0.280            0.214249
0.85            0.378            0.214904
0.86            0.448            0.215971
0.87            0.503            0.217463
0.88            0.548            0.219398
0.89            0.586            0.221804
0.90            0.618            0.224721
0.91            0.645            0.228206
0.92            0.669            0.232338
0.93            0.689            0.237227
0.94            0.706            0.243038
0.95            0.721            0.250022
0.96            0.734            0.258588
0.97            0.745            0.269483
0.98            0.755            0.284303
0.99            0.765            0.307808
Optimal shot and the highest goal probability for each skill level.
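Assuming λ = 1, the last row of the table can be cross-checked independently by Monte Carlo simulation: draw the landing point x from N(0.765, σ²) with σ set from s = 0.99, count a goal when x lies in [−1, 1] and the goalkeeper fails to save, which happens with probability x², and average. The estimate should fall close to 0.3078 up to sampling error. A minimal Python sketch; simulate_goal_prob, the seed, and the sample size are hypothetical choices.

```python
import random
from statistics import NormalDist


def simulate_goal_prob(mu, s, n, rng):
    """Monte Carlo estimate of the chance of scoring when aiming at mu (lambda = 1).

    Each shot lands at x ~ N(mu, sigma^2); a shot inside [-1, 1] beats the
    goalkeeper with probability x^2 (save probability 1 - x^2); shots outside
    the goal are misses.
    """
    sigma = 1.0 / NormalDist().inv_cdf((1.0 + s) / 2.0)
    goals = 0
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        if -1.0 <= x <= 1.0 and rng.random() < x * x:
            goals += 1
    return goals / n


rng = random.Random(42)
estimate = simulate_goal_prob(0.765, 0.99, 200_000, rng)
print(f"simulated goal probability at s = 0.99, aiming at 0.765: {estimate:.4f}")
```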
Figure: optimal shot probability p(t) versus skill level s. As expected, the chance of scoring increases with the skill level, reaching its largest value, p(t) ≈ 0.31, as s approaches 1.
Figure: distance of the optimal shot t from the center versus skill level s. The optimal shot is at the center (t = 0) for skill levels s ≤ 0.82; for s ≥ 0.83 it moves away from the center, reaching t ≈ 0.77 as s approaches 1.
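The switch near s = 0.83 can be understood from the curvature of the goal probability at the center. Assuming λ = 1 and the save distribution 1 − x², write b = 1/σ = Φ⁻¹((1 + s)/2) and let φ be the standard Normal density. Differentiating p(µ) = ∫ x² φ((x − µ)/σ)/σ dx over [−1, 1] twice at µ = 0 gives

p''(0) = ∫ (z⁴ − z²) φ(z) dz over [−b, b] = 2[s − (2b + b³) φ(b)].

Aiming at the center is a local maximum while this is negative and becomes a local minimum once it turns positive. The Python sketch below (hypothetical helper names) evaluates the sign and locates the crossing by bisection; it is only a local check at µ = 0, not a global search.

```python
from statistics import NormalDist

PHI = NormalDist()


def center_curvature(s):
    """Second derivative of p(mu) at mu = 0 for lambda = 1.

    With b = 1/sigma = Phi^{-1}((1 + s)/2): p''(0) = 2 * (s - (2b + b^3) * phi(b)).
    Negative: the centre is a local maximum. Positive: it is a local minimum.
    """
    b = PHI.inv_cdf((1.0 + s) / 2.0)
    return 2.0 * (s - (2.0 * b + b ** 3) * PHI.pdf(b))


def center_threshold(lo=0.5, hi=0.999, tol=1e-10):
    """Bisection for the skill level at which p''(0) changes sign."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if center_curvature(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(f"p''(0) at s = 0.82: {center_curvature(0.82):+.5f}")
print(f"p''(0) at s = 0.83: {center_curvature(0.83):+.5f}")
print(f"sign change near s = {center_threshold():.4f}")
```

The sign change falls between s = 0.82 and s = 0.83, in line with the tabulated switch from t = 0 to t > 0.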
Figure: goal probability (y-axis) at each point of the goal area (x-axis) for several skill levels. The center of the x-axis represents the center of the goal (x = 0). The colors correspond to the skill levels: light blue (s = 0.80), green (s = 0.85), purple (s = 0.90), yellow (s = 0.95), orange (s = 0.975), and blue (s = 0.99).
To summarize, the optimal shot location for high-skilled players is about 0.77 of the distance from the center of the goal to either post, while low-skilled players are best off aiming down the center of the goal. This conclusion is consistent with the preceding discussion and data.
3.3 Extending to Two Dimensions
One of the most obvious improvements that can be made is to extend the model into two
dimensions. Because it is a vital step in making the model more realistic, this discussion
deserves its own section.
To extend the current model to two dimensions, the goal area could be extended to the
interval [−1, 1] on the x-axis and [0, 1] for the y-axis. For the shot and save distributions,
the following ideas have to be considered.
Shot Distribution
The goal area can be divided into two components, the horizontal and vertical components.
These are the x and y-axes respectively. This subsection focuses on the possible candidates
for the shot distribution on the vertical axis (y-axis).
There is no compelling reason for the horizontal and vertical shot distributions to be dependent. Just as horizontal and vertical motion can be treated as independent systems in kinematics, the player’s shot distribution can be separated along the x and y-axes. The probabilities computed on each axis can then be multiplied to obtain the shot probability at a point in two dimensions.
It is reasonable to use different distributions for the horizontal and vertical components of the shot, as the forces acting on each are not the same. Gravity acts on the vertical component of the shot, and the ball cannot go below the ground. The horizontal component has no force such as gravity acting on it and is free to go left or right. This justifies using different distributions.
In soccer, the horizontal direction of a shot is relatively easy to control: in general, the ball goes where the player points his body. The height of the shot is much more difficult to control and is therefore more variable. In addition, gravity is a large external force acting on the vertical component of the ball, making it harder to control. This justifies a wider spread for the vertical shot distribution than for the horizontal one.
Possible candidates for the vertical shot distribution are one-sided distributions. Modifying
the Normal distribution for this is also possible.
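Under the independence argument above, the probability that a shot lands in the goal rectangle [−1, 1] × [0, 1] factors into a horizontal and a vertical term. The Python sketch below uses a Normal horizontal distribution and, as one hypothetical one-sided candidate for the vertical distribution, a Normal truncated at the ground (y ≥ 0) with a wider spread. It compares the product formula with a simulation. All parameter values and function names are hypothetical, and the sketch covers only the shot distribution, since the two-dimensional save distribution need not factor.

```python
import random
from statistics import NormalDist


def on_target_exact(mu_x, sigma_x, mu_y, sigma_y):
    """P(shot lands in [-1, 1] x [0, 1]) when the two axes are independent.

    Horizontal: N(mu_x, sigma_x^2).
    Vertical (hypothetical one-sided candidate): N(mu_y, sigma_y^2) truncated to
    y >= 0, because the ball cannot travel below the ground.
    """
    h = NormalDist(mu_x, sigma_x)
    v = NormalDist(mu_y, sigma_y)
    p_x = h.cdf(1.0) - h.cdf(-1.0)
    p_y = (v.cdf(1.0) - v.cdf(0.0)) / (1.0 - v.cdf(0.0))
    return p_x * p_y  # independence: the joint probability factorises


def sample_vertical(mu_y, sigma_y, rng):
    """Rejection sampling from the Normal truncated to y >= 0."""
    while True:
        y = rng.gauss(mu_y, sigma_y)
        if y >= 0.0:
            return y


# Hypothetical aim (0.5, 0.5) with a wider vertical spread than horizontal.
params = (0.5, 0.4, 0.5, 0.6)
rng = random.Random(7)
n = 100_000
hits = 0
for _ in range(n):
    x = rng.gauss(params[0], params[1])
    y = sample_vertical(params[2], params[3], rng)
    if -1.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        hits += 1
mc = hits / n
print(f"product formula: {on_target_exact(*params):.4f}, simulation: {mc:.4f}")
```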
Save Distribution
Because of human anatomy, it is natural for a goalkeeper’s horizontal and vertical save distributions to be dependent. For example, it is much easier for goalkeepers to block shots within reach of their arms than shots reachable only with their legs, because it is easier to be precise with the arms than with the legs.
A piecewise approach is the best option for achieving the desired probability distribution. Between waist and neck height, the chance of saving can be very high near the center, with a gradual drop-off further from the center. Below the waist and above head level, the chance of saving near the center can still be high, but it should drop off quickly further from the center.
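A piecewise save distribution of this kind might be sketched as below. The band heights (waist at y = 0.4, neck at y = 0.7 on a goal of height 1), the reach widths, and the peak save probabilities are hypothetical values chosen for illustration. The text does not specify the band between the neck and the head, so here every height outside the waist-to-neck band uses the narrower rule. Within the arm band, with arm_reach = 1, the function reduces to the one-dimensional 1 − x².

```python
def save_prob_2d(x, y, waist=0.4, neck=0.7, arm_reach=1.0, other_reach=0.6,
                 arm_peak=1.0, other_peak=0.9):
    """Hypothetical piecewise save probability at point (x, y).

    Goal area: [-1, 1] x [0, 1]. Between waist and neck height the keeper saves
    with his arms: high peak, gradual drop-off (wide reach). At other heights the
    peak is lower and the drop-off is faster (narrow reach). All numbers are
    illustrative assumptions.
    """
    if not (-1.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        return 1.0  # a miss counts as a certain save
    if waist <= y <= neck:
        peak, reach = arm_peak, arm_reach
    else:
        peak, reach = other_peak, other_reach
    return peak * max(0.0, 1.0 - (x / reach) ** 2)


for y in (0.1, 0.5, 0.9):
    row = [round(save_prob_2d(x, y), 3) for x in (0.0, 0.3, 0.6, 0.9)]
    print(f"height {y}: save probabilities at x = 0, 0.3, 0.6, 0.9 -> {row}")
```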
Chapter 4
Discussion & Conclusion
4.1 Overview
This section goes over all the other factors that should be considered in future iterations of
the model. The discussions include
• what these factors are,
• their effects on the open play scenario, and
• how each can be incorporated into the model.
Positioning
The relative positions of all the components (player, goalkeeper, and goal area) are a big factor in the player’s chance of scoring a goal. We use the player as our reference point; that is, everything from here on is taken from the player’s perspective, and angles and distances are measured relative to the player.
Now for positioning, there are two components for the player: his distance from the goal and
his angle from the center of the goal.
For distance, moving the player closer to the goal is equivalent to tightening the player’s shot distribution and increasing the drop-off of the goalkeeper’s save distribution. A nearer player can shoot more accurately, and his shot reaches the goal faster, giving the goalkeeper less time to react. This can be modeled by decreasing the variance of the player’s shot distribution and making the goalkeeper’s save distribution decrease faster away from the center. The opposite effect applies if the player is farther from the goal.
If the goalkeeper is closer to the player, his base save probability increases and the rate at which it drops off away from the center decreases.
For the angle, the goal appears widest when the player is centered in front of it. The tighter the angle, the narrower the goal appears to the player. This can be modeled by shrinking the x-axis component of the goal area: at a wide angle the goal area is the interval [−1, 1], while at a tight angle it appears as [−k, k] for some 0 < k < 1.
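The distance and angle adjustments can be expressed as parameter changes in the one-dimensional model with λ = 1: a smaller σ for a closer player, a goal interval [−k, k] for a tight angle, and a save probability 1 − (x/r)², clipped at 0, whose hypothetical reach parameter r (reach in the code) controls how fast the save probability drops off. The Python sketch below evaluates the scoring chance by the midpoint rule; goal_prob_positioned and the parameter values are hypothetical. With k = 1 and r = 1 it reduces to the model of Section 3.1.

```python
from statistics import NormalDist


def goal_prob_positioned(mu, sigma, k=1.0, reach=1.0, n=2000):
    """Chance of scoring when aiming at mu, with positioning adjustments (lambda = 1).

    sigma : spread of the Normal shot distribution (smaller for a closer player).
    k     : half-width of the goal as seen by the player; a tight angle gives k < 1.
    reach : hypothetical goalkeeper reach; the save probability is 1 - (x/reach)^2,
            clipped at 0, so a smaller reach means a faster drop-off.
    The integral over [-k, k] is approximated with the midpoint rule on n cells.
    """
    shot = NormalDist(mu, sigma)
    width = 2.0 * k / n
    total = 0.0
    for i in range(n):
        x = -k + (i + 0.5) * width
        save = max(0.0, 1.0 - (x / reach) ** 2)
        total += shot.pdf(x) * (1.0 - save) * width
    return total


sigma = 1.0 / NormalDist().inv_cdf((1.0 + 0.99) / 2.0)  # skill s = 0.99
aim = 0.4
wide = goal_prob_positioned(aim, sigma)               # wide angle, baseline keeper
tight = goal_prob_positioned(aim, sigma, k=0.5)       # tight angle
quick = goal_prob_positioned(aim, sigma, reach=0.8)   # faster save drop-off
print(f"wide {wide:.4f}, tight angle {tight:.4f}, reduced reach {quick:.4f}")
```

For a fixed aim, narrowing the goal can only lower the scoring chance and shrinking the reach can only raise it, since the integrand changes monotonically in both cases; how σ should scale with distance is left open, as in the text.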
However, these changes do not illustrate all the nuance involved with positioning. Consider
the situation where the player is at a tight angle from goal with the right post closer than
the left post, all from the player’s perspective. In theory, because the left post is farther,
the goalkeeper will have more time to react to shots to that side than shots to the right side.
Knowing this, the goalkeeper could choose whether or not to anticipate shots to the right side, and the player will be aware of this as well, so an element of game theory is apparent. In addition, since there is more space to the player’s left, he can curve the ball around and behind the goalkeeper, giving him a higher chance of scoring a goal.
Type of Shot
This factor includes how hard the ball is struck, how it is struck, and which part of the body it is struck with. The latter two are much more difficult to implement, so we focus only on the first.
How hard the ball is struck affects the speed of the shot. A stronger shot gives the goalkeeper less time to react and therefore a smaller chance of saving, but it is also more difficult to control, making it less accurate. To implement this, stronger shots would lower the goalkeeper’s save distribution while increasing the variance of the player’s shot distribution.
Symmetry & Asymmetry
Several other symmetric and asymmetric distributions can be tried for the player’s shot
distribution, in place of the Normal distribution, to see their effects and how they fit.
Surfaces on the human body are naturally irregular, so some asymmetry, however small, is always present. One could therefore consider an asymmetric model for the shot distribution. If we consider only one of the player’s feet, its shot distribution will always be asymmetric. However, a counterargument in favor of a symmetric model can be made by considering both of the player’s feet: if the shot distributions of the two feet are asymmetric but mirror images of each other, then combining them gives a symmetric distribution. Whether a player is left-footed, right-footed, or two-footed is another, more complicated matter.
Asymmetry can also be considered for the goalkeeper, who may have a “better side” on which he saves more effectively. However, the shooting player may not always have that information, so it is reasonable for him to assume a symmetric save distribution when deciding where best to place his shot.
Additional Players
Increasing the number of moving players such as attackers and defenders has a significant
impact on where to shoot. There will be multiple moving parts and they all have complex
interactions. The actual implementation will be significantly more difficult.
Complex Factors
The final set includes the more subtle and complex factors, which are, in general, very difficult to measure or implement. The physical factors include air resistance, turf condition, fitness of the players, and others. The psychological factors could include mind games (game theory), mental composure of the players, teamwork, attitude, and many others.
4.2 Conclusion & Summary
In our one-dimensional model, we concluded that the optimal shot location for high-skilled players is approximately 0.77 of the distance from the center of the goal to either post, while low-skilled players are best off aiming down the center of the goal.
Recall the five factors Vars acknowledged that he left out of his model. These factors are
(1) the goalkeeper not being a perfect wall, (2) the goal area is in two dimensions, (3) the
positioning of the players, (4) the time it takes for the ball to reach the goal, and (5) deceiving the goalkeeper. Our one-dimensional model addresses the first factor. Implementation
details for the second and third factors were also discussed. However, there were no concrete
suggestions for the more complex fourth and fifth factors.
Overall the first improvement we recommend is to extend the model into two dimensions.
After that, positioning should be next. Adding these will greatly improve the model and
make it closer to real life.
Bibliography
[1] Vars, Fredrick E. Missing well: optimal targeting of soccer shots. The University of
Alabama, 2008. Retrieved from http://ssrn.com/abstract=1268872