
“Probability Models for Hockey Betting”

Ravi Patel IB SL Math


Hockey has undoubtedly established itself as one of the most rapidly growing sports in North America. Since my days of watching Hockey Night in Canada as a child, I have developed a strong passion for the national winter sport of Canada. Throughout my life of watching these games, I have become especially fond of the great rivalry between two giant NHL franchises: the acclaimed rivalry between the Toronto Maple Leafs and the Montreal Canadiens.

Throughout my days watching these hockey games, I always noticed advertisements from betting companies encouraging their viewers to bet money on the score and winner of these high-profile rivalries. The whole concept of sports betting seemed extremely convoluted to me, especially when I came across a CBC video showing that most of the winners on sports betting websites were nerd-like individuals who had created mathematical models to predict the winners and rankings of future games and seasons. The complexity of these mathematical models did not become clear to me until I learnt the concepts of statistics and probability distributions in my SL math class. I noticed that specific probability distributions can be analyzed and extrapolated using probability theorems to create mathematical models of probability for future situations.

At first, I had my doubts about applying normal probability distributions to hockey betting; however, upon researching different concepts in the HL math textbook, I arrived at a suitable topic. The aim of this Internal Assessment is to create probability models from the Goals Scored statistics of both Toronto and Montreal in the 2017-18 National Hockey League season, and then to find the equation of the probability model that best matches the data in order to predict the score of a future league game. To keep the number of parameters in the equation limited and to avoid any home-team bias in my mathematical model, I am going to assume that the game will be played in Los Angeles as an annual “Winter Classic Game.” To keep my parameter values as accurate as possible, I will not round any of the figures that my graphing calculator displays. Furthermore, I will assess and reflect on different variables that will affect my calculations. These precautions should make my final expected probabilities as accurate as possible.

The first step in the mathematical modelling process is finding the expected number of goals that each team may score in the game. This is done by creating a histogram of the Goals Forwarded (GF) by each team throughout the 82-game 2017-18 season. I also chose to add a trendline to display the trend in the Maple Leafs' goal-scoring averages throughout the season. The gathered data is displayed in the histogram below.

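Figure 1.0: GF of the Toronto Maple Leafs (histogram of frequency against # of goals forwarded)

| # of Goals Forwarded (X) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| Frequency | 2 | 11 | 13 | 20 | 18 | 7 | 6 | 3 | 2 |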

Now that I have created a frequency table for the goals forwarded by Toronto, this data must be converted into a graph of probabilities so that it can be interpreted to find the expected values of the team's goal-scoring tendencies. At first, it was difficult for me to find a distinct curve pattern (e.g. exponential, normal) that matched my data, as the data is discrete. After further research, I found that this model best matches a Poisson distribution. The Poisson distribution is a discrete probability distribution that gives the probability of a number of independent events occurring in a fixed interval of time at a constant average rate. To determine whether the number of goals scored per game follows a Poisson distribution, the null and alternative hypotheses are written below:
$H_0$: The number of goals scored per game by Toronto follows a Poisson distribution.
$H_1$: The number of goals scored per game by Toronto does not follow a Poisson distribution.
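To analyze whether my data follows a Poisson distribution, I must first determine the parameter of my data, which is the mean, denoted by lambda (λ):

$$\bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n} = \frac{277}{82} = 3.37805 = \lambda$$

where $m_j$ is the number of goals in class $j$, $f_j$ is the number of games in that class, $c$ is the number of classes, and $n = 82$ is the number of games.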
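As a quick check of this arithmetic, the sketch below recomputes the mean from the observed frequencies. It is a minimal sketch: the frequencies are read from the Figure 1.0 histogram, and the variable names are illustrative.

```python
# Observed frequencies of Toronto's goals per game in 2017-18,
# read from the Figure 1.0 histogram: {goals X: number of games f}.
toronto_freq = {0: 2, 1: 11, 2: 13, 3: 20, 4: 18, 5: 7, 6: 6, 7: 3, 8: 2}

n_games = sum(toronto_freq.values())                       # n = 82
total_goals = sum(x * f for x, f in toronto_freq.items())  # sum of m_j * f_j = 277
lam_toronto = total_goals / n_games                        # sample mean, used as lambda

print(n_games, total_goals, round(lam_toronto, 5))  # 82 277 3.37805
```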
This value of $\bar{X}$ will not be rounded to three significant figures, as I am using it as the estimate of λ. Using λ as the estimate of the mean, the probabilities of X goals for X = 0, 1, 2, ..., 8 can be found. The theoretical frequency for each value of X is then obtained by multiplying the sample size (n) by the Poisson probability of that value of X.
Now that I have found the estimate of the mean for the whole data set, I must calculate the Poisson probability for 10 classes of X: X = 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9 or more. I must calculate the probability of 9 or more because the Poisson probability describes a theoretical model of the goals forwarded by the Toronto Maple Leafs. Since it is theoretically possible to score 9 or more goals in a game, our estimate of the mean (3.37805) is used to find the probability that X lies in the interval [9, ∞).
To find the theoretical Poisson probability of each value of X, I used the Poisson distribution formula, displayed below:

$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$

Where:
λ = the mean number of goals per game, used as the parameter of the distribution (as mentioned previously)
e = Euler's number, a mathematical constant approximately equal to 2.71828
x! = the factorial of x
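The formula translates directly into a few lines of Python. This is a minimal sketch using only the standard library (`math.exp`, `math.factorial`); the function name `poisson_pmf` is illustrative, and λ is kept unrounded as 277/82.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability P(X = x) = lam**x * e**(-lam) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

lam_toronto = 277 / 82  # Toronto's unrounded mean goals per game

# For x = 0: lam**0 = 1 and 0! = 1, so P(X = 0) reduces to e**(-lam).
p0 = poisson_pmf(0, lam_toronto)
print(round(p0, 5))  # 0.03411
```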
Calculating the expected probability for each number of goals forwarded seemed fairly easy at first; however, I reached a confusing stage when I had to find the expected probability of X = 0, since the formula then contains 0!. This is when I remembered the argument my tutor had taught me to show that the factorial of 0 is equal to 1:

$$n! = n(n-1)(n-2)(n-3)\cdots$$

$$\begin{aligned}
n! &= n(n-1)! \\
1! &= 1(1-1)! \\
1 &= 1(0!) \\
1 &= 0!
\end{aligned}$$
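The same relation is how the factorial is typically defined recursively in code: repeatedly applying $n! = n(n-1)!$ only terminates because $0! = 1$ is fixed as the base case, which is consistent with the derivation above. A minimal illustrative sketch (the function name `factorial` is hypothetical here; Python's standard library already provides `math.factorial`):

```python
def factorial(n: int) -> int:
    """Factorial defined recursively by n! = n * (n - 1)!."""
    if n < 0:
        raise ValueError("factorial is not defined for negative integers")
    if n == 0:
        return 1  # base case 0! = 1, which stops the recursion
    return n * factorial(n - 1)

print([factorial(n) for n in range(6)])  # [1, 1, 2, 6, 24, 120]
```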
Using the relation $n! = n(n-1)!$, I replaced $n$ with 1, since I knew that $1! = 1$. The $(n-1)$ term then becomes 0, so the equation reduces to $1 = 1(0!)$, and hence $0! = 1$. I can now continue with my calculation of the Poisson probability by replacing $x$ with 0 and λ with 3.37805:

$$\begin{aligned}
P(X = 0) &= \frac{3.37805^{0}\, e^{-3.37805}}{0!} \\
&= \frac{1 \cdot e^{-3.37805}}{1} \\
&= \frac{1(0.03411)}{1} \\
&= 0.03411
\end{aligned}$$
Now that I have computed the expected Poisson probability of X = 0, I must do the same for the remaining values of X (X = 1, 2, ..., 8 and 9 or more). The expected probabilities are listed below:
Table 1.1

| # of Goals Forwarded (X) | P(X) from Poisson Distribution with λ = 3.37805 |
|---|---|
| 0 | 0.03411 |
| 1 | 0.11524 |
| 2 | 0.19464 |
| 3 | 0.21917 |
| 4 | 0.18509 |
| 5 | 0.12505 |
| 6 | 0.07040 |
| 7 | 0.03398 |
| 8 | 0.01435 |
| 9 or more | 0.00744 |

I also had slight difficulty finding the probability of 9 or more; however, since the probabilities of all possible values of X must sum to 1, I subtracted the other probabilities from 1:

$$P(X \ge 9) = 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5) + P(X=6) + P(X=7) + P(X=8)] = 0.00744$$
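Rather than evaluating the factorial formula separately for each row, the entries of Table 1.1 for X = 0 to 8 can also be generated with the recurrence $P(X = x) = P(X = x-1) \cdot \lambda / x$, which follows from dividing the Poisson formula at $x$ by the same formula at $x - 1$. This is a minimal sketch assuming λ = 277/82 unrounded; it also prints the theoretical frequencies $n \cdot P(X)$. Because nothing is rounded, these frequencies can differ from Table 1.2 in the later decimal places.

```python
import math

lam = 277 / 82  # Toronto's unrounded mean goals per game
n = 82          # number of games in the 2017-18 season

# Recurrence: P(X = x) = P(X = x - 1) * lam / x, starting from P(X = 0) = e**(-lam).
probs = [math.exp(-lam)]
for x in range(1, 9):
    probs.append(probs[-1] * lam / x)

for x, p in enumerate(probs):
    # goals x, Poisson probability P(X = x), theoretical frequency n * P(X = x)
    print(x, f"{p:.5f}", f"{n * p:.4f}")
```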

Now that I have each P(X) for the goals forwarded by the Toronto Maple Leafs, I can use this data to create a visual curve, which should model the original GF vs frequency graph from Figure 1.0.

Figure 1.2: GF Probability with λ = 3.37805 (bar chart of the probability of X against # of goals forwarded, X = 0 to 9 or more, using the values in Table 1.1)

Figure 1.2, which uses λ as the parameter of the theoretical probabilities, closely resembles the original histogram in Figure 1.0, which suggests that the data follows a Poisson distribution: the trendlines of the two graphs rise and fall over the same intervals. However, the theoretical graph does not perfectly portray the observed Goals Forwarded data.
Instead of relying on qualitative comparisons of graphs, I needed a mathematical process that could determine as accurately as possible whether my data was Poisson. To get a more accurate measure of how close my data is to a Poisson model, I decided to perform a chi-squared test. The chi-squared goodness-of-fit test is a non-parametric test used to determine whether the observed frequencies of a given phenomenon differ significantly from the expected frequencies. The test statistic is:

$$\chi^2_{k-p-1} = \sum_{k} \frac{(f_o - f_e)^2}{f_e}$$

Where:
$f_o$ = the observed frequency
$f_e$ = the theoretical or expected frequency
$k$ = the number of categories or classes remaining after combining classes
$p$ = the number of parameters estimated from the data

Looking at the equation above, I still need the theoretical frequency $f_e$ for each class, which can be found from $f_e = n \cdot P(X)$. I then set up a table to calculate the chi-squared statistic step by step; the calculations are listed in the table below:
Table 1.2

| X | $f_o$ | $f_e$ | $f_o - f_e$ | $(f_o - f_e)^2$ | $(f_o - f_e)^2 / f_e$ |
|---|---|---|---|---|---|
| 0 | 2 | 2.79702 | -0.79702 | 0.635241 | 0.227113 |
| 1 | 11 | 9.44968 | 1.55032 | 2.40349 | 0.254346 |
| 2 | 13 | 15.9605 | -2.9605 | 8.76456 | 0.549141 |
| 3 | 20 | 17.9719 | 2.0281 | 4.11319 | 0.228868 |
| 4 | 18 | 15.1774 | 2.8226 | 7.96707 | 0.52493 |
| 5 | 7 | 10.2541 | -3.2541 | 10.5892 | 1.03268 |
| 6 | 6 | 5.7728 | 0.2272 | 0.05162 | 0.008942 |
| 7 | 3 | 2.78636 | 0.21364 | 0.045642 | 0.016381 |
| 8 | 2 | 1.1767 | 0.8233 | 0.677823 | 0.576037 |
| 9 or more | 0 | 0.61008 | -0.61008 | 0.372198 | 0.61008 |
| Total | | | | | 4.02852 |

After completing the table, I was somewhat wary of the row for X = 5: its theoretical frequency of 10.2541, against an observed frequency of 7, produces a contribution of 1.03268, the largest of any class. Nonetheless, the sum of the last column gives $\chi^2 = 4.02852$.
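The chi-squared sum can be reproduced programmatically as a check on the arithmetic in Table 1.2. This is a minimal sketch: it takes the observed frequencies $f_o$ from the Figure 1.0 histogram and the Table 1.1 probabilities exactly as listed (including the 9-or-more entry), forms $f_e = n \cdot P(X)$, and sums $(f_o - f_e)^2 / f_e$. The function name `chi_squared_statistic` is illustrative.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (f_o - f_e)**2 / f_e over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of classes")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

n = 82
# Classes: X = 0, 1, ..., 8 and "9 or more".
observed = [2, 11, 13, 20, 18, 7, 6, 3, 2, 0]                    # f_o, from Figure 1.0
poisson_probs = [0.03411, 0.11524, 0.19464, 0.21917, 0.18509,
                 0.12505, 0.07040, 0.03398, 0.01435, 0.00744]   # P(X), from Table 1.1
expected = [n * p for p in poisson_probs]                        # f_e = n * P(X)

stat = chi_squared_statistic(observed, expected)
print(f"chi-squared = {stat:.4f}")
```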
Now that the step-by-step calculation is complete, I must find the subscript of $\chi^2$. This subscript is the number of degrees of freedom of the chi-squared test, given by:

$$df = k - p - 1$$

The variables $k$ and $p$ were defined above. I took the value of $k$, the number of classes remaining after combining classes, to be 9, because the "9 or more" class is a theoretical class that does not appear in the observed frequencies. The value of $p$ is 1, as the only parameter estimated from the data for the chi-squared test is λ. Therefore, the degrees of freedom are:

$$k - p - 1 = 9 - 1 - 1 = 7$$
Now that the degrees of freedom are found, I must use hypothesis testing to determine whether the distribution is a Poisson distribution. To do this, I must use the same hypotheses that were stated earlier. At first, I was very confused about how the null hypothesis related to how well my data fits a Poisson distribution. I then watched some YouTube videos to learn how to carry out the test, which involves a concept I was taught in class, known as critical values. I first set up the hypotheses:

$H_0$: The number of goals scored per game by Toronto follows a Poisson distribution (with λ estimated as 3.37805).
$H_1$: The number of goals scored per game by Toronto does not follow a Poisson distribution.
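The remaining step is to compare the chi-squared statistic with a critical value. A minimal sketch of how that comparison could be carried out, assuming a 5% significance level (the text does not state one): the decision rule for a chi-squared goodness-of-fit test is to reject $H_0$ when the statistic exceeds the critical value for the given degrees of freedom. The critical values below are copied from a standard chi-squared table; if SciPy is available, `scipy.stats.chi2.ppf(0.95, df)` returns the same values. The names `CRITICAL_5_PERCENT` and `goodness_of_fit_decision` are illustrative.

```python
# Upper 5% critical values of the chi-squared distribution, from a standard table
# (assumption: a 5% significance level is used).
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                      6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919}

def goodness_of_fit_decision(statistic: float, k: int, p: int) -> str:
    """Compare a chi-squared statistic with the 5% critical value for k - p - 1 df."""
    df = k - p - 1
    if df not in CRITICAL_5_PERCENT:
        raise ValueError(f"no tabulated critical value for df={df}")
    critical = CRITICAL_5_PERCENT[df]
    if statistic > critical:
        return f"df={df}: {statistic:.3f} > {critical}, reject H0"
    return f"df={df}: {statistic:.3f} <= {critical}, do not reject H0"

# k = 9 classes and p = 1 estimated parameter (lambda), as in the text;
# 4.03 is the Table 1.2 total rounded to two decimal places.
print(goodness_of_fit_decision(4.03, k=9, p=1))
```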

I will need to continue working on this. I proceeded with the next step of my IA because I could already infer from qualitative observations that my distribution was Poisson; however, I still need to complete this calculation.
Although my chi-squared test is not yet complete, my qualitative comparison suggests that the distribution of the Goals Forwarded count of the Toronto Maple Leafs is a Poisson distribution; however, my work is still not finished. Since my end goal is to create a matrix of probabilities showing the attack strength of each team, I must also determine the Poisson distribution of the Montreal Canadiens.
Since the Maple Leafs' goal-scoring distribution appears to be a Poisson distribution, I will assume that goal scoring in the sport of hockey can be modelled as a Poisson process. With this assumption, I can now create a P(X) model of goal scoring based on the λ value found for the Montreal Canadiens.
Figure 2: GF by the Montreal Canadiens (histogram of frequency against # of goals forwarded, X = 0 to 10)

This figure looks very similar to Figure 1.0: the Goals Forwarded data is distributed along a similar curve. To find my P(X) chart, I must determine the value of λ and then repeat the calculations:

$$\bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n} = \frac{209}{82} = 2.54878 = \lambda$$
| # of Goals Forwarded (X) | P(X) from Poisson Distribution with λ = 2.54878 |
|---|---|
| 0 | 0.07818 |
| 1 | 0.19926 |
| 2 | 0.25393 |
| 3 | 0.21574 |
| 4 | 0.13747 |
| 5 | 0.07007 |
| 6 | 0.02977 |
| 7 | 0.01084 |
| 8 | 0.00345 |
| 9 | 0.00098 |
| 10 | 0.00025 |
| 11 or more | 0.00006 |

$$P(X \ge 11) = 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5) + P(X=6) + P(X=7) + P(X=8) + P(X=9) + P(X=10)] = 0.00006$$
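The table above can be reproduced with the same formula. This minimal sketch assumes λ = 209/82 unrounded, rounds each probability to five decimal places as the table does, and then takes "11 or more" as the complement of the rounded values, mirroring the subtraction used in the text; `poisson_pmf` is an illustrative helper name.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability P(X = x) = lam**x * e**(-lam) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

lam = 209 / 82  # Montreal's unrounded mean goals per game

# P(X = 0) ... P(X = 10), rounded to five decimal places as in the table.
rounded = [round(poisson_pmf(x, lam), 5) for x in range(11)]
# "11 or more" is the complement of the listed probabilities.
tail = 1 - sum(rounded)

for x, p in enumerate(rounded):
    print(x, p)
print("11 or more", round(tail, 5))
```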
Figure 2.1: Probability of GF with λ = 2.54878 (bar chart of the probability of X against # of goals scored, X = 0 to 11 or more, using the values in the table above)

Figure 2.1 looks similar to the histogram of the observed data in Figure 2. However, the theoretical probabilities for X = 7 up to 11 or more did not closely match the original graph. Reflecting on possible limiting factors, the reason is that high-scoring games are extremely rare: the observed frequencies for these values fluctuate between 0 and 1 games, whereas the theoretical probabilities decrease gradually.
Now that all Poisson probabilities have been found for the Goals Forwarded of the Toronto Maple Leafs and the Montreal Canadiens, I must apply the Poisson distribution formula with each team's λ and then create a matrix of probabilities for the possible score outcomes.
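One common way to build such a matrix is sketched below. It assumes (an assumption not stated in the text) that the two teams' goal counts are independent Poisson variables, so each cell is P(Toronto = i) × P(Montreal = j). Equal scores are counted as level, since overtime is not modelled; the cut-off `max_goals = 15` and all names are illustrative choices.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability P(X = x) = lam**x * e**(-lam) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

lam_tor = 277 / 82  # Toronto's mean goals per game, 2017-18
lam_mtl = 209 / 82  # Montreal's mean goals per game, 2017-18
max_goals = 15      # hypothetical cut-off; higher scores have negligible probability

# matrix[i][j] = P(Toronto scores i, Montreal scores j), assuming independence.
matrix = [[poisson_pmf(i, lam_tor) * poisson_pmf(j, lam_mtl)
           for j in range(max_goals + 1)]
          for i in range(max_goals + 1)]

goals = range(max_goals + 1)
p_toronto = sum(matrix[i][j] for i in goals for j in goals if i > j)
p_tie = sum(matrix[i][i] for i in goals)
p_montreal = sum(matrix[i][j] for i in goals for j in goals if i < j)

print(f"Toronto ahead:  {p_toronto:.4f}")
print(f"Level scores:   {p_tie:.4f}")
print(f"Montreal ahead: {p_montreal:.4f}")
```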
