Throughout my days watching hockey games, I always noticed advertisements from betting companies encouraging their viewers to bet money on the score and winner of these top-tier sports rivalries. The whole concept of sports betting seemed extremely convoluted to me, especially when I came across a CBC video showing that the biggest winners on sports betting websites were nerd-like individuals who had created mathematical models to predict the winners and rankings of future games and seasons. The complexity of these mathematical models did not become clear to me until I learnt the concepts of statistics and probability distributions in my SL math class. I noticed that specific probability distributions can be analyzed and extrapolated using probability theorems to create mathematical models of probability for future situations.
(Figure 1.0: Frequency vs. # of goals forwarded, 0 to 8)
Now that we have created a frequency table for the goals forwarded by Toronto, the data must be converted into a graph of probabilities so that it can be interpreted to find expected values for the team's goal-scoring tendencies. At first it was difficult for me to find a distinct curve pattern (e.g. exponential, normal) that matched my data, since the data are discrete. After further research, I found that the data best matched a Poisson distribution. The Poisson distribution is a discrete probability distribution that gives the probability of a given number of independent events occurring in a fixed interval of time. To determine whether the number of goals forwarded per game follows a Poisson distribution, the null and alternative hypotheses are written below:
𝐻0 : The number of goals forwarded per game follows a Poisson distribution
𝐻1 : The number of goals forwarded per game does not follow a Poisson distribution
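To analyze whether my data follow a Poisson process, I must first determine the parameter of the distribution, which is the mean, denoted by lambda (λ):
$$\bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n}$$
$$\bar{X} = \frac{277}{82} = 3.37805 = \lambda$$
where 𝑚𝑗 is the number of goals forwarded in class 𝑗, 𝑓𝑗 is the number of games in that class, 𝑐 is the number of classes, and 𝑛 = 82 is the total number of games.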
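As a quick check of the arithmetic above, the grouped-mean formula can be sketched in Python. The function name `grouped_mean` is my own (hypothetical), and because only the totals (277 goals forwarded over 82 games) are stated here, the usage line simply divides those totals.

```python
def grouped_mean(goal_counts, frequencies):
    """Mean of a frequency table: sum(m_j * f_j) / n, with n = sum(f_j)."""
    n = sum(frequencies)
    return sum(m * f for m, f in zip(goal_counts, frequencies)) / n


# Only the totals are given in the text: 277 goals forwarded in 82 games.
lam_toronto = 277 / 82
print(round(lam_toronto, 5))  # 3.37805
```

With the full frequency table, `grouped_mean(list(range(9)), frequencies)` would give the same value, since the numerator is the total number of goals and the denominator the number of games.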
This value of X̄ is not rounded to three significant figures, because I am using this mean as the estimate of λ. Using λ as the estimate of the mean, the expected frequency of each value of X can be found: the theoretical frequency for each value of X is the sample size 𝑛 multiplied by the Poisson probability of that value of X.
Now that I have found the estimate of the mean for the data, I must calculate the Poisson probability for 10 intervals of X: X = 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 or more. I must include the probability of 9 or more because the Poisson probabilities describe a theoretical model of the goals forwarded by the Toronto Maple Leafs. Since it is theoretically possible to score 9 or more goals in a game, our estimate of the mean (3.37805) also gives a probability for X ≥ 9, i.e. X in [9, ∞).
To find the theoretical Poisson probability of each value of X, I used the Poisson distribution formula, displayed below:
$$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$
Where:
λ = the parameter of the distribution (the mean, estimated above)
𝑒 = Euler's number, a mathematical constant approximately equal to 2.71828
𝑥 = the number of goals forwarded in a game
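The formula can be sketched in Python to produce all ten probabilities at once. This is a minimal sketch, not the calculation method used in this report; `poisson_pmf` and `poisson_table` are hypothetical helper names, and the "9 or more" tail is taken as 1 minus the sum of P(X = 0) to P(X = 8).

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) = lam**x * e**(-lam) / x! for a Poisson random variable."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def poisson_table(lam, last):
    """P(X = 0..last), plus the tail P(X >= last + 1) = 1 - sum of the rest."""
    probs = {x: poisson_pmf(x, lam) for x in range(last + 1)}
    tail = 1 - sum(probs.values())
    return probs, tail


lam = 277 / 82  # Toronto estimate of lambda
probs, tail = poisson_table(lam, 8)  # X = 0..8 and "9 or more"
for x, p in probs.items():
    print(x, round(p, 5))
print("9 or more", round(tail, 5))
```

Among the printed values, P(X = 1) ≈ 0.11524, P(X = 5) ≈ 0.12505 and P(X = 6) ≈ 0.0704, which are the values plotted in Figure 1.2. Note that `math.factorial(0)` returns 1, so the X = 0 row needs no special case.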
Calculating the expected probability of each number of goals forwarded seemed fairly easy at first; however, I reached a confusing stage when I had to find the expected probability of X = 0. This is when I remembered the argument my tutor had taught me showing that the factorial of 0 is equal to 1.
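For example, substituting X = 0 with λ = 3.37805 shows why 0! = 1 matters: the probability reduces to 𝑒 raised to −λ.
$$P(X = 0) = \frac{3.37805^{0}\, e^{-3.37805}}{0!} = \frac{1 \cdot e^{-3.37805}}{1} \approx 0.03411$$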
Now that I have P(X) for each number of goals forwarded by the Toronto Maple Leafs, I can use that data to create a curve, which should model the original GF vs. frequency graph from Figure 1.1.
(Figure 1.2: Poisson probability P(X) vs. # of goals forwarded, λ = 3.37805)
Figure 1.2, which uses λ as the parameter of the theoretical probabilities, closely matches the shape of the original data in Figure 1.1, which suggests that the distribution is Poisson: both trendlines increase and decrease over the same intervals. However, the model does not fit the goals-forwarded data perfectly.
Instead of relying on qualitative comparisons, I needed a mathematical process that could take my data and determine, as accurately as possible, whether it was Poisson. To measure how close my data are to a Poisson model, I decided to perform a chi-squared test. The chi-squared goodness-of-fit test is a non-parametric test used to determine whether the observed frequencies of a given phenomenon differ significantly from the expected frequencies. The test statistic is:
$$\chi^2_{k-p-1} = \sum_{k} \frac{(f_0 - f_e)^2}{f_e}$$
Where:
𝑓0 = the observed frequency
𝑓𝑒 = the theoretical (expected) frequency
𝑘 = the number of categories or classes remaining after combining classes
𝑝 = the number of parameters estimated from the data
Looking at the equation above, it is evident that I am missing some values: I need the theoretical (expected) frequency 𝑓𝑒, which can be found from 𝑓𝑒 = 𝑛 × 𝑃(𝑋). The following table shows the calculations I did to get the final value of the chi-squared statistic χ². I set up a table in which I completed the formula step by step to obtain the final sum:
Table 1.2
After completing the table, I am somewhat wary of how the theoretical frequency of 10.2541 came to be so high compared with the rest. Nonetheless, the sum from the table gives χ² = 4.02652.
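The step-by-step table can be sketched as two short functions. This is a minimal sketch, assuming the observed and expected frequencies are already grouped into the same classes; the names `expected_frequencies` and `chi_square_statistic` are hypothetical, and since the observed counts from Table 1.2 are not reproduced in this text, no real data is passed in.

```python
def expected_frequencies(n, probabilities):
    """f_e = n * P(X) for each class."""
    return [n * p for p in probabilities]


def chi_square_statistic(observed, expected):
    """Sum over classes of (f_0 - f_e)**2 / f_e."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must use the same classes")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
```

For instance, `expected_frequencies(82, [0.12505])` uses the X = 5 probability from Figure 1.2 and returns 82 × 0.12505 = 10.2541.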
Now that the step-by-step process is complete, I must find the subscript of χ². To determine it, I need a concept associated with the chi-squared distribution known as the degrees of freedom (ν), given by:
$$\nu = k - p - 1$$
The variables 𝑘 and 𝑝 were defined above. The value of 𝑘, the number of classes remaining after combining classes, is 9: of the ten intervals, the "9 or more" class is purely theoretical and does not appear in the observed frequencies, so it does not remain as a separate class. The value of 𝑝 is 1, because the only parameter estimated from the data is λ. Therefore, the degrees of freedom are:
$$k - p - 1 = 9 - 1 - 1 = 7$$
Now that the degrees of freedom are found, I must use hypothesis testing to determine at last whether the distribution is Poisson. For this test I use the same hypotheses stated earlier. At first, I was very confused about the relation between the null hypothesis test and my data's fit to a Poisson distribution. I then watched some YouTube videos to learn how to carry out the test, which involves a concept I was taught in class, known as critical values. I first set up the hypotheses:
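𝐻0 : The number of goals forwarded per game follows a Poisson distribution with λ = 3.37805
𝐻1 : The number of goals forwarded per game does not follow a Poisson distribution with λ = 3.37805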
This part still needs work: I proceeded with the next step of my IA because I could already infer from qualitative observations that my distribution was Poisson, but I still need to finish this calculation.
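The remaining comparison can be sketched as follows. This is a minimal sketch, assuming a 5% significance level (the text does not state one); `chi2_cdf` is a hypothetical helper that evaluates the chi-squared cumulative distribution through the standard series for the regularized lower incomplete gamma function, and 14.067 is the tabulated critical value for 7 degrees of freedom at that level.

```python
import math


def chi2_cdf(x, df):
    """P(chi-squared with df degrees of freedom <= x), using the series
    P(a, z) = z**a * e**(-z) * sum(z**n / Gamma(a + n + 1)),
    with a = df / 2 and z = x / 2."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = 1 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


chi_sq = 4.02652   # statistic from Table 1.2
df = 7             # k - p - 1 = 9 - 1 - 1
alpha = 0.05       # ASSUMED significance level (not stated in the text)
critical = 14.067  # tabulated critical value for df = 7 at alpha = 0.05

p_value = 1 - chi2_cdf(chi_sq, df)
print(f"p-value = {p_value:.3f}")
print("reject H0" if chi_sq > critical else "do not reject H0")
```

Because χ² = 4.02652 is below 14.067, H0 would not be rejected at the assumed 5% level. Strictly, this means the data are consistent with a Poisson model rather than proven to be Poisson.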
Although my chi-squared test suggests that the goals-forwarded counts of the Toronto Maple Leafs follow a Poisson distribution, my work is not yet complete. Since my end goal is to create a matrix of probabilities showing the attack strength of each team, I must also determine the Poisson distribution for the Montreal Canadiens.
Since my chi-squared test supports the conclusion that the Maple Leafs' goal-scoring distribution is a Poisson distribution, I conclude that goal scoring in hockey can be modelled as a Poisson process. With this assumption, I can create a P(X) model of goal scoring using the λ value found for the Montreal Canadiens.
(Figure 2: Frequency vs. # of goals forwarded, 0 to 10, Montreal Canadiens)
This figure looks very similar to Figure 1: the goals-forwarded data are distributed along a similar curve. To find my P(X) chart, I must determine the value of λ and then repeat the calculation.
$$\bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n}$$
$$\bar{X} = \frac{209}{82} = 2.54878 = \lambda$$
# of Goals Forwarded (X)    P(X) from Poisson Distribution with λ = 2.54878
0                           0.07818
1                           0.19926
2                           0.25393
3                           0.21574
4                           0.13747
5                           0.07007
6                           0.02977
7                           0.01084
8                           0.00345
9                           0.00098
10                          0.00025
11 or more                  0.00006

$$P(X \ge 11) = 1 - [P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + P(X=5) + P(X=6) + P(X=7) + P(X=8) + P(X=9) + P(X=10)] = 0.00006$$
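The table can be checked by recomputing each row from the Poisson formula. This standalone sketch uses a hypothetical helper `poisson_pmf` and λ = 209/82; it computes the "11 or more" tail both from unrounded probabilities and from the five-decimal values in the table.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) = lam**x * e**(-lam) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


lam_montreal = 209 / 82  # 2.54878 to five decimal places
probs = [poisson_pmf(x, lam_montreal) for x in range(11)]  # X = 0..10

# P(X >= 11) two ways: full precision, and from the rounded table entries.
tail_unrounded = 1 - sum(probs)
tail_from_table = 1 - sum(round(p, 5) for p in probs)

for x, p in enumerate(probs):
    print(x, round(p, 5))
print("11 or more:", round(tail_unrounded, 5), "vs", round(tail_from_table, 5))
```

Subtracting the rounded table entries from 1 gives 0.00006, matching the last row; with unrounded probabilities the tail comes out closer to 0.00007, so that row is sensitive to rounding.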
(Figure 2.1: Poisson probability P(X) vs. # of goals forwarded, λ = 2.54878)
Figure 2.1 looks similar to the graph of the data in Figure 2. However, the theoretical probabilities for X = 7 through "11 or more" did not closely match the original graph. Reflecting on possible limiting factors, I believe this is because scoring that many goals is extremely rare: the observed frequencies in this range fluctuate between 0 and 1 games, while the theoretical probabilities decrease gradually.
Now that all Poisson probabilities have been found for the goals forwarded by the Toronto Maple Leafs and the Montreal Canadiens, I must use these probabilities with the Poisson distribution formula to create a matrix of probabilities for the possible score outcomes.
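A minimal sketch of such a matrix follows, assuming (as an extra modelling assumption that has not been tested here) that the two teams' goal counts are independent Poisson variables with the λ values found above. Under that assumption, the probability that Toronto scores i goals and Montreal scores j goals is P(i) for Toronto multiplied by P(j) for Montreal. The helper names and the cap of 10 goals per team are hypothetical choices.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) = lam**x * e**(-lam) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


def score_matrix(lam_a, lam_b, max_goals=10):
    """matrix[i][j] = P(team A scores i and team B scores j), ASSUMING the
    two goal counts are independent Poisson variables (hypothetical model)."""
    goals = range(max_goals + 1)
    p_a = [poisson_pmf(i, lam_a) for i in goals]
    p_b = [poisson_pmf(j, lam_b) for j in goals]
    return [[pa * pb for pb in p_b] for pa in p_a]


lam_toronto = 277 / 82   # Toronto goals forwarded per game
lam_montreal = 209 / 82  # Montreal goals forwarded per game
matrix = score_matrix(lam_toronto, lam_montreal)

# Rows are Toronto's goals, columns are Montreal's goals.
print(f"P(Toronto 3, Montreal 2) = {matrix[3][2]:.4f}")
# Slightly below 1 because scores above 10 goals are cut off.
print(f"total probability shown = {sum(map(sum, matrix)):.4f}")
```

Since each λ here comes only from goals forwarded, the matrix reflects attack strength alone; any dependence between the two teams' scores is ignored by design.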