THE EFFECTS OF DECOMPOSITION OF THE GOALS SCORED IN CLASSIFYING THE OUTCOMES OF FIVE ENGLISH PREMIER LEAGUE SEASONS USING MACHINE LEARNING MODELS
Publisher
Pushpa Publishing House, Prayagraj, India
Abstract
The English Premier League (EPL) is one of the best football
championships in the world, and the data it generates are therefore
highly sought after by users of football data. One use of these data is
predicting the outcomes of league matches. This paper applies four
machine learning (ML) models to classify the outcomes (home win, draw,
and away win) of five consecutive EPL seasons using only six
independent variables. Two feature selection algorithms, Information
Gain Ratio (IGR) and ReliefF, reduced the independent variables from
16 to 6, and Spearman rank correlation showed a high, significant
positive correlation between the feature rankings produced by the two
algorithms. The Kruskal-Wallis H test indicated a significant
difference in the dependent variable across the seasons
(chi-square = 15.36, degrees of freedom = 4, p = 0.004).
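The reported p-value can be checked from the test statistic alone. Under the null hypothesis, the Kruskal-Wallis H statistic is approximately chi-square distributed with k - 1 degrees of freedom, here 5 seasons - 1 = 4. For an even number of degrees of freedom, the chi-square upper tail has a closed form, so the check needs only the standard library. This is a minimal illustrative sketch: the helper name `chi2_sf_even_df` is hypothetical, and it reuses only the reported value 15.36, not the paper's data.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with an
    even number of degrees of freedom, via the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!  (hypothetical helper)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 0.0
    for k in range(df // 2):
        if k > 0:
            term *= half / k  # builds (x/2)^k / k! from the previous term
        total += term
    return math.exp(-half) * total


# Reported Kruskal-Wallis statistic; 5 seasons give 5 - 1 = 4 degrees of freedom
h_stat = 15.36
df = 4
p_value = chi2_sf_even_df(h_stat, df)
print(f"p = {p_value:.4f}")
```

The tail probability comes out at about 0.0040, consistent with the reported p = 0.004. Where SciPy is available, `scipy.stats.chi2.sf(15.36, 4)` gives the same value.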
Adaptive boosting (AB), gradient boosting (GB), logistic regression
(LR) and random forests (RF) were used to classify the outcomes from
the six independent variables, and the performance metrics showed
perfect classification for almost all of the models. The paper
concludes, first, that knowing the number of goals scored by the home
and away teams, together with the number of goals each team scored in
the first and second halves, is all that is needed to correctly
classify the outcomes of EPL matches. Second, knowing the own goals,
penalty goals, and yellow and red cards received by the home or away
teams is not necessary for determining or predicting EPL outcomes.
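One way to read this conclusion: the outcome label is a deterministic function of the goals each team scores, and in a league match the full-time goals are the first-half goals plus the second-half goals. Variables that carry the goal counts, directly or decomposed by half, therefore fix the label, which fits the near-perfect classification reported. The sketch below is an illustration under that reading, not the paper's procedure; the function names and the example matches are hypothetical and are not taken from the paper's data.

```python
from typing import Literal

Outcome = Literal["H", "D", "A"]  # home win, draw, away win


def outcome_from_goals(home_goals: int, away_goals: int) -> Outcome:
    """Full-time result label from the final score (hypothetical helper)."""
    if home_goals > away_goals:
        return "H"
    if home_goals < away_goals:
        return "A"
    return "D"


def outcome_from_halves(home_1h: int, away_1h: int,
                        home_2h: int, away_2h: int) -> Outcome:
    """Same label rebuilt from the half-by-half decomposition:
    full-time goals = first-half goals + second-half goals."""
    return outcome_from_goals(home_1h + home_2h, away_1h + away_2h)


# Hypothetical matches: (home_1h, away_1h, home_2h, away_2h)
matches = [
    (1, 0, 1, 2),  # 2-2 -> draw
    (0, 1, 2, 0),  # 2-1 -> home win
    (0, 0, 0, 1),  # 0-1 -> away win
]
for h1, a1, h2, a2 in matches:
    print(outcome_from_halves(h1, a1, h2, a2))
```

The three example matches are labelled D, H and A in turn. Because own goals and penalty goals already count toward each side's goal total, they add no information about the label beyond the goal counts themselves.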
Keywords
QA Mathematics