Correlation and regression analysis is used to determine the relationship between two or more variables, and it is widely applied in economics and business forecasting. Today we will find out whether this method can also be used to forecast football match results and the positions of clubs in the league table.
In my investigation I use statistical data for PFC "CSKA" from the Russian Football Premier League.
The maximum number of points a team can earn is 90, because each team in the Russian championship plays 30 matches and receives 3 points for each win. As I said, my hypothesis is that the more goals a team scores, the more matches it wins in the tournament. So if we can forecast the number of goals, we can forecast how many matches the team will win, and this in turn gives us useful information about the number of points the team can earn.
The main problem I faced was forecasting the number of goals scored.
Here is statistical data on each player of PFC "CSKA".
The first column gives the total number of matches in which the player has taken part.
The second column gives the number of goals scored; the first two players are goalkeepers, so for them it shows the number of goals conceded instead.
The third column shows each player's average scoring rate (goals per match).
Let's imagine the following situation: a player has a scoring-rate coefficient of 1, which means that he scores one goal per match on average.
But if the team plays 30 matches in the tournament, that does not mean this player will score 30 goals: he will probably play in only 17 of them, and at his scoring rate he will score 17 goals.
So we also need to predict the number of matches each player will play.
The fifth column gives this forecast. The share of game participation is a forecast of the fraction of the team's matches that each player will play. We calculate this coefficient by dividing the number of matches the player played in 2010 by the total number of the team's matches in the season.
After that we can easily predict the number of goals a player will score by multiplying his scoring rate by the total number of the team's matches in the season and by his share of game participation.
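The per-player calculation can be sketched in Python as follows. This is only a minimal illustration, not the original spreadsheet: the function names and the sample player are hypothetical, and the forecast multiplies the scoring rate by the participation share so that it reproduces the 17-goal example given earlier.

```python
def scoring_rate(goals: int, matches_played: int) -> float:
    """Average goals per match played (0.0 if the player has not played)."""
    return goals / matches_played if matches_played else 0.0


def participation_share(matches_played: int, team_matches: int) -> float:
    """Fraction of the team's season matches the player took part in."""
    return matches_played / team_matches


def forecast_player_goals(rate: float, share: float, team_matches: int) -> float:
    """Expected goals = scoring rate * number of matches the player is expected to play."""
    return rate * team_matches * share


TEAM_MATCHES = 30  # matches per team in a season, as stated in the text

# Hypothetical player from the example: one goal per match, 17 of 30 matches played.
rate = scoring_rate(goals=17, matches_played=17)
share = participation_share(matches_played=17, team_matches=TEAM_MATCHES)
example_forecast = forecast_player_goals(rate, share, TEAM_MATCHES)
print(round(example_forecast, 2))
```

Summing such forecasts over all outfield players gives the team's forecast of goals scored; the same idea applied to the goalkeepers gives the forecast of goals conceded.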
According to my table, the forecast number of goals scored is 48.54737 and the forecast number of goals conceded is 21.50515.
According to the actual statistics, PFC "CSKA" scored 51 goals and conceded 22.
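To see how close the forecast is to the actual figures, the relative error can be computed with a few lines of Python. The numbers are the ones quoted above; the helper name `relative_error` is only an illustrative choice.

```python
# Forecast values from the table and actual values quoted in the text.
forecast = {"scored": 48.54737, "conceded": 21.50515}
actual = {"scored": 51, "conceded": 22}


def relative_error(predicted: float, observed: float) -> float:
    """Absolute deviation of the forecast as a fraction of the observed value."""
    return abs(observed - predicted) / observed


errors = {key: relative_error(forecast[key], actual[key]) for key in forecast}
for key, err in errors.items():
    print(f"{key}: {err:.1%}")
```

On these figures the forecast is off by roughly 4.8% for goals scored and roughly 2.2% for goals conceded.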
Now we need to find the correlation between two variables: the number of wins and the number of goals scored. We use the linear regression equation Yx = a0 + a1X. Here the resultant (dependent) variable Y is the number of wins, and the factor variable X is the number of goals scored.
Year    Wins (y)   Goals (x)   y^2    x^2     xy      Yx        Δy
2007    17         50          289    2500    850     16.179     0.821
2008    16         48          256    2304    768     15.533     0.467
2009    16         53          256    2809    848     17.148    -1.148
Total   49         151         801    7613    2466
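As a cross-check, the totals row contains exactly the sums needed for the standard least-squares formulas of a straight line. The sketch below applies those textbook formulas to the three seasons in the table; it is an independent check under that assumption, not necessarily the calculation behind the coefficients 0.029 and 0.323 used in this text.

```python
# Sums taken from the totals row of the table (y = wins, x = goals scored).
n = 3
sum_y, sum_x = 49, 151
sum_x2, sum_xy = 7613, 2466

# Ordinary least-squares estimates for Y = a0 + a1 * X.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

print(round(intercept, 3), round(slope, 3))
```

On these three points the least-squares fit gives an intercept of about 17.658 and a slope of about -0.026, which does not match the coefficients 0.029 and 0.323. With only three seasons, and wins almost constant across them, any fitted line should be treated with caution.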
So, using the formula Yx = 0.029 + 0.323X, where X is the forecast number of goals scored, we can predict the number of wins, and this gives us an estimate of the number of points the team can earn during a season.
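Applying the formula is straightforward. The following sketch takes the coefficients quoted in the text as given, reproduces the Yx and Δy columns of the table, and then plugs in the forecast of 48.54737 goals. The variable and function names are illustrative, and the points estimate counts wins only, ignoring any points from draws.

```python
A0, A1 = 0.029, 0.323  # coefficients quoted in the text
POINTS_PER_WIN = 3     # as stated in the text


def predicted_wins(goals: float) -> float:
    """Evaluate the regression line Yx = A0 + A1 * X."""
    return A0 + A1 * goals


# (wins, goals scored) per season, taken from the table.
seasons = {2007: (17, 50), 2008: (16, 48), 2009: (16, 53)}

rows = {}
for year, (wins, goals) in seasons.items():
    fitted = predicted_wins(goals)
    rows[year] = (round(fitted, 3), round(wins - fitted, 3))  # (Yx, Δy)
    print(year, *rows[year])

forecast_goals_scored = 48.54737  # forecast from the player table
wins_forecast = predicted_wins(forecast_goals_scored)
points_from_wins = POINTS_PER_WIN * wins_forecast
print(round(wins_forecast, 2), round(points_from_wins, 1))
```

With these inputs the formula predicts about 15.7 wins, or roughly 47 points from wins alone.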
By the same method we can predict the number of lost matches, if we build a correlation between the number of goals conceded and the number of matches lost. Because the results of PFC "CSKA" are unstable, the strength of this relationship will be low.
I think my method is best suited to tournaments where the teams are more stable, such as the English, Italian, and Spanish leagues.