The 2018 soccer World Cup has started in Russia, and it is one of the most widely viewed sporting events in history, so predictions of the likely winner attract considerable interest. Andreas Groll and his colleagues at the Technical University of Dortmund have combined machine learning with statistical methods to arrive at the most likely winner of the 2018 FIFA World Cup.

Bookmaker method

The best previous estimates came from combining the odds offered by many different bookmakers. This approach makes Brazil the clear favorite to win the 2018 World Cup, with a probability of 16.6%, followed by Germany (12.8%) and Spain (12.5%).
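To make "combining the odds" concrete, here is a minimal sketch of one simple way to do it. It assumes decimal odds, strips each bookmaker's built-in margin by proportional normalization, and then takes a plain average across bookmakers. The team names and odds are invented for illustration, and the function names `implied_probabilities` and `consensus` are hypothetical. Published consensus models may aggregate differently (for instance, by averaging on the log-odds scale).

```python
def implied_probabilities(decimal_odds):
    # 1 / decimal odds gives a raw implied probability; a bookmaker's raw values
    # add up to more than 1 because of its built-in margin (the overround)
    raw = {team: 1.0 / odds for team, odds in decimal_odds.items()}
    total = sum(raw.values())
    # Proportional normalization: one simple way to strip the margin
    return {team: p / total for team, p in raw.items()}


def consensus(bookmakers):
    # Plain arithmetic mean of each outcome's normalized probability across bookmakers
    per_book = [implied_probabilities(odds) for odds in bookmakers]
    teams = per_book[0].keys()
    return {team: sum(book[team] for book in per_book) / len(per_book) for team in teams}


# Invented odds from two hypothetical bookmakers for a toy four-outcome market
BOOKMAKERS = [
    {"Team A": 4.0, "Team B": 5.0, "Team C": 6.0, "Rest of field": 1.6},
    {"Team A": 4.5, "Team B": 5.5, "Team C": 5.5, "Rest of field": 1.5},
]

for team, p in consensus(BOOKMAKERS).items():
    print(f"{team}: {p:.1%}")
```

Normalizing before averaging keeps each bookmaker's probabilities summing to 1, so the averaged consensus does too.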

Random forest approach

Recently, researchers have developed machine learning techniques that have the potential to outperform conventional statistical approaches. Andreas Groll at the Technical University of Dortmund in Germany and a few colleagues combined machine learning with conventional statistics in what is called the random-forest approach, and it identifies a different most likely winner. The model includes factors such as each country's GDP and population and the teams' FIFA rankings.

They also factored in the players' average age, the number of Champions League players in each squad, and other team characteristics.
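A random forest is an ensemble of decision trees: each tree is grown on a bootstrap resample of the data, each split considers only a random subset of the features, and the forest's prediction is the average of its trees. The sketch below is a minimal stdlib-only regression forest under stated assumptions, not the researchers' model. It assumes the target is goals scored per match, and the three features (FIFA rank, GDP per capita, Champions League players) and all numbers are invented. Function names such as `random_forest` and `predict_forest` are hypothetical.

```python
import random
from statistics import mean

# Invented training data, one row per team:
# [FIFA rank, GDP per capita (thousand USD), Champions League players in squad]
ROWS = [
    [1, 48, 18], [3, 45, 15], [5, 40, 14], [10, 35, 9],
    [25, 20, 4], [33, 15, 3], [40, 10, 2], [60, 5, 1],
]
# Invented target: goals scored per match
GOALS = [2.4, 2.1, 1.9, 1.6, 1.2, 1.0, 0.9, 0.7]


def squared_error(values):
    # Sum of squared deviations from the mean: the impurity a regression tree minimizes
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, max_depth, n_feats, rng):
    # Return a leaf (the mean target) when the node is too small, pure, or deep enough
    if max_depth == 0 or len(rows) < 2 or len(set(targets)) == 1:
        return mean(targets)
    best = None
    # Random forest twist: only a random subset of features is tried at each split
    for f in rng.sample(range(len(rows[0])), n_feats):
        for threshold in sorted({row[f] for row in rows}):
            left = [i for i, row in enumerate(rows) if row[f] <= threshold]
            right = [i for i, row in enumerate(rows) if row[f] > threshold]
            if not left or not right:
                continue
            cost = sum(squared_error([targets[i] for i in side]) for side in (left, right))
            if best is None or cost < best[0]:
                best = (cost, f, threshold, left, right)
    if best is None:
        return mean(targets)
    _, f, threshold, left, right = best
    # An internal node is a tuple (feature index, threshold, left subtree, right subtree)
    return (
        f,
        threshold,
        build_tree([rows[i] for i in left], [targets[i] for i in left], max_depth - 1, n_feats, rng),
        build_tree([rows[i] for i in right], [targets[i] for i in right], max_depth - 1, n_feats, rng),
    )


def predict_tree(node, row):
    # Walk down the tree until reaching a leaf (a plain number)
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node


def random_forest(rows, targets, n_trees=25, max_depth=3, n_feats=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: resample the teams with replacement for each tree
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [targets[i] for i in idx], max_depth, n_feats, rng))
    return forest


def predict_forest(forest, row):
    # The forest's prediction is the average over its trees
    return mean(predict_tree(tree, row) for tree in forest)


forest = random_forest(ROWS, GOALS, seed=1)
print(round(predict_forest(forest, [2, 46, 16]), 2))  # a strong hypothetical team
print(round(predict_forest(forest, [55, 6, 1]), 2))   # a weak hypothetical team
```

Averaging many deliberately varied trees smooths out the quirks of any single tree, which is the usual motivation for the random feature subsets and bootstrap resampling.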

The predictions from this process differ from the bookmakers' in some important ways. For a start, the random-forest method picks out Spain as the most likely winner, with a probability of 17.8%. A big factor in this prediction is the structure of the tournament itself.

If Germany gets through the group phase, it is relatively likely to face strong opposition in the round of 16, the first knockout round. Because of this, the random-forest method puts Germany's chance of reaching the quarter-finals at 58%. By contrast, Spain is unlikely to face strong opposition in the round of 16, so it has a 73% chance of reaching the quarter-finals.

As Groll and colleagues put it, “Spain is slightly favored over Germany mainly due to the fact that Germany has a comparatively high chance to drop out in the round-of-sixteen”.
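The effect of the draw can be seen with the law of total probability: the chance of reaching the quarter-finals is the chance of leaving the group times a weighted average, over possible round-of-16 opponents, of the chance of beating each one. The sketch below uses invented numbers for two hypothetical teams; they are not the study's inputs, and `quarterfinal_probability` is a hypothetical helper.

```python
def quarterfinal_probability(p_group, round_of_16):
    """Chance of reaching the quarter-finals.

    p_group: probability of getting out of the group.
    round_of_16: (p_meet, p_beat) pairs, where p_meet is the chance of facing that
    opponent given the team has advanced, and p_beat the chance of beating them.
    """
    if abs(sum(p_meet for p_meet, _ in round_of_16) - 1.0) > 1e-9:
        raise ValueError("meeting probabilities must sum to 1")
    # Law of total probability over the possible round-of-16 opponents
    return p_group * sum(p_meet * p_beat for p_meet, p_beat in round_of_16)


# Invented inputs: two teams with similar group odds but different likely opponents
tough_path = quarterfinal_probability(0.85, [(0.6, 0.55), (0.4, 0.80)])
easy_path = quarterfinal_probability(0.90, [(0.7, 0.85), (0.3, 0.70)])
print(f"tough path: {tough_path:.1%}, easy path: {easy_path:.1%}")
```

With these made-up inputs the two teams have almost the same chance of leaving their groups, yet the first ends up at about 55% and the second at about 72%, purely because of who they are likely to meet next.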

But there is an additional twist. The random-forest approach also makes it possible to simulate the entire tournament, and this produces a different result.
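Simulating a tournament usually means playing it out many times with random outcomes and counting how often each team wins. The sketch below does this for the knockout rounds only (a full simulation would also play the group stage). To keep it short, it swaps in a simple Bradley-Terry style rule, P(A beats B) = s_A / (s_A + s_B), for match outcomes; this is an illustrative assumption, not the researchers' model, and the team strengths and bracket are invented.

```python
import random
from collections import Counter


def win_probability(strength_a, strength_b):
    # Bradley-Terry style rule: chance that A beats B (a knockout tie always has a winner)
    return strength_a / (strength_a + strength_b)


def play_knockout(teams, strengths, rng):
    # Teams are in bracket order: 0 plays 1, 2 plays 3, ... until one team is left
    while len(teams) > 1:
        winners = []
        for a, b in zip(teams[0::2], teams[1::2]):
            p = win_probability(strengths[a], strengths[b])
            winners.append(a if rng.random() < p else b)
        teams = winners
    return teams[0]


def simulate_tournament(bracket, strengths, n_sims=20000, seed=42):
    # Monte Carlo: replay the bracket many times and count titles per team
    rng = random.Random(seed)
    titles = Counter(play_knockout(list(bracket), strengths, rng) for _ in range(n_sims))
    return {team: titles[team] / n_sims for team in bracket}


# Invented strengths for eight hypothetical quarter-finalists
STRENGTHS = {
    "Team A": 10.0, "Team B": 6.0, "Team C": 5.0, "Team D": 4.0,
    "Team E": 3.0, "Team F": 2.0, "Team G": 1.5, "Team H": 1.0,
}
# Bracket order: neighbours meet first (A v H, D v E, B v G, C v F)
BRACKET = ["Team A", "Team H", "Team D", "Team E", "Team B", "Team G", "Team C", "Team F"]

probs = simulate_tournament(BRACKET, STRENGTHS)
for team in sorted(probs, key=probs.get, reverse=True):
    print(f"{team}: {probs[team]:.1%}")
```

Because each simulated match depends on who actually came through the previous round, the bracket structure is built into the estimated title chances rather than treated separately.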

