Five machine-learning lessons from the Tour de France
How we used advanced data analytics to predict results at the Tour de France
Predictive analytics and machine learning have the potential to transform the world of sport and your business. The exponential growth in this area is making it possible to predict the outcomes of sports events, a step beyond merely analysing historical data.
This year we took our big data solution for the Tour de France to the next level by introducing machine learning and predictive analytics, the first time these techniques have been used in the world of professional cycling.
Taking three years of data and combining it with additional relevant information, such as stage information, weather and rider profiles, our #ddpredictor correctly called the top-placed riders 71.4% of the time. We also generated detailed profiles of the riders and, each day, predicted the likelihood of the breakaway making it to the finish without being caught.
We processed billions of data points to power the #ddpredictor.
Data has become a critical part of every aspect of sport, with top teams using it extensively to improve performance. Our focus was to provide a new level of insight to fans looking for more than a traditional viewing experience.
The success of the project was a direct result of a five-step approach that any organisation looking to use the power of machine learning and predictive analytics can build on.
1. You need the data
Machine learning relies entirely on access to large amounts of high-quality data. At the Tour, we had access to three years of telemetry data as well as a number of external data sources.
These included five years of results from other races and detailed stage and weather information. During the Tour itself, we processed 3 billion data points and 147GB of data.
All of this was vital in enabling the system to deliver on its promise. This was not, however, something that was just turned on at the start of the Tour. We had to start with the data and move forward from there.
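To make the idea of combining data sources concrete, here is a minimal sketch in Python of joining rider telemetry with stage and weather information before any model sees it. It is not the actual #ddpredictor pipeline: the field names (avg_speed_kmh, type, temperature_c), the TrainingRow class, the build_training_rows function and all sample values are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRow:
    """One combined record; every field name here is hypothetical."""
    rider: str
    stage: int
    avg_speed_kmh: float
    stage_type: str
    temperature_c: float


def build_training_rows(telemetry, stages, weather):
    """Join per-rider telemetry with stage and weather records by stage number."""
    rows = []
    for record in telemetry:
        stage_no = record["stage"]
        # Skip telemetry that has no matching stage or weather record,
        # so incomplete rows never reach the model.
        if stage_no not in stages or stage_no not in weather:
            continue
        rows.append(
            TrainingRow(
                rider=record["rider"],
                stage=stage_no,
                avg_speed_kmh=record["avg_speed_kmh"],
                stage_type=stages[stage_no]["type"],
                temperature_c=weather[stage_no]["temperature_c"],
            )
        )
    return rows


# Made-up sample values, purely for illustration.
telemetry = [
    {"rider": "Rider A", "stage": 1, "avg_speed_kmh": 44.1},
    {"rider": "Rider B", "stage": 1, "avg_speed_kmh": 43.7},
    {"rider": "Rider A", "stage": 2, "avg_speed_kmh": 38.9},
]
stages = {1: {"type": "flat"}, 2: {"type": "mountain"}}
weather = {1: {"temperature_c": 24.0}}  # weather for stage 2 is missing

rows = build_training_rows(telemetry, stages, weather)
for row in rows:
    print(row)
```

In this sketch the stage 2 record is dropped because its weather data is missing, which is one simple way of keeping only complete, high-quality rows.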
2. Create a proof of concept
It’s no use spending time and money building a system that may not work. To avoid this, you need to create a proof of concept to see whether you’re on the right track to start with.
For the Tour de France we started our proof of concept in January and we were able to demonstrate the basic capabilities of the system before we committed to proceed. Even once we decided to go ahead, we had to think about how we would implement the solution.
3. Use the cloud
There are many solutions available in the cloud today that can be used as a service to create a machine-learning platform. The real advantage of going this way is that it allows you to create a solution without incurring the risk of investing in either specialist software and infrastructure early on, or in the wrong technology altogether.
For the Tour de France solution we made use of cloud services from third-party technology partners. This gave us the flexibility to scale the solution up and down as and when we needed to. While the limited timeframe of the Tour dictated this, for organisations that are building a machine-learning capability for use on an ongoing basis, the flexibility of cloud-based services offers significant benefits.
4. Build a cross-functional team
Creating the right team to deliver on machine learning’s potential requires that organisations use not only experts in the field, but also experts from the business.
At the Tour, we had a team of five, including data scientists, engineers, and subject matter experts – in this case a former professional cyclist – working together. Having a mixture of people in the team is critical as machine learning is an area that crosses a number of fields.
Even with the right team and the best software, you should not expect a machine-learning system to produce the right results from day one. Getting the best out of it requires constant revision.
Having different skill sets strengthened the team.
5. Always iterate
To get the best out of any machine-learning system, the team should constantly analyse the results, looking for weaknesses, and tweaking the algorithms to generate more accurate results. You can’t expect it to magically work perfectly from the outset; you need to test the results and learn from what you see. As you increase the amount of data available to the system, it should be possible to observe the behaviour of the system and continue to improve it over time. For any company looking to benefit from the capabilities of machine learning, an iterative approach is essential.
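One way to picture this loop is to score every version of a model against the same set of known results and keep the changes that help. The Python sketch below is an illustration only: the top_k_hit_rate function, the scoring rule (the share of predicted top-k riders who actually finished in the top k) and the sample riders are assumptions, not the metric or data behind the #ddpredictor.

```python
def top_k_hit_rate(predicted_by_stage, actual_by_stage, k=5):
    """Share of predicted top-k riders who actually finished in the top k.

    Both arguments map a stage number to a list of riders ordered from
    first place downwards. This scoring rule is an assumption chosen
    for illustration.
    """
    hits = 0
    total = 0
    for stage, predicted in predicted_by_stage.items():
        actual_top = set(actual_by_stage[stage][:k])
        picks = predicted[:k]
        hits += sum(1 for rider in picks if rider in actual_top)
        total += len(picks)
    return hits / total if total else 0.0


# Made-up finishing orders and predictions, purely for illustration.
actual = {
    1: ["A", "B", "C", "D"],
    2: ["C", "A", "E", "B"],
}
model_v1 = {1: ["A", "D", "B"], 2: ["B", "E", "A"]}
model_v2 = {1: ["A", "B", "D"], 2: ["C", "E", "A"]}

for name, predictions in [("v1", model_v1), ("v2", model_v2)]:
    rate = top_k_hit_rate(predictions, actual, k=2)
    print(f"model {name}: top-2 hit rate {rate:.0%}")
```

Scoring each revision against the same results makes it clear whether a tweak to the algorithms improved accuracy or just changed the output.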
These five strategies allowed us to create a system that can be expanded for next year’s race. We’ve identified three key areas to take the solution forward: continuing to iterate and improve the accuracy of the system; adding new data sets and using them to introduce new or different predictions; and expanding the solution into other sports and industries. But all organisations need to be looking at how they can leverage machine learning in their own environment.
The explosion of data in the corporate world is making the need for machine-learning capabilities almost non-negotiable. And with the market expected to top USD 18 billion by 2020, it’s time to make the most of the opportunities it will bring to learn from, and grow, your business.