Topics in this article
Edge as a Service
Tour de France
This July, it’s not just images of the world’s toughest cyclists in action that will be beamed all over the world once the Tour de France hits the road – first in Spain, then in France. At the same time, streams of real-time data from myriad sensors will be turned into useful insights for organizers, competitors and fans alike.
While the world’s eyes are on the racers, a complex, interlinked system of sensors, networks, edge computing, cloud, real-time analytics and machine learning will operate in the background to deliver statistics and insights to fans, broadcasters, support crews, the race organizers and the LeTourData team.
On the other side of the world – in Johannesburg, South Africa – is Dimension Data’s Tour de France data hub. This is our central command center that receives the race data from sensors in the race – in other words, right at the edge. It is critical to race operations and staffed by a mix of technical and cycling specialists.
Bringing bicycles, people and data together
The raw GPS tracker data comes from the Tour de France itself, where coordinates and speed readings are collected from the bikes and race support vehicles (including cars, motorbikes and helicopters), among other sources along the route.
In the past two years, we have deployed a real-time analytics platform that uses the data collected from these sensors to create a digital twin of the event. Spectators can interact with the digital twin to better engage with the event. The race teams and officials also use this information to navigate and manage the event.
From the data hub, our Middle East and Africa team monitors all the raw data, along with the platforms and systems that transform it into usable information.
A service delivery manager coordinates the hub operations and handles communications with key stakeholders who connect to the data hub via Webex from various parts of the world – including Europe, the UK, Australia and our NTT Data Truck in France.
Although the data itself stays in the cloud, some data management is done manually: corrections are entered at the hub and sent to our processing platform in the Microsoft Azure cloud. For example, we need to flag to the processing system that a rider has swapped bikes – perhaps after a crash – so our data does not suddenly show the rider speeding off in the wrong direction because their damaged bike is now in a support vehicle.
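A manual flag like this can be sketched as a small lookup that the processing side consults before accepting a reading. This is a minimal illustration with invented names and IDs, not the actual LeTourData interface:

```python
# Hypothetical sketch: map each rider number to the transponder ID of the
# bike they are currently riding. All identifiers here are illustrative.

active_bike = {21: "TX-1042"}  # rider 21 starts the stage on bike TX-1042

def flag_bike_swap(rider: int, new_transponder: str) -> None:
    """Record that a rider swapped bikes, e.g. after a crash."""
    active_bike[rider] = new_transponder

def accept_reading(rider: int, transponder: str) -> bool:
    """Keep a GPS reading only if it comes from the rider's current bike."""
    return active_bike.get(rider) == transponder

flag_bike_swap(21, "TX-2077")             # hub operator flags the swap
assert accept_reading(21, "TX-2077")      # readings from the new bike are kept
assert not accept_reading(21, "TX-1042")  # the old bike, now in a support car, is ignored
```

Once the flag is in place, every later reading from the damaged bike is simply dropped instead of being attributed to the rider.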
Making data usable to enhance the race experience
The raw data is “messy” because there are many remote, mountainous areas and tunnels along the 3,404km route of the Tour de France where signals from the bikes are lost altogether – sometimes for minutes at a time. These signals may also be duplicated or inaccurate in terms of speed and position due to the limitations of GPS.
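The kind of cleanup this involves can be sketched in a few lines of Python: deduplicating repeated fixes and linearly interpolating a rider's position across a tunnel-sized signal gap. This is an illustrative simplification, not the production pipeline:

```python
# Illustrative sketch: each reading is (timestamp_seconds, distance_km_along_route).

def dedupe(points):
    """Drop duplicated readings that share a timestamp."""
    seen, out = set(), []
    for t, dist in points:
        if t not in seen:
            seen.add(t)
            out.append((t, dist))
    return out

def interpolate_gap(t0, d0, t1, d1, t):
    """Estimate distance along the route at time t between two known fixes."""
    frac = (t - t0) / (t1 - t0)
    return d0 + frac * (d1 - d0)

points = dedupe([(0, 100.0), (0, 100.0), (10, 150.0)])  # duplicate fix removed
# Signal lost between t=0 and t=10 (e.g. a tunnel); estimate position at t=4.
est = interpolate_gap(0, 100.0, 10, 150.0, 4)  # → 120.0 km
```

Real cleanup is much harder than this, of course: the route is not a straight line, and the pipeline also has to reject fixes whose speed or position is physically implausible.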
Our real-time analytics platform has been developed from scratch by our team over the past four years using open-source frameworks including Apache NiFi, Apache Beam and tens of thousands of lines of Python code. It cleans, interpolates and transforms the data into useful, human-readable data fields such as “Distance from the Start”, “Gap to Previous Rider”, “Group Membership”, “Current Braking Force” and “Relative Wind Speed and Direction”.
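As a rough illustration of one such derived field, "Gap to Previous Rider" can be estimated from each rider's distance along the route and current speed. The function name, field layout and numbers below are assumptions for the sketch, not the real schema:

```python
# Sketch: estimate the time gap from each rider to the rider directly ahead.

def gap_to_previous(riders):
    """riders: list of (name, distance_km, speed_kmh), in any order.
    Returns {name: seconds_behind_the_rider_ahead}; the leader gets 0.0."""
    ordered = sorted(riders, key=lambda r: r[1], reverse=True)
    gaps = {ordered[0][0]: 0.0}
    for ahead, chaser in zip(ordered, ordered[1:]):
        name, dist, speed = chaser
        # Seconds the chaser needs, at current speed, to cover the deficit.
        gaps[name] = (ahead[1] - dist) / speed * 3600
    return gaps

gaps = gap_to_previous([("leader", 120.0, 45.0), ("chaser", 119.5, 45.0)])
# A 0.5 km deficit at 45 km/h is roughly a 40-second gap.
```

The real field has to cope with the complications described above: lost signals, bike swaps and riders whose reported positions momentarily overlap.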
But, of course, knowing where a cyclist is in the peloton at any given time is only half the fun. We also want to predict stage and race winners, or the probability of a successful breakaway, for example. And which fan would not want to compare rider and team strengths and strategies as the race progresses?
To do this, we have combined mathematical modeling with the knowledge of a sports scientist to develop our own prediction models, which are generated in near real time using cloud-based virtual machines on Microsoft Azure. We then deliver the processed data to the racing teams, organizers and fans.
Solving the messiest data challenge you can imagine
The Tour de France is very different from everything else we do in terms of the pressures, the stress and the timelines. Our solution tackles complicated data science problems, and everything we do must support the planning and smooth operation of the race.
The event is bigger than the data alone, but the data we create is central to running and broadcasting it.
Broadcasters use it in production (such as setting up the most compelling shots) and to enrich their commentary with accurate, real-time numbers. The data also feeds into the publicly accessible Race Center website, which can display the near-real-time position of racers, their speed and the gaps between them.
Our LeTourData team uses the data to send out tweets enriched with in-depth analysis and information as each stage unfolds. In addition, the system transmits data to the team vehicles monitoring the position and support needs of their riders during the race.
Because everything happens live, the pressure rises lightning-fast when things go wrong. If we fall more than one second behind, it creates a bottleneck. Things that break need to be fixed quickly – and things do break, because this grueling race winds through remote mountains that interfere with connectivity. In the time it takes us to fix something, fans watching the race may be missing out on important race data.
It takes a lot to design, develop, implement, monitor and manage a system this complicated. The specialists working in our data hub are not just cycling fanatics – they also care deeply about making magic out of data in real time. Here, they can prove themselves in what is probably the toughest data challenge anyone can imagine – but we know they’ll make it to the finish line.