Le Tour de France as a CSV File

Author

Thomas Camminady

Published

July 4, 2023

LeTourDataSet

Distance and winner average pace

TL;DR

If you use pandas, just get the data via:

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/camminady/LeTourDataSet/master/data/TDF_Riders_History.csv")

If you use R instead of Python, you can run:

library(readr)
df <- read_csv("https://raw.githubusercontent.com/camminady/LeTourDataSet/master/data/TDF_Riders_History.csv")

Disclaimer

For known problems with this data set, see the Issues tab. Some entries are incorrect. So far, however, the mistakes seem to stem from wrong data on the letour.fr website itself. Looking back, I probably should have scraped another website.

Data

Every cyclist of the Tour de France is collected in a single CSV file, data/TDF_Riders_History.csv. There’s also data on every stage in data/TDF_Stages_History.csv.
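If you want both files without pandas or R, the following is a minimal sketch using only the Python standard library. The base URL is the directory of the riders link from the TL;DR. That the stages file is served from the same directory is an assumption based on its path in the repository, and UTF-8 encoding is assumed as well. The helper names (dataset_url, parse_csv, load_rows) are made up for this example and are not part of the repository.

```python
import csv
import io
import urllib.request

# Directory of the riders URL shown in the TL;DR. That the stages file is
# served from the same directory is an assumption based on its repository path.
BASE_URL = "https://raw.githubusercontent.com/camminady/LeTourDataSet/master/data/"

RIDERS_FILE = "TDF_Riders_History.csv"
STAGES_FILE = "TDF_Stages_History.csv"


def dataset_url(filename: str) -> str:
    """Build the raw download URL for one file in the data/ directory."""
    return BASE_URL + filename


def parse_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


def load_rows(filename: str) -> list[dict[str, str]]:
    """Download one data file and parse it (requires network access)."""
    with urllib.request.urlopen(dataset_url(filename)) as response:
        # Assumption: the files are UTF-8 encoded.
        text = response.read().decode("utf-8")
    return parse_csv(text)
```

Calling load_rows(RIDERS_FILE) and load_rows(STAGES_FILE) then gives one list of row dictionaries per file. All values are strings, so convert any numeric columns yourself. If you do use pandas, passing dataset_url(STAGES_FILE) to pd.read_csv should work just like the riders URL above.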

How to run

In your shell, just run these commands:

poetry install # to install the environment
poetry run python letourdataset/Downloader.py # get the data

Legacy code

This code has been completely rewritten. The previous code, including its output, is in the legacy repository. In particular, read legacy/README.txt.