Stop paying for APIs to calculate distances and use this Open Source tool!

Daniel Sharp
Applied Data Science
5 min read · Feb 7, 2022


How to use OSRM to calculate distances reliably and for free.


Calculating distances between a set of coordinates is something that regularly comes up in Data Science projects. Whether it is planning routes for delivery services, or measuring a customer’s willingness to travel to certain locations, getting an accurate measure of distance is always key.

Up until recently, I thought there were only two ways to do this:

1. Pay for a cloud provider’s API, such as Google’s, AWS’s, or Azure’s.

Costs for using Google’s API

Based on Google’s costs, calculating the distance between 60,000 pairs of locations would amount to a cost of £222. This might or might not be significant to you, depending on the company budget and how often you need to perform these calculations.

2. Use haversine distance

This distance assumes an ‘as the crow flies’ type of travel, which means travelling in a straight line between two locations, ignoring roads and other geographic features. This calculation generally correlates with ‘street’ distance, but it will underestimate it in some situations.

For instance, if you wanted to walk from the London Eye to the Korean War Memorial, using haversine distance you would think it’s only 300 metres away. However, once you account for the Thames in the middle and decide you will not try to swim through it, you’ll find that the distance is actually three times what you thought, 900 metres.
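The straight-line figure above can be reproduced with the standard haversine formula in a few lines of Python. This is a minimal sketch; the coordinates for the two landmarks are approximate:

```python
import math

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle ('as the crow flies') distance in metres."""
    r = 6371000  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# London Eye vs. Korean War Memorial (approximate coordinates)
print(haversine(51.503361, -0.119513, 51.503716, -0.123933))  # roughly 300 m
```

The result is close to the 300 metres quoted above, which is exactly the problem: it ignores the river in between.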

Getting from one place to the other in a straight line isn’t really an option

So, how can you save the costs of using Cloud APIs but still get an accurate calculation of the distance?

I very recently learned of OSRM, the Open Source Routing Machine. As the name suggests, it’s an open source tool that allows you to calculate routes between pairs of points anywhere in the world. It works off OpenStreetMap extracts, which have coverage of pretty much all of the planet. You can view a demo of the tool here.

Their documentation, hosted in the README.md file on their repository, is very easy to follow if you know Docker (if you don’t, read this). However, for the sake of completeness, I will add the instructions here.

Step 1: Download the map files for the region you want to cover.

This is a key step, as it will define what regions your tool will cover. It is important to note that the larger the region, the more RAM the server requires to be able to run.

The map extracts can be downloaded from Geofabrik. You’ll need the .osm.pbf files.

In my case, I’ve previously tested it using map data for the whole of Mexico, and I was just about able to run that on a server with 12 GB of RAM. The map file for Mexico is close to 500 MB. However, these RAM requirements only apply to the pre-processing step, not to the general operation of the tool.

For this example, I’ll create a distance calculator for London, which is only 76MB. I downloaded the files using the following command:

wget http://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf

Step 2: Pre-process the extract

In this step, you can define what profile you want to use for the distance calculations. You can choose one of the following:
- car 🚙
- foot 🚶
- bicycle 🚲

The command to run this step, using foot as the profile, is the following:

docker run -t -v "${PWD}/data:/data" osrm/osrm-backend osrm-extract -p /opt/foot.lua /data/greater-london-latest.osm.pbf

To change the profile, just change the word in the -p argument of the command. Once this command is run, you’ll see that several new files are created.

The commands to run the next pre-processing steps are the following:

docker run -t -v "${PWD}/data:/data" osrm/osrm-backend osrm-partition /data/greater-london-latest.osrm

docker run -t -v "${PWD}/data:/data" osrm/osrm-backend osrm-customize /data/greater-london-latest.osrm

These take a few seconds to complete.

Step 3: Run the API

This command will initialise an API endpoint which you can send the coordinates to in order to obtain the distance measures and travel times. By default it’s mapped to port 5000.

docker run -t -i -p 5000:5000 -v "${PWD}/data:/data" osrm/osrm-backend osrm-routed --algorithm mld /data/greater-london-latest.osrm 

Then you can use curl, Python, or any programming language to calculate the distance between a pair of coordinates. Note that OSRM expects coordinates in longitude,latitude order. We can run it for the two locations we mentioned above (the Korean War Memorial and the London Eye):

curl "http://127.0.0.1:5000/route/v1/foot/-0.123933,51.503716;-0.119513,51.503361"

This returns the distance (in metres) and the duration (in seconds) of travel.
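For use inside a data science pipeline, the same request can be wrapped in a small Python helper. This is a sketch that assumes the server from Step 3 is running on localhost:5000; the live call at the bottom is commented out so the snippet can be read without a running server:

```python
import json
import urllib.request

OSRM_URL = "http://127.0.0.1:5000"  # assumes the Docker container from Step 3 is up

def route_url(profile, lon1, lat1, lon2, lat2):
    # OSRM expects longitude,latitude order, not latitude,longitude
    return f"{OSRM_URL}/route/v1/{profile}/{lon1},{lat1};{lon2},{lat2}"

def street_distance(profile, lon1, lat1, lon2, lat2):
    """Return (distance in metres, duration in seconds) of the best route."""
    with urllib.request.urlopen(route_url(profile, lon1, lat1, lon2, lat2)) as resp:
        body = json.load(resp)
    route = body["routes"][0]
    return route["distance"], route["duration"]

# Korean War Memorial -> London Eye, on foot:
# distance_m, duration_s = street_distance("foot", -0.123933, 51.503716,
#                                          -0.119513, 51.503361)
```

To batch up many pairs, you can simply loop over this helper, or look at OSRM’s table service for many-to-many queries.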

I hope this blog post helps you get more accurate distance measurements in your Data Science projects and/or reduces costs related to that. Let me know in the comments if this was useful and how you applied it to your projects.

Applied Data Science Partners is a London based consultancy that implements end-to-end data science solutions for businesses, delivering measurable value. If you’re looking to do more with your data, please get in touch via our website. Follow us on LinkedIn for more AI and data science stories!
