Controlling a Google Nest thermostat with Python

Google recently released its Device Access API. This article is a short explanation of how to access and use this API to control a Google Nest thermostat.

We moved to a new house a few months ago and the previous owners left their thermostat behind. This thermostat turned out to be a brand new Nest smart thermostat. Of course, I wanted to connect this smart device to my home automation. But unfortunately, after some searching, I found out that most connections were deprecated after Google acquired Nest a couple of years ago, along with the API access for consumers. The only option was to sign up for a waitlist.

But to my surprise, I got an email from Google a couple of weeks ago! The Device Access Console is now available to individual consumers as well as commercial partners. I wanted to find out if I could control my Google Nest thermostat so I wrote some Python code for this.

E-mail from Google
E-mail from Google

Get started with Google documentation

Google has written some quite elaborate documentation about all the steps needed to access the API. The purpose of this article is not to rewrite their help articles. These articles will get you started:

And you’ll also need these links:

How much does it cost?

Before creating your first project, you must pay a one-time, non-refundable fee (US$5) per account. There can also be cost involved by using Google Cloud services, but I have been polling my thermostat every 5 minutes for a couple of weeks now without cost.

Gather all requirements

It’s tempting to begin creating a Device Access project right away (I did so too), but I found out that it is probably easier to get all the necessary prerequisites first. Before you can start coding you should follow these steps:

  1. Create a Google Cloud Console project at console.cloud.google.com
    1. Create Cloud Console oAuth 2.0 credentials
    2. Set a redirect URI
    3. Turn on the Smart Device Access API
  2. Create a Device Access project at console.nest.google.com

I made some screenshots from each step:

After creating an OAuth 2.0 client ID and a Device Access project you should have these values:

  • A project ID. You can find this in your Device Access console.
  • A client ID. You can find this with your OAuth credentials.
  • A client secret. You can find this with your OAuth credentials.
  • A redirect URI. This URI can be almost any URL you want, as long as this URL matches a URI that is entered as an authorized redirect URI in your Cloud Console project. 

Some example API calls with Python

Now we can start making API calls! Below is a Jupyter Notebook that walks through all the steps to control a Nest thermostat.

You can also make a copy of the Google Colab notebook I made.

Open In Colab

Example for making Nest API calls

Please start by making a copy of this notebook.

Enter your credentials

Enter your credentials and run the cell to generate a login URL. Click the URL and log in to your Google account.

In [ ]:
project_id = 'your-project-id'
client_id = 'your-client-id.apps.googleusercontent.com'
client_secret = 'your-client-secret'
redirect_uri = 'https://www.google.com'

url = 'https://nestservices.google.com/partnerconnections/'+project_id+'/auth?redirect_uri='+redirect_uri+'&access_type=offline&prompt=consent&client_id='+client_id+'&response_type=code&scope=https://www.googleapis.com/auth/sdm.service'
print("Go to this URL to log in:")
print(url)

After logging in you are sent to URL you specified as redirect_url. Google added a query to end that looks like this: ?code=.....&scope=... Copy the part between code= and &scope= and add it below:

In [ ]:
code = '4/add-your-code-here'

Get tokens

Now we can use this code to retrieve an access token and a refresh token:

In [ ]:
# Get tokens

import requests

params = (
    ('client_id', client_id),
    ('client_secret', client_secret),
    ('code', code),
    ('grant_type', 'authorization_code'),
    ('redirect_uri', redirect_uri),
)

response = requests.post('https://www.googleapis.com/oauth2/v4/token', params=params)

response_json = response.json()
access_token = response_json['token_type'] + ' ' + str(response_json['access_token'])
print('Access token: ' + access_token)
refresh_token = response_json['refresh_token']
print('Refresh token: ' + refresh_token)

Refresh access token

The access token is only valid for 60 minutes. You can use the refresh token to renew it.

In [ ]:
# Refresh token

params = (
    ('client_id', client_id),
    ('client_secret', client_secret),
    ('refresh_token', refresh_token),
    ('grant_type', 'refresh_token'),
)

response = requests.post('https://www.googleapis.com/oauth2/v4/token', params=params)

response_json = response.json()
access_token = response_json['token_type'] + ' ' + response_json['access_token']
print('Access token: ' + access_token)

Get structures and devices

Now lets get some information about what devices we have access to and where these are "located". Devices are part of a structure (such as your home). We can get information about the structures we have access to:

In [ ]:
# Get structures

url_structures = 'https://smartdevicemanagement.googleapis.com/v1/enterprises/' + project_id + '/structures'

headers = {
    'Content-Type': 'application/json',
    'Authorization': access_token,
}

response = requests.get(url_structures, headers=headers)

print(response.json())

But we can also directly retrieve the devices we have access to:

In [ ]:
# Get devices

url_get_devices = 'https://smartdevicemanagement.googleapis.com/v1/enterprises/' + project_id + '/devices'

headers = {
    'Content-Type': 'application/json',
    'Authorization': access_token,
}

response = requests.get(url_get_devices, headers=headers)

print(response.json())

response_json = response.json()
device_0_name = response_json['devices'][0]['name']
print(device_0_name)

Get device stats

For this example I simply took the first item of the array of devices. I assume most people probably have one Nest thermostat anyway.

The name of a device can be used to retrieve data from this device and to send commands to it. Lets get soms stats first:

In [ ]:
# Get device stats

url_get_device = 'https://smartdevicemanagement.googleapis.com/v1/' + device_0_name

headers = {
    'Content-Type': 'application/json',
    'Authorization': access_token,
}

response = requests.get(url_get_device, headers=headers)

response_json = response.json()
humidity = response_json['traits']['sdm.devices.traits.Humidity']['ambientHumidityPercent']
print('Humidity:', humidity)
temperature = response_json['traits']['sdm.devices.traits.Temperature']['ambientTemperatureCelsius']
print('Temperature:', temperature)

Set thermostat to HEAT

And last but not least, lets send some commands to our thermostat. The cell below contains the code to set the mode to "HEAT":

In [ ]:
# Set mode to "HEAT"

url_set_mode = 'https://smartdevicemanagement.googleapis.com/v1/' + device_0_name + ':executeCommand'

headers = {
    'Content-Type': 'application/json',
    'Authorization': access_token,
}

data = '{ "command" : "sdm.devices.commands.ThermostatMode.SetMode", "params" : { "mode" : "HEAT" } }'

response = requests.post(url_set_mode, headers=headers, data=data)

print(response.json())

Set a new temperature

And finally we can set a temperature by executing this command:

In [ ]:
set_temp_to = 21.0
In [ ]:
# Set temperature to set_temp_to degrees

url_set_mode = 'https://smartdevicemanagement.googleapis.com/v1/' + device_0_name + ':executeCommand'

headers = {
    'Content-Type': 'application/json',
    'Authorization': access_token,
}

data = '{"command" : "sdm.devices.commands.ThermostatTemperatureSetpoint.SetHeat", "params" : {"heatCelsius" : ' + str(set_temp_to) + '} }'

response = requests.post(url_set_mode, headers=headers, data=data)

print(response.json())

Using Google Cloud Functions to get around Google Ads Scripts limitations

Cloudy lake

My first real venture into the more technical side of online marketing was when I learned some JavaScript to write my own Google Ads (Adwords) Scripts. With Google Ads Scripts you can automate stuff within your Google Ads account. You can automate to the extent that it exceeds the standard functionality of the regular interface. So if you want to go beyond the regular options and buttons that any SEA specialist can use, Google Ads Scripts is the way to go.

But you probably already know this if you’re reading this article. You’re probably here because you found out that Ads Scripts also have limitations. This article explores the possibility of using Google Cloud Functions together with Google Ads Scripts as a workaround for these limitations. (Spoiler: it’s possible!)

I’ll show you a real-life example of something quite hard to do directly within Ads Scripts: fuzzy string matching on search terms. This is useful for classifying search terms that have spelling errors for example. We’ll get to that further down this article.

The limitations of Google Ads Scripts

Like I said, Google Ads Scripts are powerful, but they also have limitations. These limitations include:

  • Maximum execution time of 30 minutes. This sounds like a lot, but it’s a real limitation. Mainly because:
    • You can only use synchronous JavaScript from the older JavaScript 1.6 standard, plus a few features from 1.7 and 1.8.
    • Your script runs on a server somewhere that is probably not solely dedicated to the fastest possible execution of your script. Execution is quite slow.
    • No external libraries like lodash or async. You can only use the methods that Google provided, so if you need something else you’re out of luck.

All in all, it’s easy to find yourself stuck in a situation where you’re looping over a loop inside another loop with runtimes that easily exceed 30 minutes. What if we’d move some of the heavy work outside Ads Scripts to another tool that does not have these limitations? That’s where Google Cloud functions come in!

Google Cloud Functions to the rescue

Cloud Functions is Google Cloud’s event-driven serverless compute platform. This is a complicated way of saying that you can execute functions written in Node.js, Python or Go in the cloud. It’s not a free tool, but there is a free tear that lets you make up to two million invocations per month without charge. This plenty for our use case. I won’t go into detail here about how to set up a new Cloud Function. Check out one of these blogs to learn about setting up Cloud Functions: herehere or here.

With Cloud Functions, we can run our code on faster machines. We can also import external libraries from NPM for Node.js or pip for Python. And last but not least, we can run asynchronous code. The only limitation here is a maximum execution time of 9 minutes. But if this is a problem you could divide the workload over multiple Cloud Functions that run in parallel.

Google provides several ways to trigger a Cloud Function. For our use case, we need to create a Cloud Function that uses an HTTP trigger. This means that it can be triggered by either a GET request or a POST request. We can make these requests from our Google Ads Script as our gateway to pass information between Google Ads and Google Cloud.

How to make a POST request from Ads Scripts to a Cloud Function

Making a POST request from a Google Ads Script is fairly straightforward. We can use UrlFetchApp.fetch() to make a request. We can add an options object that contains JSON payload. Use this to send data from our Google Ads account to the Cloud Function.

// Add your own Cloud Function URL here:
var CLOUDFUNCTIONSURL = 'https://europe-west1-example.cloudfunctions.net/example';
function main() {
var json = {
"example_content": "Whatever you want to send to your Cloud Function",
"another_example_content": ["You", "can", "also", "use", "arrays", "here"]
};
var options = {
method: 'POST',
contentType: 'application/json',
payload: JSON.stringify(json)
};
var response = UrlFetchApp.fetch(CLOUDFUNCTIONSURL, options);
Logger.log(response);
// Do whatever you want with the response...
}

This specific example populates a JSON payload with two key-value pairs: ‘example_content‘ and ‘another_example_content‘. The next section shows how to access this data in the Cloud Function.

The Cloud Function code

When you create a new Cloud Function, you can choose between a Node.js, Python or Go runtime. Google provides you with a code example for the runtime of your choice. I’ll give you an example that works with the Google Ads Script code example above for both Node.js and Python. I’m not familiar with Go, so I’ll skip that programming language if you don’t mind. 🙂

One thing worth mentioning is that the Cloud Function should always return a string. So keep this in mind when you’re writing your code.

Python

def hello_world(request):
"""Responds to any HTTP request.
Args:
request (flask.Request): HTTP request object.
Returns:
The response text or any set of values that can be turned into a
Response object using
`make_response <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>`.
"""
request_json = request.get_json()
if request_json and 'example_content' and 'another_example_content' in request_json:
example_content = request_json['example_content']
another_example_content = request_json['another_example_content']
# Do whatever you want with the data from the json
return str(example_content)
else:
return f'Error, no example_content or another_example_content found in json'

With Python, you can get to the ‘example_content’ from Google Ads through request.get_json().

Node.js

/**
* Responds to a HTTP request.
*
* @param {!express:Request} req HTTP request context.
* @param {!express:Response} res HTTP response context.
*/
exports.helloWorld = (req, res) => {
let example_content = req.body.example_content || 'Error, no example_content found in json';
let another_example_content = req.body.another_example_content || 'Error, no another_example_content found in json';
// Do whatever you want with the data from the json
res.status(200).send(example_content);
};

With Node.js, you can access the ‘example_content’ from Google Ads through the express.js request with req.body.

Example use case: fuzzy string matching

Let’s see if we can use the ideas above in practice with an example. Let’s say you want to classify incoming search terms based on whether they contain a word from a list of words. This list of words could, for example, be a list of brand names. In JavaScript this could be accomplished like this:

var search_terms = ['buy nike shoes','adidas online','rebok cloths']; // With some spelling errors
var brands = ['nike','adidas','reebok'];
for (var i=0; i<search_terms.length; i++) {
for (var j=0; j<brands.length; j++) {
if(search_terms[i].indexOf(brands[j]) !== -1) {
// Do whatever you want to do when the search term contains a brand name
}
}
}

However, this only works for exact matches. If the search term contains a spelling error it doesn’t work. We need fuzzy string matching for this. 

We could do this with JavaScript, as explained in this article, but there is an extra problem. We would use this code to compare two strings, such as “facebook” and “facebok”. But we want to find a string that might be part of a longer search query, such as “sign up for facebok online”. This requires us to compare the word we’re looking for to every part of this search query. These parts are called N-grams

A search query with 5 words consists of 5+4+3+2+1 = 15 N-grams. Our example search query consists of these N-grams: “sign”, “up”, “for”, “facebok”, “online”, “sign up”, “up for”, “for facebok”, “facebok online”, “sign up for”, “up for facebok”, “for facebok online”, “sign up for facebok”, “up for facebok online”, and finally “sign up for facebok online”.

Now you probably see why this could lead to problems with the maximum execution time. 😉 We would end up looping over every N-gram inside a loop over every search term. And if we’d be looking for multiple words we’d get a nested loop that is three layers deep. That’s not efficient at all! Can we find a more efficient way?

Requesting a Cloud Function from Google Ads

The code snippet below looks a lot like one of the examples above. Instead of example content, we filled the JSON object with some search terms and a list of brands. In the real world, these arrays could contain hundreds or even thousands of items.

// Add your own Cloud Function URL here:
var CLOUDFUNCTIONSURL = 'https://europe-west1-cloud-functions-example.cloudfunctions.net/example';
var search_terms = ["buy nike shoes", "adidas online", "rebok cloths"]; // With some spelling errors
var brands = ["nike", "adidas", "reebok"];
function main() {
var json = {
"searchterms": search_terms,
"brands": brands
};
var options = {
method: 'POST',
contentType: 'application/json',
payload: JSON.stringify(json)
};
var response = UrlFetchApp.fetch(CLOUDFUNCTIONSURL, options);
Logger.log(response);
}

A Cloud Function with FuzzyWuzzy

The Python package FuzzyWuzzy uses Levenshtein Distance to calculate the differences between strings. It also has a method that accepts an entire list (array) of items at once and that returns the item with a word that is most alike.

Importing FuzzyWuzzy

Before we can use FuzzyWuzzy in our Cloud Function, we must include it in our package.txt. To do this, add these two lines to package.txt:

python-Levenshtein>=0.12.0
fuzzywuzzy>=0.17.0

The Cloud Function code

Just as with the examples above, this code snippet accepts a request with a JSON object. This JSON object contains the two lists or arrays with search terms and brands. For each search term, it finds the brand that is most alike and returns that brand together with a score.

from fuzzywuzzy import fuzz
from fuzzywuzzy import process
def fuzzy_flask(request):
"""Responds to a HTTP request.
Args:
request (flask.Request): HTTP request object.
Returns:
The response text or any set of values that can be turned into a
Response object using
`make_response <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>`.
"""
request_json = request.get_json()
if request_json and 'searchterms' in request_json and 'brands' in request_json:
return fuzzy_match_list(request_json['searchterms'], request_json['brands'])
else:
return f'No JSON with searchterms or brands found!'
def fuzzy_match_list (searchterms, brands):
output = {}
for term in searchterms:
highest = process.extractOne(term, brands, scorer=fuzz.token_set_ratio)
output[term] = highest
return str(output)

The result

Our example returns this output:

{'buy nike shoes': ('nike', 100), 'adidas online': ('adidas', 100), 'rebok cloths': ('reebok', 56)}

The search term ‘buy nike shoes’ matches with the brand Nike with a 100% score. The search term ‘rebok cloths’ on the other hand matches the brand Reebok, but with a 56% score. A higher score means a closer match.

Now, all that’s left is to process these results. I’ll leave it up to you to come up with ideas on how to use Cloud Functions together with Google Ads Scripts! 🙂

How to find the fastest section within a GPX file with Python & Jupyter Notebooks

One of the fun things when you like both data and running is that you can collect loads of data by running with a GPS tracker or a mobile phone. With apps such as Strava, Runkeeper or Endomondo you get all kinds of statistics about your workouts. I’ve been using Endomondo for several years now and when I found this really interesting article by Steven van Dorpe about how to analyze GPS data with Python I wanted to try doing some analysis for myself.

Specifically, in this blog we’ll go through the steps to find the best or fastest section within a workout. For example, if I ran 12 kilometers, started slow and finished fast, what was the time for the fastest 10 kilometers during that entire workout? And what was my fastest 10 kilometer this year for example? Let’s find out!

There is this specific chart from Endomondo that I like a lot: my personal best for a specific distance over time. The interesting thing about this graph is that it shows my personal best for a specific distance from within a longer workout.

Endomondo graph for my personal best for 5K over time.

Unfortunately this graph has some limitations, especially when you’re not a premium (paying) user of Endomondo. The free version only shows personal bests for 5 kilometers or 3 miles. PB’s for other distances are only available for premium users. (I leave it up to you to decide if you find it ethical to replicate some functionality from the premium features.) 😉 And interesting stats such as my PB’s for specific combinations are not available at all, such as my fastest 5K from workouts over 10K over the past 12 months. (Sidenote: I’ve been a paying user for quite some time in the past and I also bought the app for a fixed price many years ago.)

Parsing and preparing the data

Anyway, lets see if we can build on Steven’s instructions to find the fastest 5K, 10K, and so on within a workout. Endomondo and most other running apps let you download your workouts as GPX files. These files contain the data about your GPS location, altitude and time. I use Python and Jupyterlabs Notebooks (which is included with Anaconda) for parsing and analysing these files.

Let’s get coding! The first step is to include some packages:

import gpxpy
import matplotlib.pyplot as plt
import datetime
from geopy import distance
from math import sqrt, floor
import numpy as np
import pandas as pd
import haversine

Then declare some variables and parse the GPX file:

# All the sections you PB's for in meters:
sections = [1000,(1000*1.60934),3000,5000,(5000*1.60934),10000,15000,(10000*1.60934),20000,21097.5,25000,30000,40000,42195]
# The file you want to import:
file = 'my_run_001.gpx'
# This Pandas DataFrame will contain the final output of our analysis:
df_final = pd.DataFrame(columns=['time', 'distance', 'minutes_per_kilometer'])
gpx_file = open(file, 'r')
gpx = gpxpy.parse(gpx_file)
df = pd.DataFrame(columns=['lon', 'lat', 'alt', 'time'])

In Steven van Dorpe’s article, all data is added by using data = gpx.tracks[0].segments[0].points but I found that some GPX files contain multiple segments with relevant data, some don’t, some contain multiple segments where the last segment should be omitted, and so on. You should really inspect your data to see what data you need. For recent Endomondo GPX files I found that if the file contains multiple segments, I should import all segments except the last segment, unless the file contains only one segment, because then we need all segments.

For this blog let’s assume you can include all segments. Oh, and let’s also apply some sorting and filling to clean any problems in the data:

for segment in gpx.tracks[0].segments: # all segments
data = segment.points
for point in data:
df = df.append({'lon': point.longitude, 'lat' : point.latitude, 'alt' : point.elevation, 'time' : point.time}, ignore_index=True)
df = df.sort_values(by=['time'])
df = df.reset_index()
df = df.fillna(method='ffill')
df = df.fillna(method='bfill')

So far we’ve been following the instructions from the article by Steven, but from here we’ll be doing some things different. Instead of looping over data to calculate distances I decided to stick with Pandas and use vectorization to try to speed things up. At this point we’re working on a single GPX file, but eventually we may want to loop over a folder with multiple GPX files (270 in my case) and speed suddenly becomes important (as we’ll see later). Using vectorization and the apply() method to calculate distances involved adding some extra columns to the DataFrame with values that are “shifted one row backwards”:

# Create a column with values that are 'shifted' one forwards, so we can create calculations for differences.
df['lon-start'] = df['lon']
df['lon-start'].iloc[-1] = np.nan
df['lon-start'] = np.roll(df['lon-start'], 1)
df['lat-start'] = df['lat']
df['lat-start'].iloc[-1] = np.nan
df['lat-start'] = np.roll(df['lat-start'], 1)
df['alt-start'] = df['alt']
df['alt-start'].iloc[-1] = np.nan
df['alt-start'] = np.roll(df['alt-start'], 1)
df['time-start'] = df['time']
df['time-start'].iloc[-1] = np.nan
df['time-start'] = np.roll(df['time-start'], 1)
df = df.fillna(method='bfill')

Some GPX files gave me a hard time about timezones. These lines fixed this:

df['time'] = pd.to_datetime(df['time'], utc=True)
df['time'] = df['time'].dt.tz_localize(tz=None)
df['time-start'] = pd.to_datetime(df['time-start'], utc=True)
df['time-start'] = df['time-start'].dt.tz_localize(tz=None)

And then the actual calculations for distances and time differences with Pandas apply(). The math behind this is all explained thoroughly in the article by Steven van Dorpe I mentioned earlier. If you’re at this point in this blog and you still haven’t read his article, I can definitely recommend you do this first and then continue with the code below:

# Calculate distances and time deltas
df['distance_dis_2d'] = df.apply(lambda x: distance.distance((x['lat-start'], x['lon-start']), (x['lat'], x['lon'])).m, axis = 1)
df['alt_dif'] = df.apply(lambda x: x['alt-start'] - x['alt'], axis=1)
df['distance_dis_3d'] = df.apply(lambda x: sqrt(x['distance_dis_2d']**2 + (x['alt_dif'])**2), axis=1)
df['time_delta'] = df.apply(lambda x: (x['time'] - x['time-start']).total_seconds(), axis=1)

Then as a final substep before the actual search for the fastest sections I create a new DataFrame that only contains the columns needed for further calculations to speed things up a little bit more. Here I also add two columns for cumulative sums for these columns.

df_selected = df.loc[:, ['distance_dis_3d','time_delta']]
df_selected['distance_cumsum'] = df_selected['distance_dis_3d'].cumsum()
df_selected['time_cumsum'] = df_selected['time_delta'].cumsum()

Finding the actual fastest section

Remember that at the beginning of this blog we created a list with sections? Now it’s time to loop over this list and find the fastest kilometer, fastest 5 kilometer, fastest 5 miles, and so on.

If the current section is longer than the total distance of the entire workout there is no need to do any further analysis and we can skip ahead to the next iteration.

To find the fastest section within the entire workout we must loop over the DataFrame.* For every row we locate the first row further ahead where the cumulative sum of the total distance minus the cumulative distance at this row is greater or equal to the section.

*) I know that looping over a DataFrame is not the best solution in terms of speed. However, I have not found a way to get the same results without this loop. Did you find a faster way to do this? Let me know! 🙂

That’s a lot to take in… 😉 So let me give you an example. Lets say we want to find the fastest 50 meters from the virtual workout below:

TimeDistance
0 seconds0 meters
10 seconds10 meters
20 seconds25 meters
30 seconds35 meters
40 seconds45 meters
50 seconds55 meters
60 seconds60 meters
70 seconds70 meters
80 seconds85 meters
90 seconds90 meters
  • At 0 seconds and 0 meters we find the first row that is 50 meters or more ahead, which is the row at 50 seconds and 55 meters. The speed over this section was (55 meter / 50 seconds) = 1.1 meter per second.
  • Then we start at the second row with 10 seconds and 10 meters and find the first row that is 50 meters or more ahead, which is the row at 60 seconds and 60 meters. The speed over this section was ((60 meter – 10 meter) / (60 seconds – 10 seconds)) = 1.0 meter per second.
  • Then we start at the third row with 20 seconds and 25 meters and find the first row that is 50 meters or more ahead, which is the row at 80 seconds and 85 meters. The speed over this section was ((80 meters – 25 meters) / (80 seconds – 20 seconds)) = 0.92 meters per second.
  • … and so on…

The first iteration with 1.1 meter per second is the fastest section, because it traveled the section of 50 meter with the highest average speed.

This is what this looks like in code:

# Here we loop over sections
for section in sections:
if df['distance_dis_3d'].sum() < section: # If the total distance of the workout is smaller then the section we're looking for we can skip this iteration.
continue
df_output = pd.DataFrame(columns=['date', 'section', 'filename', 'time', 'distance', 'minutes_per_kilometer', 'total_distance', 'total_time'])
for i in range(len(df_selected.index)):
df_section = df_selected[(df_selected['distance_cumsum'] - df_selected['distance_cumsum'].iat[i]) >= section]
if(len(df_section.index) != 0):
time = df_section['time_cumsum'].iat[0] - df_selected['time_cumsum'].iat[i]
distance_i = df_section['distance_cumsum'].iat[0] - df_selected['distance_cumsum'].iat[i]
minutes_per_kilometer = (time/60)/(distance_i/1000)
df_output = df_output.append({'date': df['time'].min(), 'section': section, 'filename': file, 'time': time, 'distance': distance_i, 'minutes_per_kilometer': minutes_per_kilometer, 'total_distance': df['distance_dis_3d'].sum(), 'total_time': df['time_delta'].sum()}, ignore_index=True)
s_best = df_output.loc[df_output['minutes_per_kilometer'].idxmin()]
df_final = df_final.append(s_best)
df_final['start_index_best_section'] = df_final.index
df_final = df_final.set_index(['filename','section'])

The result in df_final looks something like this:

The output of df_final
The best sections according to Endomondo
SectionCalculations based on timeEndomondo
1 km283 seconds = 4:434:42
1 mi468 seconds = 7:487:47
3 km887 seconds = 14:4714:45
5 km1485 seconds = 24:4524:43
10 km3031 seconds = 50:3150:31

As you can see, our calculations are pretty close, but not exactly the same. When you study the screenshot of the DataFrame more closely, the distance of each section is also just a little bit more than the actual section. So for the time it took to travel 3 km, I actually travelled 3.005 km. This would explain why we got one or two seconds more for each section. Let’s see if the results are better when we account for this by using the average speed over the fastest section for our calculations:

SectionCalculations based on minutes per kilometerEndomondo
1 km4.699604 * 1 = 4:424:42
1 mi4.839385 * 1.60934 = 7:477:47
3 km4.918145 * 3 = 14:4514:45
5 km4.944354 * 5 = 24:4324:43
10 km5.051423 * 10 = 50:3150:31

Now my calculations exactly match the results from Endomondo! Nice!

Working with multiple GPX files

So far we’ve used only a single GPX file, but for statistics like the fastest 10K over the past 12 months we need to import all workouts during that timeframe. For my personal project I wanted to include all my workouts. So that’s what I did!

The following code is a combination of all the code snippets above, but it also collects all GPX files included in a subfolder /tracks/ and loops over all of them. This might take a while to run though, so make sure not to include too many files.

import gpxpy
import matplotlib.pyplot as plt
import datetime
from geopy import distance
from math import sqrt, floor
import numpy as np
import pandas as pd
import haversine
import os
from os import listdir
from os.path import isfile, join
# All the sections you PB's for in meters:
sections = [1000,(1000*1.60934),3000,5000,(5000*1.60934),10000,15000,(10000*1.60934),20000,21097.5,25000,30000,40000,42195]
path = os.path.join(os.path.abspath(''), 'tracks')
allfiles = [f for f in listdir(path) if isfile(join(path, f))]
df_final = pd.DataFrame(columns=['time', 'distance', 'minutes_per_kilometer'])
for file in allfiles:
path = os.path.join(os.path.abspath(''), 'tracks', file)
gpx_file = open(path, 'r')
gpx = gpxpy.parse(gpx_file)
df = pd.DataFrame(columns=['lon', 'lat', 'alt', 'time'])
for segment in gpx.tracks[0].segments: # all segments
data = segment.points
for point in data:
df = df.append({'lon': point.longitude, 'lat' : point.latitude, 'alt' : point.elevation, 'time' : point.time}, ignore_index=True)
df = df.sort_values(by=['time'])
df = df.reset_index()
df = df.fillna(method='ffill')
df = df.fillna(method='bfill')
# Create a column with values that are 'shifted' one forwards, so we can create calculations for differences.
df['lon-start'] = df['lon']
df['lon-start'].iloc[-1] = np.nan
df['lon-start'] = np.roll(df['lon-start'], 1)
df['lat-start'] = df['lat']
df['lat-start'].iloc[-1] = np.nan
df['lat-start'] = np.roll(df['lat-start'], 1)
df['alt-start'] = df['alt']
df['alt-start'].iloc[-1] = np.nan
df['alt-start'] = np.roll(df['alt-start'], 1)
df['time-start'] = df['time']
df['time-start'].iloc[-1] = np.nan
df['time-start'] = np.roll(df['time-start'], 1)
df = df.fillna(method='bfill')
df['time'] = pd.to_datetime(df['time'], utc=True)
df['time'] = df['time'].dt.tz_localize(tz=None)
df['time-start'] = pd.to_datetime(df['time-start'], utc=True)
df['time-start'] = df['time-start'].dt.tz_localize(tz=None)
df['distance_dis_2d'] = df.apply(lambda x: distance.distance((x['lat-start'], x['lon-start']), (x['lat'], x['lon'])).m, axis = 1)
df['alt_dif'] = df.apply(lambda x: x['alt-start'] - x['alt'], axis=1)
df['distance_dis_3d'] = df.apply(lambda x: sqrt(x['distance_dis_2d']**2 + (x['alt_dif'])**2), axis=1)
df['time_delta'] = df.apply(lambda x: (x['time'] - x['time-start']).total_seconds(), axis=1)
df_selected = df.loc[:, ['distance_dis_3d','time_delta']]
df_selected['distance_cumsum'] = df_selected['distance_dis_3d'].cumsum()
df_selected['time_cumsum'] = df_selected['time_delta'].cumsum()
# Here we loop over sections
for section in sections:
if df['distance_dis_3d'].sum() < section: # If the total distance of the workout is smaller then the section we're looking for we can skip this iteration.
continue
df_output = pd.DataFrame(columns=['date', 'section', 'filename', 'time', 'distance', 'minutes_per_kilometer', 'total_distance', 'total_time'])
for i in range(len(df_selected.index)):
df_section = df_selected[(df_selected['distance_cumsum'] - df_selected['distance_cumsum'].iat[i]) >= section]
if(len(df_section.index) != 0):
time = df_section['time_cumsum'].iat[0] - df_selected['time_cumsum'].iat[i]
distance_i = df_section['distance_cumsum'].iat[0] - df_selected['distance_cumsum'].iat[i]
minutes_per_kilometer = (time/60)/(distance_i/1000)
df_output = df_output.append({'date': df['time'].min(), 'section': section, 'filename': file, 'time': time, 'distance': distance_i, 'minutes_per_kilometer': minutes_per_kilometer, 'total_distance': df['distance_dis_3d'].sum(), 'total_time': df['time_delta'].sum()}, ignore_index=True)
s_best = df_output.loc[df_output['minutes_per_kilometer'].idxmin()]
df_final = df_final.append(s_best)
df_final['start_index_best_section'] = df_final.index
df_final = df_final.set_index(['filename','section'])

Final result

With the output data in df_final we can continue to make plots, extract data and so on. With just a few more lines of code I can for example find out how far I’ve ran on my current shoes so far (709 km), make scatter plots of distance and date (apparently I had a habit of stopping at exactly 10 kilometers in 2016) or make my own version of the Endomondo graph at the top of this blog (it’s not as fancy yet, but close).

All my workouts displayed as a scatter plot
Best 10 km’s per month

So this is it! We’ve looped over a DataFrame to find the fastest sections from a list of GPX files. Perhaps I’ll build on this script to build a nice dashboard for my running activities, but that’s something for later.

Some final thoughts

Probably my biggest “concern” with this entire endeavour is the contents of the GPX files compared to the Endomondo app and interface. As it turns out, the total distance of the workouts from the GPX files is just slightly lower compared to what Endomondo reports. And I know this discrepancy is not caused by my calculations, because I found out that if I would upload a GPX file back to Endomondo it would also have a lower total distance!

An example workout sums up to a maximum total distance of 13303 meters or 13.30 kilometers in my calculations. According to the Endomondo interface, this workout should actually be 13.32 kilometers. However, when I upload this GPX file back to Endomondo it would only be 13.29 kilometers. Weird…

Endomondo reports 13.32 km
Uploading the same workout would report 13.28 km

Also, I learned a lot about the importance of speed with this project. The original instructions by Steven van Dorpe were too slow for my use case (but very educational nonetheless). But even with all the vectorization and apply() methods that I could get to work in this project it will still take minutes or more to loop over multiple GPX files. So keep this in mind if you’re trying to build on my examples. 🙂

View your Google Ads creatives as a Google Optimize website experiment

The idea

Doing A/B experiments with text ads and other creatives in Google Ads is getting less important with the introduction of Responsive Search Ads. But that doesn’t mean that it’s gone completely. It’s still relevant to optimise text ads based on data, just as we should never stop optimising our websites. At Adwise, we usually do this with a Google Ads script that is an improved version of the A/B testing script by Russ Savage from Freeadwordsscripts.com or by simply calculating for a statistical significant difference manually. This works, but I feel it lacks a graphical component to clarify the outcome.

The Google Optimize interface on the other hand contains a beautiful graph to visualise the results of an experiment.

The Google Optimize interface, image by Google.

The graph shows the cumulative conversion rate of a variant together with a confidence interval over time. It provides a clear overview of the experiment and makes it easy to compare and interpret the statistical differences between each variant.

Wouldn’t this graph be perfect for other tests and experiments as well? What about using this graph to compare the results of different Google Ads text ads? I think it’s worth trying out, so I created a little “proof of concept” that looks like the image below. It’s also interactive! Try for yourself here!

An example graph made with Bokeh in Google Colab

I created this graph using Bokeh. I found a Bokeh example that provided a good starting point for building my own graph in a (Jupyter) Notebook. I also opted to use Google Colaboratory, because I could create my notebook online and have easy access to Google Drive. This is convenient, because this would open the possibility to use Google Sheets with the Google Analytics plugin to retrieve data about Google Ads text ads. If you’ve lost me here, I understand! I’ll go over the steps one by one below. The important thing to note here is that I use only freely available tools so you can recreate (and probably improve) my idea.

The math

Statistics was not my favourite subject in high school, but I ended up in an analytical job anyway. 😉 If I’m not mistaken, we should calculate a binomial confidence interval. A user can either click or not click. Or a user can either convert or not convert. So there are always two possible outcomes, which brings us to a binomial distribution.

Build your own version of this graph

Step 1: Copy the example spreadsheet

Make a copy of this spreadsheet:
https://docs.google.com/spreadsheets/d/1ZYHolhOfdcCzbhNLCAZfrJ1HPA2fbYu2StDDCOEGCt8/edit?usp=sharing
The copy should also include the Google Analytics plugin, which you can use to retrieve data about your Google Ads text ads from your Google Analytics account. You should change the value in cell B3 to your own Analytics profile ID and change the filter in cell B9 on the sheet named ‘Report Configuration’ to the name of an ad group in your Google Ads account. You can also make other changes if needed. Then run the report to refresh the data. If you’re not familiar using the Analytics plugin, you can read about it at the Google Developers website.

Step 2: Copy the Colab notebook

Then make copy of this Google Colaboratory notebook:
https://colab.research.google.com/drive/1N4aXN8hmWjU6YY7ycasizHlWiwxqgVAP
After making a copy check if the first cell contains the name of the spreasheet you created in step 1 and edit if necessary. Then you can run all the cells. At cell 3 you’ll be presented with a URL to authenticate and connect Google Drive. Click the URL and follow the steps. You’ll receive a code that you can paste in the new field that appeared in your notebook.

Step 3: check out your awesome graph.

If everything worked the way it should, you should see an interactive graph at the bottom of the notebook that looks similar to the graph in the screenshot above. There are probably easier ways to get the same kind of graph, but I chose to use free tools that require some extra steps. It’s not perfect, but I guess it’s a MVP that can be improved upon. 🙂

Anyway, I’m eager to hear what you think about using this type of graph to analyse the results of Google Ads text ads!

Adding Instant.page to Google Tag Manager

One of my colleagues at Adwise recently told me about Instant.page. By adding a small code snippet to your HTML, you get just in time preloading to decrease latency. It’s really easy to implement and free to use!

Of course we could manually add this code snipped to the page, but since we use Google Tag Manager for most of our clients we thought it would be even quicker and easier if we could just add the code snippet to a Custom HTML Tag in GTM. However, if we add the snippet and try to publish the container we get an error…

Tag Manager error

Luckily, there’s a blog post by Simo Ahava for almost every Google Tag Manager issue you can think of and this is no exception. About a year ago Simo wrote a blogpost about adding HTML Elements to the page programmatically and it introduces a workaround for the error message we got. Using his instructions, I rewrote the Instant.page snippet so it can be added to the page through a Custom HTML Tag. It looks like this:

<script>
(function() {
// Rewrite of the Instant.page <html> snippet for Google Tag Manager.
var el = document.createElement('script');
el.setAttribute('src', '//instant.page/1.2.2');
el.setAttribute('type', 'module');
el.setAttribute('integrity', 'sha384-2xV8M5griQmzyiY3CDqh1dn4z3llDVqZDqzjzcY+jCBCk/a5fXJmuZ/40JJAPeoU');
document.body.appendChild(el);
})();
</script>

It can then be added to Google Tag Manager like this:

Add the rewritten snippet to a Custom HTML Tag.

And the resulting rendered HTML looks like this (you might have to click the image to zoom in a bit):

rendered HTML
The resulting HTML.