How to use GA 360 data and BigQuery’s Geography functions to determine if proximity to a physical store influences online user behaviour
BigQuery’s Geographic functions provide a simple way to measure the distance between a session location and a physical location. We can use this data to analyse if conversion rate is influenced by proximity to a store.
Here we will walk you through how to run an analysis to test this for your own brand.
In order to run this test, you will need to have:
- A list of physical store locations with the latitude and longitude of each
- Google Analytics 360 (GA 360) session data, exported to BigQuery tables.
If you don’t know the latitude and longitude of your stores, there are free mapping websites that can identify them for you.
You will also need to know how to both import your store list into BigQuery, and how to run queries against data in BigQuery in order to run this analysis. If you’re unsure how to do either, Google has some easy to navigate walk throughs here.
Building the query
In this guide, we have built up the query in stages using a WITH clause to alias each sub-query. This allows us to explain the reasoning behind the method, making it easier to replicate.
However, we have also provided the full query at the end of the guide which can be copy and pasted with only minimal changes required, should you wish to head straight there.
Step 1 – Importing store locations to BigQuery
To run this analysis, you will need to have your list of store locations with latitude and longitude values in a BigQuery table, ensuring that the fields are labelled as latitude and longitude respectively. If you have this list of store locations in a local file, you can follow this guide to import it correctly.
Step 2 – Extracting the latitude and longitude values
The first sub-query you will need to do is extracting the latitude and longitude values for each store:
The data returned will look like the following:
Step 2 – Identifying session locations
Next, you need to extract the latitude and longitude of session locations. This data is available using the geoNetwork.latitude and geoNetwork.longitude fields in the GA360 BigQuery Export.
The first few results of this query look a little like this:
- In order to get a unique session id column we need to combine visitId and fullVisitorId which I have done by simply concatenating these values separated by a full-stop
- Occasionally no location can be found for a session which shows up as a value of ‘0.000‘ in the latitude & longitude fields in the export. These will need to be filtered out
- geoNetwork.latitude and geoNetwork.longitude fields are stored as strings so we CAST them to FLOAT64
- In this example we filtered the sessions to the year ending 30th November 2018.
Step 3 – Calculate the distance to the nearest store
Once you have extracted the latitude and longitude for physical locations and sessions, you will need to calculate the distance between each session and the nearest store. This can be calculated via BigQuery’s Geography functions using the following function:
Calculate the distance from each store to each session using a CROSS JOIN, and then GROUP BY sessionId taking the minimum of each calculated distance in order to find the nearest store as follows:
This will map out the sessionId to distance from the nearest store location:
Step 4 – Calculating the conversion rate
In this example, we have defined a conversion as any session with a transaction. However, you can choose what to use here according to the definition of conversion you would like to measure.
In our case we check if the transactions count is greater than zero using this expression:
IFNULL(totals.transactions, 0) > 0
Note that we have used the IFNULL operator in this example – this is because BigQuery returns a NULL value, rather than a zero in cases where a session has zero transactions.
Pull this in using an INNER JOIN from the original export, to the SessionLocations:
The result should look like the following:
Step 5 – Aggregating values by distance
There are two main ways you can aggregate these values by distance:
- You can manually choose aggregation ‘buckets’ based on human intuition. For example, you may want to test if there is a difference in conversion rate for sessions within five miles of a store
- You can divide the data up into segments automatically, plot it out and identify any trends.
Manually chosen distance buckets
For this method, you need to:
- Define a sub-query with our distance buckets and their thresholds
- Do a CROSS JOIN to our session conversion data then filter rows where the distance is outside the threshold
- GROUP BY ‘bucket’, then COUNT the sessions and SUM the conversions in order to calculate the conversion rate.
It is important to ensure that you only allocate each session to one distance bucket; it’s easy to cross check the number of rows returned from this query against the ConveredByDistance sub-query to make sure this is the case.
This query will return your final results:
|1. Within 1 Miles||13453322||421148||0.031304|
|2. Between 1 and 5 Miles||1457340||45268||0.031062|
|3. Between 5 and 10 Miles||2638155||89445||0.033904|
|4. Between 10 and 25 Miles||3197742||104457||0.032666|
|5. Over 25 Miles||429943||14556||0.033856|
Automatically calculated distance buckets
This method has a different approach to deriving the Thresholds sub query:
- Use BigQuery’s APPROX_QUANTILES() function to split your set of distances into quantiles. For this example I have split it into 32 quantiles
- UNNEST this array and use the RANK() function to number each row
- Having numbered each row, you can now INNER JOIN this sub-query to itself with a row offset. This will enable you to get an upper-bound and a lower-bound value from the set of quantiles
- Continue as before replacing the manually constructed Thresholds with your calculated ones.
For this approach, use the following Thresholds sub query:
Note that we replace the first lower bound with zero and the last upper bound by infinity to ensure we do not miss any values as the original values returned from APPROX_QUANTILES are only approximate min and max values.
This query returns a table similar to the output from the manually chosen distance buckets, but has more data which allows you to visualise conversion rate by distance by plotting it on a scatter chart like this:
In this example, there was no clear trend between store proximity and conversion rate. However, it’s feasible that a trend may be spotted; for example, we might see online conversion rates drop as sessions become closer to physical stores with users more likely to pop into the store. Or, we might see higher conversion rates for sessions carried out near a store, as people see the physical store and decide to look online when they get home.
Either way, carrying out this analysis gives you the data to understand if you should remarket to these users differently or even tailor their on-site experience giving more prominence to in-store features or calls-to-action.
The complete query
As promised, here is the complete query you can use to calculate whether proximity to a physical store influences session behaviour.
In order to use this query, you will need to replace the table identifiers with your own table identifiers and uncomment the correct section for the Thresholds sub-query depending on whether you would like to use automatically calculated distance buckets or to define your own. You may also want to filter for certain dates.