Enterprise-level reports often need to be summarized by quarter/year. How can we make that work if we’re starting with a dataset in BigQuery
and we want to create a chart in Google Data Studio?
Let’s start with the nyc-tlc:yellow.trips public dataset in BigQuery. At the time of this writing, Data Studio is not yet compatible with queries written in Standard SQL. So even though we can write Standard SQL queries in the regular BigQuery UI, we can’t yet connect a Standard SQL query to Data Studio.
So, we’ll work with BigQuery’s original Legacy SQL dialect that is compatible with Data Studio.
Typically, we might groups results by creating “quarter” and “year” columns, like this:
SELECT QUARTER(pickup_datetime) AS quarter, YEAR(pickup_datetime) AS year, COUNT(*) as num_trips FROM [nyc-tlc:yellow.trips] GROUP BY quarter, year ORDER BY year, quarter
That works fine if we’re exporting to Excel or Google Sheets. However, if we try to use this query in Data Studio we’ll run into some trouble. Data Studio expects a single column to provide all time-series date information, but we currently have two columns providing that information.
With this query, Data Studio will allow us to group results by year or by quarter, but not by both.
For example, if we choose “year” as our time dimension…
… we’ll see our results grouped only by year, like this:
Not good. And we’ll have a similar problem if we choose “quarter” instead of “year” in the Dimension Picker – the results will just be aggregated by quarter, and we’ll have no insight into year-over-year trends.
To group results by quarter and year, we’ll need to modify our query.
Here are two methods that will work in Data Studio:
1. Group results by day in the query, and let Data Studio group those results by quarter/year
This method works well if we’re totaling up volume by quarter. We can feed Data Studio a table that lists volume per day, and Data Studio is happy to further group those results by quarter for us.
SELECT DATE(pickup_datetime) as date, COUNT(*) as num_trips FROM [nyc-tlc:yellow.trips] GROUP BY date ORDER BY date
The query creates a table of results that is grouped by date, so we just need to tell Data Studio we’d like the data further grouped by quarter.
This is easily done in the Fields settings – just click on the Type settings for the date field, and change it from “Date (YYYYMMDD)” to “Year Quarter”:
Our results are now grouped by quarter/year!
2. Calculate the “quarter start” date for each row
If we’re not totaling up volume by quarter but instead doing calculations inside our query to produce percentages, then method #1 isn’t optimal because we don’t want Data Studio to take daily percentages and then average them out into a “quarterly” average. That would produce an “average of averages”, and that’s usually a bad idea.
So, if our query is going to produce a table of averages, we need to do all grouping inside the query itself and provide Data Studio with a single row per quarter.
One solution is to take each row in the source trips table and get the first day of the quarter in which that rows falls.
In other words, we want to group results such that the date is formatted as the first date from a quarter:
As an example, a row with a
2015-07-22 00:00:00 would have a “quarter start date” of
2015-07-01, since that is the first day of the quarter in which the datetime falls.
By grouping results in this way, all results that fall in the same quarter will have the same quarter start date.
But, given any potential random datetime, how can we calculate the quarter start date?
Use the CONCAT function to build a string that looks like a date:
SELECT CONCAT(STRING(YEAR(pickup_datetime)),"-",STRING((QUARTER(pickup_datetime)*3)-2),"-01") as quarter_start_date
We’re feeding CONCAT four separate strings:
The meaty part of that CONCAT function is the third part. Working from the inside out, here is what’s happening:
- Step 1: Start with
QUARTER(pickup_datetime)and multiply by 3. (This returns either 3, 6, 9, or 12.)
- Step 2: Subtract 2. (This returns 1, 4,7, or 10.)
- Step 3: Turn that number into a
STRING(so that we can use it in the CONCAT function).
The result is that we’re left with either YYYY-1-01, YYYY-4-01, YYYY-7-01, or YYYY-10-01. Perfect!
We’ve built a string that looks like a date, formatted as
YYYY-M-01. Data Studio is smart enough to realize that we’re trying to provide it with dates, and so it reads that string and handles the type conversion for us.
Once the query is in place, we just tell Data Studio once again that we’d like to format these dates as “Year Quarter”: