Grouping BigQuery Data by Quarter in a Google Data Studio Chart

Enterprise-level reports often need to be summarized by quarter/year. How can we make that work if we’re starting with a dataset in BigQuery
and we want to create a chart in Google Data Studio?

Let’s start with the nyc-tlc:yellow.trips public dataset in BigQuery. At the time of this writing, Data Studio is not yet compatible with queries written in Standard SQL. So even though we can write Standard SQL queries in the regular BigQuery UI, we can’t yet connect a Standard SQL query to Data Studio.

So, we’ll work with BigQuery’s original Legacy SQL dialect that is compatible with Data Studio.

Typically, we might groups results by creating “quarter” and “year” columns, like this:

SELECT
 QUARTER(pickup_datetime) AS quarter,
 YEAR(pickup_datetime) AS year,
 COUNT(*) as num_trips
FROM
 [nyc-tlc:yellow.trips]
GROUP BY
 quarter,
 year
ORDER BY
 year,
 quarter

The output:

That works fine if we’re exporting to Excel or Google Sheets. However, if we try to use this query in Data Studio we’ll run into some trouble. Data Studio expects a single column to provide all time-series date information, but we currently have two columns providing that information.

With this query, Data Studio will allow us to group results by year or by quarter, but not by both.

For example, if we choose “year” as our time dimension…

… we’ll see our results grouped only by year, like this:

Not good. And we’ll have a similar problem if we choose “quarter” instead of “year” in the Dimension Picker – the results will just be aggregated by quarter, and we’ll have no insight into year-over-year trends.

To group results by quarter and year, we’ll need to modify our query.

Here are two methods that will work in Data Studio:

1. Group results by day in the query, and let Data Studio group those results by quarter/year

This method works well if we’re totaling up volume by quarter. We can feed Data Studio a table that lists volume per day, and Data Studio is happy to further group those results by quarter for us.

The query:

SELECT
 DATE(pickup_datetime) as date,
 COUNT(*) as num_trips
FROM
 [nyc-tlc:yellow.trips]
GROUP BY
 date
ORDER BY
 date

The query creates a table of results that is grouped by date, so we just need to tell Data Studio we’d like the data further grouped by quarter.

This is easily done in the Fields settings – just click on the Type settings for the date field, and change it from “Date (YYYYMMDD)” to “Year Quarter”:

Our results are now grouped by quarter/year!

2. Calculate the “quarter start” date for each row

If we’re not totaling up volume by quarter but instead doing calculations inside our query to produce percentages, then method #1 isn’t optimal because we don’t want Data Studio to take daily percentages and then average them out into a “quarterly” average. That would produce an “average of averages”, and that’s usually a bad idea.

So, if our query is going to produce a table of averages, we need to do all grouping inside the query itself and provide Data Studio with a single row per quarter.

One solution is to take each row in the source trips table and get the first day of the quarter in which that rows falls.

In other words, we want to group results such that the date is formatted as the first date from a quarter:

  • YYYY-01-01
  • YYYY-04-01
  • YYYY-07-01
  • YYYY-10-01

As an example, a row with a pickup_datetime of 2015-07-22 00:00:00 would have a “quarter start date” of 2015-07-01, since that is the first day of the quarter in which the datetime falls.

By grouping results in this way, all results that fall in the same quarter will have the same quarter start date.

But, given any potential random datetime, how can we calculate the quarter start date?

Use the CONCAT function to build a string that looks like a date:

SELECT
 CONCAT(STRING(YEAR(pickup_datetime)),"-",STRING((QUARTER(pickup_datetime)*3)-2),"-01") as quarter_start_date

We’re feeding CONCAT four separate strings:

  1. STRING(YEAR(pickup_datetime))
  2. "-"
  3. STRING((QUARTER(pickup_datetime)*3)-2)
  4. "-01"

The meaty part of that CONCAT function is the third part. Working from the inside out, here is what’s happening:

  • Step 1: Start with QUARTER(pickup_datetime) and multiply by 3. (This returns either 3, 6, 9, or 12.)
  • Step 2: Subtract 2. (This returns 1, 4,7, or 10.)
  • Step 3: Turn that number into a STRING (so that we can use it in the CONCAT function).

The result is that we’re left with either YYYY-1-01, YYYY-4-01, YYYY-7-01, or YYYY-10-01. Perfect!

We’ve built a string that looks like a date, formatted as YYYY-M-01. Data Studio is smart enough to realize that we’re trying to provide it with dates, and so it reads that string and handles the type conversion for us.

Once the query is in place, we just tell Data Studio once again that we’d like to format these dates as “Year Quarter”:

Leave a Reply

Your email address will not be published. Required fields are marked *