Learn BigQuery!

Assuming Standard SQL is known, this material aims to provide a path to master SQL in semi-structured nested data.


Subqueries on a table

A little note before we start: It is expected you follow along in BigQuery so you see the output of all the queries shown here.

Preparation

First we need to create a table on which we can run queries. For that we use a WITH statement to create a temporary table from a select.

Our base query looks like this:

WITH fruitColors AS (
  SELECT * FROM UNNEST([
    STRUCT('orange' as color, [struct('lemon' as fruit, 3 as amount), ('papaya', 4)] as fruits),
    ('green', [('apple', 5), ('avocado', 8)]),
    ('red', [('apple', 5),('cherry', 43)])
  ])
)

SELECT * FROM fruitColors

Note, how we only use the function struct() for the first struct to provide names.

You can save the result of this query as a table in your project and dataset of choice to get easier access. For future queries I will simply assume the WITH statement and not write it down anymore!

Simple Queries

First, let’s try and see how to retrieve the whole array:

SELECT  
  color,
  fruits
FROM fruitColors

As simple as that! Now, what if we only want one field from the array - something like this doesn’t work

SELECT  
  color,
  fruits.fruit
FROM fruitColors

… because the array itself only consists of single elements: structs. If we want a sub-field from a struct, we’d have to select a struct first. As in:

SELECT  
  color,
  fruits[SAFE_OFFSET(0)].fruit
FROM fruitColors

We select one struct from the array and retrieve their sub-field. But we want all of them!

To do that we need to run SQL on the array and select the field this way. UNNEST() will turn the array into rows. After selecting fruit from these rows we need to feed them back into an array using the function array()!

SELECT  
  color,
  array(select fruit from unnest(fruits)) AS fruits
FROM fruitColors

You’ll see in the results that this gives us an array of strings!

If we want an array of structs (containing the strings) we simply select as struct in the sub-query!

SELECT  
  color,
  array(select as struct fruit from unnest(fruits)) AS fruits
FROM fruitColors

Pretty cool, no? But instead of reducing information, we can also produce new information out of what we already have!

For example we can uppercase the fruit and calculate the share

SELECT  
  color,
  array(select as struct 
      *,
      upper(fruit) bigFruit,
      round( amount / sum(amount) over (), 2) as share
    from unnest(fruits)
  ) AS fruits
FROM fruitColors

Window functions are no problem at all!