Learn BigQuery!

Assuming Standard SQL is known, this material aims to provide a path to master SQL in semi-structured nested data.


Arrays

A little note before we start: It is expected you follow along in BigQuery so you see the output of all the queries shown here.

Knowing how to deal with nested arrays is key when using BigQuery! But what is an array? It’s an ordered list of things - as per convention you list these things between [ and ]. Things you can list in an array:

  • Integers: [1, 4, 832]
  • Strings: ['a', 'b', 'c', 'xyz', 'Hallo!' ]
  • Booleans: [true, true, false, true ]
  • Records (STRUCTs) and within them:
    • all of the above: [ { int : 1, bool: false }, { int : 25, bool: false }, { int : 620, bool: true } ]
    • arrays [ { arr : [5, 2, 9] }, { arr : [7, 2] }, { arr : [15, 0, 94] } ]

    Let’s create all examples from above in one query:

SELECT
  [1, 4, 832] as integers,
  ['a', 'b', 'c', 'xyz', 'Hallo!'] as strings,
  [true, true, false, true ] as booleans,
  [
    STRUCT(1 as int, false as bools), 
    STRUCT(25, false), 
    STRUCT(620, true) 
  ] as structs1,
  [
    STRUCT( [5, 2, 9] as arr ), 
    STRUCT( [7, 2] ), 
    STRUCT( [15, 0, 94] ) 
  ] as structs2

Try it for yourself! Have a look at the JSON representation as well!

Accessing Elements in Arrays

The easiest way to access an element is by using either OFFSET() which is zero-based or ORDINAL() which is one-based. Run the following code:

WITH t AS (
  SELECT 1 AS id, [1,2,3] AS arr
  UNION ALL
  SELECT 2 AS id, [4,5,6] AS arr
  UNION ALL
  SELECT 3 AS id, [42] AS arr
  )
   
SELECT
  id,
  arr[OFFSET(0)] firstElem,
  arr[ORDINAL(1)] firstElemOrd
FROM t
Row id firstElem firstElemOrd
1 1 1 1
2 2 4 4
3 3 42 42

Safe Offset - Avoid “Out of Bounds” Error

If you tried to access an object that is not there, you’ll get this error: Error: Array index 2 is out of bounds (overflow) To avoid that you can either change your method to access that element and/or use the prefix SAFE_ with your OFFSET function:

  • SAFE_OFFSET()
  • SAFE_ORDINAL()
    SELECT
    id,
    arr[SAFE_OFFSET(1)] off,
    arr[SAFE_ORDINAL(2)] ord
    FROM t
    

    which is results in NULL values for not existing elements.

Row id off ord
1 1 2 2
2 2 5 5
3 3 null null

Getting the Last Element

You might think of getting the length from that array and use it as an index for ORDINAL - and yes, this does work:

SELECT
  id,
  arr[ ORDINAL( ARRAY_LENGTH(arr) ) ] lastElem
FROM t

But you can also reverse the array and take the first element:

SELECT
  id,
  ARRAY_REVERSE(arr)[OFFSET(0)] lastElem
FROM t

Try it! But be careful with reversing arrays, it slows down your query execution.

Concatenate Arrays

Concatenating arrays is achieved by using the function ARRAY_CONCAT_AGG() or ARRAY_CONCAT(), depending on whether you want to aggregate rows (as in GROUP BY) or operate without aggregation of rows:

WITH t AS (
  SELECT 1 AS id, [1,2,3] AS arr
  UNION ALL
  SELECT 2 AS id, [4,5,6] AS arr
  UNION ALL
  SELECT 3 AS id, [42] AS arr
  )
   
SELECT
  ARRAY_CONCAT_AGG( arr)  as concArr
FROM t

The aggregation function has some nice features that might come in handy:

ARRAY_CONCAT_AGG(expression  [ORDER BY key [{ASC|DESC}] [, ... ]]  [LIMIT n])

You can order the way arrays get concatenated and also limit the amount of arrays.

SELECT
  ARRAY_CONCAT_AGG( arr)  as concArr,
  ARRAY_CONCAT_AGG( arr ORDER BY ARRAY_LENGTH(arr) ASC)  as concArr2,
  ARRAY_CONCAT_AGG( arr ORDER BY ARRAY_LENGTH(arr) ASC LIMIT 2)  as concArr2_2
FROM t