Using the solution found in this thread:
I'm able to parse the top level elements in the following schema (Google DocumentAI response):
ARRAY<
STRUCT<
mentionText STRING,
id INTEGER,
type STRING,
confidence NUMERIC,
properties ARRAY<
STRUCT<
normalizedValue STRUCT<
text STRING
>,
mentionText STRING,
id INTEGER,
type STRING,
confidence NUMERIC
>
>
>
>
with this query:
EXECUTE IMMEDIATE '''SELECT ''' || (SELECT STRING_AGG(DISTINCT "MAX(IF(e.type = '" || e.type || "', e.mentionText, NULL)) AS " || e.type),FROM dataset.table, UNNEST(entities) as e) || '''FROM dataset.table, UNNEST(entities) as e'''
to transform this JSON:
{
"entities":
[
{
"id": "0",
"mentionText": "John",
"type": "first_name"
},
{
"id": "1",
"mentionText": "Smith",
"type": "last_name"
},
{
"id": "2",
"properties":
[
{
"id": "3",
"mentionText": "45.00",
"type": "amount"
},
{
"id": "4",
"mentionText": "€",
"type": "currency"
}
],
"type": "transactions"
},
{
"id": "5",
"properties":
[
{
"id": "6",
"mentionText": "128.00",
"type": "amount"
},
{
"id": "7",
"mentionText": "€",
"type": "currency"
}
],
"type": "transactions"
},
]
}
into this:
row_num | first_name | last_name |
---|---|---|
0 | John | Smith |
1 | Someone | Else |