I want to load some unstructured JSON files (up to 200 GB each) from GCS into BigQuery without using any ETL tools. I need a simple way to transform the data into properly structured JSON, plus some custom transformation logic to apply before loading into BigQuery. The challenge: how do I achieve this without any high-compute resources or an ETL tool?
asked Jan 20 at 14:16 by rah

- cloud.google.com/bigquery/docs/… – Damião Martins Commented Jan 20 at 17:02
- What is an unstructured JSON file? I have always thought of JSON as being "semi structured" meaning that it is syntactically well formed but the content of a document doesn't have to conform to a specific schema. – Kolban Commented Jan 21 at 15:12
1 Answer
The idea is to break the 200 GB file into smaller pieces and then use Cloud Functions. The way I see it, you can split it either by deploying a Cloud Run service (it has a memory cap of 16 GB) or by breaking it up manually. Then use a Cloud Function to transform the data so you can load it into BigQuery.
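To keep memory flat no matter how large the file is, the transform step can stream the input one record at a time instead of loading it whole. Here is a minimal sketch of that idea, assuming the source is (or can be converted to) one JSON object per line; `transform_record` is a hypothetical stand-in for your custom logic, not anything from the question:

```python
import json
from typing import Iterable, Iterator

def transform_record(record: dict) -> dict:
    # Hypothetical custom transform, purely for illustration:
    # flatten a nested "meta" object and drop null-valued keys.
    # Replace this with your own restructuring logic.
    meta = record.pop("meta", {}) or {}
    for key, value in meta.items():
        record[f"meta_{key}"] = value
    return {k: v for k, v in record.items() if v is not None}

def to_ndjson(lines: Iterable[str]) -> Iterator[str]:
    # Stream line by line so memory use stays constant regardless
    # of file size; skip blank or malformed lines rather than fail.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        yield json.dumps(transform_record(record))
```

Inside a Cloud Function or Cloud Run service you would feed this from a streaming read of the GCS object and write the output back to GCS, then load the result with `bq load --source_format=NEWLINE_DELIMITED_JSON` (or a BigQuery load job), which BigQuery handles server-side with no extra compute on your end.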