I have a set of critical tables in BigQuery that are loaded hourly by DAGs.
I have been tasked with developing a standalone solution that checks the following:
Are the tables present? (There is a chance that a table may be deleted by the operations team.)
If a table is present, is it being loaded on time?
If a table is being loaded, does its size change between consecutive runs? (The table is expected to grow.)
If any of the above checks fails, the operations team has to be notified as soon as possible.
Can someone suggest a solution (probably a service or a list of services) for this requirement?
Asked Mar 4 at 22:47 by Kumar Veerappan; edited Mar 8 at 13:08 by Sourav Dutta.
1 Answer
Part one: Building queries:
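- The existence of a table can be checked from the table schema metadata (for example, the dataset's INFORMATION_SCHEMA.TABLES view).
- You need to query every table and look for changes between runs.
- The storage size over the last few days is available in INFORMATION_SCHEMA.TABLE_STORAGE_USAGE_TIMELINE: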
Select *
from `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE_USAGE_TIMELINE
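The three checks from the question (presence, freshness, growth) can also be expressed in Python. The following is a minimal sketch, not a definitive implementation: `TableSnapshot`, `evaluate`, `fetch_snapshot`, and the 90-minute freshness threshold are assumptions made for illustration. `fetch_snapshot` relies on the `google-cloud-bigquery` client, whose `get_table` raises `NotFound` for a missing table and returns an object with `modified` and `num_bytes`. The size from the previous run has to be stored somewhere between runs, which is not shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class TableSnapshot:
    """Metadata for one table at check time (hypothetical helper type)."""
    table_id: str
    exists: bool
    modified: Optional[datetime] = None  # last modification time (timezone-aware)
    num_bytes: Optional[int] = None      # current size in bytes


def evaluate(snapshot: TableSnapshot,
             previous_bytes: Optional[int],
             now: Optional[datetime] = None,
             max_age: timedelta = timedelta(minutes=90)) -> List[str]:
    """Return failure messages; an empty list means every check passed."""
    now = now or datetime.now(timezone.utc)
    if not snapshot.exists:
        return [f"{snapshot.table_id}: table is missing"]
    failures = []
    # Freshness: an hourly load should have modified the table recently.
    if snapshot.modified is None or now - snapshot.modified > max_age:
        failures.append(f"{snapshot.table_id}: not loaded within {max_age}")
    # Growth: the table is expected to be larger than at the previous run.
    if previous_bytes is not None and (snapshot.num_bytes or 0) <= previous_bytes:
        failures.append(f"{snapshot.table_id}: size did not increase "
                        f"({previous_bytes} -> {snapshot.num_bytes} bytes)")
    return failures


def fetch_snapshot(table_id: str) -> TableSnapshot:
    """Read table metadata from BigQuery (requires google-cloud-bigquery)."""
    # Imported lazily so the pure check logic above runs without the library.
    from google.api_core.exceptions import NotFound
    from google.cloud import bigquery

    client = bigquery.Client()
    try:
        table = client.get_table(table_id)  # "project.dataset.table"
    except NotFound:
        return TableSnapshot(table_id=table_id, exists=False)
    return TableSnapshot(table_id, True, table.modified, table.num_bytes)
```

Keeping `evaluate` free of any BigQuery calls means the thresholds can be unit-tested without credentials; only `fetch_snapshot` touches the service.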
For alerting, use one of the following:
- Use Looker Studio Pro to check these queries.
- Use a BigQuery scheduled query that throws an error:
Select if( condition, ERROR("missing table"),"ok")
Then set up email forwarding in your mail program.
- Use a BigQuery scheduled query and, if a condition is fulfilled, call a Cloud Function. There you can use Python etc. to trigger an email. Another option is to write to Firestore and have a Firebase trigger function inform the user.
- Using Cloud Run to do everything in Python etc. is also possible.
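For the Cloud Function or Cloud Run route, the notification step could look roughly like this. It is a sketch under assumptions: the email addresses, the SMTP host and port, and the function names `build_alert` and `send_alert` are hypothetical, and it uses Python's standard `smtplib` and `email.message` modules rather than any particular mail service. Authentication against the mail server is omitted.

```python
import smtplib
from email.message import EmailMessage
from typing import List


def build_alert(failures: List[str],
                sender: str = "bq-monitor@example.com",   # hypothetical address
                recipient: str = "ops-team@example.com",  # hypothetical address
                ) -> EmailMessage:
    """Build one email that summarises every failed check."""
    msg = EmailMessage()
    msg["Subject"] = f"[BigQuery monitor] {len(failures)} check(s) failed"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(failures))
    return msg


def send_alert(msg: EmailMessage,
               host: str = "smtp.example.com",  # hypothetical SMTP server
               port: int = 587) -> None:
    """Send the message over SMTP with STARTTLS (no login shown)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.send_message(msg)
```

Sending a single message per run, with all failures in the body, keeps the operations team from receiving one email per table when several checks fail at once.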