Looking for best-practice suggestions for this connection: a good solution that can heavy-lift medium-sized sets of data into a pandas DataFrame. What is the boto3 equivalent for BigQuery?
The Google Cloud BigQuery Client Libraries are the nearest equivalent to Boto3; see the client library guide for setup and how-tos. There is also a free trial you can use to explore and test what you need.
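For a quick side-by-side, the sketch below compares client creation in boto3 and in the BigQuery client library; the BigQuery lines assume Application Default Credentials are already configured (explicit service-account credentials are covered below), so treat it as illustrative rather than a drop-in snippet.
# For comparison only: boto3-style client creation on AWS
# import boto3
# s3 = boto3.client("s3")

# BigQuery client library equivalent, assuming Application Default Credentials
# (e.g. GOOGLE_APPLICATION_CREDENTIALS pointing at a key file)
from google.cloud import bigquery

client = bigquery.Client()
df = client.query("SELECT 1 AS x").to_dataframe()  # query straight into a pandas DataFrame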
Make a service account JSON secret/key file by:
- In the console, hover over IAM & Admin in the left navbar and click Service Accounts.
- For the desired service account, click the three dots in the Actions column and select Manage keys.
- Add a key; this generates the service account JSON file, which should look like the example below. Make sure the scopes line is added if it wasn't generated with the file.
service_account.json:
{
"type": "service_account",
"project_id": "projectid",
"private_key_id": "xxxx",
"private_key": "-----BEGIN PRIVATE KEY-----",
"client_email": "serviceaccount.iam.gserviceaccount.com",
"client_id": "xxxxx",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/xxxxx",
"scopes": ["https://www.googleapis.com/auth/drive","https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/bigquery"]
}
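If you would rather not edit the key file, the scopes can also be supplied in code when building the credentials object; a minimal sketch, reusing the three scopes from the file above:
from google.oauth2 import service_account

# Sketch: pass scopes explicitly instead of adding a "scopes" key to the JSON file
scopes = [
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/bigquery",
]
gcp_credentials = service_account.Credentials.from_service_account_file(
    "Path to service_account.json", scopes=scopes
)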
Within a Python project:
- Install the required packages: google-cloud-bigquery (provides google.cloud.bigquery) and google-auth (provides google.oauth2). See the list of google-cloud packages for others you may need.
- Put the service_account.json in a directory, pass the file path to a credentials variable, and use those credentials in the script.
Sample Script
from google.oauth2 import service_account
from google.cloud import bigquery
import pandas as pd
# setup credentials for connecting to GCP
service_account_file_path = "Path to service_account.json"
gcp_credentials = service_account.Credentials.from_service_account_file(service_account_file_path)
# sample function for loading a pandas DataFrame to an existing BigQuery table
def load_to_gbq(df):
    client = bigquery.Client(credentials=gcp_credentials)
    project_id = "project_id"
    dataset_id = "dataset_id"
    table_id = "table_id"
    table_ref = f"{project_id}.{dataset_id}.{table_id}"
    # Upload the DataFrame to the existing table
    job_config = bigquery.LoadJobConfig()
    job = client.load_table_from_dataframe(df, table_ref, job_config=job_config)
    job.result()
    return
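A hypothetical usage of the function above; the column names and values are made up, and the destination table is assumed to already exist with a matching schema:
# Hypothetical example DataFrame; real column names depend on your table
sample_df = pd.DataFrame(
    {
        "transaction_date": pd.to_datetime(["2024-12-01", "2024-12-02"]),
        "amount": [10.5, 20.0],
    }
)
load_to_gbq(sample_df)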
# function for reading from a BigQuery table into a DataFrame
def extract_data_gbq(start_date, end_date):
    client = bigquery.Client(credentials=gcp_credentials)
    project_id = "project_id"
    dataset_id = "dataset_id"
    table_id = "table_id"
    main_table_ref = f"{project_id}.{dataset_id}.{table_id}"
    query = f"""
        SELECT
            *
        FROM `{main_table_ref}`
        WHERE
            transaction_date BETWEEN '{start_date}' AND '{end_date}'
    """
    extract_job = client.query(query)
    extract_job_result = extract_job.result()
    df = extract_job_result.to_dataframe()
    return df
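Hypothetical usage, assuming transaction_date is a DATE or TIMESTAMP column; for anything beyond a quick script, consider query parameters instead of f-string interpolation:
# Pull a date range into a DataFrame (dates are illustrative)
df = extract_data_gbq("2024-01-01", "2024-12-31")
print(df.head())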