
google bigquery - Connect to BigQuery from Python, best practice - Stack Overflow


Looking for best-practice suggestions for this connection: a good solution that can heavy-lift medium-sized datasets into a pandas DataFrame. What is the boto3 equivalent for BigQuery?

asked Dec 6, 2024 at 19:30 by sweez

2 Answers


The Google Cloud BigQuery client libraries are the closest equivalent to boto3; see the client library guide for setup and how-tos. Google Cloud also offers a free trial you can use to explore and test what you need.
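
As a rough parallel to creating a boto3 client, a minimal sketch (assuming google-cloud-bigquery is installed and Application Default Credentials are configured; the project ID is a placeholder):

from google.cloud import bigquery

# Create a client, analogous to boto3.client("s3"); without explicit
# credentials it falls back to Application Default Credentials.
client = bigquery.Client(project="your-project-id")  # placeholder project ID

# List the project's datasets, similar to listing buckets with boto3.
for dataset in client.list_datasets():
    print(dataset.dataset_id)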

Create a service account JSON key file:

  1. In the console, hover over IAM & Admin in the left navbar and click Service Accounts.
  2. For the desired service account, click the three dots on the right in the Actions column and select Manage keys.
  3. Click Add key to generate the service account JSON file, which should look like the example below. Make sure the scopes line is added if it wasn't generated with the file (see the note after the example).

service_account.json:

{
 "type": "service_account",
 "project_id": "projectid",
 "private_key_id": "xxxx",
 "private_key": "-----BEGIN PRIVATE KEY-----",
 "client_email": "serviceaccount.iam.gserviceaccount.com",
 "client_id": "xxxxx",
 "auth_uri": "https://accounts.google.com/o/oauth2/auth",
 "token_uri": "https://oauth2.googleapis.com/token",
 "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
 "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/xxxxx",
 "scopes": ["https://www.googleapis.com/auth/drive","https://www.googleapis.com/auth/cloud-platform","https://www.googleapis.com/auth/bigquery"]
}
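
In case the library you use does not pick the scopes up from the file itself, they can also be passed explicitly when the credentials are loaded; a minimal sketch using the scopes= keyword of google.oauth2.service_account (the file path is a placeholder):

from google.oauth2 import service_account

# Scopes supplied at credential-creation time, mirroring the list above.
scopes = [
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/bigquery",
]
credentials = service_account.Credentials.from_service_account_file(
    "service_account.json",  # placeholder path
    scopes=scopes,
)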

Within a Python project:

  1. Install the required packages: google-cloud-bigquery (provides google.cloud.bigquery) and google-auth (provides google.oauth2), e.g. pip install google-cloud-bigquery google-auth; the DataFrame helpers also need pandas and pyarrow. See the google-cloud package reference for others you may need.
  2. Put service_account.json in a directory, pass the file path to a credentials variable, and use those credentials in the script.

Sample Script

from google.oauth2 import service_account
from google.cloud import bigquery
import pandas as pd

# setup credentials for connecting to GCP
service_account_file_path = "Path to service_account.json"
gcp_credentials = service_account.Credentials.from_service_account_file(service_account_file_path)

# sample function for loading a pandas df to a bigquery table
def load_to_gbq(df):
    client = bigquery.Client(credentials=gcp_credentials)
    project_id = "project_id"
    dataset_id = "dataset_id"
    table_id = "table_id"

    table_ref = f"{project_id}.{dataset_id}.{table_id}"
    # Upload the DataFrame to the existing table
    job_config = bigquery.LoadJobConfig()
    job = client.load_table_from_dataframe(df, table_ref, job_config=job_config)
    job.result()  # block until the load job completes
    return
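
# By default, a load job appends rows (WRITE_APPEND). An explicit write
# disposition overwrites the table instead; this config is an illustrative
# option, not part of the original answer.
overwrite_job_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)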

# function for reading from a BigQuery table into a DataFrame
def extract_data_gbq(start_date, end_date):
    client = bigquery.Client(credentials=gcp_credentials)
    project_id = "project_id"
    dataset_id = "dataset_id"
    table_id = "table_id"
    # fully qualified table name: project.dataset.table
    main_table_ref = f"{project_id}.{dataset_id}.{table_id}"
    query = f"""
            SELECT
            *
            FROM `{main_table_ref}`
            WHERE
            transaction_date BETWEEN @start_date AND @end_date
            """
    # pass the dates as query parameters rather than interpolating
    # them into the SQL string
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("start_date", "DATE", start_date),
            bigquery.ScalarQueryParameter("end_date", "DATE", end_date),
        ]
    )

    extract_job = client.query(query, job_config=job_config)
    extract_job_result = extract_job.result()
    df = extract_job_result.to_dataframe()
    return df
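
A hypothetical round trip with the two functions (the date range and column are placeholders matching the script above):

if __name__ == "__main__":
    # Pull one month of transactions into a DataFrame...
    df = extract_data_gbq("2024-01-01", "2024-01-31")
    print(df.head())

    # ...and append it back to the (pre-existing) target table.
    load_to_gbq(df)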
