Requesting your help and guidance on the following queries.
Scenario:
I am trying to create a simple Python script that uses pyodbc
and works with various data sources (SQL Server, Azure SQL, Snowflake, etc.), basically any source that supports ODBC connections. We have an issue when loading from SQL Server to Snowflake: the source contains a column whose data type is varchar(max). Here are the issues/questions encountered.
Questions:
- Does the Snowflake ODBC driver support fast_executemany? I am not able to find documentation confirming this.
- If fast_executemany is set to True, I get a MemoryError. I have seen various issues and articles that discuss this, but none of the approaches I tried fix it. For example, I have tried both snow_cursor.setinputsizes([(pyodbc.SQL_WVARCHAR, 0, 0)]) and snow_cursor.setinputsizes([(pyodbc.SQL_WVARCHAR, 16777216, 0)]). Both fail.
- If fast_executemany is set to False, the records are inserted one by one, which is painfully slow.
What would be the right approach to fix this issue?
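For what it's worth, my current theory about the MemoryError (this is an assumption on my part, not something I've confirmed in the pyodbc docs): with fast_executemany, pyodbc binds a fixed-width parameter buffer sized roughly batch_size × declared column width, so declaring SQL_WVARCHAR at Snowflake's 16777216-character maximum makes the allocation enormous. A quick back-of-the-envelope check:

```python
# Hypothetical buffer estimate for ONE varchar(max) column bound with
# fast_executemany: rows per batch * declared width * 2 bytes per wide char.
batch_size = 1000            # rows per executemany call in my script
declared_width = 16_777_216  # width passed to setinputsizes (Snowflake VARCHAR max)
bytes_per_char = 2           # SQL_WVARCHAR is a wide (UTF-16) type

buffer_gib = batch_size * declared_width * bytes_per_char / 1024**3
print(f"{buffer_gib:.2f} GiB")  # ~31 GiB for a single column
```

If that math is anywhere near right, either the batch size or the declared width has to come down before fast_executemany can work with this column.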
Sample Code:
import pyodbc

print("Starting script execution...")

# Snowflake connection parameters
conn_params = {
    'DRIVER': 'SnowflakeDSIIDriver',
    'SERVER': '<account>.snowflakecomputing.com',
    'DATABASE': '<database>',
    'SCHEMA': '<schema>',
    'WAREHOUSE': '<warehouse>',
    'ROLE': '<role>',
    'AUTHENTICATOR': 'snowflake_jwt',
    'PRIV_KEY_FILE': '<key_file_path>',
    'PRIV_KEY_FILE_PWD': '<key_password>',
    'UID': '<username>',
    'CLIENT_SESSION_KEEP_ALIVE': 'TRUE'
}
print("Connection parameters defined...")

# SQL Server connection parameters
sql_params = {
    'DRIVER': '{ODBC Driver 18 for SQL Server}',
    'SERVER': '<server>',
    'DATABASE': '<database>',
    'INSTANCE': '<instance>',
    'ENCRYPT': 'yes',
    'TRUSTSERVERCERTIFICATE': 'yes',
    'CONNECTION_TIMEOUT': '30',
    'UID': '<username>',
    'PWD': '<password>'
}

# Create connection strings
snow_conn_str = ';'.join(f"{k}={v}" for k, v in conn_params.items())
sql_conn_str = ';'.join(f"{k}={v}" for k, v in sql_params.items())

try:
    # Connect to SQL Server
    sql_conn = pyodbc.connect(sql_conn_str)
    sql_cursor = sql_conn.cursor()

    # Connect to Snowflake
    snow_conn = pyodbc.connect(snow_conn_str)
    snow_cursor = snow_conn.cursor()
    snow_cursor.fast_executemany = False  # setting this to True raises the MemoryError
    # snow_cursor.setinputsizes([(pyodbc.SQL_WVARCHAR, 0, 0)])
    snow_cursor.setinputsizes([(pyodbc.SQL_WVARCHAR, 16777216, 0)])

    # Prepare insert query
    insert_query = """
        INSERT INTO SNOWFLAKE_TABLE
            (COL_01, COL_02, COL_03, COL_04,
             COL_05, COL_06, COL_07, COL_08, COL_09)
        VALUES (?,?,?,?,?,?,?,?,?)
    """

    # Source query
    source_query = "SELECT TOP 1000 * FROM <source_table> WITH (NOLOCK)"
    sql_cursor.execute(source_query)
    print("SQL query executed successfully")

    batch_size = 1000
    total_rows = 0
    while True:
        # Fetch a batch of rows
        rows = sql_cursor.fetchmany(batch_size)
        print(f"Fetched {len(rows)} rows from SQL Server")
        if not rows:
            break
        # Insert the batch into Snowflake
        snow_cursor.executemany(insert_query, rows)
        print(f"Executed batch insert of {len(rows)} rows to Snowflake")
        snow_conn.commit()
        print("Committed changes to Snowflake")
        total_rows += len(rows)
        print(f"Inserted {len(rows)} rows. Total rows processed: {total_rows}")
    print(f"Successfully completed. Total rows inserted: {total_rows}")
except pyodbc.Error as e:
    print(f"ODBC Error: {e}")
    import traceback
    print(traceback.format_exc())
    raise  # re-raise to see the full error chain
except Exception as e:
    print(f"Unexpected error: {e}")
    import traceback
    print(traceback.format_exc())
    raise  # re-raise to see the full error chain
finally:
    # Close whatever was actually opened; the original `if cursor in locals()`
    # check compared objects against variable names (and raised NameError when
    # a connection was never created)
    for name in ('sql_cursor', 'snow_cursor', 'sql_conn', 'snow_conn'):
        obj = locals().get(name)
        if obj is not None:
            obj.close()
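If the declared-width problem can't be solved driver-side, the fallback I'm experimenting with is capping each executemany call by estimated payload size rather than by row count, so a handful of huge varchar(max) values can't blow up a single bind buffer. adaptive_batches below is a hypothetical helper of my own, not a pyodbc API:

```python
def adaptive_batches(rows, max_batch_bytes=16 * 1024 * 1024):
    """Yield sub-batches whose estimated string payload stays under a byte
    budget, so oversized varchar(max) rows end up in small batches."""
    batch, size = [], 0
    for row in rows:
        # Rough per-row estimate: UTF-8 length for strings, 8 bytes otherwise
        row_bytes = sum(len(v.encode('utf-8')) if isinstance(v, str) else 8
                        for v in row)
        if batch and size + row_bytes > max_batch_bytes:
            yield batch
            batch, size = [], 0
        batch.append(row)
        size += row_bytes
    if batch:
        yield batch

# In the loop above, this would replace the single executemany call:
# for chunk in adaptive_batches(rows):
#     snow_cursor.executemany(insert_query, chunk)
```

This doesn't answer whether the Snowflake driver supports fast_executemany at all, but it at least bounds memory per insert call regardless of the answer.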
Thanks, cheers.