Below is what I'm trying to accomplish:
- Get data from API
- Clean the data
- Insert the data into SQL Server
Issue that I'm facing:
- I convert the data that I receive from api to a pandas dataframe
response = requests.get(API_SCHEME_DATA, headers=headers, data=payload, timeout=20)
if response.status_code == 200:
data = response.json()["Data"]
df = pd.DataFrame(data, dtype=str)
- Now I, clean this data in a very simple step
# bigint list contains column names that I can treat as numeric
for col in bigint_list:
df[col] = pd.to_numeric(df[col], errors='coerce')
df[col] = df[col].fillna(0.0)
df = df.astype(str)
df = df.where(pd.notnull(df), None)
- I then use pyodbc connection to insert dataframe into SQL Server
# Prepare columns and placeholders for the SQL query
columns = [f"[{col}]" for col in df.columns]
placeholders = ', '.join(['?' for _ in range(len(df.columns))])
# SQL Insert query
insert_data_query = f"""
INSERT INTO {table_name} ({', '.join(columns)})
VALUES ({placeholders})
"""
# Convert the dataframe rows to a list of tuples for insertion
rows = [tuple(x) for x in df.replace({None: None}).values]
# Execute the insert query for multiple rows
connection_cursor.executemany(insert_data_query, rows)
connection_cursor.connectionmit()
- While inserting I get the below error:
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Error converting data type nvarchar to numeric. (8114) (SQLExecDirectW)')
- I believe, I have narrowed down the culprit to a parameter coming in the api
"CurrentValue": "0.00002"
- While inserting this parameter, the parameter value has been changed to scientific notation, and I can see the value being changed to
'2e-05'
. The column that I'm inserting in the SQL server is ofDECIMAL
type. And I believe this is why I'm getting the error.
What I have done so far to resolve:
- I have tried to suppress the scientific notation by:
pd.options.display.float_format = '{:.8f}'.format
and also tried to round off the columns like below during cleaning step:
# bigint list contains column names that I can treat as numeric
for col in bigint_list:
df[col] = pd.to_numeric(df[col], errors='coerce')
df[col] = df[col].fillna(0.0)
df[col] = df[col].round(10)
df = df.astype(str)
df = df.where(pd.notnull(df), None)
However, nothing seems to work and I'm still seeing the value converted to scientific notation. Any help would be appreciated.
Below is what I'm trying to accomplish:
- Get data from API
- Clean the data
- Insert the data into SQL Server
Issue that I'm facing:
- I convert the data that I receive from api to a pandas dataframe
response = requests.get(API_SCHEME_DATA, headers=headers, data=payload, timeout=20)
if response.status_code == 200:
data = response.json()["Data"]
df = pd.DataFrame(data, dtype=str)
- Now I, clean this data in a very simple step
# bigint list contains column names that I can treat as numeric
for col in bigint_list:
df[col] = pd.to_numeric(df[col], errors='coerce')
df[col] = df[col].fillna(0.0)
df = df.astype(str)
df = df.where(pd.notnull(df), None)
- I then use pyodbc connection to insert dataframe into SQL Server
# Prepare columns and placeholders for the SQL query
columns = [f"[{col}]" for col in df.columns]
placeholders = ', '.join(['?' for _ in range(len(df.columns))])
# SQL Insert query
insert_data_query = f"""
INSERT INTO {table_name} ({', '.join(columns)})
VALUES ({placeholders})
"""
# Convert the dataframe rows to a list of tuples for insertion
rows = [tuple(x) for x in df.replace({None: None}).values]
# Execute the insert query for multiple rows
connection_cursor.executemany(insert_data_query, rows)
connection_cursor.connectionmit()
- While inserting I get the below error:
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Error converting data type nvarchar to numeric. (8114) (SQLExecDirectW)')
- I believe, I have narrowed down the culprit to a parameter coming in the api
"CurrentValue": "0.00002"
- While inserting this parameter, the parameter value has been changed to scientific notation, and I can see the value being changed to
'2e-05'
. The column that I'm inserting in the SQL server is ofDECIMAL
type. And I believe this is why I'm getting the error.
What I have done so far to resolve:
- I have tried to suppress the scientific notation by:
pd.options.display.float_format = '{:.8f}'.format
and also tried to round off the columns like below during cleaning step:
# bigint list contains column names that I can treat as numeric
for col in bigint_list:
df[col] = pd.to_numeric(df[col], errors='coerce')
df[col] = df[col].fillna(0.0)
df[col] = df[col].round(10)
df = df.astype(str)
df = df.where(pd.notnull(df), None)
However, nothing seems to work and I'm still seeing the value converted to scientific notation. Any help would be appreciated.
Share Improve this question edited Mar 28 at 5:33 sdawar asked Mar 28 at 5:28 sdawarsdawar 3532 silver badges13 bronze badges 2 |1 Answer
Reset to default 1You're probably encountering issues with how Python's json
deserializes numbers from JSON responses. Consider the following example:
from httmock import all_requests, response, HTTMock
import pandas as pd
import requests
@all_requests
def response_content(url, request):
headers = { 'Content-Type': 'application/json' }
content = '{ "Data": [ { "CurrentValue": 0.00001 }, { "CurrentValue": 0.00002 } ] }'
return response(200, content, headers, None, 5, request)
with HTTMock(response_content):
response = requests.get('http://example./')
data = response.json()["Data"]
df = pd.DataFrame(data=data, dtype=str)
print(df)
Which outputs the DataFrame:
CurrentValue
0 1e-05
1 2e-05
To get more control over deserialization of the JSON response you can import the json
module and invoke json.loads()
directly, e.g.:
from httmock import all_requests, response, HTTMock
import json
import pandas as pd
import requests
@all_requests
def response_content(url, request):
headers = { 'Content-Type': 'application/json' }
content = '{ "Data": [ { "CurrentValue": 0.00001 }, { "CurrentValue": 0.00002 } ] }'
return response(200, content, headers, None, 5, request)
with HTTMock(response_content):
response = requests.get('http://example./')
data = json.loads(response.text, parse_float=None)["Data"]
df = pd.DataFrame(data=data)
print(df)
This outputs the DataFrame:
CurrentValue
0 0.00001
1 0.00002
df = df.astype(str)
? try removing that line and re-running. the basic idea of the error message is that you're trying to insert a string to a numeric column. – mechanical_meat Commented Mar 28 at 6:06INSERT INTO {table_name} ({', '.join(columns)}) VALUES ({placeholders})
that looks a lot like injection; ideally you should be parametrising your queries (I assume Pandas is capable of that). – Thom A Commented Mar 28 at 9:02