python - Unable to suppress scientific notation in datarame, which in turn is causing pyodbc.ProgrammingError "Error co

Below is what I'm trying to accomplish:

Get data from API
Clean the data
Insert the data into SQL Server

Issue that I'm facing:

I convert the data that I receive from api to a pandas dataframe

response = requests.get(API_SCHEME_DATA, headers=headers, data=payload, timeout=20)
if response.status_code == 200:
    data = response.json()["Data"]
    df = pd.DataFrame(data, dtype=str)

Now I, clean this data in a very simple step

# bigint list contains column names that I can treat as numeric
for col in bigint_list:
    df[col] = pd.to_numeric(df[col], errors='coerce')
    df[col] = df[col].fillna(0.0)

df = df.astype(str)
df = df.where(pd.notnull(df), None)

I then use pyodbc connection to insert dataframe into SQL Server

# Prepare columns and placeholders for the SQL query
columns = [f"[{col}]" for col in df.columns]
placeholders = ', '.join(['?' for _ in range(len(df.columns))])

# SQL Insert query
insert_data_query = f"""
    INSERT INTO {table_name} ({', '.join(columns)})
    VALUES ({placeholders})
"""
            
# Convert the dataframe rows to a list of tuples for insertion
rows = [tuple(x) for x in df.replace({None: None}).values]
# Execute the insert query for multiple rows
connection_cursor.executemany(insert_data_query, rows)
connection_cursor.connectionmit()

While inserting I get the below error:

pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Error converting data type nvarchar to numeric. (8114) (SQLExecDirectW)')

I believe, I have narrowed down the culprit to a parameter coming in the api

"CurrentValue": "0.00002"

While inserting this parameter, the parameter value has been changed to scientific notation, and I can see the value being changed to '2e-05'. The column that I'm inserting in the SQL server is of DECIMAL type. And I believe this is why I'm getting the error.

What I have done so far to resolve:

I have tried to suppress the scientific notation by:

pd.options.display.float_format = '{:.8f}'.format

and also tried to round off the columns like below during cleaning step:

# bigint list contains column names that I can treat as numeric
for col in bigint_list:
    df[col] = pd.to_numeric(df[col], errors='coerce')
    df[col] = df[col].fillna(0.0)
    df[col] = df[col].round(10)

df = df.astype(str)
df = df.where(pd.notnull(df), None)

However, nothing seems to work and I'm still seeing the value converted to scientific notation. Any help would be appreciated.

Below is what I'm trying to accomplish:

Get data from API
Clean the data
Insert the data into SQL Server

Issue that I'm facing:

I convert the data that I receive from api to a pandas dataframe

response = requests.get(API_SCHEME_DATA, headers=headers, data=payload, timeout=20)
if response.status_code == 200:
    data = response.json()["Data"]
    df = pd.DataFrame(data, dtype=str)

Now I, clean this data in a very simple step

# bigint list contains column names that I can treat as numeric
for col in bigint_list:
    df[col] = pd.to_numeric(df[col], errors='coerce')
    df[col] = df[col].fillna(0.0)

df = df.astype(str)
df = df.where(pd.notnull(df), None)

I then use pyodbc connection to insert dataframe into SQL Server

# Prepare columns and placeholders for the SQL query
columns = [f"[{col}]" for col in df.columns]
placeholders = ', '.join(['?' for _ in range(len(df.columns))])

# SQL Insert query
insert_data_query = f"""
    INSERT INTO {table_name} ({', '.join(columns)})
    VALUES ({placeholders})
"""
            
# Convert the dataframe rows to a list of tuples for insertion
rows = [tuple(x) for x in df.replace({None: None}).values]
# Execute the insert query for multiple rows
connection_cursor.executemany(insert_data_query, rows)
connection_cursor.connectionmit()

While inserting I get the below error:

pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Error converting data type nvarchar to numeric. (8114) (SQLExecDirectW)')

I believe, I have narrowed down the culprit to a parameter coming in the api

"CurrentValue": "0.00002"

While inserting this parameter, the parameter value has been changed to scientific notation, and I can see the value being changed to '2e-05'. The column that I'm inserting in the SQL server is of DECIMAL type. And I believe this is why I'm getting the error.

What I have done so far to resolve:

I have tried to suppress the scientific notation by:

pd.options.display.float_format = '{:.8f}'.format

and also tried to round off the columns like below during cleaning step:

# bigint list contains column names that I can treat as numeric
for col in bigint_list:
    df[col] = pd.to_numeric(df[col], errors='coerce')
    df[col] = df[col].fillna(0.0)
    df[col] = df[col].round(10)

df = df.astype(str)
df = df.where(pd.notnull(df), None)

However, nothing seems to work and I'm still seeing the value converted to scientific notation. Any help would be appreciated.

Share Improve this question edited Mar 28 at 5:33 asked Mar 28 at 5:28 sdawar 3532 silver badges13 bronze badges

why are you doing this df = df.astype(str)? try removing that line and re-running. the basic idea of the error message is that you're trying to insert a string to a numeric column. – mechanical_meat Commented Mar 28 at 6:06
INSERT INTO {table_name} ({', '.join(columns)}) VALUES ({placeholders}) that looks a lot like injection; ideally you should be parametrising your queries (I assume Pandas is capable of that). – Thom A Commented Mar 28 at 9:02

Add a comment |

1 Answer 1

Sorted by: Reset to default 1

You're probably encountering issues with how Python's json deserializes numbers from JSON responses. Consider the following example:

from httmock import all_requests, response, HTTMock
import pandas as pd
import requests

@all_requests
def response_content(url, request):
    headers = { 'Content-Type': 'application/json' }
    content = '{ "Data": [ { "CurrentValue": 0.00001 }, { "CurrentValue": 0.00002 } ] }'
    return response(200, content, headers, None, 5, request)

with HTTMock(response_content):
    response = requests.get('http://example./')
    data = response.json()["Data"]
    df = pd.DataFrame(data=data, dtype=str)
    print(df)

Which outputs the DataFrame:

  CurrentValue
0        1e-05
1        2e-05

To get more control over deserialization of the JSON response you can import the json module and invoke json.loads() directly, e.g.:

from httmock import all_requests, response, HTTMock
import json
import pandas as pd
import requests

@all_requests
def response_content(url, request):
    headers = { 'Content-Type': 'application/json' }
    content = '{ "Data": [ { "CurrentValue": 0.00001 }, { "CurrentValue": 0.00002 } ] }'
    return response(200, content, headers, None, 5, request)

with HTTMock(response_content):
    response = requests.get('http://example./')
    data = json.loads(response.text, parse_float=None)["Data"]
    df = pd.DataFrame(data=data)
    print(df)

This outputs the DataFrame:

   CurrentValue
0       0.00001
1       0.00002

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

python - Unable to suppress scientific notation in datarame, which in turn is causing pyodbc.ProgrammingError "Error co

1 Answer 1

与本文相关的文章

评论列表(0)