
python - Unable to suppress scientific notation in dataframe, which in turn is causing pyodbc.ProgrammingError "Error co


Below is what I'm trying to accomplish:

  1. Get data from API
  2. Clean the data
  3. Insert the data into SQL Server

Issue that I'm facing:

  • I convert the data that I receive from the API to a pandas DataFrame:
response = requests.get(API_SCHEME_DATA, headers=headers, data=payload, timeout=20)
if response.status_code == 200:
    data = response.json()["Data"]
    df = pd.DataFrame(data, dtype=str)
  • Next, I clean the data in one simple step:
# bigint list contains column names that I can treat as numeric
for col in bigint_list:
    df[col] = pd.to_numeric(df[col], errors='coerce')
    df[col] = df[col].fillna(0.0)

df = df.astype(str)
df = df.where(pd.notnull(df), None)   
  • I then use a pyodbc connection to insert the DataFrame into SQL Server:
# Prepare columns and placeholders for the SQL query
columns = [f"[{col}]" for col in df.columns]
placeholders = ', '.join(['?' for _ in range(len(df.columns))])

# SQL Insert query
insert_data_query = f"""
    INSERT INTO {table_name} ({', '.join(columns)})
    VALUES ({placeholders})
"""
            
# Convert the dataframe rows to a list of tuples for insertion
rows = [tuple(x) for x in df.replace({None: None}).values]
# Execute the insert query for multiple rows
connection_cursor.executemany(insert_data_query, rows)
connection_cursor.connection.commit()
  • While inserting I get the below error:
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Error converting data type nvarchar to numeric. (8114) (SQLExecDirectW)')
  • I believe I have narrowed the culprit down to a value coming from the API:
"CurrentValue": "0.00002"
  • During the insert, this value ends up in scientific notation: I can see that it has become '2e-05'. The target column in SQL Server is of DECIMAL type, and I believe this is why I get the error.
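
The conversion described in the bullets above can be reproduced in isolation. This is a minimal sketch, assuming a one-element Series (`raw`, a hypothetical stand-in for the API column) and the same `pd.to_numeric` / `astype(str)` sequence used in the cleaning step:

```python
import pandas as pd

# Hypothetical stand-in for the "CurrentValue" column received from the API
raw = pd.Series(["0.00002"])

# Same cleaning sequence as in the question
numeric = pd.to_numeric(raw, errors="coerce").fillna(0.0)
as_text = numeric.astype(str)

# The float's string form is what gets sent to SQL Server as nvarchar
print(as_text.iloc[0])
```

Python renders small floats in exponent form when they are converted to `str`, so the text that reaches the driver is no longer a plain decimal literal.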

What I have done so far to resolve:

  • I have tried to suppress the scientific notation by:
pd.options.display.float_format = '{:.8f}'.format

I also tried rounding the columns during the cleaning step, as shown below:

# bigint list contains column names that I can treat as numeric
for col in bigint_list:
    df[col] = pd.to_numeric(df[col], errors='coerce')
    df[col] = df[col].fillna(0.0)
    df[col] = df[col].round(10)

df = df.astype(str)
df = df.where(pd.notnull(df), None) 

However, neither approach works, and I still see the value converted to scientific notation. Any help would be appreciated.
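
One possible reason the two attempts have no effect: `display.float_format` only controls how pandas renders floats when printing, and rounding returns the same float, whose string form is unchanged. The sketch below illustrates this under the assumption of a one-element Series `s`; the explicit per-value formatting at the end is one option, not necessarily the best fit for the insert:

```python
import pandas as pd

s = pd.Series([0.00002])  # hypothetical stand-in for one numeric column

with pd.option_context("display.float_format", "{:.8f}".format):
    shown = s.to_string(index=False)    # rendering honors the display option
    converted = s.astype(str).iloc[0]   # conversion to str ignores it

rounded = s.round(10).astype(str).iloc[0]  # rounding does not change the str form

# Formatting each value explicitly does produce fixed-point text
fixed = s.map("{:.8f}".format).iloc[0]

print(shown, converted, rounded, fixed)
```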

Asked by sdawar on Mar 28 at 5:28; edited Mar 28 at 5:33.
  • Why are you doing this: df = df.astype(str)? Try removing that line and re-running. The basic idea of the error message is that you're trying to insert a string into a numeric column. – mechanical_meat Commented Mar 28 at 6:06
  • INSERT INTO {table_name} ({', '.join(columns)}) VALUES ({placeholders}): that looks a lot like injection; ideally you should be parametrising your queries (I assume Pandas is capable of that). – Thom A Commented Mar 28 at 9:02
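
Following the first comment, one way to avoid sending strings to a numeric column is to build the parameter rows from native Python values instead of calling `astype(str)`. This is only a sketch: the sample DataFrame, `numeric_cols` (standing in for `bigint_list`), and the helper `to_decimal` are hypothetical, and missing values are replaced with zero to mirror the `fillna(0.0)` in the question. The resulting `rows` could then be passed to `executemany`; no database call is made here.

```python
from decimal import Decimal, InvalidOperation

import pandas as pd

# Hypothetical sample data; string values as delivered by the API
df = pd.DataFrame({"SchemeName": ["A", "B"], "CurrentValue": ["0.00002", None]})
numeric_cols = ["CurrentValue"]  # assumed equivalent of bigint_list


def to_decimal(value):
    """Parse a value into Decimal, falling back to 0 like fillna(0.0)."""
    if value is None or pd.isna(value):
        return Decimal("0")
    try:
        return Decimal(str(value))
    except InvalidOperation:
        return Decimal("0")


# Keep non-numeric columns as they are, convert numeric ones to Decimal
rows = [
    tuple(to_decimal(v) if col in numeric_cols else v
          for col, v in zip(df.columns, record))
    for record in df.itertuples(index=False, name=None)
]

print(rows)
```

Because the values are parsed from the original text directly into `Decimal`, they never pass through a binary float and keep their exact decimal digits.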

1 Answer


You're probably encountering issues with how Python's json module deserializes numbers from JSON responses. Consider the following example:

from httmock import all_requests, response, HTTMock
import pandas as pd
import requests

@all_requests
def response_content(url, request):
    headers = { 'Content-Type': 'application/json' }
    content = '{ "Data": [ { "CurrentValue": 0.00001 }, { "CurrentValue": 0.00002 } ] }'
    return response(200, content, headers, None, 5, request)

with HTTMock(response_content):
    response = requests.get('http://example.com/')
    data = response.json()["Data"]
    df = pd.DataFrame(data=data, dtype=str)
    print(df)

Which outputs the DataFrame:

  CurrentValue
0        1e-05
1        2e-05

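To get more control over how the JSON response is deserialized, you can import the json module and call json.loads() directly. Note that parse_float=None is the default; passing a callable such as decimal.Decimal is what changes how floats are parsed. Also note that the DataFrame below is built without dtype=str, so the column stays numeric. For example: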

from httmock import all_requests, response, HTTMock
import json
import pandas as pd
import requests

@all_requests
def response_content(url, request):
    headers = { 'Content-Type': 'application/json' }
    content = '{ "Data": [ { "CurrentValue": 0.00001 }, { "CurrentValue": 0.00002 } ] }'
    return response(200, content, headers, None, 5, request)

with HTTMock(response_content):
    response = requests.get('http://example.com/')
    data = json.loads(response.text, parse_float=None)["Data"]
    df = pd.DataFrame(data=data)
    print(df)

This outputs the DataFrame:

   CurrentValue
0       0.00001
1       0.00002
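
As a possible variation on the example above, `json.loads()` accepts a callable for `parse_float`, so JSON numbers can be parsed straight into `decimal.Decimal` and never become binary floats. The sketch below assumes the same kind of payload as the mocked response (the second value is made smaller on purpose) and uses `format(value, "f")` to force fixed-point text, since `str()` on a very small `Decimal` can still use exponent notation:

```python
import json
from decimal import Decimal

# Hypothetical payload in the shape of the mocked response above
text = '{ "Data": [ { "CurrentValue": 0.00001 }, { "CurrentValue": 0.0000002 } ] }'

# parse_float receives the literal text of each JSON float
data = json.loads(text, parse_float=Decimal)["Data"]
values = [row["CurrentValue"] for row in data]

# Fixed-point strings, should a textual form be needed at all
as_text = [format(v, "f") for v in values]

print(values)
print(as_text)
```

If the driver accepts `Decimal` parameters, as pyodbc does, the `values` can be passed as they are and the string conversion step can be dropped.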
