
python - Extract 4 million rows from database in chunks of 1 million rows each - Stack Overflow

  • I have a table in a Snowflake database which contains 4 to 8 million rows.

  • I want to extract 4 million rows in chunks to reduce the load on the database.

  • On each iteration of the loop, rows should be appended to the array r (as shown in the last line of the code) in chunks, and at the end all the data should be saved to a CSV file.

I am new to Python. I have seen other answers, but they are not working as expected.

Can anyone help me?

chunksize = 1000000
query = "select * from emp_table;"
cursor = conn.cursor()
cursor.execute(query)
r = []


asked Nov 17, 2024 at 5:20 by Logan · edited Nov 20, 2024 at 21:01 by Christopher Jones · 2 comments
  • Your code doesn't help to qualify your problem. Please elaborate – Adon Bilivit Commented Nov 17, 2024 at 7:57
  • If you can sort the data, I would add an OFFSET/FETCH to your SQL query, so that you aren't selecting the entire table into memory (see the sketch after these comments). – Mike Walton Commented Nov 17, 2024 at 17:26
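Building on the OFFSET/FETCH comment above, here is a rough sketch of that approach: page through the table with a stable ORDER BY so each query returns only one slice. The ordering column (emp_id) is an assumption for illustration, and note that OFFSET-based paging re-scans the table on every query.

import csv
import snowflake.connector

def export_paged(conn, chunksize=1000000, filename='extracted_data.csv'):
    cursor = conn.cursor()
    offset = 0
    header_written = False
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        while True:
            # Fetch one slice of the table; emp_id is a placeholder ordering column.
            cursor.execute(
                "SELECT * FROM emp_table ORDER BY emp_id "
                "OFFSET %s ROWS FETCH NEXT %s ROWS ONLY",
                (offset, chunksize),
            )
            rows = cursor.fetchall()
            if not rows:
                break
            if not header_written:
                writer.writerow([col[0] for col in cursor.description])
                header_written = True
            writer.writerows(rows)
            offset += chunksize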

1 Answer


As you mentioned, because of the high volume the data should be loaded in chunks to reduce the load on the database. However, loading every chunk and appending it to r would make the task very memory-intensive.

To avoid that, you can load one chunk, write it out, then load the next chunk and repeat, so that memory is never overloaded.

Here is a sample:

import snowflake.connector
import csv

def extract_and_write_data(conn, query, chunksize=1000000,
                           filename='extracted_data.csv'):
    cursor = conn.cursor()
    cursor.execute(query)

    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        # cursor.description holds one tuple per column; its first element
        # is the column name, which becomes the CSV header row.
        writer.writerow([col[0] for col in cursor.description])

        # The cursor represents the whole result set, but fetchmany() only
        # pulls `chunksize` rows at a time: the first 1000000, then the next
        # 1000000, and so on, writing each chunk to the file before fetching
        # the next. When a fetch returns fewer rows than `chunksize`, there
        # is nothing left to read and the loop stops.
        while True:
            chunk = cursor.fetchmany(chunksize)
            writer.writerows(chunk)

            # Check if there are more rows to fetch
            if len(chunk) < chunksize:
                break

# Connect to Snowflake
conn = snowflake.connector.connect(
    user='your_user',
    password='your_password',
    account='your_account',
    warehouse='your_warehouse',
    database='your_database',
    schema='your_schema'
)

# SQL query to extract all rows from the table
query = "SELECT * FROM emp_table"

# Extract and write data to CSV
extract_and_write_data(conn, query)

# Close the connection
conn.close()
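If the pandas extra of the connector is installed (pip install "snowflake-connector-python[pandas]"), an alternative, illustrative sketch is to let the connector yield result batches as DataFrames and append each one to the CSV. The batch sizes are chosen by the connector rather than by a chunksize argument.

import snowflake.connector

# Sketch only: same placeholder credentials as above.
conn = snowflake.connector.connect(
    user='your_user', password='your_password', account='your_account',
    warehouse='your_warehouse', database='your_database', schema='your_schema'
)
cursor = conn.cursor()
cursor.execute("SELECT * FROM emp_table")

first = True
# fetch_pandas_batches() yields the result set as a series of pandas DataFrames.
for batch in cursor.fetch_pandas_batches():
    batch.to_csv('extracted_data.csv',
                 mode='w' if first else 'a',  # create the file, then append
                 header=first,                # write the header only once
                 index=False)
    first = False

conn.close()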