I have a zip file that contains a CSV file. I would like to read the CSV row by row without extracting the entire file from the zip.
The underlying CSV is simply too big to extract, so I need a workaround.
asked Feb 7 at 15:05 by polliew
1 Answer
You can stream-read the zip archive and get the contents of the first row via:
    import zipfile

    with zipfile.ZipFile("final_analysis_data.zip") as z:  # 100m compressed
        # z.open() returns a file-like object that decompresses on demand
        with z.open("final_analysis_data.csv") as f:  # 650m uncompressed
            # The member is opened in binary mode, so each line is bytes
            first_row = next(f).decode()
            input("check memory usage now, press enter to continue")
            print(first_row)
The input() statement just pauses the script so you can verify that you are not reading the entire archive into memory. With a 100 MB archive of a 650 MB CSV in this example, the Python process uses about 6 MB of RAM.
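To go from the first line to every row, the same open member can be wrapped in a text layer and handed to the csv module. The following is only a sketch: the helper name iter_csv_rows is hypothetical, and the UTF-8 default encoding is an assumption about the file, not something stated in the question.

```python
import csv
import io
import zipfile


def iter_csv_rows(zip_path, member_name, encoding="utf-8"):
    """Yield one parsed CSV row (a list of strings) at a time.

    Hypothetical helper: the name and the utf-8 default are assumptions.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # archive.open() gives a binary stream that decompresses lazily
        with archive.open(member_name) as raw:
            # csv.reader needs text; newline="" lets csv handle line endings
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            yield from csv.reader(text)
```

Looping with for row in iter_csv_rows("final_analysis_data.zip", "final_analysis_data.csv") should then handle one row at a time, so memory use stays small regardless of the uncompressed size.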
Note: If you feel that this resolves your issue, you might consider closing it as a duplicate of "Read a large zipped text file line by line in python" rather than accepting an answer.
zip files are treated as folders, so you only have to supply the right path to your python program and it should be able to read it. – quamrana, commented Feb 7 at 15:08