I have a Python script that basically looks like this:
import mypackage
# this function always generates the same pandas.DataFrame
df = mypackage.create_the_dataframe()
# write the DataFrame to xlsx and csv
df.to_excel("the_dataframe_as.xlsx", index=False, engine="openpyxl")
df.to_csv("the_dataframe_as.csv", index=False)
I was trying to write a test for the create_the_dataframe function, so I checked the hashes of the resulting xlsx and csv files. For two different runs of the script, the hash and file size of the xlsx file change, while the hash of the csv file stays the same.
Although I can live with this, I am very curious to understand why this is the case.
asked Mar 27 at 8:56 by d4tm4x, edited Mar 27 at 9:58

1 Answer
XLSX files contain metadata such as the creation timestamp, which changes with every newly written file. Plaintext CSV files contain no such variable metadata, so their contents are entirely predictable.
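If a stable fingerprint of the xlsx file is still wanted for a test, one option is to hash the archive members instead of the raw file bytes. The following is a minimal sketch, not a definitive implementation: it assumes the varying metadata sits in the `docProps/core.xml` member of the xlsx zip archive, and the names `xlsx_content_hash` and `VOLATILE_MEMBERS` are made up for this illustration.

```python
import hashlib
import io
import zipfile

# Assumption: the timestamps that change between runs are stored in this
# archive member. Add further member names here if others turn out to vary.
VOLATILE_MEMBERS = frozenset({"docProps/core.xml"})


def xlsx_content_hash(source, skip=VOLATILE_MEMBERS):
    """Return a SHA-256 hex digest over the decompressed archive members.

    `source` is a path or a binary file-like object. Members listed in
    `skip` are ignored. Because only member names and decompressed
    contents are hashed, the per-member timestamps in the zip headers
    do not influence the result either.
    """
    digest = hashlib.sha256()
    with zipfile.ZipFile(source) as archive:
        for name in sorted(archive.namelist()):
            if name in skip:
                continue
            data = archive.read(name)
            # Include name and length so member boundaries are unambiguous.
            digest.update(name.encode("utf-8"))
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()
```

In a test this could be used as `assert xlsx_content_hash("run1.xlsx") == xlsx_content_hash("run2.xlsx")`, where the two file names are placeholders. If the real goal is to test create_the_dataframe itself, comparing the DataFrames directly with `pandas.testing.assert_frame_equal` avoids the file format altogether.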
engine="openpyxl" with the result being the same. So this seems more like an openpyxl topic. I'll update the question. – d4tm4x, commented Mar 27 at 9:57