python - Nondeterministic behaviour of openpyxl - Stack Overflow

I have a Python script that basically looks like this:

import mypackage

# this function always generates the same pandas.DataFrame
df = mypackage.create_the_dataframe()

# write the DataFrame to xlsx and csv
df.to_excel("the_dataframe_as.xlsx", index=False, engine="openpyxl")
df.to_csv("the_dataframe_as.csv", index=False)

I was trying to write a test for the create_the_dataframe function, so I checked the hashes of the resulting xlsx and csv files. Across two different runs of the script, the hash and file size of the xlsx file change, while the hash of the csv file stays the same.

Although I can live with this, I am very curious to understand why this is the case.

asked Mar 27 at 8:56, edited Mar 27 at 9:58 by d4tm4x
  • 1 Have you tried changing the settings of the "to_excel" method? Explicitly setting the engine (instead of leaving it blank) might work! pandas.pydata./docs/reference/api/… – user24758287 Commented Mar 27 at 9:15
  • I pinned it to engine="openpyxl" with the result being the same. So this seems more like an openpyxl topic. I'll update the question. – d4tm4x Commented Mar 27 at 9:57

1 Answer

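XLSX files contain metadata such as the creation timestamp, which changes with every newly written file (an xlsx file is a ZIP archive, and the workbook's document properties record when it was created). Plain-text CSV files contain no such variable metadata, so their contents are determined entirely by the data.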
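
For a test that should ignore this metadata, one option is to hash the uncompressed parts of the archive instead of the raw file bytes. The following is only a minimal sketch: the helper name xlsx_content_digest is hypothetical, and it assumes that the only part whose content varies between runs is docProps/core.xml (the part that normally holds the created/modified timestamps). If other parts differ in your files, extend the skip tuple accordingly.

```python
import hashlib
import zipfile


def xlsx_content_digest(path, skip=("docProps/core.xml",)):
    """Hash the uncompressed parts of an xlsx file, ignoring volatile metadata.

    Hypothetical helper: `skip` lists archive members that are assumed to
    carry per-run metadata (by default the core document properties).
    """
    digest = hashlib.sha256()
    with zipfile.ZipFile(path) as archive:
        # Sort the names so the order of entries in the archive does not matter.
        for name in sorted(archive.namelist()):
            if name in skip:
                continue  # assumed to contain the creation/modification timestamps
            # Only names and uncompressed bytes are hashed, so the per-entry
            # timestamps stored in the ZIP headers are ignored as well.
            digest.update(name.encode("utf-8"))
            digest.update(b"\0")
            digest.update(archive.read(name))
    return digest.hexdigest()
```

Calling xlsx_content_digest("the_dataframe_as.xlsx") after each run should then give the same value as long as the data is unchanged and the assumption above holds. Alternatively, the test can avoid files altogether and compare the DataFrame itself, for example with pandas.testing.assert_frame_equal against a stored reference.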
