excel - XLS Old File Format Python Read Issue - Stack Overflow

I have an XLS file in the old format (Microsoft Excel 97-2003 Worksheet).

The file opens in the Excel app without any issues, but when I try to read it in Python with the xlrd engine, it throws the error below:

xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\x03|\x0b\x0e\xfb\x9a\x00\x00'

Code:

import pandas as pd
df = pd.read_excel('my_file.xls', engine='xlrd')

While debugging the error, I found that an XLS file normally starts with the signature bytes D0 CF 11 E0 A1 B1 1A E1 at the beginning of the file (BOF). My file instead starts with the bytes 03 7C 0B 0E FB 9A 00 00 21.
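The byte comparison described above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `sniff_signature` and the small signature table are assumptions made for illustration, and the table is far from exhaustive.

```python
# Illustrative table of well-known leading signatures (not exhaustive).
KNOWN_SIGNATURES = {
    bytes.fromhex("D0CF11E0A1B11AE1"): "OLE2 compound file (legacy .xls/.doc/.ppt)",
    b"PK\x03\x04": "ZIP container (.xlsx/.docx/.ods)",
    b"<?xml": "XML text",
}


def sniff_signature(path, length=8):
    """Return (hex dump of the first `length` bytes, best-guess description)."""
    with open(path, "rb") as handle:
        head = handle.read(length)
    for magic, description in KNOWN_SIGNATURES.items():
        if head.startswith(magic):
            return head.hex(" "), description
    # No entry matched: the file is not one of the formats listed above.
    return head.hex(" "), "unknown"
```

Calling `sniff_signature("my_file.xls")` on a genuine Excel 97-2003 workbook should report the OLE2 entry; a file starting with the bytes quoted in the error message would fall through to "unknown".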

I don't know the root cause of this format, but I need to convert or read this file programmatically.

I found a solution on Windows using the win32 Excel API.

Windows conversion code:

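import os
import win32com.client as win32

# Excel may resolve relative paths against its own working directory, so pass an absolute path
fname = os.path.abspath("my_file.xls")
excel = win32.gencache.EnsureDispatch('Excel.Application')
try:
    wb = excel.Workbooks.Open(fname)
    wb.SaveAs(fname + "x", FileFormat=51)  # 51 = xlOpenXMLWorkbook (.xlsx)
    wb.Close()
finally:
    excel.Application.Quit()  # always release the Excel process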

When I try to do the same on Linux, it does not work, presumably because the win32 API is not available there.

Kindly let me know if there is any way to handle this on Linux.

(Note: I don't think my_file is an XML file either, since I have also tried parsing it with BeautifulSoup.)

Asked Feb 16 at 16:21 by Ambarish Srinivasan; edited Feb 17 at 22:31 by James Z.
  • Did you check this question? From the question, it seems that xlrd changed its behavior after version 1.2.0. – ndclt, Feb 16 at 19:23
  • @ndclt Thanks for your suggestion. I had already tried xlrd version 1.2.0, but the same issue persists. – Ambarish Srinivasan, Feb 17 at 6:30

1 Answer

The change to xlrd (from version 2.0.0) is that it no longer supports the newer Excel XLSX files but continues to support the old proprietary Excel format, XLS. It therefore did, and still does, open files of the type the poster believes they have. However, the magic number the poster reports for their XLS file does not match any known file type that I can find, so I am not sure what the file type actually is.
Given that Excel can open the file, though, it must be in some recognisable format.

If you have to convert the files on a Linux PC then, as you say, any Python module that drives the Excel app cannot be used. If LibreOffice is available or can be installed, it may be able to help you.
The LibreOffice executable has conversion options that can be used in headless mode.

For example (the LibreOffice executable is usually named 'soffice' in later versions):

soffice --convert-to xlsx my_file.xls

If LibreOffice can read the original file ('my_file.xls'), this will create an XLSX file with the same base name ('my_file.xlsx'), which should then be readable with openpyxl or pandas on the same Linux PC.
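If the conversion needs to be driven from Python rather than typed in a shell, the same command can be wrapped with `subprocess`. This is a rough sketch, not a tested recipe for this particular file: the helper names `build_convert_command` and `convert_to_xlsx` are made up for illustration, and it assumes LibreOffice is on the PATH as 'soffice' or 'libreoffice'.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(src, outdir, executable="soffice"):
    """Build the headless LibreOffice command line that converts `src` to XLSX."""
    return [executable, "--headless", "--convert-to", "xlsx",
            "--outdir", str(outdir), str(src)]


def convert_to_xlsx(src, outdir=None, timeout=120):
    """Convert `src` with LibreOffice and return the path of the new .xlsx file."""
    src = Path(src)
    outdir = Path(outdir) if outdir else src.parent
    # Assumption: the executable is discoverable under one of these two names.
    executable = shutil.which("soffice") or shutil.which("libreoffice")
    if executable is None:
        raise FileNotFoundError("LibreOffice executable not found on PATH")
    subprocess.run(build_convert_command(src, outdir, executable),
                   check=True, timeout=timeout)
    target = outdir / (src.stem + ".xlsx")
    # Check explicitly that the output file was actually written.
    if not target.exists():
        raise RuntimeError(f"LibreOffice did not produce {target}")
    return target
```

The returned path can then be passed to pandas, for example `pd.read_excel(convert_to_xlsx("my_file.xls"), engine="openpyxl")`, assuming pandas and openpyxl are installed.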

If not, you may just have to batch-convert all of the XLS files to XLSX on Windows and then copy them to the Linux PC, or try to find out from the originator what the actual file type is.
