
accessibility - OCRmyPDF conversion issue, especially with table formatting - Stack Overflow


I'm working with a scanned PDF that contains a table with two columns, where each column has two lines of text. When I convert it with OCRmyPDF, the text in the resulting PDF is not structured the way the table is laid out.

Tesseract processes the text line by line, so OCRmyPDF generates a separate span for each piece of content, in this order: row 1, cell 1; then row 1, cell 2; then row 2, cell 1; and finally row 2, cell 2.
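To make the ordering concrete, here is a minimal sketch of the difference between the line-by-line order I'm getting and the column-grouped order I would expect. It is plain Python and does not use any OCRmyPDF or Tesseract API; the `Line` class, the function names, the sample text, the coordinates, and the `column_boundary` value are all hypothetical and only illustrate the idea, assuming the two lines in each column belong together.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One recognized text line with the top-left corner of its bounding box."""
    text: str
    x: float  # left edge, in page units (hypothetical values)
    y: float  # top edge, increasing down the page (hypothetical values)


def order_line_by_line(lines):
    """Top to bottom, then left to right: the span order described above."""
    return [ln.text for ln in sorted(lines, key=lambda ln: (ln.y, ln.x))]


def order_by_column(lines, column_boundary):
    """Group lines by column first, then read each column top to bottom."""
    left = sorted((ln for ln in lines if ln.x < column_boundary),
                  key=lambda ln: ln.y)
    right = sorted((ln for ln in lines if ln.x >= column_boundary),
                   key=lambda ln: ln.y)
    return [" ".join(ln.text for ln in col) for col in (left, right)]


# Made-up sample: two columns, two lines of text in each.
sample = [
    Line("Name:", 10, 10),
    Line("Address:", 200, 10),
    Line("John Smith", 10, 30),
    Line("12 Main St", 200, 30),
]

print(order_line_by_line(sample))
print(order_by_column(sample, column_boundary=100))
```

The first call reproduces the interleaved order I see in the spans; the second keeps each column's text together, which is closer to what a screen reader would need.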

This results in accessibility problems for screen readers, as the content is not structured properly. Is there any way to resolve this issue and ensure the table is interpreted correctly by screen readers?
