pymupdf - Why can't I extract text from this pdf? - Stack Overflow

Posted in programmer by admin, 2025-04-06

I have tried a variety of methods, both online tools and Python, to extract text from this PDF:

.pdf

Every time, I just get seemingly random characters, for instance 1$$2!!2!"34$+5

I have tried online options as well as this:

import fitz

text = ""
path = "/home/serveracct/342.pdf"

doc = fitz.open(path)

for page in doc:
    text += page.get_text()
print(text)

I believe that, since this is a court document, the format is PDF/A, but I am not 100% sure. I tried detecting an image in the file but could not. ocrmypdf works, but the files become huge. For now I am just trying to determine what about the structure is preventing me from extracting the text. Also, when I open this PDF in Adobe, random boxes show up on the page.
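
One way to look at the structure is to list the fonts on each page and check whether each one has a /ToUnicode entry. This is only a rough sketch and rests on an assumption: that the garbled output comes from fonts whose character codes cannot be mapped back to Unicode. That is a common cause of this symptom, but nothing here confirms it for this file. The helper names lacks_tounicode and report_fonts are made up for illustration. The PyMuPDF calls used are page.get_fonts() and doc.xref_object().

```python
def lacks_tounicode(font_object_source):
    """Return True if a font dictionary's source text has no /ToUnicode entry."""
    return "/ToUnicode" not in font_object_source


def report_fonts(path):
    """Return (page number, font name, font type, lacks ToUnicode) for every font."""
    # PyMuPDF is imported here so the helper above can be used without it.
    import fitz

    rows = []
    with fitz.open(path) as doc:
        for page in doc:
            # Each entry starts with (xref, ext, type, basefont, ...).
            for font in page.get_fonts():
                xref, _ext, font_type, basefont = font[:4]
                # Raw PDF source of the font dictionary, as a string.
                source = doc.xref_object(xref)
                rows.append(
                    (page.number + 1, basefont, font_type, lacks_tounicode(source))
                )
    return rows


if __name__ == "__main__":
    import sys

    # Usage: python report_fonts.py some_file.pdf
    if len(sys.argv) > 1:
        for row in report_fonts(sys.argv[1]):
            print(row)
```

A missing /ToUnicode entry is a hint, not proof: simple fonts with a standard encoding can extract fine without one, and a font can carry a /ToUnicode map that is present but wrong. If most fonts on the affected pages show up as lacking the entry, that would be consistent with the random characters, and OCR may be the only reliable route for those pages.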
