
ocr - Tesseract HOCR to a structured text for LLMs - Stack Overflow


I want to take the hOCR output that I get from Tesseract.js (I work in JavaScript) and transform it into something an LLM can read. The goal is to read technical documents with prices, tabs, headers, lines, footers and so on, not just plain running text.

Currently, I plan to "transform" the hOCR into structured text, but I don't know how yet.

Any ideas, or any other approaches?
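One possible direction, as a rough sketch rather than a definitive method: read each word's bounding box from the hOCR, rebuild rows from the vertical positions, and mark wide horizontal gaps as column breaks so that table-like layouts survive as text. The sketch assumes Tesseract-style markup, where each word is a `span` with class `ocrx_word` and a `title` starting with `bbox x0 y0 x1 y1`, and where `class` appears before `title`. The function names, the ` | ` separator, and the `gapFactor` threshold are all illustrative choices, not part of any library.

```javascript
// Sketch: convert an hOCR string into layout-preserving plain text.
// Assumption: words look like
//   <span class='ocrx_word' ... title='bbox x0 y0 x1 y1; x_wconf N'>text</span>
const WORD_RE =
  /<span[^>]*class=["']ocrx_word["'][^>]*title=["']bbox (\d+) (\d+) (\d+) (\d+)[^"']*["'][^>]*>([\s\S]*?)<\/span>/g;

// Decode the handful of entities that commonly appear in hOCR word text.
function decodeEntities(s) {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}

// Collect every word with its bounding box, dropping inline tags like <strong>.
function extractWords(hocr) {
  const words = [];
  for (const m of hocr.matchAll(WORD_RE)) {
    const text = decodeEntities(m[5].replace(/<[^>]+>/g, "")).trim();
    if (!text) continue;
    words.push({ text, x0: +m[1], y0: +m[2], x1: +m[3], y1: +m[4] });
  }
  return words;
}

// Group words into rows: a word joins the current row when its vertical
// center lies within half the row height of the row's center.
function groupIntoRows(words) {
  const sorted = [...words].sort((a, b) => a.y0 + a.y1 - (b.y0 + b.y1));
  const rows = [];
  for (const w of sorted) {
    const cy = (w.y0 + w.y1) / 2;
    const row = rows[rows.length - 1];
    if (row && Math.abs(cy - row.cy) <= row.h / 2) {
      row.words.push(w);
    } else {
      rows.push({ cy, h: w.y1 - w.y0, words: [w] });
    }
  }
  return rows;
}

// Emit one text line per row. A horizontal gap wider than gapFactor times
// the row height is treated as a column boundary (hypothetical heuristic).
function hocrToText(hocr, gapFactor = 1.5) {
  return groupIntoRows(extractWords(hocr))
    .map((row) => {
      const ws = row.words.sort((a, b) => a.x0 - b.x0);
      let line = ws[0].text;
      for (let i = 1; i < ws.length; i++) {
        const gap = ws[i].x0 - ws[i - 1].x1;
        line += (gap > gapFactor * row.h ? " | " : " ") + ws[i].text;
      }
      return line;
    })
    .join("\n");
}

// Hand-written sample in the assumed markup, not real OCR output.
const sample = `
<span class='ocr_line' title='bbox 10 10 400 30'>
<span class='ocrx_word' id='word_1_1' title='bbox 10 10 60 30; x_wconf 96'>Item</span>
<span class='ocrx_word' id='word_1_2' title='bbox 300 10 360 30; x_wconf 95'>Price</span>
</span>
<span class='ocr_line' title='bbox 10 40 400 60'>
<span class='ocrx_word' id='word_1_3' title='bbox 10 40 70 60; x_wconf 93'>Steel</span>
<span class='ocrx_word' id='word_1_4' title='bbox 76 40 120 60; x_wconf 93'>bolt</span>
<span class='ocrx_word' id='word_1_5' title='bbox 300 40 350 60; x_wconf 91'>4.50</span>
</span>`;

console.log(hocrToText(sample));
// Item | Price
// Steel bolt | 4.50
```

The resulting text can be pasted into an LLM prompt as-is. Regex parsing is only adequate while the markup stays this regular; a real HTML parser would be more robust, and `gapFactor` would likely need tuning per document type. Headers and footers could be tagged in a similar way by comparing each row's vertical position against the page's bounding box, though that is not shown here.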
