I want to take the hOCR output that I get from Tesseract.js (I work in JavaScript) and transform it into something an LLM can read. The goal is to read technical documents that contain prices, tables, headers, lines, footers and so on, not just plain running text.
Currently, I plan to "transform" the hOCR into structured text, but I don't know how yet.
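One possible shape for that transformation, as a minimal sketch rather than a finished solution: read each `ocrx_word` span and its `bbox` from the hOCR, group words into rows by vertical position, and put a column separator wherever the horizontal gap between two words is large. The function names `parseHocrWords` and `hocrToLayoutText`, the `colGap` threshold of 40 pixels, and the ` | ` separator are all made up for illustration. The regex also assumes the attribute order Tesseract usually emits (`class` before `title`), so a real HTML parser would be more robust.

```javascript
// Hypothetical helper: extract words and bounding boxes from an hOCR string.
// Assumes word spans look like:
//   <span class='ocrx_word' ... title='bbox x0 y0 x1 y1; x_wconf NN'>text</span>
function parseHocrWords(hocr) {
  const wordRe =
    /<span[^>]*class=['"]ocrx_word['"][^>]*title=['"]([^'"]*)['"][^>]*>([\s\S]*?)<\/span>/g;
  const words = [];
  for (const m of hocr.matchAll(wordRe)) {
    const bbox = /bbox (\d+) (\d+) (\d+) (\d+)/.exec(m[1]);
    // Drop nested tags such as <strong> or <em> inside the word span.
    const text = m[2].replace(/<[^>]+>/g, "").trim();
    if (!bbox || !text) continue;
    const [x0, y0, x1, y1] = bbox.slice(1).map(Number);
    words.push({ text, x0, y0, x1, y1 });
  }
  return words;
}

// Hypothetical helper: rebuild a layout-preserving text from the word boxes.
// colGap (in pixels) is an arbitrary threshold that needs tuning per document.
function hocrToLayoutText(hocr, { colGap = 40 } = {}) {
  // Sort top to bottom by vertical center.
  const words = parseHocrWords(hocr).sort(
    (a, b) => (a.y0 + a.y1) - (b.y0 + b.y1)
  );

  // A word joins the current row if its vertical center lies inside
  // the box of the word that started the row.
  const rows = [];
  for (const w of words) {
    const cy = (w.y0 + w.y1) / 2;
    const row = rows[rows.length - 1];
    if (row && cy >= row.y0 && cy <= row.y1) row.words.push(w);
    else rows.push({ y0: w.y0, y1: w.y1, words: [w] });
  }

  // Within a row, go left to right; a wide gap is treated as a column break.
  return rows
    .map((row) => {
      row.words.sort((a, b) => a.x0 - b.x0);
      let line = "";
      row.words.forEach((w, i) => {
        if (i > 0) {
          const gap = w.x0 - row.words[i - 1].x1;
          line += gap > colGap ? " | " : " ";
        }
        line += w.text;
      });
      return line;
    })
    .join("\n");
}

// Tiny hand-written hOCR fragment, not real OCR output.
const sample = `
<span class='ocrx_word' id='word_1_1' title='bbox 10 10 60 30; x_wconf 95'>Item</span>
<span class='ocrx_word' id='word_1_2' title='bbox 300 10 360 30; x_wconf 94'>Price</span>
<span class='ocrx_word' id='word_1_3' title='bbox 10 50 70 70; x_wconf 93'>Steel</span>
<span class='ocrx_word' id='word_1_4' title='bbox 75 50 120 70; x_wconf 93'>bolt</span>
<span class='ocrx_word' id='word_1_5' title='bbox 300 50 350 70; x_wconf 90'>4.50</span>`;

console.log(hocrToLayoutText(sample));
// Item | Price
// Steel bolt | 4.50
```

The resulting text keeps rows and rough columns, which is the kind of structure an LLM prompt could use. Headers and footers are not detected here; one could imagine tagging rows by their vertical position on the page, but that is beyond this sketch.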
Any ideas, or other approaches I should consider?