pdf - Mistral AI OCR not returning anything useful - Stack Overflow

I am trying to extract a table from a PDF.

I was able to use the Le Chat feature of Mistral and get a super great result, but when I try to use the API to programmatically get the same result, I am not able to replicate it. I tried using the OCR API but it did not return anything and the chat completion feature does not seem to be able to recognize my PDF uploads.

I have tried the following code snippet:

from mistralai import Mistral
from mistralai.models import File
import os

api_key = "API_KEY"

client = Mistral(api_key=api_key)

# Synchronous upload: a bare top-level `await` is a SyntaxError in a plain script
with open("./table.pdf", "rb") as f:
    uploaded_pdf = client.files.upload(
        file=File(file_name="table.pdf", content=f.read()),
        purpose="ocr",
    )
signed_url = client.files.get_signed_url(file_id=uploaded_pdf.id)
ocr_response = client.ocr.process(
    model="mistral-ocr-latest",
    document={
        "type": "document_url",
        "document_url": signed_url.url,
    }
)

The response is effectively empty: the markdown contains nothing but a single image reference. I also tried this code snippet:

from mistralai import Mistral
from mistralai.models import File
import os

api_key = "API_KEY"

client = Mistral(api_key=api_key)

# Synchronous upload: a bare top-level `await` is a SyntaxError in a plain script
with open("./table.pdf", "rb") as f:
    uploaded_pdf = client.files.upload(
        file=File(file_name="table.pdf", content=f.read()),
        purpose="ocr",
    )
signed_url = client.files.get_signed_url(file_id=uploaded_pdf.id)
# Define the messages for the chat
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Can you extract the data from this table from the PDF given."
            },
            {
                "type": "document_url",
                "document_url": signed_url.url,
            }
        ]
    }
]

# Get the chat response
chat_response = client.chat.complete(
    model="mistral-large-latest",
    messages=messages,
)

And the response I get is something along the lines of: I'm unable to directly access or view documents from URLs.

Can someone please let me know what I am doing wrong?

Asked Mar 9 at 15:12 by Shelly Liu

2 Answers

You are absolutely right.

When calling the OCR method using the 'mistral-ocr-latest' model, if you send an image or a PDF with embedded images, all you get in the markdown property is a "![img-0.jpeg](img-0.jpeg)".
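One way to check a response for this symptom is to test whether a page's markdown consists only of image placeholders. The following is a minimal sketch that works on plain strings. The helper name `is_image_only` is hypothetical and not part of the Mistral SDK, and it assumes the placeholders use standard markdown image syntax as in the example above.

```python
import re

# Matches one markdown image reference such as ![img-0.jpeg](img-0.jpeg)
IMAGE_REF = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def is_image_only(markdown: str) -> bool:
    """Return True if the markdown has image references but no other text."""
    has_image = IMAGE_REF.search(markdown) is not None
    remainder = IMAGE_REF.sub("", markdown).strip()
    return has_image and remainder == ""


print(is_image_only("![img-0.jpeg](img-0.jpeg)"))  # True
print(is_image_only("| a | b |\n|---|---|\n![img-0.jpeg](img-0.jpeg)"))  # False
```

Running this over the markdown of every page in a response would show which pages came back as a bare image and which contain actual extracted text.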

I've tried everything imaginable, from running in batch to using the upload method and even sending the files as base64 strings. I also added payment info and switched to the pay-as-you-go plan. No luck.

I guess there's something wrong with Mistral OCR: either it's a bug or it's simply a scam.

The images for each page are returned as image_base64 in the mistral-ocr API response, so check the response object you get back from the API call. Make sure you set the parameter include_image_base64=True in the call. My API code snippet is below:

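from mistralai import Mistral

client = Mistral(api_key="API_KEY")
ocr_model = "mistral-ocr-latest"

ocr_response = client.ocr.process(
    model=ocr_model,
    document={
        "type": "document_url",
        "document_url": "https://arxiv.org/pdf/2201.04234"
    },
    # Ask for each extracted image to be returned as a base64 string
    include_image_base64=True
)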
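Once the image data is present, the placeholders in the markdown can be swapped for the actual image content. The helper below is a rough sketch of that step, not the library's own method. It operates on a plain dict and assumes the per-page fields are named `markdown`, `images`, `id` and `image_base64`, and that `image_base64` already holds a complete data URI; check these against the response you actually receive. The name `inline_images` and the sample `page` are hypothetical.

```python
def inline_images(page: dict) -> str:
    """Replace ![id](id) placeholders with the matching base64 data."""
    markdown = page["markdown"]
    for image in page.get("images", []):
        image_id = image["id"]
        data = image.get("image_base64")
        if not data:
            # Image data is absent unless it was requested in the API call
            continue
        markdown = markdown.replace(
            f"![{image_id}]({image_id})", f"![{image_id}]({data})"
        )
    return markdown


# Hand-written stand-in for one page of a response, not real API output
page = {
    "markdown": "![img-0.jpeg](img-0.jpeg)",
    "images": [{"id": "img-0.jpeg", "image_base64": "data:image/jpeg;base64,AAAA"}],
}
print(inline_images(page))  # ![img-0.jpeg](data:image/jpeg;base64,AAAA)
```

Working on dicts keeps the helper independent of the SDK version. If the SDK's response objects are pydantic models, as they appear to be, each page could be converted with `model_dump()` before being passed in.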

Here's the response object:
