I'm running inference with Qwen2.5-VL-7B like this, but when I encode the image as base64, the request exceeds the token limit (the base64 string is quite long).
from huggingface_hub import InferenceClient
import pyautogui
import base64
import pathlib

client = InferenceClient(
    api_key=api_key  # Hugging Face token, defined elsewhere
)

# Take a screenshot and save it to disk
path = pathlib.Path("screenshot.jpg")  # placeholder path
im1 = pyautogui.screenshot()
im1.save(path)

# Encode the saved image as a base64 data URI
with open(path, "rb") as f:
    image = base64.b64encode(f.read()).decode("utf-8")
image = f"data:image/jpeg;base64,{image}"
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Describe this image in one sentence."
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": image
                }
            }
        ]
    }
]
stream = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    messages=messages,
    max_tokens=500,
    stream=True
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
I couldn't pass a local file path for the image either (sketch of that attempt below), and I'd rather not upload the screenshot to a hosting service just to get a URL. Is there a way to quickly make a vision call with this inference setup?
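For completeness, the file-path attempt looked roughly like this (reconstructed from memory, so the exact form is an assumption); as far as I understand, it fails because the hosted endpoint can't read a path on my machine:

# Assumed form of the failed file-path attempt: the remote inference
# server tries to fetch the "url" value and cannot resolve a local path.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "screenshot.jpg"}}  # local path, not a URL
        ]
    }
]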