I am trying to use a multimodal model from the Hugging Face Hub, specifically `maya-multimodal/maya`. This is the code I use to load the model:

```python
from llama_index.multi_modal_llms.huggingface import HuggingFaceMultiModal
from llama_index.core.schema import ImageDocument

model = HuggingFaceMultiModal.from_model_name("maya-multimodal/maya")
```
While loading the model I get the error shown below. I have two questions:
- How can I solve this error?
- What are some good quantized multimodal models on Hugging Face that I can use for image-to-text?
The error:

```
ValueError: The checkpoint you are trying to load has model type `llava_cohere` but Transformers does not recognise this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
```
I tried both of the suggested fixes:
- `pip install --upgrade transformers`
- `pip install git+https://github.com/huggingface/transformers.git`

but neither of them worked.
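To narrow things down, I also ran a quick check like the following (a sketch; it assumes `CONFIG_MAPPING` is importable from `transformers`, which is the case in recent releases) to see whether my installed Transformers build registers the `llava_cohere` model type at all:

```python
# Sketch: check whether the installed Transformers build knows the
# "llava_cohere" model type. If it is not registered, this install
# cannot load the checkpoint regardless of how it is downloaded.
import transformers
from transformers import CONFIG_MAPPING  # maps model_type -> config class

print("transformers version:", transformers.__version__)
print("llava_cohere registered:", "llava_cohere" in CONFIG_MAPPING)
```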