I have fine-tuned a LLaVA (Large Language and Vision Assistant) model on Google Colab and saved it to my Google Drive. Here’s how I saved the model:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

import os

# Folder in Google Drive where the fine-tuned artifacts are written
save_path = "/content/drive/MyDrive/fineTune model1/LLaVA-med-MAKAUT_v1"
os.makedirs(save_path, exist_ok=True)

# trainer and processor come from the fine-tuning cells earlier in the notebook;
# save the adapter weights, the tokenizer, and the image-processor config
trainer.model.save_pretrained(save_path)
trainer.tokenizer.save_pretrained(save_path)
processor.image_processor.save_pretrained(save_path)
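A quick directory listing (sketch below, reusing save_path from above) is enough to confirm what those three save_pretrained calls wrote to Drive:

import os

# Print every file the save calls produced in the Drive folder
for name in sorted(os.listdir(save_path)):
    print(name)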
After saving, my Google Drive folder contains the following files:
README.md
adapter_model.safetensors
adapter_config.json
tokenizer_config.json
special_tokens_map.json
added_tokens.json
tokenizer.model
tokenizer.json
preprocessor_config.json
config.json
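From what I understand, adapter_model.safetensors and adapter_config.json mean the trainer saved a PEFT/LoRA adapter rather than full model weights, so my assumption is that the fine-tuned model has to be rebuilt by loading a base checkpoint and attaching the adapter, roughly like this (the base model id below is a placeholder, not necessarily the exact checkpoint I trained from):

import torch
from transformers import LlavaForConditionalGeneration
from peft import PeftModel

# Placeholder base checkpoint; my actual base model may differ
base_model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf", torch_dtype=torch.float16
)

# Attach the saved LoRA adapter from Google Drive
new_model_v1 = PeftModel.from_pretrained(base_model, save_path)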
However, when I try to load the model for testing, I get an AttributeError related to patch_size:
import torch
from PIL import Image
from transformers import LlavaProcessor, LlavaForConditionalGeneration, CLIPImageProcessor
model_path = "/content/drive/MyDrive/fineTune model/LLaVA-med-MAKAUT_v1"
processor1 = LlavaProcessor.from_pretrained(model_path)
# Checking patch size from the fine-tuned model's vision_config
# (new_model_v1 is the fine-tuned LLaVA model loaded earlier in the notebook)
patch_size = new_model_v1.config.vision_config.patch_size
print("Patch size:", patch_size)
Output:
Patch size: 14
Error Occurs Here:
print(processor1.image_processor.patch_size)
Error Message:
AttributeError: 'CLIPImageProcessor' object has no attribute 'patch_size'
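To make the mismatch concrete, a defensive check along these lines (using the objects loaded above) shows that the attribute is simply absent on the image processor even though the model's vision_config defines it:

# The image processor class that LlavaProcessor loaded for me
print(type(processor1.image_processor).__name__)                     # CLIPImageProcessor

# getattr with a default, to avoid the AttributeError above
print(getattr(processor1.image_processor, "patch_size", "not set"))  # not set

# The value I expected it to carry, taken from the model config
print(new_model_v1.config.vision_config.patch_size)                  # 14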
What I Have Tried:
- Ensuring that the model is properly saved and loaded.
- Confirming that the patch size is present in the model's vision configuration (patch_size: 14).
- Attempting to manually set patch_size:
  processor1.image_processor.patch_size = 14
However, this doesn't seem to be the right approach, since CLIPImageProcessor doesn't define this attribute (the assignment variants I'm unsure about are sketched below).
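For completeness, these are the assignment variants I have been considering; whether patch_size should live on the image processor, on the top-level LlavaProcessor, or be handled some other way entirely is exactly what I'm asking below:

# Variant 1: set it on the image processor (what I tried above)
processor1.image_processor.patch_size = 14

# Variant 2: set it on the LlavaProcessor itself (a guess, not something I know is supported)
processor1.patch_size = 14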
Questions:
- Why does CLIPImageProcessor lack the patch_size attribute even though it is defined in the model’s vision_config?
- What is the correct way to ensure that the LLaVA processor aligns with the fine-tuned model’s configuration, especially concerning patch_size?
- Is there a recommended way to properly load and utilize the fine-tuned LLaVA model along with its processor for inference in Colab?