
nlp - How to Fine-Tune Projection Layer in CLIP Model Using LoRA? - Stack Overflow


I'm trying to fine-tune the projection layers in the CLIP model using LoRA.

I need help identifying the exact projection layers to modify for my fine-tuning and how I can apply LoRA to them.

Model loading:

import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

Model structure when printed:

CLIP(
  (visual): VisionTransformer()
  (transformer): Transformer()
  (token_embedding): Embedding(49408, 512)
  (ln_final): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
)


  • Welcome to SO. How are you loading the model? Via the original openai code? Keep in mind that the projection layers are just linear layers, which means you won't benefit (much) from classic LoRA. – cronoik Commented Mar 22 at 16:37
  • @cronoik Thank you for your comment! I am indeed using clip.load("ViT-B/32", device=device) from the standard clip library. Yes, this is just an experiment for me. I'm still a bit lost on where to apply LoRA within the model. I've tried looking for layers with "proj" in their name, but I'm not sure those are the correct projection layers for LoRA. Could you clarify which kinds of layers are typically considered "projection layers" in CLIP for LoRA fine-tuning? Knowing the layer type or position in the network flow would help me identify them accurately. – Fadela Commented Mar 26 at 0:35

1 Answer


You will not see the projection layers when you print the architecture with print(model), because the projection layers are initialized with nn.Parameter() in the OpenAI CLIP repo (unlike the Hugging Face implementation, which uses linear layers). The code references can be found here:

  • visual projection layer: code
  • text projection layer: code
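
For orientation, each of these projections is just a weight matrix applied with a plain matrix product, so mathematically it is a bias-free linear layer. Here is a rough equivalence sketch, reusing the model object loaded above (purely illustrative, nothing in it is needed for the solution below):

import torch
import torch.nn as nn

# The OpenAI checkpoint applies the visual projection as features @ model.visual.proj.
# nn.Linear computes x @ weight.T, so an equivalent linear layer stores proj.T as its weight.
proj = model.visual.proj                                    # nn.Parameter of shape [768, 512]
as_linear = nn.Linear(proj.shape[0], proj.shape[1], bias=False)
with torch.no_grad():
    as_linear.weight.copy_(proj.t())                        # same mapping as x @ proj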

You can still print the parameters initialized with nn.Parameter:

for name, param in model.named_parameters():
    print(f'{name}: {param.shape}')

Output:

text_projection: torch.Size([512, 512])
visual.proj: torch.Size([768, 512])
...
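
If you just want to confirm which of those entries are the projections in question, you can filter the list by name (a small convenience sketch, using the names shown in the output above):

# Pull out only the two projection weights by their parameter names.
proj_params = {name: p for name, p in model.named_parameters()
               if name in ("visual.proj", "text_projection")}
print({name: tuple(p.shape) for name, p in proj_params.items()})
# e.g. {'text_projection': (512, 512), 'visual.proj': (768, 512)}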

The issue you face now is that nn.Parameter is not supported by peft/LoRA (explanation). You can either modify the CLIP code to use nn.Linear instead of nn.Parameter (a sketch of that option follows at the end of this answer) or use the CLIP implementation from Hugging Face (mind the different layer names):

from transformers import CLIPModel
from peft import LoraConfig, get_peft_model

transformers_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")

config = LoraConfig(
    target_modules=["visual_projection", "text_projection"],
)

peft_model = get_peft_model(transformers_model, config)
peft_model.print_trainable_parameters()

Output:

trainable params: 18,432 || all params: 151,295,745 || trainable%: 0.0122
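
As a sanity check on that number: with LoraConfig's default rank r = 8, LoRA adds 768·8 + 8·512 = 10,240 trainable parameters for visual_projection and 512·8 + 8·512 = 8,192 for text_projection, which is exactly the 18,432 reported above.

If you prefer the first option instead (editing the OpenAI CLIP source so the projections become modules you can attach LoRA to), a minimal sketch of the kind of drop-in you could write yourself looks like the following. The class and its names are illustrative, not part of the clip package or of peft:

import torch
import torch.nn as nn

class LoRAProjection(nn.Module):
    """Hypothetical replacement for the nn.Parameter projections in the OpenAI CLIP code.
    Computes x @ (W + A @ B * scaling), with the pretrained W frozen and only A, B trainable."""
    def __init__(self, proj: torch.Tensor, r: int = 8, alpha: int = 8):
        super().__init__()
        in_dim, out_dim = proj.shape                          # e.g. 768 x 512 for visual.proj
        self.weight = nn.Parameter(proj.detach().clone(), requires_grad=False)  # frozen pretrained W
        self.lora_A = nn.Parameter(torch.zeros(in_dim, r))
        self.lora_B = nn.Parameter(torch.zeros(r, out_dim))   # B starts at zero -> no change at init
        nn.init.normal_(self.lora_A, std=0.01)
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight + (x @ self.lora_A) @ self.lora_B * self.scaling

You would then replace the two matrix products x @ self.proj and x @ self.text_projection in the CLIP forward passes with calls to such a module. The Hugging Face route above avoids that surgery, which is why it is usually the simpler choice.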