I have a classification model trained on top of Qwen2.5-0.5B:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2.5-0.5B",
    device_map="auto",
    num_labels=2,
    torch_dtype=torch.bfloat16,
)
```
How do I quantize it with AWQ and calibrate it?
I load the classification model with transformers and save it:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
model = AutoModelForSequenceClassification.from_pretrained(
    "Qwen/Qwen2.5-0.5B",
    device_map="auto",
    num_labels=2,
    torch_dtype=torch.bfloat16,
)

model_path = "./qwen2.5-0.5b-cls"  # placeholder; my local save directory
model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)
```
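For calibration I prepared a small list of raw text strings from my task data, since AutoAWQ's `calib_data` accepts a list of texts (the samples below are made-up placeholders):

```python
# Calibration set for AWQ: a list of raw text strings.
# AutoAWQ tokenizes these internally during quantization.
data = [
    "This product arrived quickly and works as described.",
    "Terrible experience, the item broke after one day.",
    # ... more samples representative of the classification task
]
```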
I then converted the model with AutoAWQ, but found that after quantization the head layer had changed from `score` to `lm_head` (the model architecture was changed):
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# 4-bit AWQ settings
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoAWQForCausalLM.from_pretrained(model_path, device_map="auto", safetensors=True)

# calibrate on the text samples defined above
model.quantize(tokenizer, quant_config=quant_config, calib_data=data)

quant_path = "./qwen2.5-0.5b-cls-awq"  # placeholder; output directory
model.save_quantized(quant_path, safetensors=True, shard_size="4GB")
tokenizer.save_pretrained(quant_path)
```
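For reference, this is how the change shows up; a minimal sketch that just inspects the saved configs (it assumes the `model_path` and `quant_path` from above):

```python
from transformers import AutoConfig

# Compare the declared architecture before and after quantization.
print(AutoConfig.from_pretrained(model_path).architectures)
# ['Qwen2ForSequenceClassification']
print(AutoConfig.from_pretrained(quant_path).architectures)
# ['Qwen2ForCausalLM'] -- the `score` head was replaced by `lm_head`
```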