I am using this code from Hugging Face. It is pasted directly from the Hugging Face model page for DeepSeek-R1 and is supposed to be plug-and-play:
from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="deepseek-ai/DeepSeek-R1", trust_remote_code=True)
pipe(messages)
But I'm unable to load the model; when I try, I get this error:
File "<...>/site-packages/transformers/quantizers/auto.py", line 97, in from_dict
raise ValueError(
ValueError: Unknown quantization type, got fp8 - supported types are:
['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq',
'hqq', 'compressed-tensors', 'fbgemm_fp8', 'torchao', 'bitnet']
I tried different code:
import torch
from transformers import pipeline

generate_text = pipeline(
    model="deepseek-ai/DeepSeek-R1",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
generate_text(messages)
This gives the following error:
raise ValueError(
ValueError: Unknown quantization type, got fp8 - supported types are:
['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq',
'higgs', 'hqq', 'compressed-tensors', 'fbgemm_fp8', 'torchao', 'bitnet', 'vptq']
What can I do?
Comments:
- Did you try after updating the transformers Python package? – Pycm
- These posts could help: "error downloading model" & "Using huggingface_hub instead of pipeline to download" – eternal_white
- @Pycm Yes, I did, it's the latest version. – Akshit Gulyan
2 Answers
The code you posted is auto-generated and is not correct. The model card states:
NOTE: Hugging Face's Transformers has not been directly supported yet.
The transformers library doesn't support the fp8 quantization method DeepSeek used for this model. Hugging Face is working on a PR to support it officially, but it will take some more time.
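You can confirm what the repository declares by inspecting its config.json directly; here is a minimal sketch, assuming huggingface_hub is installed (the "fp8" quant_method it prints is exactly what the error message rejects):
import json
from huggingface_hub import hf_hub_download

# Fetch only the config file, not the (very large) model weights
config_path = hf_hub_download("deepseek-ai/DeepSeek-R1", "config.json")
with open(config_path) as f:
    config = json.load(f)

# Prints "fp8", which is not in transformers' list of supported quantization types
print(config["quantization_config"]["quant_method"])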
You can still deploy the model on GPUs that support fp8 via, for example, vLLM:
# Install vLLM from pip:
pip install vllm
# Load and run the model:
vllm serve "deepseek-ai/DeepSeek-R1"
# Call the server using curl:
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "deepseek-ai/DeepSeek-R1",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
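If you prefer calling the server from Python instead of curl, vLLM exposes an OpenAI-compatible API, so a sketch like the following should work, assuming pip install openai and the vllm serve command above running on the default port 8000:
from openai import OpenAI

# Point the OpenAI client at the local vLLM server; the api_key is not checked locally
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(completion.choices[0].message.content)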
Some people have simply removed the quantization part from the config to load it with transformers. I haven't tested the performance impact this might have, so use it with caution:
from transformers import AutoModelForCausalLM, AutoConfig

# Load the config, drop the fp8 quantization section, then load the model with the patched config
config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1", trust_remote_code=True)
del config.quantization_config
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1", config=config, trust_remote_code=True)
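If the model does load that way, you can wire it into the same pipeline interface the question uses; a minimal sketch, assuming the tokenizer loads normally:
from transformers import AutoTokenizer, pipeline

# Reuse the patched model object from above together with the standard tokenizer
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1", trust_remote_code=True)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
pipe([{"role": "user", "content": "Who are you?"}])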
The code you should use instead is the following, which calls the model through Hugging Face's Inference Providers rather than loading it locally:
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="nebius",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx"
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=messages,
    max_tokens=500,
)

print(completion.choices[0].message)