
Why does HuggingFace-provided Deepseek code result in an 'Unknown quantization type' error?


I am using the following code from Hugging Face. It is pasted directly from the Hugging Face page for DeepSeek and is supposed to be plug-and-play:

from transformers import pipeline

messages = [
{"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="deepseek-ai/DeepSeek-R1", trust_remote_code=True)
pipe(messages)

But I'm unable to load the model. When I try, I get this error:

File "<...>/site-packages/transformers/quantizers/auto.py", line 97, in from_dict

raise ValueError(

ValueError: Unknown quantization type, got fp8 - supported types are: 
['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 
'hqq', 'compressed-tensors', 'fbgemm_fp8', 'torchao', 'bitnet']

I tried different code:

import torch
from transformers import pipeline

generate_text = pipeline(model="deepseek-ai/DeepSeek-R1", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
generate_text(messages)

This gives the following error:

raise ValueError(

ValueError: Unknown quantization type, got fp8 - supported types are:
['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm', 'quanto', 'eetq', 'higgs',
'hqq', 'compressed-tensors', 'fbgemm_fp8', 'torchao', 'bitnet', 'vptq']

What can I do?


  • Did you try after updating the transformers python package? – Pycm
  • These posts could help: "error downloading model" and "Using huggingface_hub instead of pipeline to download" – eternal_white
  • @Pycm Yes, I did, it's the latest version – Akshit Gulyan

2 Answers


The code you posted is auto-generated and is not correct. The model card states:

NOTE: Hugging Face's Transformers has not been directly supported yet.

The transformers library doesn't support the fp8 quantization method DeepSeek used for their model. Hugging Face is working on a PR to officially support it, but it will take some more time.
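You can confirm this from the repository configuration alone: loading just the config succeeds, and it shows the fp8 entry that trips the quantizer. A minimal sketch, assuming a recent transformers release and network access to the Hub:

from transformers import AutoConfig

# Fetching only the config is cheap; no model weights are downloaded.
config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1", trust_remote_code=True)

# The repository's config.json declares fp8 quantization, which is the value
# reported in the ValueError above (expect something like {'quant_method': 'fp8', ...}).
print(config.quantization_config)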

You can still deploy the model on GPUs that support fp8 via, for example, vLLM:

# Install vLLM from pip:
pip install vllm
# Load and run the model:
vllm serve "deepseek-ai/DeepSeek-R1"
# Call the server using curl:
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "deepseek-ai/DeepSeek-R1",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
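
Since vLLM exposes an OpenAI-compatible endpoint, you can also call the server from Python instead of curl. A minimal sketch, assuming the vllm serve command above is running locally on port 8000:

from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server; the API key is
# not checked by default, but the client requires a non-empty value.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=500,
)
print(completion.choices[0].message.content)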

Some people have simply removed the quantization part from the config to load it with transformers. I haven't tested the performance impact this might have, so use it with caution:

from transformers import AutoModelForCausalLM, AutoConfig

config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1", trust_remote_code=True)
del config.quantization_config

model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1", config=config, trust_remote_code=True)
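
Keep in mind that even with the quantization config removed, DeepSeek-R1 is a 671B-parameter mixture-of-experts model, so loading the full checkpoint with transformers still needs hundreds of gigabytes of GPU/CPU memory.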

Alternatively, you can skip loading the model locally and call it through a hosted inference provider with huggingface_hub:

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="nebius",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx"
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?"
    }
]

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3", 
    messages=messages, 
    max_tokens=500,
)

print(completion.choices[0].message)
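
Note that this snippet routes the request through a hosted provider (Nebius in this example) rather than running the model locally, so you need to substitute a valid Hugging Face access token for the placeholder api_key.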
