
python - How can I accurately count tokens for Llama3/DeepSeek r1 prompts when the Groq API reports “Request too large”? - Stack Overflow


I'm integrating the Groq API in my Flask application to classify social media posts using a model based on DeepSeek r1 (e.g., deepseek-r1-distill-llama-70b). I build a prompt by combining multiple texts and send it to the API. However, I keep receiving an error like this:

Request too large: The prompt exceeds the model's token limit. Please reduce your message size and try again.
Error code: 413 - {'error': {'message': 'Request too large for model `deepseek-r1-distill-llama-70b` in organization `_01jbv28e7qfp4rx8d51ybw9ypr` service tier on tokens per minute (TPM): Limit 6000, Requested 9262, please reduce your message size and try again. ...', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
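For reference, the request itself looks roughly like this (a minimal sketch; classify_posts and the exact message layout are simplified placeholders, but the model name matches what I call):

from groq import Groq

client = Groq()  # picks up GROQ_API_KEY from the environment

def classify_posts(prompt):
    # One chat completion per combined prompt; the whole prompt goes into a single user message.
    response = client.chat.completions.create(
        model="deepseek-r1-distill-llama-70b",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content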

To handle this, I tried splitting my prompt into chunks so that each request stays below the limit. For token counting, I initially used a simple whitespace split, which clearly underestimates the number of tokens. Then I switched to Hugging Face's AutoTokenizer with both deepseek-ai/DeepSeek-R1-Distill-Llama-70B and meta-llama/Meta-Llama-3-8B:

from transformers import AutoTokenizer

# Load the DeepSeek-R1-Distill-Llama-70B tokenizer for token counting
tokenizer_for_count = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-70B")

def count_tokens(text):
    return len(tokenizer_for_count.encode(text))

# Example function to build prompt chunks (simplified)
def build_prompt_for_chunk(chunk, offset):
    text_descriptions = [f'{{"id": {offset + i}, "text": "{text}"}}' for i, text in enumerate(chunk)]
    prompt = f"Posts: [{', '.join(text_descriptions)}]"
    return prompt

# Building the full prompt
full_prompt = build_prompt_for_chunk(texts, 0)
token_count = count_tokens(full_prompt)
print(f"Total tokens: {token_count}")

When I build my prompt, the deepseek-ai/DeepSeek-R1-Distill-Llama-70B and meta-llama/Meta-Llama-3-8B tokenizers both give the same count of 7,209 tokens, the GPT-2 tokenizer gives around 21,204 tokens, and a plain whitespace split() gives 3,059 tokens. However, the Groq API error indicates that my prompt requested around 9,262 tokens. This discrepancy makes me think that the GPT-2 tokenizer isn't an accurate proxy for my deployed model (which is based on Llama3/DeepSeek r1), but even the matching Llama3/DeepSeek tokenizers fall well short of the 9,262 tokens Groq reports. What should I do to get closer to the accurate count? As one attempt, I implemented a "round-trip" version of count_tokens (encode, decode with cleanup disabled, then re-encode), hoping it would better match the model's original behavior:

def count_tokens(text):
    # Encode the text first.
    tokens = tokenizer_for_count.encode(text)
    # Decode tokens with cleaning disabled.
    decoded = tokenizer_for_count.decode(tokens, clean_up_tokenization_spaces=False)
    # Re-encode the decoded text.
    corrected_tokens = tokenizer_for_count.encode(decoded)
    return len(corrected_tokens)

However, after applying this round-trip method, the total token count comes out to 7,210 tokens, still far from the 9,262 tokens reported by the Groq API.
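For completeness, this is how I'm comparing the different counts side by side (a simplified sketch; the "gpt2" checkpoint is the standard Hugging Face id for the GPT-2 tokenizer I used only as a baseline):

from transformers import AutoTokenizer

def compare_counts(text):
    # Count the same text with each tokenizer I tried, plus a whitespace split.
    checkpoints = {
        "deepseek-r1-distill-llama-70b": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
        "meta-llama-3-8b": "meta-llama/Meta-Llama-3-8B",
        "gpt2": "gpt2",
    }
    for name, checkpoint in checkpoints.items():
        tok = AutoTokenizer.from_pretrained(checkpoint)
        print(name, len(tok.encode(text)))
    print("whitespace split", len(text.split()))

compare_counts(full_prompt)  # full_prompt built as shown above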
