
python - How can I accurately count tokens for Llama3/DeepSeek r1 prompts when the Groq API reports “Request too large”? - Stack Overflow


I'm integrating the Groq API in my Flask application to classify social media posts using a model based on DeepSeek r1 (e.g., deepseek-r1-distill-llama-70b). I build a prompt by combining multiple texts and send it to the API. However, I keep receiving an error like this:

Request too large: The prompt exceeds the model's token limit. Please reduce your message size and try again.
Error code: 413 - {'error': {'message': 'Request too large for model `deepseek-r1-distill-llama-70b` in organization `_01jbv28e7qfp4rx8d51ybw9ypr` service tier on tokens per minute (TPM): Limit 6000, Requested 9262, please reduce your message size and try again. ...', 'type': 'tokens', 'code': 'rate_limit_exceeded'}}
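For reference, the request itself looks roughly like this (a minimal sketch; classify_posts and the exact message layout are simplified placeholders, but the model name matches what I call):

from groq import Groq

client = Groq()  # picks up GROQ_API_KEY from the environment

def classify_posts(prompt):
    # One chat completion per combined prompt; the whole prompt goes into a single user message.
    response = client.chat.completions.create(
        model="deepseek-r1-distill-llama-70b",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content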

To handle this, I tried splitting my prompt into chunks so that each request stays below the limit. For token counting, I initially used a simple whitespace split, which clearly underestimates the number of tokens. Then I switched to Hugging Face's AutoTokenizer with both deepseek-ai/DeepSeek-R1-Distill-Llama-70B and meta-llama/Meta-Llama-3-8B:

from transformers import AutoTokenizer

# Load the DeepSeek-R1-Distill-Llama-70B tokenizer for token counting
tokenizer_for_count = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-70B")

def count_tokens(text):
    return len(tokenizer_for_count.encode(text))

# Example function to build prompt chunks (simplified)
def build_prompt_for_chunk(chunk, offset):
    text_descriptions = [f'{{"id": {offset + i}, "text": "{text}"}}' for i, text in enumerate(chunk)]
    prompt = f"Posts: [{', '.join(text_descriptions)}]"
    return prompt

# Building the full prompt
full_prompt = build_prompt_for_chunk(texts, 0)
token_count = count_tokens(full_prompt)
print(f"Total tokens: {token_count}")

When I build my prompt, the deepseek-ai/DeepSeek-R1-Distill-Llama-70B and meta-llama/Meta-Llama-3-8B tokenizers both give the same count of 7,209 tokens, the GPT-2 tokenizer gives around 21,204 tokens, and a plain whitespace split() gives 3,059 tokens. However, the Groq API error indicates that my prompt requested around 9,262 tokens. This discrepancy makes me think that the GPT-2 tokenizer isn't an accurate proxy for my deployed model (which is based on Llama3/DeepSeek r1), but even the matching Llama3/DeepSeek tokenizers fall well short of the 9,262 tokens Groq reports. What should I do to get closer to the accurate count? As one attempt, I implemented a "round-trip" version of count_tokens (encode, decode with cleanup disabled, then re-encode), hoping it would better match the model's original behavior:

def count_tokens(text):
    # Encode the text first.
    tokens = tokenizer_for_count.encode(text)
    # Decode tokens with cleaning disabled.
    decoded = tokenizer_for_count.decode(tokens, clean_up_tokenization_spaces=False)
    # Re-encode the decoded text.
    corrected_tokens = tokenizer_for_count.encode(decoded)
    return len(corrected_tokens)

However, after applying this round-trip method, the total token count comes out to 7,210 tokens, still far from the 9,262 tokens reported by the Groq API.
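For completeness, this is how I'm comparing the different counts side by side (a simplified sketch; the "gpt2" checkpoint is the standard Hugging Face id for the GPT-2 tokenizer I used only as a baseline):

from transformers import AutoTokenizer

def compare_counts(text):
    # Count the same text with each tokenizer I tried, plus a whitespace split.
    checkpoints = {
        "deepseek-r1-distill-llama-70b": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
        "meta-llama-3-8b": "meta-llama/Meta-Llama-3-8B",
        "gpt2": "gpt2",
    }
    for name, checkpoint in checkpoints.items():
        tok = AutoTokenizer.from_pretrained(checkpoint)
        print(name, len(tok.encode(text)))
    print("whitespace split", len(text.split()))

compare_counts(full_prompt)  # full_prompt built as shown above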
