I'm using the following code to send a prompt to the "google/gemma-2-2b" model via Hugging Face's Transformers pipeline:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch
HUGGINGFACE_TOKEN = "<my-token>"
model_name = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(model_name, token=HUGGINGFACE_TOKEN)
model = AutoModelForCausalLM.from_pretrained(
model_name,
device_map="auto",
torch_dtype=torch.float16,
token=HUGGINGFACE_TOKEN
)
text_generator = pipeline("text-generation", model=model, tokenizer=tokenizer, token=HUGGINGFACE_TOKEN)
prompt = "What is the capital of France? Just select an option. Choose only one option from the following A) Paris B) London C) Delhi 4) Goa"
output = text_generator(prompt, max_new_tokens=100)
print(output)
Expected output:
A) Paris
Actual output:
[{'generated_text': 'What is the capital of France? Just select an option. Choose only one option from the following A) Paris B) London C) Delhi 4) Goa 5) New York ...'}]
The model echoes the prompt and then keeps generating additional options, instead of strictly following my instructions.
How can I modify the prompt or the generation parameters so that the model answers the question rather than just repeating and extending the input? Which settings (e.g., temperature, sampling flags) or prompt changes would help ensure the model generates new text according to my instructions?
Comment from OmG: You may find the desired output by improving the prompt and tweaking the temperature of the model. Yet, this model is not appropriate for chat; it is for text completion.
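Concretely, the comment's suggestion could look like the sketch below, reusing the text_generator and prompt from the question; the values are only illustrative. return_full_text=False stops the pipeline from echoing the prompt, and do_sample=False makes decoding greedy and deterministic (temperature only matters when sampling is enabled).
# Return only the newly generated text and decode greedily.
output = text_generator(
    prompt,
    max_new_tokens=20,
    do_sample=False,
    return_full_text=False,
)
print(output[0]["generated_text"])
# If sampling is wanted instead, enable it and lower the temperature:
# output = text_generator(prompt, max_new_tokens=20, do_sample=True,
#                         temperature=0.2, return_full_text=False)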
1 Answer
You are using the model's text-completion capability. A chat-style interaction would look like this:
chat = [
    {"role": "user", "content": "<your prompt text>"},
]
# Format the message with the tokenizer's chat template and append the assistant turn prompt.
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# add_special_tokens=False: the chat template already supplies the special tokens.
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
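The generated ids still need to be decoded to see the reply; a minimal continuation of the snippet above, slicing off the prompt tokens, would be:
# Decode only the newly generated tokens, skipping the echoed prompt.
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
Note that google/gemma-2-2b is a base (pretrained) checkpoint, so the chat template and instruction following tend to work much better with the instruction-tuned variant, google/gemma-2-2b-it; switching model_name to that checkpoint is worth trying.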