I'm using the following code to send a prompt to the "google/gemma-2-2b" model via Hugging Face's Transformers pipeline:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch
HUGGINGFACE_TOKEN = "<my-token>"
model_name = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(model_name, token=HUGGINGFACE_TOKEN)
model = AutoModelForCausalLM.from_pretrained(
model_name,
device_map="auto",
torch_dtype=torch.float16,
token=HUGGINGFACE_TOKEN
)
text_generator = pipeline("text-generation", model=model, tokenizer=tokenizer, token=HUGGINGFACE_TOKEN)
prompt = "What is the capital of France? Just select an option. Choose only one option from the following A) Paris B) London C) Delhi 4) Goa"
output = text_generator(prompt, max_new_tokens=100)
print(output)
Expected output:
A) Paris
Actual output:
[{'generated_text': 'What is the capital of France? Just select an option. Choose only one option from the following A) Paris B) London C) Delhi 4) Goa 5) New York ...'}]
The model echoes the prompt and then keeps generating additional options, instead of strictly following my instructions.
How can I modify the prompt or the generation parameters so that the model answers the question rather than just repeating and extending the input? Which settings (e.g., temperature, sampling flags) or prompt changes would help ensure the model generates new text according to my instructions?
Comment from OmG: You may find the desired output by improving the prompt and tweaking the temperature of the model. Yet, this model is not appropriate for chat; it is for text completion.
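Concretely, the comment's suggestion could look like the sketch below, reusing the text_generator and prompt from the question; the values are only illustrative. return_full_text=False stops the pipeline from echoing the prompt, and do_sample=False makes decoding greedy and deterministic (temperature only matters when sampling is enabled).
# Return only the newly generated text and decode greedily.
output = text_generator(
    prompt,
    max_new_tokens=20,
    do_sample=False,
    return_full_text=False,
)
print(output[0]["generated_text"])
# If sampling is wanted instead, enable it and lower the temperature:
# output = text_generator(prompt, max_new_tokens=20, do_sample=True,
#                         temperature=0.2, return_full_text=False)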
1 Answer
You are using the model's text-completion capability. A chat-style interaction would look like this:
chat = [
    {"role": "user", "content": "<your prompt text>"},
]
# Format the message with the tokenizer's chat template and append the assistant turn prompt.
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
# add_special_tokens=False: the chat template already supplies the special tokens.
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
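The generated ids still need to be decoded to see the reply; a minimal continuation of the snippet above, slicing off the prompt tokens, would be:
# Decode only the newly generated tokens, skipping the echoed prompt.
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
Note that google/gemma-2-2b is a base (pretrained) checkpoint, so the chat template and instruction following tend to work much better with the instruction-tuned variant, google/gemma-2-2b-it; switching model_name to that checkpoint is worth trying.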