
python - How do I get google/gemma-2-2b to strictly follow my prompt in Hugging Face Transformers? - Stack Overflow


I'm using the following code to send a prompt to the "google/gemma-2-2b" model via Hugging Face's Transformers pipeline:

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

HUGGINGFACE_TOKEN = "<my-token>"

model_name = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(model_name, token=HUGGINGFACE_TOKEN)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
    token=HUGGINGFACE_TOKEN
)
text_generator = pipeline("text-generation", model=model, tokenizer=tokenizer, token=HUGGINGFACE_TOKEN)

prompt = "What is the capital of France? Just select an option. Choose only one option from the following A) Paris B) London C) Delhi 4) Goa"
output = text_generator(prompt, max_new_tokens=100)
print(output)

Expected output:

A) Paris

Actual output:

[{'generated_text': 'What is the capital of France? Just select an option. Choose only one option from the following A) Paris B) London C) Delhi 4) Goa 5) New York ...'}]

The model seems to be echoing the prompt and then generating a long list of options, not strictly following my instructions.

How can I modify the prompt or generation parameters so that the model produces output that strictly follows the prompt without just repeating the input? Any suggestions on which settings (e.g., temperature, sampling flags) or prompt modifications can help ensure that the model generates new text according to my instructions?
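For context, my understanding is that such settings are passed straight to the pipeline call as keyword arguments; this is a minimal sketch of what I mean (the values are only illustrative):

from transformers import pipeline

# Sketch: generation settings go directly into the pipeline call.
# return_full_text=False drops the echoed prompt from the result;
# do_sample=False gives deterministic greedy decoding (temperature
# would only matter if do_sample=True).
output = text_generator(
    prompt,
    max_new_tokens=10,
    do_sample=False,
    return_full_text=False,
)
print(output[0]["generated_text"])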


  • You may get the desired output by improving the prompt and tweaking the model's temperature. However, this model is not appropriate for chat; it is for text completion. – OmG

1 Answer


You are using the model's text-completion capability, so it simply continues your prompt rather than answering it. The following shows a chat-style interaction instead:

# Wrap the question in a chat turn and apply the tokenizer's chat template
chat = [
    {"role": "user", "content": "<your prompt text>"},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# Don't add special tokens again; the template already includes them
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)

# Decode only the newly generated tokens, skipping the echoed prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
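If the base checkpoint does not ship a chat template, the instruction-tuned variant google/gemma-2-2b-it is the checkpoint trained for this kind of turn-based prompting. A minimal sketch of loading it, reusing the token and dtype settings from the question (variable names here are just illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Sketch (assumption): google/gemma-2-2b-it is the instruction-tuned
# sibling of the base model and is intended for chat-formatted input.
it_name = "google/gemma-2-2b-it"
it_tokenizer = AutoTokenizer.from_pretrained(it_name, token=HUGGINGFACE_TOKEN)
it_model = AutoModelForCausalLM.from_pretrained(
    it_name,
    device_map="auto",
    torch_dtype=torch.float16,
    token=HUGGINGFACE_TOKEN,
)

The same apply_chat_template and generate code shown above can then be run with it_tokenizer and it_model.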