I’m trying to train a language model with `google/gemma-2-2b` using the Hugging Face Transformers `Trainer`. The same training script works fine for other models such as `gpt2` and `meta-llama/Meta-Llama-3-8B`, but with Gemma-2-2B it fails during evaluation with:

RuntimeError: Index put requires the source and destination dtypes match, got Float for the destination and BFloat16 for the source.
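For context, here is a minimal sketch of the kind of setup I’m running (simplified; the actual code excerpt is at the end of this post, and the dataset variables, batch sizes, and output directory below are placeholders):

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "google/gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # model weights loaded in bf16
)

training_args = TrainingArguments(
    output_dir="./out",              # placeholder
    num_train_epochs=1,
    per_device_train_batch_size=2,   # placeholder
    per_device_eval_batch_size=2,    # placeholder
    bf16=True,                       # mixed-precision bf16 training
    evaluation_strategy="steps",     # source of the FutureWarning in the log below
    eval_steps=100,                  # placeholder
    optim="paged_adamw_32bit",
)

trainer = Trainer(
    model=model,
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=ds_train,  # placeholder: my tokenized dataset, block_size=1024
    eval_dataset=ds_eval,    # placeholder: my tokenized eval split
)

trainer.train()  # training runs; the error is raised inside the evaluation loop
```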
Below is the full console output (and the relevant code excerpt at the end). Note that I already attempted the following:
- Setting `attn_implementation='eager'` for Gemma-2-2B.
- Switching away from `paged_adamw_32bit`.
- (Un)commenting `gradient_checkpointing`.
I still get the same dtype mismatch error at eval time (a sketch of how I applied these settings follows). Any ideas on how to resolve or work around this?
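Concretely, the three attempts above looked roughly like this (flag values are illustrative rather than copied verbatim from my script):

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments

# 1) Force eager attention for Gemma-2 instead of the default attention backend
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b",
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",
)

training_args = TrainingArguments(
    output_dir="./out",              # placeholder
    # 2) Swap the paged optimizer for plain AdamW
    optim="adamw_torch",             # instead of "paged_adamw_32bit"
    # 3) Toggled gradient checkpointing on/off between runs
    gradient_checkpointing=True,     # also tried with this line commented out
    bf16=True,
    evaluation_strategy="steps",
)
```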
Full console output:
Kwargs to run:
{'mode': 'dryrun', 'project': 'self-opt-train-uncompiled-py-2-gsm8k', 'num_train_epochs': 1, 'model_name': 'google/gemma-2-2b', 'today': '2025_m02_d07_t07h_20m_14s', 'tmux_sess_num': None, 'hostname': 'skampere1'}
Setting random seed = 42
vLLM not installed or vllm set seed has a bug, skipping vLLM seed setting.
Currently logged in as: brando
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 5.63it/s]
block_size=1024
len(ds_train)=18612
len(ds_train)=2740
/lfs/skampere1/0/brando9/miniconda/envs/zip_fit/lib/python3.11/site-packages/transformers/training_args.py:1575: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of