I am a beginner with Large Language Models and the Hugging Face API. As practice, I am trying to fine-tune the Llama 3.1 8B model on the wikitext dataset.
When I try to run the script below, trainer.train() fails with an unknown CUDA error:

```
  File "/root/mesh_LLM.py", line 84, in <module>
    trainer.train()
RuntimeError: CUDA error: unknown error
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
My machine is a Dell Precision with an NVIDIA RTX A5000 (16 GB VRAM), so I hope it is not a memory issue, since I am loading and fine-tuning the model in 8-bit precision.
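(Back-of-the-envelope: 8B parameters at one byte each is roughly 8 GB for the weights in 8-bit, before activations, LoRA gradients and optimizer state.) As a sanity check, a minimal sketch like the one below, assuming the same model ID and a working bitsandbytes install, should print the quantized model's actual footprint via get_memory_footprint():

```python
# Sketch only, not part of my training script: load the base model in 8-bit
# and report how much memory the weights take.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

check_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",                                   # assumed: same model ID as below
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
print(f"8-bit footprint: {check_model.get_memory_footprint() / 1e9:.1f} GB")
```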
Here is the code:

```python
import torch,os
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import Trainer, TrainingArguments, BitsAndBytesConfig
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'  # force synchronous kernel launches so the error points at the real call site
model_name = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name, token='hf-****')
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token, so reuse EOS
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='auto',
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit weights via bitsandbytes
)
model.resize_token_embeddings(len(tokenizer))  # vocab size is unchanged, so this should be a no-op
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, bias='none', task_type="CAUSAL_LM")
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
# Load a dataset
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
# Tokenize the dataset
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=512)
tokenized_datasets = dataset.map(tokenize_function, batched=True)
training_args = TrainingArguments(
    output_dir="./llama3_finetuned",        # Where to save the model
    evaluation_strategy="steps",            # Evaluate during training
    save_strategy="steps",                  # Save checkpoints
    learning_rate=2e-5,                     # A good starting point for fine-tuning
    per_device_train_batch_size=4,          # Adjust based on GPU memory
    gradient_accumulation_steps=8,          # Simulates a larger batch size
    num_train_epochs=3,                     # Experiment with more epochs for small datasets
    logging_steps=100,                      # Log training progress
    save_steps=500,                         # Save model every 500 steps
    push_to_hub=False                       # Skip pushing to Hugging Face Hub for now
)
trainer = Trainer(
    model=model,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['validation'],
    tokenizer=tokenizer,
    args=training_args
)
trainer.train()
trainer.save_model('model_ft/fine_tuned_llama3-8B')
```
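For context, the simplest thing I know to rule out a completely broken CUDA/driver setup is a bare-bones smoke test like this sketch (plain PyTorch, arbitrary shapes, nothing model-specific):

```python
# Illustrative smoke test, separate from the training script: a tiny matmul on
# the GPU; if even this fails, the problem is the CUDA/driver install, not the model.
import torch

print(torch.cuda.is_available(), torch.cuda.get_device_name(0))
x = torch.randn(1024, 1024, device="cuda")
y = x @ x
torch.cuda.synchronize()
print("matmul OK:", y.shape)
```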
Any leads would be very helpful!