
DeepSpeed model initialization memory overhead fix?


I'm trying to train a small LLM on my local machine, which has a single GPU with 16 GB of VRAM. I kept encountering VRAM OOM errors, so I was looking for ways to reduce VRAM usage. DeepSpeed seemed interesting, so I tried it out. But just initializing the model uses a lot of memory, and I'm still getting OOM. Is there a way to initialize the model without this memory overhead? Below is the code I used.

import torch
import deepspeed
from transformers import AutoModelForCausalLM

def create_policy_model():
    # policy_model_name, bnb_config, and policy_autoconfig are defined elsewhere
    model = AutoModelForCausalLM.from_pretrained(
        policy_model_name,
        trust_remote_code=True,
        device_map="auto",
        quantization_config=bnb_config,
        torch_dtype=torch.bfloat16,
        config=policy_autoconfig,
        attn_implementation="flash_attention_2",
        low_cpu_mem_usage=True,
    )

    # reduce activation memory during training
    model.gradient_checkpointing_enable()
    model.enable_input_require_grads()

    return model

policy_model_1 = create_policy_model()
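For anyone reproducing this: a quick way to check how much of the 16 GB the loaded model already occupies, before DeepSpeed touches anything, is to read the CUDA allocator stats right after loading. A minimal sketch with plain PyTorch:

import torch

# current and peak allocator usage right after from_pretrained, in GiB
print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
print(f"peak:      {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")

# reset the peak counter so the next reading isolates deepspeed.initialize
torch.cuda.reset_peak_memory_stats()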

deepspeed_config = {
    "train_micro_batch_size_per_gpu": 1,
    # note: "distributed_type" is an Accelerate setting, not a DeepSpeed
    # config key, so DeepSpeed most likely ignores it
    "distributed_type": "NO",
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 5e-5}
    },
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
    },
}
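One thing I found in the ZeRO config docs but haven't tried yet is CPU offload for the stage-3 state; my reading of the config schema is that the zero_optimization block would be extended roughly like this (untested):

deepspeed_config["zero_optimization"] = {
    "stage": 3,
    # push partitioned parameters and optimizer state to CPU RAM
    "offload_param": {"device": "cpu", "pin_memory": True},
    "offload_optimizer": {"device": "cpu", "pin_memory": True},
}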

# this is where VRAM use spikes (at around 10 s in the screenshot below)
model_engine, _, _, _ = deepspeed.initialize(
    model=policy_model_1,
    model_parameters=policy_model_1.parameters(),
    config=deepspeed_config,
)

[Screenshot: VRAM usage spiking during deepspeed.initialize]

I tried every available ZeRO optimization stage (0-3), but the result is the same.
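I also came across deepspeed.zero.Init, which is supposed to partition the weights while the model is being constructed instead of materializing the full model first. With transformers, my understanding from the HF DeepSpeed integration docs is that you create an HfDeepSpeedConfig before calling from_pretrained and keep it alive while loading, and the integration then routes loading through zero.Init automatically (ZeRO-3 only); apparently device_map="auto" and the bitsandbytes quantization have to be dropped for that path. Roughly, untested on my setup:

import torch
import deepspeed
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

# must be created before from_pretrained and kept alive while loading,
# so that transformers wraps the load in deepspeed.zero.Init
dschf = HfDeepSpeedConfig(deepspeed_config)

model = AutoModelForCausalLM.from_pretrained(
    policy_model_name,  # same checkpoint as above
    torch_dtype=torch.bfloat16,
)

model_engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=deepspeed_config,
)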
