cuda - How to deal with a process holding nvidia GPU memory after termination? - Stack Overflow


I am facing an issue with a process that holds GPU memory even after I have terminated it. Here's a detailed breakdown of the situation:

The process (a CUDA application) is running and occupies GPU memory.

When I stop the process, it disappears from nvidia-smi and gpustat, but it still holds GPU memory and the utilization rate stays at 100%, like this:

[6] NVIDIA A100 80GB PCIe | 52°C, 100 % | 10151 / 81920 MB | (null)

nvidia-smi and gpustat no longer show the PID, but nvidia-smi --query-compute-apps=pid,used_memory --format=csv still lists the PID of the process occupying memory. However:

When I try to kill it using kill -9 <pid>, I get the error: no such process.

The process is not shown as a zombie or defunct process in standard process listings (ps, top).
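To make the mismatch concrete, here is the kind of cross-check I have been running (a rough bash sketch; it assumes the CSV query returns bare PIDs, one per line):

    # cross-check every PID the driver still reports against /proc
    for pid in $(nvidia-smi --query-compute-apps=pid --format=csv,noheader); do
        if [ -d "/proc/$pid" ]; then
            # a live (or zombie) process would show up here with its state
            grep State "/proc/$pid/status"
        else
            # this is what I actually see: the driver knows the PID,
            # but the kernel has no trace of it
            echo "PID $pid: reported by the driver, absent from /proc"
        fi
    done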

Driver Version: 535.183.01
CUDA Version: 12.2
GPU: NVIDIA A100 80GB PCIe

This issue persists, and I cannot free up the GPU memory. Have you encountered this problem before? How can I forcefully reclaim the GPU memory or kill such processes when kill -9 doesn't seem to work?
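For context, these are the remedies I am considering. This is a rough sketch, not yet verified on this machine, and the device index 6 / the /dev/nvidia6 node are my assumptions based on the gpustat line above:

    # see whether any process still has a GPU device node open (needs root)
    sudo fuser -v /dev/nvidia*

    # if fuser does report a live PID, it can kill by file handle:
    # sudo fuser -k /dev/nvidia6    # assumed node for GPU index 6

    # last resort: reset the GPU; this only succeeds when no process
    # is using it and the GPU is not driving a display
    sudo nvidia-smi --gpu-reset -i 6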

Any suggestions or insights on how to resolve this would be greatly appreciated.

Thanks in advance!
