I have a function to measure the RAM allocated by Python, in megabytes:

def getram():
    print(psutil.Process(os.getpid()).memory_info().rss / 1024**2)
I also have:

device = "cuda"

My problem is that the following code allocates RAM, and it's driving me crazy. Is there an actual solution, or do I have to accept my fate and switch to C++ or something?
The code:
getram()

def load_dataset(dir, filenames):
    dataset = torch.zeros((len(filenames), 3, 256, 256), device=device)
    getram()
    for i, filename in enumerate(filenames):
        f = read_image(f"{dir}/{filename}")
        if f.shape[0] != 3:
            print(filename)
        dataset[i] = f.to(device)
    getram()
    return dataset

dataset = load_dataset(dataset_dir, dataset_filenames)
getram()
The code printed out the following:
533.28125
661.2890625
678.27734375
678.27734375
As you can see, as soon as I create the empty tensor with torch.zeros(), it takes up RAM for no reason. I tried gc.collect(), but it didn't help at all.
1 Answer
The RAM usage you are seeing is caused by loading the various CUDA libraries, not by the tensor itself. When you first use CUDA, PyTorch lazily loads its CUDA libraries into RAM. You can verify this with the code below (the RAM numbers are what I got on my system; you will probably get different values, but the overall pattern should be the same):
import os
import psutil
import torch
import time

def getram():
    print(psutil.Process(os.getpid()).memory_info().rss / 1024**2)

device = 'cuda:0'

# Get the baseline RAM usage
getram()
> 331.7734375

# Create the first CUDA tensor.
# This causes a large RAM increase due to loading the CUDA libraries.
tmp = torch.zeros(1, device=device)
time.sleep(0.1)
getram()
> 1251.25

# Create the dataset on the GPU.
dataset = torch.zeros((128, 3, 256, 256), device=device)

# Slight RAM increase, but mostly unchanged
getram()
> 1252.203125
Note that the time.sleep(0.1) is there because I found that running getram right after allocating tmp would sometimes return a value while the CUDA libs were still loading (i.e. running getram again immediately, without allocating anything else, would yield a different result). The sleep ensures the libs are fully loaded before measuring.
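
If you want to avoid the sleep altogether, an alternative is to initialize CUDA eagerly before taking the baseline, so the library load cost is already paid when you start measuring. A minimal sketch, assuming a CUDA-enabled PyTorch build (torch.cuda.init() and torch.cuda.memory_allocated() are standard PyTorch calls, but the exact numbers will vary by system):

import os
import psutil
import torch

def getram():
    print(psutil.Process(os.getpid()).memory_info().rss / 1024**2)

# Initialize PyTorch's CUDA state up front so the baseline RSS
# already includes the CUDA libraries.
torch.cuda.init()
getram()

# Allocating the dataset should now barely move RSS...
dataset = torch.zeros((128, 3, 256, 256), device='cuda:0')
getram()

# ...because the tensor lives in GPU memory, which you can track with:
print(torch.cuda.memory_allocated() / 1024**2)  # ~96 MB for this float32 tensor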