I am trying to reduce the memory requirements of a model I am working with. The model, which I cannot share in its entirety, is not large: only about 600k parameters. The main part consists of two time-convolutional neural networks (running independently) and some encoder/decoder blocks. I added four recurrent autoencoders and tried to speed up training by increasing the batch size, which is where I run out of memory. Now I am trying to understand the error message I receive, which looks odd to me. The important part reads:
Out of memory while trying to allocate 57203235504 bytes.
BufferAssignment OOM Debugging.
BufferAssignment stats:
    parameter allocation:          21.34MiB
    constant allocation:          391.61MiB
    maybe_live_out allocation:      5.69MiB
    preallocated temp allocation:  53.27GiB
    total allocation:              53.68GiB
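For context, the recurrent autoencoders I added look roughly like this. This is a heavily simplified, hypothetical stand-in with made-up layer sizes, since I cannot share the actual model:

import tensorflow as tf

TIMESTEPS = 8000   # ~8k steps, as in my data
FEATURES = 8       # made-up feature count

def recurrent_autoencoder():
    inputs = tf.keras.Input(shape=(TIMESTEPS, FEATURES))
    # encoder: compress the whole sequence into a fixed-size state
    encoded = tf.keras.layers.GRU(64)(inputs)
    # decoder: repeat the state and reconstruct the sequence
    repeated = tf.keras.layers.RepeatVector(TIMESTEPS)(encoded)
    decoded = tf.keras.layers.GRU(64, return_sequences=True)(repeated)
    outputs = tf.keras.layers.TimeDistributed(
        tf.keras.layers.Dense(FEATURES))(decoded)
    return tf.keras.Model(inputs, outputs)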
What exactly is "preallocated temp allocation", and why is it so large? The RNNs process long sequences (~8k timesteps), so perhaps this is related to them.
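A very rough back-of-the-envelope estimate (with made-up placeholder sizes, since I cannot share the real ones) already suggests that the activations kept for backpropagation through the unrolled sequences could account for a large chunk of this:

# rough estimate with placeholder sizes (not my real model):
# backprop through an unrolled RNN keeps the hidden state of every
# timestep, so activation memory scales with batch * timesteps * units
batch_size = 256       # the larger batch size I am aiming for
timesteps = 8000       # ~8k steps per sequence
hidden_units = 256     # per recurrent layer
bytes_per_float = 4    # float32

per_layer = batch_size * timesteps * hidden_units * bytes_per_float
print(per_layer / 2**30, "GiB per recurrent layer")  # ~2 GiB

# several layers across four autoencoders, plus the extra gate tensors
# an LSTM/GRU keeps per step, would quickly push this into the tens of GiB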
Preallocation sounds to me like memory that may be required in the future but is not needed yet. My hope is therefore that I can somehow tell TensorFlow not to preallocate this memory. I already use
import tensorflow as tf

# enable on-demand GPU memory growth instead of grabbing all memory up front
physical_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)
but to no avail; it does not affect the OOM error. Since sources on this detail are scarce, I was wondering whether someone here could shed some light on this OOM error. Why does TensorFlow preallocate so much memory?
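If it turns out the preallocation can only be capped rather than avoided, this is the next thing I would try (an untested sketch; I am not sure whether a hard memory limit even helps with an XLA BufferAssignment OOM):

import tensorflow as tf

# cap the GPU to a fixed memory budget (hypothetical 8 GiB limit, in MiB)
gpus = tf.config.list_physical_devices('GPU')
tf.config.set_logical_device_configuration(
    gpus[0],
    [tf.config.LogicalDeviceConfiguration(memory_limit=8192)])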