python - PermissionError: [Errno 13] Permission denied (Slurm) - Stack Overflow

I am currently training my PyTorch model on my university's computing nodes, and my data is on the university's remote filesystem as well. I use num_workers > 0 in the DataLoader and have multiple runs going in parallel. I have never had this problem before, but now all of my runs crash with this error:

PermissionError: Caught PermissionError in DataLoader worker process 6.
  Original Traceback (most recent call last):
    File "/root/miniconda3/envs/MASynth/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
      data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
    File "/root/miniconda3/envs/MASynth/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
      data = [self.dataset[idx] for idx in possibly_batched_index]
    File "/root/miniconda3/envs/MASynth/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
      data = [self.dataset[idx] for idx in possibly_batched_index]
    File "/remote/fs/users/UNet/code/mat_unet/version_4/data_3.py", line 183, in __getitem__
      sample['semantic']= to_tensor(normalize_images(np.expand_dims(cv2.resize(read_npz(semantic_dir), dsize=(256, 256), interpolation=cv2.INTER_NEAREST), axis=0), max_val=40))
    File "/remote/fs/users/users/UNet/code/mat_unet/version_4/utils.py", line 469, in read_npz
      with np.load(file) as data:
    File "/root/miniconda3/envs/MASynth/lib/python3.9/site-packages/numpy/lib/npyio.py", line 427, in load
      fid = stack.enter_context(open(os_fspath(file), "rb"))
  PermissionError: [Errno 13] Permission denied: '/remote/fs/datasets/dataset_name/version_2.0/folder1/folder2/file.npz'

All my runs crash at different times with pointers to different files. What could be causing this and how can I fix it?

TIA

Previously, I was able to run multiple parallel processes on the same dataset without any issues. I have checked the permissions on this dataset, and they are fine. I have done my best to make sure my code is bug-free, but I still cannot get around the PermissionError.
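To narrow this down, one option is to wrap the file read in a small retry helper that records who hit the error and where. The following is only a minimal sketch, assuming the failure is transient (which nothing above confirms). The names `load_with_retry` and `describe_failure` are hypothetical, and `loader` stands in for a function such as `read_npz` from the traceback.

```python
import os
import socket
import sys
import time


def describe_failure(path):
    """Collect context for a failed read: host, process, user, file metadata."""
    info = {
        "host": socket.gethostname(),      # which compute node hit the error
        "pid": os.getpid(),                # which DataLoader worker process
        "uid": os.getuid(),                # effective user on that node (Unix only)
        "readable": os.access(path, os.R_OK),
    }
    try:
        st = os.stat(path)
        info["mode"] = oct(st.st_mode)
        info["owner_uid"] = st.st_uid
    except OSError as exc:
        # stat itself can fail if the mount or credentials are in a bad state
        info["stat_error"] = repr(exc)
    return info


def load_with_retry(path, loader, retries=3, delay=1.0):
    """Call loader(path); on PermissionError, log context and retry.

    Re-raises the last PermissionError if every attempt fails.
    """
    last_exc = None
    for attempt in range(1, retries + 1):
        try:
            return loader(path)
        except PermissionError as exc:
            last_exc = exc
            print(
                f"[attempt {attempt}/{retries}] {exc!r} {describe_failure(path)}",
                file=sys.stderr,
                flush=True,
            )
            if attempt < retries:
                time.sleep(delay)
    raise last_exc
```

In `__getitem__`, the call `read_npz(semantic_dir)` would become `load_with_retry(semantic_dir, read_npz)`. If the retries succeed, or the logged host is always the same node, that points toward the filesystem or the node rather than the dataset's permissions. This is a diagnostic aid, not a fix.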
