I am trying to anonymize whole slide imaging files using Python code from Tomi Lilja (https://scribesroom.wordpress/2024/03/15/anonymizing-ndpi-slide-scans/), which I have modified slightly to aid in debugging (I added print statements, as reproduced in the code below, with the file locations replaced by *** to protect privacy). This program worked well for 56 of my 59 files, which range in size from 2,397,110 KB to 5,450,684 KB.
Unfortunately, I cannot get it to work for the three largest files - 5,820,441 KB, 5,881,189 KB, and 6,096,842 KB.
import os
import tifftools

source_dir = 'C:\\***\\'
target_dir = 'C:\\***\\'

for filename in os.listdir(source_dir):
    if filename.endswith('.ndpi'):
        print('line 10')
        sourcefile = os.path.join(source_dir, filename)
        print('line 12')
        temporaryfile = os.path.join(target_dir, filename.replace(".ndpi", ".tmp"))
        print('line 14')
        targetfile = os.path.join(target_dir, filename)
        print('line 16')
        slide = tifftools.read_tiff(sourcefile)
        print('line 18')
        # make sure the file is in NDPI format
        if slide['ifds'][0]['tags'][tifftools.Tag.NDPI_FORMAT_FLAG.value]['data'][0] == 1:
            # create Reference- and concat-lists for tifftools commands
            reference_array = []
            concat_array = []
            # keep only the IFDs whose NDPI_SOURCELENS value is not -1
            for x in range(len(slide['ifds'])):
                if slide['ifds'][x]['tags'][tifftools.Tag.NDPI_SOURCELENS.value]['data'][0] != -1.0:
                    concat_array += [sourcefile + "," + str(x)]
            for x in range(len(slide['ifds']) - 1):
                reference_array += ["NDPI_REFERENCE," + str(x)]
            print('line 29')
            # remove the label image
            tifftools.tiff_concat(concat_array, temporaryfile)
            print('line 32')
            # remove the Reference tags
            tifftools.tiff_set(temporaryfile, targetfile, unset=reference_array)
            os.remove(temporaryfile)
            print('line36')

print("completed")
I get the following console output and error message (with the file location replaced by *** to protect privacy):
runfile('C:/***/NDPI_Anon.py', wdir='C:/***')
line 10
line 12
line 14
line 16
Traceback (most recent call last):
  File ~\anaconda3\Lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
    exec(code, globals, locals)
  File c:\***\ndpi_anon.py:17
    slide = tifftools.read_tiff(sourcefile)
  File ~\anaconda3\Lib\site-packages\tifftools\tifftools.py:115 in read_tiff
    nextifd = read_ifd(tiff, info, nextifd, info['ifds'])
  File ~\anaconda3\Lib\site-packages\tifftools\tifftools.py:211 in read_ifd
    read_ifd_tag_data(tiff, info, ifd, tagSet)
  File ~\anaconda3\Lib\site-packages\tifftools\tifftools.py:250 in read_ifd_tag_data
    taginfo['data'] = list(struct.unpack(
MemoryError
I put a few print statements in to see where it gets bogged down - it seems to be an issue with tifftools.read_tiff.
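For reference, the failure seems to be reproducible with nothing but the read call on one of the large files (a minimal sketch; the filename below is a placeholder):

import tifftools

# Reading a single ~6 GB .ndpi file is enough to trigger the error
slide = tifftools.read_tiff('C:\\***\\large_slide.ndpi')  # MemoryError in read_ifd_tag_data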
I'm running this in Spyder through Anaconda on a Windows 11 machine with 16 GB of RAM, if that matters.
Has anyone run into this issue before with large image files, or have suggestions on how I may be able to resolve this?
3 Comments
- If you directly try just one of the largest files, does it still give the error? I'm guessing it will, but in case it works there might be something you can do with forcing garbage collection. – JonSG, Mar 27 at 1:03
- I tried just running one large file at a time. It still didn't work. I restarted my computer and tried again; that didn't work either. – user30073632, Mar 27 at 2:00
- Conceptually, it seems possible to me to stream read and write the file and modify the metadata as you go, but I don't know the tiff file layout at all so I can't advise you there. – JonSG, Mar 27 at 14:20
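For what it's worth, JonSG's garbage-collection suggestion above would look roughly like the sketch below (standard-library gc only; since the MemoryError happens inside a single read_tiff call, it is unclear whether this would actually help here):

import gc
import os
import tifftools

source_dir = 'C:\\***\\'

for filename in os.listdir(source_dir):
    if filename.endswith('.ndpi'):
        slide = tifftools.read_tiff(os.path.join(source_dir, filename))
        # ... anonymize as in the script above ...
        del slide     # drop the reference to the parsed IFD structure
        gc.collect()  # force a collection before reading the next file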
1 Answer
This is simply a memory problem. I suggest you:
- Get more RAM. Even with 16 GB, the MemoryError clearly says it is not enough for these files.
- Resize the largest files down to the size of the fourth-largest picture in the dataset; this will have the smallest impact on what you are doing (see the sketch below).
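If you go the resizing route, one possible way is with pyvips (a rough sketch under the assumption that pyvips and OpenSlide are installed; note that the output is a generic pyramidal TIFF, not a true NDPI file, so the anonymization script above would not apply to it as written):

import pyvips

# Placeholder path; 'sequential' access streams pixels instead of
# loading the whole level-0 image into memory at once.
img = pyvips.Image.new_from_file('C:\\***\\large_slide.ndpi', access='sequential')

# Placeholder scale factor; pick it so the output roughly matches the
# pixel dimensions of the fourth-largest slide.
small = img.resize(0.9)

# Write a tiled, pyramidal BigTIFF (not a true NDPI file).
small.tiffsave('C:\\***\\large_slide_small.tif',
               tile=True, pyramid=True, bigtiff=True,
               compression='jpeg', Q=90)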