I am trying to anonymize whole slide imaging files using Python code from Tomi Lilja (https://scribesroom.wordpress/2024/03/15/anonymizing-ndpi-slide-scans/), which I have modified slightly to aid in debugging (I added print statements, as reproduced in the code below, with the file locations replaced by *** to protect privacy). This program worked well for 56 of my 59 files, which range in size from 2,397,110 KB to 5,450,684 KB.
Unfortunately, I cannot get it to work for the three largest files - 5,820,441 KB, 5,881,189 KB, and 6,096,842 KB.
import os
import tifftools

source_dir = 'C:\\***\\'
target_dir = 'C:\\***\\'

for filename in os.listdir(source_dir):
    if filename.endswith('.ndpi'):
        print('line 10')
        sourcefile = os.path.join(source_dir, filename)
        print('line 12')
        temporaryfile = os.path.join(target_dir, filename.replace(".ndpi", ".tmp"))
        print('line 14')
        targetfile = os.path.join(target_dir, filename)
        print('line 16')
        slide = tifftools.read_tiff(sourcefile)
        print('line 18')
        # make sure the file is in NDPI format
        if slide['ifds'][0]['tags'][tifftools.Tag.NDPI_FORMAT_FLAG.value]['data'][0] == 1:
            # create Reference- and concat-lists for tifftools commands
            reference_array = []
            concat_array = []
            # keep only the IFDs whose NDPI_SOURCELENS value is not -1
            for x in range(len(slide['ifds'])):
                if slide['ifds'][x]['tags'][tifftools.Tag.NDPI_SOURCELENS.value]['data'][0] != -1.0:
                    concat_array += [sourcefile + "," + str(x)]
            for x in range(len(slide['ifds']) - 1):
                reference_array += ["NDPI_REFERENCE," + str(x)]
            print('line 29')
            # remove the label image
            tifftools.tiff_concat(concat_array, temporaryfile)
            print('line 32')
            # remove the Reference tags
            tifftools.tiff_set(temporaryfile, targetfile, unset=reference_array)
            os.remove(temporaryfile)
            print('line36')

print("completed")
I get the following console output and error message (with the file location replaced by *** to protect privacy):
runfile('C:/***/NDPI_Anon.py', wdir='C:/***')
line 10
line 12
line 14
line 16
Traceback (most recent call last):
  File ~\anaconda3\Lib\site-packages\spyder_kernels\py3compat.py:356 in compat_exec
    exec(code, globals, locals)
  File c:\***\ndpi_anon.py:17
    slide = tifftools.read_tiff(sourcefile)
  File ~\anaconda3\Lib\site-packages\tifftools\tifftools.py:115 in read_tiff
    nextifd = read_ifd(tiff, info, nextifd, info['ifds'])
  File ~\anaconda3\Lib\site-packages\tifftools\tifftools.py:211 in read_ifd
    read_ifd_tag_data(tiff, info, ifd, tagSet)
  File ~\anaconda3\Lib\site-packages\tifftools\tifftools.py:250 in read_ifd_tag_data
    taginfo['data'] = list(struct.unpack(
MemoryError
I put a few print statements in to see where it gets bogged down - it seems to be an issue with tifftools.read_tiff.
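For reference, the failure seems to be reproducible with nothing but the read call on one of the large files (a minimal sketch; the filename below is a placeholder):

import tifftools

# Reading a single ~6 GB .ndpi file is enough to trigger the error
slide = tifftools.read_tiff('C:\\***\\large_slide.ndpi')  # MemoryError in read_ifd_tag_data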
I'm running this in Spyder through Anaconda on a Windows 11 machine with 16 GB of RAM, if that matters.
Has anyone run into this issue before with large image files, or have suggestions on how I may be able to resolve this?
3 Comments
- If you directly try just one of the largest files, does it still give the error? I'm guessing it will, but in case it works there might be something you can do with forcing garbage collection. – JonSG, Mar 27 at 1:03
- I tried just running one large file at a time. It still didn't work. I restarted my computer and tried again; that didn't work either. – user30073632, Mar 27 at 2:00
- Conceptually, it seems possible to me to stream read and write the file and modify the metadata as you go, but I don't know the tiff file layout at all so I can't advise you there. – JonSG, Mar 27 at 14:20
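For what it's worth, JonSG's garbage-collection suggestion above would look roughly like the sketch below (standard-library gc only; since the MemoryError happens inside a single read_tiff call, it is unclear whether this would actually help here):

import gc
import os
import tifftools

source_dir = 'C:\\***\\'

for filename in os.listdir(source_dir):
    if filename.endswith('.ndpi'):
        slide = tifftools.read_tiff(os.path.join(source_dir, filename))
        # ... anonymize as in the script above ...
        del slide     # drop the reference to the parsed IFD structure
        gc.collect()  # force a collection before reading the next file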
1 Answer
This is simply a memory problem. I suggest you:
- Get more RAM. Even with 16 GB, the MemoryError clearly says it is not enough for these files.
- Resize the largest files down to the size of the fourth-largest picture in the dataset; this will have the smallest impact on what you are doing (see the sketch below).
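If you go the resizing route, one possible way is with pyvips (a rough sketch under the assumption that pyvips and OpenSlide are installed; note that the output is a generic pyramidal TIFF, not a true NDPI file, so the anonymization script above would not apply to it as written):

import pyvips

# Placeholder path; 'sequential' access streams pixels instead of
# loading the whole level-0 image into memory at once.
img = pyvips.Image.new_from_file('C:\\***\\large_slide.ndpi', access='sequential')

# Placeholder scale factor; pick it so the output roughly matches the
# pixel dimensions of the fourth-largest slide.
small = img.resize(0.9)

# Write a tiled, pyramidal BigTIFF (not a true NDPI file).
small.tiffsave('C:\\***\\large_slide_small.tif',
               tile=True, pyramid=True, bigtiff=True,
               compression='jpeg', Q=90)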