I have pairs of PDF files, and I need to merge each pair. The code works fine, but I don't understand why the second loop always starts with the second file of the pair. I thought it would start from the beginning. I looked at "Python - Why does the second for loop start from the second row", but I am still stuck. Is it the same problem? I don't think so. I think the cause is somewhere here:

fileList1 = Path(folder).glob('*text1.pdf')
fileList2 = Path(folder).glob('*text2.pdf')

#! python3
# merge PDF files based on a /numeric/string/ code in file name
# example1 r01_cz14_028_city_text1.PDF
# example2 r01_cz14_028_city_text2.PDF
import os
from pypdf import PdfWriter
from pathlib import Path

def mergeFiles(folder):
    
    folder = os.path.abspath(folder)   # make sure folder is absolute path

    # make a lists
    fileList1 = Path(folder).glob('*text1.pdf')
    fileList2 = Path(folder).glob('*text2.pdf')

    outputName = ''
    outputFolderPath = folder + '\\somefolder'
    p = Path(outputFolderPath)
    if not p.exists():
            os.makedirs(outputFolderPath)
    n = 0
    folderLenght = len(folder)+1
    fileNameArea = folderLenght+12
    print(f'Adding files in {outputFolderPath}...')


    for filename1 in fileList1:
        n += 1
        
        # match test                   = SECOND LOOP
        for filename2 in fileList2:
            string1 = str(filename1)[folderLenght:fileNameArea].lower()
            string2 = str(filename2)[folderLenght:fileNameArea].lower()

            # Add choosen files in this folder to the PDF file by string.    
            if string1 == string2:
                outputName = 'D' + \
                str(filename1)[folderLenght + 4: folderLenght + 6].upper() + \
                '_' + str(filename1)[folderLenght + 9: folderLenght + 12] + \
                '_text3.pdf'

                outputName = outputFolderPath + '\\' + outputName
                file1Out = str(filename1)
                file2Out = str(filename2)
                pdfMerge([file1Out, file2Out], outputName)  #  function, works fine
                
                break
            
        print (f'{n}. {os.path.basename(outputName)}')
    
    print('Done.')

mergeFiles('X:\\MergeTest')

Asked 22 hours ago by Pacho

fileList2 is a generator, so you can only iterate over it once: on the second iteration of the outer loop, it will already be completely exhausted and will produce no values. (fileList1 is also a generator, but that doesn't cause a problem because you only iterate it once.) You need to do something like fileList2 = list(fileList2), which will convert the generator into something that can be iterated multiple times. – jasonharper Commented 22 hours ago
@jasonharper thanks for insight. – Pacho Commented 2 hours ago
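
A small sketch may help show what a one-shot generator does when a loop breaks out of it early. The `numbers` generator below is a hypothetical stand-in for `Path(folder).glob(...)` (which also returns a generator); the file names are made up for illustration.

```python
def numbers():
    # Hypothetical stand-in for Path(folder).glob('*text2.pdf'),
    # which also returns a generator that can be consumed only once.
    yield from ["a_text2.pdf", "b_text2.pdf", "c_text2.pdf"]

gen = numbers()

first_pass = []
for name in gen:
    first_pass.append(name)
    break                    # leave the loop after the first item

# The generator keeps its position: iteration resumes after the
# item that was already consumed, not from the beginning.
second_pass = list(gen)

# Once every item has been produced, the generator yields nothing more.
third_pass = list(gen)

# Materializing the generator into a list allows repeated iteration.
files = list(numbers())
repeated = [list(files), list(files)]
```

Here `first_pass` holds only the first name, `second_pass` holds the remaining two, and `third_pass` is empty, while the list `files` can be walked from the start as often as needed.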


1 Answer

Based on your code and question, the behavior comes from how the two loops share fileList2. The outer loop iterates through fileList1 (the text1.pdf files), and the inner loop iterates through fileList2 (the text2.pdf files) to find a match based on the extracted string. When a match is found (string1 == string2), the code merges the corresponding files and then breaks out of the inner loop. The important thing is that break exits the inner loop but the iterator's position in fileList2 is maintained, so the next iteration of the outer loop continues from the current position in fileList2, not from the beginning. The solution is easy: move fileList2 = Path(folder).glob('*text2.pdf') from outside the first for loop to inside it, so that a fresh iterator is created for every file in fileList1. Your code should now look like this:

def mergeFiles(folder):

    folder = os.path.abspath(folder)   # make sure folder is absolute path

    # make a lists
    fileList1 = Path(folder).glob('*text1.pdf')


    outputName = ''
    outputFolderPath = folder + '\\somefolder'
    p = Path(outputFolderPath)
    if not p.exists():
        os.makedirs(outputFolderPath)
    n = 0
    folderLenght = len(folder)+1
    fileNameArea = folderLenght+12
    print(f'Adding files in {outputFolderPath}...')


    for filename1 in fileList1:
        n += 1
        fileList2 = Path(folder).glob('*text2.pdf') #Move fileList2 inside first loop
        # match test                   = SECOND LOOP
        for filename2 in fileList2:
            string1 = str(filename1)[folderLenght:fileNameArea].lower()
            string2 = str(filename2)[folderLenght:fileNameArea].lower()

            # Add choosen files in this folder to the PDF file by string.
            if string1 == string2:
                outputName = 'D' + \
                str(filename1)[folderLenght + 4: folderLenght + 6].upper() + \
                '_' + str(filename1)[folderLenght + 9: folderLenght + 12] + \
                '_text3.pdf'

                outputName = outputFolderPath + '\\' + outputName
                file1Out = str(filename1)
                file2Out = str(filename2)
                pdfMerge([file1Out, file2Out], outputName)  #  function, works fine

                break

        print (f'{n}. {os.path.basename(outputName)}')

    print('Done.')

mergeFiles('X:\\MergeTest')
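
As an alternative to re-creating the glob for every file, the pairs could be collected in a single pass by grouping file names on their shared 12-character prefix. The following is only a sketch under assumptions: `pair_by_prefix` and `output_name` are hypothetical helper names not present in the original code, and the actual merge step (the asker's `pdfMerge`) is left out.

```python
from collections import defaultdict
from pathlib import Path

def pair_by_prefix(folder, key_len=12):
    """Return (text1_path, text2_path) tuples whose names share a prefix.

    Hypothetical helper: key_len=12 mirrors the 12-character slice
    used in the question (e.g. 'r01_cz14_028').
    """
    groups = defaultdict(dict)
    for suffix in ("text1", "text2"):
        # Each glob is consumed exactly once, so exhaustion is not an issue.
        for path in Path(folder).glob(f"*{suffix}.pdf"):
            groups[path.name[:key_len].lower()][suffix] = path

    pairs = []
    for key in sorted(groups):
        found = groups[key]
        # Keep only complete pairs; files without a partner are skipped.
        if "text1" in found and "text2" in found:
            pairs.append((found["text1"], found["text2"]))
    return pairs

def output_name(path):
    """Build the merged file name with the same slices as the question."""
    name = path.name
    return "D" + name[4:6].upper() + "_" + name[9:12] + "_text3.pdf"
```

Each returned pair could then be passed to the existing merge function, for example `pdfMerge([str(a), str(b)], str(out_dir / output_name(a)))`, where `out_dir` is assumed to be a `Path` to the output folder.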


python - why second for loop do not start from beginning - Stack Overflow
