I have pairs of PDF files that I need to merge (each pair). The code works fine, but I don't understand why the second loop always starts with the second file of the pair. I thought it would start from the beginning. I looked at "Python - Why does the second for loop start from the second row", but I am still stuck. Is it the same problem? I don't think so. I think the cause is somewhere here:
fileList1 = Path(folder).glob('*text1.pdf')
fileList2 = Path(folder).glob('*text2.pdf')
start code
#! python3
# merge PDF files based on a /numeric/string/ code in file name
# example1 r01_cz14_028_city_text1.PDF
# example2 r01_cz14_028_city_text2.PDF
import os
from pypdf import PdfWriter
from pathlib import Path

def mergeFiles(folder):
    folder = os.path.abspath(folder)  # make sure folder is absolute path
    # make the lists
    fileList1 = Path(folder).glob('*text1.pdf')
    fileList2 = Path(folder).glob('*text2.pdf')
    outputName = ''
    outputFolderPath = folder + '\\somefolder'
    p = Path(outputFolderPath)
    if not p.exists():
        os.makedirs(outputFolderPath)
    n = 0
    folderLenght = len(folder)+1
    fileNameArea = folderLenght+12
    print(f'Adding files in {outputFolderPath}...')
    for filename1 in fileList1:
        n += 1
        # match test = SECOND LOOP
        for filename2 in fileList2:
            string1 = str(filename1)[folderLenght:fileNameArea].lower()
            string2 = str(filename2)[folderLenght:fileNameArea].lower()
            # Add chosen files in this folder to the PDF file by string.
            if string1 == string2:
                outputName = 'D' + \
                    str(filename1)[folderLenght + 4: folderLenght + 6].upper() + \
                    '_' + str(filename1)[folderLenght + 9: folderLenght + 12] + \
                    '_text3.pdf'
                outputName = outputFolderPath + '\\' + outputName
                file1Out = str(filename1)
                file2Out = str(filename2)
                pdfMerge([file1Out, file2Out], outputName)  # function, works fine
                break
        print(f'{n}. {os.path.basename(outputName)}')
    print('Done.')

mergeFiles('X:\\MergeTest')
asked 22 hours ago by Pacho
1 Answer
OK, based on your code and question: the outer loop iterates through fileList1 (the text1.pdf files), and the inner loop iterates through fileList2 (the text2.pdf files) to find a match based on the extracted string. When a match is found (string1 == string2), the code merges the two files and then breaks out of the inner loop. The important point is that Path.glob() returns a generator, which can only be consumed once and remembers its position. When break exits the inner loop, that position in fileList2 is kept, so on the next iteration of the outer loop the inner loop continues from where it stopped instead of starting from the beginning. The fix is simple: move the line
fileList2 = Path(folder).glob('*text2.pdf')
from before the first for loop to inside it, so that a fresh generator is created on every pass. Your code should then look like this: `
def mergeFiles(folder):
    folder = os.path.abspath(folder)  # make sure folder is absolute path
    # make the list
    fileList1 = Path(folder).glob('*text1.pdf')
    outputName = ''
    outputFolderPath = folder + '\\somefolder'
    p = Path(outputFolderPath)
    if not p.exists():
        os.makedirs(outputFolderPath)
    n = 0
    folderLenght = len(folder)+1
    fileNameArea = folderLenght+12
    print(f'Adding files in {outputFolderPath}...')
    for filename1 in fileList1:
        n += 1
        fileList2 = Path(folder).glob('*text2.pdf')  # moved inside the first loop
        # match test = SECOND LOOP
        for filename2 in fileList2:
            string1 = str(filename1)[folderLenght:fileNameArea].lower()
            string2 = str(filename2)[folderLenght:fileNameArea].lower()
            # Add chosen files in this folder to the PDF file by string.
            if string1 == string2:
                outputName = 'D' + \
                    str(filename1)[folderLenght + 4: folderLenght + 6].upper() + \
                    '_' + str(filename1)[folderLenght + 9: folderLenght + 12] + \
                    '_text3.pdf'
                outputName = outputFolderPath + '\\' + outputName
                file1Out = str(filename1)
                file2Out = str(filename2)
                pdfMerge([file1Out, file2Out], outputName)  # function, works fine
                break
        print(f'{n}. {os.path.basename(outputName)}')
    print('Done.')

mergeFiles('X:\\MergeTest')
`
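To see the resume-after-break behaviour in isolation, here is a minimal sketch that needs no PDF files. The names `names` and `first_item_per_pass` are made up for this illustration; `names()` is only a stand-in for what `Path.glob()` gives you, not the real thing.

```python
def names():
    """Hypothetical stand-in for Path.glob(): a generator that yields lazily."""
    yield from ["a_text2.pdf", "b_text2.pdf", "c_text2.pdf"]


def first_item_per_pass(iterable, passes):
    """Run the inner loop `passes` times, breaking after the first item each time."""
    seen = []
    for _ in range(passes):
        for item in iterable:
            seen.append(item)
            break  # exits the inner loop only; a generator keeps its position
    return seen


# One generator shared by all passes: each pass resumes where the last one stopped.
resumed = first_item_per_pass(names(), 3)
# A list hands out a fresh iterator to every for loop: each pass restarts.
restarted = first_item_per_pass(list(names()), 3)

print(resumed)    # ['a_text2.pdf', 'b_text2.pdf', 'c_text2.pdf']
print(restarted)  # ['a_text2.pdf', 'a_text2.pdf', 'a_text2.pdf']
```

The first result is the effect seen in the question: after a match and a break, the next pass picks up at the following file instead of the first one.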
fileList2 is a generator, so you can only iterate over it once - on the second iteration of the outer loop, it will already be completely exhausted, and will produce no values. (fileList1 is also a generator, but that doesn't cause a problem because you only iterate it once.) You need to do something like fileList2 = list(fileList2); that will convert the generator into something that can be iterated multiple times. – jasonharper, commented 22 hours ago
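As a variation on the idea of materializing the generator, the text2 files could be collected once into a dict keyed by the shared code, which removes the inner loop altogether. This is only a sketch under assumptions: `pair_files` and `key_len` are hypothetical names, the key is taken to be the first 12 characters of the file name as in the question, and the demo uses empty placeholder files in a temporary folder instead of real PDFs.

```python
import tempfile
from pathlib import Path


def pair_files(folder, key_len=12):
    """Return (text1, text2) path pairs whose names share the same leading code."""
    folder = Path(folder)
    # Consume the text2 glob exactly once and keep the results in a dict.
    second = {p.name[:key_len].lower(): p for p in folder.glob('*text2.pdf')}
    pairs = []
    for first in sorted(folder.glob('*text1.pdf')):
        match = second.get(first.name[:key_len].lower())
        if match is not None:
            pairs.append((first, match))
    return pairs


# Demo with empty placeholder files (assumed names, modelled on the question).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("r01_cz14_028_city_text1.pdf", "r01_cz14_028_city_text2.pdf",
                 "r01_cz15_030_town_text1.pdf", "r01_cz15_030_town_text2.pdf"):
        (Path(tmp) / name).touch()
    demo_pairs = [(a.name, b.name) for a, b in pair_files(tmp)]

print(demo_pairs)
```

Each pair could then be handed to the existing pdfMerge function. Using `p.name` instead of slicing the full path string also avoids the folder-length arithmetic.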