I need to create a Python script that finds all PDF files within a directory and its subdirectories.
For example: If I provide the path: D:\Documents\
I also want the list of PDF files from the path D:\Documents\new\
When I wrote:
from glob import glob
files = glob('D:/*.pdf')
I got all the files that are directly in D:\ but not the files in the folders inside D:\.
And when I tried:
from glob import glob
files = glob('D:/*/*.pdf')
I got the list of files found inside all the immediate subfolders of D:\ (for example, D:\Documents) but not the files inside the parent path (D:\) nor the files inside the subfolders of the subfolders (for example, D:\Documents\January).
I tried:
from glob import glob
files = glob('D:/*.pdf', recursive=True)
But it returned only the list of files in the D:\ path and nothing beyond that.
asked Nov 20, 2024 at 9:45 by libo navon

2 Answers
The correct pattern is 'D:/**/*.pdf' and the correct code is
from glob import glob
files = glob('D:/**/*.pdf', recursive=True)
You have to use ** as a wildcard for the directories if you want all files in all nested subdirectories. ** means any number (zero or more) of directories. * is a wildcard within a single path component, so it matches one directory or file name but never crosses a path separator.
From the documentation: If recursive is true, the pattern "**" will match any files and zero or more directories, subdirectories and symbolic links to directories. If the pattern is followed by an os.sep or os.altsep then files will not match.
https://docs.python.org/3/library/glob.html
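To make the difference between the three patterns from the question concrete, here is a minimal sketch that runs them against a throwaway directory tree. The tree layout, the file names, and the helpers build_sample_tree and pdf_names are made up for illustration; only glob, glob.escape, tempfile, and pathlib come from the standard library.

```python
import tempfile
from glob import escape, glob
from pathlib import Path


def build_sample_tree(root: Path) -> None:
    """Create a small, made-up directory tree with one PDF per level."""
    (root / "Documents" / "January").mkdir(parents=True)
    (root / "top.pdf").touch()
    (root / "Documents" / "mid.pdf").touch()
    (root / "Documents" / "January" / "deep.pdf").touch()
    (root / "Documents" / "notes.txt").touch()  # not a PDF, should never match


def pdf_names(pattern: str, recursive: bool = False) -> list:
    """Return the sorted base names of the paths matched by a glob pattern."""
    return sorted(Path(p).name for p in glob(pattern, recursive=recursive))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_sample_tree(root)

    # Escape the temp path so any special characters in it are taken literally.
    base = escape(str(root))

    top_only = pdf_names(f"{base}/*.pdf")        # files directly in the root
    one_level = pdf_names(f"{base}/*/*.pdf")     # files exactly one folder down
    all_levels = pdf_names(f"{base}/**/*.pdf", recursive=True)  # every depth

print(top_only)
print(one_level)
print(all_levels)
```

Only the last pattern reaches every depth, because ** also matches zero directories and therefore includes the files in the root itself.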
Another option is to use the rglob() functionality of a pathlib.Path object.
For example:
from pathlib import Path
for pdf in Path(".").rglob("*.pdf"):
    print(pdf)
...will print all filenames ending with .pdf from the current working directory and any sub-directories
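If you want to pass in a directory such as D:\Documents\ and get a list back, the rglob() approach can be wrapped in a small function. This is only a sketch: find_pdfs is a hypothetical helper name, and the temporary directory below merely stands in for a real folder.

```python
import tempfile
from pathlib import Path
from typing import List, Union


def find_pdfs(root: Union[str, Path]) -> List[Path]:
    """Return every *.pdf file under root, including nested sub-directories."""
    root_path = Path(root)
    if not root_path.is_dir():
        raise NotADirectoryError(f"{root_path} is not a directory")
    # rglob("*.pdf") walks the whole tree; is_file() skips directories
    # that happen to be named like "something.pdf".
    return sorted(p for p in root_path.rglob("*.pdf") if p.is_file())


# Demonstration on a made-up tree instead of a real drive.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "new").mkdir()
    (Path(tmp) / "a.pdf").touch()
    (Path(tmp) / "new" / "b.pdf").touch()
    (Path(tmp) / "new" / "c.txt").touch()
    found = [p.relative_to(tmp).as_posix() for p in find_pdfs(tmp)]

print(found)
```

In a real script you would call something like find_pdfs("D:/Documents") instead of using the temporary directory.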
recursive=True requires ** in the path to have any effect. – matszwecja Commented Nov 20, 2024 at 9:46
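The point in this comment can be checked with a short experiment. The sketch below uses a made-up temporary tree and compares the flag and the ** pattern on their own and together; it assumes the standard glob behaviour where ** without recursive=True is treated like a plain *.

```python
import tempfile
from glob import escape, glob
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub" / "deeper").mkdir(parents=True)
    for rel in ("a.pdf", "sub/b.pdf", "sub/deeper/c.pdf"):
        (root / rel).touch()

    base = escape(str(root))
    plain = glob(f"{base}/*.pdf")                         # no flag, no **
    flag_only = glob(f"{base}/*.pdf", recursive=True)     # flag without **
    stars_only = glob(f"{base}/**/*.pdf")                 # ** without flag
    both = glob(f"{base}/**/*.pdf", recursive=True)       # ** with flag

# The flag alone changes nothing; ** alone only reaches one level down.
flag_only_is_noop = flag_only == plain
counts = (len(flag_only), len(stars_only), len(both))
print(flag_only_is_noop, counts)
```

Only the combination of ** and recursive=True finds all three files.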