python - glob function in glob module: get files also from subfolders - Stack Overflow

I need to create a Python script that finds all PDF files within a directory and its subdirectories.

For example, if I provide the path D:\Documents\, I also want the list of PDF files from the path D:\Documents\new\.

When I wrote:

from glob import glob
files = glob('D:/*.pdf')

I got all the files that are directly in the D:\ path, but not the files in the folders inside D:\. And when I tried:

from glob import glob
files = glob('D:/*/*.pdf')

I got the list of files inside the immediate subfolders of D:\ (for example, D:\Documents), but neither the files in the parent path (D:\) nor the files in the subfolders of the subfolders (for example, D:\Documents\January).

I tried:

from glob import glob
files = glob('D:/*.pdf', recursive=True)

But it returned only the list of files directly in the D:\ path and nothing beyond that.

Asked Nov 20, 2024 at 9:45 by libo navon
  • https://docs.python.org/3/library/glob.html - recursive=True requires ** in the path to have any effect. – matszwecja, commented Nov 20, 2024 at 9:46

2 Answers

The correct pattern is 'D:/**/*.pdf' and the correct code is

from glob import glob
files = glob('D:/**/*.pdf', recursive=True)

To get all files in all nested subdirectories, you have to use ** as the wildcard for the directories. With recursive=True, ** matches any number (zero or more) of directory levels. A single * only matches within one path component (one directory or file name) and never crosses a path separator.
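The difference between the patterns tried in the question and the ** pattern can be checked on a throwaway directory tree. The following is a minimal sketch, not part of the original answer: the tree layout, the file names, and the helper names build_sample_tree and find are made up for illustration.

```python
import os
import tempfile
from glob import escape, glob
from pathlib import Path


def build_sample_tree(root):
    """Create a hypothetical tree with one PDF at depth 0, 1 and 2."""
    for rel in ("top.pdf", "Documents/one.pdf", "Documents/January/two.pdf"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()


def find(root, pattern, recursive=False):
    """Glob for pattern under root; return sorted paths relative to root."""
    # escape() protects any special characters in the temp directory name.
    hits = glob(os.path.join(escape(str(root)), pattern), recursive=recursive)
    return sorted(Path(h).relative_to(root).as_posix() for h in hits)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_sample_tree(root)
    top_only = find(root, "*.pdf")                        # first attempt
    one_level = find(root, "*/*.pdf")                     # second attempt
    no_double_star = find(root, "*.pdf", recursive=True)  # third attempt
    all_levels = find(root, "**/*.pdf", recursive=True)   # the fix

print(top_only)        # ['top.pdf']
print(one_level)       # ['Documents/one.pdf']
print(no_double_star)  # ['top.pdf']
print(all_levels)      # ['Documents/January/two.pdf', 'Documents/one.pdf', 'top.pdf']
```

Note that recursive=True on its own changes nothing: without ** in the pattern, the third attempt returns the same result as the first.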

If recursive is true, the pattern “**” will match any files and zero or more directories, subdirectories and symbolic links to directories. If the pattern is followed by an os.sep or os.altsep then files will not match.

https://docs.python.org/3/library/glob.html
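The last sentence of the quoted documentation (a trailing separator stops ** from matching files) can be seen with a small experiment. This is only a sketch on a temporary directory. The folder and file names are invented for the example.

```python
import os
import tempfile
from glob import escape, glob

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: tmp/Documents/January/ plus one PDF in Documents.
    os.makedirs(os.path.join(tmp, "Documents", "January"))
    open(os.path.join(tmp, "Documents", "one.pdf"), "w").close()

    base = escape(tmp)  # guard against special characters in the temp path

    # "**" alone: matches files and directories at every depth.
    everything = glob(os.path.join(base, "**"), recursive=True)

    # "**" followed by a separator: only directories match.
    dirs_only = glob(os.path.join(base, "**", ""), recursive=True)

    # Evaluate inside the with-block, while the tree still exists.
    has_file = any(os.path.isfile(p) for p in everything)
    only_dirs = all(os.path.isdir(p) for p in dirs_only)

print(has_file)   # True
print(only_dirs)  # True
```

This is why the file pattern has to come after the separator, as in '**/*.pdf': the ** part walks the directories and the *.pdf part picks the files inside each one.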

Another option is to use the rglob() functionality of a pathlib.Path object.

For example:

from pathlib import Path

for pdf in Path(".").rglob("*.pdf"):
    print(pdf)

...will print every path ending in .pdf from the current working directory and any of its subdirectories.
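If a list is needed instead of printed output, the same rglob() call can be wrapped in a small function. This is a sketch under assumptions: find_pdfs is a hypothetical helper name, and the sample files exist only to make the example self-contained.

```python
import tempfile
from pathlib import Path


def find_pdfs(directory):
    """Return a sorted list of PDF files under directory, at any depth."""
    # is_file() skips the unusual case of a directory whose name ends in .pdf.
    return sorted(p for p in Path(directory).rglob("*.pdf") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "new").mkdir()
    (root / "a.pdf").touch()
    (root / "new" / "b.pdf").touch()
    (root / "new" / "notes.txt").touch()

    found = [p.relative_to(root).as_posix() for p in find_pdfs(root)]

    # rglob("*.pdf") behaves like glob() with "**/" added in front of the pattern.
    same_as_glob = sorted(root.rglob("*.pdf")) == sorted(root.glob("**/*.pdf"))

print(found)         # ['a.pdf', 'new/b.pdf']
print(same_as_glob)  # True
```

For the path in the question, the call would be find_pdfs("D:/Documents").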
