bash - Count files in subdirectories and list the deepest directories - Stack Overflow

I want a list of directories with the number of mp3 files in each, sorted in descending order, simply to see which directories contain the most files.


My command

## Relevant command
find . -mindepth 1 -type f -iname "*.mp3" -printf '%P\0' |
  awk -F/ -vRS='\0' '{n[$1]++}; END{for (i in n) {printf(n[i]" "i" \n")};}' > ./foo.txt

sort -rno ./foo.txt ./foo.txt

## Full command (output improvements only)
find . -mindepth 1 -type f -iname "*.mp3" -printf '%P\0' |
  awk -F/ -vRS='\0' '{n[$1]++}; END{for (i in n) {printf("%03d",n[i]);printf("   ");printf(substr(i,0,60));printf("\n")}; if(length(n)==0) print "NO mp3 found." }' > ./foo.txt
sort -rno ./foo.txt ./foo.txt

Directory structure

./dir_1/fileA.mp3
./dir_2/subdir_1/fileB.mp3
./dir_2/subdir_2/fileC.mp3
./dir_2/subdir_2/fileD.mp3
...

Output

# What I get:
003  dir_2
001  dir_1

# What I want:
002  dir_2/subdir_2
001  dir_2/subdir_1
001  dir_1

The Problem

It only prints the topmost directories, not the deepest ones, and it folds the mp3 counts of all subdirectories into each top-level total.

I can't increase -mindepth because the depth varies.


It would be okay to have both, like this:

003  dir_2
002  dir_2/subdir_2
001  dir_2/subdir_1
001  dir_1

I tried find's -links 2 test, but it only works with -type d, not with -type f.

Asked by Jonathan

Comments:
  • The reason your code doesn't work is that $1 is not the directory. If you change {n[$1]++} to {$NF=""; n[$0]++}, for example, it may work as you intended. – jhnc
  • @EdMorton Thanks for this detail, I noticed that when trying to store the result in a variable. Yes, whitespace may occur; a portable version would be great, of course! – Jonathan
  • Jonathan, FYI: you'd need to add -v OFS="/" if you were going to do $NF="" as @jhnc suggests, otherwise every / in the directory path would be changed to a blank. – Ed Morton
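
The two suggestions in the comments combine into roughly the following. This is only a minimal sketch, not the commenters' exact code: the function name count_by_dir is hypothetical, and it reads newline-terminated paths from stdin (assuming no newlines in names) so that it runs in any POSIX awk. The question's '%P\0' output would additionally need -v RS='\0', which not every awk accepts.

```shell
# Hypothetical helper: count paths per parent directory.
# Reads one relative file path per line on stdin (assumes no newlines in names).
count_by_dir() {
  awk -F/ -v OFS=/ '
    {
      $NF = ""   # blank the file name; $0 is rebuilt using OFS="/"
      n[$0]++    # the key is now "dir/sub/" (note the trailing slash)
    }
    END { for (d in n) printf "%03d  %s\n", n[d], d }
  ' | LC_ALL=C sort -rn
}

# Same paths as the example tree in the question
printf '%s\n' \
  dir_1/fileA.mp3 \
  dir_2/subdir_1/fileB.mp3 \
  dir_2/subdir_2/fileC.mp3 \
  dir_2/subdir_2/fileD.mp3 | count_by_dir
```

Without -v OFS=/ the rebuilt $0 would be joined with blanks, which is the problem Ed Morton points out.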

3 Answers

Except for formatting the count and the error message, awk doesn't seem to be needed.

Since you seem to be using GNU utilities, which generally accept a NUL-delimiter option, you can count the directories directly by printing %h (the directory part of each path) instead of %P. For example:

find . -type f -iname '*.mp3' -printf '%h\0' |
sort -z |
uniq -zc |
sort -zrn |
tr '\0' '\n'

Or explicitly using gawk for its ability to sort the array itself, and to format the counts:

find . -type f -iname '*.mp3' -printf '%h\0' |
gawk -v RS='\0' '
    {n[$0]++}
    END {
        PROCINFO["sorted_in"]="@val_num_desc"
        if (NR) for (i in n) printf "%03d   %s\n",n[i],i
        else print "NO mp3 found."
    }
'
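
Since a portable version was asked for in the comments, here is a hedged sketch of the same idea without GNU find's -printf or the -z options. The function name count_mp3_dirs is hypothetical. It assumes that directory names contain no newlines and that find supports -iname (GNU and BSD find both do).

```shell
# Hypothetical helper: per-directory mp3 counts without find -printf.
# Assumes directory names contain no newlines.
count_mp3_dirs() {
  find "${1:-.}" -type f -iname '*.mp3' -exec sh -c '
    for f do
      printf "%s\n" "${f%/*}"   # strip the file name, keep the directory
    done' sh {} + |
  sort | uniq -c | sort -rn
}

# Usage: count below the current directory
count_mp3_dirs .
```

The parameter expansion ${f%/*} plays the role of %h here: it removes the last "/" and everything after it. The counts come out space-padded by uniq -c rather than zero-padded.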

Setup:

mkdir -p dir_{1..2} dir_2/subdir_{1..2}
touch ./dir_1/fileA.mp3 ./dir_2/subdir_1/fileB.mp3 ./dir_2/subdir_2/file{C,D}.mp3

One awk approach using OP's \0 terminated filenames:

 find . -mindepth 1 -type f -iname "*.mp3" -printf '%P\0' |
 awk -vRS='\0' '
 { match($0,/\/[^/]+$/)                      # find last "/" plus file name
   count[substr($0,1,RSTART-1)]++            # strip off the file name and use the directory name(s) as the index in count[]
 }
 END {
   if (NR==0)
      print "NO mp3 found."
   else
      for (dir in count)
          printf "%03d %s\n",count[dir],dir
 }'

This generates:

001 dir_1
001 dir_2/subdir_1
002 dir_2/subdir_2

Piping the output to sort -rn generates:

002 dir_2/subdir_2
001 dir_2/subdir_1
001 dir_1

If we remove all mp3 files and run again this generates:

NO mp3 found.
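
One edge case in the awk approach above: with %P, an mp3 sitting directly in the starting directory has no "/" in its path, so match() fails, RSTART is 0, and the file is counted under an empty directory name. The sketch below reports such files under "." instead. The function name dir_counts is hypothetical, and it reads newline-terminated paths (assuming no newlines in names) so that it does not depend on NUL support in awk.

```shell
# Hypothetical helper: count paths per directory, mapping top-level files to ".".
dir_counts() {
  awk '
    {
      # RSTART is 0 when the path has no "/" (file in the starting directory)
      dir = match($0, "/[^/]+$") ? substr($0, 1, RSTART - 1) : "."
      count[dir]++
    }
    END {
      if (NR == 0) print "NO mp3 found."
      else for (dir in count) printf "%03d %s\n", count[dir], dir
    }
  '
}

printf '%s\n' top.mp3 dir_1/fileA.mp3 dir_1/fileB.mp3 | dir_counts | sort -rn
# 002 dir_1
# 001 .
```

The regex is passed as a string rather than a /.../ literal, so the "/" inside the bracket expression needs no escaping.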

Suppose you have this file tree:

$ tree .
.
├── dir_1
│   └── fileA.mp3
└── dir_2
    ├── subdir_1
    │   ├── fileB.mp3
    │   ├── sub_subdir_1
    │   └── sub_subdir_2
    └── subdir_2
        ├── fileC.mp3
        ├── fileD.mp3
        ├── sub_subdir_1
        └── sub_subdir_2
            ├── fileE.mp3
            └── fileF.mp3

9 directories, 6 files 

You can use Ruby to process the NUL delimited output from find:

find . -mindepth 1 -type f -iname "*.mp3" -printf '%P\0' |
ruby -e '$<.read.split(/\0/).
        map{|s| File.split(s) }.
        group_by{|d,f| d}.
        map{|k,v| [k, v.length]}.
        sort_by{|k,v| -v}.
        each{|k,v| printf("%03i\t%s\n", v, k)}
'   

# On BSD it is 
find . -mindepth 1 -type f -iname '*.mp3' -print0 | ...

Output:

002 ./dir_2/subdir_2
002 ./dir_2/subdir_2/sub_subdir_2
001 ./dir_2/subdir_1
001 ./dir_1

Or use Ruby's ability to walk the directory recursively directly:

ruby -e 'Dir.glob("#{ARGV[0]}/**/*.mp3").
        map{|s| File.split(s) }.
        group_by{|d,f| d}.
        map{|k,v| [k, v.length]}.
        sort_by{|k,v| -v}.
        each{|k,v| printf("%03i\t%s\n", v, k)}
' .  
# same output

One of your desired outputs seemed to be a total of all the files below each directory. You can use this Ruby to do that:

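ruby -e '
col = Hash.new
# For each directory that directly holds an mp3, count every mp3 at or below it
Dir.glob("#{ARGV[0]}/**/*.mp3").map { |s| File.split(s) }.
    each { |p, _| col[p] = Dir.glob("#{p}/**/*.mp3").length }

# Print the totals, largest first
col.sort_by { |k, v| -v }.each { |k, v|
    puts "#{v}\t#{k}"
}' .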

Output:

4   ./dir_2/subdir_2
2   ./dir_2/subdir_2/sub_subdir_2
1   ./dir_1
1   ./dir_2/subdir_1
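
The Ruby above only reports directories that directly contain an mp3, so a pure parent such as dir_2 is absent from the totals. To get the "both" listing from the question, every ancestor of each file has to be credited. A minimal awk sketch of that idea follows. The function name cumulative_counts is hypothetical. It expects relative paths such as those printed by find's '%P\n', one per line, and assumes no newlines in names.

```shell
# Hypothetical helper: cumulative counts for every ancestor directory.
# Reads one relative file path per line on stdin.
cumulative_counts() {
  awk -F/ '
    {
      path = ""
      for (i = 1; i < NF; i++) {             # every component except the file name
        path = (i == 1) ? $i : path "/" $i
        n[path]++                            # credit this ancestor
      }
    }
    END { for (d in n) printf "%03d  %s\n", n[d], d }
  ' | LC_ALL=C sort -k1,1nr -k2
}

# Same paths as the example tree in the question
printf '%s\n' \
  dir_1/fileA.mp3 \
  dir_2/subdir_1/fileB.mp3 \
  dir_2/subdir_2/fileC.mp3 \
  dir_2/subdir_2/fileD.mp3 | cumulative_counts
# 003  dir_2
# 002  dir_2/subdir_2
# 001  dir_1
# 001  dir_2/subdir_1
```

Files that sit directly in the starting directory have no ancestor component in their relative path, so this sketch does not count them. Ties are ordered by directory name.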