I want a list of directories with the number of mp3 files in each, sorted in descending order, simply to see which directories contain the most files.
My command
## Relevant command
find . -mindepth 1 -type f -iname "*.mp3" -printf '%P\0' |
awk -F/ -vRS='\0' '{n[$1]++} END{for (i in n) printf "%s %s\n", n[i], i}' > ./foo.txt
sort -rno ./foo.txt ./foo.txt
## Full command (output improvements only)
find . -mindepth 1 -type f -iname "*.mp3" -printf '%P\0' |
awk -F/ -vRS='\0' '{n[$1]++} END{for (i in n) printf "%03d %s\n", n[i], substr(i,1,60); if (length(n)==0) print "NO mp3 found."}' > ./foo.txt
sort -rno ./foo.txt ./foo.txt
Directory structure
./dir_1/fileA.mp3
./dir_2/subdir_1/fileB.mp3
./dir_2/subdir_2/fileC.mp3
./dir_2/subdir_2/fileD.mp3
...
Output
# What I get:
003 dir_2
001 dir_1
# What I want:
002 dir_2/subdir_2
001 dir_2/subdir_1
001 dir_1
The Problem
It only prints the top-level directories, not the deepest ones: the mp3 counts of subdirectories are summed into their top-level parent.
I can't simply increase -mindepth, because the depth varies.
It would be okay to have both, like this:
003 dir_2
002 dir_2/subdir_2
001 dir_2/subdir_1
001 dir_1
I tried find's -links 2 test, but it only works with -type d, not -type f.
3 Answers
Except for formatting the count and the error message, awk doesn't seem to be needed.
Since you seem to be using GNU utilities, which generally accept a NUL-delimiter option, you can count the directories directly if you use %h instead of %P. For example:
find . -type f -iname '*.mp3' -printf '%h\0' |
sort -z |
uniq -zc |
sort -zrn |
tr '\0' '\n'
Or, explicitly using gawk for its ability to sort the array itself and to format the counts:
find . -type f -iname '*.mp3' -printf '%h\0' |
gawk -v RS='\0' '
{n[$0]++}
END {
PROCINFO["sorted_in"]="@val_num_desc"
if (NR) for (i in n) printf "%03d %s\n",n[i],i
else print "NO mp3 found."
}
'
Setup:
mkdir -p dir_{1..2} dir_2/subdir_{1..2}
touch ./dir_1/fileA.mp3 ./dir_2/subdir_1/fileB.mp3 ./dir_2/subdir_2/file{C,D}.mp3
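You can try the first pipeline in this answer without touching real data. The minimal sketch below wraps it in a function and runs it on a throwaway copy of the question's example tree. The name count_mp3_per_dir is hypothetical. The sketch assumes GNU find, sort and uniq, because -printf and the -z options are GNU extensions.

```shell
#!/bin/sh
# count_mp3_per_dir is a hypothetical helper name; needs GNU find, sort and uniq.
count_mp3_per_dir() {
  # %h prints the directory part of each matching path, NUL-terminated
  find "${1:-.}" -type f -iname '*.mp3' -printf '%h\0' |
    sort -z |      # bring identical directory names together
    uniq -zc |     # one "count directory" record per group
    sort -zrn |    # highest count first
    tr '\0' '\n'   # turn the NUL terminators into newlines for display
}

# Rebuild the question's example tree in a temporary directory and run it there.
demo=$(mktemp -d)
mkdir -p "$demo/dir_1" "$demo/dir_2/subdir_1" "$demo/dir_2/subdir_2"
touch "$demo/dir_1/fileA.mp3" "$demo/dir_2/subdir_1/fileB.mp3" \
      "$demo/dir_2/subdir_2/fileC.mp3" "$demo/dir_2/subdir_2/fileD.mp3"
(cd "$demo" && count_mp3_per_dir .)
rm -rf "$demo"
```

uniq -c right-aligns the counts, so expect leading blanks. The relative order of directories with equal counts depends on sort's tie-breaking and the locale.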
One awk approach, using the OP's \0-terminated filenames:
find . -mindepth 1 -type f -iname "*.mp3" -printf '%P\0' |
awk -vRS='\0' '
{ match($0,/\/[^\/]+$/)                  # find the last "/" plus the file name
  # strip off the file name and use the directory path as an index in count[];
  # files directly under "." contain no "/", so RSTART is 0 and they count as "."
  count[RSTART ? substr($0,1,RSTART-1) : "."]++
}
END {
  if (NR==0)
    print "NO mp3 found."
  else
    for (dir in count)
      printf "%03d %s\n",count[dir],dir
}'
This generates:
001 dir_1
001 dir_2/subdir_1
002 dir_2/subdir_2
Piping the output to sort -rn generates:
002 dir_2/subdir_2
001 dir_2/subdir_1
001 dir_1
If we remove all mp3 files and run it again, this generates:
NO mp3 found.
Suppose you have this file tree:
$ tree .
.
├── dir_1
│ └── fileA.mp3
└── dir_2
├── subdir_1
│ ├── fileB.mp3
│ ├── sub_subdir_1
│ └── sub_subdir_2
└── subdir_2
├── fileC.mp3
├── fileD.mp3
├── sub_subdir_1
└── sub_subdir_2
├── fileE.mp3
└── fileF.mp3
9 directories, 6 files
You can use Ruby to process the NUL-delimited output from find:
find . -mindepth 1 -type f -iname "*.mp3" -printf '%P\0' |
ruby -e '$<.read.split(/\0/).
map{|s| File.split(s) }.
group_by{|d,f| d}.
map{|k,v| [k, v.length]}.
sort_by{|k,v| -v}.
each{|k,v| printf("%03i\t%s\n", v, k)}
'
# On BSD (whose find has no -printf), use -print0 instead:
find . -mindepth 1 -type f -iname '*.mp3' -print0 | ...
Output (from the -print0 variant, which keeps the leading ./; with GNU find's %P it is omitted):
002 ./dir_2/subdir_2
002 ./dir_2/subdir_2/sub_subdir_2
001 ./dir_2/subdir_1
001 ./dir_1
Or use Ruby's ability to walk the directory tree recursively on its own:
ruby -e 'Dir.glob("#{ARGV[0]}/**/*.mp3").
map{|s| File.split(s) }.
group_by{|d,f| d}.
map{|k,v| [k, v.length]}.
sort_by{|k,v| -v}.
each{|k,v| printf("%03i\t%s\n", v, k)}
' .
# same output
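BSD find has no -printf, so the %h trick from the first answer is not available there. The sketch below does the same per-directory count with dirname, sort and uniq. It assumes no newlines in names, because the records are newline-delimited. It also assumes a find that supports -iname: GNU and BSD find both do, although -iname is not in POSIX. The name count_mp3_per_dir_portable is hypothetical.

```shell
#!/bin/sh
# count_mp3_per_dir_portable is a hypothetical helper name.
# Assumes no newlines in names (records are newline-delimited) and a find with -iname.
count_mp3_per_dir_portable() {
  find "${1:-.}" -type f -iname '*.mp3' -exec dirname {} \; |  # one directory per file
    sort |     # bring identical directory names together
    uniq -c |  # "count directory" per group
    sort -rn   # highest count first
}

# usage: count_mp3_per_dir_portable /path/to/music
```

It starts one dirname process per file, so on large collections it is noticeably slower than -printf '%h'.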
One of your desired outputs seemed to be a total of all the mp3 files below each directory. You can use this Ruby to do that (it reports only directories that directly contain mp3 files):
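ruby -e '
col=Hash.new
# one entry per directory that directly holds an mp3, valued with the number of
# mp3s at or below it; base: (Ruby 2.5+) stops the path being parsed as a glob
Dir.glob("#{ARGV[0]}/**/*.mp3").map{|s| File.dirname(s) }.uniq.
each{|p| col[p]=Dir.glob("**/*.mp3", base: p).length }
col.sort_by{|k, v| -v }.each{|k, v|
puts "#{v}\t#{k}"
}' .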
Output:
4 ./dir_2/subdir_2
2 ./dir_2/subdir_2/sub_subdir_2
1 ./dir_1
1 ./dir_2/subdir_1
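The Ruby above only lists directories that hold mp3 files directly, so dir_2 from the question's example never appears. The sketch below produces the "both" output the question asks for. It credits each file to its own directory and to every ancestor directory. It assumes GNU find (for -printf) and no newlines in directory names. The name mp3_totals_with_ancestors is hypothetical. The search root (.) also gets a line, carrying the overall total.

```shell
#!/bin/sh
# mp3_totals_with_ancestors is a hypothetical helper name.
# Assumes GNU find (-printf) and no newlines in directory names.
mp3_totals_with_ancestors() {
  find "${1:-.}" -type f -iname '*.mp3' -printf '%h\n' |
    awk '
      {
        n = split($0, part, "/")       # "./dir_2/subdir_2" -> ".", "dir_2", "subdir_2"
        path = part[1]
        if (path != "") total[path]++  # part[1] is empty only for absolute paths
        for (i = 2; i <= n; i++) {     # extend the prefix one component at a time
          path = path "/" part[i]
          total[path]++                # the file also counts for this ancestor
        }
      }
      END { for (d in total) printf "%03d %s\n", total[d], d }
    ' |
    sort -rn
}

# usage: mp3_totals_with_ancestors .
```

On the question's example tree this should print 004 ., 003 ./dir_2 and 002 ./dir_2/subdir_2, followed by the two 001 lines in an order that depends on sort's tie-breaking.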
$1 is not the directory. If you change {n[$1]++} to {$NF=""; n[$0]++}, for example, it may work as you intended. – jhnc
Use -v OFS="/" if you are going to do $NF="" as @jhnc suggests; otherwise every / in the directory path would be changed to a blank. – Ed Morton
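The sketch below demonstrates what the two comments above describe. For readability it uses newline-separated paths instead of the question's NUL-separated ones, which is an assumption.

```shell
#!/bin/sh
# Assumption: newline-separated relative paths, as find -printf '%P\n' would print them.
path='dir_2/subdir_2/fileC.mp3'

# With -F/, $1 is only the first path component, so everything under dir_2 shares one bucket.
printf '%s\n' "$path" | awk -F/ '{ print $1 }'
# -> dir_2

# Assigning to $NF rebuilds $0 with OFS; -v OFS=/ keeps the slashes (plus a trailing one).
printf '%s\n' "$path" | awk -F/ -v OFS=/ '{ $NF = ""; print }'
# -> dir_2/subdir_2/

# Without it, OFS defaults to a blank, so every / turns into a space.
printf '%s\n' "$path" | awk -F/ '{ $NF = ""; print }'
# -> 'dir_2 subdir_2 ' (note the trailing blank)

# The suggested change applied to the counting step from the question:
printf '%s\n' dir_1/fileA.mp3 dir_2/subdir_1/fileB.mp3 \
              dir_2/subdir_2/fileC.mp3 dir_2/subdir_2/fileD.mp3 |
  awk -F/ -v OFS=/ '{ $NF = ""; n[$0]++ } END { for (d in n) printf "%03d %s\n", n[d], d }' |
  sort -rn
# first line: 002 dir_2/subdir_2/
```

Note the trailing / left behind by $NF="". A file directly in the search root has a single field, so its key becomes the empty string.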