This Bash script searches for directories named node_modules (or a specified folder) within the current working directory and lists them sorted by size, last modification date, or path.
The problem is that the sorting is not working, especially the sorting by size. Sorting by size must be in decreasing order, from the largest to the smallest.
#!/bin/bash
start_time=$(date +%s.%N)
find_dir="node_modules"
sort_by="path"
while [[ "$1" =~ ^- ]]; do
    case $1 in
        -t|--target)
            find_dir="$2"
            shift 2
            ;;
        -s|--sort)
            sort_by="$2"
            shift 2
            ;;
        *)
            echo "Invalid option: $1"
            exit 1
            ;;
    esac
done
dirs=$(find $(pwd) -type d -name "$find_dir" 2>/dev/null)
json="\"paths\": ["
total_size_kb=0
declare -a results
for dir in $dirs; do
    parent_dir=$(dirname "$dir")
    if [[ ! "$parent_dir" =~ /$find_dir/ ]]; then
        last_mod=$(stat -f "%Sm" -t "%d/%m/%Y %H:%M:%S" "$dir")
        size_kb=$(du -sk "$dir" | awk '{print $1}')
        total_size_kb=$((total_size_kb + size_kb))
        size_mb=$(echo "scale=2; $size_kb/1024" | bc)
        if (( $(echo "$size_mb < 1" | bc -l) )); then
            size=$(echo "scale=2; $size_kb" | bc)
            size="${size} KB"
        elif (( $(echo "$size_mb >= 1024" | bc -l) )); then
            size=$(echo "scale=2; $size_mb/1024" | bc)
            size="${size} GB"
        else
            size="${size_mb} MB"
        fi
        results+=("{\"path\": \"$dir\", \"last_mod\": \"$(date -r "$dir" -u +%dd)\", \"size\": \"$size\"}")
    fi
done
if [[ "$sort_by" == "size" ]]; then
    results=$(for r in "${results[@]}"; do echo "$r"; done | sort -t '"' -k 10 -n -r)
elif [[ "$sort_by" == "path" ]]; then
    results=$(for r in "${results[@]}"; do echo "$r"; done | sort -t '"' -k 4)
elif [[ "$sort_by" == "last-mod" ]]; then
    results=$(for r in "${results[@]}"; do echo "$r"; done | sort -t '"' -k 8)
fi
json="${json}$(echo "$results" | tr '\n' ',' | sed 's/,$//')"
json="${json}]"
end_time=$(date +%s.%N)
elapsed_time=$(echo "$end_time - $start_time" | bc)
total_size_mb=$(echo "scale=2; $total_size_kb/1024" | bc)
json="{
\"releasable_space\": \"${total_size_mb} MB\",
\"search_completed\": \"$(echo $elapsed_time | cut -d'.' -f1)s\",
${json}
}"
echo "$json"
edited 2 days ago by John Kugelman
asked 2 days ago by PaulPaul
1 Answer
Here's an attempt at refactoring your code with Python 2/3. The dependencies are part of the Standard Library so they're available with any Python installation:
import os, sys, fnmatch, time, json, argparse
The downside of not using any external libraries (on top of being compatible with Python 2 & 3) is that you have to reinvent the wheel. For example "humanizing" a size in bytes or recursively "finding" the files in a directory:
def humanize_date(timestamp):
    # %H:%M:%S is the portable spelling of %T
    return time.strftime("%d/%m/%Y %H:%M:%S", time.localtime(timestamp))

def humanize_size(size):
    # Divide by 1024 until the value fits the current unit
    size = float(size)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"):
        if size < 1024.0:
            size = round(size, 2)
            return ("%d %s" if size.is_integer() else "%.2f %s") % (size, unit)
        size /= 1024.0

def find(path, name = "*"):
    # Yield path itself (if it matches), then every matching entry below it
    if os.path.lexists(path):
        if fnmatch.fnmatch(os.path.basename(path), name):
            yield path
        if not os.path.islink(path) or path.endswith("/"):
            for (rootpath, dirnames, filenames) in os.walk(path):
                for direntry in (dirnames + filenames):
                    if fnmatch.fnmatch(direntry, name):
                        yield os.path.join(rootpath, direntry)
Now comes the most important function for implementing the logic; it takes a directory as argument and returns a dict inspired by os.stat_result, with its st_size and st_mtime keys changed to "the sum of the size of all files in the directory" and "the modification time of the most recently modified file" respectively:
def dstat(path):
    result = None
    for direntry in find(path):
        stats = os.lstat(direntry)
        if result is None:
            # First entry is the directory itself: copy all its st_* fields
            result = {k: getattr(stats, k) for k in dir(stats) if k.startswith("st_")}
            continue
        result["st_size"] += stats.st_size
        if stats.st_mtime > result["st_mtime"]:
            result["st_mtime"] = stats.st_mtime
    return result
Note: dstat stands for "directory stat" and also "dict stat".
Then the "main program" just needs to parse the command line, sort the results and output JSON:
cli = argparse.ArgumentParser(description='Dummy npkill implementation that outputs JSON')
cli.add_argument('-d', '--directory', default='.', help='Set the directory from which to begin searching (defaults to ".")')
cli.add_argument('-s', '--sort', required=False, choices=['size', 'path', 'last-mod'], help='Sort results by: "size", "path" or "last-mod"')
cli.add_argument('-t', '--target', default='node_modules', help='Specify the name of the directories you want to search (defaults to "node_modules")')
args = cli.parse_args()

results = [ (p, dstat(p)) for p in find(args.directory, name=args.target) ]

if args.sort is not None:
    # sorted() passes each (path, stats) tuple to the key as a single argument
    sort_key = (
        (lambda item: item[0]            ) if args.sort == 'path' else
        (lambda item: item[1]["st_size"] ) if args.sort == 'size' else
        (lambda item: item[1]["st_mtime"])
    )
    # The question asks for sizes from the largest to the smallest
    results = sorted(results, key = sort_key, reverse = (args.sort == 'size'))

results = [
    {
        "path": path,
        "last_mod": humanize_date(stats["st_mtime"]),
        "size": humanize_size(stats["st_size"]),
    }
    for path, stats in results
]
print(json.JSONEncoder().encode(results))
A few thoughts
The problem that you have with the sorting of the dates is that you're trying to compare strings that do not reflect the correct ordering; e.g. a string comparison puts 20/12/2024 before 21/01/2003, even though it is the later date. You need to use numbers (seconds since the epoch) for the comparisons and convert them to your date format after the sorting.
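To make the point about date strings concrete, here is a minimal sketch with two made-up entries (the paths and dates are hypothetical, chosen to match the example above). It sorts once on the formatted string and once on the epoch seconds, formatting only afterwards:

```python
import calendar
import time

def humanize_date(timestamp):
    # Same day/month/year layout as in the question; UTC keeps the demo stable
    return time.strftime("%d/%m/%Y %H:%M:%S", time.gmtime(timestamp))

# Hypothetical (path, seconds since the epoch) pairs
entries = [
    ("/a/node_modules", calendar.timegm((2024, 12, 20, 0, 0, 0))),
    ("/b/node_modules", calendar.timegm((2003, 1, 21, 0, 0, 0))),
]

# Sorting on the formatted text compares the day of the month first
by_string = sorted(entries, key=lambda e: humanize_date(e[1]))
# Sorting on the number is chronological; format only for display
by_epoch = sorted(entries, key=lambda e: e[1])

string_order = [humanize_date(ts) for _, ts in by_string]
epoch_order = [humanize_date(ts) for _, ts in by_epoch]
print(string_order)
print(epoch_order)
```

The string sort lists the 2024 entry first, while the epoch sort lists the 2003 entry first, which is the chronological order.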
A difference I can see between du -sb and dstat_result["st_size"] is that my dstat will add the size of a hard-linked file once per link, while du counts it only once.
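If matching du on hard links matters, one possible approach is to remember each file's (st_dev, st_ino) pair and skip inodes that were already counted. The sketch below is not part of the code above: dir_size_dedup is a hypothetical helper, it only sums regular directory entries returned as files by os.walk (it ignores the size of the directories themselves), and the demonstration assumes a filesystem that supports hard links.

```python
import os
import shutil
import tempfile

def dir_size_dedup(path):
    """Sum file sizes under path, counting each inode only once."""
    seen = set()
    total = 0
    for root, dirnames, filenames in os.walk(path):
        for name in filenames:
            st = os.lstat(os.path.join(root, name))
            key = (st.st_dev, st.st_ino)  # identifies the underlying inode
            if key in seen:
                continue  # another hard link to a file already counted
            seen.add(key)
            total += st.st_size
    return total

# Demonstration: one 1000-byte file with a second hard link to it
demo = tempfile.mkdtemp()
original = os.path.join(demo, "a.bin")
with open(original, "wb") as fh:
    fh.write(b"x" * 1000)
os.link(original, os.path.join(demo, "b.bin"))
demo_size = dir_size_dedup(demo)
shutil.rmtree(demo)
print(demo_size)
```

A plain sum over both names would report 2000 bytes here; the deduplicated sum reports 1000.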
I didn't implement the elapsed time or the recoverable size, as they aren't part of the main logic required by the program; though I still added the argument parsing ;-)
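For completeness, the two omitted fields could be added along these lines. This is only a sketch under assumptions: build_report is a hypothetical helper, the key names releasable_space, search_completed and paths are copied from the question's script, and the sizes are assumed to be raw byte counts collected before any humanizing.

```python
import json
import time

def build_report(results, start_time, end_time):
    """results: list of (path, size_in_bytes) tuples (assumed shape)."""
    total_bytes = sum(size for _, size in results)
    return {
        "releasable_space": "%.2f MB" % (total_bytes / (1024.0 * 1024.0)),
        "search_completed": "%ds" % int(end_time - start_time),
        "paths": [{"path": p, "size": s} for p, s in results],
    }

start = time.time()
# Hypothetical search results standing in for the real directory scan
results = [
    ("/a/node_modules", 3 * 1024 * 1024),
    ("/b/node_modules", 1 * 1024 * 1024),
]
report = build_report(results, start, time.time())
print(json.dumps(report, indent=2))
```

Keeping the total as a number until the very end avoids the rounding and unit problems that come from adding already formatted values.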
stat -f format is BSD-specific so it'll only work on BSD computers. What OS do you target? macOS? – Fravadona Commented 2 days ago