
json - Search specific folder and sort directories by size, modification date, or path - Stack Overflow


This Bash script searches for directories named node_modules (or a specified folder) under the current working directory and reports each one's size, last modification date, and path.

The problem is that the sorting does not work, especially sorting by size. Sorting by size must be decreasing, from largest to smallest.

#!/bin/bash

start_time=$(date +%s.%N)

find_dir="node_modules"
sort_by="path"

while [[ "$1" =~ ^- ]]; do
  case $1 in
    -t|--target)
      find_dir="$2"
      shift 2
      ;;
    -s|--sort)
      sort_by="$2"
      shift 2
      ;;
    *)
      echo "Invalid option: $1"
      exit 1
      ;;
  esac
done

dirs=$(find $(pwd) -type d -name "$find_dir" 2>/dev/null)
json="\"paths\": ["

total_size_kb=0

declare -a results

for dir in $dirs; do
  parent_dir=$(dirname "$dir")
  
  if [[ ! "$parent_dir" =~ /$find_dir/ ]]; then
    last_mod=$(stat -f "%Sm" -t "%d/%m/%Y %H:%M:%S" "$dir")
    size_kb=$(du -sk "$dir" | awk '{print $1}')
    
    total_size_kb=$((total_size_kb + size_kb))

    size_mb=$(echo "scale=2; $size_kb/1024" | bc)
    
    if (( $(echo "$size_mb < 1" | bc -l) )); then
      size=$(echo "scale=2; $size_kb" | bc)
      size="${size} KB"
    elif (( $(echo "$size_mb >= 1024" | bc -l) )); then
      size=$(echo "scale=2; $size_mb/1024" | bc)
      size="${size} GB"
    else
      size="${size_mb} MB"
    fi

    results+=("{\"path\": \"$dir\", \"last_mod\": \"$(date -r "$dir" -u +%dd)\", \"size\": \"$size\"}")
  fi
done

if [[ "$sort_by" == "size" ]]; then
  results=$(for r in "${results[@]}"; do echo "$r"; done | sort -t '"' -k 10 -n -r)
elif [[ "$sort_by" == "path" ]]; then
  results=$(for r in "${results[@]}"; do echo "$r"; done | sort -t '"' -k 4)
elif [[ "$sort_by" == "last-mod" ]]; then
  results=$(for r in "${results[@]}"; do echo "$r"; done | sort -t '"' -k 8)
fi

json="${json}$(echo "$results" | tr '\n' ',' | sed 's/,$//')"

json="${json}]"

end_time=$(date +%s.%N)
elapsed_time=$(echo "$end_time - $start_time" | bc)

total_size_mb=$(echo "scale=2; $total_size_kb/1024" | bc)

json="{
  \"releasable_space\": \"${total_size_mb} MB\", 
  \"search_completed\": \"$(echo $elapsed_time | cut -d'.' -f1)s\",
  ${json}
}"

echo "$json"
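As far as I can tell, the size sort fails for two reasons: with `-t '"'`, field 10 of each JSON line is the literal word `size` (the value is field 12), so `sort -n` parses 0 for every line; and even on the right field, a numeric sort compares only the leading number of humanized strings like `900 KB` and `2 MB`, ignoring the unit. A small Python sketch of the second problem, using the same unit names the script emits:

```python
# Humanized sizes, as emitted by the script above.
sizes = ["900 KB", "2 MB", "1.5 GB", "30 KB"]

# Naive numeric sort on the leading number ignores the unit,
# so 900 KB incorrectly outranks 2 MB and 1.5 GB.
naive = sorted(sizes, key=lambda s: float(s.split()[0]), reverse=True)

# Converting everything back to KB first gives the intended order.
factor = {"KB": 1, "MB": 1024, "GB": 1024 ** 2}

def to_kb(s):
    number, unit = s.split()
    return float(number) * factor[unit]

correct = sorted(sizes, key=to_kb, reverse=True)

print(naive)    # ['900 KB', '30 KB', '2 MB', '1.5 GB']
print(correct)  # ['1.5 GB', '2 MB', '900 KB', '30 KB']
```

The simplest fix on the Bash side would be to keep the raw `size_kb` value in each record and sort on that, humanizing only when printing.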

edited 2 days ago by John Kugelman · asked 2 days ago by Paul
  • 1 This would be more robust, faster and easier with something with native support for JSON and good OS support. Ruby, Perl, Python, JS come to mind. – dawg Commented 2 days ago
  • @dawg: I need something easy to run on any type of machine without having to install additional components. It works, it just doesn't work properly when sorting. – Paul Commented 2 days ago
  • 1 Well, stat -f format is BSD-specific so it'll only work on BSD computers. What OS do you target? macOS? – Fravadona Commented 2 days ago
  • At the moment macOS, but if it works without problems on macOS, Linux and Windows, that would be great. – Paul Commented 2 days ago
  • 1 You cannot get the modification time of a file in a portable way, and there's the problem of generating JSON with proper escaping. Perl and Python can do that natively, and are installed by default on macOS and almost any Linux/WSL. – Fravadona Commented 2 days ago
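As the comments point out, `stat -f` is BSD-specific, but Python (preinstalled on macOS and most Linux distributions) can read a file's modification time portably. A minimal sketch, using a hypothetical `last_mod` helper:

```python
import os, time

def last_mod(path):
    # st_mtime is seconds since the epoch on every platform,
    # so the same code works on macOS, Linux, and Windows.
    mtime = os.stat(path).st_mtime
    return time.strftime("%d/%m/%Y %H:%M:%S", time.localtime(mtime))

print(last_mod("."))
```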

1 Answer


Here's an attempt at refactoring your code with Python 2/3. The dependencies are part of the Standard Library so they're available with any Python installation:

import os, sys, fnmatch, time, json, argparse

The downside of not using any external libraries (on top of being compatible with Python 2 & 3) is that you have to reinvent the wheel. For example "humanizing" a size in bytes or recursively "finding" the files in a directory:

def humanize_date(timestamp):
    return time.strftime("%d/%m/%Y %T", time.localtime(timestamp))

def humanize_size(size):
    size = float(size)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"):
        if size < 1024.0:
            size = round(size, 2)
            return ("%d %s" if size.is_integer() else "%.2f %s") % (size, unit)
        size /= 1024.0
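A few sample calls show the rounding behaviour (the function is repeated here so the snippet runs on its own):

```python
def humanize_size(size):
    size = float(size)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"):
        if size < 1024.0:
            size = round(size, 2)
            # Whole numbers print without decimals, others with two.
            return ("%d %s" if size.is_integer() else "%.2f %s") % (size, unit)
        size /= 1024.0

print(humanize_size(512))          # 512 B
print(humanize_size(1536))         # 1.50 KiB
print(humanize_size(3 * 1024**3))  # 3 GiB
```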

def find(path, name = "*"):
    if os.path.lexists(path):
        if fnmatch.fnmatch(os.path.basename(path), name):
            yield path
        if not os.path.islink(path) or path.endswith("/"):
            for (rootpath, dirnames, filenames) in os.walk(path):
                for direntry in (dirnames + filenames):
                    if fnmatch.fnmatch(direntry, name):
                        yield os.path.join(rootpath, direntry)

Now comes the most important function for implementing the logic; it takes a directory as argument and returns a dict inspired by os.stat_result, with its st_size and st_mtime keys changed to "the sum of the size of all files in the directory" and "the modification time of the most recently modified file" respectively:

def dstat(path):
    result = None
    for direntry in find(path):
        stats = os.lstat(direntry)
        if result is None:
            result = {k: getattr(stats, k) for k in dir(stats) if k.startswith("st_")}
            continue
        result["st_size"] += stats.st_size
        if stats.st_mtime > result["st_mtime"]:
            result["st_mtime"] = stats.st_mtime
    return result

note: dstat stands for "directory stat" and also "dict stat"


Then the "main program" just needs to parse the command-line, sort the results and output a JSON:

cli = argparse.ArgumentParser(description='Dummy npkill implementation that outputs JSON')
cli.add_argument('-d', '--directory', default='.', help='Set the directory from which to begin searching (defaults to ".")')
cli.add_argument('-s', '--sort', required=False, choices=['size', 'path', 'last-mod'], help='Sort results by: "size", "path" or "last-mod"')
cli.add_argument('-t', '--target', default='node_modules', help='Specify the name of the directories you want to search (defaults to "node_modules")')

args = cli.parse_args()

results = [ (p, dstat(p)) for p in find(args.directory, name=args.target) ]

if args.sort is not None:
    # Each result is a (path, stats) tuple, so the key function takes a
    # single argument; size sorts largest-first, as requested.
    sort_key = (
        (lambda r: r[0]              ) if args.sort == 'path' else
        (lambda r: r[1]["st_size"]   ) if args.sort == 'size' else
        (lambda r: r[1]["st_mtime"]  )
    )
    results = sorted(results, key = sort_key, reverse = (args.sort == 'size'))

results = [
    {
        "path": path,
        "last_mod": humanize_date(stats["st_mtime"]),
        "size": humanize_size(stats["st_size"]),
    }
    for path, stats in results
]

print(json.JSONEncoder().encode(results))

A few thoughts

The problem that you have with the sorting of the dates is that you're comparing strings whose lexicographic order doesn't reflect chronological order; e.g., why would 21/01/2003 be "lesser" than 20/12/2024? You need to use numbers (seconds since the epoch) for the comparisons and convert them to your date format after the sorting.
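A short illustration of the pitfall, sorting the same dates as strings and as epoch seconds:

```python
import time

dates = ["21/01/2003", "20/12/2024", "05/03/2010"]

# Lexicographic sort compares the day-first strings character by
# character, which has nothing to do with chronology.
as_strings = sorted(dates)

# Parsing each date to seconds since the epoch restores the true order.
def to_epoch(d):
    return time.mktime(time.strptime(d, "%d/%m/%Y"))

as_numbers = sorted(dates, key=to_epoch)

print(as_strings)  # ['05/03/2010', '20/12/2024', '21/01/2003']
print(as_numbers)  # ['21/01/2003', '05/03/2010', '20/12/2024']
```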

A difference I can see between du -sb and dstat_result["st_size"] is that my dstat counts the size of hard-linked files once per link, while du counts each file only once.
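If matching du's behaviour matters, hard links can be deduplicated by remembering each file's `(st_dev, st_ino)` pair and counting the inode once. A sketch, using a hypothetical `dir_size_dedup` helper:

```python
import os

def dir_size_dedup(path):
    """Sum file sizes under path, counting each hard-linked inode once."""
    seen = set()
    total = 0
    for rootpath, dirnames, filenames in os.walk(path):
        for name in filenames:
            st = os.lstat(os.path.join(rootpath, name))
            key = (st.st_dev, st.st_ino)  # identifies the inode uniquely
            if key not in seen:
                seen.add(key)
                total += st.st_size
    return total
```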

I didn't implement the elapsed time or the recoverable size, since neither is part of the main logic required by the program; though I still added the argument parsing ;-)
