
How can I efficiently measure average CPU usage for a group of processes (bash coprocs + their children) on Linux?


BACKGROUND: I wrote a bash function called forkrun that parallelizes code for you in the same way that parallel or xargs -P does. It is faster than parallel, and similar in speed to (but with more options than) xargs -P. forkrun works by spawning a number of persistent bash coprocs, each of which runs an infinite loop (until some end condition is met) that reads N lines' worth of data (passed on stdin) and runs those lines through whatever you are parallelizing.
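For illustration, a minimal hypothetical sketch of that worker pattern (this is not forkrun's actual code; worker_loop, N, and the example command are placeholders):

# grab up to N lines per iteration and pass them as arguments to the
# command being parallelized; exit once stdin runs dry
worker_loop() {
    local -a lines
    while mapfile -t -n "$N" lines && (( ${#lines[@]} > 0 )); do
        "$@" "${lines[@]}"
    done
}
# e.g. spawn one worker as a coproc fed from some shared input:
# coproc W0 { worker_loop gzip -k; } < "$input_fifo"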

GOAL: I am trying to determine the total CPU usage of all these coprocs combined. This needs to include the "overhead" CPU usage of the coproc running its loop and the total cumulative CPU usage of whatever it is running for you (which may or may not be a different PID, and could change on every loop iteration). So, I need the total CPU usage from all the coproc PIDs + their children (and grandchildren, and great-grandchildren, and so on).
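For reference, the live members of such a process tree can be enumerated from procfs alone; a sketch, assuming /proc/<pid>/task/<tid>/children is available (kernel 3.5+ with CONFIG_PROC_CHILDREN):

# print a PID followed by all of its live descendants, recursively
list_descendants() {
    local pid=$1 child
    echo "$pid"
    for child in $(cat /proc/"$pid"/task/*/children 2>/dev/null); do
        list_descendants "$child"
    done
}

Note this only sees processes that are still alive; CPU time from already-reaped children shows up only in the parent's cutime/cstime.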

END GOAL: I want to have forkrun dynamically determine how many coprocs to spawn based on runtime conditions. Part of my strategy for this involves figuring out and tracking how much CPU time (on average) each of these coprocs is taking up. The current implementation for "dynamic coproc spawning" does this by looking at total system load (by polling /proc/stat) before and after some coprocs are spawned, but this is very noisy since it is influenced by everything else that is happening on the system.
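That system-wide sampling looks roughly like this (a sketch assuming the standard /proc/stat layout, whose first line aggregates user/nice/system/idle/... ticks across all CPUs):

# diff the aggregate busy ticks over a 1-second window; every other
# process on the machine moves these counters too, hence the noise
read -r _ user nice system _ < /proc/stat
busy0=$(( user + nice + system ))
sleep 1
read -r _ user nice system _ < /proc/stat
echo "busy ticks in window: $(( user + nice + system - busy0 ))"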

IDEA: My initial idea was to use /proc/<PID>/stat for each coproc PID and pull out and sum the utime, stime, cutime and cstime fields. Unfortunately, the child fields only take into account CPU time for waited-on children, i.e., children you fork and then wait on. They don't include the CPU time of things like "stuff run in subshells".
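That idea as a sketch (a hypothetical helper, assuming a mounted procfs): utime, stime, cutime and cstime are fields 14-17 of /proc/<PID>/stat, in clock ticks, and the comm field can itself contain spaces, so everything through the closing parenthesis is stripped first.

get_cpu_ticks() {
    local pid=$1 stat
    local -a f
    read -r stat < "/proc/${pid}/stat" || return 1
    f=( ${stat##*) } )    # strip "pid (comm) "; f[0] is field 3 (state)
    # utime stime cutime cstime = overall fields 14-17 = f[11]..f[14]
    echo $(( f[11] + f[12] + f[13] + f[14] ))
}

Divide the result by the clock tick rate (getconf CLK_TCK, typically 100) to get seconds.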

NOTE: I'd rather avoid using external tools for this. I spent a lot of effort making sure forkrun has virtually no dependencies - currently its only hard dependencies are a recent bash version, a mounted procfs, and some binaries for basic filesystem operations (rm, mkdir). If an external tool is absolutely required then fine, but I'm 99% sure I can pull this info out of procfs somehow.

Thanks in advance!

EDIT: here is an example of the code run by the coprocs that I am trying to track CPU time for. I want to (from the process that forks these coprocs) figure out how much CPU time each coproc is using as it runs, in order to dynamically determine whether or not to spawn more of these coprocs.


asked Jan 31 at 16:36 by jkool702; edited Feb 16 at 19:38
  • Can you clarify whether you are looking to aggregate the total CPU while the overall process is running, or only when the whole process tree terminates? – dash-o Commented Feb 6 at 1:41
  • I'm trying to get the CPU usage as they are running. Each of the processes I want to monitor is a bash coproc that (in a loop) reads chunks of N lines from the data passed on stdin and then calls whatever you are parallelizing with those N lines as arguments. This continues until you run out of data on stdin. The goal is to figure out how much CPU all of these processes are using (including the CPU usage of whatever they are repeatedly running in a loop) to figure out if I should fork more of these "worker coprocs" or not. – jkool702 Commented Feb 16 at 19:44

2 Answers


The script used by the OP is large (>2000 lines) and complex, so it is not practical to analyze directly, but from the OP's comments it uses coproc, background processes, etc.

A possible path:

  • Modify the EXIT trap to record the CPU time of each terminating process, conditional on an environment variable that specifies when to accumulate the log. At the end of the job, sum up the results.

Running the script without the environment variable set has no impact. When it is set, each process's totals get recorded, and a short python/perl/awk script can aggregate the required measures (CPU, ...); a sketch follows the trap examples below.

# use $BASHPID (not $$) so each coproc/subshell records its own stat line
trap '[ -n "$CPUSUM" ] && cat "/proc/$BASHPID/stat" >> "$CPUSUM"' EXIT

Or, using a helper function:

record_cpu() {
    # use $BASHPID (not $$) so each coproc/subshell records its own stat line
    [ -n "$CPUSUM" ] && cat "/proc/$BASHPID/stat" >> "$CPUSUM"
}

trap record_cpu EXIT

The same trap might have to be applied to other signals, if the code uses TERM and similar signals to coordinate work between coprocs.
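For the aggregation step, a sketch in awk (assuming $CPUSUM holds the stat lines appended by the traps above):

awk '
# sum utime+stime+cutime+cstime (overall fields 14-17) per recorded line;
# comm (field 2) may contain spaces, so locate the token ending in ")"
{
    for (i = 1; i <= NF; i++) if ($i ~ /\)$/) { base = i; break }
    ticks += $(base+12) + $(base+13) + $(base+14) + $(base+15)
}
END { printf "total: %.2f CPU-seconds\n", ticks / 100 }  # assumes CLK_TCK == 100
' "$CPUSUM"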

Another possible solution requires access to prctl - via a "C" program, Python (prctl or ctypes), or Perl (Linux::Prctl). The basic idea is to run a small wrapper that marks itself as a child subreaper and collects CPU time from all children, including forked children, background children, etc.

Something like:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s command [args...]\n", argv[0]);
        return 1;
    }

    fprintf(stderr, "Setting this process as a child subreaper...\n");

    /* Become a child subreaper: orphaned descendants get re-parented to
       this process instead of init, so wait() below reaps all of them. */
    if (prctl(PR_SET_CHILD_SUBREAPER, 1) != 0) {
        perror("prctl");
        return 1;
    }

    if (fork() == 0) {
        /* Child: run the wrapped command. */
        execvp(argv[1], &argv[1]);
        perror("execvp");
        _exit(127);
    }

    /* Reap every descendant until none remain (wait fails with ECHILD). */
    int status;
    while (wait(&status) > 0)
        ;
    return 0;
}

Compile with cc child_wait.c -o child_wait

And execute: time ./child_wait forkrun ... Since the wrapper reaps every descendant, the child CPU totals reported by time cover the whole process tree.
