bash - For loop for every two files in an array using SLURM - Stack Overflow

I have a list of files I store in an array :

file1.txt  file2.txt  file3.txt  file4.txt  file5.txt  file6.txt  file7.txt  file8.txt

declare -a files=( *.txt )

I need to iterate over the files two at a time. I can do that with:

for (( i=0; i<${#files[@]} ; i+=2 )) ; do    
   echo "${files[i]}" "${files[i+1]}"
done

file1.txt file2.txt
file3.txt file4.txt
file5.txt file6.txt
file7.txt file8.txt

I would like to parallelize that using the $SLURM_ARRAY_TASK_ID variable, so that each array task handles one pair of files. Any ideas, or is there a better way than storing the files in an array?

EDIT :

...
#SBATCH --array=1-4

declare -a files=( *.txt )

index1=$((2 * SLURM_ARRAY_TASK_ID - 2))   
index2=$((2 * SLURM_ARRAY_TASK_ID - 1))

if [[ -f "${files[index1]}" && -f "${files[index2]}" ]]; then
    echo "${files[index1]} & ${files[index2]}"
    # Run the command only when both files of the pair exist
    mycommand "${files[index1]}" "${files[index2]}"
fi

I will need to adapt the --array value according to the number of pairs I have.
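
One way to avoid editing the --array value by hand might be to compute it at submission time, since options given on the sbatch command line take precedence over #SBATCH directives in the script. A minimal sketch, assuming the batch script is saved as job.sh (a hypothetical name) and that a trailing unpaired file should simply be ignored; array_spec is an illustrative helper, not part of Slurm:

```shell
#!/usr/bin/env bash
# array_spec N: print a 1-based --array range with one task per complete
# pair among N files; prints nothing when there is no complete pair.
array_spec() {
    local pairs=$(( $1 / 2 ))   # integer division drops an unpaired file
    if [ "$pairs" -gt 0 ]; then
        echo "1-$pairs"
    fi
}

# Usage sketch (job.sh is a hypothetical name for the batch script):
#   files=( *.txt )
#   sbatch --array="$(array_spec "${#files[@]}")" job.sh
```

With the eight files above this yields 1-4, which matches the index arithmetic 2 * SLURM_ARRAY_TASK_ID - 2 and 2 * SLURM_ARRAY_TASK_ID - 1 used in the script.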

asked Mar 21 at 19:53 by pedro, edited Mar 23 at 13:24
  • Have a look at GNU Parallel – pmf Commented Mar 21 at 19:56
  • Replace echo with your command and append at the end a space and &? – Cyrus Commented Mar 21 at 20:06
  • Can I use $SLURM_ARRAY_TASK_ID with parallel? You mean my_command "${files[i]}" "${files[i+1]}" & ? – pedro Commented Mar 21 at 20:27
  • @pedro: Yes, that's how it was meant. – Cyrus Commented Mar 21 at 21:14
  • Thanks for the help. I edited the topic, with the way I found. There is probably a better solution, but that one works for me. – pedro Commented Mar 23 at 13:22
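
The suggestion in the comments (append & to the command) can be sketched without Slurm arrays at all, as background jobs inside a single script. This is only an illustration: run_pairs is a hypothetical helper, it starts every pair at once with no throttling, and it needs a final wait so the script does not exit before the jobs finish.

```shell
#!/usr/bin/env bash
# run_pairs CMD FILE...: start CMD on each consecutive pair of files in
# the background, then block until all of them have finished.
# A trailing unpaired file is ignored.
run_pairs() {
    local cmd=$1
    shift
    while [ "$#" -ge 2 ]; do
        "$cmd" "$1" "$2" &   # one background job per pair
        shift 2
    done
    wait                     # do not return before every job is done
}

# Usage sketch: run_pairs mycommand *.txt
```

Because the jobs run concurrently, their output order is not guaranteed.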

1 Answer

The way you did it is approximately correct, but also potentially flaky. You're globbing *.txt independently in each Slurm process, and that will fail if the list of .txt files changes while your processes are starting up.

A better way (which is also more efficient) is to make your Slurm launcher script print the globbed filenames into a text file with two filenames per line. Then pass that file to every process and let those processes use SLURM_ARRAY_TASK_ID to select one line from the file (which contains the two filenames that process should act on).
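
A rough sketch of that approach follows. The helper names write_pairs and read_pair and the file names pairs.tsv and job.sh are all hypothetical, and the sketch assumes file names contain no tabs or newlines and that every task can read the list file (for example on a shared filesystem).

```shell
#!/usr/bin/env bash
# write_pairs LIST FILE...: write the file names to LIST, two per line,
# separated by a tab, and print the number of pairs written.
# A trailing unpaired file is skipped.
write_pairs() {
    local list=$1 n=0
    shift
    : > "$list"                               # create or truncate the list
    while [ "$#" -ge 2 ]; do
        printf '%s\t%s\n' "$1" "$2" >> "$list"
        n=$((n + 1))
        shift 2
    done
    echo "$n"
}

# read_pair LIST N: set the variables first and second from line N
# (1-based) of LIST; return 1 if that line does not exist.
read_pair() {
    local line tab
    line=$(sed -n "${2}p" "$1") || return 1
    [ -n "$line" ] || return 1
    tab=$(printf '\t')
    first=${line%%"$tab"*}
    second=${line#*"$tab"}
}

# Launcher, run once before submitting:
#   n=$(write_pairs pairs.tsv *.txt)
#   sbatch --array="1-$n" job.sh
# Inside job.sh, in each array task:
#   read_pair pairs.tsv "$SLURM_ARRAY_TASK_ID" && mycommand "$first" "$second"
```

The glob is expanded exactly once, in the launcher, so every task sees the same pairing even if .txt files appear or disappear afterwards.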
