I have a list of files that I store in an array:
file1.txt file2.txt file3.txt file4.txt file5.txt file6.txt file7.txt file8.txt
declare -a files=( *.txt )
I need to iterate over the files two at a time. I can do that with:
for (( i=0; i<${#files[@]} ; i+=2 )) ; do
echo "${files[i]}" "${files[i+1]}"
done
file1.txt file2.txt
file3.txt file4.txt
file5.txt file6.txt
file7.txt file8.txt
I would like to parallelize this using the $SLURM_ARRAY_TASK_ID
variable, so that each pair gets its own task. Any ideas, or is there a better way than storing the files in an array?
EDIT:
...
#SBATCH --array=1-4
declare -a files=( *.txt )
# Array task IDs start at 1, bash array indices start at 0
index1=$((2 * SLURM_ARRAY_TASK_ID - 2))
index2=$((2 * SLURM_ARRAY_TASK_ID - 1))
if [[ -f "${files[index1]}" && -f "${files[index2]}" ]]; then
    echo "${files[index1]} & ${files[index2]}"
    mycommand "${files[index1]}" "${files[index2]}"
fi
I will need to adapt the --array
value according to the number of pairs I have.
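A minimal sketch of sizing the array automatically, assuming the batch script above is saved under the hypothetical name pair_job.sh: count the .txt files, round up to whole pairs, and pass the range to sbatch on the command line, where it takes precedence over the #SBATCH --array line. The helper count_pairs is illustrative, and the script only prints the sbatch command instead of submitting it.

```shell
#!/usr/bin/env bash
# count_pairs: number of array tasks needed when each task handles two files.
# Rounds up, so an odd leftover file still gets a task.
count_pairs() {
    echo $(( ($# + 1) / 2 ))
}

shopt -s nullglob          # with no .txt files, the array is empty instead of holding "*.txt"
files=( *.txt )
npairs=$(count_pairs "${files[@]}")

if (( npairs > 0 )); then
    # pair_job.sh is a hypothetical name for the batch script above.
    # An --array given to sbatch on the command line overrides the #SBATCH --array line.
    echo "would run: sbatch --array=1-$npairs pair_job.sh"
else
    echo "no .txt files found" >&2
fi
```

With the eight files from the question this yields --array=1-4, matching the hard-coded range. With an odd file count, the last task would be skipped by the -f check in the script above, because its second file is missing.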
1 Answer
The way you did it is approximately correct, but also potentially flaky. You're globbing *.txt
independently in each Slurm process, and that breaks if the list of .txt files changes while your processes are starting up: different tasks can then see different arrays and pair up or skip the wrong files.
A better way (which is also more efficient) is to make your Slurm launcher script print the globbed filenames into a text file with two filenames per line. Then pass that file to every process and let those processes use SLURM_ARRAY_TASK_ID to select one line from the file (which contains the two filenames that process should act on).
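A minimal sketch of that approach, assuming filenames contain no whitespace (each line is split on spaces when it is read back). The function names make_pair_list and read_pair, the pairs.txt file, and the fixed task number are hypothetical. In a real job the task number would come from SLURM_ARRAY_TASK_ID, which is 1-based here because the array is declared as 1-N.

```shell
#!/usr/bin/env bash
# Launcher side: glob once and write two filenames per line.
# printf reuses its format string until all arguments are consumed, so
# '%s %s\n' emits the files in pairs; an odd last file gets an empty second field.
make_pair_list() {
    printf '%s %s\n' "$@"
}

# Worker side: print line number $2 (1-based) of the pair list file $1.
read_pair() {
    sed -n "${2}p" "$1"
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
pairs="$workdir/pairs.txt"

# Real launcher would be: make_pair_list *.txt > pairs.txt
make_pair_list file1.txt file2.txt file3.txt file4.txt \
               file5.txt file6.txt file7.txt file8.txt > "$pairs"

# Real worker would use task=$SLURM_ARRAY_TASK_ID; 3 is a stand-in for local testing.
task=3
read -r first second < <(read_pair "$pairs" "$task")
echo "task $task: $first & $second"   # prints: task 3: file5.txt & file6.txt
```

On the cluster, the array size then follows from the list itself (for example --array=1-$(wc -l < pairs.txt)). Because every task reads the same file, all tasks agree on the pairing even if new .txt files appear after submission.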
Replace echo with your command and append a space and & at the end? – Cyrus, commented Mar 21 at 20:06
$SLURM_ARRAY_TASK_ID with parallel? You mean my_command "${files[i]}" "${files[i+1]}" &? – pedro, commented Mar 21 at 20:27
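The comments suggest a simpler alternative: start each pair in the background with & and then wait for all of them. Here is a minimal sketch of that idea. mycommand is a hypothetical stand-in shell function that only prints its arguments. Background jobs share the CPUs of the single allocation running the script, while each array task is scheduled as its own job.

```shell
#!/usr/bin/env bash
# mycommand is a hypothetical stand-in for the real program.
mycommand() {
    echo "processing $1 and $2"
}

files=( file1.txt file2.txt file3.txt file4.txt file5.txt file6.txt file7.txt file8.txt )
log=$(mktemp)
trap 'rm -f "$log"' EXIT

for (( i=0; i<${#files[@]}; i+=2 )); do
    # The trailing & starts this pair in the background; the loop continues at once.
    mycommand "${files[i]}" "${files[i+1]}" >> "$log" &
done
wait   # block until every background job has finished

sort "$log"   # completion order can vary between runs, so sort for stable output
```

For the pairs to actually run concurrently inside one job, the allocation needs enough cores, for example via --cpus-per-task.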