My colleague and I both use the same SLURM-based cluster. I use Nextflow daily on this server without any problem, and he uses Snakemake + SLURM daily on the same server. Today he tried to run a Nextflow workflow for the first time, using my config and my main.nf file.
On his side, however, the jobs are marked as completed without an exit status and without a '.exit' file (the .exit file is created later, once the job has actually ended; see below).
Feb-14 14:32:09.928 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[jobId: 6208569; id: 49; name: MAKE_MINI_BAM (PCRFree); status: COMPLETED; exit: -; error: -; workDir:path/to/nf-workdir/9b/ad1fedb4a9b1f37e07629735f35987 started: 1739539509896; exited: -; ]
Furthermore, when we look at sacct, the job is still running (?):
$ sacct --cluster nautilus -j 6208569
JobID        JobName    Partition  Account    AllocCPUS  State      ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
6208569      nf-MAKE_M+ standard   thorax     2          RUNNING    0:0
6208569.bat+ batch                 thorax     2          RUNNING    0:0
6208569.ext+ extern                thorax     2          RUNNING    0:0
In the .nextflow.log there is also this warning, "Invalid user: ?":
Feb-17 14:50:22.215 [Task monitor] DEBUG nextflow.executor.SlurmExecutor - [SLURM] invalid status line: `squeue: error: Invalid user: ?`
Feb-17 14:50:22.215 [Task monitor] DEBUG nextflow.executor.SlurmExecutor - [SLURM] invalid status line: ``
Feb-17 14:51:22.275 [Task monitor] DEBUG nextflow.executor.SlurmExecutor - [SLURM] invalid status line: `squeue: error: Invalid user: ?`
Feb-17 14:51:22.276 [Task monitor] DEBUG nextflow.executor.SlurmExecutor - [SLURM] invalid status line: ``
On my side, there is no problem. What could be the source of this problem? Thanks!
PS: I don't have any specific config hidden in my home directory.
PS2: I also asked on the Nextflow Slack: https://nextflow.slack/archives/C02T98A23U7/p1739540301422199
Asked Feb 17 at 15:42 by Pierre, edited Feb 17 at 16:03

2 Answers
Nextflow polls the SLURM queue using:
squeue --noheader -o "%i %t" -t all -u <username>
In the first instance, I would have your colleague try this command with his username. Internally, Nextflow checks System.getProperty('user.name') to get the username and appends it to the squeue command. Also check whether $USER is set in the environment where Nextflow is run; it might not be set when starting an interactive session, for example.
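One way to run the checks described above is to compare the username as the shell sees it with the one the JVM reports. This is only a rough sketch: it assumes a POSIX shell and that `java -XshowSettings:properties -version` prints a `user.name = ...` line on your Java build, and the helper name `jvm_user_name` is made up for illustration. If the JVM line shows `?`, that would be consistent with the `Invalid user: ?` message in the log.

```shell
#!/bin/sh
# Diagnostic sketch: compare the username seen by the shell and by the JVM.

# Username according to the OS account database, and the $USER variable.
os_user="$(id -un)"
echo "id -un : ${os_user}"
echo "\$USER  : ${USER:-<unset>}"

# Hypothetical helper: read "java -XshowSettings:properties" output on stdin
# and print the value of the first "user.name = ..." line.
jvm_user_name() {
    sed -n 's/^[[:space:]]*user\.name = //p' | head -n 1
}

# Ask whichever java is first on PATH what it thinks the username is.
# (The property listing is written to stderr, hence the 2>&1.)
if command -v java >/dev/null 2>&1; then
    jvm_user="$(java -XshowSettings:properties -version 2>&1 | jvm_user_name)"
    echo "JVM user.name : ${jvm_user:-<empty>}"
    if [ "${jvm_user}" != "${os_user}" ]; then
        echo "WARNING: the JVM and the shell disagree on the username"
    fi
else
    echo "java not found on PATH"
fi

# Run the polling command quoted above with the OS username, if SLURM is available.
if command -v squeue >/dev/null 2>&1; then
    squeue --noheader -o "%i %t" -t all -u "${os_user}"
fi
```

Run it in the same session (same conda/mamba environment, same node) from which Nextflow is launched, since a different environment may put a different java on the PATH.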
In the end, that was 'just' a problem with the Java installed alongside Nextflow with conda/mamba: Nextflow was using the wrong local version of Java. I asked my collaborator to install both programs and to set up PATH and JAVA_HOME, and everything went fine.
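For anyone hitting the same thing, a small check like the following can show which Java is likely to be picked up before launching Nextflow. It is a sketch under assumptions: the precedence NXF_JAVA_HOME, then JAVA_HOME, then PATH is my approximation of what the Nextflow launcher does (check the launcher script for your version), and `resolve_java` is a hypothetical helper, not part of Nextflow.

```shell
#!/bin/sh
# Sketch: show which java binary is likely to be used.

# Hypothetical helper. ASSUMED precedence: NXF_JAVA_HOME > JAVA_HOME > PATH.
resolve_java() {
    if [ -n "${NXF_JAVA_HOME:-}" ]; then
        echo "${NXF_JAVA_HOME}/bin/java"
    elif [ -n "${JAVA_HOME:-}" ]; then
        echo "${JAVA_HOME}/bin/java"
    else
        command -v java || echo "<no java on PATH>"
    fi
}

java_bin="$(resolve_java)"
echo "java candidate : ${java_bin}"

# Print the version banner of that binary, if it exists and is executable.
if [ -x "${java_bin}" ]; then
    "${java_bin}" -version 2>&1 | head -n 1
else
    echo "candidate is not an executable file"
fi
```

If the candidate points into a conda/mamba environment you did not expect, adjusting PATH and JAVA_HOME as described above is the fix that worked here.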