
python - Snakemake in cluster different ways - Stack Overflow


When running snakemake on a cluster, if we don't have specific core/memory requirements for any rules, what is the difference between:

  • Using the classic way, i.e. calling snakemake on the login node, telling it that the executor is slurm and that we want X jobs with X cores each, optionally with a profile config file (1 job = 1 rule); see the sketch after this list
  • Using snakemake like a normal tool, calling it with srun inside an sbatch script without telling it that this is a slurm environment (1 job = whole pipeline)
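For comparison, a minimal sketch of what the first option could look like (assuming Snakemake ≥ 8 with the snakemake-executor-plugin-slurm installed; the job count of 100 is a placeholder):

# Run on the login node: snakemake itself stays there and submits
# one Slurm job per rule instance via the slurm executor plugin.
snakemake --executor slurm                      \
          --jobs 100                            \
          --latency-wait 30                     \
          --configfile configs/S_with_N.yaml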

Example of the second option:

#!/bin/bash
#SBATCH --job-name=test_sbatch_snakemake
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64

srun snakemake --cores 64                       \
               --latency-wait 30                \
               --nolock                         \
               --configfile configs/S_with_N.yaml

I have a pipeline where I don't have any specific capacity requirements for my rules (I just want as many of them running in parallel as possible), and I think the second option is easier to implement.

asked 2 days ago by Kiffikiffe

1 Answer

With the second way, your whole workflow has to wait in the queue before any rule instance can start. With the first way, the resource demand is spread across the jobs that snakemake submits, so each rule has a chance to start earlier than it would with the second way.
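As a hedged illustration of that first way, a profile directory can carry the executor settings so that each rule instance is submitted as its own small Slurm job (the path, job cap, and values below are placeholders, assuming Snakemake ≥ 8 with the slurm executor plugin):

# Hypothetical profile so the "classic" invocation needs no long flag list.
mkdir -p profiles/slurm
cat > profiles/slurm/config.yaml <<'EOF'
executor: slurm      # submit one Slurm job per rule instance
jobs: 100            # cap on concurrently queued/running rule jobs
latency-wait: 30
EOF

# The coordinating snakemake process claims almost nothing; only the
# per-rule jobs compete in the Slurm queue.
snakemake --profile profiles/slurm --configfile configs/S_with_N.yaml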

(I use a third way: I sbatch a snakemake command that uses slurm as the executor. On our cluster, running the main snakemake process on the submit/login node is considered bad practice. This main snakemake doesn't wait long in the queue, because it doesn't have to claim many resources.)
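For concreteness, a minimal sketch of that third way might look like the sbatch script below (assuming Snakemake ≥ 8 with the slurm executor plugin; the job cap, CPU count, memory, and time limit are placeholders):

#!/bin/bash
#SBATCH --job-name=snakemake_controller
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2        # the coordinating process needs very little
#SBATCH --mem=4G
#SBATCH --time=2-00:00:00        # must outlive the whole workflow

# This job only coordinates; every rule instance is submitted as its
# own Slurm job by the slurm executor plugin.
snakemake --executor slurm                      \
          --jobs 100                            \
          --latency-wait 30                     \
          --configfile configs/S_with_N.yaml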
