workflow - How best to handle Snakemake jobs where programs produce the same file names

I have an issue with a Snakemake workflow. Because of a limitation in a certain piece of software, the output of one of my rules is always named SolFix.

I handle this by renaming the file to the output name declared in my rule:

... [*command to generate SolFix*]; mv SolFix {output.SolFix_ebv}

Multiple rules within the same group ({params.group}) produce the SolFix file. As a result, there have been occasions where a parallel job in the Snakemake workflow overwrites SolFix before another job has had the chance to move it.

So far, I have worked around some of these issues by running the program inside subfolders:

mkdir -p subfolder_{params.breed}/rule_subfolder_{params.breed}; cd subfolder_{params.breed}/rule_subfolder_{params.breed}; [*command to generate SolFix*]; mv SolFix {output.SolFix_ebv}

However, the work is becoming split across too many nested subfolders.

I have also tried to limit these issues by adding dependencies between the rules/jobs that tend to finish at around the same time. However, this mostly just serializes my workflow.

Are there alternative solutions (that scale) I can explore to resolve this issue?

Asked 2 days ago by Damilola Decarls; edited by oguz ismail.

I would also go with sub-folders; I can't think of a better way. Or mix and match: run a serialized workflow in each of several sub-folders to strike a balance between performance and awkwardness. – X Zhang, commented 2 days ago

1 Answer

What person came along and down-voted this question?! I voted it up again. This is a common problem in Snakemake, and it has a good answer: shadow rules.

https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#shadow-rules

Using shadow mode is not as mysterious as it may seem. It does exactly what you are trying to do with subfolders, but in a consistent and robust way. The "shadow" is a temporary directory in which the rule runs and writes its output; Snakemake then moves the declared output files back to the real working directory and deletes everything else. It's great for cleaning up temporary files and for resolving conflicts like yours.
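
To make the mechanism concrete, here is a minimal Python sketch of the idea behind a shadow directory. It is an illustration under simplifying assumptions, not Snakemake's actual implementation: each job runs in its own temporary directory, and only the declared output is moved to its final, unique name. The names run_in_shadow, fake_solver and demo, and the breed_a/breed_b/breed_c job tags, are hypothetical; fake_solver stands in for the real program that always writes SolFix.

```python
import shutil
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_in_shadow(cmd, produced_name, dest):
    """Run `cmd` in a fresh scratch directory and move the file it wrote
    (always named `produced_name`) to `dest`. Everything else is deleted."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as shadow:
        # Each call gets its own working directory, so concurrent jobs that
        # all write "SolFix" never touch the same file.
        subprocess.run(cmd, cwd=shadow, check=True)
        # Only the declared output is moved back under its final name.
        shutil.move(str(Path(shadow) / produced_name), str(dest))
    # Leaving the `with` block removes the scratch directory together with
    # any temporary files the program left behind.
    return dest


def fake_solver(tag):
    """Hypothetical stand-in for the real program: it always writes 'SolFix'."""
    return [sys.executable, "-c", f"open('SolFix', 'w').write({tag!r})"]


def demo(out_dir, tags):
    """Run one job per tag in parallel; return {tag: contents of its output}."""
    out_dir = Path(out_dir)
    with ThreadPoolExecutor(max_workers=len(tags)) as pool:
        jobs = {
            tag: pool.submit(run_in_shadow, fake_solver(tag), "SolFix",
                             out_dir / f"{tag}.SolFix_ebv")
            for tag in tags
        }
        return {tag: job.result().read_text() for tag, job in jobs.items()}
```

Even though all three jobs run at the same time and all of them write a file called SolFix, each result ends up intact in its own .SolFix_ebv file, because no two jobs ever share a working directory.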

If you have ever tried Nextflow, it essentially runs every step as a shadow rule.

The short answer: just add shadow: 'minimal' to your original rule (the simple version that ran [*command to generate SolFix*]; mv SolFix {output.SolFix_ebv}), and you should be golden. Let me know if you still have problems.
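
As a sketch of what that could look like, here is a Snakefile fragment (Snakemake syntax, so it is not runnable as plain Python). The rule name solve_ebv, the input and output paths, and the command my_solver are hypothetical placeholders for your real ones; only the shadow directive and the mv step come from the answer above.

```python
# Snakefile fragment (Snakemake syntax, not plain Python).
rule solve_ebv:
    input:
        "data/{breed}.dat"                       # hypothetical input path
    output:
        SolFix_ebv="results/{breed}.SolFix_ebv"  # unique name per job
    # Each job runs in its own temporary directory (by default under
    # .snakemake/shadow/); with "minimal", only the rule's inputs are
    # symlinked into it.
    shadow: "minimal"
    # my_solver is a placeholder for the program that always writes SolFix.
    shell:
        "my_solver {input} && mv SolFix {output.SolFix_ebv}"
```

Because every job gets its own shadow directory, jobs running in parallel can each write SolFix without clobbering one another, and the hand-made subfolder_{params.breed} layout is no longer needed.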

Source: Stack Overflow