I have an issue with a Snakemake workflow. Because of a limitation in a piece of software I use, the output file of one of my rules is always named SolFix.
I handle this by renaming the file to the path declared in the rule's output:
... [*command to generate SolFix*]; mv SolFix {output.SolFix_ebv}
Multiple rules within the same group ({params.group}) produce the SolFix file. As a result, there have been occasions where the SolFix file was overwritten by a parallel job in the Snakemake workflow before it could be moved.
Until now, I have addressed some of these issues by running the program within subfolders.
mkdir -p subfolder_{params.breed}/rule_subfolder_{params.breed}; cd subfolder_{params.breed}/rule_subfolder_{params.breed}; [*command to generate SolFix*]; mv SolFix {output.SolFix_ebv}
However, the directory structure is becoming too subdivided.
I have also tried to avoid the collisions by adding dependencies between rules/jobs that tend to finish at around the same time, but this only serializes my workflow.
Are there alternative, scalable solutions I can explore to resolve this issue?
Asked by Damilola Decarls; edited by oguz ismail.
Comment: I'd also go with sub-folders; I can't think of a better way. Or mix and match: run a serialized workflow in each of several sub-folders to roughly balance performance and awkwardness. – X Zhang
1 Answer
Who came along and down-voted this question?! I voted it up again. This is a common problem in Snakemake, and it has a good answer: shadow rules.
https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#shadow-rules
Shadow mode is not as mysterious as it may seem. It does exactly what you are doing with subfolders, but in a consistent and robust way. The "shadow" is a temporary directory (under .snakemake/shadow/ by default) in which the rule runs and writes whatever files it produces; Snakemake then moves the declared output files back to the real working directory and deletes everything else. That makes it great for cleaning up temporary files and for resolving conflicts like yours.
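To make the mechanism concrete, here is a plain-Python sketch of the same idea outside Snakemake. It is an illustration under assumptions, not Snakemake's actual implementation: run_isolated, fake_tool and demo are hypothetical names, and fake_tool stands in for the real program by writing SolFix plus a junk file. Each job gets its own temporary working directory, so several parallel jobs that all write SolFix never collide.

```python
from __future__ import annotations

import shutil
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_isolated(cmd: list[str], fixed_name: str, output: Path) -> Path:
    """Run cmd in a private scratch directory and keep only one result file.

    Each call gets its own temporary working directory, so a tool that always
    writes the same file name (e.g. "SolFix") cannot clobber a parallel job.
    """
    output = Path(output).resolve()
    output.parent.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory(prefix="shadow_") as scratch:
        # The tool sees the scratch directory as its working directory.
        subprocess.run(cmd, cwd=scratch, check=True)
        # Move the fixed-name file to its real destination before cleanup.
        shutil.move(str(Path(scratch) / fixed_name), str(output))
    # Leaving the with-block deleted everything else the tool wrote.
    return output


def fake_tool(tag: str) -> list[str]:
    """Hypothetical stand-in for the real program: writes SolFix and a junk file."""
    code = (
        "import time\n"
        f"open('SolFix', 'w').write({tag!r})\n"
        "time.sleep(0.2)  # widen the window in which jobs overlap\n"
        "open('junk.tmp', 'w').write('scratch')\n"
    )
    return [sys.executable, "-c", code]


def demo(n_jobs: int = 4) -> dict[str, str]:
    """Run n_jobs fake tools in parallel; return {tag: contents of its output}."""
    outdir = Path(tempfile.mkdtemp(prefix="results_"))
    jobs = {f"breed{i}": outdir / f"breed{i}" / "SolFix_ebv.txt" for i in range(n_jobs)}
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [
            pool.submit(run_isolated, fake_tool(tag), "SolFix", out)
            for tag, out in jobs.items()
        ]
        for fut in futures:
            fut.result()  # re-raise any error from a job
    return {tag: out.read_text() for tag, out in jobs.items()}


if __name__ == "__main__":
    for tag, text in demo().items():
        print(tag, text)
```

Running it prints each tag next to the contents of its own output file, showing that no job picked up another job's SolFix, and the junk files never reach the output directories.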
If you have ever tried Nextflow, it essentially runs every step as a shadow rule.
The short answer is: just add
shadow: 'minimal'
to your original rule (the simple version that ran [*command to generate SolFix*]; mv SolFix {output.SolFix_ebv}), and you should be golden. Let me know if you still have problems.
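A minimal sketch of what such a rule could look like follows. Only the shadow directive and the final mv come from this thread; the rule name, the {breed} wildcard, the file paths and the generate_solfix command are hypothetical placeholders (the real command is elided in the question). Note that this is a Snakefile fragment, not a standalone Python script.

```python
# Snakefile fragment. Rule name, paths, wildcard and "generate_solfix" are
# hypothetical placeholders; substitute the real ones from your workflow.
rule solfix_ebv:
    input:
        "data/{breed}/input.txt",
    output:
        SolFix_ebv="results/{breed}/SolFix_ebv.txt",
    # Run this job in its own temporary directory under .snakemake/shadow/.
    # "minimal" symlinks only the rule's declared input files into it.
    shadow: "minimal"
    # SolFix is created inside the private shadow directory, so parallel jobs
    # cannot overwrite each other's copy. After the job succeeds, Snakemake
    # moves the declared output back and deletes the rest of the shadow dir.
    shell:
        "generate_solfix {input}; mv SolFix {output.SolFix_ebv}"
```

No manual subfolders or artificial dependencies are needed: every job, including several from the same group running at once, gets its own shadow directory.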