I am facing an issue with my Nextflow pipeline: all steps are skipped after I add another step to the workflow. Below is the workflow without the extra step:
workflow BOLTZ {

    take:
    ch_samplesheet  // channel: samplesheet read from --input
    ch_versions     // channel: [ path(versions.yml) ]
    ch_boltz_ccd    // channel: [ path(boltz_ccd) ]
    ch_boltz_model  // channel: [ path(model) ]

    main:
    ch_multiqc_files = Channel.empty()

    // CREATE_SAMPLESHEET_YAML
    CREATE_SAMPLESHEET_YAML(
        ch_samplesheet,
    )

    // RUN_BOLTZ
    RUN_BOLTZ(
        CREATE_SAMPLESHEET_YAML.out.samplesheet,
        ch_boltz_model,
        ch_boltz_ccd
    )

    emit:
    versions   = ch_versions
    msa        = RUN_BOLTZ.out.msa
    structures = RUN_BOLTZ.out.structures
    confidence = RUN_BOLTZ.out.confidence
    plddt      = RUN_BOLTZ.out.plddt
}
The intended behavior of this pipeline is to take a samplesheet CSV and convert the FASTA files it lists into the YAML format required by Boltz. This alone works fine.
However, by default Boltz runs the MSA search on a cloud server, and we want to use a local ColabFold MSA search instead, so I have created variants of CREATE_SAMPLESHEET_YAML and RUN_BOLTZ that allow this:
process CREATE_SAMPLESHEET_YAML_MSA {
    cache false
    tag "$meta.id"
    label 'process_single'
    container 'docker://nbtmsh/samplesheet-utils:1.1'

    input:
    tuple val(meta), path(samplesheet)
    path ('*.a3m')

    output:
    // `path` replaces the deprecated `file` output qualifier
    tuple val(meta), path('*.yaml'), emit: samplesheet

    when:
    task.ext.when == null || task.ext.when

    script:
    """
    create-samplesheet \\
        --directory ./ \\
        --msa-dir ./ \\
        --yaml \\
        --output-file \$(sample-name --sanitise --index 0 ./*.fasta).yaml
    """

    stub:
    """
    echo "" > samplesheet.yaml
    """
}
process RUN_BOLTZ_MSA {
    tag "$meta.id"
    label 'process_medium'
    container "/srv/scratch/sbf-pipelines/proteinfold/singularity/boltz.sif"

    input:
    tuple val(meta), path(fasta)
    path ('boltz1_conf.ckpt')
    path ('ccd.pkl')
    path ('**.a3m')

    output:
    path ("boltz_results_${fasta.baseName}/processed/msa/*.npz"), emit: msa
    path ("boltz_results_${fasta.baseName}/processed/structures/*.npz"), emit: structures
    path ("boltz_results_${fasta.baseName}/predictions/${fasta.baseName}/confidence*.json"), emit: confidence
    path ("boltz_results_${fasta.baseName}/predictions/${fasta.baseName}/plddt_*.npz"), emit: plddt

    script:
    """
    boltz predict --use_msa_server "./${fasta.name}" --cache ./
    """
}
Now the workflow looks like this:
workflow BOLTZ {

    take:
    ch_samplesheet       // channel: samplesheet read from --input
    ch_versions          // channel: [ path(versions.yml) ]
    ch_boltz_ccd         // channel: [ path(boltz_ccd) ]
    ch_boltz_model       // channel: [ path(model) ]
    ch_colabfold_params  // channel: [ path(colabfold_params) ]
    ch_colabfold_db      // channel: [ path(colabfold_db) ]
    ch_uniref30          // channel: [ path(uniref30) ]

    main:
    ch_multiqc_files = Channel.empty()

    // MMSEQS_COLABFOLDSEARCH
    MMSEQS_COLABFOLDSEARCH (
        ch_samplesheet,
        ch_colabfold_params,
        ch_colabfold_db,
        ch_uniref30
    )
    ch_versions = ch_versions.mix(MMSEQS_COLABFOLDSEARCH.out.versions)

    // CREATE_SAMPLESHEET_YAML
    CREATE_SAMPLESHEET_YAML_MSA(
        ch_samplesheet,
        MMSEQS_COLABFOLDSEARCH.out.a3m
    )

    // RUN_BOLTZ
    RUN_BOLTZ_MSA(
        CREATE_SAMPLESHEET_YAML_MSA.out.samplesheet,
        ch_boltz_model,
        ch_boltz_ccd,
        MMSEQS_COLABFOLDSEARCH.out.a3m
    )

    emit:
    versions   = ch_versions
    msa        = RUN_BOLTZ_MSA.out.msa
    structures = RUN_BOLTZ_MSA.out.structures
    confidence = RUN_BOLTZ_MSA.out.confidence
    plddt      = RUN_BOLTZ_MSA.out.plddt
}
After this change is implemented, all steps in the pipeline are skipped. I have looked over all the Nextflow channel documentation and I can't seem to figure out why this may be happening.
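One behaviour that may be relevant here: a Nextflow process submits no tasks when any of its input queue channels emits nothing, and everything downstream of it then has nothing to consume either. The sketch below is a toy reproduction, not the real pipeline. `TOY_SEARCH`, `ch_samples` and `ch_db` are hypothetical names, and the assumption that one of the new ColabFold channels is empty has not been confirmed for the workflow above. It uses `count()` and `view()` to print how many items each input channel carries.

```nextflow
// Toy process standing in for MMSEQS_COLABFOLDSEARCH (hypothetical name).
process TOY_SEARCH {
    input:
    val sample
    val db

    output:
    stdout

    script:
    """
    echo "searching ${sample} against ${db}"
    """
}

workflow {
    ch_samples = Channel.of('TEST1')
    // Assumption for illustration: a database channel that never received a value
    ch_db      = Channel.empty()

    // count() emits the number of items in a channel (0 for an empty one),
    // so these lines show which input carries nothing
    ch_samples.count().view { n -> "ch_samples items: ${n}" }
    ch_db.count().view      { n -> "ch_db items: ${n}" }

    // With ch_db empty, no TOY_SEARCH task is submitted and nothing is printed here
    TOY_SEARCH(ch_samples, ch_db)
    TOY_SEARCH.out.view()
}
```

In the real workflow, the same `count().view { ... }` lines could be applied to `ch_colabfold_params`, `ch_colabfold_db` and `ch_uniref30` inside `main:`; a count of 0 would point to the channel that is holding the pipeline back.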
This is the MMSEQS_COLABFOLDSEARCH process:
[[id:TEST1], /home/z3545907/MPGAFS.fasta]
process MMSEQS_COLABFOLDSEARCH {
    tag "$meta.id"
    label 'process_high_memory'

    // Exit if running this module with -profile conda / -profile mamba
    if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
        error("Local MMSEQS_COLABFOLDSEARCH module does not support Conda. Please use Docker / Singularity / Podman instead.")
    }

    container "nf-core/proteinfold_colabfold:dev"

    input:
    tuple val(meta), path(fasta)
    path ('db/params')
    path colabfold_db
    path uniref30

    output:
    tuple val(meta), path("**.a3m"), emit: a3m
    path "versions.yml", emit: versions
    path fasta, emit: fasta

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def VERSION = '1.5.2' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions.
    """
    ln -r -s $uniref30/uniref30_* ./db
    ln -r -s $colabfold_db/colabfold_envdb* ./db

    /localcolabfold/colabfold-conda/bin/colabfold_search \\
        $args \\
        --threads $task.cpus ${fasta} \\
        ./db \\
        "result/"

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        colabfold_search: $VERSION
    END_VERSIONS
    """

    stub:
    def VERSION = '1.5.2' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions.
    """
    mkdir results
    touch results/${meta.id}.a3m

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        colabfold_search: $VERSION
    END_VERSIONS
    """
}
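A note on channel shapes: `MMSEQS_COLABFOLDSEARCH.out.a3m` is declared as `tuple val(meta), path("**.a3m")`, whereas the second input of `CREATE_SAMPLESHEET_YAML_MSA` and the fourth input of `RUN_BOLTZ_MSA` are plain `path` inputs. Below is a minimal sketch of dropping the `meta` element with `map`, assuming the downstream processes only need the files. The channel names `ch_a3m_tuples` and `ch_a3m_files` and the example path are hypothetical.

```nextflow
workflow {
    // Stand-in for MMSEQS_COLABFOLDSEARCH.out.a3m: [ meta, a3m ] tuples
    ch_a3m_tuples = Channel.of( [ [id: 'TEST1'], file('result/TEST1.a3m') ] )

    // Keep only the file element so the channel matches a plain `path` input
    ch_a3m_files = ch_a3m_tuples.map { meta, a3m -> a3m }

    ch_a3m_files.view()
}
```

Whether this mismatch contributes to the skipped steps is not established; it is a separate thing to check once the processes actually start running.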