pipeline - Nextflow skips all steps in workflow - Stack Overflow

I have been facing an issue with my Nextflow pipeline: all steps are skipped after I add another step to the workflow. Below is the pipeline without the extra step:

workflow BOLTZ {
    take:
    ch_samplesheet  // channel: samplesheet read from --input
    ch_versions     // channel: [ path(versions.yml) ]
    ch_boltz_ccd    // channel: [ path(boltz_ccd) ]
    ch_boltz_model  // channel: [ path(model) ]

    main:
    ch_multiqc_files = Channel.empty()

    // CREATE_SAMPLESHEET_YAML
    CREATE_SAMPLESHEET_YAML(
        ch_samplesheet,
    )

    // RUN_BOLTZ
    RUN_BOLTZ(
        CREATE_SAMPLESHEET_YAML.out.samplesheet,
        ch_boltz_model,
        ch_boltz_ccd
    )

    emit:
    versions   = ch_versions
    msa        = RUN_BOLTZ.out.msa
    structures = RUN_BOLTZ.out.structures
    confidence = RUN_BOLTZ.out.confidence
    plddt      = RUN_BOLTZ.out.plddt
}

The intended behavior of this pipeline is to take a samplesheet CSV, convert the FASTA files into the YAML format required by Boltz, and run Boltz on the result. This alone works fine.

However, by default Boltz runs its MSA search on a cloud server, and we want to use a local ColabFold MSA search instead, so I have created variants of CREATE_SAMPLESHEET_YAML and RUN_BOLTZ that allow this:

process CREATE_SAMPLESHEET_YAML_MSA {
    cache false
    tag "$meta.id"
    label 'process_single'

    container 'docker://nbtmsh/samplesheet-utils:1.1'

    input:
    tuple val(meta), path(samplesheet)
    path ('*.a3m')

    output:
    tuple val(meta), path('*.yaml'), emit: samplesheet

    when:
    task.ext.when == null || task.ext.when

    script:
    """
    create-samplesheet \\
        --directory ./ \\
        --msa-dir ./ \\
        --yaml \\
        --output-file \$(sample-name --sanitise --index 0 ./*.fasta).yaml
    """

    stub:
    """
    echo "" > samplesheet.yaml
    """
}

process RUN_BOLTZ_MSA {
    tag "$meta.id"
    label 'process_medium'

    container "/srv/scratch/sbf-pipelines/proteinfold/singularity/boltz.sif"

    input:
    tuple val(meta), path(fasta)
    path ('boltz1_conf.ckpt')
    path ('ccd.pkl')
    path ('**.a3m')

    output:
    path ("boltz_results_${fasta.baseName}/processed/msa/*.npz"), emit: msa
    path ("boltz_results_${fasta.baseName}/processed/structures/*.npz"), emit: structures
    path ("boltz_results_${fasta.baseName}/predictions/${fasta.baseName}/confidence*.json"), emit: confidence
    path ("boltz_results_${fasta.baseName}/predictions/${fasta.baseName}/plddt_*.npz"), emit: plddt

    script:
    """
    boltz predict --use_msa_server "./${fasta.name}" --cache ./
    """
}

Now the workflow looks like this:

workflow BOLTZ {
    take:
    ch_samplesheet  // channel: samplesheet read from --input
    ch_versions     // channel: [ path(versions.yml) ]
    ch_boltz_ccd    // channel: [ path(boltz_ccd) ]
    ch_boltz_model  // channel: [ path(model) ]
    ch_colabfold_params // channel: [ path(colabfold_params) ]
    ch_colabfold_db // channel: [ path(colabfold_db) ]
    ch_uniref30     // channel: [ path(uniref30) ]

    main:
    ch_multiqc_files = Channel.empty()

    // MMSEQS_COLABFOLDSEARCH
    MMSEQS_COLABFOLDSEARCH (
        ch_samplesheet,
        ch_colabfold_params,
        ch_colabfold_db,
        ch_uniref30
    )

    ch_versions = ch_versions.mix(MMSEQS_COLABFOLDSEARCH.out.versions)

    // CREATE_SAMPLESHEET_YAML_MSA
    CREATE_SAMPLESHEET_YAML_MSA(
        ch_samplesheet,
        MMSEQS_COLABFOLDSEARCH.out.a3m
    )

    // RUN_BOLTZ_MSA
    RUN_BOLTZ_MSA(
        CREATE_SAMPLESHEET_YAML_MSA.out.samplesheet,
        ch_boltz_model,
        ch_boltz_ccd,
        MMSEQS_COLABFOLDSEARCH.out.a3m
    )

    emit:
    versions   = ch_versions
    msa        = RUN_BOLTZ_MSA.out.msa
    structures = RUN_BOLTZ_MSA.out.structures
    confidence = RUN_BOLTZ_MSA.out.confidence
    plddt      = RUN_BOLTZ_MSA.out.plddt
}

After this change is implemented, all steps in the pipeline are skipped. I have looked over all the Nextflow channel documentation and I can't seem to figure out why this may be happening.
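
One way to narrow this down, offered as a minimal self-contained sketch rather than a diagnosis of this pipeline: a Nextflow process only runs once every one of its input channels has emitted an item, so a single empty input channel skips that process and everything downstream of it without raising an error. The toy process `SAY` and the channels `ch_a` and `ch_b` below are hypothetical stand-ins; `ifEmpty` and `view` are standard channel operators.

```nextflow
// Toy example: SAY, ch_a and ch_b are hypothetical names, not part of the BOLTZ pipeline.
process SAY {
    input:
    val x
    val y

    output:
    stdout

    script:
    """
    echo "$x $y"
    """
}

workflow {
    ch_a = Channel.of('sample1')
    ch_b = Channel.empty()   // stands in for e.g. a parameter channel that was never populated

    // ifEmpty substitutes a marker value when a channel emits nothing,
    // so an empty channel becomes visible in the log.
    ch_a.ifEmpty('EMPTY').view { v -> "ch_a: $v" }
    ch_b.ifEmpty('EMPTY').view { v -> "ch_b: $v" }

    // SAY is never executed because ch_b emits no items.
    SAY(ch_a, ch_b)
}
```

Applying the same `ifEmpty(...).view` pattern to `ch_samplesheet`, `ch_colabfold_params`, `ch_colabfold_db` and `ch_uniref30` just before the `MMSEQS_COLABFOLDSEARCH` call would show whether one of them is empty.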

This is the MMSEQS_COLABFOLDSEARCH process. An item in its first input channel looks like this:

[[id:TEST1], /home/z3545907/MPGAFS.fasta]

process MMSEQS_COLABFOLDSEARCH {
    tag "$meta.id"
    label 'process_high_memory'

    // Exit if running this module with -profile conda / -profile mamba
    if (workflow.profile.tokenize(',').intersect(['conda', 'mamba']).size() >= 1) {
        error("Local MMSEQS_COLABFOLDSEARCH module does not support Conda. Please use Docker / Singularity / Podman instead.")
    }

    container "nf-core/proteinfold_colabfold:dev"

    input:
    tuple val(meta), path(fasta)
    path ('db/params')
    path colabfold_db
    path uniref30

    output:
    tuple val(meta), path("**.a3m"), emit: a3m
    path "versions.yml", emit: versions
    path fasta, emit: fasta

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def VERSION = '1.5.2' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions.

    """
    ln -r -s $uniref30/uniref30_* ./db
    ln -r -s $colabfold_db/colabfold_envdb* ./db

    /localcolabfold/colabfold-conda/bin/colabfold_search \\
        $args \\
        --threads $task.cpus ${fasta} \\
        ./db \\
        "result/"

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        colabfold_search: $VERSION
    END_VERSIONS
    """

    stub:
    def VERSION = '1.5.2' // WARN: Version information not provided by tool on CLI. Please update this string when bumping container versions.
    """
    mkdir results
    touch results/${meta.id}.a3m

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        colabfold_search: $VERSION
    END_VERSIONS
    """
}
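
Note that the `a3m` output of this process is a tuple of `[meta, files]`, while `CREATE_SAMPLESHEET_YAML_MSA` and `RUN_BOLTZ_MSA` declare their MSA input as a bare `path`. Whether this is related to the skipped steps is not established here. As a hedged sketch, one common way to line up per-sample channels is the `join` operator, which matches items on their first element. The channel contents below are hypothetical placeholders (plain strings instead of real files):

```nextflow
workflow {
    // Hypothetical stand-ins for ch_samplesheet and MMSEQS_COLABFOLDSEARCH.out.a3m
    ch_fasta = Channel.of([[id: 'TEST1'], 'MPGAFS.fasta'])
    ch_a3m   = Channel.of([[id: 'TEST1'], 'TEST1.a3m'])

    // join pairs items whose first element (the meta map) is equal,
    // producing one item of the form [meta, fasta, a3m] per sample.
    ch_joined = ch_fasta.join(ch_a3m)
    ch_joined.view()
}
```

A process consuming `ch_joined` would then declare a single input such as `tuple val(meta), path(samplesheet), path(a3m)` instead of two separate inputs.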