NAPU on Biowulf

Napu (Nanopore Analysis Pipeline) is a collection of WDL workflows for variant calling and de novo assembly of ONT data, optimized for a single-flowcell ONT sequencing protocol. The wet-lab/informatics protocol is now applied to sequence and characterize thousands of human brain genomes at the Center for Alzheimer's and Related Dementias at NIH. This version of the pipeline adds optional inputs that make it easier to run modular pieces of the pipeline individually.

References:

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input in bold):

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load napu
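
The napu module sets the environment variables used in the commands below (NAPU_CONFIG, CROMWELL_JAR, NAPU_WF); you can echo them to confirm they are defined, for example:

[user@cn3144 ~]$ echo ${NAPU_CONFIG} ${CROMWELL_JAR} ${NAPU_WF}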

[user@cn3144 ~]$ cd /data/$USER/

[user@cn3144 ~]$ cat ${NAPU_CONFIG}
# include the application.conf at the top
include required(classpath("application"))

system {
  job-rate-control {
    jobs = 1
    per = 1 second
  }
#  workflow-heartbeats {
#  write-failure-shutdown-duration = 2 minutes
# }
}

docker {
  hash-lookup {
    enabled = "false"
  }
}

database {
  profile = "slick.jdbc.HsqldbProfile$"
  db {
    driver = "org.hsqldb.jdbcDriver"
    url = """
    jdbc:hsqldb:file:cromwell-executions/cromwell-db/cromwell-db;
    shutdown=false;
    hsqldb.default_table_type=cached;hsqldb.tx=mvcc;
    hsqldb.result_max_memory_rows=10000;
    hsqldb.large_data=true;
    hsqldb.applog=1;
    hsqldb.lob_compressed=true;
    hsqldb.script_format=3
    """
    connectionTimeout = 120000
    numThreads = 2
   }
}

call-caching {
  enabled = true
  invalidate-bad-cache-results = true
}

backend {
  default = "Slurm-singularity"
  providers {
    Slurm-singularity {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        concurrent-job-limit = 10
        # without this setting, the workflow hangs indefinitely
        # run-in-background = true
        # If an 'exit-code-timeout-seconds' value is specified:
        #     - check-alive will be run at this interval for every job
        #     - if a job is found to be not alive, and no RC file appears after this interval
        #     - Then it will be marked as Failed.
        ## Warning: If set, Cromwell will run 'check-alive' for every job at this interval
        exit-code-timeout-seconds = 60
        filesystems {
         local {
           localization: [
             # soft link does not work for docker with --contain. Hard links won't work
             # across file systems
             "hard-link", "cached-copy", "copy"
           ]
         }
        }
       default-runtime-attributes {
            maxRetries = 0
        }

        runtime-attributes = """
        Int runtime_minutes = 600
        Int cpu = 2
        # the _mb suffix is meaningful and can result in implicit conversions.
        Int memory_mb = 4000
        String queue = "norm"
        Int? gpuCount
        String? gpuType
        String? docker
        String cacheLocation = "/usr/local/apps/napu/singularity/"
        """

        submit = """
            sbatch \
              --wait \
              -J ${job_name} \
              -D ${cwd} \
              -o ${out} \
              -e ${err} \
              -t ${runtime_minutes} \
              -c ${cpu} \
              --mem ${memory_mb} \
              --partition ${queue} \
              ${if defined(gpuCount) then
                        (if defined(gpuType) then ('--gres=gpu:' + gpuType + ':' + gpuCount)
                                             else ('--gres=gpu:' + gpuCount))
                        else ''} \
              --wrap "/bin/bash ${script}"
        """

        # script-epilogue = "sleep 30"

        submit-docker = """
            docker_subbed=$(sed -e 's/[^A-Za-z0-9._-]/_/g' <<< ${docker})
            # SINGULARITY_CACHEDIR needs to point to a directory accessible by
            # the jobs (i.e. not lscratch). Might want to use a workflow local
            # cache dir like in run.sh
            if [ -z $SINGULARITY_CACHEDIR ]; then
                CACHE_DIR=$HOME/.singularity
            else
                CACHE_DIR=$SINGULARITY_CACHEDIR
            fi
            mkdir -p $CACHE_DIR
            LOCK_FILE=$CACHE_DIR/singularity_pull_flock
            image=${cacheLocation}/$docker_subbed.sif

            if [ ! -f "$image" ]; then
               singularity pull $image docker://${docker}
            fi
            # we want to avoid all the cromwell tasks hammering each other trying
            # to pull the container into the cache for the first time. flock works
            # on GPFS, netapp, and vast (of course only for processes on the same
            # machine which is the case here since we're pulling it in the master
            # process before submitting).

            sbatch \
              --wait \
              -J ${job_name} \
              -D ${cwd} \
              -o ${out} \
              -e ${err} \
              -t ${runtime_minutes} \
              -c ${cpu} \
              --mem ${memory_mb} \
              --partition ${queue} \
              ${if defined(gpuCount) then
                        (if defined(gpuType) then ('--gres=gpu:' + gpuType + ':' + gpuCount)
                                             else ('--gres=gpu:' + gpuCount))
                        else ''} \
              --wrap "singularity exec ${if defined(gpuCount) then '--nv ' else ''} --containall --bind ${cwd}:${docker_cwd} $image ${job_shell} ${docker_sc
ript}"
        """
        kill = "scancel ${job_id}"
        check-alive = "dashboard_cli jobs --is-active -j ${job_id} &> /dev/null"
        job-id-regex = "(\\d+)"
      }
    }
  }
}
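
The runtime attributes declared above (cpu, memory_mb, runtime_minutes, queue) are per-task defaults; for tasks that do not set these attributes themselves, you can supply different values for a whole run through a Cromwell workflow-options file passed to cromwell with -o. A minimal sketch (the file name is illustrative; see the Cromwell documentation for workflow options):

[user@cn3144 ~]$ cat options.json
{
  "default_runtime_attributes": {
    "cpu": 4,
    "memory_mb": 16000
  }
}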

[user@cn3144 ~]$ cat inputs_test.json # prepare your input
{
  "cardEndToEndVcfMethyl.inputReads": ["hg002_chr14_guppy5.fastq.gz"],
  "cardEndToEndVcfMethyl.referenceFasta": "grch37_chr14.fasta",
  "cardEndToEndVcfMethyl.threads": 20,
  "cardEndToEndVcfMethyl.sampleName": "Sample"
}
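
If you have a copy of Cromwell's womtool jar (not necessarily provided by the napu module; the jar path below is hypothetical), it can generate a complete JSON template of the workflow's inputs to start from:

[user@cn3144 ~]$ java -jar womtool.jar inputs ${NAPU_WF}/cardEndToEndVcf.wdl > inputs_template.json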

[user@cn3144 ~]$ java -Dconfig.file=${NAPU_CONFIG} \
-jar ${CROMWELL_JAR} \
run -i inputs_test.json \
${NAPU_WF}/cardEndToEndVcf.wdl 
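
Cromwell writes per-task working directories and outputs under ./cromwell-executions in the launch directory (the same location referenced by the database settings in ${NAPU_CONFIG}); the exact layout depends on the workflow and task names.

[user@cn3144 ~]$ ls cromwell-executions/
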
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226

[user@biowulf ~]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. napu.sh). For example:

#!/bin/bash
set -e
module load napu
cd /data/$USER
java -Dconfig.file=${NAPU_CONFIG} \
-jar ${CROMWELL_JAR} \
run -i inputs_test.json \
${NAPU_WF}/cardEndToEndVcf.wdl

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] napu.sh
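
The napu.sh job runs only the Cromwell driver; the pipeline tasks themselves are submitted as separate Slurm jobs by the Slurm-singularity backend in ${NAPU_CONFIG}, so the driver job needs modest CPU and memory but a walltime long enough to cover the whole workflow. For example (values are illustrative; adjust to your data):

sbatch --cpus-per-task=4 --mem=16g --time=2-00:00:00 napu.sh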