Napu (Nanopore Analysis Pipeline) is a collection of WDL workflows for variant calling and de novo assembly of ONT data, optimized for a single-flowcell ONT sequencing protocol. The wet-lab/informatics protocol is now applied to sequence and characterize thousands of human brain genomes at the Center for Alzheimer's and Related Dementias at NIH. This version of the pipeline adds more optional inputs, making it easier to run individual, modular pieces of the pipeline.
Allocate an interactive session and run the program.
Sample session (user input in bold):
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load napu
[user@cn3144 ~]$ cd /data/$USER/
[user@cn3144 ~]$ cat ${NAPU_CONFIG}
# include the application.conf at the top
include required(classpath("application"))

system {
  job-rate-control {
    jobs = 1
    per = 1 second
  }
  # workflow-heartbeats {
  #   write-failure-shutdown-duration = 2 minutes
  # }
}

docker {
  hash-lookup {
    enabled = "false"
  }
}

database {
  profile = "slick.jdbc.HsqldbProfile$"
  db {
    driver = "org.hsqldb.jdbcDriver"
    url = """
    jdbc:hsqldb:file:cromwell-executions/cromwell-db/cromwell-db;
    shutdown=false;
    hsqldb.default_table_type=cached;hsqldb.tx=mvcc;
    hsqldb.result_max_memory_rows=10000;
    hsqldb.large_data=true;
    hsqldb.applog=1;
    hsqldb.lob_compressed=true;
    hsqldb.script_format=3
    """
    connectionTimeout = 120000
    numThreads = 2
  }
}

call-caching {
  enabled = true
  invalidate-bad-cache-results = true
}

backend {
  default = "Slurm-singularity"
  providers {
    Slurm-singularity {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        concurrent-job-limit = 10
        # without this setting, the workflow hangs indefinitely
        # run-in-background = true
        # If an 'exit-code-timeout-seconds' value is specified:
        # - check-alive will be run at this interval for every job
        # - if a job is found to be not alive, and no RC file appears after this interval
        # - Then it will be marked as Failed.
        ## Warning: If set, Cromwell will run 'check-alive' for every job at this interval
        exit-code-timeout-seconds = 60
        filesystems {
          local {
            localization: [
              # soft link does not work for docker with --contain.
              # Hard links won't work across file systems
              "hard-link", "cached-copy", "copy"
            ]
          }
        }
        default-runtime-attributes {
          maxRetries = 0
        }
        runtime-attributes = """
        Int runtime_minutes = 600
        Int cpu = 2
        # the _mb suffix is meaningful and can result in implicit conversions.
        Int memory_mb = 4000
        String queue = "norm"
        Int? gpuCount
        String? gpuType
        String? docker
        String cacheLocation = "/usr/local/apps/napu/singularity/"
        """
        submit = """
            sbatch \
              --wait \
              -J ${job_name} \
              -D ${cwd} \
              -o ${out} \
              -e ${err} \
              -t ${runtime_minutes} \
              -c ${cpu} \
              --mem ${memory_mb} \
              --partition ${queue} \
              ${if defined(gpuCount) then (if defined(gpuType) then ('--gres=gpu:' + gpuType + ':' + gpuCount) else ('--gres=gpu:' + gpuCount)) else ''} \
              --wrap "/bin/bash ${script}"
        """
        # script-epilogue = "sleep 30"
        submit-docker = """
            docker_subbed=$(sed -e 's/[^A-Za-z0-9._-]/_/g' <<< ${docker})
            # SINGULARITY_CACHEDIR needs to point to a directory accessible by
            # the jobs (i.e. not lscratch). Might want to use a workflow local
            # cache dir like in run.sh
            if [ -z $SINGULARITY_CACHEDIR ]; then
                CACHE_DIR=$HOME/.singularity
            else
                CACHE_DIR=$SINGULARITY_CACHEDIR
            fi
            mkdir -p $CACHE_DIR
            LOCK_FILE=$CACHE_DIR/singularity_pull_flock
            image=${cacheLocation}/$docker_subbed.sif
            if [ ! -f "$image" ]; then
                singularity pull $image docker://${docker}
            fi
            # we want to avoid all the cromwell tasks hammering each other trying
            # to pull the container into the cache for the first time. flock works
            # on GPFS, netapp, and vast (of course only for processes on the same
            # machine which is the case here since we're pulling it in the master
            # process before submitting).
            sbatch \
              --wait \
              -J ${job_name} \
              -D ${cwd} \
              -o ${out} \
              -e ${err} \
              -t ${runtime_minutes} \
              -c ${cpu} \
              --mem ${memory_mb} \
              --partition ${queue} \
              ${if defined(gpuCount) then (if defined(gpuType) then ('--gres=gpu:' + gpuType + ':' + gpuCount) else ('--gres=gpu:' + gpuCount)) else ''} \
              --wrap "singularity exec ${if defined(gpuCount) then '--nv ' else ''} --containall --bind ${cwd}:${docker_cwd} $image ${job_shell} ${docker_script}"
        """
        kill = "scancel ${job_id}"
        check-alive = "dashboard_cli jobs --is-active -j ${job_id} &> /dev/null"
        job-id-regex = "(\\d+)"
      }
    }
  }
}

[user@cn3144 ~]$ cat inputs_test.json   # prepare your input
{
  "cardEndToEndVcfMethyl.inputReads": ["hg002_chr14_guppy5.fastq.gz"],
  "cardEndToEndVcfMethyl.referenceFasta": "grch37_chr14.fasta",
  "cardEndToEndVcfMethyl.threads": 20,
  "cardEndToEndVcfMethyl.sampleName": "Sample"
}

[user@cn3144 ~]$ java -Dconfig.file=${NAPU_CONFIG} \
    -jar ${CROMWELL_JAR} \
    run -i inputs_test.json \
    ${NAPU_WF}/cardEndToEndVcf.wdl

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
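The inputs JSON can be written and syntax-checked from the shell before launching Cromwell, which avoids burning an allocation on a malformed file. A minimal sketch using the example file names from the session above (substitute your own reads and reference; python3 is assumed to be on the path):

```shell
# Write the workflow inputs (example data from the session above).
cat > inputs_test.json <<'EOF'
{
  "cardEndToEndVcfMethyl.inputReads": ["hg002_chr14_guppy5.fastq.gz"],
  "cardEndToEndVcfMethyl.referenceFasta": "grch37_chr14.fasta",
  "cardEndToEndVcfMethyl.threads": 20,
  "cardEndToEndVcfMethyl.sampleName": "Sample"
}
EOF
# Catch JSON syntax errors before starting Cromwell.
python3 -m json.tool inputs_test.json > /dev/null && echo "inputs OK"
```

Cromwell reports input-parsing failures only after it starts up, so a quick local validation like this is cheap insurance.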
Create a batch input file (e.g. napu.sh). For example:
#!/bin/bash
set -e
module load napu
cd /data/$USER
java -Dconfig.file=${NAPU_CONFIG} \
    -jar ${CROMWELL_JAR} \
    run -i inputs_test.json \
    ${NAPU_WF}/cardEndToEndVcf.wdl
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] napu.sh
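The bracketed options size the Cromwell driver job itself; per the backend configuration above, each workflow task is submitted as its own Slurm job with its own resources. A hypothetical sketch with illustrative values (it only prints the command it would run, rather than invoking sbatch):

```shell
# Illustrative values for the driver job, not project recommendations.
CPUS=4
MEM=16g
# The actual submission would be the echoed command.
echo "sbatch --cpus-per-task=$CPUS --mem=$MEM napu.sh"
```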