Nextflow is a domain-specific language modelled after UNIX pipes. It simplifies writing parallel and scalable pipelines. The version installed on our systems can run jobs locally (on the same machine) and by submitting to Slurm.
nf-core is a community effort to collect a curated set of analysis pipelines built using Nextflow. There are more than 90 pipelines available as part of nf-core. For a basic introduction to running an nf-core pipeline, see: hello nf-core.
EPI2ME Labs maintains a collection of Nextflow bioinformatics workflows tailored to Oxford Nanopore Technologies long-read sequencing data. Workflow projects are prefixed with wf-.
The code that is executed at each pipeline stage can be written in a number of different languages (shell, python, R, ...).
Intermediate results for workflows are stored in the $PWD/work directory, which allows resuming execution of pipelines.
The language used to write pipeline scripts is an extension of Groovy.
Nextflow is a complex workflow management tool. Please read the manual carefully and make sure to place appropriate limits on your pipeline to avoid submitting too many jobs or running too many local processes.
When running many tasks, Nextflow can create many temporary files in the ./work directory. Please make sure that your pipeline does not inadvertently create millions of small files, which would degrade file system performance.
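One way to keep an eye on this is to count the files under the work directory from time to time. The following is a minimal sketch, not a site-provided tool: the function name count_work_files and the throwaway demo directory are hypothetical, and the demo files merely stand in for the per-task files Nextflow writes.

```shell
#!/bin/bash
# count_work_files DIR: print the number of regular files below DIR.
# DIR defaults to ./work, the Nextflow work directory described above.
count_work_files() {
    local dir="${1:-./work}"
    find "$dir" -type f | wc -l | tr -d ' '
}

# Demonstration on a temporary directory standing in for ./work
# (hypothetical layout: two task directories holding three files in total).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/ab/123456" "$demo_dir/cd/789abc"
touch "$demo_dir/ab/123456/.command.sh" \
      "$demo_dir/ab/123456/.command.log" \
      "$demo_dir/cd/789abc/.command.sh"

demo_count=$(count_work_files "$demo_dir")
echo "files under demo work directory: $demo_count"

rm -rf "$demo_dir"
```

In a real pipeline directory, calling count_work_files with no argument reports on ./work. Once a run is no longer needed for resuming, nextflow clean -n lists what would be removed and nextflow clean -f removes it.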
By default, with no config file, Nextflow spawns parallel task executions on the computer on which it is running. This is not good practice on HPC systems, which are designed to share compute resources across many users. Please use -profile biowulflocal to utilize the allocated resources.
Some EPI2ME pipelines do not work under nextflow/24.04, so please load nextflow/23.10 instead.
cp /usr/local/apps/nextflow/nextflow.config . # only need to copy once
nextflow run xxxx -profile biowulflocal # run inside of interactive session
module load nextflow
nf-core --help
Set pollInterval and queueStatInterval to reduce the frequency with which Nextflow polls Slurm. The default frequency creates too many queries and results in unnecessary load on the scheduler.
export NXF_SINGULARITY_CACHEDIR=/data/$USER/nxf_singularity_cache;
export SINGULARITY_CACHEDIR=/data/$USER/.singularity;
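The two variables above point the Nextflow and Singularity image caches at directories under /data/$USER. A small sketch of preparing those directories before a run follows. CACHE_BASE is a hypothetical variable introduced only for this illustration; when it is unset, the sketch falls back to a temporary directory so that it can be tried anywhere.

```shell
#!/bin/bash
# CACHE_BASE is an assumption of this sketch; on Biowulf the base would be
# /data/$USER, as in the export lines above.
cache_base="${CACHE_BASE:-$(mktemp -d)}"

export NXF_SINGULARITY_CACHEDIR="$cache_base/nxf_singularity_cache"
export SINGULARITY_CACHEDIR="$cache_base/.singularity"

# Create both directories up front and confirm they are writable.
mkdir -p "$NXF_SINGULARITY_CACHEDIR" "$SINGULARITY_CACHEDIR"
for d in "$NXF_SINGULARITY_CACHEDIR" "$SINGULARITY_CACHEDIR"; do
    if [ -w "$d" ]; then
        echo "cache directory ready: $d"
    else
        echo "cache directory not writable: $d" >&2
    fi
done
```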
Use --save_reference for the first run (this saves your indices in your results directory), then reuse them for multiple samples.
Specify a pipeline version with -r, e.g. -r 1.3.1; otherwise, the newest version will be used.
ulimit -u 10240 -n 16384
First, let's do some basic local execution. For this we will allocate an interactive session:
[user@biowulf]$ sinteractive --mem=10g -c2 --gres=lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144]$ module load nextflow
[user@cn3144]$ nextflow run hello
N E X T F L O W ~ version 23.10.0
Pulling nextflow-io/hello ...
downloaded from https://github.com/nextflow-io/hello.git
Launching `https://github.com/nextflow-io/hello` [elegant_almeida] DSL2 - revision: 7588c46ffe [master]
executor > local (4)
[05/a7b6cf] process > sayHello (4) [100%] 4 of 4 ✔
Ciao world!
Bonjour world!
Hello world!
Hola world!
For the traditional hello world example we will parallelize the uppercasing of different language greetings:
# create file of greetings
[user@cn3144]$ mkdir testdir;cat > ./testdir/test1 <<EOF
Hello world!
Hallo world!
Ciao world!
Salut world!
Bongiorno world!
Servus world!
EOF
[user@cn3144]$ cat > ./testdir/test2 <<EOF
Gruess Gott world!
Na was los world!
Gruetzi world!
Hello world!
Come va world!
Ca va world!
Hi world!
Good bye world!
EOF
[user@cn3144]$ cat > test.R <<EOF
args <- commandArgs(trailingOnly = TRUE)
library(readr)
df <- read_lines(args[1])
# create output
sink(args[2])
for (each in df){
    cat (toupper(each))
    cat ('\n')
}
sink()
EOF
We then create a file called rhello.nf that describes the workflow to be executed:
// Declare syntax version
nextflow.enable.dsl=2

params.output_dir = './results'

process getsbatchlist {
    // directives come first, before the input/output/script sections
    module 'R'
    publishDir "${params.output_dir}"

    input:
    path(input_file)
    each Rs

    output:
    path "${input_file}.txt"

    script:
    """
    Rscript ${Rs} ${input_file} ${input_file}.txt
    """
}

workflow {
    def inputf = Channel.fromPath('./testdir/test*')
    def Rs = Channel.fromPath('./test.R')
    getsbatchlist(inputf, Rs) | view
}
The workflow is executed with
[user@cn3144]$ nextflow rhello.nf
N E X T F L O W ~ version 23.04.1
Launching `test2.nf` [hopeful_cray] DSL2 - revision: 7401a333f4
executor > local (2)
[28/6ccadb] process > getsbatchlist (2) [100%] 2 of 2 ✔
/gpfs/gsfs8/users/apptest2/work/82/9d153e5b2a5ab4399ab36beb01e552/test2.txt
/gpfs/gsfs8/users/apptest2/work/28/6ccadb35b01a2755c9670375ec1a05/test1.txt
[user@cn3144]$ cat results/test1.txt
HELLO WORLD!
HALLO WORLD!
CIAO WORLD!
SALUT WORLD!
BONGIORNO WORLD!
SERVUS WORLD!
Note that results are out of order.
The same workflow can be used to run each of the processes as a Slurm job by creating a nextflow.config file. We provide a file with correct settings for biowulf at /usr/local/apps/nextflow/nextflow.config.
If you use this file, please don't change the settings for job submission and querying (pollInterval, queueStatInterval, and submitRateLimit). Other settings can be adapted; in particular, you might want to remove the lscratch allocation if it does not apply to your workflow, although we encourage you to use lscratch as much as you can.
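For orientation, the three protected settings belong to the executor scope of a Nextflow config file. The sketch below writes an illustrative fragment to a scratch file so that its shape can be inspected. The values and the file name example.config are assumptions made for this illustration, not the contents of the Biowulf file; for real runs, keep the values from /usr/local/apps/nextflow/nextflow.config.

```shell
#!/bin/bash
# Write an illustrative config fragment to a temporary location.
# The quoted 'EOF' stops the shell from expanding $slurm.
cfg_dir=$(mktemp -d)
cat > "$cfg_dir/example.config" <<'EOF'
// Hypothetical values for illustration only
executor {
    $slurm {
        pollInterval      = '2 min'
        queueStatInterval = '5 min'
        submitRateLimit   = '6/1min'
    }
}
EOF

# Show the three settings that should stay as provided in the site file.
grep -E 'pollInterval|queueStatInterval|submitRateLimit' "$cfg_dir/example.config"
```

Running the same grep against your local copy of nextflow.config is a quick way to confirm that these settings are still present after you have edited other parts of the file.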
[user@cn3144]$ cp /usr/local/apps/nextflow/nextflow.config .
[user@cn3144]$ cat nextflow.config
[user@cn3144]$ nextflow run -profile biowulf hello.nf
N E X T F L O W ~ version 20.10.0
Launching `hello.nf` [intergalactic_cray] - revision: f195027c60
executor > slurm (15)
[34/d935ef] process > splitLetters [100%] 1 of 1 ✔
HELLO WORLD!
[...snip...]
[97/85354f] process > convertToUpper (11) [100%] 14 of 14 ✔
Running Nextflow with the biowulf profile (Slurm executor) using test input from nf-core:
[user@cn3144]$ nextflow run nf-core/sarek -profile test,biowulf --outdir testout
N E X T F L O W ~ version 22.10.4
Launching `https://github.com/nf-core/sarek` [agitated_noyce] DSL2 - revision: c87f4eb694 [master]
WARN: Found unexpected parameters:
* --test_data_base: https://raw.githubusercontent.com/nf-core/test-datasets/modules
- Ignore this warning: params.schema_ignore_params = "test_data_base"

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
      ____
    .´ _  `.
   /  |\`-_ \      __        __   ___
  |   | \  `-|    |__`  /\  |__) |__  |__/
   \ |   \  /     .__| /¯¯\ |  \ |___ |  \
    `|____\´

nf-core/sarek v3.1.2
------------------------------------------------------
...
Run Nextflow with the local executor (biowulflocal profile) to utilize the CPUs and memory allocated on the compute node. Setting --mem and -c when requesting the session is essential. Use lscratch as the work directory for troubleshooting:
[user@biowulf]$ sinteractive --mem=80g -c 32 --gres=lscratch:200
[user@cn3144]$ nextflow run nf-core/sarek -r 3.2.3 \
    -profile biowulflocal \
    --wes \
    --joint_germline \
    --input test.csv \
    --tools haplotypecaller,vep,snpeff \
    --outdir /data/$USER/sarek/ \
    --genome GATK.GRCh38 \
    --igenomes_base /fdb/igenomes_nf \
    --save_output_as_bam \
    -w /lscratch/$SLURM_JOB_ID \
    --cache_version 110 \
    --vep_cache /fdb/VEP/110/cache \
    --snpeff_cache /fdb/snpEff/5.1d/data/
Pipeline settings can be provided in a YAML file via -params-file:
[user@cn3144]$ nextflow run nf-core/hic -profile biowulflocal -params-file params.yaml
[user@cn3144]$ cat params.yaml
input: './samplesheet.csv'
outdir: './results/'
fasta: 'https://github.com/nf-core/test-datasets/raw/hic/reference/W303_SGD_2015_JRIU00000000.fsa'
digestion: 'hindiii'
schema_ignore_params: 'genomes,digest,input_paths,input'
min_mapq: 10
min_restriction_fragment_size: 100
max_restriction_fragment_size: 100000
min_insert_size: 100
max_insert_size: 600
bin_size: '2000,1000'
res_dist_decay: '1000'
res_tads: '1000'
tads_caller: 'insulation,hicexplorer'
res_compartments: '2000'
The master process submitting jobs should be run either as a batch job or on an interactive node, not on the biowulf login node.
Create a batch input file (e.g. nf_main.sh) to run the master process. For example:
#!/bin/bash
#SBATCH --job-name=nextflow-main
#SBATCH --cpus-per-task=4
#SBATCH --mem=4G
#SBATCH --gres=lscratch:200
#SBATCH --time=24:00:00
module load nextflow
export NXF_SINGULARITY_CACHEDIR=/data/$USER/nxf_singularity_cache
export SINGULARITY_CACHEDIR=/data/$USER/.singularity
export TMPDIR=/lscratch/$SLURM_JOB_ID
export NXF_JVM_ARGS="-Xms2g -Xmx4g"
nextflow run nf-core/rnaseq -r 3.13.2 \
    -profile biowulf \
    --input samplesheet_test.csv \
    --outdir /data/$USER/rnaseq_out \
    --gtf /fdb/igenomes_nf/Homo_sapiens/Ensembl/pub/release-110/gtf/Homo_sapiens.GRCh38.110.gtf \
    --fasta \
    /fdb/igenomes_nf/Homo_sapiens/Ensembl/pub/release-110/fasta/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
    --star_index /fdb/igenomes_nf/Homo_sapiens/Ensembl/pub/release-110/STARindex/ \
    --igenomes_ignore --genome null \
    -resume
Submit this job using the Slurm sbatch command.
sbatch nf_main.sh
Create a batch script (e.g. nf_local.sh) to run with the biowulflocal profile. For example:
#!/bin/bash
#SBATCH --job-name=nextflow-local
#SBATCH --cpus-per-task=32
#SBATCH --mem=80G
#SBATCH --gres=lscratch:200
#SBATCH --time=24:00:00
module load nextflow
export NXF_SINGULARITY_CACHEDIR=/data/$USER/nxf_singularity_cache
export SINGULARITY_CACHEDIR=/data/$USER/.singularity
export TMPDIR=/lscratch/$SLURM_JOB_ID
nextflow run nf-core/hic -profile biowulflocal -params-file params.yaml
Submit this job using the Slurm sbatch command.
sbatch nf_local.sh
Create a batch script (e.g. wf_basecalling_local.sh) to run an EPI2ME workflow on a GPU node with the biowulflocal profile. For example:
#!/bin/bash
#SBATCH --job-name=wf-basecalling
#SBATCH --cpus-per-task=12
#SBATCH --mem=64G
#SBATCH --time=4:00:00
#SBATCH --gres=lscratch:200,gpu:1
#SBATCH --partition=gpu
#SBATCH --constraint="gpua100|gpuv100x|gpuv100"
module load nextflow
export NXF_SINGULARITY_CACHEDIR=/data/$USER/nxf_singularity_cache
export SINGULARITY_CACHEDIR=/data/$USER/.singularity
export TMPDIR=/lscratch/$SLURM_JOB_ID
nextflow run epi2me-labs/wf-basecalling \
    -profile biowulflocal \
    -resume \
    --input wf-basecalling-demo/input \
    --ref wf-basecalling-demo/GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta \
    --dorado_ext pod5 \
    --out_dir output \
    --basecaller_cfg dna_r10.4.1_e8.2_400bps_hac@v4.1.0 \
    --remora_cfg "dna_r10.4.1_e8.2_400bps_hac@v4.1.0_5mCG_5hmCG@v2"
Submit this job using the Slurm sbatch command.
sbatch wf_basecalling_local.sh