Nextflow is a domain-specific language modelled after UNIX pipes that simplifies writing parallel, scalable pipelines. The version installed on our systems can run jobs locally (on the same machine) or by submitting to Slurm.
nf-core is a community effort to collect a curated set of analysis pipelines built using Nextflow. There are more than 90 pipelines available as part of nf-core. For a basic introduction to running an nf-core pipeline, see hello nf-core.
EPI2ME Labs maintains a collection of Nextflow bioinformatics workflows tailored to Oxford Nanopore Technologies long-read sequencing data. Workflow projects are prefixed with wf-.
The code that is executed at each pipeline stage can be written in a number of different languages (shell, python, R, ...).
Intermediate results for workflows are stored in the $PWD/work directory, which allows execution of pipelines to be resumed.
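For example, if a run fails or is interrupted, the cached task results under ./work let you pick up where the run left off rather than recomputing everything (the script name here is just a placeholder):
# completed tasks are restored from ./work instead of being re-run
nextflow run main.nf -resume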
The language used to write pipeline scripts is an extension of Groovy.
Nextflow is a complex workflow management tool. Please read the manual carefully and make sure to place appropriate limits on your pipeline to avoid submitting too many jobs or running too many local processes.
When running many tasks, Nextflow creates many temporary files in the ./work directory. Please make sure that your pipeline does not inadvertently create millions of small files, which would degrade file system performance.
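If the work directory does grow too large, you can delete the task folders of runs you no longer need to resume; a minimal sketch using nextflow's own bookkeeping (the provided nextflow.config described below additionally sets cleanup = true, which removes work files after a successful run):
# check how much space the work directory is using
du -sh work/
# list previous runs, then delete the work folders of all runs before the most recent one
nextflow log
nextflow clean -f -before $(nextflow log -q | tail -n 1)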
By default, with no config file, Nextflow will spawn parallel tasks on the computer on which it is running. This is not good practice on HPC systems, which are designed to share compute resources across many users. Please use -profile biowulflocal to utilize the resources allocated to your job.
Some EPI2ME pipelines do not work under nextflow/24.04; please load nextflow/23.10 instead.
sarek/3.5.0 is a buggy version; please use sarek/3.5.1 instead.
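For example, to follow both version notes above, you could load the older module for an EPI2ME workflow and pin the fixed sarek release with -r; this is only a sketch (--help prints the pipeline help without running anything):
# some EPI2ME workflows need the older Nextflow module
module load nextflow/23.10
nextflow run epi2me-labs/wf-basecalling --help

# for sarek, request the fixed release explicitly
module load nextflow    # in a fresh shell, or after 'module purge'
nextflow run nf-core/sarek -r 3.5.1 --help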
- Not setting proper job submission and querying parameters
- Please update your nextflow config to use the parameters below when running the 'slurm' executor:
executor {
    name = 'slurm'
    pollInterval = '2 min'
    queueStatInterval = '5 min'
    submitRateLimit = '6/1min'
}
- The slurm profile submits jobs with unlimited time and maximum memory, which get stuck in the pending state
- Please update to the most recent version of nextflow.config to avoid this bug:
cp /usr/local/apps/nextflow/nextflow.config .
- FATAL error while mounting /gsX
- The /gsX filesystems have been retired from the cluster, so if you see errors like this:
WARNING: skipping mount of /gs6: stat /gs6: no such file or directory FATAL: container creation failed: mount /gs6->/gs6 error: while mounting /gs6: mount source /gs6 doesn't exist.
Please update to the most recent version of nextflow.config, and then run your pipeline again:
cp /usr/local/apps/nextflow/nextflow.config .
- docker: command not found
- The two most popular containerization systems are Singularity/Apptainer and Docker. HPC facilities do not use Docker because it provides root access to the host system; they use Singularity instead. For most pipelines, running with the biowulf or biowulflocal profile after copying the config file to the working directory will avoid this error:
cp /usr/local/apps/nextflow/nextflow.config . # only need to copy once
nextflow run xxxx -profile biowulflocal # run inside an interactive session
- The master process submitting jobs should be run either as a batch job or on an interactive node - not on the biowulf login node.
- Please explicitly set the pollInterval, queueStatInterval, and submitRateLimit for the 'slurm' executor to reduce the frequency with which nextflow polls slurm. (See "Common Pitfalls" above.) The default frequency creates too many queries and results in unnecessary load on the scheduler.
- Module Name: nextflow (see the modules page for more information)
- nf-core is packaged with nextflow; it is accessible after loading the newest nextflow module.
- Nextflow can use the local, slurm, and hyperqueue executors. For small or short jobs, or for debugging, we recommend the local executor with -profile biowulflocal. For larger and longer-running jobs, we recommend the slurm executor with -profile biowulf. If your workflow includes many short processes and you need several nodes for the computation, you can use hyperqueue. However, the executor settings can be complex depending on the pipeline. A simple solution is to run a hybrid workflow: submit the run as a single-task batch job with reasonable resources, sized according to how many local processes you want to run simultaneously (e.g. 32). The local processes will all run within the main batch job, and the slurm processes will be submitted as separate slurm jobs.
- When your /home directory fills up while running nextflow, it may be that the singularity cache is the culprit; please redirect the caches to your /data directory with:
export NXF_SINGULARITY_CACHEDIR=/data/$USER/nxf_singularity_cache;
export SINGULARITY_CACHEDIR=/data/$USER/.singularity;
- Some commonly used reference genomes with index files can be found under /fdb/igenomes_nf/. Depending on your pipeline, you may be able to use a customized genome; please add --save_reference for the first run (this saves the indices in your results directory) and reuse them for multiple samples (see the example after this list).
- It is a good idea to specify a pipeline version when running nf-core pipelines, e.g. -r 1.3.1; otherwise the newest version will be used.
- Pipeline settings can be provided in a yaml file via -params-file.
- For EPI2ME pipelines, we recommend running with the local executor first, then adjusting the config for the slurm executor to reduce wasted resources. If you need help, please contact staff@hpc.nih.gov.
- When running pipelines with the local executor, nextflow might start a large number of processes or open too many files. If you encounter such errors, you may have to raise the limits on the number of processes and/or open files:
ulimit -u 10240 -n 16384
- Nextflow Training: there are Basic (Foundational) and Advanced training courses for nextflow, as well as an RNA-seq Variant Calling tutorial. There will be a community foundational nextflow training on March 10-14, 2025.
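As an example of combining several of the notes above (the singularity cache redirect, raised limits, a pinned release, --save_reference, and -params-file), a first run of an nf-core pipeline might look like the following sketch; the pipeline, release, and params.yaml contents are only illustrative:
# keep singularity images out of /home
export NXF_SINGULARITY_CACHEDIR=/data/$USER/nxf_singularity_cache
export SINGULARITY_CACHEDIR=/data/$USER/.singularity

# raise process/file limits for a local-executor run
ulimit -u 10240 -n 16384

# pin the pipeline release, save the built indices on the first run,
# and keep the remaining settings in a yaml file
nextflow run nf-core/rnaseq -r 3.13.2 \
    -profile biowulflocal \
    -params-file params.yaml \
    --save_reference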
module load nextflow
nf-core --help
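To see which pipelines are available, you can list them with the nf-core tool; note that the exact subcommand depends on the bundled nf-core/tools version:
nf-core list              # older nf-core/tools releases
nf-core pipelines list    # newer nf-core/tools releases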
First, let's do some basic local execution. For this we will allocate an interactive session:
[user@biowulf]$ sinteractive --mem=10g -c2 --gres=lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ module load nextflow
[user@cn3144]$ nextflow run hello
N E X T F L O W ~ version 23.10.0
Pulling nextflow-io/hello ...
 downloaded from https://github.com/nextflow-io/hello.git
Launching `https://github.com/nextflow-io/hello` [elegant_almeida] DSL2 - revision: 7588c46ffe [master]
executor > local (4)
[05/a7b6cf] process > sayHello (4) [100%] 4 of 4 ✔
Ciao world!
Bonjour world!
Hello world!
Hola world!
For the traditional hello world example we will parallelize the uppercasing of different language greetings:
# create files of greetings
[user@cn3144]$ mkdir testdir; cat > ./testdir/test1 <<EOF
Hello world!
Hallo world!
Ciao world!
Salut world!
Bongiorno world!
Servus world!
EOF
[user@cn3144]$ cat > ./testdir/test2 <<EOF
Gruess Gott world!
Na was los world!
Gruetzi world!
Hello world!
Come va world!
Ca va world!
Hi world!
Good bye world!
EOF
[user@cn3144]$ cat > test.R <<EOF
args <- commandArgs(trailingOnly = TRUE)
library(readr)
df <- read_lines(args[1])
# create output
sink(args[2])
for (each in df){
  cat (toupper(each))
  cat ('\n')
}
sink()
EOF
We then create a file called rhello.nf that describes the workflow to be executed:
// Declare syntax version
nextflow.enable.dsl=2
params.output_dir = './results'

process getsbatchlist {
    module 'R'
    publishDir "${params.output_dir}"

    input:
    path(input_file)
    each Rs

    output:
    path "${input_file}.txt"

    script:
    """
    Rscript ${Rs} ${input_file} ${input_file}.txt
    """
}

workflow {
    def inputf = Channel.fromPath('./testdir/test*')
    def Rs = Channel.fromPath('./test.R')
    getsbatchlist(inputf,Rs) | view
}
The workflow is executed with
[user@cn3144]$ nextflow rhello.nf
N E X T F L O W ~ version 23.04.1
Launching `test2.nf` [hopeful_cray] DSL2 - revision: 7401a333f4
executor > local (2)
[28/6ccadb] process > getsbatchlist (2) [100%] 2 of 2 ✔
/gpfs/gsfs8/users/apptest2/work/82/9d153e5b2a5ab4399ab36beb01e552/test2.txt
/gpfs/gsfs8/users/apptest2/work/28/6ccadb35b01a2755c9670375ec1a05/test1.txt

[user@cn3144]$ cat results/test1.txt
HELLO WORLD!
HALLO WORLD!
CIAO WORLD!
SALUT WORLD!
BONGIORNO WORLD!
SERVUS WORLD!
Note that the results may appear out of order because the tasks run in parallel.
The same workflow can be used to run each of the processes as a slurm job by creating a nextflow.config file. We provide a file with correct settings for biowulf at /usr/local/apps/nextflow/nextflow.config. If you use this file please don't change settings for job submission and querying (pollInterval, queueStatInterval, and submitRateLimit).
Other settings can be adapted to your workflow; in particular, you might want to remove the lscratch allocation (the clusterOptions = ' --gres=lscratch:200 ' and scratch = '/lscratch/$SLURM_JOB_ID' lines in the biowulf profile below) if it does not apply to your workflow, although using lscratch as much as possible is encouraged.
[user@cn3144]$ cp /usr/local/apps/nextflow/nextflow.config .
[user@cn3144]$ cat nextflow.config
params {
    config_profile_description = 'Biowulf nf-core config'
    config_profile_contact = 'staff@hpc.nih.gov'
    config_profile_url = 'https://hpc.nih.gov/apps/nextflow.html'
    clusterOptions = null
    igenomes_base = '/fdb/igenomes_nf/'
}

// use a local executor for short jobs and it has to give -c and --mem to make nextflow
// allocate the resource automatically. For this the
// settings below may have to be adapted to the allocation for
// the main nextflow job.
singularity {
    enabled = true
    autoPullMode = true
    autoMounts = true
    cacheDir = "/data/$USER/nxf_singularity_cache"
    libraryDir = "/fdb/nxf/singularity-images"
    envWhitelist='https_proxy,http_proxy,ftp_proxy,DISPLAY,SLURM_JOB_ID,SINGULARITY_BINDPATH,MPLCONFIGDIR,NUMBA_CACHE_DIR'
}

env {
    SINGULARITY_CACHEDIR="/data/$USER/.singularity"
    OMP_NUM_THREADS = 1
    NUMBA_CACHE_DIR = "/data/$USER/.cache/numba"
    OPENBLAS_NUM_THREADS = 1
    PYTHONNOUSERSITE = 1
}

// Preform work directory cleanup after a successful run
cleanup = true

profiles {
    biowulflocal {
        process {
            executor = 'local'
            cache = 'lenient'
            memory = "$SLURM_MEM_PER_NODE MB"
            cpus = "$SLURM_CPUS_PER_TASK"
            process."withLabel:gpu".containerOptions = "--nv"
            _JAVA_OPTIONS="-Djava.io.tmpdir=/lscratch/$SLURM_JOB_ID"
        }
    }

    biowulf {
        executor {
            name = 'slurm'
            queue = 'norm'
            queueSize = 200
            pollInterval = '2 min'
            queueStatInterval = '5 min'
            submitRateLimit = '6/1min'
        }

        process {
            maxRetries = 2
            resourceLimits = [ cpus: 192, memory: 751.GB, time: 240.h ]
            clusterOptions = ' --gres=lscratch:200 '
            scratch = '/lscratch/$SLURM_JOB_ID'
            // with the default stageIn and stageOut settings using scratch can
            // result in humungous work folders
            // see https://github.com/nextflow-io/nextflow/issues/961 and
            // https://www.nextflow.io/docs/latest/process.html?highlight=stageinmode
            stageInMode = 'symlink'
            stageOutMode = 'rsync'

            // for running pipeline on group sharing data directory, this can avoid inconsistent files timestamps
            cache = 'lenient'
            _JAVA_OPTIONS="-Djava.io.tmpdir=/data/$USER/.cache"

            // example for setting different parameters for jobs with a 'gpu' label
            // withLabel:gpu {
            //     queue = 'gpu'
            //     time = '4h'
            //     clusterOptions = " --gres=lscratch:400,gpu:1 "
            //     clusterOptions = ' --constraint="gpua100|gpuv100|gpuv100x" '
            //     containerOptions = " --nv "
            // }

            // example for setting short running jobs to run with local executor for a process name
            // withName: 'SAMTOOLS_INDEX|MULTIQC' {
            //     executor = 'local'
            // }

            // example for setting different parameters for a process name
            // withName: 'FASTP|MULTIQC' {
            //     cpus = 6
            //     queue = 'quick'
            //     memory = '6 GB'
            //     time = '4h'
            // }

            // example for setting different parameters for jobs with a resource label
            // withLabel:process_low {
            //     cpus = 2
            //     memory = '12 GB'
            //     time = '4h'
            // }
            // withLabel:process_medium {
            //     cpus = 6
            //     memory = '36 GB'
            //     time = '12h'
            // }
            // withLabel:process_high {
            //     cpus = 12
            //     memory = '72 GB'
            //     time = '16 h'
            // }
        }

        timeline.enabled = true
        report.enabled = true
    }
}

[user@cn3144]$ nextflow run -profile biowulf hello.nf
N E X T F L O W ~ version 20.10.0
Launching `hello.nf` [intergalactic_cray] - revision: f195027c60
executor > slurm (15)
[34/d935ef] process > splitLetters [100%] 1 of 1 ✔
HELLO WORLD!
[...snip...]
[97/85354f] process > convertToUpper (11) [100%] 14 of 14 ✔
Run nextflow with the local executor (biowulflocal profile) to utilize the allocated CPUs and memory on the compute node. Allocating with --mem and -c is essential, and using lscratch as the work directory helps with troubleshooting:
[user@biowulf]$ sinteractive --mem=80g -c 32 --gres=lscratch:200

[user@cn3144]$ nextflow run nf-core/sarek -r 3.4.4 \
    -profile biowulflocal \
    --wes \
    --joint_germline \
    --input test.csv \
    --tools haplotypecaller,vep,snpeff \
    --outdir /data/$USER/sarek/ \
    --genome GATK.GRCh38 \
    --igenomes_base /fdb/igenomes_nf \
    --save_output_as_bam \
    --cache_version 110 \
    --vep_cache /fdb/VEP/110/cache \
    --snpeff_cache /fdb/snpEff/5.1d/data/
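If you also want the work directory itself on lscratch, as suggested above, you can point nextflow's -w (work directory) option there; a minimal sketch with a placeholder script name:
# intermediate task folders go to lscratch instead of $PWD/work
nextflow run main.nf -profile biowulflocal -w /lscratch/$SLURM_JOB_ID/work
Keep in mind that /lscratch/$SLURM_JOB_ID is deleted when the job ends, so a work directory placed there cannot be used with -resume in a later job.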
Pipeline settings can be provided in a yaml file via -params-file
[user@cn3144]$ nextflow run nf-core/hic -profile biowulflocal -params-file params.yaml
[user@cn3144]$ cat params.yaml
input: './samplesheet.csv'
outdir: './results/'
fasta: 'https://github.com/nf-core/test-datasets/raw/hic/reference/W303_SGD_2015_JRIU00000000.fsa'
digestion: 'hindiii'
schema_ignore_params: 'genomes,digest,input_paths,input'
min_mapq : 10
min_restriction_fragment_size : 100
max_restriction_fragment_size : 100000
min_insert_size : 100
max_insert_size : 600
bin_size : '2000,1000'
res_dist_decay : '1000'
res_tads : '1000'
tads_caller : 'insulation,hicexplorer'
res_compartments : '2000'
Running nextflow with biowulf profile (slurm executor) using test input from nf-core:
[user@cn3144]$ nextflow run nf-core/mag -profile test,biowulf --outdir testout
N E X T F L O W ~ version 24.10.3
Launching `https://github.com/nf-core/mag` [desperate_maxwell] DSL2 - revision: 049ea0a819 [master]

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/mag 3.3.0
------------------------------------------------------
...
Create a batch input file (e.g. nf_main.sh) to run the master process. For example:
#! /bin/bash
#SBATCH --job-name=nextflow-main
#SBATCH --cpus-per-task=4
#SBATCH --mem=4G
#SBATCH --gres=lscratch:200
#SBATCH --time=24:00:00

module load nextflow

export NXF_SINGULARITY_CACHEDIR=/data/$USER/nxf_singularity_cache;
export SINGULARITY_CACHEDIR=/data/$USER/.singularity;
export TMPDIR=/lscratch/$SLURM_JOB_ID
export NXF_JVM_ARGS="-Xms2g -Xmx4g"

nextflow run nf-core/rnaseq -r 3.13.2 \
    -profile biowulf \
    --input samplesheet_test.csv \
    --outdir /data/$USER/rnaseq_out \
    --gtf /fdb/igenomes_nf/Homo_sapiens/Ensembl/pub/release-110/gtf/Homo_sapiens.GRCh38.110.gtf \
    --fasta /fdb/igenomes_nf/Homo_sapiens/Ensembl/pub/release-110/fasta/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
    --star_index /fdb/igenomes_nf/Homo_sapiens/Ensembl/pub/release-110/STARindex/ \
    --igenomes_ignore --genome null \
    -resume
Submit this job using the Slurm sbatch command.
sbatch nf_main.sh
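Once the master job starts, you can follow its progress; the job name below matches the #SBATCH --job-name line in the script above:
# slurm view: the master job plus the worker jobs nextflow submits
squeue -u $USER
# nextflow view: follow the log written in the submission directory
tail -f .nextflow.log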
Create a batch script (e.g. nf_local.sh) to run with the biowulflocal profile. For example:
#! /bin/bash
#SBATCH --job-name=nextflow-local
#SBATCH --cpus-per-task=32
#SBATCH --mem=80G
#SBATCH --gres=lscratch:200
#SBATCH --time=24:00:00

module load nextflow

export NXF_SINGULARITY_CACHEDIR=/data/$USER/nxf_singularity_cache;
export SINGULARITY_CACHEDIR=/data/$USER/.singularity;
export TMPDIR=/lscratch/$SLURM_JOB_ID

nextflow run nf-core/hic -profile biowulflocal -params-file params.yaml
Submit this job using the Slurm sbatch command.
sbatch nf_local.sh
Create a batch script (e.g. wf_basecalling_local.sh) to run with the biowulflocal profile. For example:
#! /bin/bash
#SBATCH --job-name=wf-basecalling
#SBATCH --cpus-per-task=12
#SBATCH --mem=64G
#SBATCH --time=4:00:00
#SBATCH --gres=lscratch:200,gpu:1
#SBATCH --partition=gpu
#SBATCH --constraint="gpua100|gpuv100x|gpuv100"

module load nextflow

export NXF_SINGULARITY_CACHEDIR=/data/$USER/nxf_singularity_cache;
export SINGULARITY_CACHEDIR=/data/$USER/.singularity;
export TMPDIR=/lscratch/$SLURM_JOB_ID

nextflow run epi2me-labs/wf-basecalling \
    -profile biowulflocal \
    -resume \
    --input wf-basecalling-demo/input \
    --ref wf-basecalling-demo/GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta \
    --dorado_ext pod5 \
    --out_dir output \
    --basecaller_cfg dna_r10.4.1_e8.2_400bps_hac@v4.1.0 \
    --remora_cfg "dna_r10.4.1_e8.2_400bps_hac@v4.1.0_5mCG_5hmCG@v2"
Submit this job using the Slurm sbatch command.
sbatch wf_basecalling_local.sh
The master process submitting jobs should be run either as a batch job or on an interactive node - not on the biowulf login node.