Nextflow is a domain-specific language modelled after UNIX pipes that simplifies writing parallel and scalable pipelines. The version installed on our systems can run jobs locally (on the same machine) or by submitting them to Slurm.
The code executed at each pipeline stage can be written in a number of different languages (shell, Python, R, ...).
Intermediate results are stored in the $PWD/work directory, which allows interrupted pipelines to be resumed.
The language used to write pipeline scripts is an extension of Groovy.
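A process's script block is run by bash by default, but any interpreter can be selected with a shebang line. Below is a minimal, hypothetical sketch in the same DSL1 syntax as the hello.nf example further down (the process name pythonGreeter and channel name pyresult are made up for illustration):

// minimal sketch -- process and channel names are illustrative
process pythonGreeter {
    // the shebang line below selects python as the interpreter
    output:
    stdout pyresult

    """
    #!/usr/bin/env python
    print("hello from python")
    """
}

pyresult.subscribe { println it.trim() }

Because intermediate results are cached in ./work, an interrupted run can be restarted by adding the -resume option to the nextflow run command.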
Nextflow is a complex workflow management tool. Please read the manual carefully and make sure to place appropriate limits on your pipeline to avoid submitting too many jobs or running too many local processes.
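As a hedged illustration (the values below are examples, not recommendations), such limits can be set in nextflow.config via the executor queueSize setting and the per-process maxForks directive:

// nextflow.config -- example values only
executor {
    queueSize = 100     // at most 100 tasks queued or running at a time
}

process {
    maxForks = 20       // at most 20 parallel instances of any one process
}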
When running many tasks, Nextflow creates many temporary files in the ./work directory. Please make sure that your pipeline does not inadvertently create millions of small files, which would degrade file system performance.
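If the intermediate files are not needed after a run finishes, Nextflow can remove them automatically. A minimal sketch (note that this makes -resume impossible, since the cached intermediates are deleted):

// nextflow.config
cleanup = true    // delete files in the work directory after a successful run
                  // (this disables resuming the pipeline with -resume)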
Please use the pollInterval and queueStatInterval settings to reduce the frequency with which Nextflow polls Slurm. The default frequency creates too many queries and results in unnecessary load on the scheduler.
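For reference, these settings live in the slurm executor scope of nextflow.config; the provided Biowulf config shown further down sets them as follows:

executor {
    $slurm {
        pollInterval      = '1 min'
        queueStatInterval = '5 min'
        submitRateLimit   = '6/1min'
    }
}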
First, let's do some basic local execution. For this we will allocate an interactive session:
[user@biowulf]$ sinteractive --mem=10g -c2 --gres=lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ module load nextflow
For the traditional hello world example, we will parallelize the uppercasing of greetings in different languages:
# create file of greetings
[user@cn3144]$ cat > greetings.txt <<EOF
Hello world!
Hallo world!
Ciao world!
Salut world!
Bongiorno world!
Servus world!
Gruess Gott world!
Na was los world!
Gruetzi world!
Hello world!
Come va world!
Ca va world!
Hi world!
Good bye world!
EOF
We then create a file called hello.nf that describes the workflow to be executed:
// vim: set ft=groovy:

params.file = file('greetings.txt').toAbsolutePath()

process splitLetters {
    output:
    file 'chunk_*' into letters mode flatten

    """
    pwd
    split -l1 '${params.file}' chunk_
    """
}

process convertToUpper {
    input:
    file x from letters

    output:
    stdout result

    """
    cat $x | tr '[a-z]' '[A-Z]'
    """
}

result.subscribe {
    println it.trim()
}
The workflow is executed with
[user@cn3144]$ nextflow run hello.nf
N E X T F L O W  ~  version 20.10.0
Launching `hello.nf` [prickly_knuth] - revision: f195027c60
executor >  local (5)
[ee/06a621] process > splitLetters        [100%] 1 of 1 ✔
[d9/5e328a] process > convertToUpper (3)  [  0%] 0 of 11
SALUT WORLD!
BONGIORNO WORLD!
CIAO WORLD!
HELLO WORLD!
HALLO WORLD!
SERVUS WORLD!
GRUETZI WORLD!
COME VA WORLD!
CA VA WORLD!
executor >  local (15)
[ee/06a621] process > splitLetters        [100%] 1 of 1 ✔
executor >  local (15)
[ee/06a621] process > splitLetters        [100%] 1 of 1 ✔
[b4/2b9395] process > convertToUpper (13) [100%] 14 of 14 ✔
Note that results are out of order.
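If ordering matters, the outputs can be gathered into a single list before printing, for example by replacing the subscribe call in hello.nf with something like the following sketch (this sorts alphabetically, not by input order):

// collect all results into one sorted list, then print them
result
    .toSortedList()
    .subscribe { list -> list.each { println it.trim() } }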
The same workflow can be used to run each of the processes as a Slurm job by creating a nextflow.config file. We provide a file with correct settings for Biowulf at /usr/local/apps/nextflow/nextflow.config. If you use this file, please don't change the settings for job submission and querying (pollInterval, queueStatInterval, and submitRateLimit). Other settings can be adapted to your workflow; in particular, you might want to remove the lscratch allocation for all jobs if it does not apply to your workflow.
[user@cn3144]$ cp /usr/local/apps/nextflow/nextflow.config .
[user@cn3144]$ cat nextflow.config
params {
    config_profile_description = 'Biowulf nf-core config'
    config_profile_contact = 'staff@hpc.nih.gov'
    config_profile_url = 'https://hpc.nih.gov/apps/nextflow.html'
    max_memory = '224 GB'
    max_cpus = 32
    max_time = '72 h'
    igenomes_base = '/fdb/igenomes/'
}

container_mounts = '-B/gs10,/gs11,/gs12,/gs6,/gs8,/gs9,/vf,/spin1,/data,/fdb,/gpfs,/lscratch'

// use a local executor for short jobs. For this the
// settings below may have to be adapted to the allocation for
// the main nextflow job
executor {
    $local {
        queueSize = 100
        memory = '12 G'
        cpus = '6'
    }
    $slurm {
        queue = 'norm'
        queueSize = 200
        pollInterval = '1 min'
        queueStatInterval = '5 min'
        submitRateLimit = '6/1min'
        retry.maxAttempts = 1
    }
}

profiles {
    biowulf {
        process {
            executor = 'slurm'
            errorStrategy = 'finish'
            maxRetries = 0
            clusterOptions = ' --gres=lscratch:200 '
            containerOptions = " $container_mounts "
            scratch = '/lscratch/$SLURM_JOBID'
            // with the default stageIn and stageOut settings using scratch can
            // result in humungous work folders
            // see https://github.com/nextflow-io/nextflow/issues/961 and
            // https://www.nextflow.io/docs/latest/process.html?highlight=stageinmode
            stageInMode = 'symlink'
            stageOutMode = 'rsync'
        }

        // example for setting different parameters for jobs with a 'gpu' label
        //withLabel:gpu {
        //    queue = 'gpu'
        //    time = '36h'
        //    clusterOptions = " --gres=lscratch:400,gpu:v100x:1 "
        //    containerOptions = " --nv $mounts "
        //}

        singularity.enabled = true
        singularity.autoMounts = true
        singularity.cacheDir = "$PWD/singularity"
        singularity.envWhitelist='https_proxy,http_proxy,ftp_proxy,DISPLAY,SLURM_JOBID'
        timeline.enabled = true
        report.enabled = true
    }
}

[user@cn3144]$ nextflow run -profile biowulf hello.nf
N E X T F L O W  ~  version 20.10.0
Launching `hello.nf` [intergalactic_cray] - revision: f195027c60
executor >  slurm (15)
[34/d935ef] process > splitLetters        [100%] 1 of 1 ✔
HELLO WORLD!
[...snip...]
[97/85354f] process > convertToUpper (11) [100%] 14 of 14 ✔
The master process submitting jobs should be run either as a batch job or on an interactive node - not on the Biowulf login node.
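As an illustration, the hello workflow above could be submitted as a batch job with a script along these lines (the file name and resource values are examples; adjust them to your workflow):

#!/bin/bash
# nextflow_main.sh -- submit with: sbatch nextflow_main.sh
#SBATCH --cpus-per-task=2
#SBATCH --mem=10g
#SBATCH --time=1-00:00:00
#SBATCH --gres=lscratch:10

module load nextflow
nextflow run -profile biowulf hello.nf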