biomodal CLI on Biowulf

The biomodal pipeline processes FASTQ files generated using the biomodal library preparation kits. The pipeline utilises Nextflow as the orchestration tool.

Documentation

biomodal documentation

Important Notes

Module Name: biomodal (see the modules page for more information)
Users cannot run 'biomodal init' or 'biomodal test'.
Users must pass the --work-dir option to the 'biomodal analyse' command.
A 'biomodal analyse' job does NOT require large resources. It only needs enough wall time for Nextflow to finish all the pipeline steps, which are submitted as separate jobs with their own allocations.
This pipeline produces HTML reports. You can use hpcdrive to view these reports on your local workstation.
Example files can be accessed using the BIOMODAL_TEST_DATA environment variable. This dataset is 200G in size and will produce a further 700G of data in results and work files. Please make sure you have enough free space in your data directory before running the example. See the interactive session below.
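Since the example needs roughly 900G in total (200G of input plus about 700G of results and work files), it may help to check free space before copying anything. The following is a minimal sketch, not part of biomodal: the function name has_free_space is hypothetical, and it assumes that df reflects the space actually available to you. On a filesystem with per-user quotas the real limit can be lower, so treat a passing check as a necessary condition only.

```shell
#!/bin/bash
# Hypothetical helper (not part of biomodal): succeed only if the filesystem
# holding DIR has at least REQUIRED_GB gigabytes available.
has_free_space() {
  local dir="$1" required_gb="$2"
  local avail_kb
  # df -P gives one line per filesystem; with -k, column 4 of the second
  # line is the available space in 1K blocks.
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR==2 {print $4}')
  # Fail if df could not report on the directory (e.g. it does not exist).
  [ -n "$avail_kb" ] || return 1
  [ "$avail_kb" -ge $(( required_gb * 1024 * 1024 )) ]
}

# Assumption: the example is run under /data/$USER, as in the sample session.
# 900 = 200G test data + roughly 700G of results and work files.
if has_free_space "/data/$USER" 900; then
  echo "enough free space for the example"
else
  echo "less than 900G free (or directory not found)"
fi
```

The threshold is passed as an argument so the same check can be reused with a different size estimate for your own data.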

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session:

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load biomodal

[user@cn3144 ~]$ cd /data/$USER 

[user@cn3144 user]$ cp -r $BIOMODAL_TEST_DATA .

[user@cn3144 user]$ biomodal analyse --work-dir TEST_DATA/nf-work \
  --input-path TEST_DATA \
  --meta-file TEST_DATA/CEGX_Run_meta.csv \
  --output-path TEST_DATA \
  --additional-profile deep_seq \
  --tag test_demo_data --mode 6bp

[user@cn3144 user]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job

Most jobs should be run as batch jobs.

Create a batch input file (e.g. biomodal.sh). For example:

#!/bin/bash
set -e
module load biomodal
biomodal analyse --work-dir /data/$USER/nf-work \
  --input-path /data/$USER/mybiomodaldata \
  --meta-file /data/$USER/mybiomodaldata/meta.csv \
  --output-path /data/$USER/mybiomodaldata \
  --additional-profile deep_seq \
  --tag mydata --mode 6bp

Update the paths to match your data location, then submit the job using the Slurm sbatch command. This job only orchestrates the pipeline, so it does not need large resources; just request enough wall time for all pipeline steps to finish. For example:

sbatch --cpus-per-task=4 --mem=8G --time=2-00:00:00 biomodal.sh
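Because the batch script runs unattended, a mistyped path is only discovered once the job has started. As a sketch of one way to catch this early, a small pre-flight check could be added to biomodal.sh before the 'biomodal analyse' line. The function name check_inputs is hypothetical and is not provided by biomodal; the paths in the usage comment are the placeholder paths from the batch script above.

```shell
#!/bin/bash
# Hypothetical pre-flight helper (not part of biomodal): verify that the
# input directory and the metadata file exist before launching the pipeline.
check_inputs() {
  local input_dir="$1" meta_file="$2"
  if [ ! -d "$input_dir" ]; then
    echo "input directory not found: $input_dir" >&2
    return 1
  fi
  if [ ! -f "$meta_file" ] || [ ! -r "$meta_file" ]; then
    echo "metadata file missing or not readable: $meta_file" >&2
    return 1
  fi
}

# Example use inside biomodal.sh, before the 'biomodal analyse' line:
#   check_inputs "/data/$USER/mybiomodaldata" \
#     "/data/$USER/mybiomodaldata/meta.csv" || exit 1
```

With 'set -e' already in the batch script, a failed check stops the job even without the explicit '|| exit 1'.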