Broccoli on Biowulf
Broccoli is a BIDS-compliant pipeline for fast analysis of fMRI data.
Important Notes
- Module Name: broccoli (see the modules page for more information)
- broccoli is a scientific pipeline with the potential to overload Biowulf's central filesystem. To avoid filesystem issues we recommend the following:
- Limit I/O to the central Biowulf filesystems (e.g., /data, /home) by making use of local disk (/lscratch). Point the broccoli output directory directly at /lscratch/$SLURM_JOB_ID (remember to allocate enough space in /lscratch), then copy the output back to your data directory at the end of the job. Likewise, copy your input dataset and any cohort and design files to /lscratch/$SLURM_JOB_ID so that I/O stays local as much as possible (see the batch example below).
- Profile/benchmark broccoli jobs: make sure a given broccoli job scales before launching a large number of jobs. Start by profiling/benchmarking small jobs (e.g., a swarm of 3 broccoli commands), then monitor them with the user dashboard or the commands jobhist, sjobs, squeue, and jobload (see Biowulf utilities). The HPC staff have prepared resources on how to monitor your jobs for benchmarking and profiling (video and slides). Once you have profiled a swarm of a few jobs and refined the memory, CPU, and walltime requirements, it is reasonably safe to increase the number of broccoli jobs gradually. For many analysis pipelines there is no way to know in advance how much memory or how many CPUs will actually be required in an HPC environment, which is why profiling/benchmarking is so important. A minimal benchmarking swarm is sketched below.
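As a concrete starting point, a 3-job benchmarking swarm could look like the following; the dataset path, subject labels, and resource values are illustrative assumptions, not recommendations. Put the commands in a swarm file (e.g. broccoli_bench.swarm):

broccolipipeline /data/$USER/BROCCOLI_TEST/ds001 /data/$USER/BROCCOLI_TEST/OUT01 participant sub-01
broccolipipeline /data/$USER/BROCCOLI_TEST/ds001 /data/$USER/BROCCOLI_TEST/OUT02 participant sub-02
broccolipipeline /data/$USER/BROCCOLI_TEST/ds001 /data/$USER/BROCCOLI_TEST/OUT03 participant sub-03

Submit the swarm, then compare what the jobs actually used against what you requested once they finish:

[user@biowulf]$ swarm -f broccoli_bench.swarm -g 10 -t 8 --time 05:00:00 --module broccoli
[user@biowulf]$ jobhist 12345678    # check elapsed time and MemUsed vs. the requested memory/CPUs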
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program.
Sample session to display usage (user input in bold):
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 12345678
salloc.exe: job 12345678 queued and waiting for resources
salloc.exe: job 12345678 has been allocated resources
salloc.exe: Granted job allocation 12345678
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn1234 are ready for job

[user@cn1234 ~]$ module load broccoli
[+] Loading singularity on cn1234
[+] Loading broccoli 1.0.1 ...

[user@cn1234 ~]$ broccolipipeline
usage: broccolipipeline bids_dir output_dir analysis_type [participant(s)]

[user@cn1234 ~]$ exit
salloc.exe: Relinquishing job allocation 12345678
[user@biowulf ~]$
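To run the pipeline itself in an interactive session, allocate local scratch space and CPUs up front. The resource values and paths below are illustrative assumptions; size them to your dataset:

[user@biowulf]$ sinteractive --gres=lscratch:20 --cpus-per-task=8 --mem=10g
[user@cn1234 ~]$ module load broccoli
[user@cn1234 ~]$ broccolipipeline /data/$USER/BROCCOLI_TEST/ds001 /lscratch/$SLURM_JOB_ID/OUTPUT participant
[user@cn1234 ~]$ cp -r /lscratch/$SLURM_JOB_ID/OUTPUT /data/$USER/BROCCOLI_TEST/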
Batch job
Most jobs should be run as batch jobs.
Create a batch input file (e.g. broccoli.sh). For example:
#!/bin/bash
# sbatch --mem=10g --cpus-per-task=48 --time=05:00:00 broccoli.sh

set -e

function fail {
    echo "FAIL: $@" >&2
    exit 1  # signal failure
}

module load broccoli/1.0.1
broccolipipeline /data/user/BROCCOLI_TEST/ds001 /data/user/BROCCOLI_TEST/OUTPUT participant
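To follow the lscratch recommendation above, the same job can stage its input to local disk and copy results back at the end. This is a minimal sketch, assuming the input dataset lives under /data/$USER/BROCCOLI_TEST; adjust the paths and the lscratch allocation to your data:

#!/bin/bash
# sbatch --gres=lscratch:20 --mem=10g --cpus-per-task=48 --time=05:00:00 broccoli.sh

set -e
module load broccoli/1.0.1

# stage the input dataset (and any cohort/design files) to local scratch
cd /lscratch/$SLURM_JOB_ID
cp -r /data/$USER/BROCCOLI_TEST/ds001 .

# write output to local scratch as well
broccolipipeline /lscratch/$SLURM_JOB_ID/ds001 /lscratch/$SLURM_JOB_ID/OUTPUT participant

# copy results back to central storage before the job (and /lscratch) ends
cp -r /lscratch/$SLURM_JOB_ID/OUTPUT /data/$USER/BROCCOLI_TEST/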
Submit this job using the Slurm sbatch command.
sbatch [--gres=lscratch:#] [--cpus-per-task=#] [--mem=#] broccoli.sh
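For example, to match the lscratch sketch above (the values are illustrative):

sbatch --gres=lscratch:20 --cpus-per-task=48 --mem=10g --time=05:00:00 broccoli.sh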