Biowulf High Performance Computing at the NIH
eager on Biowulf

Eager is a tool for the efficient reconstruction of ancient genomes. It includes a GUI to create XML configuration files and a command line interface to run the analysis pipeline.


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session with a graphical connection, either from an NX session or with X11 forwarding, since the EAGER GUI needs a display.

[user@biowulf]$ sinteractive --cpus-per-task=6 --mem=30g --gres=lscratch:50
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load eager

We will use a medieval Mycobacterium leprae sample from Schuenemann et al. Copy the test data into the local scratch directory and start the GUI:

[user@cn3144]$ cp -Lr ${EAGER_TEST_DATA:-none} data
[user@cn3144]$ # start the GUI
[user@cn3144]$ singularity exec -B $PWD/data:/data $EAGER_IMG eager
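If cp reports that it cannot stat 'none', the variable EAGER_TEST_DATA was empty: the `${EAGER_TEST_DATA:-none}` expansion falls back to the literal word none. The same situation leaves EAGER_IMG empty, so singularity cannot find the container image. Both variables are presumably defined by the eager module. The helper below is a hypothetical convenience (the name check_eager_env is not part of EAGER or Biowulf); it only checks that the two variables are set and point at existing paths.

```bash
# Hypothetical helper: verify the variables used in the commands above.
# It does not load any module; it only reports what is missing.
check_eager_env() {
    local var status=0
    for var in EAGER_IMG EAGER_TEST_DATA; do
        if [[ -z ${!var:-} ]]; then
            # indirect expansion: ${!var} is the value of the variable named by $var
            echo "$var is not set (was the eager module loaded?)" >&2
            status=1
        elif [[ ! -e ${!var} ]]; then
            echo "$var points to a missing path: ${!var}" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running check_eager_env right after loading the module returns 0 when both paths exist and prints one message per problem otherwise.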

Follow the tutorial in the EAGER documentation to create a configuration file. Note that the data directory is mounted at /data inside the container, so all paths in the configuration must start with /data. Then run the pipeline with the command line interface:

[user@cn3144]$ singularity exec -B $PWD/data:/data $EAGER_IMG eagercli /data/Results
Found 1 input configuration files.
Processing file # 1
Schaffa, Schaffa, Genome baua!
Checking for file at path: /data/References/NC_011896.dict

# ModulePoolPaths: [/data/RAW/sk8/sk8_R1.fastq.gz, /data/RAW/sk8/sk8_R2.fastq.gz]
# Module that will be now executed: CreateResultsDirectories
[...snip...]
[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
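Because of the bind mount, a file at $PWD/data/References/NC_011896.dict on the host is seen as /data/References/NC_011896.dict inside the container, which is the form the pipeline reports in its log. The bash function below sketches that translation for a single bind mount. Its name, to_container_path, is hypothetical, and it assumes the mount -B $PWD/data:/data used in the session above; the host and container directories can be passed as the second and third arguments.

```bash
# Hypothetical helper: map a host path under the bind-mounted directory to the
# path the container sees. Defaults assume "-B $PWD/data:/data".
to_container_path() {
    local host_path=$1
    local host_dir=${2:-$PWD/data}
    local container_dir=${3:-/data}
    host_dir=${host_dir%/}   # drop a trailing slash so the prefix match is exact
    case "$host_path" in
        "$host_dir")   printf '%s\n' "$container_dir" ;;
        "$host_dir"/*) printf '%s\n' "${container_dir}${host_path#"$host_dir"}" ;;
        *)             echo "not under $host_dir: $host_path" >&2; return 1 ;;
    esac
}
```

For example, run from /lscratch/$SLURM_JOB_ID, `to_container_path $PWD/data/References/NC_011896.dict` prints /data/References/NC_011896.dict. Paths outside the mounted directory are rejected, because the container cannot see them.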

Batch job
Most jobs should be run as batch jobs.

Create a configuration file for a run, either with the GUI or by other means. Then create a batch script file (e.g. eager.sh). For example:

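#!/bin/bash
# load the eager module, which provides $EAGER_IMG
module load eager/1.92

# host directory holding the input data and the Results tree
datad=/data/$USER/eager/example
# mount it at /data in the container and run the pipeline on one XML configuration
singularity exec -B "${datad}:/data" "$EAGER_IMG" eagercli /data/Results/sample1/2018-09-05-16-27-EAGER.xml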

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=4 --mem=32g eager.sh

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. eager.swarm). For example:

singularity exec -B /data/$USER/eager:/data $EAGER_IMG eagercli /data/Results/sample1/2018-09-05-16-27-EAGER.xml
singularity exec -B /data/$USER/eager:/data $EAGER_IMG eagercli /data/Results/sample2/2018-09-05-16-27-EAGER.xml
singularity exec -B /data/$USER/eager:/data $EAGER_IMG eagercli /data/Results/sample3/2018-09-05-16-27-EAGER.xml
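With many samples, writing the swarmfile by hand gets tedious. The bash function below sketches one way to generate it: one line per XML configuration found one level below Results. The function name make_eager_swarm is hypothetical, and it assumes the layout shown above (<base>/Results/<sample>/*-EAGER.xml, with <base> mounted at /data). $EAGER_IMG is written literally so that it is expanded when each subjob runs with the module loaded.

```bash
# Hypothetical generator: one swarm line per EAGER XML config.
# Assumes <base>/Results/<sample>/*-EAGER.xml and that <base> is mounted at /data.
make_eager_swarm() {
    local base=${1:-/data/$USER/eager}
    local out=${2:-eager.swarm}
    local cfg
    : > "$out"   # start with an empty swarmfile
    # sorted for a stable line order; assumes paths contain no newlines
    while IFS= read -r cfg; do
        # strip the host prefix so the config path is the one seen inside the container
        printf 'singularity exec -B %s:/data $EAGER_IMG eagercli /data%s\n' \
            "$base" "${cfg#"$base"}" >> "$out"
    done < <(find "$base/Results" -mindepth 2 -maxdepth 2 -name '*-EAGER.xml' | sort)
}
```

Running `make_eager_swarm /data/$USER/eager eager.swarm` writes a file of the same form as the example above, which can then be submitted as shown below.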

Submit the swarm using the swarm command.

swarm -f eager.swarm -g 32 -t 4 --module eager/1.92
where
-g #                 Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #                 Number of threads/CPUs required for each process (1 line in the swarm command file)
--module eager/1.92  Loads the eager module for each subjob in the swarm