Abyss on HPC
ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The parallel version is implemented using MPI and is capable of assembling larger genomes.
References:
Documentation
- Module Name: abyss (see the modules page for more information)
- MPI app; when running abyss on multiple nodes, please follow the syntax in the example below instead of using traditional MPI syntax.
Batch job (in MPI mold)
Most jobs should be run as batch jobs.
Create a batch input file (e.g. batch.sh). For example:
#!/bin/bash #SBATCH --job-name="abyss" #SBATCH --mail-type=BEGIN,END ### sbatch --partition=multinode --ntasks=16 --nodes=2 --time=24:00:00 --mem=60g batch.sh cd /data/$USER/abyss module load abyss `which abyss-pe` np=${SLURM_NTASKS} j=8 k=25 n=10 in='/data/$USER/File_1.fq /data/$USER/File_2.fq' name=OutputPrefix
Submit this job using the Slurm sbatch command.
sbatch --partition=multinode --ntasks=16 --nodes=2 --time=24:00:00 --mem=60g batch.sh
The job runs 16 tasks ( np=${SLURM_NTASKS} ) on 2 nodes with 8 (j=8) cpus each node.
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.
Create a swarmfile (e.g. job.swarm). For example:
cd dir1;abyss-pe np=${SLURM_NTASKS} j=8 k=25 n=10 in='1.fq 2.fq' name=out cd dir2;abyss-pe np=${SLURM_NTASKS} j=8 k=25 n=10 in='1.fq 2.fq' name=out cd dir3;abyss-pe np=${SLURM_NTASKS} j=8 k=25 n=10 in='1.fq 2.fq' name=out
Submit this job using the swarm command.
swarm -f job.swarm --module abyss --sbatch "--partition=multinode --ntasks=16 --nodes=2 --time=24:00:00 --mem=60g"where
--sbatch | use this flag to pass sbatch flags to swarm |
--module | Loads the module for each subjob in the swarm |