High-Performance Computing at the NIH
Preseq on Biowulf & Helix

Description

The preseq package predicts the yield of distinct reads from a genomic library, based on an initial sequencing experiment. The estimates can then be used to examine the utility of further sequencing, to optimize the sequencing depth, or to screen multiple libraries and avoid low-complexity samples.

Multiple versions may be available on our systems. The easiest way to select a version is with environment modules. To see the available modules, type:

module avail preseq 

To select a module, use:

module load preseq/[version]

where [version] is the version of choice.

Environment variables set
Documentation

https://github.com/smithlabcode/preseq

Interactive job on Biowulf

Allocate an interactive session with sinteractive and run preseq as shown below:

biowulf$ sinteractive --mem=5g
salloc.exe: Pending job allocation 38978697
[...snip...]
salloc.exe: Nodes cn2273 are ready for job
node$ module load preseq
[+] Loading preseq
node$ preseq lc_extrap -o yield_estimates.txt input.bed
[...snip...]
node$ exit
biowulf$
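
The estimates written by lc_extrap can be inspected directly from the shell. The following is a minimal sketch, assuming yield_estimates.txt is a tab-separated table with one header line, total reads in column 1 and expected distinct reads in column 2. Check the header of your own output first, since the layout may differ between preseq versions. The function name distinct_at_depth is hypothetical and not part of preseq.

```shell
# Hypothetical helper, not part of preseq.
# Assumes a tab-separated table with one header line:
#   column 1 = total reads, column 2 = expected distinct reads.
# Prints the first data row whose total-read count is >= the requested depth;
# returns non-zero if the table does not extend that far.
distinct_at_depth() {
    local table="$1" depth="$2"
    awk -F'\t' -v depth="$depth" '
        NR > 1 && ($1 + 0) >= (depth + 0) {
            printf "%s\t%s\n", $1, $2
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    ' "$table"
}
```

For example, to look up the estimate at a sequencing depth of 50 million reads:

node$ distinct_at_depth yield_estimates.txt 50000000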

 

Batch job on Biowulf

Create a batch script similar to the following example:

#! /bin/bash
# this file is preseq.batch

module load preseq || exit 1
cd "/data/$USER" || exit 1
preseq lc_extrap -o yield_estimates.txt input.bed

Submit to the queue with sbatch:

biowulf$ sbatch preseq.batch
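
The batch script above hard-codes the input file. As a sketch of one possible variant, the script below takes the BED file as its first argument and derives the output name from it. The file name preseq_arg.batch, the function name run_lc_extrap, and the output naming scheme are illustrative assumptions, not site conventions.

```shell
#!/bin/bash
# this file is preseq_arg.batch (hypothetical name)
# Sketch: take the input BED file as the first argument instead of hard-coding it.

run_lc_extrap() {
    local input="$1"
    # Fail early with a clear message if the input is missing or empty.
    if [ ! -s "$input" ]; then
        echo "input file not found or empty: $input" >&2
        return 1
    fi
    # Derive the output name, e.g. sample.bed -> sample.yield_estimates.txt
    local output="${input%.bed}.yield_estimates.txt"
    preseq lc_extrap -o "$output" "$input"
}

# Run only when an input file was given on the command line.
if [ "$#" -gt 0 ]; then
    module load preseq || exit 1
    run_lc_extrap "$1"
fi
```

Arguments placed after the script name are passed through to the script by sbatch, so a submission could look like this (the memory request mirrors the interactive example and may need adjusting for larger inputs):

biowulf$ sbatch --mem=5g preseq_arg.batch /data/$USER/sample.bed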

 

Swarm of Jobs on Biowulf

Create a swarmfile (e.g. script.swarm). For example:

# this file is called script.swarm
cd dir1;preseq command 1;preseq command 2
cd dir2;preseq command 1;preseq command 2
cd dir3;preseq command 1;preseq command 2
[...]

Submit this job using the swarm command.

swarm -f script.swarm --module preseq
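
When there are many input files, the swarmfile does not have to be written by hand. The sketch below writes one lc_extrap command line per .bed file in a directory. The function name make_preseq_swarm and the output naming scheme are hypothetical, and the sketch assumes that file paths contain no spaces.

```shell
# Hypothetical helper: write one swarm command line per *.bed file in a directory.
# Assumes paths without spaces, since each line is later parsed as a shell command.
make_preseq_swarm() {
    local bed_dir="$1" swarmfile="$2" bed
    : > "$swarmfile"                 # start from an empty swarmfile
    for bed in "$bed_dir"/*.bed; do
        [ -e "$bed" ] || continue    # the glob matched nothing
        printf 'preseq lc_extrap -o %s.yield_estimates.txt %s\n' \
            "${bed%.bed}" "$bed" >> "$swarmfile"
    done
}
```

The generated file can then be submitted in the same way as a hand-written one:

biowulf$ make_preseq_swarm /data/$USER/beds script.swarm
biowulf$ swarm -f script.swarm --module preseq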

For more information regarding swarm: https://hpc.nih.gov/apps/swarm.html#usage