High-Performance Computing at the NIH
lobSTR on Biowulf & Helix

Description

lobSTR is a tool for profiling Short Tandem Repeats (STRs) from high throughput sequencing data.

Multiple versions of lobSTR may be installed on our systems. The easiest way to select a version is with environment modules. To see the available modules, type

module avail lobstr 

To select a module, use

module load lobstr/[version]

where [version] is the version of choice.

Environment variables set
Documentation

http://lobstr.teamerlich.org/index.html

Interactive lobSTR job on Biowulf

Allocate an interactive session with sinteractive and run lobSTR as shown below:

biowulf$ sinteractive --mem=5g
salloc.exe: Pending job allocation 38978697
[...snip...]
salloc.exe: Nodes cn2273 are ready for job
node$ module load lobstr
[+] Loading lobstr
node$ lobSTR -f FILE1,FILE2,.. \
--index-prefix PATH_TO_INDEX/lobSTR_ \
-o OUTPUT_PREFIX \
--rg-sample SAMPLE \
--rg-lib LIB
[...snip...]
node$ exit
biowulf$
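The -f option above takes its input files as a single comma-separated list. As a minimal sketch, the list can be assembled from separate file names in the shell. The helper name join_by_comma and the FASTQ file names are hypothetical, and the script only prints the resulting command instead of running it.

```shell
#!/bin/bash
# join_by_comma is a hypothetical helper, not part of lobSTR.
# It joins all of its arguments with commas, the form shown for -f above.
join_by_comma() {
    local IFS=,
    printf '%s\n' "$*"
}

# Hypothetical input files; replace with your own reads.
files=$(join_by_comma sample_L001.fastq sample_L002.fastq)

# Print the command for inspection rather than executing it.
echo "lobSTR -f $files --index-prefix PATH_TO_INDEX/lobSTR_ -o OUTPUT_PREFIX --rg-sample SAMPLE --rg-lib LIB"
```

Once the printed command looks right, it can be pasted into the interactive session.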


Batch job on Biowulf

Create a batch script similar to the following example:

#!/bin/bash
# this file is lobstr.batch

module load lobstr || exit 1
# stop if the working directory is unavailable
cd "/data/$USER" || exit 1
lobSTR -f FILE1,FILE2,.. \
--index-prefix PATH_TO_INDEX/lobSTR_ \
-o OUTPUT_PREFIX \
--rg-sample SAMPLE \
--rg-lib LIB

Submit to the queue with sbatch:

biowulf$ sbatch lobstr.batch
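Because a batch job runs unattended, it may help to confirm that the input files exist before lobSTR starts. The following is a sketch only: require_files is a hypothetical helper, not something provided by lobSTR or by the module, and the file names in the usage comment are the same placeholders used in the batch script.

```shell
#!/bin/bash
# require_files is a hypothetical helper. It reports every path that is
# missing or empty and returns non-zero if any such path was found.
require_files() {
    local missing=0 f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible use inside lobstr.batch, before the lobSTR command:
#   require_files FILE1 FILE2 || exit 1
```

With a check like this, a job with a mistyped path fails immediately with a readable message in the job's error output.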


Swarm of Jobs on Biowulf

Create a swarmfile (e.g. script.swarm). For example:

# this file is called script.swarm
cd dir1;lobstr command 1;lobstr command 2
cd dir2;lobstr command 1;lobstr command 2
cd dir3;lobstr command 1;lobstr command 2
[...]

Submit this job with the swarm command:

swarm -f script.swarm --module lobstr
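When there are many sample directories, the swarmfile can be generated instead of written by hand. The sketch below assumes, purely for illustration, that each directory holds a file named reads.fastq and that the directory name doubles as the sample name. make_swarm_lines is a hypothetical helper, and the index path and library name are left as the placeholders used elsewhere on this page.

```shell
#!/bin/bash
# make_swarm_lines is a hypothetical helper. It prints one swarmfile line
# per directory given on the command line.
make_swarm_lines() {
    local d name
    for d in "$@"; do
        # Assumption: the sample name is the last component of the path.
        name=$(basename "$d")
        # Assumption: each directory contains a file called reads.fastq.
        printf 'cd %s; lobSTR -f reads.fastq --index-prefix PATH_TO_INDEX/lobSTR_ -o %s --rg-sample %s --rg-lib LIB\n' \
            "$d" "$name" "$name"
    done
}

# Print the lines for three example directories.
make_swarm_lines dir1 dir2 dir3
```

Redirecting the output to script.swarm (make_swarm_lines dir1 dir2 dir3 > script.swarm) gives a file that can be submitted with the swarm command shown above.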

For more information regarding swarm: https://hpc.nih.gov/apps/swarm.html#usage