High-Performance Computing at the NIH
Circleseq on Biowulf & Helix

Description

Circleseq takes sample-specific paired-end FASTQ files as input and produces a list of off-target cleavage sites detected by CIRCLE-seq as output. The individual pipeline steps are:

1. Merge: Merge read1 and read2 for easier mapping to the genome.
2. Read Alignment: Merged paired-end reads from the assay are aligned to the reference genome using the BWA-MEM algorithm with default parameters (Li, H., 2009).
3. Cleavage Site Identification: Mapped sites are analyzed to determine which represent high-quality cleavage sites.
4. Visualization of Results: Identified on-target and off-target cleavage sites are rendered as a color-coded alignment map for easy analysis of results.

There may be multiple versions available on our systems. An easy way to select a version is to use modules. To see the modules available, type:

module avail circleseq

To select a module, use:

module load circleseq/[version]

where [version] is the version of choice.

Environment variables set
Documentation

https://github.com/tsailabSJ/circleseq

Interactive Job on Biowulf

Allocate an interactive session with sinteractive and run the program as shown below:

biowulf$ sinteractive --mem=20g
salloc.exe: Pending job allocation 38978697
[...snip...]
salloc.exe: Nodes cn2273 are ready for job
node$ module load circleseq
[+] Loading circleseq
node$ python $SCRIPTDIR/circleseq.py all --manifest /path/to/manifest.yaml
[...snip...]
node$ exit
biowulf$

 

Batch Job on Biowulf

Create a batch script similar to the following example:

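#!/bin/bash
# this file is file.batch

# load the module; abort if it is unavailable
module load circleseq || exit 1
# run from the user's data directory; abort if it cannot be entered
cd /data/$USER || exit 1
python $SCRIPTDIR/circleseq.py all --manifest /path/to/manifest.yaml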

Submit to the queue with sbatch:

biowulf$ sbatch file.batch
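
The batch script above hard-codes the manifest path. As a minimal sketch, the variant below takes the manifest as the first argument to the script and checks that it is readable before loading the module, so a mistyped path fails immediately rather than partway through the job. The function names check_manifest and run_pipeline are hypothetical helpers introduced for illustration; they are not part of circleseq or Biowulf.

```shell
#!/bin/bash
# Hypothetical variant of file.batch: the manifest path is passed as $1.

check_manifest() {
    # Return non-zero (with a message on stderr) if the manifest
    # argument is empty or the file cannot be read.
    local manifest=$1
    if [ -z "$manifest" ]; then
        echo "usage: sbatch file.batch /path/to/manifest.yaml" >&2
        return 2
    fi
    if [ ! -r "$manifest" ]; then
        echo "cannot read manifest: $manifest" >&2
        return 1
    fi
}

run_pipeline() {
    # Validate the manifest, load the module, then run all pipeline steps.
    local manifest=$1
    check_manifest "$manifest" || return
    module load circleseq || return 1
    python "$SCRIPTDIR/circleseq.py" all --manifest "$manifest"
}

# In the batch script itself, the last line would be:
#   run_pipeline "$1"
```

With that last line uncommented, the script could be submitted as `sbatch --mem=20g file.batch /path/to/manifest.yaml`. The 20g request mirrors the interactive example and is an assumption; the memory actually needed depends on the data.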

 

Swarm of Jobs on Biowulf

Create a swarmfile (e.g. script.swarm). For example:

# this file is called script.swarm
cd dir1;circleseq command 1;circleseq command 2
cd dir2;circleseq command 1;circleseq command 2
cd dir3;circleseq command 1;circleseq command 2
[...]

Submit this job using the swarm command.

swarm -f script.swarm --module circleseq

For more information regarding swarm: https://hpc.nih.gov/apps/swarm.html#usage
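
When each sample has its own manifest, the swarmfile can be generated rather than written by hand. The sketch below writes one line per manifest, reusing the command form from the batch example. It assumes the manifests sit in a single directory and end in .yaml; the function name make_swarmfile and that layout are hypothetical, and paths containing spaces are not handled.

```shell
#!/bin/bash
# Sketch: write one swarm command line per manifest found in a directory.
# make_swarmfile and the *.yaml naming convention are illustrative assumptions.

make_swarmfile() {
    local manifest_dir=$1
    local swarmfile=$2
    local manifest

    # Start a fresh swarmfile with a header comment.
    printf '# generated by make_swarmfile\n' > "$swarmfile"

    for manifest in "$manifest_dir"/*.yaml; do
        # Skip the unexpanded glob pattern when the directory has no manifests.
        [ -e "$manifest" ] || continue
        # $SCRIPTDIR is written literally (single quotes) so that it is
        # expanded when the job runs, after the module has been loaded.
        printf 'python $SCRIPTDIR/circleseq.py all --manifest %s\n' "$manifest" >> "$swarmfile"
    done
}

# Example (not run here):
#   make_swarmfile /data/$USER/manifests script.swarm
#   swarm -f script.swarm --module circleseq
```

Each generated line is an independent command, so swarm can schedule every sample as its own subjob.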