Biowulf High Performance Computing at the NIH
circtools on Biowulf

Circtools is a modular, Python 3-based framework for circRNA-related tools that unifies several functions in a single command-line-driven program. Its command line follows the circtools subcommand convention also used by samtools and bedtools. Currently, circtools includes modules for detecting and reconstructing circRNAs, quickly checking circRNA mapping results, screening for RBP enrichment, designing circRNA primers, statistical testing, and exon usage analysis.

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input follows the $ prompt):

[user@biowulf]$ sinteractive -c 8 --mem 20g
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load circtools

[user@cn3144 ~]$ circtools -h
usage: circtools [-V] <command> [<args>]
            
            Available commands:

               enrich:       circular RNA RBP enrichment scan
               primex:       circular RNA primer design tool
               detect:       circular RNA detection with DCC
               reconstruct:  circular RNA reconstruction with FUCHS
               circtest:     circular RNA statistical testing
               exon:         circular RNA alternative exon analysis
               quickcheck:   circular RNA sequencing library quick checks               
            

circtools: a modular, python-based framework for circRNA-related tools that
unifies several functions in single command line driven software.

positional arguments:
  command        Command to run

optional arguments:
  -h, --help     show this help message and exit
  -V, --version  show program's version number and exit

# Please see the documentation for pre-processing of files.
# Options used in the command below:
#   @samplesheet  @ is generally used to specify a file name (here, a file listing the input files)
#   -mt1 @mate1   mate1 file containing the mate1 independently mapped chimeric.junction.out files
#   -mt2 @mate2   mate2 file containing the mate2 independently mapped chimeric.junction.out files
#   -D            run in circular RNA detection mode
#   -R            regions in this GTF file are masked from circular RNA detection
#   -an           annotation is used to assign gene names to known transcripts
#   -Pi           run in paired independent mode, i.e. use -mt1 and -mt2
#   -F            filter the circular RNA candidate regions
#   -M            filter out candidates from mitochondrial chromosomes
#   -Nr 5 6       minimum count in one replicate (first value) and number of replicates the candidate has to be detected in (second value)
#   -fg           candidates are not allowed to span more than one gene
#   -G            also run host gene expression
#   -A            name of the fasta genome reference file; must be indexed, i.e. a .fai file must be present
[user@cn3144 ~]$ circtools detect @samplesheet \
      -mt1 @mate1 \
      -mt2 @mate2 \
      -D \
      -R GRCm38_90_repeatmasker.gtf \
      -an Mus_musculus.GRCm38.90.gtf \
      -Pi \
      -F \
      -M \
      -Nr 5 6 \
      -fg \
      -G \
      -A Mus_musculus.GRCm38.dna.primary_assembly.fa \
      -p $SLURM_CPUS_PER_TASK
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
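
The @samplesheet, @mate1 and @mate2 arguments in the session above each name a plain-text file. As a minimal sketch, assuming each of these files simply lists one chimeric junction file per line, the lists could be generated as shown below. The directory layout (star_out/samplesheet, star_out/mate1, star_out/mate2, each with one sub-directory per sample), the file name Chimeric.out.junction, and the helper write_list are illustrative assumptions and not part of circtools; the circtools documentation remains the reference for the actual pre-processing.

```shell
#!/bin/bash
# Hypothetical helper: write a sorted list of junction files found under a directory.
# Usage: write_list <search_dir> <output_file>
write_list() {
    local search_dir=$1 output_file=$2
    # Sort the paths so that samples appear in the same order in every list
    find "$search_dir" -type f -name 'Chimeric.out.junction' | sort > "$output_file"
}

# Assumed (hypothetical) layout: star_out/<list_name>/<sample>/Chimeric.out.junction
for list_name in samplesheet mate1 mate2; do
    if [ -d "star_out/$list_name" ]; then
        write_list "star_out/$list_name" "$list_name"
    fi
done
```

Sorting matters in this sketch because the three lists are expected to refer to the same samples in the same order.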

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. circtools.sh). For example:

#!/bin/bash
set -e
module load circtools
# Options used in the command below:
#   @samplesheet  @ is generally used to specify a file name (here, a file listing the input files)
#   -mt1 @mate1   mate1 file containing the mate1 independently mapped chimeric.junction.out files
#   -mt2 @mate2   mate2 file containing the mate2 independently mapped chimeric.junction.out files
#   -D            run in circular RNA detection mode
#   -R            regions in this GTF file are masked from circular RNA detection
#   -an           annotation is used to assign gene names to known transcripts
#   -Pi           run in paired independent mode, i.e. use -mt1 and -mt2
#   -F            filter the circular RNA candidate regions
#   -M            filter out candidates from mitochondrial chromosomes
#   -Nr 5 6       minimum count in one replicate (first value) and number of replicates the candidate has to be detected in (second value)
#   -fg           candidates are not allowed to span more than one gene
#   -G            also run host gene expression
#   -A            name of the fasta genome reference file; must be indexed, i.e. a .fai file must be present
circtools detect @samplesheet \
      -mt1 @mate1 \
      -mt2 @mate2 \
      -D \
      -R GRCm38_90_repeatmasker.gtf \
      -an Mus_musculus.GRCm38.90.gtf \
      -Pi \
      -F \
      -M \
      -Nr 5 6 \
      -fg \
      -G \
      -A Mus_musculus.GRCm38.dna.primary_assembly.fa \
      -p $SLURM_CPUS_PER_TASK
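
Because the genome FASTA must be indexed (a .fai file must be present), a batch file could verify its inputs before starting circtools, so that a missing file is reported immediately rather than partway through the job. The following is a minimal sketch: the function name check_inputs is hypothetical, and it assumes the index sits next to the FASTA file as <fasta>.fai, which is where samtools faidx writes it.

```shell
#!/bin/bash
# Hypothetical pre-flight check for a circtools batch file.
# Usage: check_inputs <genome.fa> [<other_input_file> ...]
# Returns 0 if every listed file exists and is non-empty and <genome.fa>.fai exists,
# otherwise prints the problems to stderr and returns 1.
check_inputs() {
    local fasta=$1 status=0 f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            status=1
        fi
    done
    if [ ! -s "${fasta}.fai" ]; then
        echo "no index found for $fasta; it can be created with: samtools faidx $fasta" >&2
        status=1
    fi
    return "$status"
}

# Possible use in circtools.sh, before the circtools call (file names from the example above):
# check_inputs Mus_musculus.GRCm38.dna.primary_assembly.fa \
#     samplesheet mate1 mate2 GRCm38_90_repeatmasker.gtf Mus_musculus.GRCm38.90.gtf || exit 1
```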

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=8 --mem=20g circtools.sh
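
The resource requests given on the sbatch command line can alternatively be recorded in the batch file itself as #SBATCH directives, which sbatch reads from the comment lines at the top of the script. A minimal sketch follows; the thread_count helper and its fallback to 1 thread outside of a Slurm allocation are assumptions added for illustration, not part of the original example.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=8
#SBATCH --mem=20g
set -e

# Slurm sets SLURM_CPUS_PER_TASK only when --cpus-per-task is requested.
# Hypothetical helper: print that value, or 1 if the variable is unset or empty.
thread_count() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

echo "Running with $(thread_count) thread(s)"

# In circtools.sh, the value would replace the bare variable in the last option:
#   -p "$(thread_count)"
```

With the directives in place, the job could be submitted as sbatch circtools.sh; options given on the sbatch command line still take precedence over the directives in the script.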