Biowulf High Performance Computing at the NIH
circaidme on Biowulf

CircAidMe is a tool designed to analyze data generated with CircAID-p-seq for Oxford Nanopore Technologies. In brief, it detects the known adapter sequences used by the CircAID-p-seq kit in every Oxford Nanopore read. Once the adapters are detected, it extracts the embedded insert sequences and calculates a consensus sequence for the insert.


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session:

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ module load circaidme
[user@cn3144]$ mkdir -p /data/$USER/circaidme/
[user@cn3144]$ cd /data/$USER/circaidme/
[user@cn3144]$ cp ${CIRCAIDME_TEST_DATA:-none}/* .
[user@cn3144]$ circaidme -h
usage: circaidme [-h] --input-file INPUT_FILE --out-path OUT_PATH
                 --adapter-name ADAPTER_NAME [--adapter-list ADAPTER_LIST]
                 [--force-overwrite] [--tag TAG]
                 [--refine-adapter-alignment {False,True}]
                 [--min-inserts MIN_INSERTS] [--cons-min-len CONS_MIN_LEN]
                 [--cons-max-len CONS_MAX_LEN] [--keep-forward]
                 [--no-store-removed-reads] [--iter-first-muscle {1,2,3}]
                 [--iter-second-muscle {1,2,3,4}] [--threads THREADS]
                 [--version]

CircAidMe v0.1.1 -- Tool for the analysis of CircAID-p-seq data -- Designed
and implemented by Genexa AG, Switzerland (genexa.ch) & Immagina BioTechnology
S.R.L., Italy (immaginabiotech.com)

required arguments:
  --input-file INPUT_FILE
                        FASTA/FASTQ file with CircAID-p-seq data (default:
                        None)
  --out-path OUT_PATH   path to store results (also used for temp files)
                        (default: None)
  --adapter-name ADAPTER_NAME
                        define which adapter to be used OR "ALL" for all the
                        available adapters OR "LIST" if you want to provide
                        the list of adapters to be used with argument "--
                        adapter-list". Predefined adapters are: "Luc20_DNA,
                        ADR7391_RNA, ADR1_RNA, ADR2_RNA, ADR3_RNA,ADR4_RNA,
                        ADR1572_RNA, ADR1859_RNA, ADR2520_RNA, ADR2858_RNA,
                        ADR323_RNA, ADR4314_RNA, ADR4557_RNA, ADR4885_RNA,
                        ADR5555_RNA" (default: None)

optional arguments:
  --adapter-list ADAPTER_LIST
                        for user-defined adapter list (comma separated list)
                        (default: None)
  --force-overwrite     set flag if you want to overwrite result files
                        (default: False)
  --tag TAG             tag to be added to the output FASTA file (default:
                        none)
  --refine-adapter-alignment {False,True}
                        choose if adapter alignment has to be refined
                        (default: True)
  --min-inserts MIN_INSERTS
                        number of inserts which have to be present in order to
                        calculate a consensus sequence (default: 2)
  --cons-min-len CONS_MIN_LEN
                        minimal length of the consensus sequence (default: 15)
  --cons-max-len CONS_MAX_LEN
                        maximal length of the consensus sequence (default: 40)
  --keep-forward        set flag if reads with only "forward" inserts must be
                        kept (default: False)
  --no-store-removed-reads
                        set flag if removed reads do NOT have to be written to
                        a separate FASTA file (default: False)
  --iter-first-muscle {1,2,3}
                        number of iterations MUSCLE has to perform for first
                        MSA calculation (default: 2)
  --iter-second-muscle {1,2,3,4}
                        number of iterations MUSCLE has to perform for second
                        MSA calculation (default: 3)
  --threads THREADS     number of threads to be used (default: 1)
  --version             show program's version number and exit

[user@cn3144]$ circaidme --input-file CircAID_testdata.fastq --out-path ./testout --adapter-name ALL
[user@cn3144]$ tree ./testout
./testout
├── CircAID_testdata.csv
├── CircAID_testdata.fasta
├── CircAID_testdata.log
└── CircAID_testdata_removed_reads.fasta

[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
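
After a run like the sample session, a quick sanity check is to count the records in the two FASTA outputs. The following is a minimal sketch, assuming both files are ordinary FASTA with one `>` header line per record. `count_fasta_records` is a hypothetical helper written for this example, not part of CircAidMe.

```shell
# Hypothetical helper (not part of CircAidMe): count FASTA records by
# counting header lines, which start with '>'.
count_fasta_records() {
    if [ ! -f "$1" ]; then
        echo "missing file: $1" >&2
        return 1
    fi
    # grep -c prints 0 and exits with status 1 when nothing matches;
    # '|| true' keeps an empty file from being reported as a failure.
    grep -c '^>' "$1" || true
}

# Usage after the sample session (file names taken from the tree output):
#   count_fasta_records ./testout/CircAID_testdata.fasta
#   count_fasta_records ./testout/CircAID_testdata_removed_reads.fasta
```

Comparing the two counts gives a rough idea of how many reads yielded output versus how many were written to the removed-reads file.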

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. circaidme.sh). For example:


#!/bin/bash
set -e
module load circaidme
cd /data/$USER/circaidme/
circaidme --input-file CircAID_testdata.fastq --out-path ./testout --adapter-name ALL --threads 4

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=4 circaidme.sh
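
The batch script hard-codes `--threads 4`, which has to be kept equal to the `--cpus-per-task` value by hand. One possible variant, sketched here as an assumption rather than the recommended Biowulf form, reads the thread count from `SLURM_CPUS_PER_TASK`, which Slurm sets when `--cpus-per-task` is given. The fallback to 1 thread matches the circaidme default shown in the help output.

```shell
# Write a variant of circaidme.sh that takes its thread count from Slurm.
# Note: this overwrites any circaidme.sh in the current directory.
cat > circaidme.sh <<'EOF'
#!/bin/bash
set -e
module load circaidme
cd /data/$USER/circaidme/
# SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is given;
# fall back to 1, the circaidme default, if it is unset.
circaidme --input-file CircAID_testdata.fastq --out-path ./testout \
    --adapter-name ALL --threads "${SLURM_CPUS_PER_TASK:-1}"
EOF

# Check the generated script for syntax errors without running it.
bash -n circaidme.sh

# Then submit as before, for example:
#   sbatch --cpus-per-task=4 circaidme.sh
```

With this variant, changing the `--cpus-per-task` value at submission time is enough; the script itself does not need editing.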

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. circaidme.swarm). For example:

circaidme --input-file CircAID_testdata1.fastq --out-path ./testout1 --adapter-name ALL --threads 4
circaidme --input-file CircAID_testdata2.fastq --out-path ./testout2 --adapter-name ALL --threads 4

Submit this job using the swarm command.

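swarm -f circaidme.swarm [-g #] [-t #] --module circaidme

where
  -g #                gigabytes of memory required for each process (one line in the swarmfile)
  -t #                number of threads/CPUs required for each process (use -t 4 to match --threads 4 in the swarmfile)
  --module circaidme  loads the circaidme module for each subprocess in the swarm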
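
For more than a handful of inputs, the swarmfile can be generated rather than typed. The following is a minimal sketch, assuming all FASTQ files sit in one directory, their paths contain no spaces, and each sample should get its own output directory. `make_swarmfile` and the `out_<sample>` naming are hypothetical choices made for this example, not Biowulf or CircAidMe conventions.

```shell
# Hypothetical helper: write one circaidme command per FASTQ file.
#   $1 = directory containing *.fastq files
#   $2 = swarmfile to write
#   $3 = threads per command (should match the -t value given to swarm)
make_swarmfile() {
    : > "$2"    # start with an empty swarmfile
    for fq in "$1"/*.fastq; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$fq" ] || continue
        sample=$(basename "$fq" .fastq)
        printf 'circaidme --input-file %s --out-path ./out_%s --adapter-name ALL --threads %s\n' \
            "$fq" "$sample" "$3" >> "$2"
    done
}

# Example:
#   make_swarmfile /data/$USER/circaidme circaidme.swarm 4
#   swarm -f circaidme.swarm -t 4 --module circaidme
```

Giving each command a separate `--out-path` keeps the results and temporary files of concurrent subjobs apart, since the help text notes that the output path is also used for temp files.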