Biowulf High Performance Computing at the NIH
Atropos: specific, sensitive, and speedy trimming of sequencing reads

A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability of downstream analyses while decreasing their computational requirements. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Atropos trims reads with high sensitivity and specificity while maintaining leading-edge speed.


Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive
[user@cn3316 ~]$ module load atropos
Create testing data in FASTA and FASTQ formats:
[user@cn3316 ~]$ printf ">myseq\nACGTGCGCATGCCA" > test.fa
[user@cn3316 ~]$ printf "@myseq\nACGTGCGCATGCCA\n+\n9C;=;=<9@4868>" > test.fastq
Run atropos on the sample data:
[user@cn3316 ~]$ atropos -a CCA -se test.fa -o res.fa

Atropos version: 1.1.18
Python version: 3.5.2
Command line parameters: trim -a CCA -se test.fa -o res.fa

Sample ID: test
Input format: FASTA, Read 1, w/o Qualities
Input files:

Start time: 2018-05-10T21:59:41.572528
Wallclock time: 0.01 s (10000 us/read; 0.01 M reads/minute)
CPU time (main process): 0.01 s


Reads                               records fraction
----------------------------------- ------- --------
Total reads processed:               1
Reads with adapters:                 1 100.0%
Reads written (passing filters):     1 100.0%

Base pairs                          bp fraction
----------------------------------- -- --------
Total bp processed:                 14
Total bp written (filtered):        11 78.6%

Adapter 1

Sequence Type       Length Trimmed (x)
-------- ---------- ------ -----------
CCA      regular 3'      3           1

No. of allowed errors:
0-3 bp: 0

Bases preceding removed adapters:
  A           0.0%
  C           0.0%
  G          100.0%
  T           0.0%
  none/other  0.0%

Overview of removed sequences:
length count expect max.err error counts
------ ----- ------ ------- ------------
     3     1    0.0       0 1 

[user@cn3316 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
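
The report above shows 14 bp processed and 11 bp written, with the adapter CCA removed from the 3' end of the read. As a rough illustration of what that means for the test read, the sketch below performs an exact-match 3' adapter removal using plain shell parameter expansion. It is a simplification for intuition only and not how Atropos works internally: Atropos matches adapters with a configurable error tolerance (see "No. of allowed errors" in the report) and offers many more trimming options.

```shell
# Simplified illustration only: exact-match removal of a 3' adapter.
# This is NOT the Atropos algorithm, which tolerates errors in the match.
seq="ACGTGCGCATGCCA"   # the test read written to test.fa
adapter="CCA"          # the adapter passed with -a

# Strip the longest suffix that starts with the adapter, i.e. the first
# occurrence of the adapter and everything after it.
trimmed="${seq%%"$adapter"*}"

echo "input:   $seq (${#seq} bp)"
echo "trimmed: $trimmed (${#trimmed} bp)"
```

For this read the sketch prints a trimmed sequence of 11 bp ending in G, which agrees with the "Total bp written" and "Bases preceding removed adapters" lines of the report.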

Batch job
Most jobs should be run as batch jobs.

Create a batch input file. For example:

#!/bin/bash
module load atropos
# Create testing data in FASTQ format:
printf "@myseq\nACGTGCGCATGCCA\n+\n9C;=;=<9@4868>" > test.fastq
# Trim the 3' adapter CCA from the single-end reads:
atropos -a CCA -se test.fastq -o res.fastq

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#]
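
As a minimal sketch of the whole batch workflow, the commands below write the batch script to a file, check it for shell syntax errors, and submit it only if Slurm is present. The file name atropos_test.sh and the resource values (2 CPUs, 2 GB of memory) are assumptions chosen for illustration; adjust them to your data.

```shell
# Write the batch script to a file; the name atropos_test.sh is hypothetical.
cat > atropos_test.sh <<'EOF'
#!/bin/bash
module load atropos
# Create testing data in FASTQ format:
printf "@myseq\nACGTGCGCATGCCA\n+\n9C;=;=<9@4868>" > test.fastq
atropos -a CCA -se test.fastq -o res.fastq
EOF

# Parse the script without executing it to catch quoting or syntax mistakes.
bash -n atropos_test.sh && echo "syntax OK"

# Submit only where Slurm is available; the resource values are illustrative.
if command -v sbatch >/dev/null 2>&1; then
    sbatch --cpus-per-task=2 --mem=2g atropos_test.sh
else
    echo "sbatch not found; submit from a Biowulf login node"
fi
```

Running the syntax check before submission is a cheap way to catch an unbalanced quote in the printf line, which would otherwise only surface after the job has been queued and started.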