A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability of downstream analyses while decreasing their computational requirements. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Atropos trims reads with high sensitivity and specificity while maintaining leading-edge speed.
References:
- John P. Didion, Marcel Martin and Francis S. Collins. Atropos: specific, sensitive, and speedy trimming of sequencing reads. PeerJ 5:e3720. https://doi.org/10.7717/peerj.3720
- Module Name: atropos (see the modules page for more information)
- Implemented as a Singularity container
- Multithreaded
- Implements a new insert alignment-based trimming algorithm for paired-end reads
- Options for trimming specific types of data (miRNA, bisulfite-seq)
- A new command ('detect') that will detect adapter sequences and other potential contaminants
- A new command ('error') that will estimate the sequencing error rate
- A new command ('qc') that generates read statistics similar to FastQC
- The ability to read SAM/BAM files and read/write interleaved FASTQ files
- Direct trimming of reads from an SRA accession
- Unusual environment variables set
  - ATROPOS_HOME: Atropos installation directory
  - ATROPOS_BIN: Atropos executable directory
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive
[user@cn3316 ~]$ module load atropos

Create testing data in FASTA and FASTQ format:

[user@cn3316 ~]$ printf ">myseq\nACGTGCGCATGCCA" > test.fa
[user@cn3316 ~]$ printf "@myseq\nACGTGCGCATGCCA\n+\n9C;=;=<9@4868>" > test.fastq

Run atropos on the sample data:

[user@cn3316 ~]$ atropos -a CCA -se test.fa -o res.fa
...
=======
Atropos
=======

Atropos version: 1.1.18
Python version: 3.5.2
Command line parameters: trim -a CCA -se test.fa -o res.fa

Sample ID: test
Input format: FASTA, Read 1, w/o Qualities
Input files:
  /gpfs/gsfs7/users/user/atropos/test.fa

Start time: 2018-05-10T21:59:41.572528
Wallclock time: 0.01 s (10000 us/read; 0.01 M reads/minute)
CPU time (main process): 0.01 s

--------
Trimming
--------

Reads                               records fraction
----------------------------------- ------- --------
Total reads processed:                    1
Reads with adapters:                      1   100.0%
Reads written (passing filters):          1   100.0%

Base pairs                          bp fraction
----------------------------------- -- --------
Total bp processed:                 14
Total bp written (filtered):        11    78.6%

---------
Adapter 1
---------

Sequence Type       Length Trimmed (x)
-------- ---------- ------ -----------
CCA      regular 3'      3           1

No. of allowed errors:
0-3 bp: 0

Bases preceding removed adapters:
  A 0.0%
  C 0.0%
  G 100.0%
  T 0.0%
  none/other 0.0%

Overview of removed sequences:
length count expect max.err error counts
                                       0
------ ----- ------ ------- ------------
     3     1    0.0       0            1

[user@cn3316 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
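The base-pair counts in the report above can be checked by hand. The following is a minimal sketch, not Atropos's own algorithm (which also tolerates mismatches and partial adapter matches): it assumes an exact adapter match at the 3' end of the read, and `trim_exact_3prime` is a hypothetical helper introduced only for illustration.

```shell
#!/bin/bash
# Illustration only: strip an exact 3' adapter match with bash suffix removal.
# trim_exact_3prime is a hypothetical helper, not part of Atropos.
trim_exact_3prime() {
    local seq="$1" adapter="$2"
    # ${seq%"$adapter"} removes the adapter only if it is a literal suffix.
    printf '%s\n' "${seq%"$adapter"}"
}

seq="ACGTGCGCATGCCA"
trimmed="$(trim_exact_3prime "$seq" "CCA")"

# Percentage of bases kept, comparable to the "Total bp written" line.
fraction="$(awk -v w="${#trimmed}" -v t="${#seq}" 'BEGIN { printf "%.1f", 100 * w / t }')"
echo "$trimmed ${#trimmed}/${#seq} bp (${fraction}%)"
```

Under these assumptions the 14 bp read loses its final 3 bp, leaving 11 bp (78.6%), and the base preceding the removed adapter is G, which agrees with the report.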
Create a batch input file (e.g. atropos.sh). For example:
#!/bin/bash
module load atropos
# Create testing data in FASTQ format:
printf "@myseq\nACGTGCGCATGCCA\n+\n9C;=;=<9@4868>" > test.fastq
atropos -a CCA -se test.fastq -o res.fastq
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] atropos.sh
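Because Atropos is multithreaded, the thread count given to Atropos would normally follow the CPUs requested with --cpus-per-task. The sketch below assumes that your Atropos version accepts a `--threads` option (confirm with `atropos trim --help`) and relies on the `SLURM_CPUS_PER_TASK` variable that Slurm sets inside a job. `build_atropos_cmd` is a hypothetical helper that only prints the command line so it can be inspected before being placed in atropos.sh.

```shell
#!/bin/bash
# Hypothetical helper: print an atropos command whose thread count follows
# the Slurm allocation. It does not run atropos itself.
build_atropos_cmd() {
    local input="$1" output="$2"
    local cmd="atropos trim -a CCA -se ${input} -o ${output}"
    # Assumption: --threads is accepted by the installed Atropos version.
    # Add it only when Slurm allocated more than one CPU to the task.
    if [ "${SLURM_CPUS_PER_TASK:-1}" -gt 1 ]; then
        cmd="${cmd} --threads ${SLURM_CPUS_PER_TASK}"
    fi
    echo "$cmd"
}

build_atropos_cmd test.fastq res.fastq
```

For example, a job submitted with `sbatch --cpus-per-task=4 atropos.sh` would see `SLURM_CPUS_PER_TASK=4`, and the helper would print the command with `--threads 4` appended; outside a Slurm job the option is omitted.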