fade on Biowulf
FADE (Fragmentase Artifact Detection and Elimination) is a method for identifying and removing enzymatic fragmentation artifacts.
Features
- fade accepts SAM/BAM/CRAM files containing reads that have been mapped to a reference genome and filters or cleans up artifact-containing reads according to the following procedure.
- The annotate subcommand performs the initial analysis: it re-aligns soft-clipped sequence and adds BAM tags (the rs bitflag and the am realignment tags) that record each alignment's artifact status. The later subcommands rely on these tags, so annotate must be run first.
- The out subcommand eliminates the artifact in one of two ways: it either removes a read from the output BAM/SAM file entirely if the read or its mate contains an identified fragmentation artifact, or it trims artifact-containing reads to remove the extraneous sequence originating from the opposite strand.
- The stats subcommand reports extended information on all reads identified by annotate to contain the artifact.
- The stats-clip subcommand reports information on all soft-clipped sequences present.
- The extract subcommand allows the extraction of the artifact sequences in their remapped state.
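The subcommand order described above (annotate first, then out on a BAM that may need to be queryname sorted) can be sketched as a small helper that prints the commands for one sample. This is a minimal sketch under stated assumptions, not the authors' documented pipeline: the function name fade_workflow_cmds and the .qsort.bam and .filtered.bam file names are hypothetical, samtools is assumed to be available for the sort step, and the -b flag on fade out is assumed to behave as it does for fade annotate. Confirm the real options with fade out -h.

```shell
#!/bin/bash
# Print the three-step command sequence for one sample:
#   1. fade annotate     tag artifact reads (must be done first)
#   2. samtools sort -n  queryname sort, which `fade out` may require
#   3. fade out          eliminate the artifact
# ASSUMPTIONS: samtools is on PATH, and `fade out` accepts -b (BAM output)
# the same way `fade annotate` does. Check `fade out -h` before relying on it.
fade_workflow_cmds() {
    sample="$1"          # input is expected at <sample>.bam
    ref="$2"             # indexed FASTA reference
    threads="${3:-2}"    # extra threads passed to fade annotate and samtools
    printf 'fade annotate -t %s -b %s.bam %s > %s.anno.bam\n' \
        "$threads" "$sample" "$ref" "$sample"
    printf 'samtools sort -n -@ %s -o %s.qsort.bam %s.anno.bam\n' \
        "$threads" "$sample" "$sample"
    printf 'fade out -b %s.qsort.bam > %s.filtered.bam\n' \
        "$sample" "$sample"
}

# Review the printed commands first; nothing is executed here.
fade_workflow_cmds sam1 ref.fa 8
```

Printing the commands instead of running them keeps the sketch safe to try on a login node. Once the options have been checked, the output can be piped to bash inside an interactive session or pasted into a batch script.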
References:
- Thomas Gregory, Apollinaire Ngankeu, Shelley Orwick, Esko A Kautto, Jennifer A Woyach, John C Byrd, James S Blachly. Characterization and mitigation of fragmentation enzyme-induced dual stranded artifacts. NAR Genomics and Bioinformatics, Volume 2, Issue 4, December 2020, lqaa070. https://doi.org/10.1093/nargab/lqaa070
Documentation
- fade Main Site
Important Notes
- Module Name: fade (see the modules page for more information)
- fade is easy to run:
fade -h
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program.
Sample session (user input follows each shell prompt):
[user@biowulf]$ sinteractive --cpus-per-task=2 --mem=4G
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load fade
[user@cn3144 ~]$ mkdir -p /data/$USER/fade; cd /data/$USER/fade
[user@cn3144 fade]$ fade -h
Fragmentase Artifact Detection and Elimination
usage: ./fade [subcommand]
    annotate: marks artifact reads in bam tags (must be done first)
    out: eliminates artifact from reads(may require queryname sorted bam)
    stats: reports extended information about artifact reads
    stats-clip: reports extended information about all soft-clipped reads
    extract: extracts artifacts into a mapped bam
-h --help This help information.

[user@cn3144 fade]$ fade annotate
Fragmentase Artifact Detection and Elimination
annotate: performs re-alignment of soft-clips and annotates bam records with bitflag (rs) and realignment tags (am)
usage: ./fade annotate [BAM/SAM input] [Indexed fasta reference]
-t --threads extra threads for parsing the bam file
   --min-length Minimum number of bases for a soft-clip to be considered for artifact detection
-w --window-size Number of bases considered outside of read or mate region for re-alignment
-b --bam output bam
-u --ubam output uncompressed bam
-h --help This help information.

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Batch job
Most jobs should be run as batch jobs.
Create a batch input file (e.g. fade.sh). For example:
#!/bin/bash
#SBATCH --job-name=S1_fade
#SBATCH --output=S1_fade.out
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=4Gb
#SBATCH --time=2:00:00
#SBATCH --partition=norm

set -e
module load fade
cd /data/$USER/fade
fade annotate -t 8 -b sam1.bam ref.fa > sam1.anno.bam
Submit the job:
sbatch fade.sh
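Because fade annotate takes a BAM/SAM input and an indexed FASTA reference, it can save a queued job to check that those files exist before submitting. The following is a minimal sketch, not part of fade or Biowulf: the function name check_fade_inputs is hypothetical, and it assumes that "indexed" means a samtools faidx style index stored next to the reference as ref.fa.fai. Adjust the check if fade expects a different index.

```shell
#!/bin/bash
# Pre-flight check to run before `sbatch fade.sh`.
# ASSUMPTION: the reference index is a faidx-style file at <ref>.fai.
# check_fade_inputs is a hypothetical helper, not a fade or Slurm command.
check_fade_inputs() {
    bam="$1"     # alignment file passed to fade annotate
    ref="$2"     # FASTA reference passed to fade annotate
    status=0
    for f in "$bam" "$ref" "$ref.fai"; do
        # -s is true only for files that exist and are not empty
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example: submit only when every input is present.
# check_fade_inputs sam1.bam ref.fa && sbatch fade.sh
```

The function reports every missing file rather than stopping at the first one, so a single run shows everything that needs fixing.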
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.
Create a swarmfile (e.g. fade.swarm). For example:
fade annotate -t 8 -b sam1.bam ref.fa > sam1.anno.bam
fade annotate -t 8 -b sam2.bam ref.fa > sam2.anno.bam
fade annotate -t 8 -b sam3.bam ref.fa > sam3.anno.bam
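For more than a handful of samples, a swarmfile of this shape can be generated rather than typed by hand. The sketch below writes one fade annotate line per BAM file in a directory, reusing the command shape and the .anno.bam suffix from the example. The function name make_fade_swarm is hypothetical, and the sketch assumes file names without spaces.

```shell
#!/bin/bash
# Emit one `fade annotate` line per BAM file found in a directory.
# make_fade_swarm is a hypothetical helper; the command shape and the
# .anno.bam output suffix follow the swarmfile example.
make_fade_swarm() {
    bam_dir="$1"         # directory holding the input BAM files
    ref="$2"             # indexed FASTA reference
    threads="${3:-8}"    # value passed to fade annotate -t
    for bam in "$bam_dir"/*.bam; do
        # If nothing matched, the glob stays literal and the file is absent.
        [ -e "$bam" ] || continue
        # Skip outputs of an earlier run so they are not annotated again.
        case "$bam" in *.anno.bam) continue ;; esac
        echo "fade annotate -t $threads -b $bam $ref > ${bam%.bam}.anno.bam"
    done
}

# Example: make_fade_swarm /data/$USER/fade ref.fa 8 > fade.swarm
```

Reviewing the generated fade.swarm before submission is cheap and catches unexpected files in the directory.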
Submit this job using the swarm command.
swarm -f fade.swarm -g 8 -t 8 --module fade
where
-g # | Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file)
--module fade | Loads the fade module for each subjob in the swarm