bamUtil is a collection of utilities for manipulating BAM files, compiled
into a single executable called bam. Available tools:
Tools to rewrite SAM/BAM files | |
---|---|
convert | Convert SAM/BAM to SAM/BAM |
writeRegion | Write a file with reads in the specified region and/or have the specified read name |
splitChromosome | Split BAM by Chromosome |
splitBam | Split a BAM file into multiple BAM files based on ReadGroup |
findCigars | Output just the reads that contain any of the specified CIGAR operations. |
Tools to modify SAM/BAM files | |
clipOverlap | Clip overlapping read pairs in a SAM/BAM File already sorted by Coordinate or ReadName |
filter | Filter reads by clipping ends whose mismatch percentage is too high and by marking reads unmapped if the quality of mismatches is too high |
revert | Revert SAM/BAM, replacing the specified fields with their previous values (if known) and removing specified tags |
squeeze | Reduce file size by dropping OQ fields, duplicates, and specified tags, using '=' when a base matches the reference, binning quality scores, and replacing readNames with unique integers |
trimBam | Trim the ends of reads in a SAM/BAM file changing read ends to 'N' and quality to '!' |
mergeBam | Merge multiple BAMs and headers, appending ReadGroupIDs if necessary |
polishBam | Add/update header lines and add the RG tag to each record |
dedup | Mark Duplicates |
recab | Recalibrate |
Tools to extract information | |
validate | Validate a SAM/BAM File |
diff | Diff 2 coordinate sorted SAM/BAM files. |
stats | Stats a SAM/BAM File |
gapInfo | Print information on the gap between read pairs in a SAM/BAM File. |
dumpHeader | Print SAM/BAM Header |
dumpRefInfo | Print SAM/BAM Reference Name Information |
dumpIndex | Print BAM Index File in English |
readReference | Print the reference string for the specified region |
explainFlags | Describe flags |
bam2FastQ | Convert the specified BAM file to fastQs. |
readIndexedBam | Read Indexed BAM By Reference and write it from reference id |
Allocate an interactive session and run the program. Sample session:
    [user@biowulf]$ sinteractive
    salloc.exe: Pending job allocation 46116226
    salloc.exe: job 46116226 queued and waiting for resources
    salloc.exe: job 46116226 has been allocated resources
    salloc.exe: Granted job allocation 46116226
    salloc.exe: Waiting for resource configuration
    salloc.exe: Nodes cn3144 are ready for job

    [user@cn3144 ~]$ module load bamutil
    [+] Loading bamutil 1.0.14
    [user@cn3144 ~]$ cd /data/$USER/test_data/bam
    [user@cn3144 ~]$ bam stats --in read1_250k.bam --qual --basic
    Number of records read = 170016

    TotalReads 170016.00
    MappedReads 170016.00
    PairedReads 0.00
    ProperPair 0.00
    .....
    [user@cn3144 ~]$ bam explainFlags --dec 96
    0x60 (96):
        mate reverse strand
        first fragment
    [user@cn3144 ~]$ bam dedup --in read1_500k_sorted.bam --out /scratch/$USER/temp.bam \
        --rmDups
    [user@cn3144 ~]$ cat /scratch/$USER/temp.bam.log
    ......
    --------------------------------------------------------------------------
    SUMMARY STATISTICS OF THE READS
    Total number of reads: 326389
    Total number of paired-end reads: 0
    Total number of properly paired reads: 0
    Total number of unmapped reads: 0
    Total number of reverse strand mapped reads: 162220
    Total number of QC-failed reads: 0
    Total number of secondary reads: 0
    Size of singleKeyMap (must be zero): 0
    Size of pairedKeyMap (must be zero): 0
    Total number of missing mates: 0
    Total number of reads excluded from duplicate checking: 0
    --------------------------------------------------------------------------
    Sorting the indices of 3651 duplicated records
    [user@cn3144 ~]$ exit
    salloc.exe: Relinquishing job allocation 46116226
    [user@biowulf ~]$
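The explainFlags result in the session above can be reproduced with plain shell arithmetic, which may help when reading flag values without bamUtil at hand. The following is a minimal sketch, assuming the bit meanings from the SAM specification; the function name decode_flag and the label wording are illustrative and do not reproduce bamUtil's exact output format.

```shell
#!/bin/bash
# Decode a decimal SAM flag into its set bits using shell arithmetic.
# decode_flag is a hypothetical helper, not part of bamUtil. Bit order
# follows the SAM specification (bit 0 = 0x1, bit 11 = 0x800).
decode_flag() {
    local flag=$1
    local names=(
        "paired"
        "proper pair"
        "unmapped"
        "mate unmapped"
        "reverse strand"
        "mate reverse strand"
        "first fragment"
        "last fragment"
        "secondary alignment"
        "fails QC"
        "duplicate"
        "supplementary alignment"
    )
    local i
    for i in "${!names[@]}"; do
        # Print the hex value and label of every bit that is set in the flag.
        if (( flag & (1 << i) )); then
            printf '0x%x\t%s\n' $(( 1 << i )) "${names[$i]}"
        fi
    done
}

decode_flag 96
```

For flag 96 this lists bits 0x20 and 0x40, matching the two properties that bam explainFlags reports for the same value.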
Create a batch input file (e.g. bamutil.sh). For example:
    #!/bin/bash
    #SBATCH --job-name=bamUtil

    set -e

    module load bamutil
    inbam=/data/$USER/test_data/bam/gcat_set_025_raw.bam
    obam=/data/$USER/test_data/temp/gcat_set_025.clean.bam
    dbsnp=/data/$USER/test_data/snp/snp138_pos.hg19
    genome=/fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa

    # bamutil wants to write to the directory containing the genome file
    mkdir /lscratch/$SLURM_JOB_ID
    cp $genome /lscratch/$SLURM_JOB_ID
    lgenome=/lscratch/$SLURM_JOB_ID/$(basename $genome)

    bam dedup --in $inbam --out $obam \
        --rmDups --verbose --recab \
        --refFile $lgenome \
        --dbsnp $dbsnp \
        --storeQualTag OQ --maxBaseQual 40
    rm -rf /lscratch/$SLURM_JOB_ID
Submit this job using the Slurm sbatch command.
sbatch --mem=4g bamutil.sh
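Because the batch script uses set -e, a failing bam dedup stops the script before the final rm -rf, so the scratch copy of the genome may be left behind. One possible way around this is an EXIT trap. The sketch below is an illustration under assumptions, not part of the original script: run_with_cleanup is a hypothetical helper, and mktemp -d stands in for the /lscratch/$SLURM_JOB_ID directory.

```shell
#!/bin/bash
# Run a command with a scratch directory that is removed on exit,
# whether the command succeeds or fails.
# run_with_cleanup is a hypothetical helper; mktemp -d stands in for the
# job's /lscratch directory used in the batch script.
run_with_cleanup() (
    workdir=$(mktemp -d)
    # The trap fires when this subshell exits, on success and on failure.
    trap 'rm -rf "$workdir"' EXIT
    # Report the scratch path so the caller can inspect it afterwards.
    echo "$workdir"
    # Run the requested command; its exit status becomes the function's status.
    "$@"
)

# Example: the command fails, yet the scratch directory is still removed.
scratch=$(run_with_cleanup false) || echo "command failed; scratch cleaned up"
```

In the batch script the same idea would amount to placing a trap line directly after the mkdir, in place of the trailing rm -rf.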
Create a swarmfile (e.g. bamutil.swarm). For example:
    bam squeeze --in file1.bam --out file1s.bam --refFile genome.fa
    bam squeeze --in file2.bam --out file2s.bam --refFile genome.fa
    bam squeeze --in file3.bam --out file3s.bam --refFile genome.fa
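For more than a handful of inputs, the swarmfile lines can be generated rather than typed. The following is a minimal sketch, assuming the naming pattern of the example above (an "s" appended to the base name of each output); make_swarmfile is a hypothetical helper, not a bamUtil or swarm command.

```shell
#!/bin/bash
# Print one "bam squeeze" command per input BAM, suitable for a swarmfile.
# make_swarmfile is a hypothetical helper. The first argument is the
# reference FASTA; the remaining arguments are the input BAM files.
make_swarmfile() {
    local ref=$1
    shift
    local f
    for f in "$@"; do
        # ${f%.bam} strips the .bam suffix so that "s.bam" can be appended.
        printf 'bam squeeze --in %s --out %s --refFile %s\n' \
            "$f" "${f%.bam}s.bam" "$ref"
    done
}

make_swarmfile genome.fa file1.bam file2.bam file3.bam
```

Redirecting the output to bamutil.swarm, or passing *.bam in place of the explicit file names, produces a swarmfile like the one shown above.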
Submit this job using the swarm command.
swarm -f bamutil.swarm --module bamutil
where
--module bamutil | Loads the bamutil module for each subjob in the swarm |