bamUtil is a collection of utilities for manipulating bam files compiled
into a single executable called bam. Available tools:
| Tools to rewrite SAM/BAM files | |
|---|---|
| convert | Convert SAM/BAM to SAM/BAM |
| writeRegion | Write a file with reads in the specified region and/or have the specified read name |
| splitChromosome | Split BAM by Chromosome |
| splitBam | Split a BAM file into multiple BAM files based on ReadGroup |
| findCigars | Output just the reads that contain any of the specified CIGAR operations. |
| Tools to modify SAM/BAM files | |
| clipOverlap | Clip overlapping read pairs in a SAM/BAM File already sorted by Coordinate or ReadName |
| filter | Filter reads by clipping ends with too high of a mismatch percentage and by marking reads unmapped if the quality of mismatches is too high |
| revert | Revert SAM/BAM replacing the specified fields with their previous values (if known) and removes specified tags |
| squeeze | reduces files size by dropping OQ fields, duplicates, & specified tags, using '=' when a base matches the reference, binning quality scores, and replacing readNames with unique integers |
| trimBam | Trim the ends of reads in a SAM/BAM file changing read ends to 'N' and quality to '!' |
| mergeBam | merge multiple BAMs and headers appending ReadGroupIDs if necessary |
| polishBam | adds/updates header lines & adds the RG tag to each record |
| dedup | Mark Duplicates |
| recab | Recalibrate |
| Tools to extract information | |
| validate | Validate a SAM/BAM File |
| diff | Diff 2 coordinate sorted SAM/BAM files. |
| stats | Stats a SAM/BAM File |
| gapInfo | Print information on the gap between read pairs in a SAM/BAM File. |
| dumpHeader | Print SAM/BAM Header |
| dumpRefInfo | Print SAM/BAM Reference Name Information |
| dumpIndex | Print BAM Index File in English |
| readReference | Print the reference string for the specified region |
| explainFlags | Describe flags |
| bam2FastQ | Convert the specified BAM file to fastQs. |
| readIndexedBam | Read Indexed BAM By Reference and write it from reference id |
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load bamutil
[+] Loading bamutil 1.0.14
[user@cn3144 ~]$ cd /data/$USER/test_data/bam
[user@cn3144 ~]$ bam stats --in read1_250k.bam --qual --basic
Number of records read = 170016
TotalReads 170016.00
MappedReads 170016.00
PairedReads 0.00
ProperPair 0.00
.....
[user@cn3144 ~]$ bam explainFlags --dec 96
0x60 (96):
mate reverse strand
first fragment
[user@cn3144 ~]$ bam dedup --in read1_500k_sorted.bam --out /scratch/$USER/temp.bam \
--rmDups
[user@cn3144 ~]$ cat /scratch/$USER/temp.bam.log
......
--------------------------------------------------------------------------
SUMMARY STATISTICS OF THE READS
Total number of reads: 326389
Total number of paired-end reads: 0
Total number of properly paired reads: 0
Total number of unmapped reads: 0
Total number of reverse strand mapped reads: 162220
Total number of QC-failed reads: 0
Total number of secondary reads: 0
Size of singleKeyMap (must be zero): 0
Size of pairedKeyMap (must be zero): 0
Total number of missing mates: 0
Total number of reads excluded from duplicate checking: 0
--------------------------------------------------------------------------
Sorting the indices of 3651 duplicated records
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Create a batch input file (e.g. bamutil.sh). For example:
#!/bin/bash #SBATCH --job-name=bamUtil set -e module load bamutil inbam=/data/$USER/test_data/bam/gcat_set_025_raw.bam obam=/data/$USER/test_data/temp/gcat_set_025.clean.bam dbsnp=/data/$USER/test_data/snp/snp138_pos.hg19 genome=/fdb/igenomes/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa # bamutil wants to write to the directory containing the genome file mkdir /lscratch/$SLURM_JOB_ID cp $genome /lscratch/$SLURM_JOB_ID lgenome=/lscratch/$SLURM_JOB_ID/$(basename $genome) bam dedup --in $inbam --out $obam \ --rmDups --verbose --recab \ --refFile $lgenome \ --dbsnp $dbsnp \ --storeQualTag OQ --maxBaseQual 40 rm -rf /lscratch/$SLURM_JOB_ID
Submit this job using the Slurm sbatch command.
sbatch --mem=4g bamutil.sh
Create a swarmfile (e.g. bamutil.swarm). For example:
bam squeeze --in file1.bam --out file1s.bam --refFile genome.fa bam squeeze --in file2.bam --out file2s.bam --refFile genome.fa bam squeeze --in file3.bam --out file3s.bam --refFile genome.fa
Submit this job using the swarm command.
swarm -f bamutil.swarm --module bamutilwhere
| --module bamutil | Loads the bamutil module for each subjob in the swarm |