FusionInspector: In silico Validation of Fusion Transcript Predictions.
FusionInspector is a component of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT). FusionInspector assists in fusion transcript discovery by performing a supervised analysis of fusion predictions, attempting to recover and re-score evidence for such predictions.
Given a list of candidate fusion genes (as derived from running any fusion transcript prediction tool, such as Prada, FusionCatcher, SoapFuse, TophatFusion, DISCASM/GMAP-Fusion, STAR-Fusion, or other), FusionInspector extracts the genomic regions for the fusion partners and constructs mini-fusion-contigs containing the pairs of genes in their proposed fused orientation. The original reads are aligned to these candidate fusion contigs; fusion-supporting reads that would normally align as discordant pairs or split reads should align as concordant 'normal' reads in this fusion-gene context. Those reads supporting each fusion (spanning fragments and fusion-breakpoint-containing reads) are identified, reported, and scored accordingly.
Optionally, Trinity de novo transcriptome assembly can be executed as part of the FusionInspector routine in order to de novo reconstruct fusion transcripts from the mapped reads.
Outputs generated by FusionInspector are easily viewed in a genome browser such as IGV so that the evidence for fusion transcripts can be manually assessed for read and alignment quality.
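Before running FusionInspector it can help to sanity-check the candidate fusion list. The sketch below assumes each non-empty line names one candidate as geneA--geneB (two gene symbols joined by a double dash); that format is an assumption here and should be checked against the example files. The helper name count_bad_pairs is hypothetical and not part of FusionInspector.

```shell
#!/bin/bash
# count_bad_pairs is a hypothetical helper, not part of FusionInspector.
# It reads a candidate list on stdin and prints how many non-empty lines
# do NOT look like geneA--geneB (assumed format, see the text above).
count_bad_pairs() {
    # Drop blank lines, then count the lines that fail the pattern.
    grep -v '^[[:space:]]*$' | grep -cvE '^[^[:space:]]+--[^[:space:]]+$'
}

# Two well-formed pairs and one malformed line (a space instead of "--").
printf 'BCR--ABL1\nEML4--ALK\nTMPRSS2 ERG\n' | count_bad_pairs
```

A count of 0 means every line matched the assumed format; a real list can be checked with count_bad_pairs < fusion_targets.A.txt.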
Example files are in the /usr/local/apps/fusioninspector/test directory.
To test FusionInspector with the example files:
$ cp -r /usr/local/apps/fusioninspector/test /data/$USER
$ cd /data/$USER/test
$ sinteractive --mem=10g
$ module load fusioninspector
$ FusionInspector --fusions fusion_targets.A.txt,fusion_targets.B.txt,fusion_targets.C.txt --genome_lib_dir /usr/local/apps/fusioninspector/GRCh37_gencode_v19_CTAT_lib_July272016/ \
    --left_fq test.reads_1.fastq.gz --right_fq test.reads_2.fastq.gz --out_dir TestOut \
    --out_prefix finspector --align_utils STAR --prep_for_IGV --no_cleanup
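The --fusions argument in the test command is a single comma-delimited string with no spaces. With many candidate lists this is easy to mistype, so the sketch below builds the string in the shell. The helper name join_by_comma is hypothetical and not part of FusionInspector.

```shell
#!/bin/bash
# join_by_comma is a hypothetical helper, not part of FusionInspector.
# It joins its arguments with commas and no spaces, the form --fusions expects.
join_by_comma() {
    local IFS=,      # "$*" joins arguments with the first character of IFS
    echo "$*"
}

# Example with the three lists from the test directory.
fusion_list=$(join_by_comma fusion_targets.A.txt fusion_targets.B.txt fusion_targets.C.txt)
echo "$fusion_list"
```

The result can then be passed as --fusions "$fusion_list". With a glob, join_by_comma fusion_targets.*.txt would pick up every matching file in the current directory.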
The data resources and indexes are under:
/usr/local/apps/fusioninspector/GRCh37_gencode_v19_CTAT_lib_July272016
Sample session:
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load fusioninspector
[user@cn3144 ~]$ FusionInspector -h
usage: FusionInspector [-h] --fusions CHIM_SUMMARY_FILES --genome_lib_dir
                       GENOME_LIB_DIR --left_fq LEFT_FQ_FILENAME --right_fq
                       RIGHT_FQ_FILENAME --out_prefix OUT_PREFIX
                       [--align_utils ALIGN_UTILS]
                       [--min_junction_reads MIN_JUNCTION_READS]
                       [--min_sum_frags MIN_SUM_FRAGS]
                       [--min_novel_junction_support MIN_NOVEL_JUNCTION_SUPPORT]
                       [--require_LDAS REQUIRE_LDAS]
                       [--max_promiscuity MAX_PROMISCUITY] [-E EVALUE]
                       [--min_per_id MIN_PER_ID] [--only_fusion_reads]
                       [--capture_genome_alignments] [--include_Trinity]
                       [--prep_for_IGV] [--write_intermediate_results]
                       [--no_cleanup] [--version] [--CPU CPU] [--dirty]
                       [--aligner_path ALIGNER_PATH]
                       [--mem_benchmark I_MEM_BENCHMARK]
                       [--out_dir Output_directory]

Extracts a pair of genes from the genome, creates a mini-contig, aligns reads
to the mini-contig, and extracts the fusion reads as a separate tier for
vsiualization.

optional arguments:
  -h, --help            show this help message and exit
  --fusions CHIM_SUMMARY_FILES
                        fusions summary files (list, comma-delimited and no
                        spaces) (default: )
  --genome_lib_dir GENOME_LIB_DIR
                        genome lib directory - see
                        http://FusionFilter.github.io for details (default: )
  --left_fq LEFT_FQ_FILENAME
                        left fastq file (default: None)
  --right_fq RIGHT_FQ_FILENAME
                        right fastq file (default: None)
  --out_prefix OUT_PREFIX
                        output filename prefix (default: None)
  --align_utils ALIGN_UTILS
                        alignment utilities to use. (default: STAR)
  --min_junction_reads MIN_JUNCTION_READS
                        minimum number of junction-spanning reads required
                        (default: 1)
  --min_sum_frags MIN_SUM_FRAGS
                        minimum fusion support = ( # junction_reads + #
                        spanning_frags ) (default: 2)
  --min_novel_junction_support MIN_NOVEL_JUNCTION_SUPPORT
                        (minimum number of junction reads required if
                        breakpoint lacks involvement of only reference
                        junctions (default: 3)
  --require_LDAS REQUIRE_LDAS
                        require long double anchor support for split reads
                        when no spanning frags are found (default: 1)
  --max_promiscuity MAX_PROMISCUITY
                        maximum number of partners allowed for a given fusion
                        (default: 3)
  -E EVALUE, --Evalue EVALUE
                        E-value threshold for blast searches (default: 0.001)
  --min_per_id MIN_PER_ID
                        minimum percent identity for a fusion-supporting read
                        alignment (default: 97)
  --only_fusion_reads   include only read alignments in output that support
                        fusion (default: False)
  --capture_genome_alignments
                        reports ref genome alignments too (for debugging only)
                        (default: False)
  --include_Trinity     include fusion-guided Trinity assembly (default:
                        False)
  --prep_for_IGV        generate bam, bed, etc., for use with IGV (default:
                        False)
  --write_intermediate_results
                        generate bam, bed, etc., for intermediate aligner
                        outputs (default: False)
  --no_cleanup          do not cleanup the fusion inspector workspace, retain
                        intermediate output files (default: False)
  --version             show version string: v0.9.0beta (default: False)
  --CPU CPU             Number of threads for running the aligner (default: 4)
  --dirty               turn off FP filtering for non-STAR methods (increases
                        speed, reduces RAM, mostly restricted for testing
                        purposes) (default: False)
  --aligner_path ALIGNER_PATH
                        path to the aligner tool (default: uses PATH setting)
                        (default: None)
Create a batch input file (e.g. script.sh). For example:
#!/bin/bash
module load fusioninspector
cd /data/$USER/dir
FusionInspector command 1
FusionInspector command 2
......
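In a batch script it is usually worth keeping the aligner's thread count in line with what Slurm allocated. The sketch below assumes the job is submitted with sbatch --cpus-per-task, in which case Slurm sets SLURM_CPUS_PER_TASK. Outside such a job it falls back to 4, the --CPU default that FusionInspector reports. The helper name fi_cpu_args is hypothetical.

```shell
#!/bin/bash
# fi_cpu_args is a hypothetical helper, not part of FusionInspector.
# It prints "--CPU N", where N is the Slurm CPU allocation if one is set,
# and otherwise 4 (the default shown by FusionInspector -h).
fi_cpu_args() {
    echo "--CPU ${SLURM_CPUS_PER_TASK:-4}"
}

# Show the option that would be appended to each FusionInspector command.
fi_cpu_args
```

Inside script.sh, a FusionInspector command could end with $(fi_cpu_args), and the script could be submitted with, for example, sbatch --cpus-per-task=8 --mem=10g script.sh. The memory value mirrors the 10g used for the interactive test and may need adjusting for real data.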
Then submit the file on Biowulf:
$ sbatch script.sh
For more information on the sbatch command, see https://hpc.nih.gov/docs/userguide.html#submit
Create a swarmfile (e.g. script.swarm). For example:
# this file is called script.swarm
cd dir1;FusionInspector command 1; FusionInspector command 2
cd dir2;FusionInspector command 1; FusionInspector command 2
cd dir3;FusionInspector command 1; FusionInspector command 2
[...]
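For more than a few samples, the swarmfile can be generated rather than typed. The sketch below prints one swarm line per sample directory. The helper name make_swarm and the per-directory file names (fusions.txt, reads_1.fastq.gz, reads_2.fastq.gz) are assumptions made for illustration; only the FusionInspector options and the genome library path come from this page.

```shell
#!/bin/bash
# make_swarm is a hypothetical helper: it prints one swarm line per directory.
# Each directory is assumed to hold fusions.txt, reads_1.fastq.gz and
# reads_2.fastq.gz (illustrative names, not a FusionInspector convention).
GENOME_LIB=/usr/local/apps/fusioninspector/GRCh37_gencode_v19_CTAT_lib_July272016

make_swarm() {
    local dir
    local opts="--fusions fusions.txt --genome_lib_dir $GENOME_LIB"
    opts="$opts --left_fq reads_1.fastq.gz --right_fq reads_2.fastq.gz"
    opts="$opts --out_prefix finspector"
    for dir in "$@"; do
        # One independent command line per sample directory.
        printf 'cd %s; FusionInspector %s\n' "$dir" "$opts"
    done
}

# Print the swarm lines for three sample directories.
make_swarm dir1 dir2 dir3
```

Redirecting the output, as in make_swarm dir1 dir2 dir3 > script.swarm, would produce a file in the same shape as the example above. Directory names containing spaces are not handled by this sketch.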
Submit this job using the swarm command.
swarm -f script.swarm --module fusioninspector
For more information on swarm, see https://hpc.nih.gov/apps/swarm.html#usage