FusionInspector on NIH HPC Systems

FusionInspector: In silico Validation of Fusion Transcript Predictions.

FusionInspector is a component of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT). FusionInspector assists in fusion transcript discovery by performing a supervised analysis of fusion predictions, attempting to recover and re-score evidence for such predictions.

Given a list of candidate fusion genes (as derived from running any fusion transcript prediction tool, such as Prada, FusionCatcher, SoapFuse, TophatFusion, DISCASM/GMAP-Fusion, STAR-Fusion, or other), FusionInspector extracts the genomic regions for the fusion partners and constructs mini-fusion-contigs containing the pairs of genes in their proposed fused orientation. The original reads are aligned to these candidate fusion contigs; fusion-supporting reads that would normally align as discordant pairs or split reads should align as concordant 'normal' reads in this fusion-gene context. Those reads supporting each fusion (spanning fragments and fusion-breakpoint-containing reads) are identified, reported, and scored accordingly.

Optionally, Trinity de novo transcriptome assembly can be executed as part of the FusionInspector routine in order to de novo reconstruct fusion transcripts from the mapped reads.

Outputs generated by FusionInspector are easily viewed in a genome browser such as IGV so that the evidence for fusion transcripts can be manually assessed for read and alignment quality.

Example files are in the /usr/local/apps/fusioninspector/test directory.
To test FusionInspector with the example files:

  $ cp -r /usr/local/apps/fusioninspector/test /data/$USER
  $ cd /data/$USER/test
  $ sinteractive --mem=10g
  $ module load fusioninspector
  $ FusionInspector --fusions fusion_targets.A.txt,fusion_targets.B.txt,fusion_targets.C.txt \
        --genome_lib_dir /usr/local/apps/fusioninspector/GRCh37_gencode_v19_CTAT_lib_July272016/ \
        --left_fq test.reads_1.fastq.gz --right_fq test.reads_2.fastq.gz --out_dir TestOut \
        --out_prefix finspector --align_utils STAR --prep_for_IGV --no_cleanup
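
The help text states that --fusions takes a comma-delimited list with no spaces. When the candidate lists are held as separate file names, the argument can be assembled rather than typed by hand. The following is a minimal sketch; the function name join_fusion_lists is a hypothetical helper, not part of FusionInspector.

```shell
#!/bin/bash
# Join any number of file names with commas and no spaces, the form
# that the --fusions option expects according to the help text.
# join_fusion_lists is a hypothetical helper name.
join_fusion_lists() {
    local IFS=,
    printf '%s\n' "$*"
}

# Build the argument from the three example target files.
fusion_arg=$(join_fusion_lists fusion_targets.A.txt fusion_targets.B.txt fusion_targets.C.txt)
echo "--fusions $fusion_arg"
```

The resulting value can then be passed as --fusions "$fusion_arg" in the command above.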

    

Data resources and indexes are located under
/usr/local/apps/fusioninspector/GRCh37_gencode_v19_CTAT_lib_July272016

Interactive job on Biowulf

Sample session:


[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load fusioninspector

[user@cn3144 ~]$ FusionInspector -h
usage: FusionInspector [-h] --fusions CHIM_SUMMARY_FILES --genome_lib_dir
                       GENOME_LIB_DIR --left_fq LEFT_FQ_FILENAME --right_fq
                       RIGHT_FQ_FILENAME --out_prefix OUT_PREFIX
                       [--align_utils ALIGN_UTILS]
                       [--min_junction_reads MIN_JUNCTION_READS]
                       [--min_sum_frags MIN_SUM_FRAGS]
                       [--min_novel_junction_support MIN_NOVEL_JUNCTION_SUPPORT]
                       [--require_LDAS REQUIRE_LDAS]
                       [--max_promiscuity MAX_PROMISCUITY] [-E EVALUE]
                       [--min_per_id MIN_PER_ID] [--only_fusion_reads]
                       [--capture_genome_alignments] [--include_Trinity]
                       [--prep_for_IGV] [--write_intermediate_results]
                       [--no_cleanup] [--version] [--CPU CPU] [--dirty]
                       [--aligner_path ALIGNER_PATH]
                       [--mem_benchmark I_MEM_BENCHMARK]
                       [--out_dir Output_directory]

Extracts a pair of genes from the genome, creates a mini-contig, aligns reads
to the mini-contig, and extracts the fusion reads as a separate tier for
vsiualization.

optional arguments:
  -h, --help            show this help message and exit
  --fusions CHIM_SUMMARY_FILES
                        fusions summary files (list, comma-delimited and no
                        spaces) (default: )
  --genome_lib_dir GENOME_LIB_DIR
                        genome lib directory - see
                        http://FusionFilter.github.io for details (default: )
  --left_fq LEFT_FQ_FILENAME
                        left fastq file (default: None)
  --right_fq RIGHT_FQ_FILENAME
                        right fastq file (default: None)
  --out_prefix OUT_PREFIX
                        output filename prefix (default: None)
  --align_utils ALIGN_UTILS
                        alignment utilities to use. (default: STAR)
  --min_junction_reads MIN_JUNCTION_READS
                        minimum number of junction-spanning reads required
                        (default: 1)
  --min_sum_frags MIN_SUM_FRAGS
                        minimum fusion support = ( # junction_reads + #
                        spanning_frags ) (default: 2)
  --min_novel_junction_support MIN_NOVEL_JUNCTION_SUPPORT
                        (minimum number of junction reads required if
                        breakpoint lacks involvement of only reference
                        junctions (default: 3)
  --require_LDAS REQUIRE_LDAS
                        require long double anchor support for split reads
                        when no spanning frags are found (default: 1)
  --max_promiscuity MAX_PROMISCUITY
                        maximum number of partners allowed for a given fusion
                        (default: 3)
  -E EVALUE, --Evalue EVALUE
                        E-value threshold for blast searches (default: 0.001)
  --min_per_id MIN_PER_ID
                        minimum percent identity for a fusion-supporting read
                        alignment (default: 97)
  --only_fusion_reads   include only read alignments in output that support
                        fusion (default: False)
  --capture_genome_alignments
                        reports ref genome alignments too (for debugging only)
                        (default: False)
  --include_Trinity     include fusion-guided Trinity assembly (default:
                        False)
  --prep_for_IGV        generate bam, bed, etc., for use with IGV (default:
                        False)
  --write_intermediate_results
                        generate bam, bed, etc., for intermediate aligner
                        outputs (default: False)
  --no_cleanup          do not cleanup the fusion inspector workspace, retain
                        intermediate output files (default: False)
  --version             show version string: v0.9.0beta (default: False)
  --CPU CPU             Number of threads for running the aligner (default: 4)
  --dirty               turn off FP filtering for non-STAR methods (increases
                        speed, reduces RAM, mostly restricted for testing
                        purposes) (default: False)
  --aligner_path ALIGNER_PATH
                        path to the aligner tool (default: uses PATH setting)
                        (default: None)
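
As an illustration of the options listed in the help text, the sketch below assembles a command that raises the evidence thresholds above their defaults. It only prints the command for review. The threshold values (2 and 4) are arbitrary choices for illustration, not recommendations, and the input file names are taken from the example files.

```shell
#!/bin/bash
# Assemble a FusionInspector command as an array so that each option
# stays a separate word, then print it for review instead of running it.
# Threshold values are illustrative assumptions, not recommendations.
genome_lib=/usr/local/apps/fusioninspector/GRCh37_gencode_v19_CTAT_lib_July272016
cmd=(
    FusionInspector
    --fusions fusion_targets.A.txt
    --genome_lib_dir "$genome_lib"
    --left_fq test.reads_1.fastq.gz
    --right_fq test.reads_2.fastq.gz
    --out_dir TestOut
    --out_prefix finspector
    --min_junction_reads 2     # help default: 1
    --min_sum_frags 4          # help default: 2
    --CPU 4                    # help default: 4
)

# Show the full command line on one line.
printf '%s ' "${cmd[@]}"
echo
```

After module load fusioninspector, the same array can be executed with "${cmd[@]}".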

Batch job on Biowulf

Create a batch input file (e.g. script.sh). For example:

#!/bin/bash
module load fusioninspector

cd /data/$USER/dir
FusionInspector command 1
FusionInspector command 2
......
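
A sketch of what one of the command lines in such a script might look like is given below. It assumes the job is submitted through Slurm, which sets SLURM_CPUS_PER_TASK when CPUs are requested, and passes that count to --CPU. The input file names are the example files and should be replaced with your own; the command is only printed here.

```shell
#!/bin/bash
# Use the CPU count granted by Slurm when running under sbatch;
# fall back to 4 (the --CPU default in the help text) otherwise.
threads=${SLURM_CPUS_PER_TASK:-4}

# Print the command a batch script would run. Remove the leading
# "echo" to execute it after "module load fusioninspector".
# File names below are the example files (an assumption for your data).
echo FusionInspector --fusions fusion_targets.A.txt \
    --genome_lib_dir /usr/local/apps/fusioninspector/GRCh37_gencode_v19_CTAT_lib_July272016 \
    --left_fq test.reads_1.fastq.gz --right_fq test.reads_2.fastq.gz \
    --out_dir TestOut --out_prefix finspector --CPU "$threads"
```

A script written this way could be submitted with, for example, sbatch --cpus-per-task=8 --mem=10g script.sh, where the memory request simply mirrors the interactive test above and may need adjusting for real data.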

Then submit the file on Biowulf:

$ sbatch script.sh

For more information about the sbatch command, see: https://hpc.nih.gov/docs/userguide.html#submit

Swarm of Jobs on Biowulf

Create a swarmfile (e.g. script.swarm). For example:

# this file is called script.swarm
cd dir1;FusionInspector command 1; FusionInspector command 2
cd dir2;FusionInspector command 1; FusionInspector command 2
cd dir3;FusionInspector command 1; FusionInspector command 2
[...]
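
For many samples, the swarmfile can be generated with a loop instead of being written by hand. The sketch below assumes one directory per sample; the directory names (dir1 to dir3), the file names inside them (fusions.txt, reads_1.fastq.gz, reads_2.fastq.gz), and the output directory FI_out are all hypothetical.

```shell
#!/bin/bash
# Write one swarm line per sample directory.
# Directory and file names are hypothetical placeholders.
genome_lib=/usr/local/apps/fusioninspector/GRCh37_gencode_v19_CTAT_lib_July272016
swarmfile=script.swarm

: > "$swarmfile"    # start with an empty file
for d in dir1 dir2 dir3; do
    printf 'cd %s; FusionInspector --fusions fusions.txt --genome_lib_dir %s --left_fq reads_1.fastq.gz --right_fq reads_2.fastq.gz --out_dir FI_out --out_prefix finspector\n' \
        "$d" "$genome_lib" >> "$swarmfile"
done

# Report how many commands were written.
wc -l < "$swarmfile"
```

Because --CPU defaults to 4, a matching resource request might look like swarm -f script.swarm -g 10 -t 4 --module fusioninspector, where the 10 GB figure is only carried over from the interactive test and is not a measured requirement.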

Submit this job using the swarm command.

swarm -f script.swarm --module fusioninspector

For more information about swarm, see: https://hpc.nih.gov/apps/swarm.html#usage

Documentation