fuseq-wes: discovering fusion genes from whole exome sequencing data in cancer patients

fuseq-wes is built on FuSeq, a method for detecting fusion genes from RNA-seq data, and applies it to whole exome sequencing (WES) data. A subsampling study of the prostate data suggests that coverage of at least 75x is necessary to achieve high accuracy.
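
The 75x guideline can be checked before a run. The following is a minimal sketch, not part of fuseq-wes: it assumes that "coverage" means mean per-base depth and that the depths come from `samtools depth`, which prints three columns (chromosome, position, depth). The function names `mean_depth` and `coverage_ok` are hypothetical, and the source does not say how coverage was measured in the subsampling study.

```shell
# Mean depth from `samtools depth`-style input on stdin:
# three whitespace-separated columns (chromosome, position, depth).
mean_depth() {
    awk '{ sum += $3; n++ }
         END { if (n > 0) printf "%.1f\n", sum / n; else print "0.0" }'
}

# Succeeds (exit status 0) if the depth in $1 is at least the
# threshold in $2; the threshold defaults to 75 (assumed from the text above).
coverage_ok() {
    awk -v d="$1" -v m="${2:-75}" 'BEGIN { exit !(d >= m) }'
}
```

With samtools available (an assumption; it is not among the modules loaded in the sample session), usage might look like `samtools depth -a -b targets.bed "$bamfile" | mean_depth`, where `targets.bed` is a hypothetical BED file of the exome capture regions.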

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=4g --gres=lscratch:10 
[user@cn3144 ~]$ module load fuseq-wes 
[+] Loading python 3.8  ...
[+] Loading gcc  9.2.0  ...
[-] Unloading gcc  9.2.0  ...
[+] Loading gcc  9.2.0  ...
[+] Loading openmpi 4.0.5  for GCC 9.2.0
[+] Loading ImageMagick  7.0.8  on cn4313
[+] Loading HDF5  1.10.4
[-] Unloading gcc  9.2.0  ...
[+] Loading gcc  9.2.0  ...
[+] Loading NetCDF 4.7.4_gcc9.2.0
[+] Loading pandoc  2.17.1.1  on cn4313
[+] Loading pcre2 10.21  ...
[+] Loading R 4.2.0
[+] Loading fuseq-wes  1.0.0
Copy the sample read data into the current directory:
[user@cn3144 ]$ cp -r $FUSEQ_WES_TEST_DATA/* . 
[user@cn3144 ]$ bamfile="FuSeq_WES_testdata/test.bam"
[user@cn3144 ]$ ref_json="$FUSEQ_WES_REF/UCSC_hg19_wes_contigSize3000_bigLen130000_r100/\
UCSC_hg19_wes_contigSize3000_bigLen130000_r100.json"
[user@cn3144 ]$ gtfSqlite="$FUSEQ_WES_REF/UCSC_hg19_wes_contigSize3000_bigLen130000_r100/\
UCSC_hg19_wes_contigSize3000_bigLen130000_r100.sqlite"

[user@cn3144 ]$ output_dir="test_out"
[user@cn3144 ]$ mkdir $output_dir
#extract mapped reads and split reads
[user@cn3144 ]$ python3 $FUSEQ_WES/FuSeq_WES_v1.0.0/fuseq_wes.py \
                   --bam $bamfile \
                   --gtf $ref_json \
                   --mapq-filter \
                   --outdir $output_dir
#process the reads

[user@cn3144 ]$ fusiondbFn="$FUSEQ_WES/FuSeq_WES_v1.0.0/Data/Mitelman_fusiondb.RData"
[user@cn3144 ]$ paralogdbFn="$FUSEQ_WES/FuSeq_WES_v1.0.0/Data/ensmbl_paralogs_grch37.RData"
[user@cn3144 ]$ Rscript $FUSEQ_WES/FuSeq_WES_v1.0.0/process_fuseq_wes.R \
                   in=$output_dir \
                   sqlite=$gtfSqlite \
                   fusiondb=$fusiondbFn \
                   paralogdb=$paralogdbFn \
                   out=$output_dir
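
Both reference files in the session above sit in the same directory and share one base name, so the two long paths can be derived from a single variable. A minimal sketch, assuming `FUSEQ_WES_REF` is set by `module load fuseq-wes` as in the session; `ref_name` and `ref_file` are hypothetical names introduced here, not part of fuseq-wes:

```shell
# Base name shared by the reference directory and the files inside it.
ref_name="UCSC_hg19_wes_contigSize3000_bigLen130000_r100"

# Hypothetical helper: print the path of the reference file
# with the given extension (e.g. json or sqlite).
ref_file() {
    printf '%s/%s/%s.%s\n' "$FUSEQ_WES_REF" "$ref_name" "$ref_name" "$1"
}

# Same values as the ref_json and gtfSqlite variables in the session.
ref_json="$(ref_file json)"
gtfSqlite="$(ref_file sqlite)"
```

Switching to another reference build would then only require changing `ref_name`, provided that build follows the same directory layout.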
Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. fuseq-wes.sh). For example:

#!/bin/bash
module load fuseq-wes
set -e
cp -r $FUSEQ_WES_TEST_DATA/* .
bamfile="FuSeq_WES_testdata/test.bam"
ref_json="$FUSEQ_WES_REF/UCSC_hg19_wes_contigSize3000_bigLen130000_r100/\
UCSC_hg19_wes_contigSize3000_bigLen130000_r100.json"
gtfSqlite="$FUSEQ_WES_REF/UCSC_hg19_wes_contigSize3000_bigLen130000_r100/\
UCSC_hg19_wes_contigSize3000_bigLen130000_r100.sqlite"
output_dir="test_out"
mkdir -p $output_dir

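# extract mapped reads and split reads
python3 $FUSEQ_WES/FuSeq_WES_v1.0.0/fuseq_wes.py \
                   --bam $bamfile \
                   --gtf $ref_json \
                   --mapq-filter \
                   --outdir $output_dir

# process the reads
fusiondbFn="$FUSEQ_WES/FuSeq_WES_v1.0.0/Data/Mitelman_fusiondb.RData"
paralogdbFn="$FUSEQ_WES/FuSeq_WES_v1.0.0/Data/ensmbl_paralogs_grch37.RData"
Rscript $FUSEQ_WES/FuSeq_WES_v1.0.0/process_fuseq_wes.R \
                   in=$output_dir \
                   sqlite=$gtfSqlite \
                   fusiondb=$fusiondbFn \
                   paralogdb=$paralogdbFn \
                   out=$output_dir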

Submit this job using the Slurm sbatch command.

sbatch -c 2 --mem=4g --time=8:00:00 fuseq-wes.sh
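
Because a batch job runs unattended, it can help to verify the inputs before the long-running steps start. The following is a sketch under assumptions, not part of fuseq-wes: `require_file` is a hypothetical helper, and the choice to treat empty files as errors is an illustration rather than a documented requirement.

```shell
# Hypothetical helper: succeed only if the path is a non-empty regular file;
# otherwise print a message to stderr and return a non-zero status.
require_file() {
    if [ ! -f "$1" ] || [ ! -s "$1" ]; then
        echo "ERROR: missing or empty input: $1" >&2
        return 1
    fi
}

# Possible use in fuseq-wes.sh once the variables are set; with `set -e`
# in effect, the first failing check stops the job:
#   require_file "$bamfile"
#   require_file "$ref_json"
#   require_file "$gtfSqlite"
```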

The master process submitting jobs should be run either as a batch job or on an interactive node, not on the Biowulf login node.