fuseq-wes: discovering fusion genes from whole exome sequencing data in cancer patients

fuseq-wes detects fusion genes from whole exome sequencing (WES) data. It is built on FuSeq, a method for detecting fusion genes from RNA-seq data. A subsampling study of the prostate data suggests that coverage of at least 75x is needed to achieve high accuracy.
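One rough way to check whether a BAM file reaches this coverage is to average its per-base depth. The sketch below assumes a three-column table (chromosome, position, depth) such as the one `samtools depth` writes. The function names `mean_depth` and `meets_coverage` are illustrative and not part of fuseq-wes. Because `samtools depth` skips zero-depth positions by default, the result is only an approximation of on-target exome coverage.

```shell
# mean_depth FILE: print the average of the third column (depth)
# of a three-column depth table, rounded to one decimal place.
mean_depth() {
    awk '{ sum += $3; n++ } END { if (n > 0) printf "%.1f\n", sum / n; else print "0.0" }' "$1"
}

# meets_coverage FILE MIN: succeed (exit 0) when the mean depth is at least MIN.
meets_coverage() {
    awk -v min="$2" '{ sum += $3; n++ } END { exit !(n > 0 && sum / n >= min) }' "$1"
}
```

With samtools available, this could be used as `samtools depth test.bam > depth.txt` followed by `meets_coverage depth.txt 75 || echo "coverage below 75x"`.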

References:

Fusion Gene Detection Using Whole-Exome Sequencing Data in Cancer Patients. Front Genet. [PubMed]

Documentation

fuseq-wes GitHub Page

Important Notes

Module Name: fuseq-wes (see the modules page for more information)
Unusual environment variables set
- FUSEQ_WES: fuseq-wes installation directory
- FUSEQ_WES_REF: fuseq-wes reference directory
- FUSEQ_WES_TEST_DATA: sample data for running fuseq-wes
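Because every command on this page depends on these variables, a script can check them up front and fail with a clear message if the module was not loaded. This is a minimal sketch; the helper name `require_vars` is hypothetical, and it relies on `printenv`, which only sees exported variables.

```shell
# require_vars NAME...: report every named environment variable that is
# unset or empty. Returns 0 when all are set, 1 otherwise.
require_vars() {
    missing=0
    for name in "$@"; do
        if [ -z "$(printenv "$name")" ]; then
            echo "environment variable not set: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A script could then start with `require_vars FUSEQ_WES FUSEQ_WES_REF FUSEQ_WES_TEST_DATA || exit 1` right after `module load fuseq-wes`.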

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=4g --gres=lscratch:10 
[user@cn3144 ~]$ module load fuseq-wes 
[+] Loading python 3.8  ...
[+] Loading gcc  9.2.0  ...
[-] Unloading gcc  9.2.0  ...
[+] Loading gcc  9.2.0  ...
[+] Loading openmpi 4.0.5  for GCC 9.2.0
[+] Loading ImageMagick  7.0.8  on cn4313
[+] Loading HDF5  1.10.4
[-] Unloading gcc  9.2.0  ...
[+] Loading gcc  9.2.0  ...
[+] Loading NetCDF 4.7.4_gcc9.2.0
[+] Loading pandoc  2.17.1.1  on cn4313
[+] Loading pcre2 10.21  ...
[+] Loading R 4.2.0
[+] Loading fuseq-wes  1.0.0

Copy the sample data to the current directory and set the input and reference paths:

[user@cn3144 ]$ cp -r $FUSEQ_WES_TEST_DATA/* . 
[user@cn3144 ]$ bamfile="FuSeq_WES_testdata/test.bam"
[user@cn3144 ]$ ref_json="$FUSEQ_WES_REF/UCSC_hg19_wes_contigSize3000_bigLen130000_r100/\
UCSC_hg19_wes_contigSize3000_bigLen130000_r100.json"
[user@cn3144 ]$ gtfSqlite="$FUSEQ_WES_REF/UCSC_hg19_wes_contigSize3000_bigLen130000_r100/\
UCSC_hg19_wes_contigSize3000_bigLen130000_r100.sqlite"

[user@cn3144 ]$ output_dir="test_out"
[user@cn3144 ]$ mkdir -p $output_dir

#extract mapped reads and split reads

[user@cn3144 ]$ python3 $FUSEQ_WES/FuSeq_WES_v1.0.0/fuseq_wes.py \
                   --bam $bamfile \
                   --gtf $ref_json \
                   --mapq-filter \
                   --outdir $output_dir

#process the reads

[user@cn3144 ]$ fusiondbFn="$FUSEQ_WES/FuSeq_WES_v1.0.0/Data/Mitelman_fusiondb.RData"
[user@cn3144 ]$ paralogdbFn="$FUSEQ_WES/FuSeq_WES_v1.0.0/Data/ensmbl_paralogs_grch37.RData"
[user@cn3144 ]$ Rscript $FUSEQ_WES/FuSeq_WES_v1.0.0/process_fuseq_wes.R \
                   in=$output_dir \
                   sqlite=$gtfSqlite \
                   fusiondb=$fusiondbFn \
                   paralogdb=$paralogdbFn \
                   out=$output_dir
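The session above spells out the same long reference name four times, since the reference directory and its .json and .sqlite files share one name. As a sketch, a small helper can derive both paths from that name; `set_ref_paths` and `ref_dir` are hypothetical names, and the layout is assumed to match the paths shown in the session.

```shell
# set_ref_paths NAME: set ref_json and gtfSqlite for the reference NAME,
# assuming $FUSEQ_WES_REF/NAME/NAME.json and $FUSEQ_WES_REF/NAME/NAME.sqlite.
set_ref_paths() {
    ref_dir="$FUSEQ_WES_REF/$1"
    ref_json="$ref_dir/$1.json"
    gtfSqlite="$ref_dir/$1.sqlite"
}
```

For the sample data this would be called as `set_ref_paths UCSC_hg19_wes_contigSize3000_bigLen130000_r100`.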

Batch job

Most jobs should be run as batch jobs.

Create a batch input file (e.g. fuseq-wes.sh). For example:

#!/bin/bash
module load fuseq-wes
set -e
cp -r $FUSEQ_WES_TEST_DATA/* .
bamfile="FuSeq_WES_testdata/test.bam"
ref_json="$FUSEQ_WES_REF/UCSC_hg19_wes_contigSize3000_bigLen130000_r100/\
UCSC_hg19_wes_contigSize3000_bigLen130000_r100.json"
gtfSqlite="$FUSEQ_WES_REF/UCSC_hg19_wes_contigSize3000_bigLen130000_r100/\
UCSC_hg19_wes_contigSize3000_bigLen130000_r100.sqlite"
output_dir="test_out"
mkdir -p $output_dir

python3 $FUSEQ_WES/FuSeq_WES_v1.0.0/fuseq_wes.py \
                   --bam $bamfile \
                   --gtf $ref_json \
                   --mapq-filter \
                   --outdir $output_dir
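The batch script above stops after the read-extraction step. To also run the processing step from the interactive session, something like the following could be appended. This is a sketch: the function name `process_fusions` and the `fuseq_dir` variable are hypothetical, while the script and database paths are the ones used in the interactive example.

```shell
# process_fusions OUT_DIR SQLITE: run the R processing step on the reads
# extracted into OUT_DIR, writing the results back to OUT_DIR.
process_fusions() {
    fuseq_dir="$FUSEQ_WES/FuSeq_WES_v1.0.0"
    Rscript "$fuseq_dir/process_fuseq_wes.R" \
        in="$1" \
        sqlite="$2" \
        fusiondb="$fuseq_dir/Data/Mitelman_fusiondb.RData" \
        paralogdb="$fuseq_dir/Data/ensmbl_paralogs_grch37.RData" \
        out="$1"
}
```

In the batch script it would be called as `process_fusions "$output_dir" "$gtfSqlite"` after the `python3` command. With `set -e` in effect, the processing step only runs if the extraction step succeeded.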

Submit this job using the Slurm sbatch command.

sbatch -c 2 --mem=4g --time=8:00:00 fuseq-wes.sh

The master process submitting jobs should be run either as a batch job or on an interactive node, not on the Biowulf login node.