fuseq-wes: discovering fusion genes from whole exome sequencing data in cancer patients
This tool was developed from FuSeq, a method for detecting fusion genes from RNA-seq data. A subsampling study of the prostate data suggests that a coverage of at least 75x is necessary to achieve high accuracy.
Deng W, Murugan S, Lindberg J, Chellappa V, Shen X, Pawitan Y, Vu TN.
Fusion Gene Detection Using Whole-Exome Sequencing Data in Cancer Patients
Front Genet. 2022;13:820493.
Important Notes
- Module Name: fuseq-wes (see the modules page for more information)
- Unusual environment variables set
  - FUSEQ_WES: fuseq-wes installation directory
  - FUSEQ_WES_REF: fuseq-wes reference directory
  - FUSEQ_WES_TEST_DATA: sample data for running fuseq-wes
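As a quick sanity check after loading the module, the three variables listed above can be inspected from the shell. The following is a minimal sketch; the function names `check_dir` and `check_fuseq_env` are hypothetical helpers written for this example and are not part of fuseq-wes or the module system. It assumes each variable points to a directory, as the descriptions above suggest.

```shell
# Hypothetical helper: report one variable and whether it names a directory.
# $1 = variable name, $2 = its current value (may be empty)
check_dir() {
    if [ -z "$2" ]; then
        echo "$1 is not set (was the module loaded?)"
        return 1
    elif [ ! -d "$2" ]; then
        echo "$1=$2 is not a directory"
        return 1
    fi
    echo "$1=$2"
}

# Hypothetical helper: check all three fuseq-wes variables.
# Returns 0 only if every variable is set and names an existing directory.
check_fuseq_env() {
    local status=0
    check_dir FUSEQ_WES "${FUSEQ_WES:-}" || status=1
    check_dir FUSEQ_WES_REF "${FUSEQ_WES_REF:-}" || status=1
    check_dir FUSEQ_WES_TEST_DATA "${FUSEQ_WES_TEST_DATA:-}" || status=1
    return "$status"
}
```

Running `check_fuseq_env` after `module load fuseq-wes` prints one line per variable and returns a non-zero status if any of them is missing.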
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=4g --gres=lscratch:10
[user@cn3144 ~]$ module load fuseq-wes
[+] Loading python 3.8 ...
[+] Loading gcc 9.2.0 ...
[-] Unloading gcc 9.2.0 ...
[+] Loading gcc 9.2.0 ...
[+] Loading openmpi 4.0.5 for GCC 9.2.0
[+] Loading ImageMagick 7.0.8 on cn4313
[+] Loading HDF5 1.10.4
[-] Unloading gcc 9.2.0 ...
[+] Loading gcc 9.2.0 ...
[+] Loading NetCDF 4.7.4_gcc9.2.0
[+] Loading pandoc on cn4313
[+] Loading pcre2 10.21 ...
[+] Loading R 4.2.0
[+] Loading fuseq-wes 1.0.0

Copy the sample read data to the current directory and set the input and output paths:
[user@cn3144 ]$ cp -r $FUSEQ_WES_TEST_DATA/* .
[user@cn3144 ]$ bamfile="FuSeq_WES_testdata/test.bam"
[user@cn3144 ]$ ref_json="$FUSEQ_WES_REF/UCSC_hg19_wes_contigSize3000_bigLen130000_r100/\
UCSC_hg19_wes_contigSize3000_bigLen130000_r100.json"
[user@cn3144 ]$ gtfSqlite="$FUSEQ_WES_REF/UCSC_hg19_wes_contigSize3000_bigLen130000_r100/\
UCSC_hg19_wes_contigSize3000_bigLen130000_r100.sqlite"
[user@cn3144 ]$ output_dir="test_out"
[user@cn3144 ]$ mkdir $output_dir

# extract mapped reads and split reads
[user@cn3144 ]$ python3 $FUSEQ_WES/FuSeq_WES_v1.0.0/fuseq_wes.py \
    --bam $bamfile \
    --gtf $ref_json \
    --mapq-filter \
    --outdir $output_dir

# process the reads
[user@cn3144 ]$ fusiondbFn="$FUSEQ_WES/FuSeq_WES_v1.0.0/Data/Mitelman_fusiondb.RData"
[user@cn3144 ]$ paralogdbFn="$FUSEQ_WES/FuSeq_WES_v1.0.0/Data/ensmbl_paralogs_grch37.RData"
[user@cn3144 ]$ Rscript $FUSEQ_WES/FuSeq_WES_v1.0.0/process_fuseq_wes.R \
    in=$output_dir \
    sqlite=$gtfSqlite \
    fusiondb=$fusiondbFn \
    paralogdb=$paralogdbFn \
    out=$output_dir
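The two steps of the interactive session (read extraction with `fuseq_wes.py`, then processing with `process_fuseq_wes.R`) can be wrapped in one shell function so that both run in order for a given BAM file. This is only a sketch: the function name `run_fuseq_wes` is hypothetical, and the paths and arguments are copied from the session above rather than taken from any additional fuseq-wes documentation. It assumes the module has been loaded so that `FUSEQ_WES` and `FUSEQ_WES_REF` are set.

```shell
# Hypothetical wrapper: run both fuseq-wes steps for one BAM file.
# Usage: run_fuseq_wes <input.bam> <output_dir>
run_fuseq_wes() {
    local bam="$1"
    local outdir="$2"
    # Reference and installation paths as used in the sample session
    local ref_name="UCSC_hg19_wes_contigSize3000_bigLen130000_r100"
    local ref_dir="$FUSEQ_WES_REF/$ref_name"
    local app_dir="$FUSEQ_WES/FuSeq_WES_v1.0.0"

    mkdir -p "$outdir" || return 1

    # Step 1: extract mapped reads and split reads
    python3 "$app_dir/fuseq_wes.py" \
        --bam "$bam" \
        --gtf "$ref_dir/$ref_name.json" \
        --mapq-filter \
        --outdir "$outdir" || return 1

    # Step 2: process the reads
    Rscript "$app_dir/process_fuseq_wes.R" \
        in="$outdir" \
        sqlite="$ref_dir/$ref_name.sqlite" \
        fusiondb="$app_dir/Data/Mitelman_fusiondb.RData" \
        paralogdb="$app_dir/Data/ensmbl_paralogs_grch37.RData" \
        out="$outdir"
}
```

With the sample data copied into the working directory, the call would be `run_fuseq_wes FuSeq_WES_testdata/test.bam test_out`. The function stops after step 1 if the Python script fails, so the R step never runs on incomplete output.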
Batch job
Most jobs should be run as batch jobs.
Create a batch input file (e.g. fuseq-wes.sh). For example:
#!/bin/bash
module load fuseq-wes
set -e
cp -r $FUSEQ_WES_TEST_DATA/* .
bamfile="FuSeq_WES_testdata/test.bam"
ref_json="$FUSEQ_WES_REF/UCSC_hg19_wes_contigSize3000_bigLen130000_r100/\
UCSC_hg19_wes_contigSize3000_bigLen130000_r100.json"
gtfSqlite="$FUSEQ_WES_REF/UCSC_hg19_wes_contigSize3000_bigLen130000_r100/\
UCSC_hg19_wes_contigSize3000_bigLen130000_r100.sqlite"
output_dir="test_out"
mkdir -p $output_dir
python3 $FUSEQ_WES/FuSeq_WES_v1.0.0/fuseq_wes.py \
    --bam $bamfile \
    --gtf $ref_json \
    --mapq-filter \
    --outdir $output_dir
Submit this job using the Slurm sbatch command.
sbatch -c 2 --mem=4g --time=8:00:00 fuseq-wes.sh
The master process submitting jobs should be run either as a batch job or on an interactive node - not on the biowulf login node.