FuSeq is a fast and accurate method to discover fusion genes based on quasi-mapping to quickly map the reads, extract initial candidates from split reads and fusion equivalence classes of mapped reads, and finally apply multiple filters and statistical tests to get the final candidates.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=4g --gres=lscratch:10 [user@cn3144 ~]$ module load FuSeq [+] Loading gcc 9.2.0 ... [+] Loading GSL 2.6 for GCC 9.2.0 ... [+] Loading openmpi 3.1.4 for GCC 9.2.0 [+] Loading ImageMagick 7.0.8 on cn2367 [+] Loading HDF5 1.10.4 [+] Loading NetCDF 4.7.4_gcc9.2.0 [+] Loading pandoc 2.10.1 on cn2367 [+] Loading pcre2 10.21 ... [+] Loading R 4.0.0 [+] Loading FuSeq / 1.1.3 ...Create soft links to the sample read data:
[user@cn3144 ~]$ cp -r $FUSEQ_DATA ./Index the sample transcript data if needed, GRCh37_release75 and GRCh38_release106 are already indexed at /fdb/fuseq/:
[user@cn3144 ~]$ TxIndexer -t Homo_sapiens.GRCh37.75.cdna.all.fa -o TxIndexer_idx --------------------------------------------------------------- --- FuSeq acknowledges the Sailfish and Rapmap for this indexer --- --------------------------------------------------------------- writing log to TxIndexer_idx/LogDir/FuSeq_index.log RapMap Indexer [Step 1 of 4] : counting k-mers counted k-mers for 180000 transcriptsElapsed time: 4.19777s Replaced 0 non-ATCG nucleotides Clipped poly-A tails from 1401 transcripts Building rank-select dictionary and saving to disk done Elapsed time: 0.0257811s Writing sequence data to file . . . done Elapsed time: 0.137891s [info] Building 32-bit suffix array (length of generalized text is 287322183) Building suffix array . . . success saving to disk . . . done Elapsed time: 0.440665s done Elapsed time: 41.0055s processed 287000000 positions khash had 102501816 keys saving hash to disk . . . done Elapsed time: 9.4689sExtract fusion equivalence classes and split reads by running FuSeq with 16 cpus:
[user@cn3144 ~]$ FuSeq -i /fdb/fuseq/TxIndexer_idx/GRCh37.75/ \ -l IU -1 SRR064287_1.fastq -2 SRR064287_2.fastq -p 16 \ -g $FUSEQ_REF/Homo_sapiens.GRCh37.gtf -o feqDir- a folder feqDir will be produced.
[user@cn3144 ~]$ Rscript $FUSEQ_R/createSqlite.R $FUSEQ_REF/Homo_sapiens.GRCh37.gtf Homo_sapiens.GRCh37.sqliteSecond, use the file Homo_sapiens.GRCh37.sqlite, together with the supporting annotation file Homo_sapiens.GRCh37.txAnno.RData containing information of paralogs, gene types, etc., to perform the discovery of fusion genes:
[user@cn3144 ~]$ Rscript $FUSEQ_R/FuSeq.R in=feqDir \ txfasta=$FUSEQ_REF/Homo_sapiens.GRCh37.cdna.all.fa \ sqlite=Homo_sapiens.GRCh37.sqlite \ txanno=Homo_sapiens.GRCh37.txAnno.RData [user@cn3144 ~]$ exit exit salloc.exe: Relinquishing job allocation 18320304 [user@biowulf ~]$