Clinker is a bioinformatics pipeline that generates a superTranscriptome from the output of popular fusion finders (JAFFA, tophatFusion, SOAP, deFUSE, Pizzly, etc.). The result can then be viewed either in genome viewers such as IGV or through the included plotting feature developed with Gviz.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --cpus-per-task=16 --mem=40g --gres=lscratch:20
[user@cn3200 ~]$ module load Clinker
[+] Loading gcc 7.2.0 ...
[+] Loading GSL 2.4 for GCC 7.2.0 ...
[+] Loading openmpi 3.0.0 for GCC 7.2.0
[+] Loading R 3.5.0_build2
[+] Loading samtools 1.9 ...
[+] Loading STAR 2.6.1c
[-] Unloading samtools 1.9 ...
[+] Loading samtools 1.9 ...
[+] Loading STAR-Fusion 1.5.0
[+] Loading python 2.7 ...
[+] Loading IGV 2.4.14 on cn3200
[-] Unloading python 2.7 ...
[+] Loading python 2.7 ...
[+] Loading Clinker 1.32

Here is how Clinker can be run on the test example provided together with the source code in the GitHub repository:
[user@cn3200 ~]$ bpipe -p out=test -p caller=$CLINKERDIR/test/caller/bcr_abl1.csv -p col=1,2,3,4 -p genome=19 -p print=true -p competitive=true -p header=true -p align_mem=32000000000 -p genome_mem=32000000000 -p fusions=BCR:ABL1 $CLINKERDIR/workflow/clinker.pipe $CLINKERDIR/test/fastq/*.fastq.gz
====================================================================================================
| Starting Pipeline at 2018-11-19 12:45 |
====================================================================================================
======================================== Stage generate_fst ========================================
==============================================================
Fusion Super Transcript Generator
A fusion visualiser.
==============================================================
==============================================================
Create fusion superTranscriptome:
--------------------------------------------------------------
Gene Symbols Mapped: 1 Not Mapped: 0 Total: 1
==============================================================
Creating output directory at: test
Creating fused superTranscriptome and annotation files ...Success!
Use the plot_fst bpipe workflow or IGV to visualise your results.
==============================================================
====================================== Stage star_genome_gen =======================================
Nov 19 12:45:53 ..... started STAR run
Nov 19 12:45:53 ... starting to generate Genome files
Nov 19 12:46:32 ... starting to sort Suffix Array. This may take a long time...
Nov 19 12:46:59 ... sorting Suffix Array chunks and saving them to disk...
Nov 19 12:49:17 ... loading chunks from disk, packing SA...
Nov 19 12:49:48 ... finished generating suffix array
Nov 19 12:49:48 ... generating Suffix Array index
Nov 19 12:49:48 ... completed Suffix Array index
Nov 19 12:49:48 ... writing Genome to disk ...
Nov 19 12:50:07 ... writing Suffix Array to disk ...
Nov 19 12:50:08 ... writing SAindex to disk
Nov 19 12:50:08 ..... finished successfully
===================================== Stage star_align (test) ======================================
Nov 19 12:50:11 ..... started STAR run
Nov 19 12:50:11 ..... loading genome
Nov 19 12:50:27 ..... started mapping
Nov 19 12:50:30 ..... started sorting BAM
Nov 19 12:50:33 ..... started wiggle output
Nov 19 12:50:34 ..... finished successfully
...
==================================== Stage prepare_plot (test) =====================================
BCR:ABL1
------------------------------------------
filtering BAM file for fusion of interest
filtering BAM file for reads with overhangs < 5 (noise reduction)
Creating ancillilary files
Index BAM files
===================================== Stage plot_fusion (test) =====================================
[1] "Plotting: BCR:ABL1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
[1] "Tracks created, printing PDF."
[1] "PDF created."
======================================== Pipeline Succeeded ========================================
12:51:17 MSG: Finished at Mon Nov 19 12:51:17 EST 2018
12:51:17 MSG: Outputs are: test/genome/Genome

One can now visualize fusions using IGV, as described in the Clinker wiki.
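The `col` parameter in the test command (`-p col=1,2,3,4`) appears to pick columns of the caller file by 1-based position. That reading is an assumption based on the command shown, so check the Clinker wiki for the exact meaning of each position. Under that assumption, a minimal bash sketch for looking up column numbers before launching the pipeline could be the following; the helper name `list_columns` is hypothetical and is not part of Clinker or bpipe.

```shell
#!/usr/bin/env bash
# list_columns FILE [DELIM]
# Hypothetical helper (not part of Clinker): print every field of the
# first line of FILE together with its 1-based position.
# DELIM defaults to a comma; pass $'\t' for tab-separated files.
list_columns() {
    local file="$1"
    local delim="${2:-,}"
    head -n 1 "$file" |
        awk -F "$delim" '{ for (i = 1; i <= NF; i++) printf "%d\t%s\n", i, $i }'
}

# Example calls (paths taken from the commands on this page):
#   list_columns "$CLINKERDIR/test/caller/bcr_abl1.csv"
#   list_columns star_fusion_out_SKBR3/star-fusion.fusion_predictions.tsv $'\t'
```

The first column of the output is the number to use in `col`, the second is the header text (or the first data row if the file has no header).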
[user@biowulf ~]$ mkdir star_fusion_out_SKBR3
[user@biowulf ~]$ STAR-Fusion \
    --genome_lib_dir /fdb/CTAT/GRCh38_v27_CTAT_lib_Feb092018/ctat_genome_lib_build_dir \
    --left_fq $CLINKER_DATA/SKBR3.Left.fq.gz \
    --right_fq $CLINKER_DATA/SKBR3.Right.fq.gz \
    --output_dir star_fusion_out_SKBR3
...
Dec 03 11:52:46 ..... started STAR run
Dec 03 11:52:46 ..... loading genome
Dec 03 11:53:47 ..... started 1st pass mapping
Dec 03 11:58:08 ..... finished 1st pass mapping
Dec 03 11:58:09 ..... inserting junctions into the genome indices
Dec 03 12:00:19 ..... started mapping
Dec 03 12:06:15 ..... finished successfully
-sample contains 18145504
...
-building interval tree based on /fdb/CTAT/GRCh38_v27_CTAT_lib_Feb092018/ctat_genome_lib_build_dir/ref_annot.gtf.mini.sortu
-done building interval tree (0.10 min).
-parsing fusion evidence: Chimeric.out.junction
-mapping reads to genes [24450000], rate=764859.23/min
...
* STAR-Fusion complete. See output: star-fusion.fusion_candidates.tsv (or .abridged.tsv version)

Now we are going to run the Clinker pipeline.
[user@biowulf ~]$ bpipe \
    -m 36000 \
    -n 16 \
    -p out=SKBR3_dir \
    -p caller=star_fusion_out_SKBR3/star-fusion.fusion_predictions.tsv \
    -p del="t" \
    -p print="true" \
    -p col=6,8 \
    -p genome="38" \
    -p fusions="TATDN1:GSDMB" \
    -p pdf_width="9" \
    -p pdf_height="16" \
    -p competitive="true" \
    $CLINKERDIR/workflow/clinker.pipe \
    $CLINKER_DATA/SKBR3_R1.fastq.gz \
    $CLINKER_DATA/SKBR3_R2.fastq.gz
====================================================================================================
| Starting Pipeline at 2018-12-03 12:55 |
====================================================================================================
======================================== Stage generate_fst ========================================
====================================== Stage star_genome_gen =======================================
Dec 03 12:55:25 ..... started STAR run
Dec 03 12:55:25 ... starting to generate Genome files
Dec 03 12:56:18 ... starting to sort Suffix Array. This may take a long time...
Dec 03 12:56:56 ... sorting Suffix Array chunks and saving them to disk...
Dec 03 12:59:46 ... loading chunks from disk, packing SA...
Dec 03 13:00:23 ... finished generating suffix array
Dec 03 13:00:23 ... generating Suffix Array index
Dec 03 13:00:23 ... completed Suffix Array index
Dec 03 13:00:23 ... writing Genome to disk ...
Dec 03 13:00:40 ... writing Suffix Array to disk ...
Dec 03 13:00:42 ... writing SAindex to disk
Dec 03 13:00:43 ..... finished successfully
===================================== Stage star_align (SKBR3) =====================================
Dec 03 13:00:47 ..... started STAR run
Dec 03 13:00:47 ..... loading genome
Dec 03 13:01:52 ..... started mapping
Dec 03 13:33:07 ..... started sorting BAM
Dec 03 13:34:16 ..... started wiggle output
Dec 03 13:35:18 ..... finished successfully
...
==================================== Stage prepare_plot (SKBR3) ====================================
TATDN1:GSDMB
------------------------------------------
filtering BAM file for fusion of interest
filtering BAM file for reads with overhangs < 5 (noise reduction)
Creating ancillilary files
Index BAM files
...

NOTE:
[user@cn3200 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$