Clinker: linking fusions to cancer

Clinker is a bioinformatics pipeline that generates a superTranscriptome from the outputs of popular fusion finders (JAFFA, tophatFusion, SOAP, deFUSE, Pizzly, etc.). The result can then be viewed either in a genome viewer such as IGV or through the included plotting feature developed with GViz.

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --cpus-per-task=16 --mem=40g --gres=lscratch:20
[user@cn3200 ~]$ module load Clinker
[+] Loading gcc  7.2.0  ... 
[+] Loading GSL 2.4 for GCC 7.2.0 ... 
[+] Loading openmpi 3.0.0  for GCC 7.2.0 
[+] Loading R 3.5.0_build2 
[+] Loading samtools 1.9  ... 
[+] Loading STAR  2.6.1c 
[-] Unloading samtools 1.9  ... 
[+] Loading samtools 1.9  ... 
[+] Loading STAR-Fusion  1.5.0 
[+] Loading python 2.7  ... 
[+] Loading IGV 2.4.14 on cn3200 
[-] Unloading python 2.7  ... 
[+] Loading python 2.7  ... 
[+] Loading Clinker  1.32 
Here is how Clinker can be run on the test example provided together with the source code in the GitHub repository:
[user@cn3200 ~]$ bpipe -p out=test -p caller=$CLINKERDIR/test/caller/bcr_abl1.csv -p col=1,2,3,4 -p genome=19 -p print=true -p competitive=true -p header=true -p align_mem=32000000000 -p genome_mem=32000000000 -p fusions=BCR:ABL1 $CLINKERDIR/workflow/clinker.pipe $CLINKERDIR/test/fastq/*.fastq.gz

====================================================================================================
|                              Starting Pipeline at 2018-11-19 12:45                               |
====================================================================================================

======================================== Stage generate_fst ========================================


==============================================================


	Fusion Super Transcript Generator

	A fusion visualiser.


==============================================================



==============================================================

Create fusion superTranscriptome:


--------------------------------------------------------------
Gene Symbols Mapped: 1 Not Mapped: 0 Total: 1

==============================================================

Creating output directory at: test
Creating fused superTranscriptome and annotation files


...Success!

Use the plot_fst bpipe workflow or IGV to visualise your results.

==============================================================


====================================== Stage star_genome_gen =======================================
Nov 19 12:45:53 ..... started STAR run
Nov 19 12:45:53 ... starting to generate Genome files
Nov 19 12:46:32 ... starting to sort Suffix Array. This may take a long time...
Nov 19 12:46:59 ... sorting Suffix Array chunks and saving them to disk...
Nov 19 12:49:17 ... loading chunks from disk, packing SA...
Nov 19 12:49:48 ... finished generating suffix array
Nov 19 12:49:48 ... generating Suffix Array index
Nov 19 12:49:48 ... completed Suffix Array index
Nov 19 12:49:48 ... writing Genome to disk ...
Nov 19 12:50:07 ... writing Suffix Array to disk ...
Nov 19 12:50:08 ... writing SAindex to disk
Nov 19 12:50:08 ..... finished successfully

===================================== Stage star_align (test) ======================================
Nov 19 12:50:11 ..... started STAR run
Nov 19 12:50:11 ..... loading genome
Nov 19 12:50:27 ..... started mapping
Nov 19 12:50:30 ..... started sorting BAM
Nov 19 12:50:33 ..... started wiggle output
Nov 19 12:50:34 ..... finished successfully

...

==================================== Stage prepare_plot (test) =====================================

BCR:ABL1
------------------------------------------
filtering BAM file for fusion of interest
filtering BAM file for reads with overhangs < 5 (noise reduction)
Creating ancillilary files
Index BAM files

===================================== Stage plot_fusion (test) =====================================
[1] "Plotting: BCR:ABL1"
[1] "------------------------------------------------------"
[1] "Libraries and ancillary files loaded. Creating Tracks."
[1] "Tracks created, printing PDF."
[1] "PDF created."

======================================== Pipeline Succeeded ========================================
12:51:17 MSG:  Finished at Mon Nov 19 12:51:17 EST 2018
12:51:17 MSG:  Outputs are: 
		test/genome/Genome 
The fusions can now be visualized in IGV, as described in the Clinker wiki.

The next example illustrates how Clinker can be run on output produced by the STAR-Fusion software. To this end, we first run STAR-Fusion on the sample dataset SKBR3:
[user@biowulf ~]$ mkdir star_fusion_out_SKBR3
[user@biowulf ~]$ STAR-Fusion \
--genome_lib_dir /fdb/CTAT/GRCh38_v27_CTAT_lib_Feb092018/ctat_genome_lib_build_dir \
--left_fq  $CLINKER_DATA/SKBR3.Left.fq.gz \
--right_fq $CLINKER_DATA/SKBR3.Right.fq.gz \
--output_dir star_fusion_out_SKBR3
...
Dec 03 11:52:46 ..... started STAR run
Dec 03 11:52:46 ..... loading genome
Dec 03 11:53:47 ..... started 1st pass mapping
Dec 03 11:58:08 ..... finished 1st pass mapping
Dec 03 11:58:09 ..... inserting junctions into the genome indices
Dec 03 12:00:19 ..... started mapping
Dec 03 12:06:15 ..... finished successfully
-sample contains 18145504
...
-building interval tree based on /fdb/CTAT/GRCh38_v27_CTAT_lib_Feb092018/ctat_genome_lib_build_dir/ref_annot.gtf.mini.sortu
-done building interval tree (0.10 min).
-parsing fusion evidence: Chimeric.out.junction
-mapping reads to genes
[24450000], rate=764859.23/min
...
	* STAR-Fusion complete.  See output: star-fusion.fusion_candidates.tsv (or .abridged.tsv version)
Now we are going to run the Clinker pipeline.
IMPORTANT NOTE:
The current version of Clinker expects input FASTQ file names that differ from the convention STAR-Fusion uses. Specifically, instead of
      SKBR3.Left.fq.gz and SKBR3.Right.fq.gz,
Clinker accepts only files named like
      SKBR3_R1.fastq.gz and SKBR3_R2.fastq.gz
i.e. the names must end with "R1.fastq.gz" and "R2.fastq.gz", respectively. If this requirement is not met, the "alignment" subfolder in the Clinker output will be empty. Conversely, STAR-Fusion will produce an empty "caller" file
      star-fusion.fusion_predictions.abridged.tsv
if it is given files that follow Clinker's naming convention.
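One way to satisfy both conventions without copying data is to keep the STAR-Fusion style files and expose them to Clinker through symbolic links. The following is a minimal sketch, not part of Clinker or STAR-Fusion: the helper names clinker_name and link_for_clinker are hypothetical, and it assumes your own FASTQ files end in ".Left.fq.gz" and ".Right.fq.gz" as in the example above.

```shell
# Hypothetical helper: print the Clinker-style name for a STAR-Fusion-style FASTQ name.
# Names that match neither suffix are printed unchanged.
clinker_name() {
    base=$(basename "$1")
    case "$base" in
        *.Left.fq.gz)  printf '%s_R1.fastq.gz\n' "${base%.Left.fq.gz}" ;;
        *.Right.fq.gz) printf '%s_R2.fastq.gz\n' "${base%.Right.fq.gz}" ;;
        *)             printf '%s\n' "$base" ;;
    esac
}

# Hypothetical helper: link every *.Left.fq.gz / *.Right.fq.gz found in the
# source directory ($1) into the destination directory ($2) under Clinker-style names.
link_for_clinker() {
    src_dir=$(cd "$1" && pwd) || return 1   # absolute path, so the links stay valid
    mkdir -p "$2" || return 1
    for f in "$src_dir"/*.Left.fq.gz "$src_dir"/*.Right.fq.gz; do
        [ -e "$f" ] || continue             # skip patterns that matched nothing
        ln -sf "$f" "$2/$(clinker_name "$f")"
    done
}
```

After defining the functions, a call such as link_for_clinker my_fastq_dir clinker_fastq (both directory names are placeholders) would create clinker_fastq/SKBR3_R1.fastq.gz and clinker_fastq/SKBR3_R2.fastq.gz as links to the original files, which can then be passed to bpipe.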
[user@biowulf ~]$ bpipe \
-m 36000 \
-n 16 \
-p out=SKBR3_dir  \
-p caller=star_fusion_out_SKBR3/star-fusion.fusion_predictions.tsv \
-p del="t" \
-p print="true" \
-p col=6,8 \
-p genome="38" \
-p fusions="TATDN1:GSDMB" \
-p pdf_width="9"  \
-p pdf_height="16" \
-p competitive="true" \
$CLINKERDIR/workflow/clinker.pipe \
$CLINKER_DATA/SKBR3_R1.fastq.gz \
$CLINKER_DATA/SKBR3_R2.fastq.gz

====================================================================================================
|                              Starting Pipeline at 2018-12-03 12:55                               |
====================================================================================================

======================================== Stage generate_fst ========================================

====================================== Stage star_genome_gen =======================================
Dec 03 12:55:25 ..... started STAR run
Dec 03 12:55:25 ... starting to generate Genome files
Dec 03 12:56:18 ... starting to sort Suffix Array. This may take a long time...
Dec 03 12:56:56 ... sorting Suffix Array chunks and saving them to disk...
Dec 03 12:59:46 ... loading chunks from disk, packing SA...
Dec 03 13:00:23 ... finished generating suffix array
Dec 03 13:00:23 ... generating Suffix Array index
Dec 03 13:00:23 ... completed Suffix Array index
Dec 03 13:00:23 ... writing Genome to disk ...
Dec 03 13:00:40 ... writing Suffix Array to disk ...
Dec 03 13:00:42 ... writing SAindex to disk
Dec 03 13:00:43 ..... finished successfully

===================================== Stage star_align (SKBR3) =====================================
Dec 03 13:00:47 ..... started STAR run
Dec 03 13:00:47 ..... loading genome
Dec 03 13:01:52 ..... started mapping
Dec 03 13:33:07 ..... started sorting BAM
Dec 03 13:34:16 ..... started wiggle output
Dec 03 13:35:18 ..... finished successfully
...
==================================== Stage prepare_plot (SKBR3) ====================================

TATDN1:GSDMB
------------------------------------------
filtering BAM file for fusion of interest
filtering BAM file for reads with overhangs < 5 (noise reduction)
Creating ancillilary files
Index BAM files
...
NOTE:
Due to a minor bug in Clinker, one may need to run the "bpipe" command twice:
after the first run, manually edit the file /reference/fst.fasta by deleting its first (empty) line, then run the command again.
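The manual edit can also be done from the command line. The sketch below is one possible way to do it and is not part of Clinker: the function name strip_leading_blank_line is hypothetical, and the path to fst.fasta must be supplied by the user for their own run. It deletes the first line only if that line is empty, so applying it to an already corrected file changes nothing.

```shell
# Hypothetical helper: remove the first line of a file, but only when it is empty.
# The file is rewritten through a temporary copy, so "sed -i" is not required.
strip_leading_blank_line() {
    sed '1{/^$/d;}' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

For example, with FST set to the location of fst.fasta in your own output, strip_leading_blank_line "$FST" would be run between the first and the second "bpipe" invocation.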

End the interactive session:
[user@cn3200 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$