Biowulf High Performance Computing at the NIH
FitHiChIP: Identification of significant chromatin contacts from HiChIP data

FitHiChIP is a computational method for identifying chromatin contacts among regulatory regions such as enhancers and promoters from HiChIP/PLAC-seq data. FitHiChIP jointly models the non-uniform coverage and genomic distance scaling of HiChIP data, captures previously validated enhancer interactions for several genes including MYC and TP53, and recovers contacts genome-wide that are supported by ChIA-PET, promoter capture Hi-C and Hi-C data.


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=4g
[user@cn3316 ~]$ module load FitHiChIP 
The general command to run FitHiChIP on sample data is:
[user@cn3316 ~]$ FitHiChIP_HiCPro.sh -C <configfile>
To copy sample configuration files to your current folder, type:
[user@cn3316 ~]$ cp $FITHICHIP_CONFIG/* . 
This command will copy four sample configuration files:
[user@cn3316 ~]$ ls configfile_* 
configfile_BiasCorrection_CoverageBias  
configfile_P2P_BiasCorrection_CoverageBias
configfile_BiasCorrection_ICEBias       
configfile_P2P_BiasCorrection_ICEBias
Each of these files assumes that the sample data for running FitHiChIP are stored in the folder ./TestData. To change the location of the data, edit the corresponding configuration file. The following command creates a link to the sample data folder in your current directory:
[user@cn3316 ~]$ ln -s $FITHICHIP_DATA TestData
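If your own data live elsewhere, the paths in a copied configuration file can be changed from the command line rather than in a text editor. The sketch below assumes that each setting sits on its own Key=Value line (for example ValidPairs=...), using the parameter names that FitHiChIP echoes when it parses the file. set_config_value is a hypothetical helper, not part of FitHiChIP, and the path in the usage comment is a placeholder.

```shell
# Sketch: point a copied FitHiChIP config file at a different data location.
# Assumption: every setting is a single "Key=Value" line.
set_config_value() {
    # $1 = config file, $2 = key, $3 = new value
    # '|' is the sed delimiter so slashes in paths need no escaping;
    # values containing '|', '&' or backslashes are not handled here.
    sed "s|^${2}=.*|${2}=${3}|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage (placeholder path):
# set_config_value configfile_BiasCorrection_ICEBias OutDir /data/$USER/fithichip_results
```

Lines whose key does not match are left untouched, so the helper can be called once per parameter that needs to change.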
Now run FitHiChIP on the sample data using one of the config files:
[user@cn3316 ~]$ FitHiChIP_HiCPro.sh -C configfile_BiasCorrection_ICEBias 
================ Parsing input configuration file =================
Content of ValidPairs is ./TestData/Sample_ValidPairs.txt.gz
Content of Interval is
Content of Matrix is
Content of PeakFile is ./TestData/Sample.Peaks.gz
Content of OutDir is ./TestData/results/
Content of HiCProBasedir is /usr/local/apps/hicpro/2.11.1/
Content of ChrSizeFile is ./TestData/chrom_hg19.sizes
Content of RefFasta is
Content of MappabilityFile is
Content of REFragFile is
Content of GCSize is 200
Content of MappSize is 500
Content of IntType is 3
Content of BINSIZE is 5000
Content of LowDistThr is 20000
Content of UppDistThr is 2000000
Content of QVALUE is 0.01
Content of NBins is 200
Content of UseP2PBackgrnd is 0
Content of BiasCorrection is 1
Content of BiasType is 2
Content of MergeInt is 1
Content of PREFIX is FitHiChIP
Content of Draw is 0
Content of TimeProf is 0
Content of OverWrite is 1

 ================ Verifying input configuration parameters =================
HiC-pro is installed in the system
Installed HiC-pro version: 2.11.1

pythonversion=2.7.15 :: Anaconda custom (64-bit)


parsedVersion=2715

*** Valid python version is detected - installed version: 2.7.15 :: Anaconda custom (64-bit)
*** Python library gzip is installed
*** Python module OptionParser (from the package optparse) is installed
*** Python package networkx is installed
*** Found MACS2 package (for peak calling) installed in the system -  2.1.2
*** Valid R version is detected - installed R version: 3.5.2
*** Valid samtools version is detected - installed version: 1.9
*** bgzip utility is installed in the system
*** tabix utility is installed in the system
*** Valid bedtools version is detected - installed version: 2.27.1
 ================ Changing relative pathnames of the input files to their absolute path names =================
...
Executable of python (2): /usr/local/Anaconda/envs/py2.7/bin/python
Executable of R : /usr/local/apps/R/3.5/3.5.2/bin/Rscript

 ================ Processing HiC-pro contact matrices =================

 *** MatrixBuildExec: /usr/local/apps/hicpro/2.11.1/scripts/build_matrix
*** Computing HiC-pro matrices from the input valid pairs file
***** HiC-pro input valid pairs file in gzipped format

 ================ Creating input interactions =================

 ================ Limiting input interactions to the specified distance ranges =================

 ================ Generating coverage statistics for individual bins =================
======== Computed initial coverage information for individual genomic bins
 ================ Merging coverage with bias statistics =================

 *** ICE computation Executable: /usr/local/apps/hicpro/2.11.1//scripts/ice
*** Computing ICE based bias vector from the HiC-pro contact matrix
======== Appended ICE bias information for individual genomic bins

 ================ Merging coverage + bias with mappability, GC content, and number of cut sites - creating all feature file =================

 ================ Generating interactions + features =================
...
Contact count col: 7
Total number of columns for the complete feature interactions: 19
Specified IntType: 3
Derived IntLow: 3
Specified IntType: 3
Derived IntLow: 3
Derived IntHigh: 3

 **** Start of while Loop ----- current interaction type: 3  ******

 ============ Calling significant interactions ===============

Created sorted genomic distance based interaction file

...

 Peak to peak background usage for spline fit: 0
 Number of cores in the system: 56
 Total Number of interactions: 286723
 ****** Total number of training interactions: 286723 ********
 ****** Number of contacts per bin (allowed for equal occupancy binning): 1851 ********

 modeled fit_spline_coeff_Intercept
 modeled fit_spline_coeff_Logbias1
 modeled fit_spline_coeff_Logbias2
Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NaNs produced
2: In xy.coords(x, y, xlabel, ylabel, log) : NaNs produced
3: In xy.coords(x, y) : NaNs produced
4: In xy.coords(x, y, xlabel, ylabel, log) : NaNs produced
5: In xy.coords(x, y) : NaNs produced
******** FINISHED calling significant interactions
----- Extracted significant interactions ---- FDR threshold lower than: 0.01

 ---- Within function of plotting distance vs contact count ----
...
Removed 2 rows containing missing values (geom_bar).
generated WashU epigenome browser compatible significant interactions
****** Merge filtering of adjacent loops is enabled *****
***** within function of merged filtering - printing the parameters ***
*** bin_size:  5000
*** headerInp:  1
*** connectivity_rule:  8
*** TopPctElem:  100
*** NeighborHoodBinThr:  10000
*** QValCol:  25
*** PValCol:  24
*** SortOrder:  0
OutDir:  /usr/local/apps/FitHiChIP/6.0/sample_data/results/FitHiChIP_Peak2ALL_b5000_L20000_U2000000/P2PBckgr_0/ICE_Bias/FitHiC_BiasCorr/Merge_Nearby_Interactions
Processing the chromosome:  chr1
No of nodes of G:  28
No of edges of G:  13
Number of connected components of G:  18
Processing the chromosome:  chr2
No of nodes of G:  40
No of edges of G:  22
Number of connected components of G:  25
Processing the chromosome:  chr3
No of nodes of G:  1
No of edges of G:  0
Number of connected components of G:  1
Processing the chromosome:  chr4
No of nodes of G:  7
No of edges of G:  1
Number of connected components of G:  6
Processing the chromosome:  chr5
No of nodes of G:  4
No of edges of G:  1
Number of connected components of G:  3
Processing the chromosome:  chr6
No of nodes of G:  22
No of edges of G:  10
Number of connected components of G:  16
Processing the chromosome:  chr7
No of nodes of G:  10
No of edges of G:  1
Number of connected components of G:  9
Processing the chromosome:  chr8
No of nodes of G:  14
No of edges of G:  1
Number of connected components of G:  13
Processing the chromosome:  chr9
No of nodes of G:  15
No of edges of G:  12
Number of connected components of G:  7
Processing the chromosome:  chr10
No of nodes of G:  12
No of edges of G:  5
Number of connected components of G:  8
Processing the chromosome:  chr11
No of nodes of G:  16
No of edges of G:  1
Number of connected components of G:  15
Processing the chromosome:  chr12
No of nodes of G:  24
No of edges of G:  7
Number of connected components of G:  18
Processing the chromosome:  chr13
No of nodes of G:  8
No of edges of G:  0
Number of connected components of G:  8
Processing the chromosome:  chr14
No of nodes of G:  32
No of edges of G:  16
Number of connected components of G:  19
Processing the chromosome:  chr15
No of nodes of G:  6
No of edges of G:  0
Number of connected components of G:  6
Processing the chromosome:  chr16
No of nodes of G:  9
No of edges of G:  2
Number of connected components of G:  7
Processing the chromosome:  chr17
No of nodes of G:  26
No of edges of G:  4
Number of connected components of G:  22
Processing the chromosome:  chr18
No of nodes of G:  1
No of edges of G:  0
Number of connected components of G:  1
Processing the chromosome:  chr19
No of nodes of G:  22
No of edges of G:  5
Number of connected components of G:  17
Processing the chromosome:  chr20
No of nodes of G:  10
No of edges of G:  9
Number of connected components of G:  3
Processing the chromosome:  chr21
No of nodes of G:  15
No of edges of G:  6
Number of connected components of G:  11
Processing the chromosome:  chr22
No of nodes of G:  10
No of edges of G:  0
Number of connected components of G:  10
Processing the chromosome:  chrX
Processing the chromosome:  chrY
==================== End of merge filtering adjacent interactions !!! ======================
-------- *** Merged filtering option is true
----- Applied merged filtering (connected component model) on the adjacent loops of FitHiChIP
-------- *** Merged filtering option is true
----- Applied merged filtering (connected component model) on the adjacent loops of FitHiChIP

 ---- Within function of plotting distance vs contact count ----
...
Removed 2 rows containing missing values (geom_bar).
Merged filtering significant interactions - created washu browser compatible file for these interactions!!!
Updated CurrIntType: 4

 **** Now summarizing FitHiChIP results in the HTML file ***

***** FitHiChIP pipeline is completely executed - congratulations !!! *****
Likewise, FitHiChIP can be run with other configuration files.
[user@cn3316 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
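The merge-filtering step prints node, edge and connected-component counts for each chromosome. If the pipeline output was saved to a file, those lines can be condensed into a table. The following is a minimal sketch that assumes the log format shown in the sample session; summarize_merge_log and the file name fithichip.log are hypothetical.

```shell
# Sketch: tabulate per-chromosome graph statistics from a saved FitHiChIP log.
# Assumption: the output was captured beforehand, e.g.
#   FitHiChIP_HiCPro.sh -C <configfile> > fithichip.log 2>&1
summarize_merge_log() {
    # Prints one tab-separated row per chromosome: chr, nodes, edges, components.
    awk '
        /^Processing the chromosome:/ { chr = $NF }
        /^No of nodes of G:/          { nodes = $NF }
        /^No of edges of G:/          { edges = $NF }
        /^Number of connected components of G:/ {
            printf "%s\t%s\t%s\t%s\n", chr, nodes, edges, $NF
        }
    ' "$1"
}

# Usage:
# summarize_merge_log fithichip.log
```

Chromosomes for which no graph statistics are printed (chrX and chrY in the sample run) produce no row.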
Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. fithichip.sh). For example:

#!/bin/bash
module load FitHiChIP
ln -s $FITHICHIP_DATA TestData
cp $FITHICHIP_CONFIG/* .
FitHiChIP_HiCPro.sh -C configfile_BiasCorrection_CoverageBias
FitHiChIP_HiCPro.sh -C configfile_BiasCorrection_ICEBias
FitHiChIP_HiCPro.sh -C configfile_P2P_BiasCorrection_CoverageBias
FitHiChIP_HiCPro.sh -C configfile_P2P_BiasCorrection_ICEBias

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] fithichip.sh
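
The batch script above runs the four configurations one after another without checking whether each run succeeded. A possible variant, sketched here under the assumption that FitHiChIP_HiCPro.sh returns a non-zero exit status on failure, reports the outcome per configuration file. run_configs is a hypothetical helper, not part of FitHiChIP.

```shell
# Sketch: run a command once per config file and report which runs failed.
# Assumption: the runner signals failure through a non-zero exit status.
run_configs() {
    runner=$1
    shift
    failed=0
    for cfg in "$@"; do
        if "$runner" -C "$cfg"; then
            echo "OK $cfg"
        else
            echo "FAILED $cfg"
            failed=1
        fi
    done
    # Non-zero if at least one configuration failed.
    return "$failed"
}

# Usage inside the batch script, after "module load FitHiChIP":
# run_configs FitHiChIP_HiCPro.sh configfile_*
```

Because the function returns non-zero when any run fails, Slurm records the job as failed if the call is the last command in the script.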