Biowulf High Performance Computing at the NIH
FitHiChIP: Identification of significant chromatin contacts from HiChIP data

FitHiChIP is a computational method for identifying chromatin contacts among regulatory regions such as enhancers and promoters from HiChIP/PLAC-seq data. FitHiChIP jointly models the non-uniform coverage and genomic distance scaling of HiChIP data, captures previously validated enhancer interactions for several genes including MYC and TP53, and recovers contacts genome-wide that are supported by ChIA-PET, promoter capture Hi-C and Hi-C data.


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=8g -c10 --gres=lscratch:20
[user@cn3316 ~]$ module load FitHiChIP 
[+] Loading singularity  3.8.5-1  on cn0853
[+] Loading FitHiChIP  9.1
The general command to run FitHiChIP is:
[user@cn3316 ~]$ FitHiChIP_HiCPro.sh -C <configuration_file>
To copy sample data to your current folder, type:
[user@cn3316 ~]$ cp -rP $FITHICHIP_DATA .
To copy a sample configuration file to your current folder, type:
[user@cn3316 ~]$ cp $FITHICHIP_CONFIG/* . 
The following command assumes that the sample data are stored in the folder ./TestData. To use data in a different location, edit the corresponding paths in the configuration file:
[user@cn3316 ~]$ FitHiChIP_HiCPro.sh -C configfile_P2P_BiasCorrection_ICEBias
 

 ================ Parsing input configuration file =================


Content of ValidPairs is ./TestData/Sample_ValidPairs.txt.gz
Content of Interval is
Content of Matrix is
Content of Bed is
Content of CircularGenome is 0
Content of PeakFile is ./TestData/Sample.Peaks.gz
Content of OutDir is ./TestData/results/
Content of IntType is 3
Content of BINSIZE is 5000
Content of LowDistThr is 20000
Content of UppDistThr is 2000000
Content of UseP2PBackgrnd is 1
Content of BiasType is 2
Content of MergeInt is 1
Content of QVALUE is 0.01
Content of ChrSizeFile is ./TestData/chrom_hg19.sizes
Content of PREFIX is FitHiChIP
Content of OverWrite is 1
Base directory containing HiCPro package : /opt/conda/envs/fithichip/HiC-Pro-3.1.0

 ================ Verifying input configuration parameters =================


***** Specified output directory : ./TestData/results/
HiC-pro is installed in the system
Installed HiC-pro version: 3.1.0
Installed python version: 3.8.10
*** Python library gzip is installed
*** Python module OptionParser (from the package optparse) is installed
*** Python package networkx is installed
*** Found MACS2 package (for peak calling) installed in the system -  line
Installed R version: 3.6.3
Installed samtools version: 1.12
*** bgzip utility is installed in the system
*** tabix utility is installed in the system
Installed bedtools version: 2.30.0


 ====== Changing relative pathnames of the input files to their absolute path names ==========
...
 ====== Writing input parameters ==========


Executable of python3: /opt/conda/envs/fithichip/bin/python3
Executable of R : /usr/bin/Rscript


 ================ Processing HiC-pro generated valid pairs and / or matrix files provided as input =================



 ====>> Computing HiC-pro matrices from the input valid pairs file

 ====>> Executable to generate contact matrix from valid pairs: /opt/conda/envs/fithichip/HiC-Pro_3.1.0/scripts/build_matrix
***** HiC-pro input valid pairs file in gzipped format
...

 ======= Limiting input interactions to the specified distance ranges 20000 to 2000000 =========


===>> Number of cis pairs with nonzero contact count (after distance thresholding): 619988

                                                                                                               
================ Generating coverage statistics and bias for individual bins =================


Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
                                                                                                              
Attaching package: 'BiocGenerics'
...
Attaching package: 'S4Vectors'
...
Computing 1D coverage - Processing chromosome : chr1  -- number of input peaks for this chromosome : 4175
 Computing 1D coverage - Processing chromosome : chr10  -- number of input peaks for this chromosome : 1797
 Computing 1D coverage - Processing chromosome : chr11  -- number of input peaks for this chromosome : 2155
 Computing 1D coverage - Processing chromosome : chr11_gl000202_random
 Computing 1D coverage - Processing chromosome : chr12  -- number of input peaks for this chromosome : 2159
 Computing 1D coverage - Processing chromosome : chr13  -- number of input peaks for this chromosome : 810
 Computing 1D coverage - Processing chromosome : chr14  -- number of input peaks for this chromosome : 1448
 Computing 1D coverage - Processing chromosome : chr15  -- number of input peaks for this chromosome : 1203
 Computing 1D coverage - Processing chromosome : chr16  -- number of input peaks for this chromosome : 1701
 Computing 1D coverage - Processing chromosome : chr17  -- number of input peaks for this chromosome : 2439
 Computing 1D coverage - Processing chromosome : chr17_ctg5_hap1
 Computing 1D coverage - Processing chromosome : chr17_gl000203_random
 Computing 1D coverage - Processing chromosome : chr17_gl000204_random
 Computing 1D coverage - Processing chromosome : chr17_gl000205_random
 Computing 1D coverage - Processing chromosome : chr17_gl000206_random
 Computing 1D coverage - Processing chromosome : chr18  -- number of input peaks for this chromosome : 592
 Computing 1D coverage - Processing chromosome : chr18_gl000207_random
 Computing 1D coverage - Processing chromosome : chr19  -- number of input peaks for this chromosome : 2595
 Computing 1D coverage - Processing chromosome : chr19_gl000208_random
 Computing 1D coverage - Processing chromosome : chr19_gl000209_random
 Computing 1D coverage - Processing chromosome : chr1_gl000191_random
 Computing 1D coverage - Processing chromosome : chr1_gl000192_random
 Computing 1D coverage - Processing chromosome : chr2  -- number of input peaks for this chromosome : 2858
 Computing 1D coverage - Processing chromosome : chr20  -- number of input peaks for this chromosome : 1037
 Computing 1D coverage - Processing chromosome : chr21  -- number of input peaks for this chromosome : 500
 Computing 1D coverage - Processing chromosome : chr21_gl000210_random
...
 ================ Computing bias statistics - ICE bias will be employed =================

 *** ICE computation Executable: /opt/conda/envs/fithichip/HiC-Pro_3.1.0/scripts/ice
*** Computing ICE based bias vector from the HiC-pro contact matrix
/opt/conda/envs/fithichip/HiC-Pro_3.1.0/scripts/ice:65: SyntaxWarning: "is" with a literal. Did you mean "=="?  if "--filtering_perc" is None and "--filter_low_counts_perc" not in sys.argv:
/opt/conda/envs/fithichip/lib/python3.8/site-packages/iced/normalization/_ca_utils.py:8: UserWarning: The API of this module is likely to change. Use only for testing purposes
  warnings.warn(
Assuming the file is 1-based. If this is not the desired option, set option --base to 0
======== Appended ICE bias information for individual genomic bins
...

 ================ creating full feature file for FitHiChIP =================
...

 ================ Generating interactions + features for significance estimation =================
...
 Processing chromosome for interaction features: chr1  --- number of bin pairs with nonzero contacts for this chromosome : 61410
 Processing chromosome for interaction features: chr10  --- number of bin pairs with nonzero contacts for this chromosome : 26255
 Processing chromosome for interaction features: chr11  --- number of bin pairs with nonzero contacts for this chromosome : 32117
 Processing chromosome for interaction features: chr11_gl000202_random  --- number of bin pairs with nonzero contacts for this chromosome : 0
 Processing chromosome for interaction features: chr12  --- number of bin pairs with nonzero contacts for this chromosome : 34450
 Processing chromosome for interaction features: chr13  --- number of bin pairs with nonzero contacts for this chromosome : 15401
...
 **** Start of while Loop ----- current interaction type: 3  ******

 ============ Calling significant interactions ===============

 ---- Processing distance value for sorting locus pairs based on interaction distance: 20000
 ---- Processing distance value for sorting locus pairs based on interaction distance: 25000
 ---- Processing distance value for sorting locus pairs based on interaction distance: 30000
 ---- Processing distance value for sorting locus pairs based on interaction distance: 35000
 ---- Processing distance value for sorting locus pairs based on interaction distance: 40000
 ---- Processing distance value for sorting locus pairs based on interaction distance: 45000
...
 ============= Calling significant interactions ==========



 Number of cores in the system: 72

 ******* Within significant interaction module - list of parameters -
 BinSize : 5000
 IntType : 3
 P2P background usage : 1
 BiasCorr : 1
 BiasType : 2
 nbins : 200


 ===>> Total Number of input interactions (locus pairs): 286822
 ****** Total number of training interactions: 53595 ********
 ****** Number of contacts per bin (allowed for equal occupancy binning): 433 ********
...

***** FitHiChIP pipeline is completely executed - congratulations !!! *****
FitHiChIP can be run in the same way with the other sample configuration files.
[user@cn3316 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
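Each FitHiChIP run is controlled entirely by its configuration file. A common next step is therefore to copy one of the sample configurations and change a few settings, such as OutDir or QVALUE. The function below is a minimal sketch of that edit. set_config_value is a hypothetical helper and is not part of FitHiChIP. It assumes the configuration file stores one Key=Value setting per line, as the parsed parameters in the sample session suggest. Review the edited file before running it.

```bash
#!/bin/bash
# Hypothetical helper (not part of FitHiChIP): set KEY to VALUE in a
# configuration file, ASSUMING one "Key=Value" setting per line.
# Commented-out lines (starting with '#') are left untouched.
set_config_value() {
    local config_file="$1" key="$2" value="$3"
    if grep -q "^${key}=" "$config_file"; then
        # Replace the existing setting. '|' is the sed delimiter so that
        # paths containing '/' need no escaping; values containing '|' or
        # '&' would still need to be escaped.
        sed -i "s|^${key}=.*|${key}=${value}|" "$config_file"
    else
        # Append the setting if the key is not present yet.
        printf '%s=%s\n' "$key" "$value" >> "$config_file"
    fi
}
```

For example, you could copy configfile_P2P_BiasCorrection_ICEBias to a new file (my_config is an illustrative name). Running set_config_value my_config OutDir ./TestData/results_q05/ and set_config_value my_config QVALUE 0.05, followed by FitHiChIP_HiCPro.sh -C my_config, would repeat the sample analysis with a less stringent q-value and write the results to a separate folder.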
Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. fithichip.sh). For example:

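#!/bin/bash
# Stop at the first run that fails instead of continuing with the next one.
set -e

module load FitHiChIP

# Each configuration file defines one complete FitHiChIP run
# (input files, output directory, bin size, q-value, ...).
FitHiChIP_HiCPro.sh -C configfile_1
FitHiChIP_HiCPro.sh -C configfile_2
FitHiChIP_HiCPro.sh -C configfile_3
FitHiChIP_HiCPro.sh -C configfile_4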

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] fithichip.sh
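The batch script above runs several configurations one after another. By default, Slurm writes their combined output to slurm-<jobid>.out in the submission directory. Each successful run ends with the completion banner shown in the sample session, so one quick check after the job finishes is to count those banners. The functions below are a minimal sketch under that assumption. count_completed_runs and check_all_runs_completed are hypothetical helpers and are not part of FitHiChIP or Slurm.

```bash
#!/bin/bash
# Hypothetical helpers (not part of FitHiChIP or Slurm). They ASSUME every
# successful run prints the completion banner seen in the sample session.
count_completed_runs() {
    local log_file="$1"
    # -F matches the banner text literally; -c prints the number of matching
    # lines. grep exits 1 when the count is 0, so '|| true' keeps the "0".
    grep -cF "FitHiChIP pipeline is completely executed" "$log_file" || true
}

check_all_runs_completed() {
    local log_file="$1" expected="$2" completed
    if [ ! -r "$log_file" ]; then
        echo "Cannot read log file: ${log_file}" >&2
        return 1
    fi
    completed=$(count_completed_runs "$log_file")
    if [ "$completed" -eq "$expected" ]; then
        echo "All ${expected} FitHiChIP runs completed"
        return 0
    fi
    echo "Only ${completed} of ${expected} FitHiChIP runs completed" >&2
    return 1
}
```

For the four-configuration script above, check_all_runs_completed slurm-<jobid>.out 4 returns success only if all four runs printed the banner. Any failed run still needs to be inspected in the log itself.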