FitHiChIP: Identification of significant chromatin contacts from HiChIP data
FitHiChIP is a computational method for identifying chromatin contacts among regulatory regions such as enhancers and promoters from HiChIP/PLAC-seq data. FitHiChIP jointly models the non-uniform coverage and genomic distance scaling of HiChIP data, captures previously validated enhancer interactions for several genes including MYC and TP53, and recovers contacts genome-wide that are supported by ChIA-PET, promoter capture Hi-C and Hi-C data.
References:
- Sourya Bhattacharyya, Vivek Chandra, Pandurangan Vijayanand, and Ferhat Ay,
FitHiChIP: Identification of significant chromatin contacts from HiChIP data.
bioRxiv Sep. 10, 2018; doi: http://dx.doi.org/10.1101/412833.
Documentation
Important Notes
- Module Name: FitHiChIP (see the modules page for more information)
- Implemented as a Singularity container
- Unusual environment variables set
- FITHICHIP_HOME installation directory
- FITHICHIP_BIN executables directory
- FITHICHIP_CONFIG sample configuration files directory
- FITHICHIP_SRC source directory
- FITHICHIP_DATA sample data directory
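A quick way to confirm that the module has set the variables listed above is a small check like the following. This is a minimal sketch: the function name check_fithichip_env is hypothetical, and only the five variable names come from this page.

```shell
#!/bin/bash
# Sketch: report which of the FITHICHIP_* variables listed above are set.
# check_fithichip_env is a hypothetical helper name, not part of FitHiChIP.
check_fithichip_env() {
    missing=0
    for var in FITHICHIP_HOME FITHICHIP_BIN FITHICHIP_CONFIG FITHICHIP_SRC FITHICHIP_DATA; do
        # Look up the value of the variable whose name is stored in $var.
        eval "value=\${$var:-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$var" "$value"
        else
            printf '%s is not set\n' "$var"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every variable was set.
    [ "$missing" -eq 0 ]
}

check_fithichip_env || echo "Some variables are unset; has 'module load FitHiChIP' been run?"
```

Run it after module load FitHiChIP; outside that environment it simply lists the variables as not set.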
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=8g -c10 --gres=lscratch:20
[user@cn3316 ~]$ module load FitHiChIP
[+] Loading singularity 3.8.5-1 on cn0853
[+] Loading FitHiChIP 9.1
The general command to run FitHiChIP on sample data is:
[user@cn3123 user]$ FitHiChIP_HiCPro.sh -C <configuration_file>
To copy sample data to your current folder, type:
[user@cn3316 ~]$ cp -rP $FITHICHIP_DATA .
To copy the sample configuration files to your current folder, type:
[user@cn3316 ~]$ cp $FITHICHIP_CONFIG/* .
The following command assumes that the sample data for running FitHiChIP are stored in the folder ./TestData (to change the location of the data, edit the corresponding configfile):
[user@cn3316 ~]$ FitHiChIP_HiCPro.sh -C configfile_P2P_BiasCorrection_ICEBias
================ Parsing input configuration file =================
Content of ValidPairs is ./TestData/Sample_ValidPairs.txt.gz
Content of Interval is
Content of Matrix is
Content of Bed is
Content of CircularGenome is 0
Content of PeakFile is ./TestData/Sample.Peaks.gz
Content of OutDir is ./TestData/results/
Content of IntType is 3
Content of BINSIZE is 5000
Content of LowDistThr is 20000
Content of UppDistThr is 2000000
Content of UseP2PBackgrnd is 1
Content of BiasType is 2
Content of MergeInt is 1
Content of QVALUE is 0.01
Content of ChrSizeFile is ./TestData/chrom_hg19.sizes
Content of PREFIX is FitHiChIP
Content of OverWrite is 1
Base directory containing HiCPro package : /opt/conda/envs/fithichip/HiC-Pro-3.1.0
================ Verifying input configuration parameters =================
***** Specified output directory : ./TestData/results/
HiC-pro is installed in the system
Installed HiC-pro version: 3.1.0
Installed python version: 3.8.10
*** Python library gzip is installed
*** Python module OptionParser (from the package optparse) is installed
*** Python package networkx is installed
*** Found MACS2 package (for peak calling) installed in the system - line
Installed R version: 3.6.3
Installed samtools version: 1.12
*** bgzip utility is installed in the system
*** tabix utility is installed in the system
Installed bedtools version: 2.30.0
====== Changing relative pathnames of the input files to their absolute path names ==========
...
====== Writing input parameters ==========
Executable of python3: /opt/conda/envs/fithichip/bin/python3
Executable of R : /usr/bin/Rscript
================ Processing HiC-pro generated valid pairs and / or matrix files provided as input =================
====>> Computing HiC-pro matrices from the input valid pairs file
====>> Executable to generate contact matrix from valid pairs: /opt/conda/envs/fithichip/HiC-Pro_3.1.0/scripts/build_matrix
***** HiC-pro input valid pairs file in gzipped format
...
======= Limiting input interactions to the specified distance ranges 20000 to 2000000 =========
===>> Number of cis pairs with nonzero contact count (after distance thresholding): 619988
================ Generating coverage statistics and bias for individual bins =================
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
...
Attaching package: 'S4Vectors'
...
Computing 1D coverage - Processing chromosome : chr1 -- number of input peaks for this chromosome : 4175
Computing 1D coverage - Processing chromosome : chr10 -- number of input peaks for this chromosome : 1797
Computing 1D coverage - Processing chromosome : chr11 -- number of input peaks for this chromosome : 2155
Computing 1D coverage - Processing chromosome : chr11_gl000202_random
Computing 1D coverage - Processing chromosome : chr12 -- number of input peaks for this chromosome : 2159
Computing 1D coverage - Processing chromosome : chr13 -- number of input peaks for this chromosome : 810
Computing 1D coverage - Processing chromosome : chr14 -- number of input peaks for this chromosome : 1448
Computing 1D coverage - Processing chromosome : chr15 -- number of input peaks for this chromosome : 1203
Computing 1D coverage - Processing chromosome : chr16 -- number of input peaks for this chromosome : 1701
Computing 1D coverage - Processing chromosome : chr17 -- number of input peaks for this chromosome : 2439
Computing 1D coverage - Processing chromosome : chr17_ctg5_hap1
Computing 1D coverage - Processing chromosome : chr17_gl000203_random
Computing 1D coverage - Processing chromosome : chr17_gl000204_random
Computing 1D coverage - Processing chromosome : chr17_gl000205_random
Computing 1D coverage - Processing chromosome : chr17_gl000206_random
Computing 1D coverage - Processing chromosome : chr18 -- number of input peaks for this chromosome : 592
Computing 1D coverage - Processing chromosome : chr18_gl000207_random
Computing 1D coverage - Processing chromosome : chr19 -- number of input peaks for this chromosome : 2595
Computing 1D coverage - Processing chromosome : chr19_gl000208_random
Computing 1D coverage - Processing chromosome : chr19_gl000209_random
Computing 1D coverage - Processing chromosome : chr1_gl000191_random
Computing 1D coverage - Processing chromosome : chr1_gl000192_random
Computing 1D coverage - Processing chromosome : chr2 -- number of input peaks for this chromosome : 2858
Computing 1D coverage - Processing chromosome : chr20 -- number of input peaks for this chromosome : 1037
Computing 1D coverage - Processing chromosome : chr21 -- number of input peaks for this chromosome : 500
Computing 1D coverage - Processing chromosome : chr21_gl000210_random
...
================ Computing bias statistics - ICE bias will be employed =================
*** ICE computation Executable: /opt/conda/envs/fithichip/HiC-Pro_3.1.0/scripts/ice
*** Computing ICE based bias vector from the HiC-pro contact matrix
/opt/conda/envs/fithichip/HiC-Pro_3.1.0/scripts/ice:65: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if "--filtering_perc" is None and "--filter_low_counts_perc" not in sys.argv:
/opt/conda/envs/fithichip/lib/python3.8/site-packages/iced/normalization/_ca_utils.py:8: UserWarning: The API of this module is likely to change. Use only for testing purposes
  warnings.warn(
Assuming the file is 1-based. If this is not the desired option, set option --base to 0
======== Appended ICE bias information for individual genomic bins
...
================ creating full feature file for FitHiChIP =================
...
================ Generating interactions + features for significance estimation =================
...
Processing chromosome for interaction features: chr1 --- number of bin pairs with nonzero contacts for this chromosome : 61410
Processing chromosome for interaction features: chr10 --- number of bin pairs with nonzero contacts for this chromosome : 26255
Processing chromosome for interaction features: chr11 --- number of bin pairs with nonzero contacts for this chromosome : 32117
Processing chromosome for interaction features: chr11_gl000202_random --- number of bin pairs with nonzero contacts for this chromosome : 0
Processing chromosome for interaction features: chr12 --- number of bin pairs with nonzero contacts for this chromosome : 34450
Processing chromosome for interaction features: chr13 --- number of bin pairs with nonzero contacts for this chromosome : 15401
...
**** Start of while Loop ----- current interaction type: 3 ******
============ Calling significant interactions ===============
---- Processing distance value for sorting locus pairs based on interaction distance: 20000
---- Processing distance value for sorting locus pairs based on interaction distance: 25000
---- Processing distance value for sorting locus pairs based on interaction distance: 30000
---- Processing distance value for sorting locus pairs based on interaction distance: 35000
---- Processing distance value for sorting locus pairs based on interaction distance: 40000
---- Processing distance value for sorting locus pairs based on interaction distance: 45000
...
============= Calling significant interactions ==========
Number of cores in the system: 72
******* Within significant interaction module - list of parameters -
BinSize : 5000
IntType : 3
P2P background usage : 1
BiasCorr : 1
BiasType : 2
nbins : 200
===>> Total Number of input interactions (locus pairs): 286822
****** Total number of training interactions: 53595 ********
****** Number of contacts per bin (allowed for equal occupancy binning): 433 ********
...
***** FitHiChIP pipeline is completely executed - congratulations !!! *****
Likewise, FitHiChIP can be run with other configuration files.
[user@cn3316 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
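The session above notes that the data location is changed by editing the configfile. One possible way to do that without hand-editing is sketched below. The keys and ./TestData paths are taken from the session log, but the Key=Value file layout, the file names configfile_demo and configfile_custom, and the target folder MyData are assumptions made for illustration; check an actual sample configuration file before adapting this.

```shell
#!/bin/bash
# Sketch: point a configuration file at a different data folder.
workdir=$(mktemp -d)

# Hypothetical stand-in for a sample configuration file. The keys and paths
# appear in the session log; the Key=Value layout is an assumption.
cat > "$workdir/configfile_demo" <<'EOF'
ValidPairs=./TestData/Sample_ValidPairs.txt.gz
PeakFile=./TestData/Sample.Peaks.gz
OutDir=./TestData/results/
ChrSizeFile=./TestData/chrom_hg19.sizes
EOF

# Hypothetical new location of the sample data.
new_data_dir="$workdir/MyData"

# Rewrite only values that start with ./TestData/ and write the result to a
# new file, leaving the original configuration file untouched.
sed "s|=\./TestData/|=$new_data_dir/|" "$workdir/configfile_demo" > "$workdir/configfile_custom"

cat "$workdir/configfile_custom"
```

Writing to a separate file keeps the original sample configuration available for comparison.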
Batch job
Most jobs should be run as batch jobs.
Create a batch input file (e.g. fithichip.sh). For example:
#!/bin/bash
module load FitHiChIP
FitHiChIP_HiCPro.sh -C configfile_1
FitHiChIP_HiCPro.sh -C configfile_2
FitHiChIP_HiCPro.sh -C configfile_3
FitHiChIP_HiCPro.sh -C configfile_4
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] fithichip.sh
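When a batch file runs several configuration files in sequence, it can help to stop at the first failure instead of continuing. The following is a sketch of one way to do that. The helper name run_configs is hypothetical, and it assumes that FitHiChIP_HiCPro.sh returns a non-zero exit status on failure, which this page does not confirm.

```shell
#!/bin/bash
# Sketch: run a command with "-C <config>" for each configuration file and
# stop at the first missing file or failed run.
# Usage: run_configs RUNNER CONFIG...   (run_configs is a hypothetical helper)
run_configs() {
    runner=$1
    shift
    for cfg in "$@"; do
        if [ ! -f "$cfg" ]; then
            echo "config not found: $cfg" >&2
            return 1
        fi
        echo "== $cfg"
        "$runner" -C "$cfg" || { echo "run failed: $cfg" >&2; return 1; }
    done
}

# In a batch file on the cluster this might be used as:
#   module load FitHiChIP
#   run_configs FitHiChIP_HiCPro.sh configfile_1 configfile_2 configfile_3 configfile_4

# Local demonstration with "true" as a stand-in runner and empty config files.
demo_dir=$(mktemp -d)
touch "$demo_dir/configfile_1" "$demo_dir/configfile_2"
run_configs true "$demo_dir/configfile_1" "$demo_dir/configfile_2"
```

The demonstration only prints the name of each configuration file; on the cluster, the commented lines would replace it.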