High-Performance Computing at the NIH

HINT (Hmm-based IdeNtification of Transcription factor footprints) integrates DNase I hypersensitivity and histone modification data to detect open chromatin regions and active binding sites. Within transcription factor binding sites, there is a specific grammar of DNase I digestion and histone marks. The authors therefore devised a multivariate HMM that models this regulatory grammar by analyzing DNase-seq and histone-modification ChIP-seq profiles simultaneously on a genome-wide level. The HMM takes as input a normalized signal and a slope signal for DNase-seq and for one of the histone marks. It can therefore detect the rising, peak, and falling regions of both the histone modification and the DNase signals. The genomic regions annotated with the HMM state are considered predictions and represent likely binding sites within that cell context. For more details on the method, see the full paper, and please cite it if you use this tool. Benchmarking data from the main publication are available on the authors' lab website.

Example files are in the /usr/local/apps/rgt/HINT_example directory.
To test HINT with the example files:

  $ cp -r /usr/local/apps/rgt/HINT_example /data/$USER
  $ cd /data/$USER/HINT_example
  $ sinteractive --mem=10g
  $ module load rgt
  $ rgt-hint --output-location ./Output/ ./InputMatrix_HINT_DNase+Histone.txt
  $ rgt-hint --output-location ./Output/ ./InputMatrix_HINT_DNase.txt
  $ rgt-hint --default-bias-correction --output-location ./Output/ ./InputMatrix_HINTBC_DNase.txt

Note: the Output directory must exist before you run these commands; otherwise they fail with an error.
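
A minimal sketch of creating the directory before the first run, assuming the same ./Output/ path as in the commands above. The check for rgt-hint on the PATH is an added safeguard, not part of the original example:

```shell
# Create the output directory up front; -p makes this a no-op if it already exists.
outdir=./Output
mkdir -p "$outdir"

# Run rgt-hint only when it is on the PATH (i.e. after 'module load rgt').
if command -v rgt-hint >/dev/null 2>&1; then
    rgt-hint --output-location "$outdir"/ ./InputMatrix_HINT_DNase.txt
else
    echo "rgt-hint not found; run 'module load rgt' first" >&2
fi
```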

The default data directory is /fdb/rgt/rgtdata/.

On Helix

Sample session:

[susanc@helix ~]$ module load rgt
[susanc@helix ~]$ rgt-hint -h
The 'hint' program predicts TFBSs given open chromatin data.
In order to use this tools, please type: 

rgt-hint [options] 

The  should contain:
- One region file representing the regions in which the HMM
  will be applied. It should contain 'regions' in the type field
- One DNase aligned reads file (bam) file with 'DNASE' in the name field.
- One to Three histone modification aligned reads file (bam).

For more information, please refer to:

  --version             show program's version number and exit
  -h, --help            show this help message and exit
                        List of HMM files separated by comma. If one file
                        only, then this HMM will be applied for all histone
                        signals, otherwise, the list must have the same number
                        of histone files given. The order of the list should
                        be the order of the histones in the input_matrix file.
                        If the argument is not given, then a default HMM will
                        be used. In case multiple input groups are used, then
                        other lists can be passed using semicolon. The number
                        of group of lists should equals the number of input
                        List of files (for each input group; separated by
                        semicolon) with all possible k-mers (for any k) and
                        their bias estimates. Each input groupshould have two
                        files: one for the forward and one for the negative
                        strand.Each line should contain a kmer and the bias
                        estimate separated by tab. Leave an empty set for
                        histone-only groups. Eg. FILE1;;FILE3.
  --organism=STRING     Organism considered on the analysis. Check our full
                        documentation for all available options. All default
                        files such as genomes will be based on the chosen
                        organism and the data.config file. This option is used
                        only if a bigbed output is asked.
                        Applies DNase-seq cleavage bias correction with k-mer
                        bias estimated from the given DNase-seq data (SLOW
                        Applies DNase-seq cleavage bias correction with
                        default k-mer bias estimates (FAST HINT-BC).
                        Path where the output files will be written.
  --print-bb            If used, the output will be a bigbed (.bb) file.

Batch job on Biowulf

Create a batch input file (e.g. script.sh). For example:

#!/bin/bash

module load rgt

cd /data/$USER/dir
hint command 1
hint command 2

Then submit the file on Biowulf:

$ sbatch script.sh

For more information on the sbatch command, see https://hpc.nih.gov/docs/userguide.html#submit
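
As an illustration, the placeholder commands in the batch file can be filled in with the example data from the top of this page. The sketch below writes such a script and checks its syntax without running it. It assumes the example files were already copied to /data/$USER/HINT_example; the `set -e` line and the `mkdir -p` step are additions for safety, not part of the original example.

```shell
# Write a batch script for the example data, then check its syntax without running it.
cat > script.sh <<'EOF'
#!/bin/bash
set -e                       # stop at the first failing command

module load rgt

cd /data/$USER/HINT_example
mkdir -p ./Output            # the output directory must exist before rgt-hint runs
rgt-hint --output-location ./Output/ ./InputMatrix_HINT_DNase.txt
EOF

bash -n script.sh && echo "script.sh: syntax OK"
```

The script could then be submitted with `sbatch --mem=10g script.sh`. The 10g figure is borrowed from the interactive example above and may need adjusting for other data.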

Swarm of Jobs on Biowulf

Create a swarmfile (e.g. script.swarm). For example:

# this file is called script.swarm
cd dir1;hint command 1; hint command 2
cd dir2;hint command 1; hint command 2
cd dir3;hint command 1; hint command 2

Submit this job using the swarm command.

swarm -f script.swarm --module rgt

For more information on swarm, see https://hpc.nih.gov/apps/swarm.html#usage
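
When many directories share the same commands, the swarmfile can be generated with a loop instead of written by hand. The sketch below assumes three directories named dir1 to dir3, each holding its own input matrix called InputMatrix.txt. Both names are placeholders for illustration and do not come from the example data.

```shell
# Build script.swarm with one line per directory.
# dir1..dir3 and InputMatrix.txt are placeholder names.
: > script.swarm                      # start with an empty swarmfile
for d in dir1 dir2 dir3; do
    # Each swarm line is an independent job: create the output directory, then run rgt-hint.
    echo "cd $d; mkdir -p Output; rgt-hint --output-location ./Output/ ./InputMatrix.txt" >> script.swarm
done
cat script.swarm
```

The generated file is submitted the same way as a hand-written one, with `swarm -f script.swarm --module rgt`.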

Interactive job on Biowulf

Allocate an interactive session. Sample session:

[susanc@biowulf ~]$ sinteractive --mem=5g
salloc.exe: Pending job allocation 15194042
salloc.exe: job 15194042 queued and waiting for resources
salloc.exe: job 15194042 has been allocated resources
salloc.exe: Granted job allocation 15194042
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn1719 are ready for job

[susanc@cn1719 ~]$ module load rgt

[susanc@cn1719 ~]$ hint command