HiNT (Hi-C for copy Number variation and Translocation detection) is a computational method to detect CNVs and translocations from Hi-C data. HiNT has three main components: HiNT-PRE, HiNT-CNV, and HiNT-TL. HiNT-PRE preprocesses Hi-C data and computes the contact matrix, which stores contact frequencies between any two genomic loci. HiNT-CNV and HiNT-TL both start from the Hi-C contact matrix and predict copy number segments and inter-chromosomal translocations, respectively.
- Module Name: hint (see the modules page for more information)
- See /fdb/hint for reference, index, matrices, and example files.
- Programs are multithreaded.
Allocate an interactive session and run the interactive job there.
[biowulf]$ sinteractive --mem=40g --cpus-per-task=16
salloc.exe: Granted job allocation 789523
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0135 are ready for job

[cn0135]$ cd /data/$USER/
[cn0135]$ module load hint
[cn0135]$ cp -r /fdb/hint/testData .
[cn0135]$ cd testData
[cn0135]$ hint pre -d $TESTDIR/test.bam \
    --refdir $REFDIR/hg19 \
    --informat bam \
    --outformat juicer \
    -g hg19 \
    -n test \
    -o /data/$USER/out \
    --samtoolspath $SAMTOOLS \
    -p $SLURM_CPUS_PER_TASK \
    --pairtoolspath $PAIRTOOLS \
    --juicerpath $JUICERPATH/scripts/juicer_tools_1.22.01.jar
[14:42:34] Argument List:
[14:42:34] Hi-C data = /, f, d, b, /, h, i, n, t, /, t, e, s, t, D, a, t, a, /, t, e, s, t, ., b, a, m
[14:42:34] Input format = bam
[14:42:34] Output format = juicer
[14:42:34] Genome = hg19
...
...
Calculating norms for zoom BP_10000
Calculating norms for zoom BP_5000
Writing expected
Writing norms
Finished writing norms

or

[cn0135]$ hint pre \
    -d $TESTDIR/TestSub_1.fq.gz,$TESTDIR/TestSub_2.fq.gz \
    -a $BWA \
    -i $BWAINDEXDIR/hg19/hg19.fa \
    --refdir $REFDIR/hg19 \
    --informat fastq \
    --outformat cooler \
    -g hg19 \
    -n test \
    -o /data/$USER/out \
    --samtoolspath $SAMTOOLS \
    -p $SLURM_CPUS_PER_TASK \
    --pairtoolspath $PAIRTOOLS \
    --coolerpath $COOLER

or

[cn0135]$ hint tl \
    -m test.hic \
    -f juicer \
    --refdir $REFDIR/hg19 \
    --backdir $MATRICESDIR/hg19 \
    -g hg19 \
    -n test \
    -c 0.05 \
    --ppath $PAIRIX \
    -p $SLURM_CPUS_PER_TASK \
    -o testout

[cn0135]$ exit
salloc.exe: Job allocation 789523 has been revoked.
[biowulf]$
Note: this job allocates 40 GB of memory (--mem=40g) and automatically assigns the number of CPUs allocated (--cpus-per-task) to the variable $SLURM_CPUS_PER_TASK.
The test takes less than 30 minutes.
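Because the examples pass $SLURM_CPUS_PER_TASK to the -p option, the commands only work as written inside an allocation that requested --cpus-per-task. A minimal sketch of a fallback for running the same script elsewhere is shown here; hint_threads is a hypothetical helper name, not part of HiNT, and the default of one thread is an assumption.

```shell
#!/bin/bash
# Sketch: choose the value to pass to hint's -p option.
# hint_threads is a hypothetical helper, not something HiNT provides.
hint_threads() {
    # Slurm sets SLURM_CPUS_PER_TASK only when --cpus-per-task was requested;
    # fall back to a single thread (an assumed default) otherwise.
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

echo "hint would be run with -p $(hint_threads)"
```

In a script, -p "$(hint_threads)" could then stand in for -p $SLURM_CPUS_PER_TASK.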
1. Create a script file (myscript) similar to the one below.
#!/bin/bash
cd /data/$USER/testData
module load hint
hint pre -d $TESTDIR/test.bam \
    --refdir $REFDIR/hg19 \
    --informat bam \
    --outformat juicer \
    -g hg19 \
    -n test \
    -o /data/$USER/out \
    --samtoolspath $SAMTOOLS \
    -p $SLURM_CPUS_PER_TASK \
    --pairtoolspath $PAIRTOOLS \
    --juicerpath $JUICERPATH
2. Submit the script on biowulf:
[biowulf]$ sbatch --mem=40g --cpus-per-task=8 myscript
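The batch script relies on several environment variables ($TESTDIR, $REFDIR, $SAMTOOLS, and so on). The sketch below shows one way to fail early if any of them is empty, rather than partway through a queued job. It assumes these variables are exported by "module load hint", which the examples suggest but do not state, and require_vars is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: stop before running hint if a variable it depends on is unset or empty.
# require_vars is a hypothetical helper; the variable names in the usage line
# are those used in the batch script above and are ASSUMED to be exported by
# "module load hint".
require_vars() (
    # The function body runs in a subshell, so these variables stay local.
    missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"        # look up the variable named by $name
        if [ -z "$value" ]; then
            echo "required variable $name is not set" >&2
            missing=1
        fi
    done
    exit "$missing"
)

# Usage inside the batch script, after "module load hint":
#   require_vars TESTDIR REFDIR SAMTOOLS PAIRTOOLS JUICERPATH || exit 1
```

The check prints every missing name before failing, so one submission is enough to see everything that needs fixing.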
Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.
Set up a swarm command file (e.g. /data/$USER/cmdfile), with one command per line:
cd /data/$USER/dir1; hint pre -d $TESTDIR/test.bam ...
cd /data/$USER/dir2; hint pre -d $TESTDIR/test.bam ...
cd /data/$USER/dir3; hint pre -d $TESTDIR/test.bam ...
...
cd /data/$USER/dir20; hint pre -d $TESTDIR/test.bam ...
Submit the swarm job:
$ swarm -f cmdfile --module hint -g 40 -t 16
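A command file with many near-identical lines can also be generated with a short loop instead of being written by hand. The following is a sketch only: make_cmdfile is a hypothetical helper, and its second argument must hold the full "hint pre" option list, which is elided ("...") in the example above and is not filled in here.

```shell
#!/bin/bash
# Sketch: write one swarm command line per working directory.
# make_cmdfile is a hypothetical helper, not part of swarm or HiNT.
#   $1       path of the command file to create
#   $2       option string for "hint pre" (supply the full list yourself)
#   $3...    working directories, one swarm command per directory
make_cmdfile() (
    cmdfile=$1
    hint_args=$2
    shift 2
    : > "$cmdfile"                      # start from an empty file
    for dir in "$@"; do
        printf 'cd %s; hint pre %s\n' "$dir" "$hint_args" >> "$cmdfile"
    done
)

# Usage (single quotes keep $TESTDIR literal so it is expanded when the job runs):
#   make_cmdfile /data/$USER/cmdfile '-d $TESTDIR/test.bam ...' /data/$USER/dir1 /data/$USER/dir2
```

The resulting file is submitted with the same swarm command as above.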
For more information regarding running swarm, see swarm.html