HOMER contains solid tools and methods for interpreting ChIP-Seq and other HTS experiments. In addition to UCSC visualization support, peak finding, and motif finding, HOMER can help assemble data across multiple experiments and examine position-specific relationships between sequencing tags, motifs, and other features. You do not need to use the peak finding methods in this package to use motif finding; use the bed2pos.pl program to create peak files from BED files.
Test data is available in $HOMER_TEST_DATA.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=10g --gres=lscratch:20
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load homer
[user@cn3144 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144 ~]$ # copy some test data (CTCF ChIP-Seq from human Parathyroid adenoma)
[user@cn3144 ~]$ cp -L $HOMER_TEST_DATA/* .
[user@cn3144 ~]$ ls -lh
-rw-rw-r-- 1 user group 869M Mar 15 10:42 ENCFF374UMR.bam
[user@cn3144 ~]$ makeTagDirectory ctcf -unique -genome hg38 -checkGC ENCFF374UMR.bam
Will parse file: ENCFF374UMR.bam
Creating directory: ctcf and removing existing *.tags.tsv
Treating ENCFF374UMR.bam as a bam file
Reading alignment file ENCFF374UMR.bam
Optimizing tag files...
Estimated genome size = 3098172877
Estimated average read density = 0.017473 per bp
Total Tags = 54133547.0
Total Positions = 43151510
Average tag length = 76.0
Median tags per position = 1 (ideal: 1)
Average tags per position = 1.254
Fragment Length Estimate: 198
Peak Width Estimate: 333
Autocorrelation quality control metrics:
Same strand fold enrichment: 2.0
Diff strand fold enrichment: 1.9
Same / Diff fold enrichment: 1.0
[...snip...]
[user@cn3144 ~]$ du -sh ctcf
1021M ctcf
[user@cn3144 ~]$ ml ucsc
[user@cn3144 ~]$ samtools view -H ENCFF374UMR.bam | grep '@SQ' | tr ':' '\t' | cut -f3,5 > chroms
[user@cn3144 ~]$ makeUCSCfile ctcf -bigWig chroms -fsize 1e20 -o ctcf.bw
...
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
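The samtools pipeline in the session above builds the chromosome sizes file that makeUCSCfile takes after -bigWig. As a rough illustration of what the grep/tr/cut steps do, the sketch below runs the same pipeline on a few hand-written header lines instead of real `samtools view -H` output. The header contents and the variable names are assumptions made for this example only.

```shell
#!/bin/bash
# Hand-written header lines standing in for `samtools view -H` output
# (illustrative only; a real BAM header has many more @SQ lines).
header=$(printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:chr2\tLN:242193529\n')

# Same steps as in the session:
#   grep '@SQ'   keeps only the sequence dictionary lines
#   tr ':' '\t'  splits each TAG:VALUE pair into two tab-separated fields
#   cut -f3,5    keeps the sequence name (field 3) and its length (field 5)
chrom_sizes=$(printf '%s\n' "$header" | grep '@SQ' | tr ':' '\t' | cut -f3,5)

# Print the resulting two-column table: name, then length
printf '%s\n' "$chrom_sizes"
```

Each output line holds a chromosome name and its length separated by a tab, which is the layout written to the chroms file in the session.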
Create a batch script (e.g. homer.sh). For example:
#!/bin/bash
module load homer || exit 1

# Set up a local copy of the preparsed directory, since findMotifsGenome.pl needs write permission there
cd /data/$USER/
mkdir -p homer_local
cd homer_local

# Create symbolic links to the preparsed files
for each in /fdb/homer/genomes/hg38/preparsed/*; do ln -s "$each" .; done

# Give the full path to /data/$USER/homer_local
findMotifsGenome.pl file.txt hg38 output_dir -preparsedDir /data/$USER/homer_local -size 200 -len 8
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=2 --mem=10g homer.sh
Create a swarmfile (e.g. homer.swarm). For example:
cd /data/$USER/dir1; findMotifs.pl genelist.txt mouse Results -len 10
cd /data/$USER/dir2; findMotifs.pl genelist.txt mouse Results -len 10
cd /data/$USER/dir3; findMotifs.pl genelist.txt mouse Results -len 10
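With many directories, a swarmfile like the one above can be generated by a short loop instead of being written by hand. The following is a minimal sketch that assumes every directory under /data/$USER holds its own genelist.txt; the directory names in `dirs` are placeholders for this example.

```shell
#!/bin/bash
# Placeholder directory names; each is assumed to contain a genelist.txt
dirs="dir1 dir2 dir3"

# Start from an empty swarmfile, then append one command line per directory
: > homer.swarm
for d in $dirs; do
    # \$USER is escaped so it is written literally and expanded when the subjob runs
    echo "cd /data/\$USER/$d; findMotifs.pl genelist.txt mouse Results -len 10" >> homer.swarm
done

# Show the generated swarmfile
cat homer.swarm
```

Each line of the generated file is one independent swarm subjob, so the file ends up with as many lines as there are directories.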
Submit this job using the swarm command.
swarm -f homer.swarm -g 10 -t 2 --module homer/4.9.1
where
| -g # | Number of gigabytes of memory required for each process (1 line in the swarm command file) |
| -t # | Number of threads/CPUs required for each process (1 line in the swarm command file) |
| --module homer | Loads the homer module for each subjob in the swarm |