CutRunTools is a pipeline for analysis of data produced by the CUT&RUN (Cleavage Under Targets and Release Using Nuclease) technology for high-resolution mapping of DNA binding sites. It is a flexible, general pipeline that facilitates identification of chromatin-associated protein binding and performs genomic footprinting analysis from antibody-targeted CutRun primary cleavage data. CutRunTools extracts endonuclease cut site information from sequences of short read fragments and produces single-locus binding estimates, aggregate motif footprints, and informative visualizations to support the high-resolution mapping capability of CutRun.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=12g -c5 --gres=lscratch:10
[user@cn3335 ~]$ module load CutRunTools/20200629
[+] Loading bedops 2.4.35
[+] Loading bedtools 2.27.1
[+] Loading bowtie 2-2.3.4.3
[+] Loading macs 2.1.2
[+] Loading meme 4.12.0 on cn2382
[+] Loading HDF5 1.10.4
[+] Loading NetCDF 4.7.4_gcc9.2.0
[+] Loading picard 2.22.2
[+] Loading gcc 7.3.0 ...
[+] Loading GSL 2.4 for GCC 7.2.0 ...
[+] Loading openmpi 3.0.2 for GCC 7.3.0
[+] Loading ImageMagick 7.0.8 on cn2382
[+] Loading pandoc 2.9.2.1 on cn2382
[+] Loading R 3.5.2
[+] Loading samtools 1.9 ...
[+] Loading trimmomatic 0.36 on cn2382
[+] Loading CutRunTools 20200629
[user@cn3335 ~]$ mkdir -p /data/$USER/CutRunTools && cd /data/$USER/CutRunTools
The processing described below involves both interactive and batch submission steps and follows the outline presented in the CutRunTools Usage documentation (see above).
[user@cn3335 ~]$ cp -r $CUTRUNTOOLS_SRC/* .
[user@cn3335 ~]$ cp -P $CUTRUNTOOLS_DATA/* .
Then customize the configuration file config.json for your needs by editing or replacing the lines that contain the string "user". The current configuration assumes that the input/output directory for your data processing will be a folder "workdir" in your current directory.
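Before editing, it can help to see exactly which lines of config.json contain the string "user". The following is a minimal sketch, assuming a standard grep is available; the function name list_user_lines is hypothetical and is not part of CutRunTools.

```shell
# Hypothetical helper (not part of CutRunTools): print, with line numbers,
# every line of a configuration file that contains the string "user".
list_user_lines() {
    local config
    config="${1:-config.json}"   # default to config.json in the current directory
    grep -n 'user' "$config"
}
```

Running list_user_lines config.json in the directory that holds the copied files prints the candidate lines; the edits themselves are still best made by hand, since the right replacement values depend on your own data locations.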
[user@cn3335 ~]$ ./validate.py config.json
If no error messages are produced, run the command:
[user@cn3335 ~]$ ./create_scripts.py config.json
A folder workdir containing a number of subfolders and files will be created.
[user@cn3335 ~]$ cd workdir
The CutRunTools data processing involves four steps.
STEP 1: Read trimming and alignment.
Run either one of the following commands:
[user@cn3335 ~]$ sbatch ./integrated.sh GATA1_D7_30min_chr11_R1_001.fastq.gz
or
[user@cn3335 ~]$ ./integrated.sh GATA1_D7_30min_chr11_R1_001.fastq.gz
Even though the commands specify the *_R1_001.fastq.gz file as the only input, CutRunTools will actually check that both forward and reverse fastq files are present.
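If you want to confirm that both mates are in place before submitting, a check along the following lines can be run first. This is only a sketch and not CutRunTools' own check: the function name check_fastq_pair is hypothetical, and the assumption that the reverse file is named *_R2_001.fastq.gz follows the common Illumina convention rather than anything stated above.

```shell
# Hypothetical helper (not CutRunTools' own check): given the *_R1_001.fastq.gz
# file, derive the mate's name by swapping R1 for R2 (an assumed naming
# convention) and verify that both files exist.
check_fastq_pair() {
    local r1 r2 f
    r1="$1"
    case "$r1" in
        *_R1_001.fastq.gz) ;;
        *) echo "expected a *_R1_001.fastq.gz file: $r1" >&2; return 2 ;;
    esac
    # Strip the R1 suffix and append the assumed R2 suffix.
    r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
    for f in "$r1" "$r2"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
    done
    echo "found pair: $r1 $r2"
}
```

For example, check_fastq_pair GATA1_D7_30min_chr11_R1_001.fastq.gz returns a non-zero status and names the missing file if either mate is absent.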
aligned.aug10
└── GATA1_D7_30min_chr11_aligned_reads.bam
trimmed
├── GATA1_D7_30min_chr11_1.paired.fastq.gz
├── GATA1_D7_30min_chr11_1.unpaired.fastq.gz
├── GATA1_D7_30min_chr11_2.paired.fastq.gz
└── GATA1_D7_30min_chr11_2.unpaired.fastq.gz
trimmed3
├── GATA1_D7_30min_chr11_1.paired.fastq.gz
└── GATA1_D7_30min_chr11_2.paired.fastq.gz
STEP 2: BAM processing and peak calling.
[user@cn3335 ~]$ cd aligned.aug10
Run either one of the following commands:
[user@cn3335 ~]$ sbatch ./integrated.step2.sh GATA1_D7_30min_chr11_aligned_reads.bam
or
[user@cn3335 ~]$ ./integrated.step2.sh GATA1_D7_30min_chr11_aligned_reads.bam
A number of output files will be produced, including the peak files *broadPeak and *narrowPeak in the folders ../macs2.* and the files *.stringent.sort.bed in the folders ../seacr.*, as you can see by running the command:
[user@cn3335 ~]$ ls ../macs2.*/*Peak ../seacr*/*.stringent.sort.bed
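To get a quick sense of how many regions each peak file contains, the files listed by the command above can be passed to a small line-counting helper. This is a sketch under the assumption that each line of a peak or BED file describes one region and that the files carry no header lines; the function name count_peaks is hypothetical.

```shell
# Hypothetical helper: print "<line count><TAB><file>" for each existing file
# given as an argument. Assumes one region per line and no header lines.
count_peaks() {
    local f n
    for f in "$@"; do
        [ -f "$f" ] || continue              # skip glob patterns that matched nothing
        n=$(wc -l < "$f" | tr -d ' ')        # strip padding that some wc versions add
        printf '%s\t%s\n' "$n" "$f"
    done
}
```

A typical call would be count_peaks ../macs2.*/*Peak ../seacr*/*.stringent.sort.bed, mirroring the ls command above.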
STEP 3: Motif finding.
[user@cn3335 ~]$ cd ..
[user@cn3335 ~]$ ./run_step3.sh
This command will submit 12 jobs to the compute cluster. Upon completion of the jobs, a number of new files and subfolders will be produced inside the folders macs2.* and seacr.*. Alternatively, you can run any of these jobs interactively. For example:
[user@cn3335 ~]$ cd macs2.broad.aug18
[user@cn3335 ~]$ ./integrate.motif.find.sh GATA1_D7_30min_chr11_aligned_reads_peaks.broadPeak
[user@cn3335 ~]$ cd ../macs2.narrow.aug18
[user@cn3335 ~]$ ./integrate.motif.find.sh GATA1_D7_30min_chr11_aligned_reads_peaks.narrowPeak
[user@cn3335 ~]$ cd ../seacr.aug12
[user@cn3335 ~]$ ./integrate.motif.find.sh GATA1_D7_30min_chr11_aligned_reads_treat.stringent.sort.bed
etc.
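The per-folder commands that "etc." stands for can be enumerated rather than typed out. The sketch below only prints the commands and executes nothing. It assumes that each macs2.* and seacr.* folder holds its peak files at the top level with the suffixes seen above; the function name list_peak_commands is hypothetical and not part of CutRunTools.

```shell
# Hypothetical helper: for each macs2.* and seacr.* folder under a work
# directory, print the command that would run the given script on each peak
# file found there. Nothing is executed; paths are printed unquoted, so
# review the list before running anything.
list_peak_commands() {
    local workdir script dir peak
    workdir="$1"
    script="$2"
    for dir in "$workdir"/macs2.* "$workdir"/seacr.*; do
        [ -d "$dir" ] || continue            # skip patterns that matched nothing
        for peak in "$dir"/*.broadPeak "$dir"/*.narrowPeak "$dir"/*.stringent.sort.bed; do
            [ -f "$peak" ] || continue
            printf '(cd %s && %s %s)\n' "$dir" "$script" "$(basename "$peak")"
        done
    done
}
```

For instance, list_peak_commands /data/$USER/CutRunTools/workdir ./integrate.motif.find.sh prints one line per peak file; passing a different script name as the second argument lists the corresponding commands for that script.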
STEP 4: Motif footprinting.
[user@cn3335 ~]$ cd /data/$USER/CutRunTools/workdir
[user@cn3335 ~]$ ./run_step4.sh
This command will submit 12 jobs to the compute cluster. Upon completion of the jobs, a number of new files and subfolders will be produced inside the folders macs2.* and seacr.*. Alternatively, you can run any of these jobs interactively. For example:
[user@cn3335 ~]$ cd macs2.broad.aug18
[user@cn3335 ~]$ ./integrate.footprinting.sh GATA1_D7_30min_chr11_aligned_reads_peaks.broadPeak
[user@cn3335 ~]$ cd ../macs2.narrow.aug18
[user@cn3335 ~]$ ./integrate.footprinting.sh GATA1_D7_30min_chr11_aligned_reads_peaks.narrowPeak
[user@cn3335 ~]$ cd ../seacr.aug12
[user@cn3335 ~]$ ./integrate.footprinting.sh GATA1_D7_30min_chr11_aligned_reads_treat.stringent.sort.bed
etc.
[user@cn3335 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$