SEACR is intended to call peaks and enriched regions from sparse Cleavage Under Targets and Release Using Nuclease (CUT&RUN) or chromatin profiling data in which background is dominated by "zeroes" (i.e. regions with no read coverage).
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=16g [user@cn3144 ~]$ module load SEACR [+] Loading GSL 2.6 for GCC 9.2.0 ... [+] Loading gcc 9.2.0 ... [+] Loading openmpi 3.1.4 for GCC 9.2.0 [+] Loading ImageMagick 7.0.8 on cn2357 [+] Loading HDF5 1.10.4 [+] Loading pandoc 2.9.1 on cn2357 [+] Loading R 3.6.1 [+] Loading bedtools 2.29.0 [+] Loading SEACR 1.3Display the usage message for SEACR:
[user@cn3144 ~]$ SEACR.sh SEACR: Sparse Enrichment Analysis for CUT&RUN Usage: bash SEACR.shCopy sample data from an application folder to your current folder:.bg [ .bg | ] [norm | non] [union | AUC] output prefix Description of input fields: Field 1: Target data bedgraph file in UCSC bedgraph format (https://genome.ucsc.edu/goldenpath/help/bedgraph.html) that omits regions containing 0 signal. Field 2: Control (IgG) data bedgraph file to generate an empirical threshold for peak calling. Alternatively, a numeric threshold n between 0 and 1 returns the top n fraction of peaks based on total signal within peaks. Field 3: “norm” denotes normalization of control to target data, “non” skips this behavior. norm is recommended unless experimental and control data are already rigorously normalized to each other (e.g. via spike-in). Field 4: “union” forces implementation of a maximum signal threshold in addition to the total signal threshold, and corresponds to the “union” mode described in the text, whereas “AUC” avoids this behavior, and corresponds to “AUC only” mode. Field 5: Output prefix Output file:
[user@cn3144 ~]$ cp $SEACR_DATA/* .This command will copy the following six files:
[user@cn3144 ~]$ ls DE_FoxA2.hg19.bed DE_IgG.hg19.bed DE_Sox2.hg19.bed H1_FoxA2.hg19.bed H1_IgG.hg19.bed H1_Sox2.hg19.bedThe data have been downloaded from the GEO website. Here,
[user@cn3144 ~]$ ln -s /fdb/igenomes/Homo_sapiens/UCSC/hg19/GenomeStudio/Homo_sapiens/UCSC-hg19/ChromInfo.txt [user@cn3144 ~]$ genomeCoverageBed -i DE_FoxA2.hg19.bed -g ChromInfo.txt -bg > DE_FoxA2.hg19.bedGraph [user@cn3144 ~]$ genomeCoverageBed -i H1_FoxA2.hg19.bed -g ChromInfo.txt -bg > H1_FoxA2.hg19.bedGraph [user@cn3144 ~]$ genomeCoverageBed -i DE_IgG.hg19.bed -g ChromInfo.txt -bg > DE_IgG.hg19.bedGraph [user@cn3144 ~]$ genomeCoverageBed -i H1_IgG.hg19.bed -g ChromInfo.txt -bg > H1_IgG.hg19.bedGraph [user@cn3144 ~]$ genomeCoverageBed -i DE_Sox2.hg19.bed -g ChromInfo.txt -bg > DE_SoxA2.hg19.bedGraph [user@cn3144 ~]$ genomeCoverageBed -i H1_Sox2.hg19.bed -g ChromInfo.txt -bg > H1_SoxA2.hg19.bedGraphCall enriched regions in target data using normalized IgG control track with AUC threshold:
[user@cn3144 ~]$ SEACR.sh DE_FoxA2.hg19.bedGraph DE_IgG.hg19.bedGraph norm stringent AUC output Calling enriched regions with control file Must specify "norm" for normalized or "non" for non-normalized data processing in third input [helixapp@cn4470 SEACR]$ SEACR.sh DE_FoxA2.hg19.bedGraph DE_IgG.hg19.bedGraph norm stringent AUC output Calling enriched regions with control file Normalizing control to experimental bedgraph Using stringent threshold Creating experimental AUC file: Wed Jan 29 13:55:19 EST 2020 Creating control AUC file: Wed Jan 29 13:55:49 EST 2020 Calculating optimal AUC threshold: Wed Jan 29 13:56:17 EST 2020 Calculating threshold using normalized control: Wed Jan 29 13:56:17 EST 2020 Warning messages: 1: In ((pctremain(x0) + min(pctremain(z)))/2) - pctremain(z) : longer object length is not a multiple of shorter object length 2: In ((pctremain(x0) + min(pctremain(z)))/2) - pctremain(z) : longer object length is not a multiple of shorter object length Creating thresholded feature file: Wed Jan 29 13:58:05 EST 2020 Empirical false discovery rate = 0.356839643073724 Merging nearby features and eliminating control-enriched features: Wed Jan 29 13:58:20 EST 2020 Removing temporary files: Wed Jan 29 13:58:20 EST 2020 Done: Wed Jan 29 13:58:20 EST 2020Likewise,
[user@cn3144 ~]$ SEACR.sh DE_Sox2.hg19.bedGraph DE_IgG.hg19.bedGraph norm stringent AUC output_DE_Sox2_norm_IgG ... [user@cn3144 ~]$ SEACR.sh H1_FoxA2.hg19.bedGraph H1_IgG.hg19.bedGraph norm stringent AUC output_H1_FoxA2_norm_IgG ... [user@cn3144 ~]$ SEACR.sh H1_Sox2.hg19.bedGraph H1_IgG.hg19.bedGraph norm stringent AUC output_H1_Sox2_norm_IgG ...Now, call enriched regions in target data in a different mode, using non-normalized IgG control track with AUC and max signal thresholds:
[user@cn3144 ~]$ SEACR.sh DE_FoxA2.hg19.bedGraph DE_IgG.hg19.bedGraph non stringent union output_DE_FoxA2_non-norm_IgG ... [user@cn3144 ~]$ SEACR.sh DE_Sox2.hg19.bedGraph DE_IgG.hg19.bedGraph non stringent union output_DE_Sox2_non-norm_IgG ... [user@cn3144 ~]$ SEACR.sh H1_FoxA2.hg19.bedGraph H1_IgG.hg19.bedGraph non stringent union output_H1_FoxA2_non-norm_IgG ... [user@cn3144 ~]$ SEACR.sh H1_Sox2.hg19.bedGraph H1_IgG.hg19.bedGraph non stringent union output_H1_Sox2_non-norm_IgG ...As expected, the output indicates that in both the modes of runn ing the software, SEACR detected a large number of called peaks in the cell types where the Sox2 factor or FoxA2 factor is expressed:
H1_Sox2_norm_IgG - 17047 peaks H1_Sox2_non-norm_IgG - 20267 peaks DE_FoxA2_norm_IgG - 6227 peaks DE_FoxA2_non-norm_IgG - 7436 peaksand a small number of (spurious) peaks in the cells where the corresponding factor is not expressed:
H1_FoxA2_norm_IgG - 1 peak H1_FoxA2_non-norm_IgG - 4 peaks DE_Sox2_norm_IgG - 2 peaks DE_Sox2_non-norm_IgG - 4 peaks [user@cn3144 ~]$ exit salloc.exe: Relinquishing job allocation 46116226 [user@biowulf ~]$
Create a batch input file (e.g. SEACR.sh). For example:
#!/bin/bash module load SEACR genomeCoverageBed -i DE_IgG.hg19.bed -g ChromInfo.txt -bg > DE_IgG.hg19.bedGraph genomeCoverageBed -i H1_IgG.hg19.bed -g ChromInfo.txt -bg > H1_IgG.hg19.bedGraph genomeCoverageBed -i DE_FoxA2.hg19.bed -g ChromInfo.txt -bg > DE_FoxA2.hg19.bedGraph genomeCoverageBed -i H1_FoxA2.hg19.bed -g ChromInfo.txt -bg > H1_FoxA2.hg19.bedGraph genomeCoverageBed -i DE_Sox2.hg19.bed -g ChromInfo.txt -bg > DE_Sox2.hg19.bedGraph genomeCoverageBed -i H1_Sox2.hg19.bed -g ChromInfo.txt -bg > H1_Sox2.hg19.bedGraph SEACR.sh DE_FoxA2.hg19.bedGraph DE_IgG.hg19.bedGraph norm AUC output_DE_FoxA2_norm_IgG SEACR.sh DE_Sox2.hg19.bedGraph DE_IgG.hg19.bedGraph norm AUC output_DE_Sox2_norm_IgG SEACR.sh H1_FoxA2.hg19.bedGraph H1_IgG.hg19.bedGraph norm AUC output_H1_FoxA2_norm_IgG SEACR.sh H1_Sox2.hg19.bedGraph H1_IgG.hg19.bedGraph norm AUC output_H1_Sox2_norm_IgG ... etc.
Submit this job using the Slurm sbatch command.
sbatch [--cpus-per-task=#] [--mem=#] SEACR.sh