PEPATAC: a modular pipeline for ATAC-seq data processing
PEPATAC is a robust pipeline for Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) built on a loosely coupled modular framework. It can be applied to ATAC-seq projects of any size, from one-off experiments to large-scale sequencing projects. It is optimized for the unique features of ATAC-seq data to be fast and accurate, and it provides several unique analytical approaches.
Documentation
Important Notes
- Module Name: PEPATAC (see the modules page for more information)
- Unusual environment variables set
  - PEPATAC_HOME: installation directory
  - PEPATAC_BIN: executable directory
  - PEPATAC_SRC: source code directory
  - PEPATAC_DATA: sample data directory
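After loading the module, it can be useful to confirm that these variables are actually set in the current shell. The following is a minimal sketch that assumes only the four variable names listed above; the helper name `check_pepatac_env` is hypothetical and not part of PEPATAC.

```shell
# Hypothetical helper (not part of PEPATAC): report which of the
# PEPATAC_* variables listed above are set in the current shell.
check_pepatac_env() {
    missing=0
    for name in PEPATAC_HOME PEPATAC_BIN PEPATAC_SRC PEPATAC_DATA; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$name" "$value"
        else
            printf '%s is not set\n' "$name" >&2
            missing=1
        fi
    done
    # Non-zero exit status if at least one variable is missing.
    return "$missing"
}
```

Calling `check_pepatac_env` after `module load PEPATAC/0.9.16` prints each variable that is set and returns a non-zero status if any of them is missing.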
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --cpus-per-task=16 --mem=32g --gres=lscratch:10
[user@cn3200 ~]$ module load PEPATAC/0.9.16
[+] Loading bedtools 2.27.1
[+] Loading bowtie 2-2.4.4
[+] Loading preseq 3.1.2
[+] Loading fastqc 0.11.9
[+] Loading samtools 1.12 ...
[+] Loading samblaster 0.1.25
[+] Loading GSL 2.4 for GCC 7.2.0 ...
[+] Loading gcc 7.3.0 ...
[+] Loading openmpi 3.0.2 for GCC 7.3.0
[+] Loading ImageMagick 7.0.8 on cn0852
[+] Loading HDF5 1.10.4
[+] Loading pandoc 2.14.0.2 on cn0852
[+] Loading R 3.5.2
[+] Loading pigz 2.4 on cn0852
[+] Loading pepatac 0.9.16

Described below are the steps to run PEPATAC on the tutorial example:
[user@cn3200 ~]$ mkdir pepatac_tutorial
[user@cn3200 ~]$ export TUTORIAL=$PWD/pepatac_tutorial
[user@cn3200 ~]$ cd pepatac_tutorial
[user@cn3200 ~]$ mkdir data genomes processed templates tools
[user@cn3200 ~]$ cd tools
[user@cn3200 ~]$ git clone https://github.com/databio/pepatac.git

Download tutorial files:
[user@cn3200 ~]$ wget http://big.databio.org/pepatac/tutorial1_r1.fastq.gz
[user@cn3200 ~]$ wget http://big.databio.org/pepatac/tutorial1_r2.fastq.gz
[user@cn3200 ~]$ wget http://big.databio.org/pepatac/tutorial2_r1.fastq.gz
[user@cn3200 ~]$ wget http://big.databio.org/pepatac/tutorial2_r2.fastq.gz
[user@cn3200 ~]$ mv *.fastq.gz pepatac/examples/data/

Configure project files:
[user@cn3200 ~]$ cd ../
[user@cn3200 ~]$ cat >compute_config.yaml<<'EOF'
adapters:
  CODE: looper.command
  JOBNAME: looper.job_name
  CORES: compute.cores
  LOGFILE: looper.log_file
  TIME: compute.time
  MEM: compute.mem
compute_packages:
  default:
    submission_template: templates/localhost_template.sub
    submission_command: sh
EOF
[user@cn3200 ~]$ export DIVCFG=$TUTORIAL/compute_config.yaml
[user@cn3200 ~]$ cd templates
[user@cn3200 ~]$ cat >localhost_template.sub<<'EOF'
#!/bin/bash

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

{
{CODE}
} | tee {LOGFILE} --ignore-interrupts
EOF

Initialize refgenie:
[user@cn3200 ~]$ cd ../tools/pepatac
[user@cn3200 ~]$ refgenie init -c genome_folder/genome_config.yaml
[user@cn3200 ~]$ export REFGENIE=./genome_folder/genome_config.yaml

Run the PEPATAC pipeline using looper:
[user@cn3200 ~]$ looper run examples/tutorial/tutorial.yaml
...
Looper version: 1.3.0
Command: run
/usr/local/apps/PEPATAC/0.9.16/lib/python3.8/site-packages/divvy/compute.py:150: UserWarning: The '_file_path' property is deprecated and will be removed in a future release. Use ComputingConfiguration["__internal"][_file_path] instead.
  os.path.dirname(self._file_path),
/usr/local/apps/PEPATAC/0.9.16/lib/python3.8/site-packages/divvy/compute.py:58: UserWarning: The '_file_path' property is deprecated and will be removed in a future release. Use ComputingConfiguration["__internal"][_file_path] instead.
  self.config_file = self._file_path
## [1 of 2] sample: tutorial1; pipeline: PEPATAC
## [2 of 2] sample: tutorial2; pipeline: PEPATAC
Writing 2 submission scripts for skipped samples
Writing script to $TUTORIAL/processed/submission/PEPATAC_tutorial1.sub
Writing script to $TUTORIAL/processed/submission/PEPATAC_tutorial2.sub

Looper finished
Samples valid for job generation: 2 of 2
Commands submitted: 0 of 2
Jobs submitted: 0

The previous command produces two sbatch scripts.
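Before submitting, it may be worth confirming that the scripts were written where the looper output says they were. The sketch below lists every `.sub` file in a directory and reports how many it found; the helper name `list_submission_scripts` is hypothetical, and the directory to pass is the `$TUTORIAL/processed/submission` path shown in the output above.

```shell
# Hypothetical helper (not part of looper): list the submission scripts
# in a directory and report how many were found.
list_submission_scripts() {
    dir=$1
    count=0
    for script in "$dir"/*.sub; do
        # If the glob matched nothing, the pattern stays literal; skip it.
        [ -f "$script" ] || continue
        printf '%s\n' "$script"
        count=$((count + 1))
    done
    printf '%d script(s) found\n' "$count" >&2
    # Succeed only if at least one script exists.
    [ "$count" -gt 0 ]
}
```

For the tutorial, `list_submission_scripts "$TUTORIAL/processed/submission"` should print the two paths reported by looper.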
Finally, submit these scripts to the cluster:
[user@cn3200 ~]$ sbatch $TUTORIAL/processed/submission/PEPATAC_tutorial1.sub
[user@cn3200 ~]$ sbatch $TUTORIAL/processed/submission/PEPATAC_tutorial2.sub

End the interactive session:
[user@cn3200 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
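For projects with more than two samples, typing one sbatch command per script becomes tedious. As a rough sketch, the per-sample submissions shown earlier could be replaced by a loop over the submission directory. The helper name `submit_all` and the `DRY_RUN` variable are hypothetical conveniences introduced here, not part of PEPATAC or looper; by default the function only prints the commands it would run.

```shell
# Hypothetical helper: submit every .sub script found in a directory.
# With DRY_RUN=1 (the default here) it only prints the sbatch commands;
# set DRY_RUN=0 to actually call sbatch.
submit_all() {
    dir=$1
    for script in "$dir"/*.sub; do
        # If the glob matched nothing, the pattern stays literal; skip it.
        [ -f "$script" ] || continue
        if [ "${DRY_RUN:-1}" = "1" ]; then
            printf 'sbatch %s\n' "$script"
        else
            sbatch "$script"
        fi
    done
}
```

Running `submit_all "$TUTORIAL/processed/submission"` first shows what would be submitted; after checking the list, run it again with `DRY_RUN=0` set in the shell to submit the jobs, and do so before ending the interactive session.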