Biowulf High Performance Computing at the NIH
PEPATAC: a modular pipeline for ATAC-seq data processing

PEPATAC is a robust pipeline for processing data from the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq), built on a loosely coupled modular framework. It can be applied to ATAC-seq projects of any size, from one-off experiments to large-scale sequencing projects. It is optimized for the unique features of ATAC-seq data to be fast and accurate, and it provides several unique analytical approaches.


Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf ~]$ sinteractive --cpus-per-task=16 --mem=32g --gres=lscratch:10
[user@cn3200 ~]$ module load PEPATAC/0.9.16
[+] Loading bedtools  2.27.1
[+] Loading bowtie  2-2.4.4
[+] Loading preseq  3.1.2
[+] Loading fastqc  0.11.9
[+] Loading samtools 1.12  ...
[+] Loading samblaster  0.1.25
[+] Loading GSL 2.4 for GCC 7.2.0 ...
[+] Loading gcc  7.3.0  ...
[+] Loading openmpi 3.0.2  for GCC 7.3.0
[+] Loading ImageMagick  7.0.8  on cn0852
[+] Loading HDF5  1.10.4
[+] Loading pandoc  2.14.0.2  on cn0852
[+] Loading R 3.5.2
[+] Loading pigz  2.4  on cn0852
[+] Loading pepatac  0.9.16
The following steps run PEPATAC on the tutorial example:
[user@cn3200 ~]$ mkdir pepatac_tutorial
[user@cn3200 ~]$ export TUTORIAL=$PWD/pepatac_tutorial
[user@cn3200 ~]$ cd pepatac_tutorial
[user@cn3200 ~]$ mkdir data genomes processed templates tools
[user@cn3200 ~]$ cd tools
[user@cn3200 ~]$ git clone https://github.com/databio/pepatac.git
Download tutorial files:
[user@cn3200 ~]$ wget http://big.databio.org/pepatac/tutorial1_r1.fastq.gz
[user@cn3200 ~]$ wget http://big.databio.org/pepatac/tutorial1_r2.fastq.gz
[user@cn3200 ~]$ wget http://big.databio.org/pepatac/tutorial2_r1.fastq.gz
[user@cn3200 ~]$ wget http://big.databio.org/pepatac/tutorial2_r2.fastq.gz

[user@cn3200 ~]$ mv *.fastq.gz pepatac/examples/data/
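Before configuring the project, it can help to confirm that all four tutorial files arrived intact. The function below is a minimal sketch: `check_fastq` is a hypothetical helper, not part of PEPATAC, and it assumes the file names used in the download step above. It prints the number of files that are missing, empty, or fail `gzip -t`.

```shell
# Hypothetical helper (not part of PEPATAC): count tutorial FASTQ files
# that are missing, empty, or fail the gzip integrity test.
# The body runs in a subshell, so its variables stay local.
check_fastq() (
    dir=$1
    bad=0
    for sample in tutorial1 tutorial2; do
        for read in r1 r2; do
            f="$dir/${sample}_${read}.fastq.gz"
            if [ ! -s "$f" ]; then
                echo "missing or empty: $f" >&2
                bad=$((bad + 1))
            elif ! gzip -t "$f" 2>/dev/null; then
                echo "corrupt gzip: $f" >&2
                bad=$((bad + 1))
            fi
        done
    done
    # Print the number of problem files on stdout.
    echo "$bad"
)

# Usage from the tools directory:
#   check_fastq pepatac/examples/data
```

A result of 0 means every expected file is present and decompresses cleanly; details about any problem file go to standard error.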
Configure project files:
[user@cn3200 ~]$ cd ../
[user@cn3200 ~]$ cat > compute_config.yaml <<'EOF'
adapters:
  CODE: looper.command
  JOBNAME: looper.job_name
  CORES: compute.cores
  LOGFILE: looper.log_file
  TIME: compute.time
  MEM: compute.mem
compute_packages:
  default:
    submission_template: templates/localhost_template.sub
    submission_command: sh
EOF

[user@cn3200 ~]$ export DIVCFG=$TUTORIAL/compute_config.yaml

[user@cn3200 ~]$ cd templates
[user@cn3200 ~]$ cat >localhost_template.sub<<'EOF'
#!/bin/bash
echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`
{
{CODE}
} | tee {LOGFILE} --ignore-interrupts
EOF
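The `{CODE}` and `{LOGFILE}` placeholders in this template are filled in by looper when it writes each submission script. To preview what a rendered script might look like, the substitution can be imitated with `sed`. This is an illustration only: `render_template` is a hypothetical helper, the command and log path passed to it are made up, and it is not how looper itself performs the substitution.

```shell
# Illustration only: imitate the placeholder substitution for a simple,
# single-line command so the rendered template can be inspected.
render_template() (
    template=$1
    code=$2
    logfile=$3
    # '|' is the sed delimiter here, so the values must not contain
    # '|', '&' or backslashes.
    sed -e "s|{CODE}|$code|" -e "s|{LOGFILE}|$logfile|" "$template"
)

# Usage from the templates directory (hypothetical command and log file):
#   render_template localhost_template.sub "echo hello" /tmp/example.log
```

The rendered text is printed to standard output, so nothing in the template file itself is modified.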
Initialize refgenie:
[user@cn3200 ~]$ cd ../tools/pepatac
[user@cn3200 ~]$ refgenie init -c genome_folder/genome_config.yaml
[user@cn3200 ~]$ export REFGENIE=./genome_folder/genome_config.yaml
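Both `DIVCFG` and `REFGENIE` must point at existing files before looper is run. The sketch below is a hypothetical pre-flight check, not part of looper or refgenie: `check_config_var` takes the name of an environment variable and reports whether it is set and names a readable file.

```shell
# Hypothetical pre-flight check: succeed only if the named environment
# variable is set and points to a readable file.
check_config_var() (
    name=$1
    # Indirect lookup: copy the value of the variable called $name into $path.
    eval "path=\${$name:-}"
    if [ -z "$path" ]; then
        echo "$name is not set" >&2
        exit 1
    elif [ ! -r "$path" ]; then
        echo "$name points to a missing or unreadable file: $path" >&2
        exit 1
    fi
    echo "$name OK: $path"
)

# Usage:
#   check_config_var DIVCFG && check_config_var REFGENIE
```

Because `REFGENIE` is exported above as a relative path, this check only passes when run from the same directory in which the variable was set.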
Run the PEPATAC pipeline using looper:
[user@cn3200 ~]$ looper run examples/tutorial/tutorial.yaml
...
Looper version: 1.3.0
Command: run
/usr/local/apps/PEPATAC/0.9.16/lib/python3.8/site-packages/divvy/compute.py:150: UserWarning: The '_file_path' property is deprecated and will be removed in a future release. Use ComputingConfiguration["__internal"][_file_path] instead.
  os.path.dirname(self._file_path),
/usr/local/apps/PEPATAC/0.9.16/lib/python3.8/site-packages/divvy/compute.py:58: UserWarning: The '_file_path' property is deprecated and will be removed in a future release. Use ComputingConfiguration["__internal"][_file_path] instead.
  self.config_file = self._file_path
## [1 of 2] sample: tutorial1; pipeline: PEPATAC
## [2 of 2] sample: tutorial2; pipeline: PEPATAC
Writing 2 submission scripts for skipped samples
Writing script to $TUTORIAL/processed/submission/PEPATAC_tutorial1.sub
Writing script to $TUTORIAL/processed/submission/PEPATAC_tutorial2.sub

Looper finished
Samples valid for job generation: 2 of 2
Commands submitted: 0 of 2
Jobs submitted: 0
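The previous command writes two sbatch submission scripts to $TUTORIAL/processed/submission but, as the summary shows ("Jobs submitted: 0"), does not submit them.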

Finally, submit these scripts to the cluster:
[user@cn3200 ~]$ sbatch $TUTORIAL/processed/submission/PEPATAC_tutorial1.sub
[user@cn3200 ~]$ sbatch $TUTORIAL/processed/submission/PEPATAC_tutorial2.sub
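For projects with more than a couple of samples, the submission step can be written as a loop over the generated scripts. The following is a sketch under stated assumptions: `submit_all` and the `SUBMIT_CMD` override are hypothetical names introduced here, and the `PEPATAC_*.sub` pattern is inferred from the script names in the looper output above.

```shell
# Sketch: submit every looper-generated script found in a directory.
# SUBMIT_CMD is a hypothetical override (default: sbatch) that allows a
# dry run with SUBMIT_CMD=echo.
submit_all() (
    dir=$1
    submit=${SUBMIT_CMD:-sbatch}
    for script in "$dir"/PEPATAC_*.sub; do
        # If the glob matched nothing, $script is the literal pattern.
        if [ ! -e "$script" ]; then
            echo "no submission scripts in $dir" >&2
            exit 1
        fi
        "$submit" "$script" || exit 1
    done
)

# Usage:
#   submit_all "$TUTORIAL/processed/submission"
```

Setting `SUBMIT_CMD=echo` prints the script paths instead of submitting them, which is a convenient way to check the list first.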

End the interactive session:
[user@cn3200 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$