From the ENCODE documentation:
This pipeline is designed for automated end-to-end quality control and processing of ATAC-seq or DNase-seq data. The pipeline can be run on compute clusters with job submission engines or on standalone machines. It inherently makes use of parallelized/distributed computing.
The module sets the following environment variables:

$EASP_BACKEND_CONF: configuration for the local backend
$EASP_WFOPTS: singularity backend options (versions > 1.0 only)
$EASP_WDL: WDL file defining the workflow
$EASP_TEST_DATA: input data for the example below
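These can be checked after loading the module. A quick sanity check (the exact values are install-specific, so no output is shown):

[user@biowulf]$ module load encode-atac-seq-pipeline/2.2.0
[user@biowulf]$ echo $EASP_WDL          # path to the workflow WDL
[user@biowulf]$ ls $EASP_TEST_DATA      # example input data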
A note about resource allocation: set "atac.bowtie2_cpu" in the input json to the number of CPUs you want bowtie2 to use, usually 8 or so. The pipeline will then need NUM_CONCURRENT_TASKS * atac.bowtie2_cpu CPUs, plus 20GB * NUM_CONCURRENT_TASKS of memory for big samples or 10GB * NUM_CONCURRENT_TASKS for small samples.
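For example, with an assumed concurrency of 2 tasks (NUM_CONCURRENT_TASKS=2 is an illustrative value, not a verified pipeline default) and the usual 8 bowtie2 CPUs, the sizing works out as follows:

# sizing sketch only -- plug in your own concurrency and CPU settings
NUM_CONCURRENT_TASKS=2   # assumed value for illustration
ATAC_BOWTIE2_CPU=8       # matches "atac.bowtie2_cpu" in the input json
echo "CPUs:         $(( NUM_CONCURRENT_TASKS * ATAC_BOWTIE2_CPU ))"   # -> 16
echo "Memory big:   $(( 20 * NUM_CONCURRENT_TASKS ))GB"               # -> 40GB
echo "Memory small: $(( 10 * NUM_CONCURRENT_TASKS ))GB"               # -> 20GB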
WDL-based workflows need a json file to define the input and settings for a workflow run. In this example, we will use the 76nt data from ENCODE sample ENCSR356KRQ (keratinocyte), which includes 2 and 6 pairs of fastq files for replicates 1 and 2, respectively.
This version continues to use the v3 annotation. However, caper changed significantly, so back up your old caper configuration and create fresh config files for this version.

Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --cpus-per-task=8 --mem=20g --gres=lscratch:30
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ wd=$PWD  # so we can copy results back later
[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load encode-atac-seq-pipeline/2.2.0
[user@cn3144]$ cp -Lr ${EASP_TEST_DATA:-none}/* .
[user@cn3144]$ tree
.
├── ENCSR356KRQ_subsampled.json.2.2.0
└── input
    └── ENCSR356KRQ
        ├── ENCFF007USV.subsampled.400.fastq.gz
        ├── ENCFF031ARQ.subsampled.400.fastq.gz
        ├── ENCFF106QGY.subsampled.400.fastq.gz
        ├── ENCFF193RRC.subsampled.400.fastq.gz
        ├── ENCFF248EJF.subsampled.400.fastq.gz
        ├── ENCFF341MYG.subsampled.400.fastq.gz
        ├── ENCFF366DFI.subsampled.400.fastq.gz
        ├── ENCFF368TYI.subsampled.400.fastq.gz
        ├── ENCFF573UXK.subsampled.400.fastq.gz
        ├── ENCFF590SYZ.subsampled.400.fastq.gz
        ├── ENCFF641SFZ.subsampled.400.fastq.gz
        ├── ENCFF734PEQ.subsampled.400.fastq.gz
        ├── ENCFF751XTV.subsampled.400.fastq.gz
        ├── ENCFF859BDM.subsampled.400.fastq.gz
        ├── ENCFF886FSC.subsampled.400.fastq.gz
        ├── ENCFF927LSG.subsampled.400.fastq.gz
        └── hg38.tsv

[user@cn3144]$ cat ENCSR356KRQ_subsampled.json.2.2.0
{
    "atac.pipeline_type" : "atac",
    "atac.genome_tsv" : "/fdb/encode-atac-seq-pipeline/v3/hg38/hg38.tsv",
    "atac.fastqs_rep1_R1" : [
        "input/ENCSR356KRQ/ENCFF341MYG.subsampled.400.fastq.gz",
        "input/ENCSR356KRQ/ENCFF106QGY.subsampled.400.fastq.gz"
    ],
    "atac.fastqs_rep1_R2" : [
        "input/ENCSR356KRQ/ENCFF248EJF.subsampled.400.fastq.gz",
        "input/ENCSR356KRQ/ENCFF368TYI.subsampled.400.fastq.gz"
    ],
    "atac.fastqs_rep2_R1" : [
        "input/ENCSR356KRQ/ENCFF641SFZ.subsampled.400.fastq.gz",
        "input/ENCSR356KRQ/ENCFF751XTV.subsampled.400.fastq.gz",
        "input/ENCSR356KRQ/ENCFF927LSG.subsampled.400.fastq.gz",
        "input/ENCSR356KRQ/ENCFF859BDM.subsampled.400.fastq.gz",
        "input/ENCSR356KRQ/ENCFF193RRC.subsampled.400.fastq.gz",
        "input/ENCSR356KRQ/ENCFF366DFI.subsampled.400.fastq.gz"
    ],
    "atac.fastqs_rep2_R2" : [
        "input/ENCSR356KRQ/ENCFF031ARQ.subsampled.400.fastq.gz",
        "input/ENCSR356KRQ/ENCFF590SYZ.subsampled.400.fastq.gz",
        "input/ENCSR356KRQ/ENCFF734PEQ.subsampled.400.fastq.gz",
        "input/ENCSR356KRQ/ENCFF007USV.subsampled.400.fastq.gz",
        "input/ENCSR356KRQ/ENCFF886FSC.subsampled.400.fastq.gz",
        "input/ENCSR356KRQ/ENCFF573UXK.subsampled.400.fastq.gz"
    ],
    "atac.paired_end" : true,
    "atac.auto_detect_adapter" : true,
    "atac.enable_xcor" : true,
    "atac.title" : "ENCSR356KRQ (subsampled 1/400)",
    "atac.description" : "ATAC-seq on primary keratinocytes in day 0.0 of differentiation"
}
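Since a malformed json is a common cause of failed runs, it can be worth validating the file before launching caper. This check is a suggestion, not part of the pipeline itself; it only assumes python3 is on the path:

[user@cn3144]$ python3 -m json.tool ENCSR356KRQ_subsampled.json.2.2.0 > /dev/null && echo "json OK"
json OK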
In this example the pipeline will only be run locally, i.e. it will not submit tasks as Slurm jobs. Follow the caper docs to set up a config file for Slurm submission; this has to be done only once.
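For cluster submission, the one-time setup would look roughly like the following. caper init slurm comes from the caper documentation, but the generated settings should be reviewed against the caper docs and this cluster's guidance before first use:

[user@cn3144]$ caper init slurm             # writes a slurm backend template to ~/.caper/default.conf
[user@cn3144]$ nano ~/.caper/default.conf   # review partition/account settings before first use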
[user@cn3144]$ [[ -d ~/.caper ]] && mv ~/.caper ~/caper.$(date +%F).bak  # back up old caper config
[user@cn3144]$ mkdir -p ~/.caper && caper init local
[user@cn3144]$ # note the need for --singularity in this version
[user@cn3144]$ caper run $EASP_WDL -i ENCSR356KRQ_subsampled.json.2.2.0 --singularity
[...much output...]

This workflow ran successfully; there is nothing to troubleshoot.
This version of the pipeline comes with a tool, croo, to copy and organize pipeline output.
[user@cn3144]$ ls atac
a0fb9f58-ede3-4c02-9bcc-26d21ab5ccbb
[user@cn3144]$ croo --method copy --out-dir=${wd}/ENCSR356KRQ \
                   atac/a0fb9f58-ede3-4c02-9bcc-26d21ab5ccbb/metadata.json
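Before the interactive session (and with it /lscratch) goes away, it is worth confirming that the organized output actually landed under $wd. The exact directory layout depends on the pipeline version, so no listing is shown:

[user@cn3144]$ ls ${wd}/ENCSR356KRQ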
Create a batch input file (e.g. encode-atac-seq-pipeline.sh). For example, the following batch job will run the pipeline locally on the allocated node (assuming the caper config file is set up correctly):
#!/bin/bash
wd=$PWD    # submit directory; results are copied back here
module load encode-atac-seq-pipeline/2.2.0 || exit 1
cd /lscratch/$SLURM_JOB_ID
mkdir input
cp -rL $EASP_TEST_DATA/* .
# --singularity is required in this version (see the note above)
caper run $EASP_WDL -i ENCSR356KRQ_subsampled.json.2.2.0 --singularity
rc=$?
croo --method copy --out-dir=${wd}/ENCSR356KRQ \
    atac/*/metadata.json
exit $rc
Submit this job using the Slurm sbatch command.
sbatch --time=4:00:00 --cpus-per-task=8 --mem=20g --gres=lscratch:50 encode-atac-seq-pipeline.sh
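Once submitted, progress can be followed with standard Slurm tools; slurm-<jobid>.out is the default sbatch output file name, where <jobid> is the number printed by sbatch:

[user@biowulf]$ squeue -u $USER            # check the job state
[user@biowulf]$ tail -f slurm-<jobid>.out  # follow the pipeline log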