PALEOMIX on Biowulf

PALEOMIX is a set of pipelines and tools designed to aid the rapid processing of High-Throughput Sequencing (HTS) data.

Getting Started
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input follows the shell prompt):

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load paleomix
[+] Loading paleomix  1.3.7  on cn3144
[+] Loading singularity  4.0.3  on cn3144
[user@cn3144 ~]$ paleomix
PALEOMIX - pipelines and tools for NGS data analyses
Version: 1.3.7

Pipelines:
    paleomix bam              -- Pipeline for trimming and mapping of NGS reads.
    paleomix trim             -- Equivalent to the 'bam' pipeline, but only runs
                                 the FASTQ trimming steps.
    paleomix phylo            -- Pipeline for genotyping and phylogenetic
                                 inference from BAMs.
    paleomix zonkey           -- Pipeline for detecting F1 (equine) hybrids.

BAM/SAM tools:
    paleomix coverage         -- Calculate coverage across reference sequences
                                 or regions of interest.
    paleomix depths           -- Calculate depth histograms across reference
                                 sequences or regions of interest.
    paleomix rmdup_collapsed  -- Filters PCR duplicates for collapsed paired-
                                 ended reads generated by the AdapterRemoval
                                 tool.

VCF/GTF/BED/Pileup tools:
    paleomix vcf_filter       -- Quality filters for VCF records, similar to
                                 'vcfutils.pl varFilter'.
    paleomix vcf_to_fasta     -- Create most likely FASTA sequence from tabix-
                                 indexed VCF file.

If you make use of PALEOMIX in your work, please cite
  Schubert et al, "Characterization of ancient and modern genomes by SNP
  detection and phylogenomic and metagenomic analysis using PALEOMIX".
  Nature Protocols. 2014 May; 9(5): 1056-82. doi: 10.1038/nprot.2014.063

Generate example files:

[user@cn3144 ~]$ cd /data/$USER
# Create the example files within the current directory
[user@cn3144 ~]$ paleomix bam example .
11:12:02 INFO Copying example project to '.'
11:12:03 INFO Sucessfully saved example in './bam_pipeline'
[user@cn3144 ~]$ cd bam_pipeline/
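
Before moving on to a batch run, it can be worth confirming that the example project was actually created. The helper below is a small sketch and not part of PALEOMIX: the function name check_project is hypothetical, and it assumes the project directory contains a file called makefile.yaml, the name used by the batch script in the Batch job section.

```shell
# check_project DIR: succeed if DIR contains makefile.yaml.
# This is an illustrative check based on the file name used in the batch
# example, not an official PALEOMIX validation.
check_project() {
    dir="${1:-.}"
    if [ -f "$dir/makefile.yaml" ]; then
        echo "OK: found $dir/makefile.yaml"
    else
        echo "Missing: $dir/makefile.yaml" >&2
        return 1
    fi
}

# Example use on the cluster:
#   check_project "/data/$USER/bam_pipeline"
```

The function returns a non-zero status when the makefile is absent, so it can also be used as a guard at the top of a script.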

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. paleomix.sh). For example:

#!/bin/bash
module load paleomix
paleomix bam run --jar-root /usr/local/apps/picard/2.23.7 makefile.yaml
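
The batch script above does not tell PALEOMIX how many threads it may use. One possible variant, sketched here, passes the Slurm CPU allocation through to the pipeline. It assumes that the installed `paleomix bam run` accepts a `--max-threads` option (verify with `paleomix bam run --help`) and that the job is submitted from the directory containing makefile.yaml. The file name paleomix_threads.sh is only illustrative.

```shell
# Write the batch script with a quoted heredoc so that variables are
# expanded when the job runs, not when the file is created.
cat > paleomix_threads.sh <<'EOF'
#!/bin/bash
set -e
module load paleomix

# Slurm sets SLURM_CPUS_PER_TASK only when --cpus-per-task is given;
# fall back to a single thread otherwise.
threads="${SLURM_CPUS_PER_TASK:-1}"

# --max-threads is an assumption about the installed PALEOMIX version.
paleomix bam run --jar-root /usr/local/apps/picard/2.23.7 \
    --max-threads "$threads" makefile.yaml
EOF

# Syntax-check the generated script without executing it.
bash -n paleomix_threads.sh && echo "syntax OK"
```

Keeping the thread count tied to SLURM_CPUS_PER_TASK means the script does not need editing when the sbatch request changes.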

Submit this job using the Slurm sbatch command, replacing each # with the resources the job needs. Options in square brackets are optional.

sbatch [--cpus-per-task=#] [--mem=#] paleomix.sh
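
As a concrete illustration of the template above, the following sketch fills in the placeholders. The values of 4 CPUs and 8 GB of memory are arbitrary assumptions, not recommendations; suitable values depend on the data set. The guard around sbatch is there only so the snippet runs harmlessly on a machine without Slurm.

```shell
# Hypothetical resource request; tune these to your data.
cpus=4
mem=8g

if command -v sbatch >/dev/null 2>&1; then
    # On the cluster: submit the batch script.
    sbatch --cpus-per-task="$cpus" --mem="$mem" paleomix.sh
else
    # Elsewhere: only show the command that would be submitted.
    echo "sbatch --cpus-per-task=$cpus --mem=$mem paleomix.sh"
fi
```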