PALEOMIX on Biowulf

The PALEOMIX pipelines are a set of pipelines and tools designed to aid the rapid processing of High-Throughput Sequencing (HTS) data.

References:

Schubert M, Ermini L, Sarkissian CD, Jónsson H, Ginolhac A, Schaefer R, Martin MD, Fernández R, Kircher M, McCue M, Willerslev E, and Orlando L. Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. Nat Protoc. 2014 May;9(5):1056-82. doi: 10.1038/nprot.2014.063. Epub 2014 Apr 10.

Schubert M, Mashkour M, Gaunitz C, Fages A, Seguin-Orlando A, Sheikhi S, Alfarhan AH, Alquraishi SA, Al-Rasheid KAS, Chuang R, Ermini L, Gamba C, Weinstock J, Vedat O, and Orlando L. Zonkey: A simple, accurate and sensitive pipeline to genetically identify equine F1-hybrids in archaeological assemblages. Journal of Archaeological Science. 2017 Feb;78:147-157.

Documentation

PALEOMIX Main Site

Important Notes

Module Name: paleomix (see the modules page for more information)
Pass --jar-root /usr/local/apps/picard/2.23.7 so that PALEOMIX uses the correct path to picard.jar.
Currently only the Zonkey pipeline is installed on Biowulf. If you wish to use the other pipelines, please email us at staff@hpc.nih.gov.
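
Before starting a long run, it can be worth confirming that the directory given to --jar-root actually contains picard.jar. The following is a minimal sketch, not part of PALEOMIX: check_jar_root is a hypothetical helper name, and the check assumes the jar file is named picard.jar, as the note above implies.

```shell
#!/bin/bash
# Hypothetical helper (not provided by PALEOMIX or Biowulf):
# report whether a directory contains a file named picard.jar.
check_jar_root() {
    local jar_root="$1"
    if [ -f "${jar_root}/picard.jar" ]; then
        echo "found ${jar_root}/picard.jar"
    else
        echo "picard.jar not found in ${jar_root}" >&2
        return 1
    fi
}

# Path taken from the note above; only meaningful on a Biowulf node.
# '|| true' keeps the script going if the path does not exist here.
check_jar_root /usr/local/apps/picard/2.23.7 || true
```

If the helper reports that the file is missing, check that you are on a compute node before retrying.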

Getting Started

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program.
Sample session (user input follows the shell prompt):

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load paleomix
[+] Loading paleomix  1.3.7  on cn3144
[+] Loading singularity  4.0.3  on cn3144
[user@cn3144 ~]$ paleomix
PALEOMIX - pipelines and tools for NGS data analyses
Version: 1.3.7

Pipelines:
    paleomix bam              -- Pipeline for trimming and mapping of NGS reads.
    paleomix trim             -- Equivalent to the 'bam' pipeline, but only runs
                                 the FASTQ trimming steps.
    paleomix phylo            -- Pipeline for genotyping and phylogenetic
                                 inference from BAMs.
    paleomix zonkey           -- Pipeline for detecting F1 (equine) hybrids.

BAM/SAM tools:
    paleomix coverage         -- Calculate coverage across reference sequences
                                 or regions of interest.
    paleomix depths           -- Calculate depth histograms across reference
                                 sequences or regions of interest.
    paleomix rmdup_collapsed  -- Filters PCR duplicates for collapsed paired-
                                 ended reads generated by the AdapterRemoval
                                 tool.

VCF/GTF/BED/Pileup tools:
    paleomix vcf_filter       -- Quality filters for VCF records, similar to
                                 'vcfutils.pl varFilter'.
    paleomix vcf_to_fasta     -- Create most likely FASTA sequence from tabix-
                                 indexed VCF file.

If you make use of PALEOMIX in your work, please cite
  Schubert et al, "Characterization of ancient and modern genomes by SNP
  detection and phylogenomic and metagenomic analysis using PALEOMIX".
  Nature Protocols. 2014 May; 9(5): 1056-82. doi: 10.1038/nprot.2014.063

Generate example files

[user@cn3144 ~]$ cd /data/$USER
# Create example files in the current directory
[user@cn3144 ~]$ paleomix bam example .
11:12:02 INFO Copying example project to '.'
11:12:03 INFO Sucessfully saved example in './bam_pipeline'
[user@cn3144 ~]$ cd bam_pipeline/

Batch job

Most jobs should be run as batch jobs.

Create a batch input file (e.g. paleomix.sh). For example:

#!/bin/bash
module load paleomix
paleomix bam run --jar-root /usr/local/apps/picard/2.23.7 makefile.yaml

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] paleomix.sh
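
The resource requests can also be written into the batch script as #SBATCH directives instead of being passed on the sbatch command line. The sketch below writes such a script to paleomix.sh and checks its syntax without submitting anything. The CPU and memory values are placeholder assumptions, not recommendations, and the working directory assumes the example project generated earlier under /data/$USER.

```shell
#!/bin/bash
# Write the batch script; the quoted 'EOF' keeps $USER unexpanded
# until the job itself runs.
cat > paleomix.sh <<'EOF'
#!/bin/bash
#SBATCH --cpus-per-task=4
#SBATCH --mem=8g
# The values above are illustrative assumptions; adjust to your data.

module load paleomix

# Assumes the example project created with 'paleomix bam example .'
cd /data/$USER/bam_pipeline || exit 1

paleomix bam run --jar-root /usr/local/apps/picard/2.23.7 makefile.yaml
EOF

# Parse the script without executing it, to catch syntax errors early.
bash -n paleomix.sh && echo "paleomix.sh: syntax OK"
```

With the directives in place, the script can be submitted with a plain sbatch paleomix.sh. Options given on the command line still take precedence over the directives in the file.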