peddy on Biowulf

peddy compares the sex and familial relationships given in a PED file with those inferred from a VCF file. It does this by sampling 25000 sites plus chrX from the VCF file to estimate relatedness, heterozygosity, sex and ancestry. It uses data from the 1000 Genomes Project.
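
For orientation, a PED file is a whitespace-delimited table whose first six columns are family ID, sample ID, paternal ID, maternal ID, sex and phenotype. The sketch below writes a small trio and checks that every line has at least six columns. The file name example.ped, the sample IDs and the check_ped helper are made up for illustration and are not part of peddy or its test data.

```shell
# Hypothetical trio; IDs and file name are placeholders.
# Columns: family_id sample_id paternal_id maternal_id sex phenotype
# Sex is coded 1 = male, 2 = female; phenotype commonly 1 = unaffected, 2 = affected.
cat > example.ped <<'EOF'
fam1 child1 father1 mother1 2 2
fam1 father1 0 0 1 1
fam1 mother1 0 0 2 1
EOF

# Report any line with fewer than six columns; return non-zero if one is found.
check_ped() {
    awk 'NF < 6 { printf "line %d: expected >= 6 columns, found %d\n", NR, NF; bad = 1 }
         END { exit bad }' "$1"
}

check_ped example.ped && echo "example.ped looks well-formed"
```

This only checks the column count. It does not verify that parents listed in columns 3 and 4 are themselves present in the file.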


  • Brent S. Pedersen, Aaron R. Quinlan. Who’s Who? Detecting and Resolving Sample Anomalies in Human DNA Sequencing Studies with Peddy. Am. J. Hum. Genet. 2017, 100(3):406-413.
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --cpus-per-task=2 --gres=lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load peddy
[user@cn3144]$ cp -r $PEDDY_TEST_DATA/data .
[user@cn3144]$ peddy -p $SLURM_CPUS_PER_TASK --plot --prefix ceph-1463 \
                     data/ceph1463.peddy.vcf.gz data/ceph1463.ped
2018-05-21 08:14:22 cn4242 peddy.cli[35381] INFO Running Peddy version 0.3.1
2018-05-21 08:14:23 cn4242 peddy.cli[35381] INFO ped_check
[user@cn3144]$ ls -lh
-rw-r--r-- 1 user group 169K May 21 08:14 ceph-1463.background_pca.json
-rw-r--r-- 1 user group 2.0K May 21 08:14 ceph-1463.het_check.csv
-rw-r--r-- 1 user group  18K May 21 08:14 ceph-1463.het_check.png
-rw-r--r-- 1 user group 211K May 21 08:14 ceph-1463.html
-rw-r--r-- 1 user group 118K May 21 08:14 ceph-1463.pca_check.png
-rw-r--r-- 1 user group  13K May 21 08:14 ceph-1463.ped_check.csv
-rw-r--r-- 1 user group 108K May 21 08:14 ceph-1463.ped_check.png
-rw-r--r-- 1 user group   96 May 21 08:14 ceph-1463.ped_check.rel-difference.csv
-rw-r--r-- 1 user group 1.7K May 21 08:14 ceph-1463.peddy.ped
-rw-r--r-- 1 user group  835 May 21 08:14 ceph-1463.sex_check.csv
-rw-r--r-- 1 user group  25K May 21 08:14 ceph-1463.sex_check.png


[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226

peddy creates several plots, data tables, and a summary report in HTML. These include, for example, a check on ancestry (ceph-1463.pca_check.png) that shows the reported ancestry in the pedigree overlaid on a PCA of background genomes.
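
The CSV tables can also be inspected on the command line. The following is a minimal sketch of a helper that prints the header plus every row in which a named column has a given value. The csv_filter function is hypothetical, and the column name "error" and value "True" in the usage comment are assumptions, so check the header of your own file first.

```shell
# Print the header plus every row whose column named $2 equals $3.
# Splits on plain commas; assumes no quoted fields that contain commas.
csv_filter() {
    awk -F, -v col="$2" -v val="$3" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == col) c = i
            if (!c) { print "no column named " col > "/dev/stderr"; exit 1 }
            print
            next
        }
        $c == val' "$1"
}

# Assumed usage (column name and value are assumptions, verify them with
# head -n 1 ceph-1463.sex_check.csv):
#   csv_filter ceph-1463.sex_check.csv error True
```

The helper returns a non-zero status when the requested column is not in the header, so a mistyped column name is not mistaken for an empty result.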

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. peddy.batch) which uses the data in $PEDDY_TEST_DATA. For example:

#! /bin/bash
# this file is peddy.batch
module load peddy/0.2.9 || exit 1

cp -r $PEDDY_TEST_DATA/data .
peddy -p $SLURM_CPUS_PER_TASK --plot --prefix ceph-1463 \
    data/ceph1463.peddy.vcf.gz data/ceph1463.ped

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=4 --mem=4g peddy.batch
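
The batch script takes its thread count from $SLURM_CPUS_PER_TASK, which Slurm sets only when --cpus-per-task is specified. If the script might also be run outside of such an allocation, a fallback along the following lines could be used. The peddy_threads helper and the default of 2 threads are arbitrary choices for this sketch.

```shell
# Echo the number of CPUs allocated by Slurm, or 2 if the variable is unset or empty.
peddy_threads() {
    echo "${SLURM_CPUS_PER_TASK:-2}"
}

echo "peddy would be started with -p $(peddy_threads)"
```

In the batch script, -p $SLURM_CPUS_PER_TASK would then become -p "$(peddy_threads)".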
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. peddy.swarm). For example:

peddy -p $SLURM_CPUS_PER_TASK --plot --prefix fam1 fam1/fam1.vcf.gz fam1/fam1.ped
peddy -p $SLURM_CPUS_PER_TASK --plot --prefix fam2 fam2/fam2.vcf.gz fam2/fam2.ped
peddy -p $SLURM_CPUS_PER_TASK --plot --prefix fam3 fam3/fam3.vcf.gz fam3/fam3.ped
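
For a larger number of families, the swarmfile could be generated rather than written by hand. The sketch below assumes the same layout as the example above, with one directory per family containing <fam>.vcf.gz and <fam>.ped. The make_swarm function is a hypothetical helper, not part of swarm or peddy.

```shell
# Write one peddy command per family directory to stdout.
# Assumes each directory <fam> contains <fam>.vcf.gz and <fam>.ped.
make_swarm() {
    for dir in "$@"; do
        dir=${dir%/}                      # drop a trailing slash, if any
        fam=$(basename "$dir")
        vcf="$dir/$fam.vcf.gz"
        ped="$dir/$fam.ped"
        if [ -f "$vcf" ] && [ -f "$ped" ]; then
            # Single quotes keep $SLURM_CPUS_PER_TASK literal, so it is expanded
            # inside each subjob and not when the swarmfile is written.
            printf 'peddy -p $SLURM_CPUS_PER_TASK --plot --prefix %s %s %s\n' \
                "$fam" "$vcf" "$ped"
        else
            echo "skipping $dir: missing $vcf or $ped" >&2
        fi
    done
}

# Example: make_swarm fam1 fam2 fam3 > peddy.swarm
```

Directories that lack either input file are reported on standard error and left out of the swarmfile.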

Submit this job using the swarm command.

swarm -f peddy.swarm -g 4 -t 4 --module peddy/0.2.9
where

-g #            Number of gigabytes of memory required for each process (1 line in the swarm command file)
-t #            Number of threads/CPUs required for each process (1 line in the swarm command file)
--module peddy  Loads the peddy module for each subjob in the swarm