peddy compares the sex and familial relationships given in a PED file with those inferred from a VCF file. It samples 25000 sites plus chrX from the VCF file to estimate relatedness, heterozygosity, sex, and ancestry. Ancestry is predicted against background genotypes from the 1000 Genomes Project.
$PEDDY_TEST_DATA
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --cpus-per-task=2 --gres=lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144]$ cd /lscratch/$SLURM_JOB_ID
[user@cn3144]$ module load peddy
[user@cn3144]$ cp -r $PEDDY_TEST_DATA/data .
[user@cn3144]$ peddy -p $SLURM_CPUS_PER_TASK --plot --prefix ceph-1463 \
    data/ceph1463.peddy.vcf.gz data/ceph1463.ped
2018-05-21 08:14:22 cn4242 peddy.cli[35381] INFO Running Peddy version 0.4.8
2018-05-21 08:14:23 cn4242 peddy.cli[35381] INFO ped_check
2022-10-13 21:04:54 cn4291 peddy.peddy[83025] INFO plotting
2022-10-13 21:04:55 cn4291 peddy.cli[83025] INFO ran in 2.0 seconds
2022-10-13 21:04:55 cn4291 peddy.cli[83025] INFO het_check
2022-10-13 21:04:56 cn4291 peddy.pca[83025] INFO loaded and subsetted thousand-genomes genotypes (shape: (2504, 2724)) in 0.3 seconds
2022-10-13 21:04:56 cn4291 peddy.pca[83025] INFO ran randomized PCA on thousand-genomes samples at 2724 sites in 0.5 seconds
2022-10-13 21:04:57 cn4291 peddy.pca[83025] INFO Projected thousand-genomes genotypes and sample genotypes and predicted ancestry via SVM in 0.1 seconds
2022-10-13 21:04:57 cn4291 peddy.cli[83025] INFO ran in 2.7 seconds
2022-10-13 21:04:57 cn4291 peddy.cli[83025] INFO sex_check
2022-10-13 21:04:57 cn4291 peddy.peddy[83025] INFO sex-check: 0 skipped / 814 kept
2022-10-13 21:04:57 cn4291 peddy.cli[83025] INFO ran in 0.1 seconds

[user@cn3144]$ ls -lh
-rw-r--r-- 1 user group 169K May 21 08:14 ceph-1463.background_pca.json
-rw-r--r-- 1 user group 2.0K May 21 08:14 ceph-1463.het_check.csv
-rw-r--r-- 1 user group  18K May 21 08:14 ceph-1463.het_check.png
-rw-r--r-- 1 user group 211K May 21 08:14 ceph-1463.html
-rw-r--r-- 1 user group 118K May 21 08:14 ceph-1463.pca_check.png
-rw-r--r-- 1 user group  13K May 21 08:14 ceph-1463.ped_check.csv
-rw-r--r-- 1 user group 108K May 21 08:14 ceph-1463.ped_check.png
-rw-r--r-- 1 user group   96 May 21 08:14 ceph-1463.ped_check.rel-difference.csv
-rw-r--r-- 1 user group 1.7K May 21 08:14 ceph-1463.peddy.ped
-rw-r--r-- 1 user group  835 May 21 08:14 ceph-1463.sex_check.csv
-rw-r--r-- 1 user group  25K May 21 08:14 ceph-1463.sex_check.png

[user@cn3144]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf]$
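The CSV tables in the listing above can be screened from the command line before opening the HTML report. The following is a minimal sketch, not part of peddy itself: `flagged_rows` is a hypothetical helper that prints the header plus every row whose named column holds the literal value `True`. It assumes plain comma-separated values without quoted commas, and the column name passed in (for instance `error` for the sex check table) is an assumption that should be confirmed against the header of the actual file.

```bash
#!/bin/bash
# Hypothetical helper: print the header and all rows of a CSV file
# in which the named column equals the literal string "True".
# Assumption: fields contain no quoted commas.
# Usage: flagged_rows FILE COLUMN
flagged_rows() {
    local file=$1 column=$2
    awk -F, -v col="$column" '
        NR == 1 {
            # find the position of the requested column in the header
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) {
                print "column not found: " col > "/dev/stderr"
                exit 2
            }
            print
            next
        }
        $idx == "True" { print }
    ' "$file"
}
```

For example, `flagged_rows ceph-1463.sex_check.csv error` would list samples flagged by the sex check, provided the table has a column with that name. If the column is absent, the function prints a message to stderr and returns status 2.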
peddy creates several plots, data tables, and a summary report in HTML. This includes, for example, the following check on ancestry showing reported ancestry in the pedigree overlaid on a PCA of background genomes.
Create a batch input file (e.g. peddy.sh) that uses the data in $PEDDY_TEST_DATA.
For example:
#!/bin/bash
# this file is peddy.sh

module load peddy/0.4.8 || exit 1
cp -r $PEDDY_TEST_DATA/data .
peddy -p $SLURM_CPUS_PER_TASK --plot --prefix ceph-1463 \
    data/ceph1463.peddy.vcf.gz data/ceph1463.ped
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=4 --mem=4g peddy.sh
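After the batch job finishes, it can be useful to confirm that all expected outputs were written. The sketch below is a hypothetical helper, `missing_outputs`, that checks for the file suffixes seen in the sample session's `ls -lh` listing (a run with `--plot`). The suffix list is taken from that listing only; other peddy versions or options may produce a different set.

```bash
#!/bin/bash
# Hypothetical helper: report output files that are missing or empty.
# The suffix list mirrors the sample session listing (peddy 0.4.8, --plot).
# Usage: missing_outputs PREFIX
# Returns 0 if every file exists and is non-empty, 1 otherwise.
missing_outputs() {
    local prefix=$1 suffix missing=0
    for suffix in background_pca.json het_check.csv het_check.png html \
                  pca_check.png ped_check.csv ped_check.png \
                  ped_check.rel-difference.csv peddy.ped \
                  sex_check.csv sex_check.png; do
        # -s is true only for files that exist and have size > 0
        if [ ! -s "${prefix}.${suffix}" ]; then
            echo "missing or empty: ${prefix}.${suffix}"
            missing=1
        fi
    done
    return "$missing"
}
```

With the prefix from the example, `missing_outputs ceph-1463 && echo "all outputs present"` prints the confirmation only when every file is in place.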
Create a swarmfile (e.g. peddy.swarm). For example:
peddy -p $SLURM_CPUS_PER_TASK --plot --prefix fam1 fam1/fam1.vcf.gz fam1/fam1.ped
peddy -p $SLURM_CPUS_PER_TASK --plot --prefix fam2 fam2/fam2.vcf.gz fam2/fam2.ped
peddy -p $SLURM_CPUS_PER_TASK --plot --prefix fam3 fam3/fam3.vcf.gz fam3/fam3.ped
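With many families, the swarmfile can be generated rather than typed by hand. This is a minimal sketch under the assumption that each family lives in its own directory laid out as in the example above (`<fam>/<fam>.vcf.gz` and `<fam>/<fam>.ped`). The function name `make_swarm` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: write one peddy command per family directory to stdout.
# Assumed layout (as in the example): <fam>/<fam>.vcf.gz and <fam>/<fam>.ped
# Usage: make_swarm DIR [DIR ...]
make_swarm() {
    local dir fam
    for dir in "$@"; do
        dir=${dir%/}                 # drop a trailing slash, if any
        fam=$(basename "$dir")
        if [ -f "$dir/$fam.vcf.gz" ] && [ -f "$dir/$fam.ped" ]; then
            # Single quotes keep $SLURM_CPUS_PER_TASK literal, so it is
            # expanded inside each swarm subjob rather than here.
            printf 'peddy -p $SLURM_CPUS_PER_TASK --plot --prefix %s %s/%s.vcf.gz %s/%s.ped\n' \
                "$fam" "$dir" "$fam" "$dir" "$fam"
        else
            echo "skipping $dir: missing VCF or PED file" >&2
        fi
    done
}
```

Running `make_swarm fam*/ > peddy.swarm` in the directory that holds the family folders would produce a swarmfile in the same form as the example. Directories lacking either input file are skipped with a note on stderr.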
Submit this job using the swarm command.
swarm -f peddy.swarm -g 4 -t 4 --module peddy/0.4.8
where
-g # | Number of Gigabytes of memory required for each process (1 line in the swarm command file) |
-t # | Number of threads/CPUs required for each process (1 line in the swarm command file). |
--module peddy | Loads the peddy module for each subjob in the swarm |