Percolator on Biowulf

Percolator is a software package for postprocessing of shotgun proteomics data.

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. This example runs through the test data provided by the developer.
Sample session (user input follows the $ prompt):

[user@biowulf percolator]$ sinteractive -c2 --mem=4g --gres=lscratch:10
salloc.exe: Pending job allocation 11290667
salloc.exe: job 11290667 queued and waiting for resources
salloc.exe: job 11290667 has been allocated resources
salloc.exe: Granted job allocation 11290667
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0863 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
error: unable to open file /tmp/slurm-spank-x11.11290667.0
slurmstepd: error: x11: unable to read DISPLAY value

[user@cn0863 percolator]$ cd /lscratch/$SLURM_JOB_ID

[user@cn0863 11290667]$ module load percolator 
[+] Loading percolator  3.6.5  on cn4268

[user@cn0863 11290667]$ mkdir test; cd test 

[user@cn0863 11290667]$ tar xf $PERCOLATOR_DATA/yeast-01.sqt.tar.gz 

[user@cn0863 11290667]$ sqt2pin -o pin.tab yeast-01.sqt yeast-01.shuffled.sqt 
Written by Lukas Käll (lukas.kall@scilifelab.se) in the
School of Biotechnology at KTH - Royal Institute of Technology, Stockholm.
Issued command:
sqt2pin -o pin.tab yeast-01.sqt yeast-01.shuffled.sqt
on biowulf.nih.gov
Reading yeast-01.sqt
Reading yeast-01.shuffled.sqt

[user@cn0863 11290667]$ percolator -v 1 -X pout.xml pin.tab > yeast-01.psms 
Percolator version 3.06.05, Build Date Feb 15 2024 13:55:38
Copyright (c) 2006-9 University of Washington. All rights reserved.
Written by Lukas Käll (lukall@u.washington.edu) in the
Department of Genome Sciences at the University of Washington.
Issued command:
percolator -v 1 -X pout.xml pin.tab
Started Thu Feb 15 09:56:45 2024
 on biowulf.nih.gov
Hyperparameters: selectionFdr=0.01, Cpos=0, Cneg=0, maxNiter=10
Finding protein decoy prefix for pin.tab
Using protein decoy prefix "random_"
Separate target and decoy search inputs detected, using mix-max method.
Selecting Cpos by cross-validation.
Selecting Cneg by cross-validation.
Found 7004 test set positives with q<0.01 in initial direction
---Training with Cpos selected by cross validation, Cneg selected by cross validation, initial_fdr=0.01, fdr=0.01
Found 11446 test set PSMs with q<0.01.
Tossing out "redundant" PSMs keeping only the best scoring PSM for each unique peptide.
Selecting pi_0=0.86749
Calculating q values.
New pi_0 estimate on final list yields 7371 target peptides with q<0.01.
Calculating posterior error probabilities (PEPs).
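
The last lines of the log report how many target peptides pass the 1% q-value threshold. A similar count can be recovered from the redirected output file. The following is a minimal sketch, assuming the tab-delimited output begins with a header row that contains a column named q-value; the function name count_below_q is hypothetical and is not part of Percolator.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: count rows whose q-value is below a threshold.
# Assumes a tab-delimited file whose first line is a header containing "q-value".
count_below_q() {
    local file=$1 threshold=${2:-0.01}
    awk -F'\t' -v t="$threshold" '
        NR == 1 {
            # Locate the q-value column by name rather than by position.
            for (i = 1; i <= NF; i++) if ($i == "q-value") col = i
            if (!col) { print "no q-value column found" > "/dev/stderr"; bad = 1; exit }
            next
        }
        ($col + 0) < (t + 0) { n++ }
        END { if (bad) exit 1; print n + 0 }
    ' "$file"
}

# Example call (file name taken from the session above):
#   count_below_q yeast-01.psms 0.01
```

Looking the column up by its header name avoids hard-coding a column position, which may differ between Percolator versions or output options.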

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. percolator.sh). For example:

#!/bin/bash
set -e
module load percolator
percolator -X pout.xml pin.tab > out.psms

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] percolator.sh
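
Resource requests can also be written into the batch file as #SBATCH directives instead of being passed on the command line. The variant below is a sketch only: the values (2 CPUs, 4 GB) mirror the interactive example and are assumptions, not tuned recommendations, and the helper name run_percolator is hypothetical.

```shell
#!/bin/bash
#SBATCH --job-name=percolator
#SBATCH --cpus-per-task=2
#SBATCH --mem=4g
set -e

# Hypothetical helper: run percolator on one PIN file.
#   $1 = input PIN file, $2 = XML output file, $3 = tab-delimited output file
run_percolator() {
    local pin=$1 xml=$2 tab=$3
    module load percolator
    percolator -X "$xml" "$pin" > "$tab"
}

# Run only inside a Slurm job, so the file does nothing if executed elsewhere.
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
    run_percolator pin.tab pout.xml out.psms
fi
```

With the directives in place, the job can be submitted as sbatch percolator.sh; options given on the sbatch command line still override the values in the file.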

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. percolator.swarm). For example:

percolator -X pout1.xml pin1.tab > out1.psms
percolator -X pout2.xml pin2.tab > out2.psms
percolator -X pout3.xml pin3.tab > out3.psms
percolator -X pout4.xml pin4.tab > out4.psms
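
For more than a handful of inputs, the swarmfile can be generated rather than typed. The sketch below assumes the inputs sit in the current directory and follow the pin<ID>.tab naming used in the example above; the function name generate_swarm is hypothetical.

```shell
#!/bin/bash
set -euo pipefail
shopt -s nullglob   # an unmatched glob expands to nothing instead of itself

# Hypothetical helper: print one percolator command per pin<ID>.tab file.
generate_swarm() {
    local pin id
    for pin in pin*.tab; do
        id=${pin#pin}     # strip the leading "pin"
        id=${id%.tab}     # strip the trailing ".tab"
        printf 'percolator -X pout%s.xml %s > out%s.psms\n' "$id" "$pin" "$id"
    done
}

# Write the swarmfile; it will be empty if no inputs match.
generate_swarm > percolator.swarm
```

Each generated line is an independent command, which is the form the swarm command expects.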

Submit this job using the swarm command.

swarm -f percolator.swarm [-g #] [-t #] --module percolator

where
  -g #                 Number of gigabytes of memory required for each process (one line in the swarm command file)
  -t #                 Number of threads/CPUs required for each process (one line in the swarm command file)
  --module percolator  Loads the percolator module for each subjob in the swarm