High-Performance Computing at the NIH
JUMPg

JUMPg is a proteogenomics software pipeline for analyzing large mass spectrometry (MS) and functional genomics datasets. The pipeline includes customized database building, tag-based database search, peptide-spectrum match filtering, and data visualization.

References:

There may be multiple versions of JUMPg available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail jumpg

To select a module, type

module load jumpg/[ver]

where [ver] is the version of choice.
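For example, to load version 2.3.1 explicitly (2.3.1 is simply the version shown in the sessions below; substitute one reported by module avail):

module load jumpg/2.3.1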

Environment variables set:

JUMPG_HOME -- the JUMPg installation directory; the sample parameter files and test data used in the sessions below are copied from here
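For example, after loading the module you can list what ships with the installation (contents vary by version):

ls $JUMPG_HOME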

On Helix

While JUMPg will run on Helix, it will run very slowly, since you cannot use many cores there. For real datasets, run on Biowulf instead (see below).

Sample session:

[teacher@helix ~]$ mkdir -p /data/$USER/jumpg-helix
[teacher@helix ~]$ cd !$
[teacher@helix jumpg-helix]$ module load jumpg
[teacher@helix jumpg-helix]$ cp $JUMPG_HOME/*.params .
[teacher@helix jumpg-helix]$ cp -rL $JUMPG_HOME/data .
[teacher@helix jumpg-helix]$ sed -i "s|/usr/local/apps/jumpg/data/[^/]*|$PWD/data|" *.params
[teacher@helix jumpg-helix]$ mv data/spectra/MS_test.mzXML .
[teacher@helix jumpg-helix]$ JUMPg jump_g_v2.3.stage1.params MS_test.mzXML &> stage1.log
[teacher@helix jumpg-helix]$ cp gnm_stage1/multistage/qc_MSMS_input.txt .
[teacher@helix jumpg-helix]$ JUMPg jump_g_v2.3.stage2.params qc_MSMS_input.txt &> stage2.log
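
The sed command above rewrites the data paths in the parameter files to point at your local copy of the test data. If you want to confirm the substitution before starting a run, a quick check along these lines (the grep is only illustrative) will print the rewritten paths:

[teacher@helix jumpg-helix]$ grep -n "$PWD/data" *.params
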
Interactive job on Biowulf

[teacher@biowulf ~]$ sinteractive --cpus-per-task=32
salloc.exe: Pending job allocation 38176399
salloc.exe: job 38176399 queued and waiting for resources
salloc.exe: job 38176399 has been allocated resources
salloc.exe: Granted job allocation 38176399
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn2969 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
[teacher@cn2969 ~]$ mkdir -p /data/$USER/jumpg-biowulf
[teacher@cn2969 ~]$ cd !$
[teacher@cn2969 jumpg-biowulf]$ module load jumpg
[+] Loading jumpg, version 2.3.1...
[teacher@cn2969 jumpg-biowulf]$ cp $JUMPG_HOME/*.params .
[teacher@cn2969 jumpg-biowulf]$ cp -rL $JUMPG_HOME/data .
[teacher@cn2969 jumpg-biowulf]$ sed -i "s/\(processors_used\s*=\s*\)[0-9]*\(.*\)/\1 ${SLURM_CPUS_PER_TASK} \2/" *.params
[teacher@cn2969 jumpg-biowulf]$ sed -i "s|/usr/local/apps/jumpg/data/[^/]*|$PWD/data|" *.params
[teacher@cn2969 jumpg-biowulf]$ mv data/spectra/MS_test.mzXML .
[teacher@cn2969 jumpg-biowulf]$ JUMPg jump_g_v2.3.stage1.params MS_test.mzXML &> stage1.log
[teacher@cn2969 jumpg-biowulf]$ cp gnm_stage1/multistage/qc_MSMS_input.txt .
[teacher@cn2969 jumpg-biowulf]$ JUMPg jump_g_v2.3.stage2.params qc_MSMS_input.txt &> stage2.log
[teacher@cn2969 jumpg-biowulf]$ ls
customizedDB  gnm_stage1  jump_g_v2.3.stage1.params  MS_test.mzXML
data          gnm_stage2  jump_g_v2.3.stage2.params
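
The first sed command above sets processors_used in the parameter files to the number of CPUs allocated to the interactive job ($SLURM_CPUS_PER_TASK). A quick check of the value, and the usual exit to release the allocation when you are done, would look like this:

[teacher@cn2969 jumpg-biowulf]$ grep processors_used *.params
[teacher@cn2969 jumpg-biowulf]$ exit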

Batch job on Biowulf

This program currently does not support Slurm-based job distribution, so it can only run on a single node at a time. After setting up the params files as described above for the interactive job, a sample submission script would look like the following.

#!/bin/bash
set -e
module load jumpg

# run stage 1, then use its QC'd MS/MS output as the input to stage 2
JUMPg jump_g_v2.3.stage1.params MS_test.mzXML
cp gnm_stage1/multistage/qc_MSMS_input.txt .
JUMPg jump_g_v2.3.stage2.params qc_MSMS_input.txt
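
Save the script (for example as jumpg.sh; the filename and the resource values below are placeholders to adjust for your data) and submit it with sbatch. If you request a different --cpus-per-task, update processors_used in the params files to match:

sbatch --cpus-per-task=32 --mem=32g --time=24:00:00 jumpg.sh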

Documentation