High-Performance Computing at the NIH

JUMPg is a proteogenomics software pipeline for analyzing large mass spectrometry (MS) and functional genomics datasets. The pipeline includes customized database building, tag-based database search, peptide-spectrum match filtering, and data visualization.


There may be multiple versions of JUMPg available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail jumpg

To select a module, type

module load jumpg/[ver]

where [ver] is the version of choice.

Environment variables set:

    JUMPG_HOME    JUMPg installation directory (used below to copy the sample params and data files)

On Helix

JUMPg will run on Helix, but very slowly, since you cannot use many cores there.

Sample session:

[teacher@helix ~]$ mkdir -p /data/$USER/jumpg-helix
[teacher@helix ~]$ cd !$
[teacher@helix jumpg-helix]$ module load jumpg
[teacher@helix jumpg-helix]$ cp $JUMPG_HOME/*.params .
[teacher@helix jumpg-helix]$ cp -rL $JUMPG_HOME/data .
[teacher@helix jumpg-helix]$ sed -i "s|/usr/local/apps/jumpg/data/[^/]*|$PWD/data|" *.params
[teacher@helix jumpg-helix]$ mv data/spectra/MS_test.mzXML .
[teacher@helix jumpg-helix]$ JUMPg jump_g_v2.3.stage1.params MS_test.mzXML &> stage1.log
[teacher@helix jumpg-helix]$ cp gnm_stage1/multistage/qc_MSMS_input.txt .
[teacher@helix jumpg-helix]$ JUMPg jump_g_v2.3.stage2.params qc_MSMS_input.txt &> stage2.log
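
The sed command in the session above redirects every data path in the params files from the central install to your working copy. A minimal, self-contained illustration of what it does (the params key and file name here are mock examples, not actual JUMPg parameters):

```shell
# Create a mock params file containing a stock data path (illustrative only)
printf 'input_database = /usr/local/apps/jumpg/data/example/db.fasta\n' > demo.params

# Same substitution as in the session: replace the central data directory
# (including the one path component after .../data/) with $PWD/data
sed -i "s|/usr/local/apps/jumpg/data/[^/]*|$PWD/data|" demo.params

cat demo.params    # the path now starts with $PWD/data
```

After running the real command, `grep /usr/local/apps *.params` should return nothing; any hits indicate a path the pattern did not match.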
Interactive job on Biowulf

[teacher@biowulf ~]$ sinteractive --cpus-per-task=32
salloc.exe: Pending job allocation 38176399
salloc.exe: job 38176399 queued and waiting for resources
salloc.exe: job 38176399 has been allocated resources
salloc.exe: Granted job allocation 38176399
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn2969 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
[teacher@cn2969 ~]$ mkdir -p /data/$USER/jumpg-biowulf
[teacher@cn2969 ~]$ cd !$
[teacher@cn2969 jumpg-biowulf]$ module load jumpg
[+] Loading jumpg, version 2.3.1...
[teacher@cn2969 jumpg-biowulf]$ cp $JUMPG_HOME/*.params .
[teacher@cn2969 jumpg-biowulf]$ cp -rL $JUMPG_HOME/data .
[teacher@cn2969 jumpg-biowulf]$ sed -i "s/\(processors_used\s*=\s*\)[0-9]*\(.*\)/\1 ${SLURM_CPUS_PER_TASK} \2/" *.params
[teacher@cn2969 jumpg-biowulf]$ sed -i "s|/usr/local/apps/jumpg/data/[^/]*|$PWD/data|" *.params
[teacher@cn2969 jumpg-biowulf]$ mv data/spectra/MS_test.mzXML .
[teacher@cn2969 jumpg-biowulf]$ JUMPg jump_g_v2.3.stage1.params MS_test.mzXML &> stage1.log
[teacher@cn2969 jumpg-biowulf]$ cp gnm_stage1/multistage/qc_MSMS_input.txt .
[teacher@cn2969 jumpg-biowulf]$ JUMPg jump_g_v2.3.stage2.params qc_MSMS_input.txt &> stage2.log
[teacher@cn2969 jumpg-biowulf]$ ls
customizedDB  gnm_stage1  jump_g_v2.3.stage1.params  MS_test.mzXML
data      gnm_stage2  jump_g_v2.3.stage2.params

Batch job on Biowulf

This program is not cluster-aware, so it can only run on a single node at a time. After setting up the params files as described above for the interactive job, a sample submission script would look like the following.


#!/bin/bash
set -e
module load jumpg
JUMPg jump_g_v2.3.stage1.params MS_test.mzXML
cp gnm_stage1/multistage/qc_MSMS_input.txt .
JUMPg jump_g_v2.3.stage2.params qc_MSMS_input.txt

Submit the script with sbatch, requesting the same CPU count written into the params files, e.g. (script name is an example):

sbatch --cpus-per-task=32 jumpg.sh