High-Performance Computing at the NIH


PROVEAN (PROtein Variation Effect ANalyzer) is a software tool which predicts whether an amino acid substitution or indel has an impact on the biological function of a protein. PROVEAN is useful for filtering sequence variants to identify nonsynonymous or indel variants that are predicted to be functionally important. The performance of PROVEAN is comparable to popular tools such as SIFT or PolyPhen-2.


How to Use

PROVEAN uses environment modules. Type

module load PROVEAN

at the prompt.

Here is an example using files supplied with the PROVEAN installation:

$ module load PROVEAN
$ provean.sh -q P04637.fasta -v P04637.var --save_supporting_set P04637.sss

How to run on Biowulf

NOTE 1: PROVEAN uses the NCBI nr BLAST database. When a large number of PROVEAN jobs are to be run simultaneously, it is best to make a local copy of the BLAST nr database by adding the --local_nr option to the provean.sh command line. For only one or two jobs there is no reason to use --local_nr, since the time taken to copy the database may outweigh the time needed to complete the analysis.

NOTE 2: PROVEAN is a multithreaded program. The number of threads is set with the --num_threads option; by default the program uses only 1 thread.

As a batch job

Create a batch script PROVEAN.sh:

#!/bin/bash
module load PROVEAN
provean.sh -q myfasta.fasta -v myfasta.var --save_supporting_set myfasta.sss --num_threads 24 > myfasta.out 2>&1

Then submit the job:

$ sbatch --cpus-per-task=24 PROVEAN.sh

Note that the value given to --num_threads (24) must match the number of CPUs allocated (--cpus-per-task=24).
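One way to keep the two numbers in step is to let the batch script read the allocation from Slurm instead of hard-coding it. The sketch below writes such a PROVEAN.sh with a here-document. It assumes that Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is given to sbatch, and it falls back to 1 thread otherwise; the variable name "threads" is only illustrative, and the file names are the placeholders used above.

```shell
#!/bin/bash
# Write a batch script whose thread count follows the Slurm allocation.
# The quoted 'EOF' keeps $threads and ${SLURM_CPUS_PER_TASK} unexpanded
# until the job itself runs.
cat > PROVEAN.sh <<'EOF'
#!/bin/bash
module load PROVEAN

# Assumption: SLURM_CPUS_PER_TASK is set by Slurm when --cpus-per-task is used.
# Fall back to PROVEAN's default of 1 thread if it is not set.
threads="${SLURM_CPUS_PER_TASK:-1}"

provean.sh -q myfasta.fasta -v myfasta.var \
    --save_supporting_set myfasta.sss \
    --num_threads "$threads" > myfasta.out 2>&1
EOF
```

With a script like this, changing the value passed to sbatch --cpus-per-task should be enough; the --num_threads value follows it.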

As a swarm job

Create a swarmfile containing one provean.sh command line per job (in this example the file is named "swarmfile"),

provean.sh -q seq1.fasta -v seq1.var --local_nr --save_supporting_set seq1.sss
provean.sh -q seq2.fasta -v seq2.var --local_nr --save_supporting_set seq2.sss
provean.sh -q seq3.fasta -v seq3.var --local_nr --save_supporting_set seq3.sss
provean.sh -q seq4.fasta -v seq4.var --local_nr --save_supporting_set seq4.sss

then submit it with swarm. Include --module PROVEAN so that each job loads the PROVEAN environment. The --local_nr option is used here because a swarm typically runs a large number of PROVEAN jobs simultaneously (see NOTE 1).

swarm --module PROVEAN --file swarmfile
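For more than a handful of sequences, the swarmfile can be generated rather than typed by hand. The following is a minimal sketch, not part of the PROVEAN distribution: make_swarmfile is a hypothetical helper name, and it assumes every query is stored as NAME.fasta with its variants in NAME.var in the same directory. It prints one provean.sh command line per complete pair and skips FASTA files that have no matching .var file.

```shell
#!/bin/bash
# make_swarmfile DIR
# Print one provean.sh command line for each NAME.fasta in DIR that has a
# matching NAME.var (hypothetical helper; the naming scheme is an assumption).
make_swarmfile() {
    local dir="${1:-.}" fasta base
    for fasta in "$dir"/*.fasta; do
        # If nothing matches, the glob stays literal and the file does not exist.
        [ -e "$fasta" ] || continue
        base="${fasta%.fasta}"
        if [ ! -f "$base.var" ]; then
            echo "skipping $fasta: no matching $base.var" >&2
            continue
        fi
        echo "provean.sh -q $fasta -v $base.var --local_nr --save_supporting_set $base.sss"
    done
}
```

Running make_swarmfile . > swarmfile in the directory that holds the input files produces a file that can then be submitted with the swarm command shown above.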