iphop on Biowulf

iPHoP stands for Integrated Phage Host Prediction. It is an automated command-line pipeline designed to predict the host genus of novel bacteriophages and archaeoviruses based on their genome sequences.

The pipeline can be broken down into 6 main steps:
  • Step 1: Running Individual Host Prediction Tools
  • Step 2: Collect All Scores and All Distances Between Hits for Host-based Tools
  • Steps 3 and 4: Compile an Organized List of Hits for Each Virus - Tool - Candidate Host Combination
  • Step 5: Derive 3 Scores for Host-based Tools for Each Virus - Candidate Host Combination
  • Step 6: Calculate a Composite Score for Each Virus - Candidate Host Genus Combination Integrating Host-based and Phage-based Signals

    Interactive job
    Allocate an interactive session and run the program. Sample session:
    [user@biowulf]$ sinteractive --gres=lscratch:10 -c 8 --mem=32g 
    salloc.exe: Pending job allocation 46116226
    salloc.exe: job 46116226 queued and waiting for resources
    salloc.exe: job 46116226 has been allocated resources
    salloc.exe: Granted job allocation 46116226
    salloc.exe: Waiting for resource configuration
    salloc.exe: Nodes cn3144 are ready for job
    [user@cn3144 ~]$ module load iphop
    [user@cn3144 ~]$ cd /data/$USER/
    [user@cn3144 ~]$ cp ${IPHOP_TEST_DATA:-none}/* .
    [user@cn3144 ~]$ iphop predict --fa_file test_input_phages.fna --db_dir /fdb/iphop_db/Aug_2023_pub_rw/ --out_dir iphop_out
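
Once the run finishes, the predictions in the output directory can be filtered by confidence score. The sketch below is a hypothetical helper, not part of iPHoP. It assumes the genus-level results are a comma-separated file with a header row, the virus name in column 1, the host genus in column 3, and the confidence score in column 4; check the header of your own output file before relying on these positions.

```shell
# Hypothetical helper: print virus, host genus, and score (tab-separated)
# for every prediction at or above a score cutoff (default 90).
# Assumes: comma-separated input, one header row, score in column 4.
filter_predictions() {
    csv="$1"
    min_score="${2:-90}"
    awk -F',' -v min="$min_score" \
        'NR > 1 && $4 + 0 >= min + 0 { print $1 "\t" $3 "\t" $4 }' "$csv"
}
```

For example, `filter_predictions iphop_out/Host_prediction_to_genus_m90.csv 95` would list only predictions scoring 95 or higher, assuming the results file has that name in your iPHoP version.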

    Batch job
    Create a batch input file (e.g. iphop.sh). For example:

    #!/bin/bash
    set -e
    module load iphop
    cd /data/$USER
    iphop predict --fa_file test_input_phages.fna --db_dir /fdb/iphop_db/Aug_2023_pub_rw/ --out_dir iphop_out

    Submit this job using the Slurm sbatch command.

    sbatch [--cpus-per-task=#] [--mem=#] iphop.sh
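
Requesting CPUs with --cpus-per-task only helps if iPHoP is told to use them. The sketch below shows one way to tie the thread count to the Slurm allocation. The helper names are hypothetical, and the `--fa_file` and `--num_threads` options are assumptions based on iPHoP's usage as generally documented; confirm them with `iphop predict -h` on your installed version.

```shell
#!/bin/bash
# Sketch only: --fa_file and --num_threads are assumed iPHoP options.

# Use the CPUs Slurm granted to this job; fall back to 1 outside a job.
iphop_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Print (rather than run) the full command so it can be reviewed first.
iphop_command() {
    input="$1"
    outdir="$2"
    echo "iphop predict --fa_file $input --db_dir /fdb/iphop_db/Aug_2023_pub_rw/ --out_dir $outdir --num_threads $(iphop_threads)"
}
```

Calling `iphop_command test_input_phages.fna iphop_out` inside a job submitted with --cpus-per-task=8 prints a command line ending in `--num_threads 8`. The same effect can be had by writing `--num_threads $SLURM_CPUS_PER_TASK` directly in iphop.sh.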