PhIP-Stat: analysis tools for PhIP-seq experiments

PhIP-Stat is a set of analysis tools for PhIP-seq experiments. It processes raw PhIP-seq data.

References:

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive  --mem=4g  --gres=gpu:p100:1
[user@cn3316 ~]$ module load phip-stat  
[+] Loading phip-stat  0.3.0
The available executable, phip, can be used as follows:
[user@cn3316 ~]$ phip -h 
Usage: phip [OPTIONS] COMMAND [ARGS]...

  phip -- PhIP-seq analysis tools

Options:
  -h, --help  Show this message and exit.

Commands:
  align-parts                  Align fastq files to peptide reference.
  call-hits                    Call hits at specified FDR using a heuristic.
  clipped-factorization-model  Fit matrix factorization model.
  compute-counts               Compute counts from aligned bam file.
  compute-pvals                Compute p-values from counts.
  count-exact-matches          Match reads to reference exactly.
  gamma-poisson-model          Fit a gamma-poisson model.
  gen-covariates               Compute covariates for input to stat model.
  merge-columns                Merge tab-delimited files.
  merge-kallisto-tpm           Merge kallisto abundance results.
  normalize-counts             Normalize count matrix.
  split-fastq                  Split fastq files into smaller chunks.
  truncate-fasta               Truncate each sequence of a fasta file.
  zip-reads-and-barcodes       Zip reads with barcodes and split into files.
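
Several of the commands above operate on a count matrix, and the test run shown later reports reads-per-million normalization. As a rough illustration of what that scaling means, here is a minimal pure-Python sketch. It assumes a clones-by-samples matrix stored as a list of rows; the function name `reads_per_million` and the toy data are hypothetical, and this is not phip-stat's own implementation.

```python
def reads_per_million(counts):
    """Scale each sample (column) of a clones-by-samples matrix to sum to 1e6."""
    n_samples = len(counts[0])
    # Total reads observed in each sample (column sums).
    totals = [sum(row[j] for row in counts) for j in range(n_samples)]
    # Divide each count by its sample total; empty samples stay at zero.
    return [
        [row[j] / totals[j] * 1e6 if totals[j] > 0 else 0.0
         for j in range(n_samples)]
        for row in counts
    ]


# Hypothetical toy matrix: 3 clones (rows) x 2 samples (columns).
toy_counts = [[10, 0], [30, 5], [60, 15]]
rpm = reads_per_million(toy_counts)
for row in rpm:
    print(row)
```

After scaling, every non-empty sample column sums to one million, so clones can be compared across samples sequenced to different depths.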
Copy test scripts to the current folder:
[user@cn3316 ~]$ cp $PHIPSTAT_TEST/* . 
Run a test:
[user@cn3316 ~]$ python test_speed.py

Normalized to reads-per-million.
*** Iteration     0 ***
  Pseudocount       : 0.0044
  Hit threshold     : 1.9492 (=10**0.2899)
  Clones w/ a hit   : 383 (0.4%)
  Median hits/sample: 2
  ... min/mean/max  : 0.0000/1.9350/6.0000
  Total hits        : 387
  Elapsed           : 7.9 sec.
*** Iteration     1 ***
  Pseudocount       : 229.4456
  Hit threshold     : 0.0131 (=10**-1.8822)
  Clones w/ a hit   : 1000 (1.0%)
  Median hits/sample: 5
  ... min/mean/max  : 0.0000/5.0100/13.0000
  Total hits        : 1002
  Elapsed           : 15.6 sec.
*** Iteration     2 ***
  Pseudocount       : 189950.5039
  Hit threshold     : 0.0000 (=10**-4.6806)
  Clones w/ a hit   : 991 (1.0%)
  Median hits/sample: 5
  ... min/mean/max  : 0.0000/4.9650/11.0000
  Total hits        : 993
  Elapsed           : 23.2 sec.
...
*** Iteration     7 ***
  Pseudocount       : 0.6585
  Hit threshold     : 0.2680 (=10**-0.5719)
  Clones w/ a hit   : 1008 (1.0%)
  Median hits/sample: 5
  ... min/mean/max  : 0.0000/5.0600/13.0000
  Total hits        : 1012
  Elapsed           : 62.2 sec.

*** HIT CALLING RESULTS ***
  Pseudocount       : 0.2772
  Hit threshold     : 0.3796 (=10**-0.4207)
  Clones w/ a hit   : 1030 (1.0%)
  Median hits/sample: 5
  ... min/mean/max  : 0.0000/5.1750/13.0000
  Total hits        : 1035
  Elapsed           : 70.0 sec.
SPEED BENCHMARK
Results:
hit_calling    70.728838
dtype: float64
**** hit_calling ****
         2402959 function calls (2364536 primitive calls) in 70.727 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {method 'pop' of 'set' objects}
        1    0.000    0.000    0.000    0.000 /usr/local/apps/release/20200316/lib/python3.6/site-packages/numpy/core/numeric.py:2283(_array_equal_dispatcher)
        1    0.000    0.000    0.000    0.000 /usr/local/apps/release/20200316/lib/python3.6/site-packages/numpy/core/multiarray.py:1078(putmask)
        9    0.000    0.000    0.000    0.000 /usr/local/apps/release/20200316/lib/python3.6/site-packages/pandas/core/indexes/base.py:609()
...
     1872    0.013    0.000   29.741    0.016 /usr/local/apps/release/20200316/lib/python3.6/site-packages/pandas/core/frame.py:2988(_set_item)
     1872    0.014    0.000   29.772    0.016 /usr/local/apps/release/20200316/lib/python3.6/site-packages/pandas/core/frame.py:2922(__setitem__)
18824/10533    0.718    0.000   30.495    0.003 {built-in method numpy.core._multiarray_umath.implement_array_function}
        8    0.007    0.001   62.159    7.770 /usr/local/apps/phip-stat/0.3.0/phip-stat/phip/hit_calling.py:151(function_to_minimize)
     10/2    0.011    0.001   62.167   31.084 /usr/local/apps/release/20200316/lib/python3.6/site-packages/scipy/optimize/optimize.py:1753(_minimize_scalar_bounded)
     10/2    0.000    0.000   62.167   31.084 /usr/local/apps/release/20200316/lib/python3.6/site-packages/scipy/optimize/_minimize.py:639(minimize_scalar)
        9    9.774    1.086   68.957    7.662 /usr/local/apps/phip-stat/0.3.0/phip-stat/phip/hit_calling.py:187(hits_at_specified_pseudocount)
        1    0.071    0.071   70.727   70.727 /usr/local/apps/phip-stat/0.3.0/phip-stat/phip/hit_calling.py:11(do_hit_calling)
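
The table above has the layout produced by Python's standard profiler. The following minimal sketch shows how such a report, sorted by cumulative time, can be generated with the standard-library `cProfile` and `pstats` modules. The workload `slow_sum` is a hypothetical stand-in, and whether test_speed.py collects its profile in exactly this way is an assumption.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical stand-in workload; not part of phip-stat."""
    return sum(i * i for i in range(n))


# Collect profiling data only around the call of interest.
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Format the collected statistics, sorted by cumulative time,
# and keep only the first five entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

In such a report, `tottime` is the time spent inside a function itself, while `cumtime` also includes the time spent in everything it calls. That is why the top-level `do_hit_calling` entry above accounts for the full 70.727 seconds.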
Exiting the interactive session:
[user@cn3316 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$