Biowulf High Performance Computing at the NIH
ncbi-toolkit on Biowulf

The NCBI C++ Toolkit is a set of executables and libraries for a multitude of sequence analysis functions.

The executables have been compiled and are available on Biowulf through the ncbi-toolkit module.

Documentation

Many of the executables have help functions. These can be displayed with the -help option:

$ fastq-dump -help

Usage:
  fastq-dump [options] <path [path...]>
  fastq-dump [options] [ -A ] <accession>

INPUT
  -A|--accession <accession>       Replaces accession derived from <path> in 
                                   filename(s) and deflines (only for single 
                                   table dump) 
  --table <table-name>             Table name within cSRA object, default is 
                                   "SEQUENCE" 

PROCESSING

Read Splitting                     Sequence data may be used in raw form or
                                     split into individual reads
  --split-spot                     Split spots into individual reads 

Full Spot Filters                  Applied to the full spot independently
                                     of --split-spot
  -N|--minSpotId <rowid>           Minimum spot id 
  -X|--maxSpotId <rowid>           Maximum spot id 
  --spot-groups <[list]>           Filter by SPOT_GROUP (member): name[,...] 
  -W|--clip                        Apply left and right clips 

... etc ...

Important Notes

Module name: ncbi-toolkit

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session, load the module, and run the program.
Sample session (user input follows each shell prompt):

[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load ncbi-toolkit
[user@cn3144 ~]$ gi2taxid -gi 36209385

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. ncbi-toolkit.sh). For example:

#!/bin/bash
module load ncbi-toolkit

# NOTE: This is only a test that ncbi-toolkit runs correctly.
# The analysis itself is not meant to be meaningful.

# Create a nucleotide blast database suitable for the ncbi-toolkit version.
# In this example, we extract the first 1,000,000 lines from nt.fas.

head -n 1000000 /fdb/fastadb/nt.fas > nt_1M.fas
makeblastdb -in nt_1M.fas -dbtype nucl

# Now run rmblastn (the RepeatMasker version of blastn) against this database.

rmblastn -query gi_255958152.nt.fas -db nt_1M.fas -gapopen 3 -gapextend 3
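
The head command in the script above cuts the FASTA file by line count, so the last sequence kept may be incomplete. As a hedged alternative, the sketch below keeps the first N complete records instead. The function name first_records and its arguments are illustrative assumptions, not part of the toolkit.

```shell
#!/bin/bash
# Sketch: print the first N complete FASTA records from a file.
# first_records is a hypothetical helper; $1 = record count, $2 = FASTA file.
first_records() {
    # Count header lines (those starting with '>'); stop before printing
    # the header that would begin record N+1.
    awk -v n="$1" '/^>/ { if (++count > n) exit } { print }' "$2"
}

# Possible use inside the batch script (the record count is a placeholder):
# first_records 100000 /fdb/fastadb/nt.fas > nt_subset.fas
```

The output can be passed to makeblastdb in the same way as nt_1M.fas.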

Submit this job using the Slurm sbatch command.

sbatch [--cpus-per-task=#] [--mem=#] ncbi-toolkit.sh
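
When --cpus-per-task is given, Slurm sets the SLURM_CPUS_PER_TASK environment variable inside the job. The sketch below shows one way a batch script might read it. The helper name blast_threads and the resource values are placeholders, and it assumes that rmblastn accepts the standard BLAST+ -num_threads option.

```shell
#!/bin/bash
# Sketch: choose a thread count from the Slurm allocation.
# blast_threads is a hypothetical helper; it falls back to 1 when
# SLURM_CPUS_PER_TASK is unset (for example, outside a Slurm job).
blast_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Possible use inside ncbi-toolkit.sh (assumes -num_threads is supported):
# rmblastn -query gi_255958152.nt.fas -db nt_1M.fas -gapopen 3 -gapextend 3 \
#     -num_threads "$(blast_threads)"

# Submission with explicit resources (values are placeholders):
# sbatch --cpus-per-task=4 --mem=8g ncbi-toolkit.sh
```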

Swarm of jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. ncbi-toolkit.swarm). For example:

igblastn -db mydb -query seq1.fas -out seq1.out
igblastn -db mydb -query seq2.fas -out seq2.out
igblastn -db mydb -query seq3.fas -out seq3.out
igblastn -db mydb -query seq4.fas -out seq4.out

Submit this job using the swarm command.

swarm -f ncbi-toolkit.swarm [-g #] [-t #] --module ncbi-toolkit

where

  -g #                   Number of gigabytes of memory required for each process (1 line in the swarm command file)
  -t #                   Number of threads/CPUs required for each process (1 line in the swarm command file)
  --module ncbi-toolkit  Loads the ncbi-toolkit module for each subjob in the swarm
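
For many input files, the swarmfile can be generated rather than typed by hand. The following is a minimal sketch that writes one igblastn line per query file, in the same form as the example swarmfile above. The function name make_swarmfile, the .fas suffix handling, and the -g and -t values are illustrative assumptions.

```shell
#!/bin/bash
# Sketch: print one igblastn command per query file, suitable for a swarmfile.
# make_swarmfile is a hypothetical helper; $1 = database name,
# remaining arguments = query FASTA files ending in .fas.
make_swarmfile() {
    local db="$1"
    shift
    local query
    for query in "$@"; do
        # Derive the output name by replacing the .fas suffix with .out
        printf 'igblastn -db %s -query %s -out %s.out\n' \
            "$db" "$query" "${query%.fas}"
    done
}

# Possible use (resource values are placeholders):
# make_swarmfile mydb seq*.fas > ncbi-toolkit.swarm
# swarm -f ncbi-toolkit.swarm -g 4 -t 2 --module ncbi-toolkit
```

Each generated line is an independent command, so swarm can schedule them separately with the same per-process resources.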

List of current executables

Judy1TablesGen                  lds_indexer                     test_id_mux
JudyLTablesGen                  lds_sample                      test_image
SRR574828-crash-test            lds_test                        test_interprocess_lock
abi-dump                        lds_unit_test                   test_lds
abi-load                        legacy_blast.pl                 test_limited_map
ace2asn                         localfinder                     test_line_reader
agp_count                       logs_splitter                   test_logrotate
agp_renumber                    makeblastdb                     test_math
agp_val_test                    makembindex                     test_multipart.cgi
agpconvert                      makeprofiledb                   test_nc_stress
align-info                      mapper_unit_test                test_nc_stress_pubmed
align_filter_unit_test          md5appendtest                   test_ncbi_buffer
align_format_unit_test          md5cp                           test_ncbi_clog_mt_ctx
aln_build                       multi_command                   test_ncbi_config
alnmgr_sample                   multireader                     test_ncbi_conn
alnmrg                          mysql_lang                      test_ncbi_conn_stream
alnvwr                          ncbi_applog                     test_ncbi_conn_stream_mt
annotwriter                     ncfetch.cgi                     test_ncbi_connutil_hit
args-test                       nenctest                        test_ncbi_connutil_misc
asn2asn                         nenctool                        test_ncbi_core
asn2fasta                       nencvalid                       test_ncbi_disp
asn2flat                        netcache_cgi_sample.cgi         test_ncbi_download
asn_assign                      netcache_client_sample          test_ncbi_dsock
asn_sample                      netcache_control                test_ncbi_file_connector
asniotest                       netschedule_client_sample       test_ncbi_ftp_connector
asnwalk_read                    netschedule_control             test_ncbi_ftp_download
asnwalk_type                    netschedule_node_sample         test_ncbi_heapmgr
asnwalk_write                   ngalign_test                    test_ncbi_hmac
autodef_demo                    nmer_repeats                    test_ncbi_http_connector
bam-load                        ns_remote_job_control           test_ncbi_http_get
bam-load.3                      ns_submit_remote_job            test_ncbi_limits
bam2graph                       nw_aligner                      test_ncbi_memory_connector
bam_test                        objects_sample                  test_ncbi_namedpipe
bamgraph_test                   objextract                      test_ncbi_namedpipe_connector
basic_sample                    objmgr_sample                   test_ncbi_null
basic_sample_lib_test           omssa2pepXML                    test_ncbi_os_unix
bdbloader_unit_test             omssacl                         test_ncbi_pipe
biosample_chk                   omssamerge                      test_ncbi_pipe_connector
bioseq_edit_sample              pacc                            test_ncbi_process
blast_dataloader_unit_test      phytree_calc_unit_test          test_ncbi_rate_monitor
blast_demo                      phytree_format_unit_test        test_ncbi_rwstream
blast_format_unit_test          pmem-test                       test_ncbi_sendmail
blast_formatter                 prefetch                        test_ncbi_service
blast_sample                    printf-test                     test_ncbi_service_connector
blast_services_unit_test        project_tree_builder            test_ncbi_socket
blast_unit_test                 psiblast                        test_ncbi_socket_connector
blastdb_aliastool               python_ncbi_dbapi_test          test_ncbi_system
blastdb_format_unit_test        qfiletest                       test_ncbi_table
blastdbcheck                    qual-recalib-stat               test_ncbi_tree
blastdbcmd                      rcexplain                       test_ncbiargs
blastdbcp                       re-compress                     test_ncbiargs_sample
blastinput_demo                 read-filter-redact              test_ncbicfg
blastinput_unit_test            readresult                      test_ncbidiag_f_mt
blastn                          refseq-load                     test_ncbidiag_mt
blastp                          regexplocdemo                   test_ncbidiag_p
blastx                          remote_app_client_sample        test_ncbidll
blobreader                      remote_blast_demo               test_ncbiexec
blobrwd                         rmblastn                        test_ncbiexpt
blobrws                         rowwritetest                    test_ncbifile
blobwriter                      rpsblast                        test_ncbimime
bm_sparse_sample                rpstblastn                      test_ncbireg_mt
bma_refiner                     sam-dump3                       test_ncbistr
bss_info                        schema-replace                  test_ncbitime
cache_demo                      score_builder_unit_test         test_ncbitime_mt
ccextract                       sdbapi_advanced_features        test_ncbiutil
cddalignview                    sdbapi_simple                   test_netcache_api
cg-load                         sdbapi_unit_test                test_netschedule_client
cgi2rcgi                        seedtop                         test_netschedule_crash
cgi_io_test                     segmasker                       test_netschedule_node
cgi_redirect                    seq_id_unit_test                test_netschedule_stress
cgi_sample.cgi                  seqalign_unit_test              test_nsstorage
cgi_session_sample.cgi          seqannot_splicer                test_objmgr
cgi_tunnel2grid.cgi             seqdb_demo                      test_objmgr_basic
cgitest                         seqdb_perf                      test_objmgr_data
clusterer                       seqdb_unit_test                 test_objmgr_gbloader
cobalt                          seqmasks_io_unit_test           test_objmgr_gbloader_mt
cobalt_unit_test                seqvec_bench                    test_objmgr_mem
compart                         sff-dump                        test_objmgr_mt
compartp                        sff-load                        test_objmgr_sv
conv_image                      soap_client_sample              test_objstore
convert2blastmask               soap_server_sample              test_param_mt
convert_seq                     socket_io_bouncer               test_plugins
copycat                         sortreadtest                    test_porter_stemming
coretest                        speedtest                       test_printf
cpgdemo                         split_cache                     test_queue_mt
csra_test_mt                    split_loader_demo               test_range_coll
ctl_lang_ftds64                 sra-dbcc                        test_rangemap
ctl_sp_databases_ftds64         sra-dflt-schema                 test_reader_gicache
ctl_sp_who_ftds64               sra-dump                        test_reader_id1
datatool                        sra-kar                         test_regexp
db_copy                         sra-pileup                      test_relloc
dbapi_advanced_features         sra-sort                        test_request_control
dbapi_bcp                       sra-stat                        test_resize_iter
dbapi_cache_admin               sra_test                        test_resource_info
dbapi_cache_test                srapath                         test_scheduler
dbapi_conn_policy               srf-load                        test_scoremat
dbapi_context_test              srsearch                        test_semaphore_mt
dbapi_cursor                    streamtest                      test_seq_entry_ci
dbapi_driver_check              struct_dp_demo                  test_seqio
dbapi_query                     struct_util_demo                test_seqmap_switch
dbapi_send_data                 sub_image                       test_seqport
dbapi_simple                    subcheck                        test_seqvector_ci
dbapi_testspeed                 tblastn                         test_serial
dbapi_unit_test                 tblastx                         test_source_mod_parser
deltablast                      test-aes-ciphers                test_sra_loader
demo_contig_assembly            test-align                      test_stacktrace
demo_gene_model                 test-block-cross-error          test_staticmap
demo_genomic_compart            test-bzip-concat                test_strdbl
demo_html                       test-cipher-speed               test_strsearch
demo_html_template              test-encapptrunc                test_sub_reg
demo_ncbi_clog                  test-encv2                      test_tar
demo_score_builder              test-error                      test_tempstr
demo_seqtest                    test-fastq-loader               test_title
disc_report                     test-float                      test_tls_object
double-VCursorCommit-test       test-headfile                   test_transmissionrw
dump-blob-boundaries            test-kdb                        test_uoconv
dustmasker                      test-kfg                        test_user_agent
ecnum_unit_test                 test-kfs                        test_utf8
entrez2client                   test-kfsmanager                 test_uttp
eutils_sample                   test-klib                       test_validator
example_value_convert           test-kpath-read-path            test_value_convert
fasthello.fcgi                  test-ktst                       test_vdbgraph_loader
fastq-dump                      test-modes                      test_vmerge
fastq-load                      test-pagefile                   test_weakref
fcgi_sample.fcgi                test-path                       test_wgs_loader
feat_unit_test                  test-ram-file-c                 testencrypt
feattree_sample                 test-ramfile                    testld
formatguess                     test-ref-list                   testreenc
formatguess_unit_test           test-ref_sub_select             time-test
gene_info_reader                test-reference-mgr              txt2kdb
gene_info_unit_test             test-resolve                    unit_test_agp_seq_entry
gene_info_writer_unit_test      test-resolver                   unit_test_alnmgr
genomic_compart_unit_test       test-sra                        unit_test_alt_sample
gi2taxid                        test-static                     unit_test_autodef
graph_test                      test-sysfile-timeout            unit_test_basic_cleanup
grid_cgi_sample.cgi             test-sysfs                      unit_test_defline
grid_cli                        test-vdb                        unit_test_entry_edit
grid_client_sample              test-vdb-resolve                unit_test_extended_cleanup
gumbelparams                    test_algo_tree                  unit_test_fasta_ostream
gumbelparams_unit_test          test_align                      unit_test_fasta_reader
helicos-load                    test_annot_ci                   unit_test_feature_table_reader
hello.cgi                       test_bam_loader                 unit_test_field_collection
hfilter                         test_basic_cleanup              unit_test_format_guess_ex
hgvs2variation                  test_biotree                    unit_test_gene_model
hooks_commented                 test_bm                         unit_test_idmapper
hooks_copy_member               test_buffer_writer              unit_test_mol_wt
hooks_copy_object               test_buffile                    unit_test_polya
hooks_copy_variant              test_bulkinfo                   unit_test_sample
hooks_read_member               test_cgi_entry_reader           unit_test_seq_loc_util
hooks_read_object               test_chainer                    unit_test_seq_translator
hooks_read_variant              test_checksum                   unit_test_validator
hooks_skip_member               test_compress                   update_blastdb.pl
hooks_skip_object               test_compress_mt                varloc-load
hooks_skip_variant              test_conn_stream_pushback       vdb-config
hooks_write_member              test_conn_tar                   vdb-copy
hooks_write_object              test_csra_loader                vdb-decrypt
hooks_write_variant             test_csra_loader_mt             vdb-dump
http_connector_hit              test_date                       vdb-encrypt
id1_fetch                       test_diag_parser                vdb-lock
id1_fetch_simple                test_diff                       vdb-passwd
id2_fetch_simple                test_edit_saver                 vdb-unlock
id_unit_test                    test_expr                       vdb-validate
idmapper                        test_fasta_round_trip           vdb_test
igblastn                        test_feat_overlap               vecscreen
igblastp                        test_feat_tree                  vsrun_sample
illumina-dump                   test_floating_point_comparison  wb-test-bam-loader
illumina-load                   test_fstream_pushback           wb-test-fastq
image_info                      test_fw                         wb-test-vxf
kar                             test_get_console_password       wgs_test
kdb2vdb                         test_grid_worker                wig2table
kdbmeta                         test_gridclient_stress          windowmasker
kqsh                            test_hash                       windowmasker_2.2.22_adapter.py
krypto-test                     test_hgvs_parser                writedb_unit_test
ktartest                        test_html                       xcompareannotsdemo
lang_query                      test_ic_client
latf-load                       test_id1_client
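
To confirm that particular programs from this list are on the PATH after the module is loaded, something like the following sketch could be used. The function name check_tools is an illustrative assumption and not part of the toolkit.

```shell
#!/bin/bash
# Sketch: report whether each named program can be found on PATH.
# check_tools is a hypothetical helper; it returns non-zero if any
# of the named programs is missing.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Possible use in an interactive session:
# module load ncbi-toolkit
# check_tools blastn makeblastdb igblastn
```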