High-Performance Computing at the NIH
biom-format on Biowulf & Helix

Description

The biom-format Python package contains a command line tool to manipulate and convert biom format files. It also includes an API for programmatically manipulating biom files. It is therefore installed both as an independent application and as part of the Python environments (2.7.X).

References

Web sites

On Helix

If only the biom command line tool for manipulating biom files is needed, load the biom-format module:

helix$ module load biom-format
Then use it to inspect, modify, or convert biom format files:
helix$ TD=/usr/local/apps/biom-format/TEST_DATA
helix$ biom head -i $TD/phinch_testdata.biom
# Constructed from biom file
#OTU ID 0.IntakeWater.1 0.IntakeWater.3 0.IntakeWater.2 27.WaterCoralpond.2 
228057  3.0     6.0     3.0     1.0     0.0
988537  0.0     0.0     0.0     2.0     0.0
89370   0.0     0.0     0.0     1.0     1.0
2562097 0.0     0.0     0.0     0.0     0.0
256904  8.0     0.0     1.0     32.0    58.0
helix$ biom convert -i $TD/phinch_testdata.biom -o test.biom --to-hdf5
helix$ module load hdf5
helix$ h5ls test.biom
observation              Group
sample                   Group
helix$ h5ls test.biom/sample
group-metadata           Group
ids                      Dataset {95}
matrix                   Group
metadata                 Group
helix$ biom summarize-table -i test.biom
Num samples: 95
Num observations: 67900
Total count: 10223009
Table density (fraction of non-zero values): 0.056

Counts/sample summary:
 Min: 16.0
 Max: 1106184.0
 Median: 65463.000
 Mean: 107610.621
 Std. dev.: 164313.408
 Sample Metadata Categories: description; alkalinity; material; ammonium; nitrite; 
  LinkerPrimerSequence; sulfide; BarcodeName; InternalCode; temp; 
  collection_date; BarcodeSequence; salinity; phosphate; ReverseBarcode; nitrate; 
  ph; ReverseName; ReversePrimerSequence; Hardness; diss_oxygen
 Observation Metadata Categories: taxonomy

Counts/sample detail:
0.WipesKoipondLgWaterfall.1: 16.0
0.WipesKoipondLFilter.1: 18.0
[...snip...]
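
The mean in the counts/sample summary can be cross-checked from the totals printed at the top of the same output. The following is a minimal sketch in plain Python, assuming only the numbers shown above; the variable names are illustrative and not part of the biom API.

```python
# Values copied from the `biom summarize-table` output shown above.
num_samples = 95
total_count = 10223009

# Mean counts per sample = total count / number of samples.
# float() keeps the division exact under Python 2.7 as well as Python 3.
mean_per_sample = total_count / float(num_samples)

print("Mean: %.3f" % mean_per_sample)
```

The printed value should agree with the "Mean" line reported by summarize-table (107610.621).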

When one of the Python 2.7 modules is loaded, the biom command line tool also becomes available (though its version may vary over time), as does the API. For example:

helix$ module load python/2.7.9
helix$ which biom
/usr/local/Anaconda/envs/py2.7.9/bin/biom
helix$ python
Python 2.7.9 |Continuum Analytics, Inc.| (default, Apr 14 2015, 12:54:25) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> import biom
>>> table = biom.load_table("test.biom")
>>> table
67900 x 95 <class 'biom.table.Table'> with 360783 nonzero entries (5% dense)
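
The density figure can be reproduced from the table dimensions and the number of nonzero entries reported above. This is a small arithmetic sketch in plain Python, assuming density means nonzero entries divided by the total number of cells; the variable names are illustrative and it does not call the biom API.

```python
# Values copied from the table representation shown above.
num_observations = 67900
num_samples = 95
nonzero_entries = 360783

# Total number of cells in the observation x sample matrix.
total_cells = num_observations * num_samples

# Fraction of cells holding a nonzero count.
density = nonzero_entries / float(total_cells)

print("Density: %.3f" % density)
```

Rounded to three decimals this gives 0.056, consistent with the "Table density" line from summarize-table.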

Interactive job on Biowulf

For more intensive processing, please use an interactive session on the cluster. For example:

b2$ sinteractive
node$ module load biom-format
node$ biom convert -i large.biom -o large_hdf5.biom --to-hdf5

Documentation