The biom-format Python package contains a command line tool to manipulate and convert biom format files. It also includes an API for programmatically manipulating biom files. It is therefore installed both as an independent application and as part of the Python environments.
Allocate an interactive session and run the program. Sample session (user input in bold):
[user@biowulf]$ sinteractive
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job
[user@cn3144 ~]$ module load biom-format
[user@cn3144 ~]$ TD=/usr/local/apps/biom-format/TEST_DATA
[user@cn3144 ~]$ biom head -i $TD/phinch_testdata.biom
# Constructed from biom file
#OTU ID 0.IntakeWater.1 0.IntakeWater.3 0.IntakeWater.2 27.WaterCoralpond.2
228057 3.0 6.0 3.0 1.0 0.0
988537 0.0 0.0 0.0 2.0 0.0
89370 0.0 0.0 0.0 1.0 1.0
2562097 0.0 0.0 0.0 0.0 0.0
256904 8.0 0.0 1.0 32.0 58.0
[user@cn3144 ~]$ biom convert -i $TD/phinch_testdata.biom -o test.biom --to-hdf5
[user@cn3144 ~]$ module load hdf5
[user@cn3144 ~]$ h5ls test.biom
observation Group
sample Group
[user@cn3144 ~]$ h5ls test.biom/sample
group-metadata Group
ids Dataset {95}
matrix Group
metadata Group
[user@cn3144 ~]$ biom summarize-table -i test.biom
Num samples: 95
Num observations: 67900
Total count: 10223009
Table density (fraction of non-zero values): 0.056
Counts/sample summary:
Min: 16.0
Max: 1106184.0
Median: 65463.000
Mean: 107610.621
Std. dev.: 164313.408
Sample Metadata Categories: description; alkalinity; material; ammonium; nitrite;
LinkerPrimerSequence; sulfide; BarcodeName; InternalCode; temp;
collection_date; BarcodeSequence; salinity; phosphate; ReverseBarcode; nitrate;
ph; ReverseName; ReversePrimerSequence; Hardness; diss_oxygen
Observation Metadata Categories: taxonomy
Counts/sample detail:
0.WipesKoipondLgWaterfall.1: 16.0
0.WipesKoipondLFilter.1: 18.0
[...snip...]
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
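The figures reported by biom summarize-table can be cross-checked with a few lines of Python. The following is a minimal sketch, not part of the biom package: parse_summary is a hypothetical helper, and the text it parses is an excerpt copied from the session above. It assumes that each statistic appears on its own line as "name: number".

```python
# Excerpt of the `biom summarize-table` output from the session above.
SUMMARY = """\
Num samples: 95
Num observations: 67900
Total count: 10223009
Table density (fraction of non-zero values): 0.056
Counts/sample summary:
 Min: 16.0
 Max: 1106184.0
 Median: 65463.000
 Mean: 107610.621
 Std. dev.: 164313.408
"""


def parse_summary(text):
    """Hypothetical helper: collect 'name: number' lines into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        # Split on the last colon so names containing punctuation stay intact.
        key, _, value = line.strip().rpartition(":")
        try:
            stats[key] = float(value)
        except ValueError:
            # Section headers such as 'Counts/sample summary:' carry no number.
            continue
    return stats


stats = parse_summary(SUMMARY)

# The mean count per sample should equal total count / number of samples.
mean = stats["Total count"] / stats["Num samples"]
print(round(mean, 3))
```

The computed mean agrees with the reported "Mean" value to three decimal places.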
When one of the python 2.7 modules is loaded, the biom
command line tool also becomes available (though the version may vary
over time), as does the API. For example:
[user@cn3144 ~]$ module load python
[user@cn3144 ~]$ which biom
/usr/local/Anaconda/envs/py2.7.9/bin/biom
[user@cn3144 ~]$ python
Python 2.7.9 |Continuum Analytics, Inc.| (default, Apr 14 2015, 12:54:25)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://binstar.org
>>> import biom
>>> table = biom.load_table("test.biom")
>>> table
67900 x 95 <class 'biom.table.Table'> with 360783 nonzero entries (5% dense)
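The density shown in the table summary is the fraction of matrix cells that are nonzero. As a small worked check (plain Python arithmetic, with the numbers copied from the output above rather than read from the file), the value can be recomputed as follows:

```python
# Values copied from the session output above.
n_observations = 67900
n_samples = 95
nonzero_entries = 360783

# Density = nonzero cells / total cells in the observation x sample matrix.
total_cells = n_observations * n_samples
density = nonzero_entries / float(total_cells)

print("%.3f" % density)       # three decimals, as in the summarize-table output
print(int(density * 100))     # whole percent, as in the table summary line
```

The result is consistent with both the 0.056 reported by biom summarize-table and the "5% dense" shown when the table is printed in the interpreter.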