Biowulf High Performance Computing at the NIH
hgvs on Biowulf

hgvs is a Python package to parse, format, validate, normalize, and map biological sequence variants according to recommendations of the Human Genome Variation Society.

The hgvs library parses, represents, formats, and maps variants between genome, transcript, and protein sequences. The current implementation handles most (but not all) of the varnomen standard for precisely defined sequence variants. The intent is to centralize the subset of HGVS variant manipulation that is routinely used in modern, high-throughput sequencing analysis.

Documentation
Important Notes

Interactive job

This application runs only on Helix, not on Biowulf.

[user@biowulf]$ ssh helix
[user@helix]$ APPS=/usr/local/apps 
[user@helix]$ export PATH=$APPS/hgvs/1.5.1/bin:$PATH 
[user@helix]$ export LD_LIBRARY_PATH=$APPS/hgvs/1.5.1/lib:$LD_LIBRARY_PATH 
[user@helix]$ export PYTHONPATH=$APPS/hgvs/1.5.1/lib/python3.6/site-packages:$PYTHONPATH
[user@helix]$ export SEQREPO_ROOT_DIR=/fdb/hgvs/seqrepo 
[user@helix]$ hgvs-shell 
...
Python 3.6.10 |Anaconda, Inc.| (default, May  8 2020, 02:54:21)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.14.0 -- An enhanced Interactive Python. Type '?' for help.

############################################################################
hgvs-shell -- interactive hgvs
hgvs version: 1.5.1
data provider url: postgresql://anonymous:anonymous@uta.biocommons.org/uta/uta_20180821
schema_version: 1.1
data_version: uta_20180821
sequences source: bioutils.seqfetcher (network fetching)

The following variables are defined:
* global_config
* hp, parser, hgvs_parser -- Parser instance
* hdp, hgvs_data_provider -- UTA data provider instance
* vm, variant_mapper, hgvs_variant_mapper -- VariantMapper instance
* am37, hgvs_assembly_mapper_37 -- GRCh37 Assembly Mapper instance
* am38, projector, hgvs_assembly_mapper_38 -- GRCh38 Assembly Mapper instances
* hn, normalizer, hgvs_normalizer -- Normalizer instance
* hv, validator, hgvs_validator) -- Validator instance

The following functions are available:
  * parse, normalize, validate
  * g_to_c, g_to_n, g_to_t,
  * c_to_g, c_to_n, c_to_p,
  * n_to_c, n_to_g,
  * t_to_g,
  * get_relevant_transcripts

When submitting bug reports, include the version header shown above
and use these variables/variable names whenever possible.
Parsing and formatting:
In [1]: import hgvs.parser

In [2]: hgvs_g = 'NC_000007.13:g.36561662C>T'

In [3]: hgvs_c = 'NM_001637.3:c.1582G>A'

In [4]: hp = hgvs.parser.Parser()

In [5]: var_g = hp.parse_hgvs_variant(hgvs_g)

In [6]: var_g
Out[6]: SequenceVariant(ac=NC_000007.13, type=g, posedit=36561662C>T, gene=None)

In [7]: var_g.posedit.pos.start
Out[7]: SimplePosition(base=36561662, uncertain=False)
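
The parsed object above exposes the accession, the coordinate type, and the position/edit as separate fields. To make that anatomy concrete, here is a minimal standard-library sketch that splits and re-formats a simple substitution string. It is not the hgvs parser: the names `SimpleSubstitution`, `split_simple_substitution`, and `format_simple_substitution` are hypothetical, and the regular expression handles only single-base substitutions at plain integer positions. Use `hp.parse_hgvs_variant` for real work.

```python
import re
from typing import NamedTuple


class SimpleSubstitution(NamedTuple):
    """Hypothetical container, loosely mirroring the fields shown in Out[6]."""
    ac: str    # sequence accession, e.g. 'NC_000007.13'
    kind: str  # coordinate type, e.g. 'g' (genomic) or 'c' (coding)
    pos: int   # 1-based position
    ref: str   # reference base
    alt: str   # alternate base


# Assumption: only single-base substitutions with no intronic offsets.
_SUBST_RE = re.compile(
    r"^(?P<ac>[^:]+):(?P<kind>[cgmn])\.(?P<pos>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def split_simple_substitution(text: str) -> SimpleSubstitution:
    """Split 'ACCESSION:type.posREF>ALT' into its parts."""
    m = _SUBST_RE.match(text)
    if m is None:
        raise ValueError(f"not a simple substitution: {text!r}")
    return SimpleSubstitution(
        m["ac"], m["kind"], int(m["pos"]), m["ref"], m["alt"]
    )


def format_simple_substitution(v: SimpleSubstitution) -> str:
    """Rebuild the string form from the parts."""
    return f"{v.ac}:{v.kind}.{v.pos}{v.ref}>{v.alt}"


v = split_simple_substitution("NC_000007.13:g.36561662C>T")
print(v.ac, v.kind, v.pos, v.ref, v.alt)
print(format_simple_substitution(v))
```

Anything beyond this narrow case (deletions, duplications, intronic offsets, uncertain positions) raises `ValueError` in the sketch, which is why the full parser is preferable in practice.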
Projecting ("Mapping") variants between aligned genome and transcript sequences:
In [8]: import hgvs.dataproviders.uta

In [9]: import hgvs.assemblymapper

# initialize the mapper for GRCh37 with splign-based alignments

In [10]: hdp = hgvs.dataproviders.uta.connect()

In [11]: am = hgvs.assemblymapper.AssemblyMapper(hdp, assembly_name='GRCh37', alt_aln_method='splign', replace_reference=True)

# identify transcripts that overlap this genomic variant

In [12]: transcripts = am.relevant_transcripts(var_g)

In [13]: sorted(transcripts)
Out[13]: ['NM_001177506.1', 'NM_001177507.1', 'NM_001637.3']

# map genomic variant to one of these transcripts

In [14]: var_c = am.g_to_c(var_g, 'NM_001637.3')

In [15]: var_c
Out[15]: SequenceVariant(ac=NM_001637.3, type=c, posedit=1582G>A, gene=None)

In [16]: str(var_c)
Out[16]: 'NM_001637.3:c.1582G>A'

# CDS coordinates use BaseOffsetPosition to support intronic offsets

In [17]: var_c.posedit.pos.start
Out[17]: BaseOffsetPosition(base=1582, offset=0, datum=Datum.CDS_START, uncertain=False)
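
The `base`, `offset`, and `datum` fields in the output above are what allow a CDS coordinate to point into an intron or an untranslated region. The following standard-library sketch illustrates the idea. It is a simplified model, not the hgvs class: `OffsetPosition` and this two-member `Datum` enum are hypothetical stand-ins, and uncertain positions are not handled.

```python
from dataclasses import dataclass
from enum import Enum


class Datum(Enum):
    """Hypothetical stand-in for the reference point of a CDS coordinate."""
    CDS_START = 1  # counted from the start of the coding sequence
    CDS_END = 2    # counted from the end (3' UTR, written with '*')


@dataclass(frozen=True)
class OffsetPosition:
    """Simplified model of a base-plus-offset position."""
    base: int
    offset: int = 0
    datum: Datum = Datum.CDS_START

    @property
    def is_intronic(self) -> bool:
        # A non-zero offset means the position lies off the transcript,
        # that many bases away from the exonic base it is anchored to.
        return self.offset != 0

    def __str__(self) -> str:
        base_str = f"*{self.base}" if self.datum is Datum.CDS_END else str(self.base)
        offset_str = f"{self.offset:+d}" if self.offset else ""
        return base_str + offset_str


print(OffsetPosition(1582))                    # exonic position
print(OffsetPosition(1582, offset=5))          # 5 bases into the following intron
print(OffsetPosition(12, datum=Datum.CDS_END)) # 3' UTR position
```

With an offset of 0, as in `Out[17]`, the position is exonic and prints as a plain number.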
End the interactive session:
In [18]: quit()

[user@helix ~]$ exit
[user@biowulf ~]$