NIH HPC News & Announcements
Biowulf 20th anniversary seminar series: Videocast now available
Date: 04 March 2019 11:03:13
From: Susan Chacko
For those who missed the excellent talks at the opening session of the
Biowulf 20th anniversary seminar series last Thursday, the session is
now available on the NIH Videocast Archive:
https://videocast.nih.gov/Summary.asp?File=27344&bhcp=1
Talks:
Biowulf at 20: Celebrating Two Decades of Supporting Biomedical Computing in the NIH IRP
(Andy Baxevanis, Director of Biomedical Computing, NIH IRP and Senior Scientist, NHGRI)
State of the Cluster: Past, Present and Future
(Steven Fellini, Biowulf Architect)
Telomere-to-telomere assembly of a complete human X chromosome
(Sergey Koren, NHGRI)
Abstract: Release of the first human genome assembly was a landmark
achievement, and after nearly two decades of improvements, the current
human reference genome (GRCh38) is the most accurate and complete
vertebrate genome ever produced. However, no one chromosome has yet been
finished end to end, and hundreds of gaps persist across the genome.
These unresolved regions include segmental duplications, ribosomal RNA (rRNA)
gene arrays, and satellite arrays that harbor unexplored variation of
unknown consequence. We aim to finish these remaining regions and
generate the first truly complete assembly of a human genome.
Here we announce a whole-genome de novo assembly that surpasses the
continuity of GRCh38, along with the first complete,
telomere-to-telomere assembly of a human X chromosome. In total, we
collected 40X coverage of ultra-long Oxford Nanopore sequencing for the
CHM13hTERT cell line, including 44 Gb of sequence in reads >100 kb and a
maximum read length exceeding 1 Mb. This unprecedented coverage of
ultra-long reads enabled the resolution of most repeats in the genome,
including large fractions of the centromeric satellite arrays and short
arms of the acrocentrics. A de novo assembly combining this nanopore
data with 70X of existing PacBio data achieved an NG50 contig size of 75
Mb (compared to 56 Mb for GRCh38), with some chromosomes broken only at
the centromere. Using this assembly as a basis, we chose to manually
finish the X chromosome. The few unresolved segmental duplications were
assembled using ultra-long reads spanning the individual copies, and the
~2.7 Mb X centromere was assembled by identifying unique variants
within the array and using these to anchor overlapping ultra-long reads.
These results demonstrate that it is now possible to finish entire human
chromosomes without gaps, and our future work will focus on completing
and validating the remainder of the genome.
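
Note on the NG50 figures quoted above: NG50 is commonly defined as the
length of the shortest contig in the smallest set of longest contigs that
together cover at least half of the estimated genome size (N50 uses the
total assembly length instead). The following is a minimal sketch of that
definition, not the authors' pipeline; the function name and the toy
contig lengths are illustrative assumptions.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly, or None if it cannot be computed.

    contig_lengths: iterable of contig lengths (any consistent unit).
    genome_size: estimated genome size in the same unit (an assumption
                 supplied by the caller, not derived from the assembly).
    """
    half = genome_size / 2
    running_total = 0
    # Walk contigs from longest to shortest, accumulating length.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            # This contig is the one that crosses the 50% mark.
            return length
    # The assembly covers less than half of the genome size.
    return None


# Toy example (hypothetical lengths, not real assembly data).
toy_contigs = [50, 40, 30, 20, 10]
toy_ng50 = ng50(toy_contigs, genome_size=200)
toy_n50 = ng50(toy_contigs, genome_size=sum(toy_contigs))
print(toy_ng50, toy_n50)
```

Because NG50 is normalized by genome size rather than assembly size, it
allows the 75 Mb and 56 Mb values above to be compared directly even
though the two assemblies differ in total length.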
########################################################################
Please contact staff@hpc.nih.gov with any questions about the NIH HPC Systems.