The ambitious challenge of finishing the human genome

Generating a complete human genome sequence, chromosome by chromosome.

Published by Anton Vasetenkov

The current human reference genome assembly, known as Genome Reference Consortium Human Build 38 (GRCh38 or hg38), does not contain a complete sequence of the human genome. The missing portions of the genome mostly lie in heterochromatic regions and near the centromeres and telomeres, and their sequences were never determined due to difficulties in mapping, cloning, or assembling the reads. Though absent from the reference, these sequences are known to contain genes and other functional elements that may be relevant to human health and disease.
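To make the idea of "missing portions" concrete: in reference FASTA files such as GRCh38, unresolved stretches are conventionally written as runs of the letter N. The snippet below is a minimal sketch, not part of any published pipeline, that locates such runs in a sequence string. The function name `find_gaps`, the `min_length` parameter, and the toy sequence are all hypothetical and chosen purely for illustration.

```python
import re
from typing import Iterator, Tuple


def find_gaps(sequence: str, min_length: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield 0-based, half-open (start, end) intervals of N runs.

    Assumption: gaps are encoded as consecutive 'N' (or 'n') characters,
    as is conventional in reference FASTA files.
    """
    for match in re.finditer(r"[Nn]+", sequence):
        if match.end() - match.start() >= min_length:
            yield match.start(), match.end()


# Toy sequence (hypothetical, not real genomic data).
toy = "ACGTNNNNNACGTTTNNACGT"
gaps = list(find_gaps(toy))
gap_bases = sum(end - start for start, end in gaps)
print(f"{len(gaps)} gaps covering {gap_bases} of {len(toy)} bases: {gaps}")
```

Running the same function over each chromosome record of a real assembly would give a rough picture of where the unresolved regions cluster, for example around the centromeres.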

The first complete telomere-to-telomere sequence of a human chromosome, the X chromosome, was published in July 2020. Using new sequencing technologies from PacBio and Oxford Nanopore, researchers generated high-coverage, ultra-long reads spanning hundreds of thousands of base pairs, which helped overcome some of the challenges of assembling the chromosome sequence.

The new X chromosome sequence comes from the CHM13 (complete hydatidiform mole) cell line, which is uniformly homozygous and has a 46,XX karyotype. Using this effectively haploid genome avoided the need to assemble both haplotypes of a "normal" diploid genome.

The X chromosome is linked to a number of diseases such as haemophilia, chronic granulomatous disease, and Duchenne muscular dystrophy. Closing the gaps in the X chromosome sequence assembly marks an important milestone in genomics and medical genetics.
