Generating a complete human genome sequence, chromosome by chromosome.
The current human reference genome assembly, known as Genome Reference Consortium Human Build 38 (GRCh38 or hg38), does not contain a complete sequence of the human genome. The missing portions of the genome mostly lie in heterochromatic regions and near the centromeres and telomeres, and their sequences were never determined due to difficulties in mapping, cloning, or assembling the reads. Though absent from the reference, these sequences are known to contain genes and other functional elements that may be relevant to human health and disease.
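In FASTA releases of a reference assembly, unresolved stretches are conventionally written as runs of the letter N. As a minimal sketch of how such gaps could be located, the snippet below scans a sequence string for N runs and totals them. The function names and the toy sequence are hypothetical illustrations, not part of any published pipeline, and a real chromosome would be read from a FASTA file rather than typed inline.

```python
import re
from typing import List, Tuple


def find_gaps(sequence: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals for each run of N/n.

    Assumption: gaps are encoded as consecutive 'N' (or soft-masked 'n')
    characters, as is conventional in reference FASTA files.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", sequence)
        if m.end() - m.start() >= min_len
    ]


def gap_summary(sequence: str) -> Tuple[int, int]:
    """Return (number of gaps, total gap length in bases)."""
    gaps = find_gaps(sequence)
    return len(gaps), sum(end - start for start, end in gaps)


# Toy sequence (hypothetical): two gaps of 10 and 5 bases.
toy = "ACGT" + "N" * 10 + "GGCC" + "n" * 5 + "TTAA"
print(find_gaps(toy))
print(gap_summary(toy))
```

Run on a complete telomere-to-telomere chromosome, a scan like this would be expected to report no gaps at all.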
The first complete telomere-to-telomere sequence of a human chromosome, the X chromosome, was published in July 2020. Using new sequencing technologies from PacBio and Oxford Nanopore, researchers generated high-coverage, ultra-long reads spanning hundreds of thousands of base pairs, which helped overcome some of the challenges of assembling the chromosome sequence.
The new X chromosome sequence comes from the CHM13 (complete hydatidiform mole) cell line, which is uniformly homozygous and has a 46,XX karyotype. Using this effectively haploid genome avoided the need to assemble both haplotypes of a "normal" diploid genome.
The X chromosome is linked to a number of diseases such as haemophilia, chronic granulomatous disease, and Duchenne muscular dystrophy. Closing the gaps in the X chromosome sequence assembly marks an important milestone in genomics and medical genetics.