Building a beautiful Pangenome
Cool tech and ingenuity are building a beautiful Pangenome. It’s a reference genome representing the full spectrum of human genetic variation.
For more than twenty years, geneticists have been building and refining the Reference Genome, which began with the Human Genome Project, to give “the world a resource of detailed information about the structure, organization, and function of the complete set of human genes.”
Looking into the nano-spaces of our genome to get precious information has been tough because DNA is bundled tightly into chromosomes and packed into the tiny nucleus in the cell. Especially difficult to unravel are the telomeres and centromeres.
Telomeres cap the ends of our chromosomes. Centromeres are the knotted-looking sections, often near the middle, where the two arms of a chromosome meet. Both contain thousands of tightly wrapped, repeated segments of DNA and hold essential information.
During the past two decades, scientists used next-generation (“NextGen”) sequencing technology to “see” and “read” the exact sequence of A, T, C, and G letters on the chromosomes. But NextGen is a short-read method with fundamental limitations. It leaves gaps in the sequence data. It can’t make sense of the tightly wrapped, highly repetitive telomeres and centromeres, nor can it fully sequence many repetitive non-coding regions.
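To see why read length matters so much in repetitive regions, here is a small, idealized sketch in Python. Everything in it is an assumption made for illustration: the toy "chromosomes," the flanking sequences, the read lengths, and the helper names `make_chromosome` and `reads` are all hypothetical, and real sequencing also involves errors and coverage depth that this ignores. It only shows that reads shorter than a repeated stretch cannot tell how many copies of the repeat are present.

```python
def make_chromosome(repeat_copies):
    """Build a toy sequence: a unique start, a repeated block, a unique end.

    TTAGGG is the repeat unit found in human telomeres; the flanking
    sequences are made up for this example.
    """
    return "GATTACA" + "TTAGGG" * repeat_copies + "CCGTAGC"


def reads(sequence, read_length):
    """Return every distinct error-free read of the given length."""
    last_start = len(sequence) - read_length
    return {sequence[i:i + read_length] for i in range(last_start + 1)}


ten_copies = make_chromosome(10)
twenty_copies = make_chromosome(20)

# Short reads (8 letters) look identical for both sequences,
# so the number of repeat copies cannot be recovered from them.
print(reads(ten_copies, 8) == reads(twenty_copies, 8))

# Long reads (70 letters) can span the whole repeat in the shorter
# sequence, so the two sequences no longer look the same.
print(reads(ten_copies, 70) == reads(twenty_copies, 70))
```

In this toy setup the short reads from the two sequences are indistinguishable, while reads long enough to span the repeat tell them apart. That is the basic reason short-read data leaves gaps in telomeres and centromeres.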
With long-read tech, scientists can “see every crater, every color, from something that [we] only had the blurriest understanding of before.” They are also making thrilling discoveries: “researchers have uncovered more than 100 new genes that may be functional and have identified millions of genetic variations between people. Some of those differences probably play a role in diseases.”
Advanced sequencing technology also plays a role in addressing the diversity problem. The DNA in the current Reference Genome comes mostly from people of European ancestry, so it lacks the full spectrum of worldwide genetic variation. To fix that, scientists are building a Human Pangenome with long-read tech. They’re accumulating so much data that three efforts have been set up to sequence, map, and analyze it:
The Telomere-to-Telomere (T2T) Project fills the gaps by sequencing chromosomes end-to-end from people worldwide with diverse ancestry. They’ve already identified “hundreds of thousands of novel variants per sample, a new frontier for evolutionary and biomedical discovery.” More about the T2T project here.
The Human Genome Reference Program will gather “high-quality gapless sequences from ancestrally diverse people.” It will consolidate data from many sources, such as the T2T project. Within the program, other researchers are improving sequencing technology, and bioinformatics scientists are designing tools for better data analysis. More about the program here.
The Human Pangenome Reference Consortium “aims to issue a new pangenome reference assembly to the international community that reflects the full range of genomic diversity across the globe. We are committed to achieving this goal in a responsible and ethical manner, with explicit attention to community engagement, inclusion, and fair representation.” More about the consortium here.
Put them together, and voilà! Data pulled from such a wide variety of sources will take years to sort and analyze. The result won’t be just one Reference Genome. What started more than twenty years ago with “the one” will expand into a worldwide compilation of hundreds of data sets: beautiful Pangenomes for a unified representation of the human species.
Finding wonder and beauty in the science of us
From an artistic perspective, we could say scientists are gathering all the subtle hues of human variation. And we are mining the new data for hidden gems. What can we imagine with the full spectrum of human genetic diversity? What would you write, draw, compose, or sing about your genome?
The image at the top is an imaginary interpretation of the diversity of the human species. Like a flowering garden under a multi-colored sky, your genome holds undiscovered variety, offering a treasure of possibilities.