Building a beautiful Pangenome
Cool tech and human ingenuity are building a beautiful Pangenome. It has a curious backstory. For twenty years, geneticists at the Human Genome Project have been looking into the nano-spaces of our genome to get at the precious information. It’s been tough because DNA is bundled tightly into chromosomes and packed into the tiny nucleus in the cell.
They used next-generation (“NextGen”) sequencing technology to “see” and “read” the exact ATCG sequence on the chromosomes. The telomeres (the bits at the ends of our chromosomes) and the centromeres are especially difficult: both contain thousands of repeated segments of DNA.
Nevertheless, over the past two decades scientists slowly built the Reference Genome to give “the world a resource of detailed information about the structure, organization, and function of the complete set of human genes.”
Scientists aren’t satisfied with the twenty-year-old Reference Genome. They’ve been frustrated with the limits of next-generation “short-read” technology. There are gaps in the sequence data. The technology couldn’t make sense of long repeats or see into the “dark zones.” Furthermore, a single Reference Genome has a diversity problem: it lacks the full spectrum of worldwide genetic variation.
From an artistic perspective, we could say scientists are looking for all the colors of human variation. What a fantastic painter’s palette to have the full spectrum of human genetic diversity.
Now, with new “long-read” tech, they can “see every crater, every color, from something that [we] only had the blurriest understanding of before.” And they are making thrilling discoveries: “researchers have uncovered more than 100 new genes that may be functional and have identified millions of genetic variations between people. Some of those differences probably play a role in diseases.”
Building the Pangenome. Data from the new long-read tech is the infrastructure for a new reference genome, the Human Pangenome. There is so much data (it’s called Big Data) that two projects have been set up to sequence, map, and analyze it:
1) The Telomere-to-Telomere (T2T) Project fills the gaps by sequencing chromosomes end to end from people worldwide with diverse ancestry. They’ve already identified “hundreds of thousands of novel variants per sample — a new frontier for evolutionary and biomedical discovery.” More about the T2T project here.
2) The Human Genome Reference Program will gather “high-quality gapless sequences from ancestrally diverse people.” It will consolidate data from many sources, such as the T2T project. Other researchers are improving sequencing technology, and bioinformatics scientists are designing tools for better data analysis. More about the program here.
Put them together, and voila! Sorting and analyzing data pulled from such a wide variety of sources will take years. But no more relying on just one Reference Genome. What started twenty years ago will expand into a worldwide compilation of hundreds of data sets for a unified representation of the human species: a beautiful Pangenome.
Finding wonder and beauty in the science-of-us. From an artistic perspective, we could say scientists are gathering all the subtle hues of human variation. And we are mining the new data for hidden gems. What can we imagine with the full spectrum of human genetic diversity? What would you write, draw, compose, or sing about your genome?
Tips: Follow the science. Get your whole-genome data. And your inner artist and scientist will shine.