What is a gene?

Scientists don’t always agree on what a gene is. More surprising, 20+ years after the launch of the Human Genome Project, we still don’t know exactly how many genes we have. The problem is part technology and part definition.

The question of what a gene is and the challenge of finding and counting genes turn out to be linked in a thought-provoking way: think of a whimsical concept paired with a technology challenge.

First, the technology challenge. The original draft of the Human Reference Genome was assembled with short-read sequencing technology. That early tech couldn’t map regions considered “dark,” such as large duplications and long repetitive sequences. New long-read sequencing machines are decoding the dark zones and, “eureka!”, finding new genes.

How many genes? Most sources tasked with gene counting identify only protein-coding genes on our chromosomes. Many genes don’t code for protein, yet these non-coding genes perform important functions. Until long-read tech becomes the standard, new genes and what they do await discovery. For now (summer 2021), this is the latest count of chromosomal genes:

Protein-coding genes: 19,969
Non-coding genes: 23,982

What about the number of genes in our mitochondria (mtDNA)? Scientists are pretty sure mtDNA has 37 genes and that they all code for essential functions:

Mitochondrial genes: 37
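
For readers who want to tally the figures quoted above, here is a minimal Python sketch. The variable names are illustrative only, and the numbers are simply the summer 2021 counts listed in this article, which may change as long-read sequencing finds more genes.

```python
# Gene counts as quoted in this article (summer 2021); expected to change over time.
counts = {
    "protein_coding": 19_969,  # chromosomal, protein-coding
    "non_coding": 23_982,      # chromosomal, non-coding
    "mitochondrial": 37,       # mtDNA
}

# Total genes on the chromosomes, then the total including mtDNA.
chromosomal_total = counts["protein_coding"] + counts["non_coding"]
overall_total = chromosomal_total + counts["mitochondrial"]

# Fraction of chromosomal genes that code for protein.
protein_coding_share = counts["protein_coding"] / chromosomal_total

print(f"Chromosomal genes: {chromosomal_total:,}")
print(f"Including mtDNA:   {overall_total:,}")
print(f"Protein-coding share of chromosomal genes: {protein_coding_share:.1%}")
```

By this count, fewer than half of the chromosomal genes code for protein, which is one reason a protein-only definition of a gene falls short.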

What is a gene? The textbook definition says a gene is a stretch of DNA that codes for a protein. But some genes don’t. Textbooks also say a gene is a linear sequence of base pairs that starts in one place and ends at another. But the exact what and where turn out to be fuzzy.

Genes have parts, and those parts can sit far away from each other on the long strands of DNA. The strands bend and turn to bring the parts closer together so that a gene can activate. This is part of what is known as gene expression and regulation.

A new definition: a gene is a concept. It is not just one simple thing but a collection of interrelated functions that bind us together.

It’s a whimsical and abstract perspective that Dr. Thomas Gingeras, Principal Investigator, NIH ENCODE Project, favors. Why a conceptual definition rather than a concrete one? Because the adjusted perspective opens a portal to expanding our understanding of biology, provoking new questions that long-read tech might be able to answer.

Watch Dr. Gingeras’s lecture “Gene is a Concept” about the evolving definition as gene functions are better understood (start at 11:01 min).