Plumbing the ‘dark’ genome for new genes

On 26 June 2000, then U.S. President Bill Clinton announced the completion of a draft sequence of the human genome, a historic landmark for genetic research. The Human Genome Project helped map our genes, strengthened the study of human diseases and aided new drug discovery. But even after two decades, the number of ‘known’ genes, encoding around 20,000 ‘known’ proteins, has remained constant. It also remains a conundrum why only 1.5% of the entire human genome codes for proteins.

A team from the University of Cambridge set out to find out whether new genes emerge in the genomes of living organisms and, if so, how.

Over the last seven years, the team has extensively studied the human genome and has now catalogued 194,000 novel regions. The results were published in Genome Research.

Novel regions

“These ‘novel’ genomic regions cannot be defined by our current ‘definition’ of a gene. Hence, we call these novel regions novel Open Reading Frames, or nORFs. We show that mutations in nORFs do have physiological consequences and that a majority of mutations that are often annotated as benign have to be re-interpreted,” explains lead author Sudhakaran Prabakaran from the Laboratory of Noncoding genome and Data Science at the University.

When asked why these regions could not be seen or found earlier, he said that new technologies developed over the last 10 years have made it possible to examine the entire genome more closely. “For example, if you were to look at a mountain range from the top, you will only see the peaks. But as the resolution of the technology improves, you will see things that are present in the lower peaks and you can see the valleys. So, new genomic and proteomic technologies and algorithms have enabled us to see the complete landscape of everything that is being made from the human genome,” he explains.

Connected to disease

The team found that these regions are also broadly involved in diseases. The nORFs were found to be dysregulated in 22 cancer types. ‘Dysregulated’ here means that they could be mutated, upregulated or downregulated, or present only in the diseased tissue.

A paper published last month by the team in npj Genomic Medicine noted that these regions were present only in the cancer tissues and absent from the control tissue. The team also found that some nORF disruptions strongly correlated with the survival of patients. “More importantly, we show that nORF proteins can form structures, can undergo biochemical regulation like known proteins and be targeted by drugs in case they are disrupted in diseases,” adds Dr. Prabakaran.

The researchers also identified these nORFs in Plasmodium falciparum, the parasite which causes the deadliest form of malaria. The results were published last week in Malaria Journal.

The finding points to an urgent need to redesign existing drugs, which target only the known proteins in the parasite.

The team is now systematically ‘mining’ the dark genome to identify more such novel proteins and is investigating whether they could be involved in disease processes. They have also identified 50 such novel proteins disrupted in schizophrenia and bipolar disorder; these results are yet to be peer-reviewed and published. The researchers are confident that these novel proteins hold the key to diagnosing and treating complex diseases.