Exploring the Human Genome: The Differences are in the Details
March, 2004
Click. Click. Not much more is needed to generate detailed online maps for any region of the globe. However, creating a high-quality map of the human genome is far more complex than a simple mouse-click on your favorite website. The human genome is 3.1 billion bases long, and working out what each of those bases actually does is one of the most challenging questions a biologist can ask.
For the most part, scientists have been restricted to studying the ~30,000 regions of DNA thought to code for protein. The remaining 98% of our DNA has long been considered silent and its function has not been thoroughly studied - largely because the technology needed to globally examine those billions of bases has not been available.
However, new collaborative efforts, such as the ENCODE (Encyclopedia of DNA Elements) project, are now attempting to examine these poorly understood regions of DNA. As part of this international consortium, Affymetrix is applying innovative whole-genome analysis technologies to explore the organization and function of the complete human genome.
To this end, Dr. Tom Gingeras and his team at Affymetrix Research Laboratories have recently developed a new type of GeneChip® brand microarray which uses genomic sequence to "tile" millions of DNA probes representative of the complete genome, not just the 2% historically considered to be most important. The research group is using these tiling arrays to examine the inner workings of our genome and find out how the sequence of our DNA functions to create life. The team of scientists has already demonstrated that transcription (the conversion of DNA into RNA) occurs in large regions of the genome previously considered to be silent. Their work raised the possibility that the human genome has hidden levels of complexity that we are only beginning to understand.
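To make the idea of "tiling" concrete, here is a minimal sketch of how probes might be laid out at regular intervals along a sequence. It is only an illustration: the function name tile_probes, the probe length, the spacing, and the toy sequence are all assumptions chosen for the example and are not taken from the research described here.

```python
def tile_probes(sequence, probe_length=25, step=35):
    """Return (start, probe) pairs laid out at a fixed spacing along `sequence`.

    probe_length and step are illustrative assumptions, not values
    reported in this article.
    """
    probes = []
    # Stop early enough that every probe is full length.
    for start in range(0, len(sequence) - probe_length + 1, step):
        probes.append((start, sequence[start:start + probe_length]))
    return probes


# Toy 100-base "genome" (made-up sequence, for illustration only).
toy_genome = "ACGT" * 25

for start, probe in tile_probes(toy_genome):
    print(start, probe)
```

The point of the sketch is that the probes are chosen by position alone, without reference to where known genes lie, which is what allows such an array to report on regions that were never annotated as coding.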
In the February 17, 2004 issue of the journal Cell, the research group provides additional information about these hidden transcripts, fundamentally changing our view of how the genome functions. The scientists used GeneChip tiling arrays to identify transcription factor binding sites (regions of DNA that regulate RNA production) on human chromosomes 21 and 22. They discovered a large number of unexpected binding sites distributed across the two chromosomes, suggesting a complex level of transcription and regulation, and challenging our long-held notion that a genome is made up of relatively small regions of coding DNA connected by long distances of "junk."
In fact, only about a quarter of the binding sites were located in their expected positions at the start of a gene. To help explain why the majority of these sites were unexpectedly located inside a gene, at the end of a gene or in a "junk" part of the genome, the research team used tiling arrays to examine transcription across the same two chromosomes. They found that most of the binding sites are not randomly placed, but actually mark the start of many new RNA transcripts, both within and between known coding genes. These transcripts have very little potential to encode proteins, but Gingeras' team discovered that many of them are expressed at the same time as the coding transcripts and regulated together with them, suggesting that each such group of RNAs functions in concert.
"We can no longer think of a gene as a simple region of DNA that transcribes RNA for the sole purpose of making proteins," explains Gingeras. "The reality is that a single gene may be a large region of DNA from which a whole cast of RNA molecules are transcribed, all of which are expressed in a coordinated fashion to provide a biological function."
Scientists at Affymetrix Research Laboratories are using a new type of GeneChip® brand microarray to challenge fundamental concepts in biology. They have shown that a "gene" is not only a single protein-coding stretch of DNA (downward arrows), but may actually be a large region of the genome from which a whole cast of RNA molecules are transcribed (blue shaded box).
By extrapolating their binding site findings from chromosomes 21 and 22 across the complete genome, the scientists estimate that there are 30,000 to 40,000 binding sites for just the three regulatory factors they studied (Sp1, cMyc, and p53). And given that not all well-characterized coding genes are regulated by these factors, they suspect that these data represent only a fraction of our total genes. In the future, as other transcription factors are studied, they expect to discover even more transcripts and added levels of complexity within the human genome.
While revolutionary, the research performed by these scientists really gets back to the basics of genetics, where the term "gene" was originally described as a theoretical entity responsible for a particular trait (hair color or eye color, for example). In a recent issue of Science, Sydney Brenner, 2002 Nobel Laureate from the Salk Institute, wrote, "Old geneticists knew what they were talking about when they used the term 'gene,' but it seems to have become corrupted by modern genomics to mean any piece of expressed sequence." Defining the basic span of expressed DNA as a gene may have been oversimplifying the situation.
"Our discovery of increased complexity may actually provide an explanation for the vast diversity found between human beings that share many of the same 30,000 coding genes," says Gingeras. "The answer lies not only with those RNA molecules that make protein, but with the tens of thousands of newly discovered RNAs that help regulate their function, thereby creating a complete gene, not just an expressed sequence of DNA."
Using GeneChip tiling arrays, Gingeras and his research team discovered unexpected functions for already known genes and new functions for previously unexplored parts of the genome, helping to redraw our map of the human genome. While the much studied protein-coding genes may represent the main roads and highways, the equally important side streets, service roads, traffic signals, and stop signs are just starting to be discovered. "Piece by piece, we are beginning to create a high-resolution map of the human genome, providing detailed information well beyond its basic sequence," notes Gingeras. "The hope is that one day, scientists can more readily understand the biology of health and disease by simply looking up a sequence of DNA from anywhere in the genome and quickly finding out its function."
In the end, using the genome map may not be so different from using your favorite online map site - a world of genomic information for over 3 billion bases at your fingertips, all at the click of a mouse.
The author, Thomas Broudy, interviewed Dr. Gingeras in February, 2004. Dr. Broudy received his Ph.D. in Microbiology at The Rockefeller University. He then continued his research in this field, first as a postdoctoral associate at The Rockefeller University and later as a postdoctoral scholar at Stanford University.