Word Count: 4219
“If you don’t know where you are going, you may end up someplace else.” Yogi Berra
Any discussion of the human genome project must reach back to the earliest sources, since the genome itself represents the history of mankind as well as of all life forms. Aristotle proposed that the “concept” of a chicken is implicit in the egg; as a corollary, the “concept” of an oak is implicit in an acorn. In 1648 van Helmont stated that “all life is chemistry.” Things became more concrete when Wöhler showed that at least some of life is chemistry: in 1828 he synthesized urea, an organic compound, from inorganic starting materials. This was the first demonstration that an organic chemical could be made from nonliving sources. The idea of genetic material had early consideration by Erasmus Darwin in 1794; the grandfather of Charles postulated “living filaments” that control species in plants and animals. Gregor Mendel published his work on some 30,000 pea plants in 1866, showing that inherited characteristics do not blend; his observations on genetic material were not “rediscovered” until the early 1900s. In 1943 Avery identified DNA as the substance capable of transforming pneumococcus from a harmless strain into a virulent one. Just 10 years later came the step everyone has heard of, when Watson and Crick were able to state, “We have discovered the secret of life.” That step ushered in the “atomic” period of biology: just as John Dalton had described atoms as the building blocks of more complex chemicals, the gene finally had a physical definition. Other names had been used for the unit of heredity, including factor, plastidule, pangene, biophor, id, idant, and gemmule. The last term was coined by Charles Darwin for his incorrect theory of pangenesis, in which the germ cells receive a unit of genetic information, the “gemmule,” from every cell in the body. The gemmules migrate to the germ cells during life, so that the germ cell itself comes to hold a complete copy of the body.
Though this theory was inaccurate, the term gene was ultimately derived from Darwin’s pangenesis. Possibly, if Darwin had been aware of Mendel’s work, we would now be speaking of the Human Gemmule Project.
The riddle of which came first, the chicken or the egg, applies to the genome as well: a chicken is just an egg’s way of making another egg.
Since the gene has become a digital readout of the information of life, one might note that a protein is just a gene’s way of making another gene. Alternatively, a gene is a protein’s way of making another protein. One must remember that all cooks need recipes, just as all recipes need cooks. Though this paper is primarily concerned with the “genome” as a concept, the future will bring the opportunity to examine the “proteome” as an expression of life itself.
Some definitions of genetic terms from the book of life are appropriate at this point. The human body is made up of an estimated 100 trillion individual cells. Within each cell is a nucleus containing the genetic material in two sets, giving two copies of each of an estimated 80,000 to 100,000 genes. In humans, one can think of this book of life as having 23 chapters, the chromosome pairs. Each gene represents a discrete story, and there are thousands of stories per chapter. An exon is a segment of a gene that carries instructions; exons are the paragraphs within a story. Adding complexity are introns, noncoding DNA sequences that separate the exons. One could think of these as advertisements, though they could represent hidden instructions critical to the digital message. A codon is a three-letter word specifying one of the 20 amino acids. The bases, whose pairing Watson and Crick defined, are the letters that make up these words: adenine pairs with thymine (A, T) and cytosine with guanine (C, G), as most schoolchildren know. The complete book contains 3 billion bases, or a billion three-letter words. The human genome project is focusing primarily on the exons, the message that codes for proteins, and it is important to note that this is only 3% of the genome. Much of the remainder is noncoding DNA, such as introns, which does not directly generate protein. Study of this noncoding DNA has shown that 1.3% of the human genome consists of endogenous retroviruses (HERVs). The human genome literally contains hundreds of copies of the gene for reverse transcriptase, most of them apparently nonfunctional, presumably left behind by viruses that lost the ability to replicate themselves but kept their place in the genome. Approximately 15% of the genome consists of so-called retrotransposons, sequences encoding enzymes such as reverse transcriptase that appear to be DNA that has lost the rest of a viral genome.
This component of DNA is able to reproduce itself and seems to be “simply good at getting duplicated”; it is referred to as “selfish DNA.” This DNA, as well as so-called microsatellite DNA (another polymorphic type of DNA), turns out to be particularly important in DNA analysis. Each individual has a unique collection of these readily characterized sequences, which makes DNA testing possible in the legal sense. However, this extra DNA in some ways seems to take over the genome, or at least tie up its manufacturing process, so defenses have evolved over time to turn these genes off. One example is methylation, which can turn off unused segments of DNA. An undifferentiated cell uses methylation to allow differentiation into specific cells such as liver or kidney. Interestingly, cancer cells often become embryonic in character and lose control; one of the things they lose is this methylation process.
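The vocabulary of the book of life described above (paired letters, three-letter words, and the repeated microsatellite stretches used in DNA testing) can be made concrete with a short Python sketch. It is only an illustration under simplifying assumptions: DNA is treated as a plain string of A, C, G, and T, strand direction and reading frame are ignored, and the function names and example sequences are hypothetical rather than part of any genomics library. Real forensic typing examines several standardized loci in the laboratory; the repeat counter here only shows the quantity being measured.

```python
from itertools import product

# Watson-Crick pairing: A with T, C with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Partner strand, base by base (not reversed, so strand direction is ignored)."""
    return "".join(PAIR[base] for base in strand)


def codons(seq: str) -> list[str]:
    """Split a sequence into three-letter words, dropping any leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def longest_tandem_run(seq: str, unit: str) -> int:
    """Largest number of back-to-back copies of `unit` found anywhere in `seq`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    best = 0
    # Try every starting position so that overlapping alignments are not missed.
    for start in range(len(seq)):
        copies = 0
        # str.startswith(prefix, pos) tests for `unit` beginning at position pos.
        while seq.startswith(unit, start + copies * len(unit)):
            copies += 1
        best = max(best, copies)
    return best


if __name__ == "__main__":
    # Four letters read three at a time give 4**3 = 64 words, enough for 20 amino acids.
    all_words = ["".join(w) for w in product("ACGT", repeat=3)]
    print(len(all_words))          # 64
    print(complement("ATGCGT"))    # TACGCA
    print(codons("ATGGCCTAA"))     # ['ATG', 'GCC', 'TAA']
    print(3_000_000_000 // 3)      # 1000000000 words in the book
    # Two hypothetical people who differ in how many times "GATA" repeats.
    person_1 = "TT" + "GATA" * 4 + "CC"
    person_2 = "TT" + "GATA" * 7 + "CC"
    print(longest_tandem_run(person_1, "GATA"), longest_tandem_run(person_2, "GATA"))  # 4 7
```

The repeat counter checks every starting position rather than jumping ahead after each match, because a jump can skip a longer run when the repeat unit overlaps itself (for example "ATA").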
The human genome project has evolved over the last 15 years, since Nobel laureate Renato Dulbecco and others called on the scientific establishment to define the genome in an effort to attack cancer. The project is unusual: rather than the work of individual labs, it was initially conceived as a pooled effort with significant government funding. The initial funding came from the Department of Energy, which soon recruited the National Institutes of Health as well as other biomedical organizations such as the Wellcome Trust and the Whitehead Institute. Financial considerations have since brought in other players. The now well-known company Celera has used its own methods to decode the genome since 1998, and small players such as Double Twist are now using supercomputers and published DNA sequences to pick out actual genes. This privately held group has programmed its computers to remove various types of introns and to use the cell’s typical start and stop signals to identify at least 60,000 genes to date, and it estimates that the remaining genomic material holds an additional 40,000 genes. An interesting sidelight has been Celera’s distinctive approach to sequencing. The original approach was a much slower, stepwise process that sequenced the genome region by region, polymorphic DNA included. Celera has used the more interesting shotgun approach, in which the genome is broken into many random fragments that are sequenced individually; computer programs then link overlapping sequences back into the components of the genome itself. (A related strategy for finding genes that are actually functional works backwards from RNA templates to their DNA.) An important analogy for the human genome project is probably the periodic table. In 1869 Mendeleyev set himself the task of finding a way to bring order to the numerous elements that had recently been discovered.
He awoke from a dream realizing that the periodic table of elements had a special form and very quickly “wrote down his dream.” This step led to a remarkable explosion in basic chemistry, with true atomic advances made possible by predictions from the periodic table, many of which were confirmed within the ensuing twenty years. The human genome project may well represent a biologic periodic table. Simply defining the table, however, only opens the role of DNA in life to scrutiny, and that scrutiny will certainly follow over the next twenty years. At a recent conference, the Dean of Harvard Medical School called this technology “the most important new technology in the past 100 years with no exception.” At this point, workers in the human genome project are extrapolating potential applications. There is much more to do than sequence a single, figurative example of a genome. Once this genome has been defined, sequencing and resequencing will be necessary to define other components and relationships. Individual genes will need to be revisited and resequenced, but the genome-wide questions will include relationships within and across species. Branch points will need to be identified, and much discussion will occur in pursuing evolutionary molecular biology. The actual diversity of nature may be addressed, since nature has managed an economy of genes yet a remarkable diversity of end products. Remarkably few genes seem to separate an elephant from a gazelle, and the mechanism of this diversity will certainly warrant very careful study. As the process is refined, one must consider the various common variants of genes, the alleles. If a gene is an element, the alleles can be thought of as its isotopes. In this light, specific, more complicated disorders can be considered.
For example, the seemingly uninteresting protein apolipoprotein E turns out to play remarkable roles in the genetic basis of Alzheimer’s disease as well as various cardiovascular diseases. Applications of the genome will also focus on population genetics. The availability of the genome for study will bring it into mainstream biomedical research as a critical tool. One example is what seems to be unfolding in Iceland. The Icelandic project is coordinated by a private company, deCode, founded by a former Harvard professor, and is based on the unique genetic profile postulated for the Icelandic people. Iceland’s 270,000 residents are unusual in having had little genetic mixing since the country was settled approximately 1,000 years ago. This fairly homogeneous genome can be studied directly because the Icelandic government has kept remarkable genealogical and medical records on every resident of Iceland for upwards of 100 years. In fact, the first result, a new gene associated with stroke, was reported just last month after a study of 1,200 Icelandic stroke victims identified a common gene. The privacy and economic ramifications are remarkable given the scope of this project. Hoffmann-La Roche has already committed some $200 million to deCode in a licensing arrangement for presumed future benefits. The well-known geneticist Paul Berg stated 10 years ago that all disease is genetic. Population-based genetic evaluation will likely prove most interesting indeed.
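The shotgun strategy described earlier, in which computers link overlapping fragment sequences back into a genome, can be illustrated with a toy greedy assembler in Python. This is a teaching sketch under loud assumptions: reads are error-free, no read is contained inside another, and repeated sequences are ignored. The function names and example reads are hypothetical and do not represent Celera’s actual software, which had to cope with sequencing errors and the genome’s many repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`
    (at least `min_len` letters long), or 0 if there is none."""
    start = 0
    while True:
        # Look for the first min_len letters of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept only if everything from there to the end of a also begins b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap.
    Returns the remaining contigs (a single string if everything joined up)."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing overlaps any more: leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three shuffled fragments of the hypothetical sequence "ATTAGACCTGCCGGAATAC".
reads = ["CCTGCCGGAA", "ATTAGACCTG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Merging the largest overlap first is the simplest heuristic; it works on clean toy data like this but can join the wrong pieces when the same sequence occurs in several places.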
At this point, I’d like to digress to some very exciting, specific examples in molecular biology that the genome project will likely be able to address and that are almost certain to have a major impact on basic science as well as clinical medicine. In just the last few years, the ubiquitin system has gone from an interesting novelty to an almost certainly fundamental role in the cell cycle and in everyday cellular function. Cellular housecleaning includes identifying proteins that need to be discarded or turned off. This is tied to a proteolysis system that allows the identified proteins literally to be recycled within the cell. This fundamental recycling process turns out to be critical in controlling the levels of various proteins and allowing the cell to function. One might consider it as important as one’s own ability to clear a refrigerator of not-so-useful items, whether by the sense of smell or by the presence of furry green residue on the cheese. This housecleaning involves several steps. There is an activating cascade involving 3 types of enzymes, E1, E2, and E3, which serially activate ubiquitin, conjugate it, and ligate it to proteins designated for disposal. Ubiquitin turns out to be a highly conserved 76-amino-acid polypeptide which, when attached to a protein, signals the cell to destroy that protein. A large protease complex, the 26S proteasome, recognizes ubiquitinated proteins and rapidly destroys them. Many cell functions require the ubiquitin system to establish real-time control of very complex cellular processes. These include such diverse functions as the metaphase-to-anaphase transition of the cell cycle. Control of proteins in vacuoles and lysosomes also appears quite important. The system plays a tremendous role in muscle catabolism, especially during sepsis, and in the degradation of proteins released from muscle actin and myosin.
Various cell channel control mechanisms, including permeases, are ubiquitin dependent, and the gene transcription system and DNA repair enzymes are regulated with ubiquitin monitoring. The final aspects of cell differentiation and embryogenesis are under ubiquitin control, as is the actual process of programmed cell death. This process, apoptosis (Greek for the falling of leaves), has tremendous impact on cancer management as well as on the process of aging itself. Specific diseases have already been associated with ubiquitin-regulating proteins, including cystic fibrosis, Huntington’s chorea, various cancers including colon cancer, and Alzheimer’s disease; even mad cow disease (a prion encephalopathy) is related to uncontrolled protein management. The story of the ubiquitin system has evolved markedly since the initial description of the so-called RING finger motif seen on many proteins. RING is an acronym for Really Interesting New Gene, and this component has been identified in over 400 major regulatory and functional proteins. RING finger proteins typically act as E3 ligases, binding the enzyme that carries ubiquitin, and dysfunction in the RING finger can lead to major cellular dysfunction. It has already been shown that loss of ubiquitination is associated with a number of genetic disorders and plays a role in such basic control mechanisms as angiogenesis, a strong requirement for tumor growth. The p53 regulation system has also been found to be interrelated with the ubiquitin system. For some years, p53 regulation has been recognized as the final step in the control, or loss of control, that leads to cancer. With this critical piece of information, the genome has been searched for proteins carrying the RING finger so important in cellular control mechanisms, yielding the 400 noted above.
The now famous BRCA1 breast cancer gene is related to ubiquitin dysfunction, and the processes involved in colon cancer evolution (see below) are also interrelated. The ubiquitin system is also involved with viruses such as herpes simplex, which can fool the system with false signals and prevent removal of the virus from the cell. I use ubiquitin as a prime example of a potential application of the human genome project in molecular medicine. Genomics could give tremendous insight into developing agents that control, or reestablish control of, ubiquitination. One can extrapolate, almost with certainty, that within the next 5 years specific agents aimed at ubiquitin dysfunction will be applied in cancer management.
An additional genetically based disorder, in a field closer to my own expertise, is colon and rectal cancer. Colon and rectal cancer accounts for approximately 130,000 diagnoses and 56,000 deaths per year in the United States. This common cancer has a lifetime risk of at least 5% and has required very primitive techniques for management, namely surgery and/or endoscopic surgery. The specific risk factors are just now becoming better defined, whether on the basis of age itself, the content of the genome, or exposure to environmental factors including diet or potential modulating drugs. Colon cancers can be separated into three broad categories: hereditary, sporadic, and colitis-associated. It has become apparent that genetic alterations accumulated over a lifetime play a role in progression to an actual cancer. Examination of unusual but important genetic diseases associated with colon cancer has led to interesting applications of genetic factors to this cumulative risk process. Familial adenomatous polyposis, a disorder of multiple adenomatous polyps, is associated with dysfunction of a specific gene (APC). The hereditary nonpolyposis colorectal cancer syndrome is now associated with defects in DNA mismatch-repair genes. When one weighs this together with various autocrine regulators such as epidermal growth factor and transforming growth factor, the complex arrangement leading to cancer becomes even clearer. When placed in relation to the final loss of p53 function, the far more common sporadic form of colon cancer can finally be better defined. In fact, excessive expression of these growth factors, inappropriate expression or dysfunction of p53, and lack of regulation in the genetic forms of cancer are all probably interrelated with the ubiquitin system and its dysfunction.
Chemoprevention will certainly offer a means of modulating the changes associated with adenomatous polyp development or degeneration to dysplasia, and with colitis-related cancers. However, cyclo-oxygenase blocking agents, as well as various environmental and dietary components such as calcium and folate, will have much less specificity as a means of control than modulators that affect the ubiquitin system. Chemotherapy and radiation therapy may have a specific role in cancer management, since these agents tend to activate the ubiquitin system at the time of DNA release and cell damage and/or death. The actual mechanism of killing cancerous cells during chemotherapy and radiation may in fact have its final end point in reactivation of the ubiquitin system, allowing controlled cell death, or apoptosis. The ability to discern whether a cancer cell carries functional or dysfunctional regulators, and is therefore sensitive to various chemotherapy and/or radiation modalities, may turn out to be an excellent predictor of which patients can be treated successfully with such agents. Genetic analysis of the tumor itself may therefore predict response to various treatments, or suggest the need for more specific genomic-based treatment or, eventually, gene therapy.
Another interesting molecular application of the genome project concerns telomerase and its potential role in colon cancer as well as in cellular aging. A given cell must be copied anywhere from 20 to hundreds of times in the lifetime of an individual. For all intents and purposes, this is a photocopying process for the chromosome itself. In 1972 James Watson made the interesting observation that DNA polymerase cannot actually start copying at the tip of the chromosome. It always has to start several words in, so each copy loses some of the tip. The tip, or telomere, is thought to be “meaningless text”; in humans and other vertebrates it consists of the same repeating sequence, TTAGGG, which may repeat as many as 2,000 times. Once a cell differentiates, it loses the ability to replace this so-called meaningless text. As multiple copies are made, the protein-producing DNA may eventually be affected, leading to cell senescence and ultimately cell death. At the time of differentiation, a cell loses the ability to make telomerase, the enzyme that rebuilds the tip, and a molecular “stopwatch” starts, giving that cell a fairly controlled but limited life span before enforced cell death occurs. Interestingly, cancer cells regain the ability to make telomerase (much as they lose methylation control, as noted above). Using genetic techniques, telomerase can be reintroduced into cells, and it will be most interesting to see whether these cells live longer or die of cancer. In fact, a line of genetically engineered mice has been made with extra telomerase, and everyone is waiting to see whether they will have a longer life span or a higher cancer risk.
Again, the opportunity for blocking agents based on knowledge of this control function will be amplified because the genome project allows basic understanding of the component parts and offers potential genomic or genetic therapy.
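The telomere “stopwatch” described above can be expressed as a simple countdown. The Python sketch below is illustrative only: the starting length of 2,000 TTAGGG repeats comes from the text, but the loss of 25 repeats per copy and the 500-repeat critical length are hypothetical round numbers chosen for the example, not measured values. Modeling telomerase as replacing every lost repeat, and following only one chromosome end, are further simplifications.

```python
from typing import Optional

TELOMERE_REPEAT = "TTAGGG"  # telomeric repeat unit named in the text


def divisions_until_senescence(start_repeats: int,
                               loss_per_division: int,
                               critical_repeats: int,
                               telomerase_active: bool = False) -> Optional[int]:
    """Count how many copies a cell can make before its telomere would drop
    below `critical_repeats`. Returns None when telomerase is active, because
    the lost repeats are assumed to be fully replaced after every copy."""
    if loss_per_division <= 0:
        raise ValueError("loss_per_division must be positive")
    if telomerase_active:
        return None  # no countdown: the "stopwatch" never runs down

    # One chromosome end, written out as text.
    telomere = TELOMERE_REPEAT * start_repeats
    bases_lost = loss_per_division * len(TELOMERE_REPEAT)
    floor = critical_repeats * len(TELOMERE_REPEAT)

    divisions = 0
    # Each copy cannot start at the very tip, so a piece of the end is lost.
    while len(telomere) - bases_lost >= floor:
        telomere = telomere[:-bases_lost]
        divisions += 1
    return divisions


# 2,000 repeats from the text; 25 lost per copy and a 500-repeat floor are
# hypothetical numbers chosen only for illustration.
print(divisions_until_senescence(2000, 25, 500))                          # 60
print(divisions_until_senescence(2000, 25, 500, telomerase_active=True))  # None
```

With these made-up numbers the countdown allows 60 copies, which happens to fall within the 20-to-hundreds range mentioned in the text; changing either hypothetical parameter shifts the limit.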
The term genomics is a neologism suggesting that genetic information can predictably lead to new forms of drug treatment. It has been estimated that all the medicines we currently know act on only about 500 specific sites. Any given drug often has counterparts from various drug companies that address the same site, e.g., the H2 blockers Tagamet, Axid, Pepcid, Zantac, … It has been suggested that with genomics one could identify upwards of 10,000 sites in short order and from these develop drugs with potential therapeutic benefits. Some potential examples have already been listed above. An interesting corollary to this production of new drugs has been the insertion of DNA into other organisms to produce compounds or drugs that would otherwise be difficult to harvest. This has been called “pharming”: the genes for hard-to-process compounds are inserted into the production apparatus of other organisms. It has already led to products on the market and to an evolving controversy over genetically modified (GM) foods. The latter is potentially a whole other story. New ways of delivering pesticides or toxins have certainly already been applied in agribusiness. I think the story of the funnel-web spider is worth retelling. This spider is among the most poisonous and produces a host of neurotoxins. A gene for some of these neurotoxins has been placed in a baculovirus, which is known to infect various types of moths. This is actually a reapplication of work tried with scorpion toxin, which failed because, although the gene allowed production of the protein, the protein did not fold correctly and was essentially denatured. Presumably the regulatory proteins had not been added in the correct sequence. In any event, the toxin is intended as a biopesticide for the so-called “natural” killing of moths and subsequent control of a pest species.
Obviously, there is concern that the claimed target specificity is not adequate. The same toxins may in fact affect desirable butterflies. There has already been a major fallout over the use of Bt toxin in corn and its effect on the monarch butterfly (the Bambi of the insect world). Additionally, the gene for the toxin could jump species to a different virus and gain a whole new set of potential hosts and targets.
Future issues regarding the genome project and its various tools are numerous and ever changing as new techniques and potential ramifications are identified. Genetic discrimination was a very important issue, particularly at the recent NIH conference on genes and society that I attended. Various legal applications, including the death row project using DNA probes, have already gained national attention. Patent issues will result in much litigation in the coming years; in fact, a Supreme Court justice who attended this conference speculated that ethical and financial interests will need to be scrutinized carefully in weighing the interests of society against those of the individual. Genetic modification that might affect the germ cell will have marked implications not just for the individual in question but also for future generations, and the issues of informed consent, or lack of informed consent, for these generations have been discussed but not solved. I also see remarkable hope, but also potential angst, in the lifting of the ban on stem cell research. The application of totipotent cells in medicine brings tremendous hope for burn victims or those in need of a specific organ transplant. Adult stem cells will most likely be the main focus because of the controversy surrounding embryonic stem cells (derived from human embryos). Autotransplantation of a patient’s own adult stem cells would also be easy to apply if the required dedifferentiation process can be suitably controlled. Genetic material has already been inserted into humans using viral vectors. I see potentially remarkable ramifications if the recently patented technique for artificial chromosomes were broadly applied, since it offers the opportunity to replace large numbers of genes at a time, with the control mechanisms already in place in the chromosome.
Lastly, these tools could open the way to bioterrorism on a scale one cannot yet imagine, whether by adding toxins to a highly infectious virus and releasing it or by other means, e.g., a potato that contains prions. The future holds much and reminds us of Tiresias, the blind seer of Thebes. It seems that he had seen Athena bathing and she blinded him. When Athena repented, she gave him the gift of soothsaying. Later, however, Tiresias told Oedipus that seeing the future is in fact a terrible fate: “it is but sorrow to be wise when wisdom profits not.” Hopefully, as a society, we can be wise and also control the modifications that will be performed, so that individuals and society alike can profit from this, the most important technological breakthrough of the past 100 years.
Roger Orth, M.D.