The Human Genome Project is a multinational scientific investigation aimed at describing the entire human genome. Scientists from around the world are working toward evaluating and identifying every base pair of nucleotides (and thus every gene) contained in human DNA. The size of the human genome is estimated at 3 billion base pairs or more. If this approximation is correct, a single researcher sequencing one base pair every second would need about 95 years of round-the-clock work, or several centuries at ordinary working hours, to complete the task. This is truly a Herculean challenge.
Why are we engaged in such a study? To many this may seem like a lot of work that may lead nowhere. Others see the information garnered from this study as the key to curing many of the genetic disorders that disable humanity; they even suggest that it may provide cures for all disease. If this view is even slightly correct, all the work involved in mapping the human genome would be justified. Let's examine some of the work involved in evaluating and identifying the base pair sequences in human DNA.
The immense size of the human genome, as we've mentioned, would require thousands of scientists working for years to evaluate. To do this task one base pair at a time seems unreasonable. Ultimately, however, this is the goal. The output from the collective work of all the labs and all the researchers is supposed to be a complete listing of every base pair that occurs within the human genome. The base pairs of nucleotides are arranged in a long sequential line and make up a DNA molecule. We have developed laboratory methods that can identify any individual nucleotide. There are even some methods that will tell us whether a specific pair or triplet of nucleotides is present. However, we need to know the actual sequence of the nucleotides. This requirement poses a formidable task.
For example, if we had a sequence that looked something like this: 1-2-3-4-1-2-3-4-1-2-3-4-1-2-3-4, we could easily discern that there were only four specific numbers in this sequence. If we were to treat these numbers as if they were chemical nucleotides, we might break them apart and analyze the soup that resulted from their separation. We would find a mixture of four 1s, four 2s, four 3s and four 4s. Of course, we may not know for sure that there are four of each number. We may only know that they are present in equal proportions. We might develop some tool or enzyme that would break the sequence at some specific spots (for example, between every occurrence of the numbers 4-3 but not the numbers 3-4). This tool would not be very useful for this analysis, because the combination 4-3 never occurs in our sequence. But we might be lucky enough to have an enzyme that did indeed break our sequence between the numbers 3-4. Using this enzyme on our sequence would yield some fragments that looked like these: 1-2-3, 4-1-2-3, 4-1-2-3, 4-1-2-3, and 4. Testing the resultant soup of fragments, we would find three types of fragments.
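The cutting step can be sketched in a few lines of Python. This is only an illustration of the number analogy, not a model of real enzyme chemistry: the function name cut_between is made up for this example, and the sequence is assumed to be 1-2-3-4 repeated four times (sixteen numbers), which is what the fragment list above implies.

```python
from collections import Counter

def cut_between(sequence, left, right):
    """Split a list wherever `left` is immediately followed by `right`."""
    if not sequence:
        return []
    fragments, current = [], [sequence[0]]
    for prev, nxt in zip(sequence, sequence[1:]):
        if prev == left and nxt == right:
            # The hypothetical enzyme cuts here: close the current fragment.
            fragments.append(current)
            current = []
        current.append(nxt)
    fragments.append(current)
    return fragments

# Assumed toy sequence: 1-2-3-4 repeated four times (16 numbers).
sequence = [1, 2, 3, 4] * 4

# "Analyzing the soup": only the composition is visible, not the order.
composition = Counter(sequence)

# The 4-3 tool finds nothing to cut; the 3-4 tool yields five fragments.
fragments_43 = cut_between(sequence, 4, 3)
fragments_34 = cut_between(sequence, 3, 4)
fragment_types = {tuple(f) for f in fragments_34}

print(composition)
print(fragments_34)
print(len(fragment_types), "types of fragments")
```

The 4-3 cut returns the sequence in one piece, which is why that tool tells us nothing, while the 3-4 cut reproduces the five fragments listed in the text.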
One of these fragments (the 4) would be only one number in size, and as such we could identify it as the number 4. This information would tell us that the sequence of numbers that we are studying must begin or end with the number 4. Now we might develop some other chemical or enzyme that reacted only with this terminal 4. The result of this reaction might be the attachment of a string of zeros onto the 4. We know that our original sequence either ends or begins with a number 4; therefore, we will make a new sequence that looks like one of these: 0-0-0-0-(original sequence) or (original sequence)-0-0-0-0. If we had some other tools that would break the sequence between some other number combinations (2-3 or 5-4 or 1-2, etc.), we could test them one at a time and analyze the results. The test with the 5-4 enzyme would be negative because we would find no number 5s in our sequence. The test with the 1-2 enzyme, by contrast, would yield a group of fragments that looked like these: 1, 2-3-4-1, 2-3-4-1, 2-3-4-1, 2-3-4-0-0-0-0. We could test to see which of these fragments contained our marker sequence of zeros and then extract only those fragments that contained zeros for the next step in our analysis. If we could break up this small sequence attached to the zeros, we would find that it contained 2s, 3s, 4s and zeros. We know from previous experiments that the 4s come from one end or the other. We now also know that attached to the terminal 4s we find either a 2-3 or 3-2 combination. Such testing and experimentation continues until we have identified every single number (base pair) found in the original sequence.
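The tagging step can be sketched the same way. The helper cut_between and the names ZERO_TAG, tagged and marked are inventions for this illustration, and the sketch assumes that the terminal 4 sits at the end of the sixteen-number sequence, so the zeros attach there.

```python
from collections import Counter

def cut_between(sequence, left, right):
    """Split a list wherever `left` is immediately followed by `right`."""
    if not sequence:
        return []
    fragments, current = [], [sequence[0]]
    for prev, nxt in zip(sequence, sequence[1:]):
        if prev == left and nxt == right:
            fragments.append(current)
            current = []
        current.append(nxt)
    fragments.append(current)
    return fragments

ZERO_TAG = [0, 0, 0, 0]      # hypothetical marker attached to the terminal 4
original = [1, 2, 3, 4] * 4  # assumed toy sequence of 16 numbers

# Assumption: the lone 4 came from the end, so the tag lands there.
tagged = original + ZERO_TAG

# The 5-4 tool never cuts (no 5s present); the 1-2 tool yields five fragments.
fragments_54 = cut_between(tagged, 5, 4)
fragments_12 = cut_between(tagged, 1, 2)

# Keep only the fragments that carry the zero marker.
marked = [f for f in fragments_12 if 0 in f]

# Composition of the marked fragment, ignoring the zeros themselves.
neighbours = Counter(n for n in marked[0] if n != 0)

print(fragments_12)
print(marked)
print(neighbours)
```

Only one fragment carries the marker, and besides the zeros it contains one 2, one 3 and one 4, which is the clue the text uses to place a 2-3 or 3-2 combination next to the terminal 4.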
Shortcuts immediately come to mind. Perhaps we could slice up the original sequence into several shorter sequences and divide the work among several researchers. Once all of the separate researchers have figured out the sequence of numbers in their individual short sections, we might be able to put all this information together and come up with the complete sequence of the entire original length of numbers. This concept is workable if we can somehow introduce a little bit of overlap between sections, or some sort of marking vector, that tells us exactly where in the original sequence a particular short section belongs.
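A minimal sketch of the overlap idea follows. It is a simple greedy merge written for this illustration, not the procedure any particular laboratory uses. The function names and the nine-number example sequence are hypothetical, and the example deliberately uses numbers that never repeat so that every overlap is unambiguous.

```python
def overlap(a, b):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), 0, -1):
        if a[-size:] == b[:size]:
            return size
    return 0

def assemble(fragments):
    """Greedily merge the pair with the largest overlap until one piece remains."""
    pieces = [list(f) for f in fragments]
    while len(pieces) > 1:
        best = (0, None, None)
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    size = overlap(a, b)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            raise ValueError("fragments do not overlap; order cannot be recovered")
        # Join the two pieces, writing the shared numbers only once.
        merged = pieces[i] + pieces[j][size:]
        pieces = [p for k, p in enumerate(pieces) if k not in (i, j)] + [merged]
    return pieces[0]

# Hypothetical sections handed back by four researchers, in no particular order.
fragments = [[3, 9, 4, 7], [1, 5, 2, 8], [4, 7, 6], [2, 8, 3, 9]]
assembled = assemble(fragments)
print(assembled)
```

With a highly repetitive sequence such as our 1-2-3-4 example, many sections would overlap equally well in several places, which is why some kind of positional marker is needed in addition to overlap.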
From this not-so-brief explanation of sequencing methodology, you have perhaps realized that even for a short sequence of base pairs, a lot of work is required to uncover the exact nature and identity of the sequential components. Our little demonstration was for a sequence of only 16 numbers or base pairs. Picture the work required for the DNA in the human genome with its billions of base pairs.
For the past several years the Human Genome Project has progressed in a piecemeal fashion, with one research group working on a small part of one chromosome, another group working on another piece or another chromosome, and so on. This chromosome-by-chromosome, bit-by-bit approach has not carried the project through more than one-third of the material in the human chromosomes.
Special techniques and bioengineering tools have been developed to help the project. The polymerase chain reaction (PCR) enables tiny fragments of DNA to be amplified into large quantities. These large quantities of DNA are then studied to uncover the sequences of their base pairs. Yeast artificial chromosome (YAC) methodology uses portions of yeast chromosomes as markers and splicing vectors to help keep track of the various pieces of DNA that are sliced from the large DNA sequence. The use of YACs has made the study of many portions of the human genome almost a practical matter. The combination of YAC and PCR technologies has become the mainstay of the Human Genome Project. Variants of these methodologies have also come into play. So-called mega-YACs are being tested. These YACs seem to be able to hold onto huge slices of DNA. With these, scientists are able to process far more DNA at one time than ever before. The speed with which the project is progressing seems to increase with each year. The introduction of errors or extraneous sequences also seems to be increasing as we go along.
The information about various sections of specific chromosomes and the sequences of identified base pairs is recorded in large linkage maps. These genetic linkage maps are special cross-referenced tables and listings used by researchers to find their way along the human genome. Maps such as these have helped researchers locate and evaluate the genes responsible for diseases such as cystic fibrosis (a lung disorder), neurofibromatosis (a nerve disorder) and fragile X-linked mental retardation. As studies progress, the details represented on the genetic linkage maps are updated.
An interesting side note to all of this intense study, with its search for functional genes and its effort to understand our complete genetic make-up, is that our genome seems to contain a lot of useless (or at least non-functional) material. There are also a lot of redundant sequences (sequences that are repeated many times yet have no apparent function) for which we find no corresponding expression.
There is no doubt that understanding the human genome will lead to an understanding of, and treatments for, many genetic diseases. In fact, gene therapies based upon the information and techniques generated by the Human Genome Project have already been developed for some diseases; 13 such therapies are underway at this time. There is speculation that the precise boundary between genetically influenced behavior and environmentally influenced behavior will be described. This would put an end to the old 'nature versus nurture' debate. Are we as we are because of our genes, or do we behave in response to external forces? Understanding the complete human genome will help untie this knotty question.
Beyond the scope of this short review, we should consider some of the far-reaching possible consequences of the Human Genome Project. Upon completion, we will have a detailed blueprint, a plan in chemical terms, for a human being. Perhaps we will also have the technical ability to manipulate this blueprint, to change the design, at will. Since the beginning of time mankind has sort of stumbled about the surface of the Earth and reproduced in a basically random manner. Likewise, through disease and wars, mankind has left this life and removed genes from the total gene pool in a random manner. There have been attempts to apply eugenic concepts to humanity. These have for the most part been condemned. However, with the information gleaned from the Human Genome Project, we may have to face the reality of being able to alter the human being, not just as an individual but as a species. You may debate for many years whether this is good or bad, but it is certain that it will be possible.