DNA


The abbreviation of DEOXYRIBONUCLEIC ACID, organic chemical of complex molecular structure that is found in all prokaryotic and eukaryotic cells and in many viruses. DNA codes genetic information for the transmission of inherited traits.
A brief treatment of DNA follows. For full treatment, see Genetics and Heredity, Principles of; Nucleic Acids.

DNA was first discovered in 1869, but its role in genetic inheritance was not demonstrated until 1943. In 1953, James Watson and Francis Crick determined that the structure of DNA is a double-helix polymer, a spiral consisting of two DNA strands wound around each other. Each strand is composed of a long chain of monomer nucleotides. The nucleotide of DNA consists of a deoxyribose sugar molecule to which is attached a phosphate group and one of four nitrogenous bases: two purines (adenine and guanine) and two pyrimidines (cytosine and thymine). The nucleotides are joined together by covalent bonds between the phosphate of one nucleotide and the sugar of the next, forming a phosphate-sugar backbone from which the nitrogenous bases protrude. One strand is held to another by hydrogen bonds between the bases; the sequencing of this bonding is specific--i.e., adenine bonds only with thymine, and cytosine only with guanine.
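
The pairing rule described above (adenine with thymine, cytosine with guanine) can be illustrated with a minimal Python sketch. The names `PAIRING`, `complement_strand`, and `is_purine` are hypothetical helpers introduced only for illustration, and the sketch pairs bases position by position without modelling strand direction.

```python
# Pairing rule from the text: adenine-thymine and cytosine-guanine.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand):
    """Return the base-by-base partner of a DNA strand (illustrative helper)."""
    return "".join(PAIRING[base] for base in strand.upper())


def is_purine(base):
    """Adenine and guanine are purines; cytosine and thymine are pyrimidines."""
    return base.upper() in {"A", "G"}


print(complement_strand("ATGCCG"))
```

Under this rule every pair joins one purine to one pyrimidine, which the sketch makes easy to check.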

The configuration of the DNA molecule is highly stable, allowing it to act as a template for the replication of new DNA molecules, as well as for the production (transcription) of the related RNA (ribonucleic acid) molecule. A segment of DNA that codes for the cell's synthesis of a specific protein is called a gene.

DNA replicates by separating into two single strands, each of which serves as a template for a new strand. The new strands are copied by the same principle of hydrogen-bond pairing between bases that exists in the double helix. Two new double-stranded molecules of DNA are produced, each containing one of the original strands and one new strand. This "semiconservative" replication is the key to the stable inheritance of genetic traits.
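
Semiconservative replication can be sketched in a few lines of Python. This is a simplified model, not a description of the enzymatic process: each strand is a plain string tagged "old" or "new", and the helper names `complement` and `replicate` are assumptions made for the example.

```python
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(sequence):
    """Build the partner strand by base pairing."""
    return "".join(PAIRING[base] for base in sequence)


def replicate(duplex):
    """Separate a duplex and pair each original strand with a new partner.

    A duplex is a pair of (sequence, label) tuples.
    """
    (seq_a, _), (seq_b, _) = duplex
    return (
        ((seq_a, "old"), (complement(seq_a), "new")),
        ((seq_b, "old"), (complement(seq_b), "new")),
    )


parent = (("ATGCCG", "old"), (complement("ATGCCG"), "old"))
daughter_1, daughter_2 = replicate(parent)
print(daughter_1)
print(daughter_2)
```

Each daughter duplex carries one original strand and one newly built strand, and both daughters hold the same pair of sequences as the parent.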

Within a cell, DNA is organized into dense protein-DNA complexes called chromosomes. In eukaryotes, the chromosomes are located in the nucleus, although DNA also is found in mitochondria and chloroplasts. In prokaryotes, which do not have a membrane-bound nucleus, the DNA is found as a single circular chromosome in the cytoplasm. Some prokaryotes, such as bacteria, and a few eukaryotes have extrachromosomal DNA known as plasmids, which are autonomous, self-replicating genetic material. Plasmids have been used extensively in recombinant DNA technology to study gene expression.

The genetic material of viruses may be single- or double-stranded DNA or RNA. Retroviruses carry their genetic material as single-stranded RNA and produce the enzyme reverse transcriptase, which can generate DNA from the RNA strand.

Since the 1970s, biologists have made major advances in understanding the molecular nature of genes and their functioning through the use of the powerful experimental techniques of recombinant DNA. The term recombinant DNA literally means the joining or recombining of two pieces of DNA from two different species. Recombinant DNA techniques allow an investigator to biologically purify (clone) a gene from one species by inserting it into the DNA of another species, where it is replicated along with the host DNA. Actually, the term includes a variety of molecular manoeuvres, including cleaving DNA by microbial enzymes called endonucleases, splicing or recombining fragments of DNA, inserting eucaryotic DNA into bacteria so that large quantities of the foreign genetic material can be produced, determining the nucleotide sequence of a segment of DNA, and even chemically synthesizing DNA.

Gene cloning ranks as one of the most significant accomplishments involving recombinant DNA. This procedure has enabled researchers to use E. coli to produce virtually limitless copies of donor genes from other organisms, including human beings. To perform gene cloning, researchers first use a class of bacterial enzymes called restriction endonucleases to remove from the donor cell a fragment of double-stranded DNA that contains the genes of interest. Restriction endonucleases can be thought of as "biological scissors"; each of these enzymes cleaves DNA at a specific site defined by a sequence of four or more nucleotides.
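
The "biological scissors" idea can be sketched as a string operation. The example below assumes one particular enzyme that the text does not name: EcoRI, which recognizes the six-base sequence GAATTC and cuts after the first G. The function `digest` and the sample sequence are hypothetical, and only one strand is modelled.

```python
def digest(dna, site, cut_offset):
    """Cut a DNA string at every occurrence of a recognition site.

    cut_offset is the number of bases into the site at which the cut falls.
    """
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


# Assumed enzyme: EcoRI (site GAATTC, cut after the first base).
sample = "TTGAATTCAAGGGAATTCTT"
print(digest(sample, "GAATTC", 1))
```

Because the site is a fixed sequence, the same enzyme always cuts the same DNA into the same set of fragments.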

Once the desired DNA fragment has been removed from the donor cell, it must somehow be inserted into the bacterial cell. This is usually done by first inserting the donor DNA into a plasmid, one of the small, circular pieces of DNA that are found in E. coli and many other bacteria. Plasmids generally remain separate from the bacterial chromosome (although some plasmids do occasionally become incorporated into the chromosome), but they carry genes that can be expressed in the bacterium. Furthermore, plasmids generally replicate and are passed on to daughter cells along with the chromosome. By treating a plasmid with the same restriction endonuclease that was used to cleave the donor DNA, it is possible to incorporate the foreign DNA fragment into the plasmid ring. This can occur because the restriction enzyme cleaves double-stranded DNA in such a way as to leave chemically "sticky" end pieces. It is thus possible for the sticky-ended fragment of foreign DNA to attach to the complementary sticky ends of the cut-open plasmid ring. This laboratory procedure, called "gene splicing," is the major operation of recombinant DNA technology.
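
A rough sketch of gene splicing follows, again assuming the EcoRI site GAATTC (cut after the first G) as the shared enzyme. It models only one strand and writes the circular plasmid as a linear string; the names `cut_points` and `splice` and both sequences are invented for the example.

```python
SITE = "GAATTC"  # assumed recognition sequence (EcoRI); not named in the text
CUT = 1          # assumed cut position: after the first base of the site


def cut_points(dna):
    """Return every position at which the enzyme cuts this strand."""
    points = []
    pos = dna.find(SITE)
    while pos != -1:
        points.append(pos + CUT)
        pos = dna.find(SITE, pos + 1)
    return points


def splice(plasmid, donor):
    """Open the plasmid at its single site and insert the donor fragment
    that lies between the donor's first two cut points."""
    (opening,) = cut_points(plasmid)        # exactly one site is expected
    first, second = cut_points(donor)[:2]   # at least two sites are expected
    fragment = donor[first:second]
    return plasmid[:opening] + fragment + plasmid[opening:]


plasmid = "CCCGAATTCTTT"
donor = "AAGAATTCATGCATGAATTCGG"
recombinant = splice(plasmid, donor)
print(recombinant)
```

Since both molecules were cut by the same enzyme, the ends match, and the recognition site is restored at each junction of the recombinant molecule.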

The molecular biologist then uses the plasmids as vectors to carry the foreign gene into bacteria. This is accomplished by exposing bacteria to the plasmids. Plasmids are highly infective, and so many of the bacteria will take up the particles; to ensure maximum uptake the bacteria are often treated with calcium salts, which make their membranes more permeable. The incorporation of the plasmids into the bacterial cells marks the transfer of the genes of one species into the genome of another. Alternatively, bacteriophages are sometimes used as vectors to carry the foreign DNA into the bacteria. As a result of the high infectivity of plasmids and the rapid growth of E. coli, investigators can quickly culture large numbers of bacteria, many of which will have incorporated the foreign (often human) DNA. (As many as 1 × 10⁹ bacteria can grow in one millilitre of medium overnight.) Researchers can select the bacteria that contain the foreign DNA by attaching to the fragment of DNA a gene that confers resistance to an antibiotic such as tetracycline. By treating the culture with tetracycline, all bacteria that have not incorporated the gene for resistance will be killed. The remaining cells can be grown in enormous numbers, most of which will contain the cloned fragment of foreign DNA.

The cloned DNA can be removed from the bacterial culture as follows. First, the bacteria are broken apart and the DNA content is separated by centrifugation. The DNA fraction is then heated, which causes the double-stranded molecules to separate into single strands. Upon cooling, each single strand will reanneal, or hybridize, to another single strand to which it is complementary (adenine opposite thymine, cytosine opposite guanine). This form of molecular hybridization has made possible the use of the complementary DNA (cDNA) as a probe for picking out the desired gene.

Investigators also use cDNA to pick out a specific gene from a large piece of genomic DNA. In some cases where a cell makes large amounts of specific mRNA (such as globin mRNA in human red cells), the extracted mRNA can be treated with reverse transcriptase to produce cDNA. When labelled with a radioisotope, this then becomes a cDNA probe for the human globin gene. The technique for isolating and hybridizing the fragment of interest is called "Southern blot" analysis.
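
The way a probe picks out its target by hybridization can be sketched as a search for the complementary sequence. This is a deliberately simplified model: pairing is done position by position, strand direction is ignored, and `hybridization_sites` and the sample sequences are hypothetical.

```python
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def hybridization_sites(single_strand, probe):
    """Return the positions where the probe can pair with the strand."""
    wanted = "".join(PAIRING[base] for base in probe)
    sites = []
    pos = single_strand.find(wanted)
    while pos != -1:
        sites.append(pos)
        pos = single_strand.find(wanted, pos + 1)
    return sites


genomic = "GGATCCATTGCAGG"
print(hybridization_sites(genomic, "TAACG"))
```

A probe with no complementary stretch in the strand returns an empty list, which corresponds to a fragment that stays unlabelled.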

The huge number of copies attained by gene cloning enables researchers to analyze the cloned DNA exhaustively, down to its nucleotide sequence. Nucleotide sequencing is accomplished by performing a series of biochemical manoeuvres on small, endonuclease-produced fragments (oligonucleotides) and then placing them in the correct order. Remarkably, molecular biologists have automated the entire procedure so that a "gene machine" can determine the nucleotide sequence of a gene in a relatively short time. In fact, if the amino-acid sequence of a protein is known, researchers can formulate the nucleotide sequence that produced it and then synthesize the gene. This has been done for insulin.

As has been discussed, a given restriction endonuclease can produce a very large number of discrete DNA fragments, which can be inserted into vector DNA incised by the same endonuclease. Researchers can clone the fragments as described above to produce a so-called library of genomic DNA. This library can be used to study the natural gene whenever a new probe is obtained. A given restriction enzyme generally will produce fragments that are the same for all individuals. However, different people occasionally vary in the size of specific fragments. This is due to the fact that in any string of several hundred bases in human DNA there occur single base changes, usually harmless substitutions that either change or remove the enzyme site. These fragment variations, known as restriction-fragment-length polymorphisms, are inherited and hence form genetic markers that can be used to trace mutant genes to which they are linked. If the fragments are separated by agarose gel electrophoresis and overlaid with a radioactively labelled cDNA probe, only the fragment whose DNA is complementary to that of the probe will hybridize with it. This hybridization can be detected by exposing the DNA fragments to photographic film; the resultant image is called an autoradiograph. When restriction-fragment-length polymorphisms are present, different-sized fragments will hybridize with the same probe. Linkage studies between a disease-producing mutant gene and a polymorphism will locate the gene to either the polymorphic or "wild-type" (i.e., normal) fragment. If large family studies show that the gene is linked closely enough to the fragment so that recombination is rare, this technique can be used for diagnosing the presence of a genetic disease for which the biochemical defect is unknown (see below).
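
A restriction-fragment-length polymorphism can be illustrated by digesting two sequences that differ in a single base. As before, the enzyme is assumed to be EcoRI (site GAATTC, cut after the first G), which the text does not specify; `fragment_lengths` and both sequences are made up for the example.

```python
def fragment_lengths(dna, site="GAATTC", cut_offset=1):
    """Return the sorted fragment lengths produced by one enzyme."""
    lengths = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        lengths.append(cut - start)
        start = cut
        pos = dna.find(site, pos + 1)
    lengths.append(len(dna) - start)
    return sorted(lengths)


common = "TTGAATTCAAGGGAATTCTT"
variant = "TTGAATTCAAGGGAGTTCTT"  # one substitution removes the second site

print(fragment_lengths(common))
print(fragment_lengths(variant))
```

The single substitution removes one enzyme site, so two of the common fragments are replaced by one longer fragment in the variant. That difference in fragment size is what serves as the inherited marker.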

One further result of the ability to analyze gene structure at the molecular level has been the discovery of its remarkable plasticity. Investigators have found sequences of nucleotides that have the capacity to move from one position on the chromosome to another, often carrying neighbouring sequences with them and thus rearranging the DNA. These "jumping genes"--known as transposable elements, or transposons--have been found in both procaryotes and eucaryotes. In the higher mammals, including humans, they are the source of the tremendous diversity necessary for antibody production by the immune system (see below Immunogenetics). It is also possible that some forms of cancer may develop as a result of these rearrangements.

In addition to producing copies of genes for molecular analysis and for use in medical diagnosis, recombinant DNA procedures have been used to convert bacteria into "factories" for the synthesis of foreign proteins. This is a tricky operation, for not only must the foreign DNA be inserted into the host bacterium, but it also must be incorporated into an operon so that its product will be expressed. Despite the technical difficulties, investigators have achieved the expression of foreign genes within E. coli. This fact has tremendous potential in medicine, as the "engineered" bacteria can be used to produce therapeutically valuable human proteins. Insulin, growth hormone, and antihemophilic globulin (the clotting factor missing in persons with hemophilia) are three such proteins that have been commercially "manufactured" via recombinant DNA in E. coli. As a result of this "engineering," the host bacterium has been provided with new genetic properties. Both the scientific and lay communities have expressed concern over the creation of microorganisms with new genetic properties. Perhaps this genetic tailoring of infectious agents like E. coli could visit new and devastating epidemics on the population or could introduce cancer-causing genes into infected people. In the United States, federal agencies, with the assistance of molecular biologists, have laid down stringent guidelines to ensure the control of microorganisms containing the recombinant plasmids. The most effective measure has been the requirement to use strains of E. coli that have been modified so that they can survive in the laboratory but not in nature (and hence are not infectious). In addition, the guidelines require a physical containment system that securely seals off the laboratory, thereby preventing the escape of the bacteria from the facility. Molecular biologists have also called attention to the fact that recombinant processes are constantly occurring in nature, albeit at a slower rate.

DNA TYPING, in genetics, method of isolating and making images of sequences of DNA (deoxyribonucleic acid). The technique was developed in 1984 by the British geneticist Alec Jeffreys, after he noticed the existence of certain sequences of DNA (called minisatellites) that do not contribute to the function of a gene but are repeated within the gene and in other genes of a DNA sample. Jeffreys also determined that each organism has a unique pattern of these minisatellites, the only exception being multiple individuals from a single zygote (e.g., identical twins).

The procedure for creating a DNA fingerprint consists of first obtaining a sample of cells containing DNA (e.g., from skin, blood, or hair), extracting the DNA, and purifying it. The DNA is then cut at specific points along the strand with substances called restriction enzymes. This produces fragments of varying lengths that are sorted by placing them on a gel and then subjecting the gel to an electric current (electrophoresis): the shorter the fragment the more quickly it will move toward the positive pole (anode). The sorted, double-stranded DNA fragments are then subjected to a blotting technique in which they are split into single strands and transferred to a nylon sheet. The fragments undergo autoradiography in which they are exposed to DNA probes--pieces of synthetic DNA that have been made radioactive and that bind to the minisatellites. A piece of X-ray film is then exposed to the fragments, and a dark mark is produced at any point where a radioactive probe has become attached. The resultant pattern of these marks can then be analyzed.

An early use of DNA fingerprinting was in legal disputes, notably to help solve crimes and determine paternity. The technique was challenged, however, over concerns about sample contamination, faulty preparation procedures, and erroneous interpretation of the results. Efforts have been made to improve reliability.

If only a small amount of DNA is available for fingerprinting, a polymerase chain reaction (PCR) may be used to create thousands of copies of a DNA segment. PCR is an automated procedure in which certain oligonucleotide primers are used to repeatedly duplicate specific segments of DNA. Once an adequate amount of DNA has been produced, the exact sequence of nucleotide pairs in a segment of DNA can be determined using one of several biomolecular sequencing methods. New automated equipment has greatly increased the speed of DNA sequencing and made available many new practical applications, including pinpointing segments of genes that cause genetic diseases, mapping the human genome, engineering drought-resistant plants, and producing biological drugs from genetically altered bacteria.
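
The arithmetic behind repeated duplication can be sketched briefly. The example assumes an idealized reaction in which every cycle exactly doubles the number of copies, which real reactions only approximate; `copies_after` and `cycles_needed` are hypothetical names.

```python
def copies_after(cycles, starting_copies=1):
    """Copies present after a number of perfect doubling cycles."""
    return starting_copies * 2 ** cycles


def cycles_needed(target_copies, starting_copies=1):
    """Smallest number of doubling cycles that reaches the target."""
    cycles = 0
    while copies_after(cycles, starting_copies) < target_copies:
        cycles += 1
    return cycles


print(copies_after(10))
print(cycles_needed(1000))
```

Under this assumption, about ten cycles are enough to turn a single segment into more than a thousand copies.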

DNA AS AN INFORMATION CARRIER: TRANSCRIPTION AND TRANSLATION OF THE GENETIC CODE
As has been stated, the Watson-Crick model provides an explanation of how a gene can carry hereditary information in the form of a chemical code. This section will describe the genetic code and explain how it governs the biochemical processes of the cell.

Before turning to the language of the code, it is necessary to explain what it is that the code specifies. It is now known that genes encode instructions for the production of proteins, which are largely responsible for the structure and function of the organism. Proteins are large, complex molecules consisting of one or more polypeptide chains that, in turn, are composed of amino acids linked together by peptide bonds. Proteins play many roles in organisms. Some proteins make up structural components of the organism; an example is the protein collagen in vertebrate animals. Others perform particular functions; for example, the protein hemoglobin transports oxygen in the blood of mammals, and the proteins of the immune system (immunoglobulins) protect against diseases in many members of the animal kingdom. Still other proteins regulate the rate of specific biochemical reactions in cells. This latter class of proteins, called enzymes, functions as biological catalysts. Enzymes permit chemical reactions to occur with extreme rapidity at temperatures normal to living cells. Without these proteins, the molecular interactions would require much longer periods of time and much higher temperatures, and they would lose their specificity. It is certainly no exaggeration to say that life depends on enzymes.

Among eucaryotes, DNA never leaves the cell nucleus, despite the fact that protein synthesis takes place on the ribosomes, structures that lie in the cytoplasm (i.e., in the portion of the cell outside of the nucleus). Even among procaryotes, which have no membrane-enclosed nucleus, the DNA does not directly carry its instructions to the ribosomes. In both kinds of organisms, this function is performed by a type of RNA that copies the DNA message and carries it to the site of protein synthesis. Aptly enough, this RNA is called messenger RNA, or mRNA for short. The copying of the DNA instructions into messenger RNA is called the transcription function of DNA, to distinguish it from the replication function discussed above.

The sequence of the genetic letters, A (adenine), T (thymine), C (cytosine), and G (guanine), in the DNA is first transcribed into the corresponding sequence of the letters A, U (uracil), C, and G in the messenger RNA. This occurs through the action of the enzyme RNA polymerase. This enzyme synthesizes RNA in a test tube from a mixture of the A, U, C, and G bases, but it does so only in the presence of a primer DNA. The sequence of the bases in the primer is copied in the RNA. The steps involved in this process are as follows: (1) the DNA double helix unwinds by breaking the hydrogen bonds between the corresponding bases in the paired strands; (2) the RNA polymerase forms the bonds between the RNA bases that are complementary to the bases in the DNA; and (3) the messenger RNA thus formed passes into the cytoplasm and becomes attached to a ribosome. Ribosomes consist of proteins and another type of RNA, ribosomal RNA (rRNA).
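
Transcription, as described above, amounts to building an RNA strand whose bases are complementary to those of the DNA, with uracil taking the place of thymine. The following is a minimal sketch; the name `transcribe` is hypothetical, and strand direction is ignored.

```python
# Each DNA base specifies its complementary RNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(dna_strand):
    """Return the messenger RNA copied from a DNA strand."""
    return "".join(DNA_TO_RNA[base] for base in dna_strand.upper())


print(transcribe("AAATTTGGG"))
```

With this mapping the DNA sequence AAA yields the messenger RNA triplet UUU.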

The process of protein synthesis is represented diagrammatically in Figure 6. The information contained in the sequence of the bases (letters) in the messenger RNA is then translated into a sequence of amino acids in a protein. This requires the presence of still another molecule that is capable of recognizing the code for a specific amino acid and selectively making the amino acid available at the right point in the protein synthesis, a soluble RNA fraction within cells that can bind amino acids. Soluble, or transfer, RNA (sRNA, or tRNA) is a single-stranded molecule that forms about 20 percent of the total cellular RNA. If amino acids and a source of energy (usually ATP) are added to a mixture of transfer RNA's, reversible binding of the amino acids to the RNA molecules occurs. Furthermore, each amino acid is bonded to a specific transfer RNA molecule by a specific activating enzyme. There are at least 20 different kinds of transfer RNA's and activating enzymes that correspond to the 20 amino acids commonly found in proteins. The amino acid-transfer RNA complex becomes attached to the ribosome with its messenger RNA molecule; the addition of the amino acid to the growing polypeptide chain then occurs. A sequence of three nitrogenous bases (anticodon) on the transfer RNA molecule pairs with a complementary sequence (codon) on the messenger RNA molecule, which is held in the correct position by the ribosome. Once the recognition has occurred, a peptide bond is formed between the amino acid bound to the transfer RNA and the growing polypeptide chain.

The accuracy of the model described in Figure 6 has been confirmed by the achievement of protein synthesis in the test tube. This synthesis requires a DNA template (primer DNA), precursor nucleotide molecules, ribosomes, transfer RNA's, amino acids, and a set of enzymes and certain other factors.
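
Codon-anticodon recognition can be sketched with a toy pool of transfer RNAs. The pool below holds only three entries (for the codons UUU, AAA, and CCC, which code for phenylalanine, lysine, and proline); the names `anticodon_for`, `TRNA_POOL`, and `synthesize` are hypothetical, and pairing is done position by position without modelling strand direction.

```python
RNA_PAIRING = {"A": "U", "U": "A", "C": "G", "G": "C"}


def anticodon_for(codon):
    """Return the three-base sequence that pairs with a codon."""
    return "".join(RNA_PAIRING[base] for base in codon)


# Toy pool of charged transfer RNAs, keyed by anticodon (illustrative only).
TRNA_POOL = {
    anticodon_for("UUU"): "phenylalanine",
    anticodon_for("AAA"): "lysine",
    anticodon_for("CCC"): "proline",
}


def synthesize(mrna):
    """Read the message three bases at a time and collect the amino acids."""
    chain = []
    usable = len(mrna) - len(mrna) % 3
    for i in range(0, usable, 3):
        codon = mrna[i:i + 3]
        chain.append(TRNA_POOL[anticodon_for(codon)])
    return chain


print(synthesize("UUUAAACCC"))
```

A codon with no matching transfer RNA in this toy pool raises a `KeyError`; a fuller model would need an entry for every codon.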

The processes of transcription and translation, as described above, can be represented thus: DNA → RNA → protein. Soon after its elucidation, this understanding of the genetic control of protein synthesis became known as "the central dogma" of molecular genetics. Included as part of the dogma was the belief that reverse transfer of information does not occur; in other words, there is no storage of information in the protein molecules and no transcription of protein back into nucleic acids or of RNA back into DNA.

The central dogma has since been modified to accommodate the discovery that reverse transcription of RNA to DNA does occur, as first demonstrated in some viruses. These viruses, called retroviruses, have a genome composed of RNA. When retroviruses enter a host cell, they produce an enzyme called reverse transcriptase. This enzyme permits the transcription of the viral RNA into DNA, which then may become incorporated into the genetic material of the host cell.

A second modification was necessitated by the discovery that not all DNA codes for protein synthesis. As discussed below, some of the noncoding DNA is involved in regulating the biochemical processes of the cell. The amount of noncoding DNA is small in procaryotes, but in eucaryotes it may be most of the cell's DNA.

Reading the code.
It is necessary to understand how the four letters--A, T, C, and G--specify, or code, for 20 different amino acids. If a single letter coded for an amino acid, only four amino acids could be specified. If two bases were needed to specify an amino acid, then 16 different combinations could be constructed, again an insufficient number (20 amino acids must be accounted for). Combinations of three letters allow 64 different words to be constructed, more than the necessary minimal number. A three-letter, triplet, code could be constructed in at least three different ways: (1) with words overlapping; (2) with words not overlapping and punctuated; and (3) with words not overlapping and not punctuated. An overlapping code is composed of words that overlap each other--i.e., the letters of any given word may belong to one, two, or three words. The DNA might contain, for example, the sequence AGCGTTACG; the first word is AGC, the second CGT, and so on. This type of code is improbable, because of the restrictions it would place upon the possible sequence of amino acids in protein. As the example above shows, if the first word is AGC, the second word must begin with C, etc. Examination of amino-acid sequences in a protein such as hemoglobin indicates that any amino acid can follow any other--a possibility not allowed for by an overlapping code.
If the code is nonoverlapping, a problem of distinguishing words from each other arises. DNA contains no spaces separating the words as in written sentences; therefore, there must be other indications of specific starting points for messenger RNA synthesis. The base sequence AGC AGC AGC . . . could be punctuated by the presence of a fourth base, T, between each AGC triplet. This would reduce the number of possible triplets to 27. That a punctuated code of this type is not realized is seen from the evidence of the degeneracy in the code for some amino acids. The degeneracy means that some amino acids are coded for by more than one triplet, and a punctuated code does not allow enough words. A second objection to this type of code comes from a consideration of the effects of mutation on the coding sequence. If one of the punctuation marks mutates to another base, or a coding base mutates to a punctuation mark, the resulting sequence will be complete nonsense functionally.
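
The counting argument and the overlapping-code example can be checked with a short sketch. The helper names `word_count` and `read_words` are hypothetical; the step of two bases for the overlapping reading is inferred from the example in the text, where the second word (CGT) begins with the last letter of the first (AGC).

```python
from itertools import product


def word_count(alphabet, length):
    """Number of distinct words of a given length over an alphabet."""
    return len(list(product(alphabet, repeat=length)))


def read_words(sequence, step):
    """Read three-letter words, advancing by `step` letters each time."""
    return [sequence[i:i + 3] for i in range(0, len(sequence) - 2, step)]


for length in (1, 2, 3):
    print(length, word_count("ATCG", length))

# With one base reserved as punctuation, only three bases remain for words.
print(word_count("AGC", 3))

print(read_words("AGCGTTACG", 2))  # overlapping reading
print(read_words("AGCGTTACG", 3))  # nonoverlapping reading
```

Four bases give 4, 16, and 64 words of one, two, and three letters, while reserving one base as punctuation leaves 27 triplets.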

The third possibility is a nonoverlapping, nonpunctuated code, in which the reading starts from a specific point. In all organisms studied in this respect this is the method of coding used. A knowledge of the base sequence in the messenger RNA and the resulting amino-acid sequence in protein reveals the code for each amino acid. The triplet UUU, for example, is the code for the amino acid phenylalanine, corresponding to the sequence AAA in the DNA. Similarly, the messenger RNA triplets AAA (from poly-A) and CCC (from poly-C) code for lysine and proline, respectively.

Other triplets were tested for their coding abilities by synthesizing messenger RNA molecules with varying proportions of the two bases. If, for example, a mixture of the two bases U and C in a 5 : 1 proportion is synthesized into RNA, the possible triplets and their probable frequency in the synthetic messenger RNA can be easily determined. The triplet UUU will be most common and will appear with the frequency 5/6 × 5/6 × 5/6; the triplets UUC, UCU, and CUU will each appear with a frequency of 5/6 × 5/6 × 1/6; the triplets UCC, CUC, and CCU will be the next most frequent and will each appear with a frequency of 5/6 × 1/6 × 1/6; while the triplet CCC should appear only 1/216 of the time. A messenger RNA of this composition should result in the incorporation into protein of eight different amino acids. In fact, only four amino acids were present in the protein produced; this means that several of these triplets encode for the same amino acid and therefore that the code is degenerate.
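
The expected triplet frequencies for a 5 : 1 mixture of U and C can be worked out exactly with a small sketch. It assumes that bases are incorporated independently and at random in proportion to their abundance; `triplet_frequencies` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

# U and C supplied in a 5 : 1 proportion.
BASE_FREQ = {"U": Fraction(5, 6), "C": Fraction(1, 6)}


def triplet_frequencies(base_freq):
    """Frequency of each triplet when bases are drawn independently."""
    freqs = {}
    for triplet in product(base_freq, repeat=3):
        p = Fraction(1)
        for base in triplet:
            p *= base_freq[base]
        freqs["".join(triplet)] = p
    return freqs


freqs = triplet_frequencies(BASE_FREQ)
for triplet, p in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True):
    print(triplet, p)
```

The eight frequencies add up to exactly 1, with UUU accounting for 125/216 of all triplets and CCC for 1/216.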

The RNA code triplets (or codons) and the amino acids for which they stand are shown in Table 3. Triplets have been discovered that encode for starting and for stopping the synthesis of protein chains in E. coli. Many proteins of E. coli begin with the amino acid methionine. Two different transfer RNA's for methionine are known to exist, only one of which functions to initiate protein synthesis. After synthesis of the protein, an enzyme may remove a portion of the beginning of the chain to eliminate the obligatory methionine molecule. The second transfer RNA for methionine allows this amino acid to be incorporated into the middle of a polypeptide.

Termination of the synthesis of a polypeptide chain is signalled by three different RNA codons that do not specify an amino acid: UAA, UAG, and UGA. These triplets were discovered as nonsense mutations that produced premature cessation of protein synthesis in many different genes. Specific proteins called release factors can read these codons and release the polypeptide chain from the ribosome.
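
Reading a message from a start signal to a termination codon can be sketched as follows. The codon table is the standard genetic code written in compact form (one-letter amino-acid symbols, with `*` marking the three termination codons). The sketch assumes that reading begins at the first AUG (methionine) in the message, and `translate` is a hypothetical helper.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a termination codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}


def translate(mrna):
    """Translate from the first AUG until a termination codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


print(translate("GGAUGUUUAAACCCUAAGG"))
```

Here M, F, K, and P stand for methionine, phenylalanine, lysine, and proline; reading stops at UAA, so the bases after it are never translated.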