Children's books      05/23/2020

Devices for reading the genetic code. Biology at the Lyceum. Deoxyribonucleic acid. General information

Every living organism has its own particular set of proteins. Particular combinations of nucleotides and their sequence in the DNA molecule form the genetic code, which conveys information about the structure of a protein. Genetics long held to the concept that one gene corresponds to one enzyme (polypeptide). Nucleic acids and proteins have been studied for a fairly long time. This article takes a closer look at the genetic code and its properties and gives a brief chronology of the research.

Terminology

The genetic code is a way of encoding the amino acid sequence of proteins using a sequence of nucleotides. This way of recording information is characteristic of all living organisms. Proteins are natural organic substances of high molecular weight that are present in all living organisms. They are built from 20 types of amino acids, which are called canonical. The amino acids are joined in a chain in a strictly fixed order, and this order determines the structure of the protein and its biological properties. A protein may also consist of several amino acid chains.

DNA and RNA

Deoxyribonucleic acid is a macromolecule responsible for the storage, transmission and implementation of hereditary information. DNA uses four nitrogenous bases: adenine (A), guanine (G), cytosine (C) and thymine (T). RNA consists of the same nucleotides, except that the nucleotide containing thymine is replaced by one containing uracil (U). RNA and DNA molecules are chains of nucleotides, and the sequences formed in this way make up the "genetic alphabet".

Implementation of information

The synthesis of a protein encoded by a gene proceeds in two steps. First, mRNA is synthesized on a DNA template (transcription). Then the genetic code is translated into an amino acid sequence, that is, the polypeptide chain is synthesized on the mRNA (translation). Three nucleotides are enough to encode all the amino acids and to signal the end of the protein sequence. Such a group of three is called a triplet.

Research History

Proteins and nucleic acids have been studied for a long time, but the first ideas about the nature of the genetic code appeared only in the middle of the 20th century. In 1953 it was found that some proteins are made up of defined sequences of amino acids. At that time the exact number of amino acids could not yet be determined, and there were numerous disputes about it. In the same year Watson and Crick published two papers. The first described the secondary structure of DNA, and the second discussed how it could be copied by template synthesis. They also emphasized that a particular sequence of bases is a code that carries hereditary information. The Soviet and American physicist George Gamow accepted the coding hypothesis and proposed a way to test it. In his work published in 1954, he suggested establishing a correspondence between amino acid side chains and diamond-shaped "holes" and using this as the coding mechanism. This scheme was called the diamond (rhombic) code. In explaining his work, Gamow allowed that the genetic code could be triplet. His work was one of the first to be regarded as close to the truth.

Classification

Over the following years, various models of the genetic code were proposed. They were of two types: overlapping and non-overlapping. Overlapping codes assumed that one nucleotide could be part of several codons. The triangular, sequential and major-minor codes belong to this type. Non-overlapping codes came in two varieties: the combination code and the "code without commas". In the combination code, an amino acid is encoded by a triplet of nucleotides, and what matters is the composition of the triplet. In the "code without commas", certain triplets correspond to amino acids while the rest do not. It was assumed that if meaningful triplets were arranged one after another, the triplets read in any other reading frame would be meaningless. Scientists believed that a set of triplets meeting these requirements could be chosen, and that it would contain exactly 20 triplets.
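The figure of 20 in the "code without commas" follows from a counting argument, which can be sketched as below. This is an illustration rather than something taken from the source, and the function name `rotation_classes` is hypothetical. A triplet of one repeated base (such as AAA) cannot be meaningful, because AAAAAA reads the same in every frame, and of the three cyclic rotations of any other triplet (ABC, BCA, CAB) at most one can be meaningful, since ABCABC contains the other two in shifted frames.

```python
from itertools import product

BASES = "ACGU"

def rotation_classes(bases=BASES):
    """Group all triplets into classes of cyclic rotations (ABC, BCA, CAB)."""
    classes = set()
    for triplet in product(bases, repeat=3):
        t = "".join(triplet)
        rotations = frozenset({t, t[1:] + t[0], t[2:] + t[:2]})
        classes.add(rotations)
    return classes

classes = rotation_classes()

# Classes of size 1 are the homopolymers AAA, CCC, GGG, UUU, which are excluded.
usable = [c for c in classes if len(c) == 3]

# At most one triplet per remaining class can carry meaning.
max_comma_free = len(usable)
print(len(classes), max_comma_free)
```

The count gives only an upper bound on the size of such a code; it does not by itself construct a set of 20 triplets that satisfies the requirement.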

Although Gamow and his co-authors questioned this model, it was considered the most plausible for the next five years. At the beginning of the 1960s, new data revealed shortcomings in the "code without commas". Codons were found to be able to induce protein synthesis in vitro. By about 1965 the meaning of all 64 triplets had been worked out. It turned out that some codons are redundant, in other words, that a single amino acid can be encoded by several triplets.

Distinctive features

The properties of the genetic code include triplet structure, specificity, redundancy (degeneracy), non-overlapping reading and universality.

Variations

A deviation of the genetic code from the standard was first discovered in 1979 during the study of human mitochondrial genes. Further variants were identified later, including many alternative mitochondrial codes. One example is the reading of the stop codon UGA as tryptophan in mycoplasmas. In archaea and bacteria, GUG and UUG are often used as start codons. Sometimes a gene begins with a start codon that differs from the one normally used by that species. In addition, in some proteins the non-standard amino acids selenocysteine and pyrrolysine are inserted by the ribosome when it reads a stop codon, depending on sequences found in the mRNA. Selenocysteine is now considered the 21st and pyrrolysine the 22nd amino acid found in proteins.

General features of the genetic code

However, all these exceptions are rare. In general, the genetic code of living organisms has a number of common features. These include the composition of the codon, which consists of three nucleotides (the first two being the determining ones), and the translation of codons into an amino acid sequence by tRNA and ribosomes.

GENETIC CODE (from Greek genetikos, relating to origin; syn.: code, biological code, amino acid code, protein code, nucleic acid code) - a system for recording hereditary information in the nucleic acid molecules of animals, plants, bacteria and viruses as a sequence of nucleotides.

Genetic information (Fig.) is transmitted from cell to cell and from generation to generation, except in RNA-containing viruses, by the replication of DNA molecules (see Replication). Hereditary information in DNA is implemented in the life of the cell through three types of RNA: messenger (mRNA), ribosomal (rRNA) and transfer (tRNA), which are synthesized on DNA as a template by the enzyme RNA polymerase. The sequence of nucleotides in a DNA molecule uniquely determines the sequence of nucleotides in all three types of RNA (see Transcription). The information of a gene (see) that encodes a protein molecule is carried only by mRNA. The end product of the implementation of hereditary information is the synthesis of protein molecules, whose specificity is determined by their amino acid sequence (see Translation).

Since only four different nitrogenous bases are present in DNA or RNA [in DNA: adenine (A), thymine (T), guanine (G), cytosine (C); in RNA: adenine (A), uracil (U), cytosine (C), guanine (G)], and their sequence determines the sequence of 20 amino acids in a protein, the problem of the genetic code arises, that is, the problem of translating the 4-letter alphabet of nucleic acids into the 20-letter alphabet of polypeptides.

The idea of template synthesis of protein molecules, with a correct prediction of the properties of the hypothetical template, was first formulated by N. K. Koltsov in 1928. In 1944, Avery et al. established that DNA molecules are responsible for the transfer of hereditary traits during transformation in pneumococci. In 1948, E. Chargaff showed that in all DNA molecules the corresponding nucleotides occur in equal amounts (A and T, G and C). In 1953, F. Crick, J. Watson and M. H. F. Wilkins, on the basis of this rule and X-ray diffraction data (see), concluded that the DNA molecule is a double helix consisting of two polynucleotide strands linked by hydrogen bonds. Only T can lie opposite A of the other strand, and only C opposite G. Because of this complementarity, the nucleotide sequence of one strand uniquely determines the sequence of the other. The second important conclusion from this model is that the DNA molecule is capable of self-reproduction.

In 1954, G. Gamow formulated the problem of the genetic code in its modern form. In 1957, F. Crick put forward the adapter hypothesis, which assumed that amino acids interact with nucleic acid not directly but through intermediaries (now known as tRNA). In the years that followed, all the main links in the general scheme of genetic information transfer, at first hypothetical, were confirmed experimentally. In 1957 mRNA was discovered [A. S. Spirin, A. N. Belozersky et al.; E. Volkin and L. Astrachan], as was tRNA [M. B. Hoagland]; in 1960, DNA was synthesized outside the cell using existing DNA macromolecules as a template (A. Kornberg), and DNA-dependent RNA synthesis was discovered [S. B. Weiss et al.]. In 1961, a cell-free system was created in which protein-like substances were synthesized in the presence of natural RNA or synthetic polyribonucleotides [M. Nirenberg and J. H. Matthaei]. Understanding the genetic code involved two tasks: studying the general properties of the code, and actually deciphering it, that is, finding out which combinations of nucleotides (codons) encode which amino acids.

The general properties of the code were elucidated independently of its deciphering, and mostly before it, by analyzing the molecular patterns of mutation formation (F. Crick et al., 1961; N. V. Luchnik, 1963). They come down to the following:

1. The code is universal, i.e. identical, at least in the main, for all living beings.

2. The code is triplet, that is, each amino acid is encoded by a triple of nucleotides.

3. The code is non-overlapping, i.e. a given nucleotide cannot be part of more than one codon.

4. The code is degenerate, that is, one amino acid can be encoded by several triplets.

5. Information about the primary structure of the protein is read from mRNA sequentially, starting from a fixed point.

6. Most of the possible triplets have "meaning", i.e., encode amino acids.

7. Of the three "letters" of the codon, only two (obligate) are of primary importance, while the third (optional) carries much less information.
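Properties 2, 3 and 5 (triplet, non-overlapping, read from a fixed point) can be illustrated with a short sketch. The function names and the example message are hypothetical and chosen only for illustration. The sketch contrasts the actual non-overlapping reading with what a fully overlapping code would read, and shows that choosing a different starting point changes every codon.

```python
def codons(mrna, start=0):
    """Split an mRNA string into non-overlapping triplets from a fixed start."""
    usable = len(mrna) - (len(mrna) - start) % 3   # drop an incomplete tail
    return [mrna[i:i + 3] for i in range(start, usable, 3)]

def overlapping_triplets(mrna):
    """What a fully overlapping code would read: a new triplet at every base."""
    return [mrna[i:i + 3] for i in range(len(mrna) - 2)]

message = "AUGGCUUUCUAA"  # hypothetical example message

print(codons(message))                    # reading from the fixed point
print(codons(message, start=1))           # another starting point, other codons
print(overlapping_triplets(message)[:4])  # each base would sit in up to 3 triplets
```

In the non-overlapping reading each nucleotide belongs to exactly one codon, which is why a single-base substitution changes at most one amino acid.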

Direct decoding of the code would consist in comparing the nucleotide sequence of a structural gene (or of the mRNA synthesized on it) with the amino acid sequence of the corresponding protein. However, this approach was for a long time technically impossible. Two other approaches were used: protein synthesis in a cell-free system with artificial polyribonucleotides of known composition as the template, and analysis of the molecular patterns of mutation formation (see). The first gave positive results earlier and historically played a large part in deciphering the genetic code.

In 1961, M. Nirenberg and J. H. Matthaei used a homopolymer, synthetic polyuridylic acid (that is, an artificial RNA of composition UUUU...), as the template and obtained polyphenylalanine. It followed that the codon for phenylalanine consists of several U, that is, UUU in the case of a triplet code. Later, polyribonucleotides made of different nucleotides were used alongside homopolymers. In that case only the composition of the polymers was known, while the order of nucleotides in them was random, so the analysis of the results was statistical and gave only indirect conclusions. At least one triplet was found fairly quickly for each of the 20 amino acids. It turned out that organic solvents, changes in pH or temperature, certain cations and especially antibiotics make the code ambiguous: the same codons begin to stimulate the incorporation of other amino acids, and in some cases one codon came to encode up to four different amino acids. Streptomycin affected the reading of information both in cell-free systems and in vivo, and it was effective only in streptomycin-sensitive bacterial strains. In streptomycin-dependent strains it "corrected" the reading of codons that had been altered by mutation. Such results cast doubt on the correctness of decoding the genetic code with a cell-free system; confirmation was needed, above all from in vivo data.

The main in vivo data on the genetic code were obtained by analyzing the amino acid composition of proteins in organisms treated with mutagens (see) of known mechanism of action, for example nitrous acid, which causes the replacement of C by U and of A by G. Useful information also came from the analysis of mutations caused by non-specific mutagens, the comparison of differences in the primary structure of related proteins in different species, the correlation between the composition of DNA and of proteins, and so on.

Decoding the genetic code on the basis of in vivo and in vitro data gave matching results. Later, three more methods for deciphering the code in cell-free systems were developed: binding of aminoacyl-tRNA (that is, tRNA with an attached activated amino acid) to trinucleotides of known composition (M. Nirenberg et al., 1965), binding of aminoacyl-tRNA to polynucleotides beginning with a particular triplet (Matthaei et al., 1966), and the use as mRNA of polymers in which not only the composition but also the order of nucleotides is known (H. G. Khorana et al., 1965). All three methods complement one another, and their results agree with the data from in vivo experiments.

In the 1970s, especially reliable methods for checking the results of decoding the genetic code appeared. Mutations induced by proflavin are known to consist of the loss or insertion of individual nucleotides, which shifts the reading frame. In phage T4, proflavin was used to induce a number of mutations that changed the composition of lysozyme. This composition was analyzed and compared with the codons that a shift in the reading frame should have produced. The match was complete. This method also made it possible to establish which triplets of the degenerate code encode each of the amino acids. In 1970, J. M. Adams and co-workers managed to decipher part of the genetic code by a direct method: in phage R17, the base sequence of a fragment 57 nucleotides long was determined and compared with the amino acid sequence of its coat protein. The results agreed completely with those obtained by less direct methods. Thus the code has been deciphered completely and correctly.
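The logic of the frameshift check can be sketched in a few lines. The sequence below is hypothetical and is not the lysozyme gene; the compact codon table uses one-letter amino acid codes with "*" for nonsense codons. Losing one nucleotide shifts the frame and changes every downstream amino acid, while a nearby insertion restores the frame so that only the stretch between the two changes differs from the original.

```python
from itertools import product

# Standard codon table in compact form (one-letter codes, "*" = nonsense).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna):
    """Translate non-overlapping codons from position 0 until a nonsense codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

wild_type = "AUGAAAGCUCCAGGUUUCUAA"            # hypothetical message
deleted = wild_type[:4] + wild_type[5:]         # loss of one nucleotide
restored = deleted[:10] + "C" + deleted[10:]    # insertion further downstream

print(translate(wild_type))  # MKAPGF
print(translate(deleted))    # MKLQVS
print(translate(restored))   # MKLPGF
```

Comparing the altered stretch of amino acids with the codons predicted by the shifted frame is what allowed codon assignments to be checked against a real protein.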

The results of the decoding are summarized in the table, which gives the composition of mRNA codons. The composition of tRNA anticodons is complementary to that of mRNA codons, that is, they contain A in place of U, U in place of A, G in place of C and C in place of G, and it corresponds to the codons of the structural gene (the DNA strand from which information is read), the only difference being that uracil takes the place of thymine. Of the 64 triplets that can be formed from 4 nucleotides, 61 have "sense", that is, encode amino acids, and 3 are "nonsense" (devoid of meaning). There is a fairly clear relationship between the composition of triplets and their meaning, which was discovered while the general properties of the code were still being analyzed. In some cases the triplets encoding a particular amino acid (for example proline or alanine) share the same first two (obligate) nucleotides, while the third (optional) nucleotide can be any. In other cases (for example asparagine or glutamine), two similar triplets have the same meaning: their first two nucleotides coincide, and the third position is occupied by any purine or by any pyrimidine.
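The counts and patterns described here can be checked with a small sketch. The names `anticodon` and `families` are hypothetical, and the compact table uses one-letter amino acid codes with "*" for nonsense codons. Note that `anticodon` applies the base-by-base substitution exactly as the text states it; it does not reverse the result into the conventional 5′ to 3′ notation.

```python
from itertools import product

# Standard codon table in compact form (one-letter codes, "*" = nonsense).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon):
    """Base-by-base complement of an mRNA codon (written aligned with the codon)."""
    return "".join(PAIR[b] for b in codon)

sense = [c for c, aa in CODON_TABLE.items() if aa != "*"]
nonsense = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")

# Leading nucleotide pairs for which the third nucleotide can be anything.
families = [a + b for a in BASES for b in BASES
            if len({CODON_TABLE[a + b + c] for c in BASES}) == 1]

print(len(sense), nonsense)   # 61 ['UAA', 'UAG', 'UGA']
print(families)               # includes CC (proline) and GC (alanine)
print(anticodon("UUU"))       # AAA
```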

The nonsense codons have special names derived from the designations of phage mutants (UAA: ochre, UAG: amber, UGA: opal). Although they do not encode any amino acid, they are of great importance in the reading of information, since they encode the end of the polypeptide chain.

Information is read in the direction from the 5′ end to the 3′ end of the nucleotide chain (see Deoxyribonucleic acids). Protein synthesis proceeds from the amino acid with a free amino group to the amino acid with a free carboxyl group. The start of synthesis is encoded by the triplets AUG and GUG, which in this position bind a specific initiator aminoacyl-tRNA, namely N-formylmethionyl-tRNA. The same triplets, when located within the chain, encode methionine and valine respectively. The ambiguity is removed by the fact that the start of reading is preceded by nonsense. There is evidence that the boundary between mRNA regions encoding different proteins consists of more than two triplets and that the secondary structure of the RNA changes at these sites; this question is under investigation. If a nonsense codon occurs within a structural gene, the corresponding protein is built only up to the position of that codon.
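A minimal sketch of this reading scheme is given below. It is a simplification under stated assumptions: the message is hypothetical, only AUG is treated as a start signal (GUG initiation is not modelled), and the initiating residue is printed as ordinary methionine (M) rather than N-formylmethionine. It shows that AUG and GUG inside the chain are read as methionine and valine, and that a nonsense codon within the gene truncates the protein.

```python
from itertools import product

# Standard codon table in compact form (one-letter codes, "*" = nonsense).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synthesize(mrna):
    """Find the first AUG, then read triplets 5'->3' until a nonsense codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    chain = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":      # nonsense codon: the chain ends here
            break
        chain.append(residue)
    return "".join(chain)

# Hypothetical message: short leader, start codon, internal GUG and AUG.
leader = "GG"
codons = ["AUG", "GUG", "AUG", "UGG", "AAA", "UAA"]
normal = leader + "".join(codons)

# Nonsense mutation inside the gene: UGG (tryptophan) -> UGA.
mutant = leader + "".join(codons[:3] + ["UGA"] + codons[4:])

print(synthesize(normal))  # MVMWK
print(synthesize(mutant))  # MVM
```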

The discovery and decoding of the genetic code, an outstanding achievement of molecular biology, has influenced all the biological sciences and in some cases laid the foundation for large specialized fields (see Molecular genetics). The effect of the discovery of the genetic code and of the research connected with it is compared with the effect that Darwin's theory had on the biological sciences.

The universality of the genetic code is direct proof of the universality of the basic molecular mechanisms of life in all representatives of the organic world. At the same time, the large differences in the functions and structure of the genetic apparatus between prokaryotes and eukaryotes, and between unicellular and multicellular organisms, are probably associated with molecular differences whose study is a task for the future. Since research on the genetic code is a matter of recent years only, the significance of its results for practical medicine is so far indirect: it helps in understanding the nature of diseases and the mechanism of action of pathogens and medicinal substances. However, the discovery of such phenomena as transformation (see), transduction (see) and suppression (see) points to the possibility in principle of correcting pathologically altered hereditary information, the so-called genetic engineering (see).

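Table. GENETIC CODE

First        Second nucleotide of the codon                                  Third
nucleotide   U                C            A                G                nucleotide
U            Phenylalanine    Serine       Tyrosine         Cysteine         U
             Phenylalanine    Serine       Tyrosine         Cysteine         C
             Leucine          Serine       Nonsense*        Nonsense*        A
             Leucine          Serine       Nonsense*        Tryptophan       G
C            Leucine          Proline      Histidine        Arginine         U
             Leucine          Proline      Histidine        Arginine         C
             Leucine          Proline      Glutamine        Arginine         A
             Leucine          Proline      Glutamine        Arginine         G
A            Isoleucine       Threonine    Asparagine       Serine           U
             Isoleucine       Threonine    Asparagine       Serine           C
             Isoleucine       Threonine    Lysine           Arginine         A
             Methionine**     Threonine    Lysine           Arginine         G
G            Valine           Alanine      Aspartic acid    Glycine          U
             Valine           Alanine      Aspartic acid    Glycine          C
             Valine           Alanine      Glutamic acid    Glycine          A
             Valine**         Alanine      Glutamic acid    Glycine          G

* Encodes the end of the chain.

** Also encodes the beginning of the chain.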


The genetic code is a way of encoding the amino acid sequence of proteins using the sequence of nucleotides in the DNA molecule; it is characteristic of all living organisms.

Genetic information is implemented in living cells (that is, a protein encoded in DNA is synthesized) through two template processes: transcription (the synthesis of mRNA on a DNA template) and translation (the synthesis of a polypeptide chain on an mRNA template).

DNA uses four nucleotides - adenine (A), guanine (G), cytosine (C), thymine (T). These "letters" make up the alphabet of the genetic code. RNA uses the same nucleotides, except for thymine, which is replaced by uracil (U). In DNA and RNA molecules, nucleotides line up in chains and, thus, sequences of “letters” are obtained.

In the nucleotide sequence of DNA there are code "words" for each amino acid of the future protein molecule - the genetic code. It consists in a certain sequence of nucleotides in the DNA molecule.

Three consecutive nucleotides encode the "name" of one amino acid, that is, each of the 20 amino acids is encrypted by a significant code unit - a combination of three nucleotides called a triplet or codon.

At present, the DNA code has been completely deciphered, and we can talk about certain properties that are characteristic of this unique biological system, which provides the translation of information from the "language" of DNA to the "language" of protein.

The carrier of genetic information is DNA, but since mRNA, a copy of one of the DNA strands, is directly involved in protein synthesis, the genetic code is most often written in the "RNA language".

Amino acid Coding RNA triplets
Alanine GCU GCC GCA GCG
Arginine CGU CGC CGA CGG AGA AGG
Asparagine AAU AAC
Aspartic acid GAU GAC
Valine GUU GUC GUA GUG
Histidine CAU CAC
Glycine GGU GGC GGA GGG
Glutamine CAA CAG
Glutamic acid GAA GAG
Isoleucine AUU AUC AUA
Leucine CUU CUC CUA CUG UUA UUG
Lysine AAA AAG
Methionine AUG
Proline CCU CCC CCA CCG
Serine UCU UCC UCA UCG AGU AGC
Tyrosine UAU UAC
Threonine ACU ACC ACA ACG
Tryptophan UGG
Phenylalanine UUU UUC
Cysteine UGU UGC
STOP UGA UAG UAA
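A table of this kind can be loaded into a dictionary and inverted for lookup by codon. The sketch below is one way to do it; the names `TABLE`, `CODON_TO_AA` and `decode` are hypothetical, and the rows are the standard RNA codon assignments.

```python
# Amino acid -> coding RNA triplets (standard genetic code).
TABLE = {
    "Alanine": "GCU GCC GCA GCG",
    "Arginine": "CGU CGC CGA CGG AGA AGG",
    "Asparagine": "AAU AAC",
    "Aspartic acid": "GAU GAC",
    "Valine": "GUU GUC GUA GUG",
    "Histidine": "CAU CAC",
    "Glycine": "GGU GGC GGA GGG",
    "Glutamine": "CAA CAG",
    "Glutamic acid": "GAA GAG",
    "Isoleucine": "AUU AUC AUA",
    "Leucine": "CUU CUC CUA CUG UUA UUG",
    "Lysine": "AAA AAG",
    "Methionine": "AUG",
    "Proline": "CCU CCC CCA CCG",
    "Serine": "UCU UCC UCA UCG AGU AGC",
    "Tyrosine": "UAU UAC",
    "Threonine": "ACU ACC ACA ACG",
    "Tryptophan": "UGG",
    "Phenylalanine": "UUU UUC",
    "Cysteine": "UGU UGC",
    "STOP": "UGA UAG UAA",
}

# Invert the table: each codon points to exactly one meaning.
CODON_TO_AA = {codon: name for name, row in TABLE.items() for codon in row.split()}

def decode(mrna):
    """Read an mRNA string codon by codon and stop at the first STOP."""
    names = []
    for i in range(0, len(mrna) - 2, 3):
        name = CODON_TO_AA[mrna[i:i + 3]]
        if name == "STOP":
            break
        names.append(name)
    return names

print(len(CODON_TO_AA))          # 64
print(decode("AUGGCUUGGUAA"))    # ['Methionine', 'Alanine', 'Tryptophan']
```

Because the inverted dictionary has exactly 64 keys, the check also confirms that no triplet is listed twice or left out.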

Properties of the genetic code

  • Triplet nature

Three consecutive nucleotides (nitrogenous bases) encode the "name" of one amino acid, that is, each of the 20 amino acids is specified by a meaningful code unit, a combination of three nucleotides called a triplet or codon.

Triplet (codon) - a sequence of three nucleotides (nitrogenous bases) in a DNA or RNA molecule that determines the incorporation of a particular amino acid into the protein molecule during its synthesis.

  • Unambiguity (specificity)

One triplet cannot encode two different amino acids; a given codon corresponds to only one amino acid.

  • Degeneracy (redundancy)

Each amino acid can be specified by more than one triplet. The exceptions are methionine and tryptophan. In other words, several codons can correspond to the same amino acid.

  • Non-overlapping

The same base cannot belong to two adjacent codons at the same time.

  • Polarity

Some triplets do not encode amino acids but act as "road signs" that mark the boundaries of individual genes. The triplets UAA, UAG and UGA each mean that synthesis stops and are located at the end of a gene, so we can speak of the polarity of the genetic code.

  • Universality

In animals, plants, fungi, bacteria and viruses, the same triplet encodes the same amino acid, that is, the genetic code is the same for all living beings. In other words, universality is the ability of the genetic code to work in the same way in organisms of different levels of complexity, from viruses to humans. The universality of the DNA code confirms the common origin of all life on our planet. Genetic engineering methods rely on the universality of the genetic code.
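Unambiguity and degeneracy can be verified directly against the standard code, as in the sketch below. The names `degeneracy`, `single_codon` and `six_codons` are hypothetical; the compact table uses one-letter amino acid codes (M is methionine, W is tryptophan) with "*" for stop codons. Unambiguity holds by construction, since a dictionary maps each codon to exactly one value.

```python
from collections import Counter
from itertools import product

# Standard codon table in compact form (one-letter codes, "*" = stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Degeneracy: how many codons specify each amino acid.
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

single_codon = sorted(aa for aa, n in degeneracy.items() if n == 1)
six_codons = sorted(aa for aa, n in degeneracy.items() if n == 6)

print(len(degeneracy))   # 20 amino acids
print(single_codon)      # ['M', 'W']
print(six_codons)        # ['L', 'R', 'S']
```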

From the history of the discovery of the genetic code

The idea that a genetic code exists was first formulated by A. Dounce and G. Gamow in 1952-1954. Scientists showed that a nucleotide sequence that uniquely determines the incorporation of a particular amino acid must contain at least three units. It was later proved that such a sequence consists of exactly three nucleotides and is called a codon or triplet.

The questions of which nucleotides are responsible for incorporating a certain amino acid into a protein molecule, and of how many nucleotides determine this incorporation, remained unresolved until 1961. Theoretical analysis showed that the code cannot consist of one nucleotide, since in that case only 4 amino acids could be encoded. Nor can the code be a doublet, that is, a combination of two nucleotides from the four-letter "alphabet" cannot cover all the amino acids, since only 16 such combinations are theoretically possible (4² = 16).

Three consecutive nucleotides are enough to encode 20 amino acids as well as a "stop" signal that marks the end of the protein sequence, since the number of possible combinations is then 64 (4³ = 64).
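The counting argument can be written out as a few lines of arithmetic. The function name `min_codon_length` is hypothetical; the sketch simply finds the smallest word length n for which 4ⁿ covers the 20 amino acids plus one stop signal.

```python
def min_codon_length(alphabet_size, meanings_needed):
    """Smallest n such that alphabet_size ** n >= meanings_needed."""
    n = 1
    while alphabet_size ** n < meanings_needed:
        n += 1
    return n

NUCLEOTIDES = 4
MEANINGS = 20 + 1   # 20 amino acids plus a "stop" signal

for n in (1, 2, 3):
    print(n, NUCLEOTIDES ** n)   # 4, 16, 64 possible words

print(min_codon_length(NUCLEOTIDES, MEANINGS))   # 3
```

With 64 words for 21 meanings, the triplet code necessarily has spare capacity, which is where its redundancy comes from.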

Chapter USE: 2.6. Genetic information in a cell. Genes, genetic code and its properties. Matrix nature of biosynthetic reactions. Biosynthesis of protein and nucleic acids

More than 6 billion people live on Earth. Apart from 25-30 million pairs of identical twins, all people are genetically different. This means that each of them is unique and has unique hereditary characteristics, character traits, abilities, temperament and many other qualities. What determines such differences between people? The differences in their genotypes, that is, in the set of genes of the organism. Each person is unique, just as the genotype of an individual animal or plant is unique. The genetic traits of a person are embodied in the proteins synthesized in the body. Consequently, the structure of one person's proteins differs, if only slightly, from that of another person's. This is why the problem of organ transplant rejection arises, and why there are allergic reactions to foods, insect bites, plant pollen and so on. This does not mean that people never have exactly the same proteins. Proteins that perform the same functions may be identical or may differ by only one or two amino acids. But there are no two people on Earth (with the exception of identical twins) in whom all proteins are the same.

Information about the primary structure of a protein is encoded as a sequence of nucleotides in a region of the DNA molecule called a gene. A gene is a unit of hereditary information of an organism. Each DNA molecule contains many genes. The totality of all the genes of an organism makes up its genotype.

Hereditary information is encoded using the genetic code. The code is similar to the well-known Morse code, which encodes information with dots and dashes. Morse code is universal for all radio operators, and the differences lie only in the translation of signals into different languages. The genetic code is likewise universal for all organisms; what differs is the order of the nucleotides that form the genes and code for the proteins of specific organisms.

Properties of the genetic code: triplet, specificity, universality, redundancy and non-overlapping.

So what is the genetic code? Fundamentally, it consists of triplets of DNA nucleotides combined in different sequences, for example AAT, GCA, ACG, TGC and so on. Each triplet of nucleotides encodes a specific amino acid that will be built into the polypeptide chain. For example, the DNA triplet CGT encodes the amino acid alanine, and the triplet AAG encodes phenylalanine. There are 20 amino acids, and there are 64 possible combinations of four nucleotides taken three at a time. Four nucleotides are therefore more than enough to encode 20 amino acids, which is why one amino acid can be encoded by several triplets. Some triplets do not encode amino acids at all but start or stop protein biosynthesis.

Strictly speaking, the genetic code is the sequence of nucleotides in an mRNA molecule, because mRNA copies the information from DNA (transcription) and that information is then translated into the sequence of amino acids in the synthesized protein (translation). mRNA is built from the nucleotides A, C, G and U. The nucleotide triplets of mRNA are called codons. The DNA triplets given above look as follows on mRNA: the triplet CGT becomes the codon GCA, and the DNA triplet AAG becomes the codon UUC. It is the mRNA codons that are used to write down the genetic code. So the genetic code is triplet, universal for all organisms on Earth, and degenerate (most amino acids are encoded by more than one codon). Between genes there are punctuation marks, triplets called stop codons, which signal the end of the synthesis of one polypeptide chain. There are tables of the genetic code, which you need to be able to use to decode mRNA codons and build the chains of protein molecules (with the complementary DNA triplets given in brackets).
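The path from a DNA triplet to an mRNA codon to an amino acid can be sketched as follows, using the two example triplets from the text (written here as CGT and AAG). The names `DNA_TO_MRNA` and `transcribe` are hypothetical. The sketch follows the text's base-by-base convention and ignores strand direction; the compact table uses one-letter amino acid codes (A is alanine, F is phenylalanine).

```python
from itertools import product

# Standard codon table in compact form (one-letter codes, "*" = stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Complementary pairing during transcription: A->U, T->A, G->C, C->G.
DNA_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_dna):
    """Build the mRNA complementary to a DNA template, base by base."""
    return "".join(DNA_TO_MRNA[base] for base in template_dna)

for triplet in ("CGT", "AAG"):
    codon = transcribe(triplet)
    print(triplet, "->", codon, "->", CODON_TABLE[codon])
# CGT -> GCA -> A
# AAG -> UUC -> F
```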