Transcription is the first step of gene expression, in which a particular segment of DNA is copied into RNA by the enzyme RNA polymerase. Both RNA and DNA are nucleic acids, which use base pairs of nucleotides as a complementary language that can be converted back and forth from DNA to RNA by the action of the correct enzymes. During transcription, a DNA sequence is read by an RNA polymerase, which produces a complementary, antiparallel RNA strand called a primary transcript. As opposed to DNA replication, transcription results in an RNA complement that includes the nucleotide uracil (U) in all instances where thymine (T) would have occurred in a DNA complement. Also unlike DNA replication, in which DNA is synthesised, transcription does not involve an RNA primer to initiate RNA synthesis.
Transcription can be reduced to the following steps, each moving like a wave along the DNA.
- One or more sigma factors initiate transcription of a gene by enabling binding of RNA polymerase to promoter DNA.
- RNA polymerase moves a transcription bubble, like the slider of a zipper, which splits the double-helix DNA molecule into two strands of unpaired DNA nucleotides by breaking the hydrogen bonds between complementary DNA nucleotides.
- RNA polymerase adds matching RNA nucleotides that are paired with complementary DNA nucleotides of one DNA strand.
- The RNA sugar-phosphate backbone forms with assistance from RNA polymerase, producing an RNA strand.
- Hydrogen bonds of the untwisted RNA + DNA helix break, freeing the newly synthesized RNA strand.
- If the cell has a nucleus, the RNA may be further processed (with the addition of a poly-A tail at the 3' end and a cap at the 5' end) and then exits to the cytoplasm through the nuclear pore complex.
The stretch of DNA transcribed into an RNA molecule is called a transcription unit and encodes at least one gene. If the gene transcribed encodes a protein, the result of transcription is messenger RNA (mRNA), which will then be used to create that protein via the process of translation. Alternatively, the transcribed gene may encode a non-coding RNA (such as microRNA or lincRNA), ribosomal RNA (rRNA) or transfer RNA (tRNA), which are other components of the protein-assembly process, or other ribozymes.[1]
A DNA transcription unit encoding a protein contains not only the sequence that will eventually be directly translated into the protein (the coding sequence) but also regulatory sequences that direct and regulate the synthesis of that protein. The regulatory sequence before (i.e., upstream from) the coding sequence is called the five prime untranslated region (5'UTR), and the sequence following (downstream from) the coding sequence is called the three prime untranslated region (3'UTR).[1]
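The layout described above (5'UTR, coding sequence, 3'UTR) can be illustrated with a minimal Python sketch that works on the transcribed mRNA rather than on the DNA. It rests on simplifying assumptions that are not part of the article: the coding sequence is taken to start at the first AUG and to end at the first in-frame stop codon, and the stop codon is counted as part of the coding sequence. The function name `split_mrna` and the example sequence are hypothetical.

```python
# Standard stop codons, written in the RNA alphabet.
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_mrna(mrna):
    """Split an mRNA (written 5'->3') into (5'UTR, coding sequence, 3'UTR).

    Simplifying assumptions: the coding sequence starts at the first AUG
    and ends at the first in-frame stop codon, which is included in it.
    Returns None if no start codon or no in-frame stop codon is found.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return None
    # Step through the sequence one codon (three bases) at a time.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    return None

# Hypothetical transcript, for illustration only.
five_utr, cds, three_utr = split_mrna("GGCACCAUGGCUUGGUAAGCUUAA")
print(five_utr, cds, three_utr)  # GGCACC AUGGCUUGGUAA GCUUAA
```

Real transcripts are more complicated (for example, the first AUG is not always the start codon), so this is only a picture of how the three regions sit relative to one another.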
Transcription has some proofreading mechanisms, but they are fewer and less effective than the controls for copying DNA; therefore, transcription has a lower copying fidelity than DNA replication.[2]
As in DNA replication, the template DNA is read from its 3' end → 5' end during transcription. Meanwhile, the complementary RNA is created in the 5' end → 3' end direction, which means that the 5' end of the RNA is created first. Although DNA is arranged as two antiparallel strands in a double helix, only one of the two DNA strands, called the template strand, is used for transcription. This is because RNA is only single-stranded, as opposed to double-stranded DNA. The other DNA strand (the non-template strand) is called the coding strand, because its sequence is the same as the newly created RNA transcript (except for the substitution of uracil for thymine). The use of only the 3' end → 5' end strand eliminates the need for the Okazaki fragments seen in DNA replication.[1]
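The relationship between the template strand, the coding strand and the transcript can be checked with a short Python sketch. It is a toy model, not a description of the enzyme: strands are plain strings written 5' → 3', and the function names and the example sequence are hypothetical.

```python
# Base-pairing rules: DNA template base -> RNA base, and DNA base -> DNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}

def transcribe(template_5to3):
    """Return the RNA transcript (5'->3') for a template strand given 5'->3'."""
    # The template is read 3'->5', so walk the string in reverse while
    # pairing each DNA base with its complementary RNA base.
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5to3.upper()))

def coding_strand(template_5to3):
    """Return the coding (non-template) DNA strand, written 5'->3'."""
    return "".join(DNA_TO_DNA[base] for base in reversed(template_5to3.upper()))

template = "TTACGCAT"            # hypothetical template strand, 5'->3'
rna = transcribe(template)       # "AUGCGUAA"
coding = coding_strand(template) # "ATGCGTAA"

# The transcript matches the coding strand except that U replaces T.
assert rna == coding.replace("T", "U")
```

Reversing the string before pairing is what models the antiparallel arrangement: the first base of the RNA (its 5' end) pairs with the last base of the template as written (its 3' end).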
In virology, the term may also be used when referring to mRNA synthesis from an RNA molecule (i.e. RNA replication). For instance, the genome of a negative-sense single-stranded RNA (ssRNA -) virus may serve as a template to transcribe a positive-sense single-stranded RNA (ssRNA +) molecule, since the positive-sense strand contains the information needed to translate the viral proteins for viral replication afterwards. This process is catalysed by a viral RNA replicase.
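The negative-sense to positive-sense relationship is the same complementary, antiparallel pairing, applied within the RNA alphabet. The sketch below is a toy illustration under that assumption; the function name `positive_sense` and the example sequence are hypothetical, and nothing here models the replicase itself.

```python
# RNA-to-RNA base pairing (no thymine involved).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def positive_sense(negative_5to3):
    """Return the positive-sense RNA (5'->3') copied from a negative-sense
    RNA template that is also written 5'->3'."""
    # Read the template in reverse (3'->5') and pair each base.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(negative_5to3.upper()))

# Hypothetical genome fragment, for illustration only.
print(positive_sense("UUACGCAU"))  # AUGCGUAA
```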