Team:HokkaidoU Japan/Promoter

From 2013.igem.org


Revision as of 08:32, 27 October 2013

Maestro E.coli

Promoter

Overview

Proteins are expressed in two main steps. First, mRNA is polymerized using DNA as a template. Then a ribosome binds the mRNA and translates it into protein.

A promoter is a DNA sequence that initiates transcription from DNA to mRNA. If we define transcriptional efficiency as "promoter strength", a stronger promoter transcribes more mRNA, which should lead to stronger protein expression.

We created several promoters by randomizing the -35 sequence and then selecting among the variants. In a promoter, the -35 region supports the binding of RNA polymerase (RNAP). This interaction leads to formation of the closed complex, which is the rate-limiting step. We focused on this rather transparent function to introduce variability in promoter strength.

Overview about Transcription

We will explain the importance of the promoter sequence. Before that, let's look at how RNAP binds to a promoter with the help of Fig. 1.

Fig. 1 mRNA transcription starts with promoter engagement and proceeds through initiation and elongation to termination (omitted in the figure).

First, the transcription complex must be formed. The transcription complex polymerizes mRNA in two steps: the initiation step starts polymerization and is followed by the elongation step. The promoter plays a crucial role in engagement and initiation. After the closed complex forms, the DNA double helix is pulled apart to form the transcription bubble. During this process the closed complex changes into the open complex, which marks the beginning of mRNA polymerization. The transcription bubble exposes deoxyribonucleotides so that they can form new hydrogen bonds with ribonucleotides. In short, DNA serves as the template for making mRNA.

Transcription factors related to Promoter

The RNAP holoenzyme consists of a core enzyme made of five subunits and a σ factor. The σ factor plays a crucial role in promoter recognition. It recognizes and binds to the promoter region of the DNA sequence, helps to assemble the core enzyme, and starts transcription. There are several variants of the σ factor; E. coli, a bacterium widely used by iGEMers, uses σ70 for housekeeping gene expression during exponential growth. A bacterial promoter can be roughly divided into three regions: the -10 region, the spacer, and the -35 region. Bases in the promoter are numbered in descending order from the transcription start base, which is defined as +1.

-10 region
The -10 region is structurally very important because it initiates promoter melting in the RNAP-promoter complex. This is essential for forming the open complex. Its consensus sequence is TATAAT at positions -12 to -7.
Spacer
The spacer is thought to add flexibility to the binding requirements of the σ factor.
-35 region
The -35 region is second in importance to the -10 region. It does not contribute energetically to promoter melting. There are reports of promoters without a -35 region; in those cases the TG motif at about -16 is thought to act as an alternative. The -35 consensus sequence is TTGACA at positions -36 to -31.

Because the promoter's function is to bind RNAP, its sequence is genetically well conserved. The most frequently conserved residues in the sequence make up a "consensus sequence". In 1983, the -35 and -10 consensus sequences were shown to be TTGACA and TATAAT, respectively [Fig 2]. The horizontal axis of the figure represents the position upstream of the transcription initiation point. A letter at the top of the figure indicates that the base occurs at that position in over 39% of cases; an occurrence over 54% is shown as an upper-case letter. The consensus sequence published by Marjan De Mey et al. (2007) shows that the -10 and -35 regions are highly conserved [Fig 3]. There are other, less conserved regions. The tetramer (TRTG) upstream of the -10 region is called the TG motif. Upstream of the -35 region is the UP element, and downstream of the -10 region is the discriminator region. These sequences are thought to bind the core enzyme, so they are also well conserved. Each sequence is important for controlling promoter strength.

Fig. 2 Consensus sequence shown in review article in 1983 [3].
Fig. 3 Consensus sequence prepared in 2007 [4].

We therefore designed a "consensus promoter", which should have the strongest binding energy to RNAP. By adding mutations to the -35 region, we sought to construct promoters with various binding energies. There are three reasons why we chose the -35 region.

First, the -35 region only supports binding of the σ factor. It has a less vital role than the -10 region, which contributes energetically to formation of the open complex. With this in mind, we mutated the -35 region to change promoter binding strength easily without causing severe errors in promoter function.

Second, binding of RNAP to the promoter is orchestrated by the σ factor, and complex formation is thought to be the rate-limiting step. We thought that the -35 region performs a simpler function, so the effect of mutations there should be structurally more transparent.

Third, recently published research reported a promoter family made by randomizing both the -35 and -10 regions and changing the spacer length. However, making so many changes would be too large a task for us. Randomizing only the -35 hexamer gives 4^6 = 4096 variants, far fewer than mutating every promoter position, so we can get results with a smaller library.

For these 3 reasons we went on to construct our promoter family.
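The library size quoted above can be checked by direct enumeration. The following is a minimal sketch, not the team's actual tooling; the function names `all_hexamers` and `mismatches` are hypothetical and introduced only for illustration.

```python
from itertools import product

BASES = "ACGT"
CONSENSUS_35 = "TTGACA"  # -35 consensus sequence quoted in the text


def all_hexamers():
    """Return every possible 6-base sequence over A, C, G, T."""
    return ["".join(p) for p in product(BASES, repeat=6)]


def mismatches(seq, ref=CONSENSUS_35):
    """Count the positions at which seq differs from the reference hexamer."""
    return sum(a != b for a, b in zip(seq, ref))


library = all_hexamers()
print(len(library))  # 4**6 = 4096 variants of the -35 hexamer
```

Counting mismatches against the consensus is only a rough proxy for strength; the modeling below uses binding energies instead.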

MODELING

We tried to theoretically predict the strength distribution of the 4096 promoters, which were artificially created by random mutation. We followed these 3 steps, following the study by Brewster et al.[1].

  1. Calculate the binding energy of each promoter and the σ factor from the sequence
  2. Convert the binding energy to the probability that RNAP binds the promoter, using the methods of statistical mechanics
  3. Use the binding probability as the transcription efficiency

STEP 1: Calculation of Binding Energy

First, we found the binding energy between RNAP and each of our promoters. As we mutated only the -35 region, we used only this region for the calculations. Here we define the binding energy ε as the energy released when RNAP binds to the promoter. Simply put, the higher the binding energy, the stronger the binding. We used the data in Kenney et al.[2] to calculate each binding energy.

The distribution of the computed binding energies of the 4096 promoters is shown below. The horizontal axis is ε (in bins of 0.05 kT) and the vertical axis is the number of samples.

Fig. 1 Visualized data. A portion enclosed with red square is randomized -35 region.
Fig. 2 Promoters distribution. The result is an approximate normal distribution.
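Step 1 can be sketched in code as follows. This sketch assumes an additive model in which each position of the -35 hexamer contributes independently to the binding energy. The matrix values here are random placeholders, NOT the data from Kenney et al., and the helper names are hypothetical.

```python
import random
from collections import Counter
from itertools import product

BASES = "ACGT"
BIN_WIDTH = 0.05  # histogram bin width in kT, as stated in the text


def make_placeholder_matrix(seed=0):
    """Hypothetical 6x4 energy matrix in kT units; placeholder values only."""
    rng = random.Random(seed)
    return [{b: rng.uniform(0.0, 1.5) for b in BASES} for _ in range(6)]


def binding_energy(seq, matrix):
    """Additive model (assumption): sum the contribution of each base."""
    return sum(matrix[i][base] for i, base in enumerate(seq))


def histogram(energies, width=BIN_WIDTH):
    """Count energies per bin; keys are integer bin indices."""
    return Counter(int(e // width) for e in energies)


matrix = make_placeholder_matrix()
energies = [binding_energy("".join(s), matrix) for s in product(BASES, repeat=6)]
counts = histogram(energies)
```

Replacing the placeholder matrix with measured per-base energies would give the kind of distribution shown in Fig. 2, provided the additive assumption holds.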

STEP 2: Conversion from Binding Energy to Binding Probability

Next, we estimated the binding probability. In this step we used the methods of statistical mechanics, for which we assumed the following.

  • The cell is a closed system
  • There are P RNAPs bound somewhere on the DNA
  • The DNA is N bp long, and one of the N bases is the +1 position of the promoter

The principle of statistical mechanics is simple: every state occurs with the same probability, so we counted the number of states. A state specifies the information of all the particles in the system, so the number of states is enormous. We write this number as W, which can be separated as follows. \[ W=W_{\mathrm{unbound}}+W_{\mathrm{bound}} \] $W_{\mathrm{bound}}$ is the number of states in which the promoter is occupied, and $W_{\mathrm{unbound}}$ is the number of states in which it is unoccupied.

The purpose of this step is to find the ratio $W_{\mathrm{unbound}}:W_{\mathrm{bound}}$. Considering the positions of the RNAPs, the unbound term is \[ W_{\mathrm{unbound}} = \frac{(N-1)!}{P!\,(N-P-1)!} \times W_{\mathrm{R}}(E) \]
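The combinatorial factor in this step counts the ways of placing P RNAPs on the N - 1 positions other than the promoter. The sketch below compares it with the corresponding count for the bound case. The bound-state count is an assumption (one RNAP on the promoter, the remaining P - 1 on the other N - 1 positions), since the text does not show it. The values of N and P are hypothetical, and the energy-dependent weights are ignored.

```python
from fractions import Fraction
from math import comb


def positional_ratio(n_sites, n_rnap):
    """Ratio of position counts, bound : unbound, ignoring energy weights.

    Unbound: all n_rnap polymerases sit on the n_sites - 1 other positions.
    Bound (assumption): one polymerase is on the promoter and the remaining
    n_rnap - 1 sit on the n_sites - 1 other positions.
    """
    unbound = comb(n_sites - 1, n_rnap)
    bound = comb(n_sites - 1, n_rnap - 1)
    return Fraction(bound, unbound)


N = 5_000_000  # hypothetical genome length in bp
P = 1_000      # hypothetical number of bound RNAPs
ratio = positional_ratio(N, P)
print(float(ratio), P / N)  # the two values are close when P is much smaller than N
```

Under these assumptions the ratio simplifies exactly to P / (N - P), which is approximately P / N when P is much smaller than N. This is the origin of a prefactor of the form $P/N_{\mathrm{NS}}$ in the binding probability.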

M-Fig. 3 Quoted from [5].

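Therefore, the binding probability is given below. Here $N_{\mathrm{NS}}$ is the number of non-specific binding sites on the DNA, $\varepsilon_{\mathrm{S}}$ is the binding energy of RNAP and the promoter, $\varepsilon_{\mathrm{NS}}$ is the binding energy of RNAP and a non-specific binding site, and $\varepsilon_{-35}$ is the binding energy of RNAP and the -35 region. In this formula the energies are written as free-energy differences, $\Delta G = G_{\mathrm{bound}} - G_{\mathrm{unbound}}$, so a lower value means stronger binding.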

\begin{align*} p&=\frac{W_{\mathrm{bound}}}{W_{\mathrm{unbound}}+W_{\mathrm{bound}}} \\[6pt] &=\frac{ \frac{P}{N_{\mathrm{NS}}} \exp\left(-\frac{\varepsilon_{\mathrm{S}} - \varepsilon_{\mathrm{NS}}}{k_{\mathrm{B}}T} \right) }{1+\frac{P}{N_{\mathrm{NS}}} \exp\left(-\frac{\varepsilon_{\mathrm{S}} - \varepsilon_{\mathrm{NS}}}{k_{\mathrm{B}}T} \right) } \\[6pt] \mathrm{suppose\ that} &\frac{P}{N_{\mathrm{NS}}} \exp\left(-\frac{\varepsilon_{\mathrm{S}} - \varepsilon_{\mathrm{NS}}}{k_{\mathrm{B}}T} \right) \ll 1 \\[6pt] &\approx \frac{P}{N_{\mathrm{NS}}} \exp\left(-\frac{\varepsilon_{\mathrm{S}} - \varepsilon_{\mathrm{NS}}}{k_{\mathrm{B}}T} \right) \\[6pt] &\propto \exp\left(-\frac{\varepsilon_{-35}}{k_{\mathrm{B}}T} \right) \end{align*}

The binding probability therefore depends exponentially on the binding energy of the -35 region.

The last step is to convert the binding probability to the transcription efficiency. We make the following assumptions.
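The expression for the binding probability and its weak-promoter approximation can be compared numerically. This is a minimal sketch: energies are expressed in units of $k_{\mathrm{B}}T$, and the values chosen for P, $N_{\mathrm{NS}}$ and the energy difference are hypothetical.

```python
import math


def binding_probability(delta_eps, n_rnap, n_ns):
    """p = x / (1 + x), with x = (P / N_NS) * exp(-delta_eps).

    delta_eps is (eps_S - eps_NS) in units of k_B * T.
    """
    x = (n_rnap / n_ns) * math.exp(-delta_eps)
    return x / (1.0 + x)


def weak_promoter_approx(delta_eps, n_rnap, n_ns):
    """Approximation p ~ x, valid when x is much smaller than 1."""
    return (n_rnap / n_ns) * math.exp(-delta_eps)


P = 1_000         # hypothetical number of RNAPs
N_NS = 5_000_000  # hypothetical number of non-specific sites
exact = binding_probability(-2.0, P, N_NS)
approx = weak_promoter_approx(-2.0, P, N_NS)
print(exact, approx)
```

With these illustrative numbers x is of order 10^-3, so the two values agree to within about 1%. For much stronger promoters the approximation would overestimate p, because the exact expression saturates at 1.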

  • An RNAP bound to the promoter promptly initiates transcription
  • There is no "traffic jam" of RNAPs on the DNA (i.e., RNAP's transcription initiation is rate-limiting)

These assumptions mean that we can directly use the value of the binding probability as the transcription efficiency in arbitrary units. In this way, we obtained the following final result.

M-Fig. 4 The horizontal axis stands for the transcription efficiency.

As this figure shows, the strengths of our promoter family vary by about 1000-fold!
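As a rough consistency check, the fold range can be related to the spread of binding energies. This is only a sketch and assumes that the proportionality $p \propto \exp(-\varepsilon_{-35}/k_{\mathrm{B}}T)$ holds across the whole library. The symbols $\varepsilon_{-35}^{\max}$ and $\varepsilon_{-35}^{\min}$ are introduced here for the largest and smallest energies in the library.

```latex
\[
\frac{p_{\max}}{p_{\min}}
  = \exp\left(\frac{\varepsilon_{-35}^{\max} - \varepsilon_{-35}^{\min}}{k_{\mathrm{B}}T}\right),
\qquad
\frac{p_{\max}}{p_{\min}} \approx 1000
\;\Longleftrightarrow\;
\varepsilon_{-35}^{\max} - \varepsilon_{-35}^{\min} \approx \ln 1000 \, k_{\mathrm{B}}T \approx 6.9\, k_{\mathrm{B}}T
\]
```

Under this assumption, a 1000-fold range in strength corresponds to a spread of roughly 7 $k_{\mathrm{B}}T$ between the weakest and strongest binders.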