1. Overview
This model aims to predict the final output of a tandem promoter system, which may consist of any number and any type of sub-promoters (including sub-tandem promoters), in any order and in any species. The key idea of the model is that the strength of a promoter system is proportional to the probability that at least one RNA polymerase (hereafter RNAP) binds to the promoter.
2. Symbol table, assumptions and reasons
1. Promoter strength is assumed to be measured in the same species, under identical environmental conditions and at the same growth stage. This ensures that the concentrations of all RNAP subunits, all ribosome subunits, all RNA-degrading enzymes, all proteases and all transport proteins are thermodynamically identical across measurements. Otherwise, the model may fail to work properly.
2. In all measurements, the context of the promoter is the same, i.e. the same RBS, terminator, protein-coding sequence, upstream element, downstream element and DNA supercoiling.
3. Transcription factors are not considered in this version of the model, but they can be included with some modification of the equations.
4. The promoter region is accessible to RNAP (and all of its subunits), meaning it is not in a heterochromatin region or in any other condition that hampers normal RNAP-DNA interaction.
5. The probability of RNAP binding to the region between two sub-promoters within the tandem promoter system is neglected, as it contributes very little to the final ptot.
6. RNAP-DNA binding is assumed to be at equilibrium. This is reasonable because open complex formation is the slow, rate-limiting step of transcription, so on the time scale of open complex formation RNAP-DNA binding reaches equilibrium in negligible time [1][2]. Inactive RNAP-DNA complexes have also been detected on the DNA [3].
7. For simplicity, we assume that all RNAP-promoter complexes have the same transcription rate α. If they do not, the difference in α can be incorporated into pi. For the derivation, see sections 4.2 and 4.3.
3. Modeling result
We found that the strength of a tandem promoter system can be interpreted by a simple equation:
where qi is the probability that an RNAP (with all of its subunits) does not form an RNAP-promoter complex with the i-th sub-promoter, n is the number of sub-promoters, j is the cooperative factor, and ξ is the strength constant.
If we define the highest possible expression level of a promoter in a given species as 1, equation 1 becomes normalized.
Figure 1. Model fitting result
The Y-axis represents the normalized promoter strength and the X-axis the number of sub-promoters.
The blue dots are data extracted from ref. [4], fig. 2; the red line is the prediction made by our model; the red dotted lines are the 95% prediction bounds.
This model explains 99% of the variation in tandem promoter strength caused by
1. the number of sub-promoters,
2. the kind of sub-promoters,
3. the order of sub-promoters.
(R-square = 0.992, with 95% confidence bounds, when fitted to our data)
(awaiting later data)
4. Model derivation
Promoter strength may be influenced by various factors. We need to simplify the system into a reasonable toy model by eliminating the relatively minor factors.
4.1 Expression level Measurement
We use fluorescence intensity to indicate promoter strength, normalized by an internal reference fluorescent protein (FP), mCherry (see the experiment section for details: [URL]). This is valid because, at fixed excitation light, fluorescence is proportional to the FP concentration, and FPs become fluorescent shortly after they are synthesized.
4.2 Translation and transcription
According to the Central Dogma, DNA is transcribed into mRNA, which is then translated into protein.
So we can write down the following ODEs, which are similar to the equations in [5].
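A form of the two ODEs that is consistent with the symbol definitions and with the later references to equations 3 and 4 is given here. It is a reconstruction from the surrounding text, not necessarily the authors' exact notation:

```latex
\begin{align}
\frac{d[\mathrm{mRNA}]}{dt} &= \alpha\,[RP] - \lambda\,[\mathrm{mRNA}] \tag{3}\\
\frac{d[\mathrm{protein}]}{dt} &= v\,[\mathrm{mRNA}] - k\,[\mathrm{protein}] \tag{4}
\end{align}
```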
where α is the mRNA production constant, λ the mRNA degradation constant, v the protein synthesis constant, k the protein degradation constant, and [RP] the concentration of the RNAP-promoter complex.
In equation 4, the rate of protein production is determined by [mRNA] and v. With the same RBS, v depends on the efficiency and concentration of ribosomes and on the concentration of amino acids in the cell, all of which can be considered identical under the experimental conditions used to compare different promoters. The rate of protein degradation is determined by [protein] and k. k depends on the protease system in the cell, which can likewise be considered identical across measurements of different promoters.
In equation 3, the rate of mRNA production is determined by [RP] and α, and the rate of mRNA degradation by [mRNA] and λ. Both α and λ can be treated as constants under the experimental conditions used to compare different promoters. α depends on the transcription initiation efficiency, which for simplicity is assumed to be identical for every RNAP-DNA complex. This is reasonable because, if α varies, the difference in α can be incorporated into [RP] (and ultimately into pi; see the derivation below). Although this part of the equation differs from the equations in [5], it is justified by the observation that, when [RNAP] and [DNA] are held constant, UTP incorporation is a zero-order reaction [2]. λ depends on the concentration of RNase, which does not vary between measurements of different promoters.
Because we are interested in the steady state of protein expression, we can set both time derivatives to zero.
We can take [protein]eq as the indicator of promoter strength and let vα/(λk) = ξ, which gives [protein]eq = ξ[RP] (equation 5).
So the strength of the promoter is directly proportional to the concentration of the RNAP-DNA complex of this promoter.
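The steady-state argument can be checked numerically. The sketch below integrates the mRNA and protein balance described in this section with a simple forward-Euler scheme and compares the result with ξ[RP], where ξ = vα/(λk). All parameter values are made up for illustration, and the function names are hypothetical.

```python
def simulate_expression(alpha, lam, v, k, rp, t_end=200.0, dt=0.01):
    """Forward-Euler integration of the mRNA/protein balance.

    d[mRNA]/dt    = alpha * [RP]   - lam * [mRNA]
    d[protein]/dt = v     * [mRNA] - k   * [protein]
    Both species start at zero; [RP] is held constant.
    """
    mrna = 0.0
    protein = 0.0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        d_mrna = alpha * rp - lam * mrna
        d_protein = v * mrna - k * protein
        mrna += d_mrna * dt
        protein += d_protein * dt
    return mrna, protein


def steady_state_protein(alpha, lam, v, k, rp):
    """Analytic steady state: [protein]eq = xi * [RP], xi = v*alpha/(lam*k)."""
    xi = v * alpha / (lam * k)
    return xi * rp


# Illustrative placeholder values, not measured constants.
params = dict(alpha=2.0, lam=0.5, v=3.0, k=0.1, rp=0.4)
mrna_end, protein_end = simulate_expression(**params)
protein_eq = steady_state_protein(**params)
print(protein_end, protein_eq)
```

With these placeholder values the simulated protein level settles on the analytic value, so doubling [RP] doubles the steady-state protein level.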
4.3 RNAP binding and transcription initiation
The open complex formation reaction is as follows.
where RPc is the inactive complex, RPi the intermediate complex and RPo the open complex.
Combined with the Central Dogma, the reaction becomes:
Because the binding step described by K1 takes place on a much shorter time scale, the probability of finding the polymerase on the promoter is given by its equilibrium constant K1 [1].
To evaluate the probability of polymerase binding (pi) we must sum the Boltzmann weights over all possible states of P polymerase molecules on DNA.
This equation gives the total Boltzmann weight of the states with no RNAP bound to the target promoter, where N is the number of non-specific sites on the DNA, P the effective number of RNAP molecules, ε^NS the non-specific binding energy, kb the Boltzmann constant and T the temperature.
This equation gives the total Boltzmann weight of the states with one RNAP bound to promoter i, where ε^Si is the specific binding energy of promoter i.
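The two weights described above can be written out as follows. This is a reconstruction that assumes the usual thermodynamic occupancy picture, in which P polymerases are distributed over N non-specific sites; the label Z_i is introduced here only for convenience:

```latex
\begin{align}
Z(P) &= \frac{N!}{P!\,(N-P)!}\,
        \exp\!\left(-\frac{P\,\varepsilon^{NS}}{k_b T}\right)\\
Z_i  &= Z(P-1)\,\exp\!\left(-\frac{\varepsilon^{S_i}}{k_b T}\right)
\end{align}
```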
So the probability of an RNAP binding to promoter i is,
where Ztot is the sum of the Boltzmann weights of all possible states.
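As a rough numerical illustration, the sketch below evaluates pi for a single isolated promoter, assuming Ztot contains only the two kinds of states described above (promoter empty, promoter occupied by one RNAP) and that the weights take the binomial form in which P polymerases occupy N non-specific sites. Energies are expressed in units of kb·T, and the values of N, P and the energies are placeholders, not measured quantities. The function names are hypothetical.

```python
import math


def log_weight_unbound(n_sites, p_rnap, eps_ns):
    """log of the weight for p_rnap polymerases on n_sites non-specific sites.

    Energies are in units of kb*T. lgamma is used so that the huge
    factorials never have to be formed explicitly.
    """
    log_binom = (math.lgamma(n_sites + 1)
                 - math.lgamma(p_rnap + 1)
                 - math.lgamma(n_sites - p_rnap + 1))
    return log_binom - p_rnap * eps_ns


def binding_probability(n_sites, p_rnap, eps_ns, eps_s):
    """Probability that the promoter is occupied (single-promoter case)."""
    log_z_empty = log_weight_unbound(n_sites, p_rnap, eps_ns)
    # One polymerase sits on the promoter, the other P-1 are non-specific.
    log_z_bound = log_weight_unbound(n_sites, p_rnap - 1, eps_ns) - eps_s
    return 1.0 / (1.0 + math.exp(log_z_empty - log_z_bound))


def binding_probability_closed_form(n_sites, p_rnap, eps_ns, eps_s):
    """Same quantity after cancelling the factorials by hand."""
    ratio = (n_sites - p_rnap + 1) / p_rnap
    return 1.0 / (1.0 + ratio * math.exp(eps_s - eps_ns))


# Placeholder values chosen only for illustration.
p_weak = binding_probability(5_000_000, 1000, 0.0, -6.0)
p_strong = binding_probability(5_000_000, 1000, 0.0, -8.0)
print(p_weak, p_strong)
```

A more negative specific binding energy gives a larger occupancy, and the closed form shows that only the energy difference ε^Si − ε^NS matters.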
So the probability of RNAP binding to both promoter i and j is,
when
we have
So the probability of RNAP binding to two promoters at the same time equals the product of the probabilities of RNAP binding to each of the two promoters separately.
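The product rule can be checked numerically for two promoters. The sketch below enumerates the four occupancy states (neither, only i, only j, both) under the same binomial weights, and compares pij with pi·pj. It works in a regime where P is large but much smaller than N, which is an assumption of this sketch; energies are in units of kb·T and all values are placeholders.

```python
import math


def two_promoter_probabilities(n_sites, p_rnap, eps_ns, eps_i, eps_j):
    """Return (p_i, p_j, p_ij) for two promoters sharing one RNAP pool.

    All weights are divided by the weight of the state in which
    neither promoter is occupied.
    """
    # Ratio of non-specific weights when one polymerase leaves the pool,
    # obtained by cancelling the binomial coefficients.
    r1 = p_rnap / (n_sites - p_rnap + 1) * math.exp(eps_ns)
    # Same ratio when a second polymerase leaves the pool.
    r2 = (p_rnap - 1) / (n_sites - p_rnap + 2) * math.exp(eps_ns)
    w_i = math.exp(-eps_i)
    w_j = math.exp(-eps_j)

    only_i = r1 * w_i
    only_j = r1 * w_j
    both = r1 * r2 * w_i * w_j
    z_tot = 1.0 + only_i + only_j + both

    p_i = (only_i + both) / z_tot
    p_j = (only_j + both) / z_tot
    p_ij = both / z_tot
    return p_i, p_j, p_ij


# Placeholder values chosen only for illustration.
p_i, p_j, p_ij = two_promoter_probabilities(5_000_000, 1000, 0.0, -6.0, -8.0)
relative_gap = abs(p_ij - p_i * p_j) / p_ij
print(p_i, p_j, p_ij, relative_gap)
```

In this regime the two ratios r1 and r2 are nearly equal, which is what makes the total weight factorize and the joint probability approach the product.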
Only one RNAP is needed to initiate transcription in a tandem promoter system (the other RNAPs will be blocked by the RNAP closest to the transcription initiation point). So the probability of at least one RNAP binding to the promoter is
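Given independent binding, the probability that at least one sub-promoter is occupied is one minus the probability that none is occupied, i.e. ptot = 1 − Π(1 − pi) = 1 − Π qi. A minimal sketch, with a hypothetical function name and made-up probabilities:

```python
from math import prod


def prob_at_least_one(p_list):
    """ptot = 1 - product of (1 - p_i) over independent sub-promoters."""
    for p in p_list:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
    return 1.0 - prod(1.0 - p for p in p_list)


# Made-up occupancies for one, two and three identical sub-promoters.
single = prob_at_least_one([0.5])
double = prob_at_least_one([0.5, 0.5])
triple = prob_at_least_one([0.5, 0.5, 0.5])
print(single, double, triple)
```

Each added sub-promoter raises ptot, but with diminishing returns, because ptot can never exceed 1.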
For a promoter present in u copies per cell (all separate and functioning independently), the strength of the promoter is, according to equation 5,
The maximum possible strength is reached when ptot=1,
However, we found that this model cannot fully explain our data. Although the fit has a satisfactory R-square (0.948), it fails to account for the large difference between the model prediction and the data when there is only one promoter in the “tandem promoter system”. This means that the pi obtained by curve fitting is not the real pi.
Figure 2. Model fitting result of the simpler model
Figure 3. Curve fitting residual plot of the simpler model
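The kind of fit and residual analysis referred to above can be sketched as follows. This is only an illustration of the procedure, not the fit actually performed: it assumes n identical sub-promoters, so that the normalized simpler model reduces to 1 − q^n, it uses a plain grid search instead of a dedicated fitting routine, and the data points are synthetic, generated from the model itself rather than taken from any measurement.

```python
def simple_model(n, q):
    """Normalized strength of n identical, independent sub-promoters."""
    return 1.0 - q ** n


def fit_q(ns, ys, grid_size=10000):
    """Least-squares estimate of q by grid search over (0, 1)."""
    best_q, best_sse = None, float("inf")
    for step in range(1, grid_size):
        q = step / grid_size
        sse = sum((y - simple_model(n, q)) ** 2 for n, y in zip(ns, ys))
        if sse < best_sse:
            best_q, best_sse = q, sse
    return best_q


def r_squared(ys, preds):
    """Coefficient of determination of predictions against observations."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic data generated from the model with q = 0.7 (not real data).
ns = [1, 2, 3, 4, 5]
ys = [simple_model(n, 0.7) for n in ns]

q_fit = fit_q(ns, ys)
preds = [simple_model(n, q_fit) for n in ns]
residuals = [y - p for y, p in zip(ys, preds)]
print(q_fit, r_squared(ys, preds), residuals)
```

With real measurements, a residual at n = 1 that is systematically larger than the others would be the signature of the mismatch described above.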
Data analysis shows that the measured strength increases with the number of sub-promoters much faster than our prediction, which indicates some kind of cooperation among sub-promoters. This results in pij>pipj. The cooperation can be explained by the fact that the binding probabilities of the sub-promoters are not completely independent: because the promoters are clustered, an RNAP that falls off one promoter has a slightly greater probability of binding to a neighbouring promoter. This effect is not captured by the Boltzmann factors we used to relate pij to pipj. To correct for this, we add a cooperative term to the model, which gives equation 2, with nj as the cooperative term.
As shown in figure 1, this model successfully captures the essence of the tandem promoter system. The residual plot is as follows.
Figure 4. Curve fitting residual plot of the final model