Team:WHU-China/templates/standardpage modelingCas9
From 2013.igem.org
(Difference between revisions)
IgnatzZeng (Talk | contribs) |
IgnatzZeng (Talk | contribs) |
||
Line 70: | Line 70: | ||
<tr><td class="topstike">d/aCas9</td><td class="topstike">Deactivated Cas9, no matter whether it’s an activator or inhibitor</td></tr> | <tr><td class="topstike">d/aCas9</td><td class="topstike">Deactivated Cas9, no matter whether it’s an activator or inhibitor</td></tr> | ||
</table> | </table> | ||
- | <center><em>Table 1. Symbol table of Cas9Off Model</em></br></center> | + | <center><em><b>Table 1. Symbol table of Cas9Off Model</b></em></br></center> |
</div> | </div> | ||
Line 111: | Line 111: | ||
<tr><td class="topstike"></td><td class="topstike"></td><td class="topstike">Ave. G/C 12.4</td><td class="topstike"></td></tr> | <tr><td class="topstike"></td><td class="topstike"></td><td class="topstike">Ave. G/C 12.4</td><td class="topstike"></td></tr> | ||
</table> | </table> | ||
- | <center><em>< | + | <center><em><b>Table 2. The relation between G/C frequency and single-mismatch tolerance</b></em></br></center></br> |
The mismatch tolerance is roughly determined from the data in the references; for details (PDF version), please <a href="https://static.igem.org/mediawiki/2013/2/22/WHUTargeting_analysis.pdf">click here</a>. A low-tolerance sequence is one for which a single mismatch at any of at least 7 positions causes a significant performance drop. A high-tolerance sequence is one for which a single mismatch at any of more than 16 positions still performs as well as the original sequence in guiding Cas9.</br></br> | The mismatch tolerance is roughly determined from the data in the references; for details (PDF version), please <a href="https://static.igem.org/mediawiki/2013/2/22/WHUTargeting_analysis.pdf">click here</a>. A low-tolerance sequence is one for which a single mismatch at any of at least 7 positions causes a significant performance drop. A high-tolerance sequence is one for which a single mismatch at any of more than 16 positions still performs as well as the original sequence in guiding Cas9.</br></br> | ||
The relationship between G/C frequency and single-mismatch tolerance can be explained by the fact that abundant G/C makes gRNA-DNA binding more stable, so that a single mismatch is not strong enough to disrupt the binding. This suggests that F() may be a sigmoid function. However, determining this sigmoid function (and b) requires more specific experimental data on d/aCas9 binding kinetics, which are not available. </br></br> | The relationship between G/C frequency and single-mismatch tolerance can be explained by the fact that abundant G/C makes gRNA-DNA binding more stable, so that a single mismatch is not strong enough to disrupt the binding. This suggests that F() may be a sigmoid function. However, determining this sigmoid function (and b) requires more specific experimental data on d/aCas9 binding kinetics, which are not available. </br></br> | ||
- | This | + | This guess is also supported by the later comparison with the Cas9 binding model, which assumed a simple proportional relationship between △G(i) and △G’. The results show that Cas9-gRNA is insensitive to energy changes in DNA-gRNA binding not only when △G’ surpasses some threshold, but also when △G’ is below some threshold. </br></br> |
+ | |||
+ | A rough ω is calculated from Fig S7C and Fig.5D of [3] (both cases are intolerant to single mismatch). </br> | ||
+ | <img src="https://static.igem.org/mediawiki/2013/2/23/WHU2013Refineca2.png" /> | ||
+ | </br> | ||
+ | The performance of these parameters and of the model is checked by comparing the predicted values with the data from Fig.2B of [4] and Fig.5C of [3]. Note that all the following figures use data normalized by the activity of the “wildtype” gRNA, so b is not required for the prediction.</br></br> | ||
- | |||
<div style="text-align:center"> | <div style="text-align:center"> | ||
<img src="https://static.igem.org/mediawiki/2013/5/58/Dcas9%E5%BB%BA%E6%A8%A12.jpg" /></div> | <img src="https://static.igem.org/mediawiki/2013/5/58/Dcas9%E5%BB%BA%E6%A8%A12.jpg" /></div> | ||
- | <center><em>< | + | <center><em><b>Figure 2. Model prediction compared with data from Fig.2B of [4]</b></em></br></center></br> |
<div style="text-align:center"> | <div style="text-align:center"> | ||
<img src="https://static.igem.org/mediawiki/2013/4/4b/Dcas9%E5%BB%BA%E6%A8%A11.jpg" /></div> | <img src="https://static.igem.org/mediawiki/2013/4/4b/Dcas9%E5%BB%BA%E6%A8%A11.jpg" /></div> | ||
- | <center><em>< | + | <center><em><b>Figure 3. Model prediction compared with data from Fig.5C of [3]</b></em></br></center></br> |
These data are collected from 1’ end truncation or consecutive mutation experiments on gRNA. In both figures, as the column number grows, the end truncations/end mutations become more severe, and the total energy of DNA-gRNA binding drops. The prediction of the model is near-linear, but the data show strong non-linearity. Obvious plateaus form in columns 4-8 of Fig.2 and columns 3-9 of Fig.3, which suggests that the gRNA-Cas9 complex is not sensitive to the energy loss caused by the continuous mismatches / truncations at these stages.</br></br> | These data are collected from 1’ end truncation or consecutive mutation experiments on gRNA. In both figures, as the column number grows, the end truncations/end mutations become more severe, and the total energy of DNA-gRNA binding drops. The prediction of the model is near-linear, but the data show strong non-linearity. Obvious plateaus form in columns 4-8 of Fig.2 and columns 3-9 of Fig.3, which suggests that the gRNA-Cas9 complex is not sensitive to the energy loss caused by the continuous mismatches / truncations at these stages.</br></br> | ||
- | In the last part of the model, kinetic analysis reveals that both concentration and reaction time are important for off-target control.</br></br></br> | + | In the last part of the model, kinetic analysis reveals that both concentration and reaction time are important for off-target control.</br> |
+ | <div style="text-align:center"> | ||
+ | <img src="https://static.igem.org/mediawiki/2013/c/c0/WHUTheor.png" /></div> | ||
+ | <em> | ||
+ | <center><b>Figure 4. Theoretical curves from the Cas9 cleaving reaction</b></br> | ||
+ | The curves display the changes of two different cleaved products over time. Initial conditions and rate constants were set to [A0]=1.0, [B0]=[C0]=0, ka=0.2 min<sup>-1</sup>, kb=0.1 min<sup>-1</sup> for the red line, and [A0]=1.0, [B0]=[C0]=0, ka=0.1 min<sup>-1</sup>, kb=0.05 min<sup>-1</sup> for the blue line. | ||
+ | </center></br> | ||
+ | </em> | ||
+ | </br></br> | ||
<a name="derivation"></a> | <a name="derivation"></a> | ||
Line 138: | Line 150: | ||
<img src="https://static.igem.org/mediawiki/2013/0/04/WHUCas9gRNA.png" /> | <img src="https://static.igem.org/mediawiki/2013/0/04/WHUCas9gRNA.png" /> | ||
</div> | </div> | ||
- | <center><em>Figure | + | <center><em>Figure 5. Schematic of Cas9 digestion, modified from [1]</em></center></br> |
<b>Step1.</b> Set up the binding sequence</br> | <b>Step1.</b> Set up the binding sequence</br> | ||
The input is the 21 nt of the target preceding the GG of the PAM, and the corresponding 21 nt of the potential off-target sequence. We need a 21 nt sequence rather than 20 nt because the NN model uses pairs of adjacent nucleotides (2 nt) as inputs. To fully account for the impact of the 20 nt targeting sequence of the gRNA, we must include the 21st base in the calculation. We explain how the inputs are processed using an example. Mismatched base pairs are highlighted in red. </br></br> | The input is the 21 nt of the target preceding the GG of the PAM, and the corresponding 21 nt of the potential off-target sequence. We need a 21 nt sequence rather than 20 nt because the NN model uses pairs of adjacent nucleotides (2 nt) as inputs. To fully account for the impact of the 20 nt targeting sequence of the gRNA, we must include the 21st base in the calculation. We explain how the inputs are processed using an example. Mismatched base pairs are highlighted in red. </br></br> | ||
Line 185: | Line 197: | ||
<div style="text-align:center"> | <div style="text-align:center"> | ||
<img src="https://static.igem.org/mediawiki/2013/5/59/WHUDeltaGiCas9.png" /></br></div></br> | <img src="https://static.igem.org/mediawiki/2013/5/59/WHUDeltaGiCas9.png" /></br></div></br> | ||
- | where N is the total number of phosphates in the duplex, and [Na+] is the total concentration of monovalent cations from all sources (the same equation works for | + | where N is the total number of phosphates in the duplex, and [Na+] is the total concentration of monovalent cations from all sources (the same equation works for sodium, potassium, and ammonium) over a monovalent cation concentration range of 0.05 to 1 M.</br> </br> |
- | sodium, potassium, and ammonium) over a monovalent cation concentration range of 0.05 to 1 M.</br> </br> | + |
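The salt correction described here can be sketched in a few lines. The equation itself is only available as an image on this page, so the form used below, with a coefficient of 0.114 kcal/mol and N/2 phosphates, is an assumption taken from the commonly cited unified nearest-neighbor salt correction; the function name and arguments are hypothetical.

```python
import math

def salt_corrected_dg(dg_1m, n_phosphates, na_conc):
    """Salt-correct a duplex free energy (kcal/mol, 37 C).

    Assumed form (not confirmed by the page, whose equation is an image):
        dG([Na+]) = dG(1 M NaCl) - 0.114 * (N / 2) * ln[Na+]
    where N is the total number of phosphates in the duplex and [Na+]
    is the total monovalent cation concentration in mol/L.
    """
    # The text states the correction holds for 0.05 to 1 M monovalent cations.
    if not 0.05 <= na_conc <= 1.0:
        raise ValueError("monovalent concentration outside 0.05-1 M")
    return dg_1m - 0.114 * (n_phosphates / 2) * math.log(na_conc)

if __name__ == "__main__":
    # Hypothetical duplex: dG = -10 kcal/mol at 1 M, 38 phosphates, 0.15 M salt.
    print(salt_corrected_dg(-10.0, 38, 0.15))
```

At 1 M the logarithm vanishes and the uncorrected value is returned; at lower salt the duplex becomes less stable (a less negative △G).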
<b>Step6.</b> Calculate △G’ We assume △G’ takes the form of <img src="https://static.igem.org/mediawiki/2013/d/dc/WHUVectorW.png"/> , where “a” is a 1×19 vector that contains △G(1) to △G(19) as its values and ω is the weight vector. Only the DNA-gRNA interaction (“a”) is treated as a variable; the △G contributed by other interactions (e.g. protein-DNA interaction) is treated as a constant b. This is also why this model cannot predict the Cas9 off-target rate of a target without a PAM (NGG), which interacts with Cas9 rather than with the gRNA. (Assumption 4) </br></br> | <b>Step6.</b> Calculate △G’ We assume △G’ takes the form of <img src="https://static.igem.org/mediawiki/2013/d/dc/WHUVectorW.png"/> , where “a” is a 1×19 vector that contains △G(1) to △G(19) as its values and ω is the weight vector. Only the DNA-gRNA interaction (“a”) is treated as a variable; the △G contributed by other interactions (e.g. protein-DNA interaction) is treated as a constant b. This is also why this model cannot predict the Cas9 off-target rate of a target without a PAM (NGG), which interacts with Cas9 rather than with the gRNA. (Assumption 4) </br></br> | ||
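A minimal sketch of Step 6, assuming the pictured form is the linear combination △G’ = a·ω + b and that the binding constant follows K = exp(−△G’/RT). Both are assumptions, since the formula is only given as an image; all numeric values and function names below are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def delta_g_prime(a, omega, b=0.0):
    """Weighted sum of per-position energies plus the constant b (assumed form)."""
    if len(a) != len(omega):
        raise ValueError("a and omega must have the same length")
    return sum(x * w for x, w in zip(a, omega)) + b

def relative_binding(dg_off, dg_on, temp_k=310.15):
    """Ratio K_off / K_on, assuming K = exp(-dG'/RT)."""
    return math.exp(-(dg_off - dg_on) / (R_KCAL * temp_k))

if __name__ == "__main__":
    omega = [1.0] * 19                 # hypothetical uniform weights
    a_on = [-1.5] * 19                 # hypothetical on-target energies, kcal/mol
    a_off = [-1.5] * 18 + [0.5]        # one destabilizing mismatch
    for b in (0.0, 5.0):
        ratio = relative_binding(delta_g_prime(a_off, omega, b),
                                 delta_g_prime(a_on, omega, b))
        print(b, ratio)                # the ratio does not depend on b
```

Because b enters both △G’ values equally, it cancels in the ratio, which is why predictions normalized to the on-target gRNA do not need b.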
Line 246: | Line 257: | ||
<div style="text-align:center"> | <div style="text-align:center"> | ||
<img src="https://static.igem.org/mediawiki/2013/0/0e/WHURFPFluores.png" style="width:100%;height:auto;"></br></div> | <img src="https://static.igem.org/mediawiki/2013/0/0e/WHURFPFluores.png" style="width:100%;height:auto;"></br></div> | ||
- | <center><em>Figure | + | <center><em><b>Figure 6. dCas9 regulation on promoter J23119 (extracted from [3])</b></em></center></br> |
Notice that, in Figure 5, RFP starts to decrease exponentially 10 min after the inducer is added. This is only possible when v[mRNA] is held constant. So d[mRNA]/dt=0, which means [TF] is constant. In this equation, [TF] is the concentration of transcription factor bound to the promoter, and dCas9 is the only transcription factor in this experiment. According to tables 2.1 and 2.2 in [9], the typical mRNA lifetime in E.coli is 2-5 min, and the time for protein (Cas9) transcription and translation is 5 min. So Cas9-DNA binding can reach equilibrium within 0~3 min in vivo (10-5-5 to 10-5-2 min). The time needed to reach equilibrium is therefore much shorter than the experimental time-scale, both in vivo and in vitro. </br></br> | Notice that, in Figure 5, RFP starts to decrease exponentially 10 min after the inducer is added. This is only possible when v[mRNA] is held constant. So d[mRNA]/dt=0, which means [TF] is constant. In this equation, [TF] is the concentration of transcription factor bound to the promoter, and dCas9 is the only transcription factor in this experiment. According to tables 2.1 and 2.2 in [9], the typical mRNA lifetime in E.coli is 2-5 min, and the time for protein (Cas9) transcription and translation is 5 min. So Cas9-DNA binding can reach equilibrium within 0~3 min in vivo (10-5-5 to 10-5-2 min). The time needed to reach equilibrium is therefore much shorter than the experimental time-scale, both in vivo and in vitro. </br></br> | ||
Line 259: | Line 270: | ||
<div style="text-align:center"> | <div style="text-align:center"> | ||
<img src="https://static.igem.org/mediawiki/2013/7/78/WHUDeltaG1prime.png" /></br></div> | <img src="https://static.igem.org/mediawiki/2013/7/78/WHUDeltaG1prime.png" /></br></div> | ||
- | + | Therefore ω can be calculated (See result).</br></br> | |
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | Therefore | + | |
To predict the off-target rate of d/aCas9, the following equation can be derived. At equilibrium,</br> | To predict the off-target rate of d/aCas9, the following equation can be derived. At equilibrium,</br> | ||
Line 292: | Line 298: | ||
<div style="text-align:center"> | <div style="text-align:center"> | ||
<img src="https://static.igem.org/mediawiki/2013/5/5e/WHUKm.png" /></br></div> | <img src="https://static.igem.org/mediawiki/2013/5/5e/WHUKm.png" /></br></div> | ||
- | It is also hard to fit to present data, as no kinetic data for Cas9 are available now. | + | It is also hard to fit to present data, as no kinetic data for Cas9 are available now. But we can employ this function to draw Figure 4. |
- | + | ||
- | But this function | + | |
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
- | + | ||
The figure shows that even if the ka and kb of the on-target binding are twice as large as the ka and kb of the off-target binding, the off-target rate still grows drastically as time goes on. </br></br> | The figure shows that even if the ka and kb of the on-target binding are twice as large as the ka and kb of the off-target binding, the off-target rate still grows drastically as time goes on. </br></br> | ||
Line 307: | Line 304: | ||
So in addition to controlling the concentration of Cas9, controlling the expression time of Cas9 is also important for off-target rate control. Cas9 expression can be stopped as soon as an acceptable theoretical editing rate is reached, in order to reduce the off-target rate. </br></br> | So in addition to controlling the concentration of Cas9, controlling the expression time of Cas9 is also important for off-target rate control. Cas9 expression can be stopped as soon as an acceptable theoretical editing rate is reached, in order to reduce the off-target rate. </br></br> | ||
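The time dependence discussed here can be illustrated with a small sketch. It assumes the two cleavage steps form a consecutive first-order scheme A → B → C and uses the textbook closed-form solution for that scheme; the page's own equations are only available as images, so this form is an assumption. The rate constants are the ones quoted for the two theoretical curves, with the faster pair taken as on-target.

```python
import math

def cleavage_products(t, a0=1.0, ka=0.2, kb=0.1):
    """Concentrations of A (intact), B (nicked) and C (fully cleaved) at time t.

    Standard solution of A -ka-> B -kb-> C with [B0] = [C0] = 0; needs ka != kb.
    """
    a = a0 * math.exp(-ka * t)
    b = a0 * ka / (kb - ka) * (math.exp(-ka * t) - math.exp(-kb * t))
    c = a0 - a - b
    return a, b, c

def off_on_ratio(t):
    """[C] of the slower (off-target) site relative to the faster (on-target) site."""
    c_on = cleavage_products(t, ka=0.2, kb=0.1)[2]
    c_off = cleavage_products(t, ka=0.1, kb=0.05)[2]
    return c_off / c_on

if __name__ == "__main__":
    for minutes in (1, 10, 30, 60):
        print(minutes, round(off_on_ratio(minutes), 3))
```

Under these assumptions the ratio starts near (0.1 × 0.05)/(0.2 × 0.1) = 0.25 at short times and rises towards 1 as both reactions approach completion, which is the reason for limiting the exposure time.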
- | + | <a name="discussion"></a> | |
+ | <h1 style="font-size:20px;"><b> | ||
+ | 5. Discussion</br></b></h1></br> | ||
+ | <div style="text-align:left"> | ||
+ | There may be three reasons for the variation in correlation across outputs 1-19.</br></br> | ||
- | + | First, Cas9 has a mismatch tolerance at the 5’ end of the gRNA. This is backed by all the studies cited [1,2,3,4,7,8]. </br></br> |
- | + | ||
- | + | Second, there are flaws in the calculation of terminal energy, as all terminal mismatches of RNA and most of those of DNA are stabilizing [5,6]. The NN model may fail to capture all these stabilizing effects, so improving the energy calculation rules may help to fix the negative correlation. </br></br> |
- | </br></br> | + | |
- | + | Third, the NN model is derived from a binding energy database of freely binding DNA double strands, while we employed it to calculate Cas9-influenced RNA-DNA binding. We consider this the prime source of error in our model, and it may contribute to the odd correlation valley in △G(8)~△G(12). Alternatively, the valley may mean that Cas9 is indeed relatively insensitive to energy changes at these positions. </br></br> |
+ | |||
+ | </div> | ||
<a name="addendum"></a> | <a name="addendum"></a> | ||
<h1 style="font-size:20px;"><b> | <h1 style="font-size:20px;"><b> | ||
- | + | 6. Addendum</br></b></h1></br> | |
<div style="text-align:center"> | <div style="text-align:center"> | ||
<img src="https://static.igem.org/mediawiki/2013/6/6e/WHUDpdt.png" /></br></div> | <img src="https://static.igem.org/mediawiki/2013/6/6e/WHUDpdt.png" /></br></div> | ||
- | In a typical endonuclease environment, | + | In a typical endonuclease environment, <img src="https://static.igem.org/mediawiki/2013/2/28/WHU2013Refineca3.png" /> and <img src="https://static.igem.org/mediawiki/2013/d/d5/WHU2013Refineca4.png" /> always hold. Even in Pattanayak’s paper [1], though the total DNA concentration is 200nM, the concentration of every single kind of DNA (with a given sequence) is lower than 0.1nM, which is much lower than the KM of any typical restriction enzyme.</br></br> |
But still, the MM equation remains valid. First, under these conditions, [E] (the free E concentration) does not change much, because most "enzymes" are in free form and do nothing; second, some time after enzyme and substrate are mixed, the concentrations of free enzyme sites and of complexed substrate reach a steady state.[17] </br> | But still, the MM equation remains valid. First, under these conditions, [E] (the free E concentration) does not change much, because most "enzymes" are in free form and do nothing; second, some time after enzyme and substrate are mixed, the concentrations of free enzyme sites and of complexed substrate reach a steady state.[17] </br> | ||
Line 328: | Line 331: | ||
<img src="https://static.igem.org/mediawiki/2013/3/38/WHUAbccascade.png" /></br> | <img src="https://static.igem.org/mediawiki/2013/3/38/WHUAbccascade.png" /></br> | ||
</div> | </div> | ||
+ | |||
+ | |||
+ | Pattanayak’s in vitro experiment can reveal the off-target rate in vivo, because in the experiment the DNA and gRNA-Cas9 concentrations are 200nM and 100nM respectively. Every single kind of DNA has an abundance equal to or less than 0.1% (approximately the abundance of the wild-type sequence, the most abundant one), so the concentration of a specific DNA is of the order of 0.1nM or less. Therefore, </br></br> | ||
+ | |||
+ | <div style="text-align:center"> | ||
+ | <img src="https://static.igem.org/mediawiki/2013/3/35/WHUDNAcas9.png" /></br></div> | ||
+ | Nucleolus size according to [16], in vivo protein concentration of mammalian cell from [9] | ||
+ | </br></br> | ||
+ | The DNA-Cas9 ratio is of the same order, so it’s reasonable to use the experimental data to predict the Cas9 behavior in vivo. </br></br> | ||
+ | |||
+ | |||
<a name="reference"></a> | <a name="reference"></a> |
Revision as of 13:35, 28 October 2013
1. Overview
For a PDF version of the tandem promoter modeling part, click here
This model aims at predicting the off-target rate of any Cas9-based system in vivo. It has the following key ideas.
2. Symbol table, Assumption and reasons.
Symbol | |
[ ] | The symbol of concentration, i.e. [A] means the concentration of A |
△G’ | Difference in Modified Gibbs Free Energy. It’s assumed to determine the binding constant between gRNA-Cas9 and DNA |
△G(i) | The calculated △G for the ith position of gRNA-DNA interaction |
a | The input of △G’, a vector consisting of △G’(1)-(21) |
b | The constant representing all interaction in the binding process other than the gRNA-DNA interaction. |
ω | The weight vector |
F() | Relation function |
Ka | Association constant of gRNA-Cas9 and DNA |
Kd | Dissociation constant of gRNA-Cas9 and DNA |
[A]0 | The concentration of a given sequence in the pre-selection library |
[A]tot | The concentration of all DNA sequences in the pre-selection library |
[C] | The concentration of a given sequence in the post-selection library |
[C]tot | The concentration of all DNA sequences in the post-selection library |
A’ | The number of reads of a given sequence sampled from the pre-selection library |
Atot’ | The number of all sequences sampled from the pre-selection library |
P’ | The number of reads of a given sequence sampled from the post-selection library |
Ptot’ | The number of all sequences sampled from the post-selection library |
θ | Cas9 targeting efficiency |
S | Substrate, DNA |
E | Enzyme, gRNA-Cas9 |
P | Product, double strands broken DNA |
A | The intact DNA duplex |
B | The DNA molecule in which one of the two strands has been cleaved at the recognition site for the restriction enzyme |
C | The DNA molecule in which both strands have been cleaved at the recognition site |
ka,kb | The apparent first-order rate constants of the two steps of Cas9 cleavage |
k1,k-1,kcat | Reaction constants |
KM | MM constant |
R | Gas constant |
T | Absolute temperature |
pb | Binding probability |
pc | Cutting probability |
Abbreviation | |
dCas9 | Deactivated Cas9, Cas9 inhibitor, a Cas9 with two mutations D10A and H841A |
aCas9 | Cas9 activator, a dCas9 fused with an activator domain such as VP64, TAL, or the omega subunit of RNAP. |
d/aCas9 | Deactivated Cas9, no matter whether it’s an activator or inhibitor |
3. Modeling result
We employ an NN (nearest neighbor) model to calculate the △G(i) between gRNA and DNA at each NN position. From the first nucleotide of the target area of the gRNA to the 20th, △G(i) is calculated for a total of 21 positions. We first proved the feasibility of our idea by calculating the correlation between △G(i) and cutting efficiency (employing data from [1] CLTA1,2,3).
Sequence | Single mismatch tolerance | G/C | Ref |
TCATGCTGTTTCATATGATC | low | 7 | [4] |
AACTTTCAGTTTAGCGGUCU | low | 8 | [3] |
TGTGAAGAGCTTCACTGAGT | low | 9 | [1] |
GATGCCGTTCTTCTGCTTGT | low | 10 | [8] |
AGTCCTCATCTCCCTCAAGC | low | 10 | [1] |
GAGATGATCGCCCCTTCTTC | low | 11 | [2] |
CTCCCTCAAGCAGGCCCCGC | low | 15 | [1] |
Ave. G/C 10 | |||
GCAGATGTAGTGTTTCCACA | high | 9 | [1] |
GGTGGTGCAGATGAACTTCA | high | 10 | [8] |
GGGGCCACTAGGGACAGGAT | high | 13 | [2] |
GTCCCCTCCACCCCACAGTG | high | 14 | [2] |
GGGCACGGGCAGCTTGCCGG | high | 16 | [8] |
Ave. G/C 12.4 |
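The G/C column and the group averages can be rechecked directly from the sequences listed in the table. The sketch below is only a convenience for that recount; the helper name is hypothetical, and the sequences are copied from the table as given.

```python
from statistics import mean

# Sequences copied from the table, grouped by single-mismatch tolerance.
LOW = [
    "TCATGCTGTTTCATATGATC", "AACTTTCAGTTTAGCGGUCU", "TGTGAAGAGCTTCACTGAGT",
    "GATGCCGTTCTTCTGCTTGT", "AGTCCTCATCTCCCTCAAGC", "GAGATGATCGCCCCTTCTTC",
    "CTCCCTCAAGCAGGCCCCGC",
]
HIGH = [
    "GCAGATGTAGTGTTTCCACA", "GGTGGTGCAGATGAACTTCA", "GGGGCCACTAGGGACAGGAT",
    "GTCCCCTCCACCCCACAGTG", "GGGCACGGGCAGCTTGCCGG",
]

def gc_count(seq):
    """Number of G or C bases in a sequence (case-insensitive)."""
    return sum(base in "GC" for base in seq.upper())

if __name__ == "__main__":
    for label, group in (("low", LOW), ("high", HIGH)):
        counts = [gc_count(s) for s in group]
        print(label, counts, mean(counts))
```

Comparing the recount with the G/C column row by row is a cheap consistency check before the group averages are used to argue for a G/C effect.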
4. Model derivation
4.1. Calculation of △G’ of DNA-gRNA binding
The calculation method of △G(i) and △G’ is modified from the NN nearest neighbor model introduced in [2].
5. Discussion
There may be three reasons for the variation in correlation across outputs 1-19.
First, Cas9 has a mismatch tolerance at the 5’ end of the gRNA. This is backed by all the studies cited [1,2,3,4,7,8].
Second, there are flaws in the calculation of terminal energy, as all terminal mismatches of RNA and most of those of DNA are stabilizing [5,6]. The NN model may fail to capture all these stabilizing effects, so improving the energy calculation rules may help to fix the negative correlation.
Third, the NN model is derived from a binding energy database of freely binding DNA double strands, while we employed it to calculate Cas9-influenced RNA-DNA binding. We consider this the prime source of error in our model, and it may contribute to the odd correlation valley in △G(8)~△G(12). Alternatively, the valley may mean that Cas9 is indeed relatively insensitive to energy changes at these positions.