Team:WHU-China/templates/standardpage modelingCas9
From 2013.igem.org
(Difference between revisions)
IgnatzZeng (Talk | contribs) |
IgnatzZeng (Talk | contribs) |
||
Line 17: | Line 17: | ||
1. Overview</b></h1></br> | 1. Overview</b></h1></br> | ||
- | This model aims at predicting the off-target rate of any Cas9-based system in vivo. This model has | + | This model aims to predict the off-target rate of any Cas9-based system in vivo. The model is built on the principles of nucleic acid thermodynamics and kinetic analysis. Data from six papers were analyzed and/or used for model fitting. The model rests on the following key ideas. </br></br> |
- | </br></br> | + | <div style="text-align:center"> |
+ | <img src="https://static.igem.org/mediawiki/2013/2/25/WHUCas9total.png" /></br></div> | ||
+ | The Cas9 cleaving process is divided into two separate reactions: the reversible binding reaction and the irreversible cleaving reaction.</br></br> | ||
- | First, | + | First, the probability of Cas9-DNA binding is mainly determined by the affinity between the gRNA and the DNA. A △G’ is assumed to indicate this affinity. The △G’ is determined by △G(i), which is calculated with the nearest-neighbor (NN) model of nucleic acid thermodynamics. </br></br> |
- | </br></br> | + | |
- | Second, by analyzing binding equilibrium, dCas9 inhibition data and aCas9 activation data, the model to predict the possibility of gRNA-d/aCas9 binding to certain target in vivo can be constructed. The fitting result of this model | + | Second, by analyzing binding equilibrium, dCas9 inhibition data and aCas9 activation data, a model that predicts the probability of gRNA-d/aCas9 binding to a given target in vivo can be constructed. Fitting this model yields the equation for calculating △G’ from △G(i). </br></br> |
- | </br></br> | + | |
- | Finally, By | + | Finally, by analyzing the Cas9 cleaving process, the link between Cas9-DNA binding probability and editing efficiency can be established.</br></br> |
- | </br></br> | + | |
- | The data for Cas9 editing model fitting is generously provided by Vikram Pattanayak and Prof. David Liu, who has published the paper - High-throughput profiling of off-target DNA cleavage reveals RNA- programmed Cas9 nuclease specificity - on Nature Biotechnology, 11 Aug 2013.[1] The data for Cas9 binding model fitting is extracted from the following figures, Fig 2C, S7B, | + | The data for fitting the Cas9 editing model were generously provided by Vikram Pattanayak and Prof. David Liu, who published the paper "High-throughput profiling of off-target DNA cleavage reveals RNA-programmed Cas9 nuclease specificity" in Nature Biotechnology on 11 Aug 2013.[1] The data for deriving and fitting the Cas9 binding model were extracted from the following figures: Fig 2C, S7B and S7E of [2], Fig 5C of [3], and Fig 2AB of [4]. The software used to extract high-fidelity data was GetData Graph Digitizer V2.22. </br></br> |
- | </br></br> | + | |
Line 42: | Line 40: | ||
<img src="https://static.igem.org/mediawiki/2013/8/8e/WHUTable1bCas9.png"/></div> | <img src="https://static.igem.org/mediawiki/2013/8/8e/WHUTable1bCas9.png"/></div> | ||
</div> | </div> | ||
+ | <center><em>Table 1. Symbol table of Cas9Off Model</em></br></center> | ||
1. As Cas9 needs the guidance of gRNA to cut DNA, unbound gRNA and Cas9 are ignored in the analysis, and all other gRNA and Cas9 are considered to be constantly bound to each other. </br> | 1. As Cas9 needs the guidance of gRNA to cut DNA, unbound gRNA and Cas9 are ignored in the analysis, and all other gRNA and Cas9 are considered to be constantly bound to each other. </br> | ||
2. The model does not take the 3D structure of DNA, gRNA and the DNA-gRNA complex into consideration, as the data are not sufficient to account for these factors. </br> | 2. The model does not take the 3D structure of DNA, gRNA and the DNA-gRNA complex into consideration, as the data are not sufficient to account for these factors. </br> | ||
- | 3. The model is based on NN nearest neighbor model of base pairing energy[5]. This model was built for thermodynamic energy calculation of DNA strand interaction. But we employ it to model the gRNA-DNA interaction. This will bring in some inherent flaws. The most prominent one will be when the RNA side is a U and the DNA side is a G. In the NN model it’s considered as a T-G pair, which is not as energetically favorable as U-G [6]. However, there is no model available for DNA-RNA interaction energy calculation yet. So it’s assumed that the energy (△G(i)) calculated from the NN model is to some degree consistent with reality. We believe by employing a better model of energy prediction, the whole Cas9 off-target model will be improved. </br> | + | 3. The model is based on NN nearest neighbor model of base pairing energy[5]. This model was built for thermodynamic energy calculation of DNA strand interaction. But we employ it to model the gRNA-DNA interaction. This will bring in some inherent flaws. The most prominent one will be when the RNA side is a U and the DNA side is a G. In the NN model it’s considered as a T-G pair, which is not as energetically favorable as U-G [6]. However, there is no model available for DNA-RNA interaction energy calculation yet. So it’s assumed that the energy (△G(i)) calculated from the NN model is to some degree consistent with reality. In fact, [7] suggested a rough sort of the tolerance of base mismatch: CC<UC<AG<AA<GA<CA<UG<CT<GG<UT <AC<GT, while the model suggested that CC<AC<TC<AA<TT<GA<GT<GG. </br> |
- | + | 4. We believe by employing a better model of energy prediction, the whole Cas9 off-target model will be improved. </br> | |
- | + | 5. We assume △G’ takes the form <img src="https://static.igem.org/mediawiki/2013/d/dc/WHUVectorW.png"/> , where “a” is a 1×21 vector that contains △G(1) to △G(21) as its values and ω is the weight vector. Only the impact of the DNA-gRNA interaction (“a”) is treated as a variable; the △G contributed by other interactions (e.g. protein-DNA interaction) is considered a constant b. This is also why the model cannot predict the Cas9 off-target rate of a target without a PAM (NGG), which interacts with Cas9 rather than with the gRNA. F() is the function that relates these terms to △G’. </br> |
- | + | 6. Both cleaving steps of Cas9 are assumed to be classic Michaelis-Menten enzyme reactions. </br> |
+ | 7. The dCas9, aCas9 and normal Cas9 are assumed to share the same Ka with DNA, given that they are guided by the same gRNA. This is reasonable as the only changes in the DNA binding domain of </br> | ||
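The weighted-sum part of assumption 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact form of F() is given in the image and is not reproduced here, so the sketch stops at the linear term ω·a + b, and the function name `weighted_binding_energy` and all numeric values are hypothetical.

```python
from typing import Sequence


def weighted_binding_energy(delta_g: Sequence[float],
                            weights: Sequence[float],
                            b: float = 0.0) -> float:
    """Return the linear term w . a + b.

    delta_g -- per-position energies, i.e. the vector "a" (dG(1)..dG(21))
    weights -- the weight vector (omega in the text)
    b       -- constant lumping all non-gRNA contributions

    Assumption: F() is NOT applied here; its form is only shown
    in the original figure.
    """
    if len(delta_g) != len(weights):
        raise ValueError("delta_g and weights must have the same length")
    return sum(w * g for w, g in zip(weights, delta_g)) + b


if __name__ == "__main__":
    # Placeholder numbers, not fitted values from the model.
    a = [-1.0, -2.0]
    omega = [0.5, 1.0]
    print(weighted_binding_energy(a, omega, b=0.25))
```

Keeping b as a separate argument mirrors the text: when data are normalized to the exact-match gRNA, b cancels and can be left at zero.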
Line 60: | Line 60: | ||
</div> | </div> | ||
<center><em>Figure 1. Correlation map between △G(i) and Cas9 cutting efficiency</em></center></br> | <center><em>Figure 1. Correlation map between △G(i) and Cas9 cutting efficiency</em></center></br> | ||
+ | The result shows that, roughly, the closer a position is to the PAM, the larger the correlation. This finding is consistent with previous studies [1,2,3,4,7,8]. We therefore confirm that △G(i) does influence the targeting efficiency of Cas9.</br></br> | ||
+ | |||
There may be two reasons for the negative correlation between △G(1), △G(2), △G(3) and cutting efficiency. </br></br> | There may be two reasons for the negative correlation between △G(1), △G(2), △G(3) and cutting efficiency. </br></br> | ||
- | First, the Cas9 has a great mismatch tolerance for the 5’ end of gRNA. This reason is also backed by the activation result of Prashant’s paper[2], as their data revealed some mutations in the targeting region of the gRNA can promote the activating ability of aCas9. However, though only this paper use aCas9 to measure mismatch tolerance of Cas9, these date are not consistent with the data of other papers using Cas9 or dCas9 for the same purpose. The other papers didn’t report that many gRNA with single mismatches can “increase” the targeting efficiency[1,3,4, | + | |
+ | First, Cas9 has a high mismatch tolerance at the 5’ end of the gRNA. This explanation is also backed by the activation results of Prashant’s paper[2], whose data revealed that some mutations in the targeting region of the gRNA can promote the activating ability of aCas9. However, this is the only paper that uses aCas9 to measure the mismatch tolerance of Cas9, and its data are not consistent with those of other papers that use Cas9 or dCas9 for the same purpose. The other papers did not report that many gRNAs with single mismatches can “increase” the targeting efficiency[1,3,4,8]. So the support from Prashant’s paper for this explanation is not fully solid. </br></br> | ||
+ | |||
Second, there are flaws in the calculation of terminal energy. All terminal mismatches of RNA, and most of those of DNA, are stabilizing [5,6], and the NN model may fail to capture all of these stabilizing effects. So improving the energy calculation rules may help to fix the negative correlation. </br></br> | Second, there are flaws in the calculation of terminal energy. All terminal mismatches of RNA, and most of those of DNA, are stabilizing [5,6], and the NN model may fail to capture all of these stabilizing effects. So improving the energy calculation rules may help to fix the negative correlation. </br></br> | ||
+ | |||
+ | |||
+ | The data from [1,2,3,4,8] revealed that the sum of △G(i) is not proportional to targeting efficiency. The following table summarizes the pattern. </br></br> | ||
+ | <div style="text-align:center"> | ||
+ | <img src="https://static.igem.org/mediawiki/2013/9/95/WHURefsequence.png" /></div> | ||
+ | <center><em></em>Table 2. The relation of G/C frequency and single mismatch tolerance</br></center></br> | ||
+ | The mismatch tolerance is roughly determined from the data of the references, for details please click here. A low-tolerance sequence shows a significant performance drop with a single mismatch at 7 or more positions. A high-tolerance sequence with a single mismatch can perform as well as the original sequence in guiding Cas9 at more than 16 positions. The distribution means the number of G/C at positions 1-10 versus the number of G/C at positions 11-20.</br></br> | ||
+ | |||
+ | The relationship between G/C frequency and single mismatch tolerance can be explained by the fact that abundant G/C makes the gRNA bind to DNA more stably, so that a single mismatch is not strong enough to disturb the binding. This suggests that F() may be a reversed sigmoid function. But determining this sigmoid function (and b) requires more specific experimental data on d/aCas9 binding kinetics, which are not available. </br></br> | ||
+ | |||
+ | This sigmoid-function guess is also supported by later analysis in the comparison of the Cas9 binding model, which assumed a simple proportional relationship between △G(i) and △G’. The results show that the Cas9-gRNA complex is insensitive to energy changes in DNA-gRNA binding not only when △G’ exceeds some threshold, but also when △G’ is below some threshold. </br></br> | ||
+ | |||
+ | For an ω derived from Fig S7C and Fig.5D of [3], we checked the performance of these parameters and of the model by comparing the predicted values with the data from Fig.2B of [4] and Fig.5C of [3]. Note that all the following figures use data normalized by the activity of the “wildtype” gRNA, so b is not required for the prediction. </br></br> | ||
+ | <div style="text-align:center"> | ||
+ | <img src="https://static.igem.org/mediawiki/2013/c/c4/WHUFig2Cas9.png" /></div> | ||
+ | <center><em></em>Figure 2. Model prediction compared with data from Fig.2B of [4]</br></center></br> | ||
+ | <div style="text-align:center"> | ||
+ | <img src="https://static.igem.org/mediawiki/2013/5/55/WHUFig3cas9.png" /></div> | ||
+ | <center><em></em>Figure 3. Model prediction compared with data from Fig.5CB of [3]</br></center></br> | ||
+ | These data are collected from 1’ end truncation or consecutive mutation experiments on gRNA. In both figures, as the column number grows, the end truncations/end mutations become more severe, and the total energy of DNA-gRNA binding drops. The prediction of the model is near-linear, but the data show strong nonlinearity. Obvious plateaus form in columns 4-8 of Fig.2 and columns 3-9 of Fig.3, which suggests that the gRNA-Cas9 complex is not sensitive to the energy loss caused by continuous mismatches / truncations at these stages.</br> | ||
+ | For the last part of the model, kinetic analysis reveals that both concentration and reaction time are important for off-target control.</br></br></br> | ||
<a name="derivation"></a> | <a name="derivation"></a> | ||
Line 74: | Line 99: | ||
The calculation method of △G(i) and △G’ is modified from the NN nearest neighbor model introduced in [2]. </br> | The calculation method of △G(i) and △G’ is modified from the NN nearest neighbor model introduced in [2]. </br> | ||
<div style="text-align:center"> | <div style="text-align:center"> | ||
- | <img src="https://static.igem.org/mediawiki/2013/0/04/WHUCas9gRNA.png" | + | <img src="https://static.igem.org/mediawiki/2013/0/04/WHUCas9gRNA.png" /> |
</div> | </div> | ||
- | <center><em>Figure | + | <center><em>Figure 4. schematic picture of Cas9 digestion, modified from [1]</em></center></br> |
<b>Step1.</b> Set up the binding sequence</br> | <b>Step1.</b> Set up the binding sequence</br> | ||
The input will be the 21nt of the target prior to the GG of the PAM, and the corresponding 21nt of the potential off-target sequence. We need a 21nt sequence rather than a 20nt one because the NN model uses adjacent 2nt pairs as inputs. To fully account for the impact of the 20nt targeting sequence of the gRNA, we need to include the 21st base in the calculation. We explain our way of processing inputs using an example. Mismatched base pairs are highlighted in red. </br></br> | The input will be the 21nt of the target prior to the GG of the PAM, and the corresponding 21nt of the potential off-target sequence. We need a 21nt sequence rather than a 20nt one because the NN model uses adjacent 2nt pairs as inputs. To fully account for the impact of the 20nt targeting sequence of the gRNA, we need to include the 21st base in the calculation. We explain our way of processing inputs using an example. Mismatched base pairs are highlighted in red. </br></br> | ||
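The input handling described in Step 1 can be sketched as follows. This is a minimal illustration, not the team's code: the helper names are hypothetical, the 1-based numbering from the first base of the input is an assumption, and how each dinucleotide step maps onto △G(i) is not shown in this excerpt. A 21nt input yields 20 overlapping nearest-neighbor steps.

```python
def nearest_neighbor_steps(seq: str) -> list:
    """Split a sequence into overlapping dinucleotide steps.

    A 21 nt input yields 20 steps, which is why the 21st base
    is needed to cover the 20 nt targeting sequence.
    """
    seq = seq.upper()
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def mismatch_positions(target: str, off_target: str) -> list:
    """Return 1-based positions where the two sequences differ.

    Assumption: positions are counted from the first base of the
    input string; the original page may number them differently.
    """
    if len(target) != len(off_target):
        raise ValueError("sequences must have the same length")
    return [i + 1
            for i, (t, o) in enumerate(zip(target.upper(), off_target.upper()))
            if t != o]


if __name__ == "__main__":
    # Made-up 21 nt sequences, for illustration only.
    on = "GACGTTACCGGATTCAGTCAA"
    off = "GACGTTACCGCATTCAGTCAA"
    print(len(nearest_neighbor_steps(on)))   # number of dinucleotide steps
    print(mismatch_positions(on, off))       # where the two inputs differ
```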
Line 121: | Line 146: | ||
Empirical salt correction equations have been derived, </br> | Empirical salt correction equations have been derived, </br> | ||
</br> | </br> | ||
+ | <div style="text-align:center"> | ||
+ | <img src="https://static.igem.org/mediawiki/2013/5/59/WHUDeltaGiCas9.png" /></br></div></br> | ||
where N is the total number of phosphates in the duplex, and [Na+] is the total concentration of monovalent cations from all sources (the same equation works for</br> | where N is the total number of phosphates in the duplex, and [Na+] is the total concentration of monovalent cations from all sources (the same equation works for</br> | ||
sodium, potassium, and ammonium) over a range of monovalent concentration of 0.05 to 1 M. </br> | sodium, potassium, and ammonium) over a range of monovalent concentration of 0.05 to 1 M. </br> | ||
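A salt correction of this kind can be sketched as below. The equation itself is only available as an image on the page, so the formula used here is an assumption: it is the commonly cited SantaLucia-style correction, ΔG([Na+]) = ΔG(1 M) − 0.114 × (N/2) × ln[Na+] in kcal/mol at 37 °C, which matches the description of N and [Na+] given above. The function name is hypothetical.

```python
import math


def salt_corrected_dg(dg_1m: float, n_phosphates: int, na_molar: float) -> float:
    """Correct a duplex free energy (kcal/mol, 1 M NaCl) for salt.

    Assumed form (not confirmed against the page's image):
        dG([Na+]) = dG(1 M) - 0.114 * (N / 2) * ln([Na+])
    n_phosphates -- total number of phosphates in the duplex (N)
    na_molar     -- total monovalent cation concentration in mol/L
    """
    if not 0.05 <= na_molar <= 1.0:
        raise ValueError("correction is stated for 0.05 to 1 M only")
    return dg_1m - 0.114 * (n_phosphates / 2) * math.log(na_molar)


if __name__ == "__main__":
    # Illustrative numbers only.
    print(salt_corrected_dg(-20.0, 40, 0.1))
```

Under this form the duplex becomes less stable (△G less negative) as the salt concentration drops below 1 M, and the correction vanishes at exactly 1 M.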
Line 154: | Line 181: | ||
If △G’ really determines the probability that Cas9 digests a certain DNA, there must be some correlation between each △G(i) and the Cas9 targeting efficiency θ. We can calculate the △G’ and θ of all sequences contained in the library, and calculate the Pearson product-moment correlation coefficient of △G’ and θ. So we analyzed the CLTA1,2,3 one-mutation pre-selection library and the “v2.1 gRNA 100nM Cas9” post-selection library, and obtained Figure 1. | If △G’ really determines the probability that Cas9 digests a certain DNA, there must be some correlation between each △G(i) and the Cas9 targeting efficiency θ. We can calculate the △G’ and θ of all sequences contained in the library, and calculate the Pearson product-moment correlation coefficient of △G’ and θ. So we analyzed the CLTA1,2,3 one-mutation pre-selection library and the “v2.1 gRNA 100nM Cas9” post-selection library, and obtained Figure 1. | ||
</br></br> | </br></br> | ||
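The per-position correlation analysis described above can be sketched in plain Python. This is an illustration under assumptions: the function names and the row layout (one row of △G(i) values per library sequence, with a matching list of θ values) are hypothetical, and the numbers in the usage example are made up rather than taken from the CLTA libraries.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def correlation_by_position(dg_rows, theta):
    """Correlate each column dG(i) with the targeting efficiency theta.

    dg_rows -- one list of per-position energies per library sequence
    theta   -- measured efficiency for each sequence, same order
    """
    n_positions = len(dg_rows[0])
    return [pearson_r([row[i] for row in dg_rows], theta)
            for i in range(n_positions)]


if __name__ == "__main__":
    # Made-up data: 4 sequences, 2 positions.
    rows = [[-1.0, -2.0], [-1.5, -1.0], [-2.0, -2.5], [-2.5, -0.5]]
    theta = [0.2, 0.4, 0.6, 0.8]
    print(correlation_by_position(rows, theta))
```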
+ | </br> | ||
- | |||
- | |||
- | |||
<a name="prediction"></a> | <a name="prediction"></a> | ||
<b>4.3. Derivation of Cas9 binding model, for off-target prediction of d/aCas9</b></br></br> | <b>4.3. Derivation of Cas9 binding model, for off-target prediction of d/aCas9</b></br></br> | ||
Line 184: | Line 209: | ||
<div style="text-align:center"> | <div style="text-align:center"> | ||
<img src="https://static.igem.org/mediawiki/2013/0/0e/WHURFPFluores.png" style="width:100%;height:auto;"></br></div> | <img src="https://static.igem.org/mediawiki/2013/0/0e/WHURFPFluores.png" style="width:100%;height:auto;"></br></div> | ||
- | <center><em>Figure | + | <center><em>Figure 5. dCas9 regulation on promoter J23119 (extracted from [3])</em></center></br> |
- | Notice that, on Figure | + | Notice that, in Figure 5, the RFP started to decrease exponentially 10 min after the inducer was added. This is only possible when v[mRNA] is held constant. So d[mRNA]/dt=0, which means [TF] is constant. In this equation, [TF] means the concentration of transcription factor bound to the promoter, and dCas9 is the only transcription factor in this experiment. According to tables 2.1 and 2.2 in [9], the typical mRNA lifetime in E. coli is 2-5 min, and the time for protein (Cas9) transcription and translation is 5 min. So Cas9-DNA binding can reach equilibrium within 0~3 min (10-5-5 to 10-5-2) in vivo. The time needed to reach equilibrium is therefore much shorter than the experimental time scale, both in vivo and in vitro. </br></br> |
So we can consider the equations are in steady state. </br> | So we can consider the equations are in steady state. </br> | ||
Line 193: | Line 218: | ||
<div style="text-align:center"> | <div style="text-align:center"> | ||
<img src="https://static.igem.org/mediawiki/2013/e/e2/WHUMeasurement.png" /> </br></div> | <img src="https://static.igem.org/mediawiki/2013/e/e2/WHUMeasurement.png" /> </br></div> | ||
+ | </div> | ||
+ | These data can never give us the exact value of △G’, as they only indicate differences in △G’. So we assume that the exact-match gRNA results in △G’norm=0 in order to calculate ω. (The norm must be reset for every different set of data.)</br> | ||
+ | <div style="text-align:center"> | ||
+ | <img src="https://static.igem.org/mediawiki/2013/7/78/WHUDeltaG1prime.png" /></br></div> | ||
+ | As we now assume F() as a reversed step function, i.e. </br> | ||
+ | <div style="text-align:center"> | ||
+ | <img src="https://static.igem.org/mediawiki/2013/0/01/WHUFxcas9.png" /></br></div> | ||
+ | So we will first choose the data with smaller △G’ for regression, i.e. the data for the targets with less G/C. This will ensure x<q. These data were extracted from Fig S7E of [2], Fig 5C of [3], and Fig 2AB of [4]. </br></br> | ||
+ | |||
+ | Therefore we can link △G’ with the data of [2,3,4], and run the regression to calculate the relationship of △G(i) and △G’ (See result).</br></br> | ||
+ | |||
+ | To predict the off-target rate of d/aCas9, the following equation can be derived. At equilibrium,</br> | ||
+ | <div style="text-align:center"> | ||
+ | <img src="https://static.igem.org/mediawiki/2013/f/fa/WHUKdpb.png" /> </br></div> | ||
+ | So at equilibrium, the probability of a substrate being bound by a Cas9 is [E0]/([E0]+Kd). If we set pbw as the probability of d/aCas9 binding to the wrong target and pbr as the probability of d/aCas9 binding to the right target, the off-target rate will be,</br> | ||
+ | <div style="text-align:center"> | ||
+ | <img src="https://static.igem.org/mediawiki/2013/e/e7/WHUPbv.png" /></br></div></br> | ||
+ | This equation can also be employed to calculate the best enzyme concentration of gRNA-Cas9 for an ideal balance between regulation and off-target.</br></br> | ||
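The equilibrium binding probability [E0]/([E0]+Kd) stated above can be explored numerically. The sketch below is hedged: the page's own off-target expression is only available as an image, so `off_target_fraction` uses pbw/(pbw+pbr) purely as an assumed stand-in, and the function names and Kd values are hypothetical.

```python
def binding_probability(e0: float, kd: float) -> float:
    """Equilibrium probability that a site is bound: [E0] / ([E0] + Kd)."""
    if e0 < 0 or kd <= 0:
        raise ValueError("e0 must be >= 0 and kd must be > 0")
    return e0 / (e0 + kd)


def off_target_fraction(e0: float, kd_right: float, kd_wrong: float) -> float:
    """Share of binding that lands on the wrong target.

    ASSUMPTION: defined here as pbw / (pbw + pbr). The expression used
    on the original page is in a figure and may differ.
    """
    pbr = binding_probability(e0, kd_right)
    pbw = binding_probability(e0, kd_wrong)
    return pbw / (pbw + pbr)


if __name__ == "__main__":
    # Illustrative units and values: Kd of the wrong site is 100x weaker.
    for e0 in (1.0, 10.0, 100.0):
        print(e0, off_target_fraction(e0, kd_right=1.0, kd_wrong=100.0))
```

With these placeholder numbers the off-target share rises as the gRNA-Cas9 concentration grows, because the on-target site saturates first. That is the trade-off the text refers to when it mentions choosing the enzyme concentration.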
+ | |||
+ | |||
+ | |||
+ | _______-------------------------------- | ||
Therefore we can link △G’ with the data of [2,3,4], and calculate the relation between △G(i) and △G’. </br></br></br> | Therefore we can link △G’ with the data of [2,3,4], and calculate the relation between △G(i) and △G’. </br></br></br> | ||
To predict the off-target rate of d/aCas9, the following equation can be derived. At equilibrium, </br> | To predict the off-target rate of d/aCas9, the following equation can be derived. At equilibrium, </br> | ||
Line 218: | Line 265: | ||
<div style="text-align:center"> | <div style="text-align:center"> | ||
<img src="https://static.igem.org/mediawiki/2013/5/5e/WHUKm.png" /></br></div> | <img src="https://static.igem.org/mediawiki/2013/5/5e/WHUKm.png" /></br></div> | ||
- | It’s also hard to fit into present data, as there is no kinetic data for Cas9 available now | + | It’s also hard to fit into present data, as there is no kinetic data for Cas9 available now. </br></br> |
But this function can tell us that the product concentration will increase in following patterns. </br> | But this function can tell us that the product concentration will increase in following patterns. </br> | ||
Line 224: | Line 271: | ||
<div style="text-align:center"> | <div style="text-align:center"> | ||
<img src="https://static.igem.org/mediawiki/2013/c/c0/WHUTheor.png" /></br></div> | <img src="https://static.igem.org/mediawiki/2013/c/c0/WHUTheor.png" /></br></div> | ||
- | <center><em>Figure | + | <center><em>Figure 6. Theoretical curves from the Cas9 cleaving reaction |
The curves displaying changes of two different cleaved products. Boundary conditions were set as [A0]=1.0, [B0]=[C0]=0, ka=0.2 min-1,kb=0.1 min-1 for red line; | The curves displaying changes of two different cleaved products. Boundary conditions were set as [A0]=1.0, [B0]=[C0]=0, ka=0.2 min-1,kb=0.1 min-1 for red line; | ||
And [A0]=1.0, [B0]=[C0]=0, ka=0.1 min-1,kb=0.05 min-1 for blue line. | And [A0]=1.0, [B0]=[C0]=0, ka=0.1 min-1,kb=0.05 min-1 for blue line. | ||
- | </em></center> | + | </em></br></center></br> |
+ | The figure shows that even when the ka and kb of on-target binding are twice as large as the ka and kb of off-target binding, the off-target rate still grows drastically as time goes on. </br></br> | ||
- | + | So in addition to controlling the concentration of Cas9, controlling the expression time of Cas9 is also important for off-target rate control. Cas9 expression can be stopped as soon as an acceptable theoretical editing rate is reached, in order to reduce the off-target rate. </br></br> |
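The time dependence discussed above can be reproduced with a small sketch. It assumes that the two cleaving steps behave as consecutive irreversible first-order reactions A → B → C with rate constants ka and kb, which is an interpretation of the Figure 6 boundary conditions rather than something the page states explicitly. The function name is hypothetical; the rate constants are the ones listed in the Figure 6 caption.

```python
import math


def consecutive_products(t: float, ka: float, kb: float, a0: float = 1.0):
    """Concentrations of A, B, C at time t for A -ka-> B -kb-> C.

    Analytic solution for [B0] = [C0] = 0; requires ka != kb.
    Assumption: both steps are irreversible and first order.
    """
    if ka == kb:
        raise ValueError("this closed form requires ka != kb")
    a = a0 * math.exp(-ka * t)
    b = a0 * ka / (kb - ka) * (math.exp(-ka * t) - math.exp(-kb * t))
    c = a0 - a - b  # mass balance
    return a, b, c


if __name__ == "__main__":
    # Rate constants from the Figure 6 caption (min^-1).
    for t in (5, 20, 50):
        c_on = consecutive_products(t, ka=0.2, kb=0.1)[2]
        c_off = consecutive_products(t, ka=0.1, kb=0.05)[2]
        print(t, c_off / c_on)  # off-target product relative to on-target
```

Under these assumptions the ratio of off-target to on-target final product climbs toward 1 as the reaction time increases, which is consistent with the advice to stop Cas9 expression once the desired editing rate has been reached.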
- | + | ||
Pattanayak’s in vitro experiment can reveal the off-target rate in vivo, because in the experiment the DNA and gRNA-Cas9 concentrations are 200nM and 100nM respectively. Every single kind of DNA has an abundance equal to or less than 0.1% (which is approximately the abundance of the wild type sequence, the most abundant one), so the concentration of a specific DNA is of the same order as (or less than) 0.1nM. Therefore, </br></br> | Pattanayak’s in vitro experiment can reveal the off-target rate in vivo, because in the experiment the DNA and gRNA-Cas9 concentrations are 200nM and 100nM respectively. Every single kind of DNA has an abundance equal to or less than 0.1% (which is approximately the abundance of the wild type sequence, the most abundant one), so the concentration of a specific DNA is of the same order as (or less than) 0.1nM. Therefore, </br></br> | ||
Line 241: | Line 288: | ||
The DNA-Cas9 ratio is of the same order, so it’s reasonable to use the experimental data to predict the Cas9 behavior in vivo. </br></br> | The DNA-Cas9 ratio is of the same order, so it’s reasonable to use the experimental data to predict the Cas9 behavior in vivo. </br></br> | ||
- | |||
- | <a name=" | + | <a name="addendum"></a> |
<h1 style="font-size:20px;"><b> | <h1 style="font-size:20px;"><b> | ||
- | 5. | + | 5. Addendum</br></b></h1></br> |
- | + | ||
- | + | ||
- | + | ||
- | + | ||
<div style="text-align:center"> | <div style="text-align:center"> | ||
<img src="https://static.igem.org/mediawiki/2013/6/6e/WHUDpdt.png" /></br></div> | <img src="https://static.igem.org/mediawiki/2013/6/6e/WHUDpdt.png" /></br></div> | ||
- | + | In a typical endonuclease environment, and are always hold. Even in Pattanayak’s paper[1], though the total DNA concentration is 200nM, the concentration every single kind of DNA(with certain sequence) is lower than 0.1nM, which is much lower than KM of any typical restriction enzyme.</br> | |
- | In a typical endonuclease environment, | + | But still, the MM equation remains valid. Because, first, under these conditions, [E] (free E concentration) doesn't change much, because most "enzymes" are in free form and they don't do anything; second, some time after enzyme and substrate are mixed the concentrations of free enzyme sites and of substrate complexed will reach a steady state.[17] </br> |
- | But still, the MM equation remains valid. Because, first, under these conditions, [E] (free E concentration) doesn't change much, because most "enzymes" are in free form and they don't do anything;second, some time after enzyme and substrate are mixed the concentrations of free enzyme sites and of substrate complexed will reach a steady state.[ | + | |
<div style="text-align:center"> | <div style="text-align:center"> |
Revision as of 11:29, 27 September 2013
Cas9 Off-target Prediction Model.(Abbreviation: Cas9Off Model)
For a pdf version of the tandem promoter modeling part,click here