Team:WHU-China/templates/standardpage modelingCas9
From 2013.igem.org
Edited by IgnatzZeng
1. As Cas9 needs the guidance of gRNA to cut DNA, unbound gRNA and Cas9 are ignored in the analysis, and the remaining gRNA and Cas9 are considered to be constantly bound to each other. </br>
2. The model does not take the 3D structure of DNA, gRNA or the DNA-gRNA complex into consideration, as the available data are not sufficient to account for these factors. </br>
3. The model is based on the nearest-neighbor (NN) model of base-pairing energy [5]. This model was built to calculate the thermodynamics of DNA strand interactions, but we employ it to model the gRNA-DNA interaction, which brings in some inherent flaws. The most prominent one arises when the RNA side is a U and the DNA side is a G: the NN model treats this as a T-G pair, which is not as energetically favorable as a U-G pair [6]. However, no model for calculating DNA-RNA interaction energy is available yet, so we assume that the energy (△G(i)) calculated from the NN model is to some degree consistent with reality. In fact, [7] suggested a rough ranking of mismatch tolerance: CC&lt;UC&lt;AG&lt;AA&lt;GA&lt;CA&lt;UG&lt;CT&lt;GG&lt;UT&lt;AC&lt;GT, while the model suggests CC&lt;AC&lt;TC&lt;AA&lt;TT&lt;GA&lt;GT&lt;GG. </br>
4. We believe that employing a better energy prediction model would improve the whole Cas9 off-target model. </br>
5. We assume that △G’ takes the form <img src="https://static.igem.org/mediawiki/2013/d/dc/WHUVectorW.png"/>, where “a” is a 1×21 vector containing △G(1) to △G(21) and ω is the weight vector. Only the impact of the DNA-gRNA interaction (“a”) is treated as a variable; the △G contributed by other interactions (e.g. protein-DNA interaction) is considered to be a constant b. This is also why the model cannot predict the Cas9 off-target rate of a target without a PAM (NGG), since the PAM interacts with Cas9 rather than with gRNA. F() is the function that relates <img src="https://static.igem.org/mediawiki/2013/b/bf/Refineca1.png" /> to △G’. </br>
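Assumption 5 can be illustrated with a short Python sketch. Because the form of △G’ is given only as an image, the sketch assumes the linear form △G’ = ω·a + b suggested by the description. It also uses the standard relation Ka = exp(-△G’/RT) and a simple saturation curve as a stand-in for F(). The names `delta_g_prime`, `association_constant` and `binding_probability`, the choice of F(), and all numbers are hypothetical illustrations, not the team's fitted model.

```python
import math

# Gas constant in kcal/(mol*K), so energies in kcal/mol give a unitless exponent.
R_KCAL = 1.987e-3


def delta_g_prime(a, w, b):
    """Assumed linear form dG' = w . a + b (kcal/mol).

    a -- per-position dG(i) values of the gRNA-DNA interaction (21 entries)
    w -- weight vector of the same length (hypothetical; would be fitted to data)
    b -- constant collecting all non gRNA-DNA interactions, e.g. Cas9-PAM contacts
    """
    if len(a) != len(w):
        raise ValueError("a and w must have the same length")
    return sum(wi * ai for wi, ai in zip(w, a)) + b


def association_constant(dg_prime, temp_k=310.15):
    """Ka = exp(-dG' / (R T)), in 1/M for a 1 M standard state."""
    return math.exp(-dg_prime / (R_KCAL * temp_k))


def binding_probability(ka, enzyme_conc):
    """Hypothetical F(): fraction of a target bound, pb = Ka[E] / (1 + Ka[E])."""
    x = ka * enzyme_conc
    return x / (1.0 + x)


if __name__ == "__main__":
    a = [-1.0] * 21  # illustrative dG(i) values, not real data
    w = [1.0] * 21   # uniform weights, purely illustrative
    dg = delta_g_prime(a, w, b=5.0)
    ka = association_constant(dg)
    print(dg, binding_probability(ka, enzyme_conc=1e-9))
```

With uniform weights every position contributes equally. Fitting ω to targeting data is what would let such a model reflect position-dependent mismatch tolerance.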
We employ a nearest-neighbor (NN) model to calculate △G(i) between gRNA and DNA at each NN position. From the first nucleotide of the gRNA target region to the 20th, △G(i) is calculated for 21 positions in total. We first tested the feasibility of this idea by calculating the correlation between △G(i) and cutting efficiency (using the CLTA1, CLTA2 and CLTA3 data from [1]). </br>
<div style="text-align:center">
<img src="https://static.igem.org/mediawiki/2013/8/8f/WHU2013Cutting_efficiency.png" width=700px height=300px /></br>
</div>
<center><em><b>Figure 1. Correlation map between △G(i) and Cas9 cutting efficiency</b></em></center></br>
However, the data from [1,2,3,4,8] also reveal that targeting efficiency is not proportional to △G(i). In most cases with high single-mismatch tolerance, the correlation between △G(i) and targeting efficiency is not significant. These observations are summarized in the following table.</br></br>
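<div style="width:100%;text-align:center;">
<table style="text-align:left;width:100%;">
<tr><td class="topstike">Sequence</td><td class="topstike">Single-mismatch tolerance</td><td class="topstike">G/C</td><td class="topstike">Ref</td></tr>
<tr><td class="topstike">TCATGCTGTTTCATATGATC</td><td class="topstike">low</td><td class="topstike">7</td><td class="topstike">[4]</td></tr>
<tr><td class="topstike">AACTTTCAGTTTAGCGGUCU</td><td class="topstike">low</td><td class="topstike">8</td><td class="topstike">[3]</td></tr>
<tr><td class="topstike">TGTGAAGAGCTTCACTGAGT</td><td class="topstike">low</td><td class="topstike">9</td><td class="topstike">[1]</td></tr>
<tr><td class="topstike">GATGCCGTTCTTCTGCTTGT</td><td class="topstike">low</td><td class="topstike">10</td><td class="topstike">[8]</td></tr>
<tr><td class="topstike">AGTCCTCATCTCCCTCAAGC</td><td class="topstike">low</td><td class="topstike">10</td><td class="topstike">[1]</td></tr>
<tr><td class="topstike">GAGATGATCGCCCCTTCTTC</td><td class="topstike">low</td><td class="topstike">11</td><td class="topstike">[2]</td></tr>
<tr><td class="topstike">CTCCCTCAAGCAGGCCCCGC</td><td class="topstike">low</td><td class="topstike">15</td><td class="topstike">[1]</td></tr>
<tr><td class="topstike"></td><td class="topstike"></td><td class="topstike">Ave. G/C 10</td><td class="topstike"></td></tr>
<tr><td class="topstike">GCAGATGTAGTGTTTCCACA</td><td class="topstike">high</td><td class="topstike">9</td><td class="topstike">[1]</td></tr>
<tr><td class="topstike">GGTGGTGCAGATGAACTTCA</td><td class="topstike">high</td><td class="topstike">10</td><td class="topstike">[8]</td></tr>
<tr><td class="topstike">GGGGCCACTAGGGACAGGAT</td><td class="topstike">high</td><td class="topstike">13</td><td class="topstike">[2]</td></tr>
<tr><td class="topstike">GTCCCCTCCACCCCACAGTG</td><td class="topstike">high</td><td class="topstike">14</td><td class="topstike">[2]</td></tr>
<tr><td class="topstike">GGGCACGGGCAGCTTGCCGG</td><td class="topstike">high</td><td class="topstike">16</td><td class="topstike">[8]</td></tr>
<tr><td class="topstike"></td><td class="topstike"></td><td class="topstike">Ave. G/C 12.4</td><td class="topstike"></td></tr>
</table></div>
<center><em><b>Table 2. Relation between G/C content and single-mismatch tolerance</b></em></br></center></br>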
The mismatch tolerance is roughly determined from the data in the references; for details (PDF version), please <a href="https://static.igem.org/mediawiki/2013/2/22/WHUTargeting_analysis.pdf">click here</a>. A low-tolerance sequence shows a significant performance drop when a single mismatch is introduced at any of at least 7 positions. A high-tolerance sequence with a single mismatch at any of more than 16 positions performs as well as the original sequence in guiding Cas9.</br></br>
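The G/C column and the two class averages of Table 2 can be reproduced with a few lines of Python. This is only a sketch of the bookkeeping, and the helper names are hypothetical. The averages are computed from the G/C values as listed in the table.

```python
from statistics import mean


def gc_count(seq):
    """Number of G or C bases in a target or guide sequence."""
    return sum(1 for base in seq.upper() if base in "GC")


# (single-mismatch tolerance, G/C count) pairs as listed in Table 2
TABLE_2_GC = [
    ("low", 7), ("low", 8), ("low", 9), ("low", 10),
    ("low", 10), ("low", 11), ("low", 15),
    ("high", 9), ("high", 10), ("high", 13), ("high", 14), ("high", 16),
]


def class_average(rows, tolerance):
    """Mean G/C count of all sequences in one tolerance class."""
    return mean(gc for tol, gc in rows if tol == tolerance)


if __name__ == "__main__":
    print(gc_count("TCATGCTGTTTCATATGATC"))
    print(class_average(TABLE_2_GC, "low"), class_average(TABLE_2_GC, "high"))
```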
<div style="text-align:center">
<img src="https://static.igem.org/mediawiki/2013/e/ea/WHU2013Refineca5.png" /></div>
<center><em><b>Figure 2. Model prediction compared with data from Fig. 2B of [4]</b></em></br></center></br>
<div style="text-align:center">
<img src="https://static.igem.org/mediawiki/2013/3/34/WHU2013Refineca6.png" /></div>
<center><em><b>Figure 3. Model prediction compared with data from Fig. 5CB of [3]</b></em></br></center></br>
These data come from 5’-end truncation or consecutive-mutation experiments on gRNA. In both figures, as the column number increases, the end truncation or mutation becomes more extensive and the total DNA-gRNA binding energy drops. The model's prediction is nearly linear, but the data are strongly non-linear. Clear plateaus form in columns 4-8 of Fig. 2 and columns 3-9 of Fig. 3, which suggests that the gRNA-Cas9 complex is insensitive to the energy loss caused by consecutive mismatches or truncation at these stages.</br></br>
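The near-linear prediction follows from the assumed linear form of △G’ (assumption 5). In the Python sketch below, truncating or mutating the first n positions from the 5’ end is modeled, purely as an assumption, by setting those entries of a to zero. Each extra position then shifts △G’ by a fixed amount ω(i)·△G(i), so when the values are similar at every position the curve is close to a straight line. The function names and inputs are hypothetical.

```python
def truncated_dg_prime(a, w, b, n_removed):
    """dG' = w . a + b after dropping the n_removed positions nearest the 5' end.

    Dropped positions are modeled as contributing zero energy (an assumption);
    index 0 of a and w is taken to be the 5'-most position.
    """
    kept = [0.0] * n_removed + list(a[n_removed:])
    return sum(wi * ai for wi, ai in zip(w, kept)) + b


def truncation_series(a, w, b, max_removed):
    """dG' for 0, 1, ..., max_removed truncated positions."""
    return [truncated_dg_prime(a, w, b, n) for n in range(max_removed + 1)]


if __name__ == "__main__":
    a = [-1.5] * 21  # illustrative, not measured values
    w = [1.0] * 21
    series = truncation_series(a, w, b=3.0, max_removed=5)
    steps = [later - earlier for earlier, later in zip(series, series[1:])]
    print(series)
    print(steps)  # equal steps: the prediction is a straight line
```

Within this linear form, a plateau in the data could come, for example, from weights near zero at the affected positions or from a saturating F(). These data alone do not decide between the two.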
In the last part of the model, kinetic analysis reveals that both concentration and reaction time are important for off-target control.</br>
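The symbol table describes cleavage as two apparent first-order steps, from A (intact) to B (one strand cut) to C (both strands cut), with constants ka and kb. The Python sketch below implements that sequential scheme using the standard closed-form solution for consecutive first-order reactions. As an illustration only, it assumes that concentration enters through an apparent rate of the saturating form kcat[E]/(KM + [E]). The function names and that rate expression are assumptions, not taken from the team's derivation.

```python
import math


def apparent_rate(kcat, km, enzyme_conc):
    """Assumed saturating dependence of an apparent first-order constant on [E]."""
    return kcat * enzyme_conc / (km + enzyme_conc)


def cleavage_time_course(t, a0, ka, kb):
    """Return ([A], [B], [C]) at time t for A --ka--> B --kb--> C.

    A: intact duplex, B: one strand cleaved, C: both strands cleaved.
    The closed form below requires ka != kb.
    """
    if math.isclose(ka, kb):
        raise ValueError("closed form needs ka != kb")
    a = a0 * math.exp(-ka * t)
    b = a0 * ka / (kb - ka) * (math.exp(-ka * t) - math.exp(-kb * t))
    c = a0 - a - b  # mass balance
    return a, b, c


if __name__ == "__main__":
    # Illustrative parameters only: raising [E] or waiting longer both raise [C].
    for enzyme in (1.0, 10.0):
        ka = apparent_rate(kcat=1.0, km=5.0, enzyme_conc=enzyme)
        kb = 2.0 * ka
        for t in (1.0, 5.0):
            print(enzyme, t, round(cleavage_time_course(t, 1.0, ka, kb)[2], 3))
```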
<div style="text-align:center">
<img src="https://static.igem.org/mediawiki/2013/f/fc/WHU2013concentration.png" width=600px /></div>
<em>
<center><b>Figure 4. Theoretical curves from the Cas9 cleaving reaction</b></br>
Latest revision as of 01:27, 29 October 2013
1. Overview
For a PDF version of the tandem promoter modeling part, click here.
This model aims to predict the off-target rate of any Cas9-based system in vivo. It is built on the following key ideas.
2. Symbol table, assumptions and reasons
Symbol | Meaning |
[ ] | The symbol of concentration, i.e. [A] means the concentration of A |
△G’ | Modified Gibbs free energy difference, assumed to determine the binding constant between gRNA-Cas9 and DNA |
△G(i) | The calculated △G for the ith position of the gRNA-DNA interaction |
a | The input of △G’, a vector consisting of △G(1) to △G(21) |
b | The constant representing all interactions in the binding process other than the gRNA-DNA interaction |
ω | The weight vector |
F() | Relation function |
Ka | Association constant of gRNA-Cas9 and DNA |
Kd | Dissociation constant of gRNA-Cas9 and DNA |
[A]0 | The concentration of a certain sequence in the pre-selection library |
[A]tot | The concentration of all DNA sequences in the pre-selection library |
[C] | The concentration of a certain sequence in the post-selection library |
[C]tot | The concentration of all DNA sequences in the post-selection library |
A’ | The number of copies of a certain sequence sampled from the pre-selection library |
Atot’ | The total number of sequences sampled from the pre-selection library |
P’ | The number of copies of a certain sequence sampled from the post-selection library |
Ptot’ | The total number of sequences sampled from the post-selection library |
θ | Cas9 targeting efficiency |
S | Substrate, DNA |
E | Enzyme, gRNA-Cas9 |
P | Product, double-strand-broken DNA |
A | The intact DNA duplex |
B | the DNA molecule in which one of the two strands has been cleaved at the recognition site for the restriction enzyme |
C | the DNA molecules in which both strands have been cleaved at the recognition site |
ka,kb | The two apparent first-order rate constants of the two cleavage steps of Cas9 |
k1,k-1,kcat | Rate constants |
KM | Michaelis-Menten constant |
R | Gas constant |
T | Absolute temperature |
pb | Binding probability |
pc | Cutting probability |
Abbreviation | |
dCas9 | Deactivated Cas9, a Cas9 inhibitor carrying the two mutations D10A and H841A |
aCas9 | Cas9 activator, a dCas9 fused with an activator domain such as VP64, TAL or the omega subunit of RNAP |
d/aCas9 | Deactivated Cas9, whether used as an activator or an inhibitor |
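The library symbols above suggest how enrichment through selection can be estimated from sampling: A’/Atot’ estimates [A]0/[A]tot, and P’/Ptot’ estimates [C]/[C]tot. The Python sketch below computes that ratio of fractions. It is one plausible reading of the symbols, since the derivation that uses them is not part of this text, and the function names are hypothetical.

```python
def sampled_fraction(count, total):
    """Fraction of one sequence in a sample, e.g. A'/Atot' or P'/Ptot'."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= count <= total:
        raise ValueError("count must lie between 0 and total")
    return count / total


def enrichment(p_count, p_total, a_count, a_total):
    """Estimate ([C]/[C]tot) / ([A]0/[A]tot) from sampled counts."""
    pre = sampled_fraction(a_count, a_total)
    if pre == 0:
        raise ValueError("sequence not observed in the pre-selection sample")
    return sampled_fraction(p_count, p_total) / pre


if __name__ == "__main__":
    # Illustrative counts only.
    print(enrichment(p_count=50, p_total=1000, a_count=25, a_total=1000))
```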
3. Modeling result
We employ a nearest-neighbor (NN) model to calculate △G(i) between gRNA and DNA at each NN position. From the first nucleotide of the gRNA target region to the 20th, △G(i) is calculated for 21 positions in total. We first tested the feasibility of this idea by calculating the correlation between △G(i) and cutting efficiency (using the CLTA1, CLTA2 and CLTA3 data from [1]).
Sequence | Single-mismatch tolerance | G/C | Ref |
TCATGCTGTTTCATATGATC | low | 7 | [4] |
AACTTTCAGTTTAGCGGUCU | low | 8 | [3] |
TGTGAAGAGCTTCACTGAGT | low | 9 | [1] |
GATGCCGTTCTTCTGCTTGT | low | 10 | [8] |
AGTCCTCATCTCCCTCAAGC | low | 10 | [1] |
GAGATGATCGCCCCTTCTTC | low | 11 | [2] |
CTCCCTCAAGCAGGCCCCGC | low | 15 | [1] |
Average G/C (low tolerance) | | 10 | |
GCAGATGTAGTGTTTCCACA | high | 9 | [1] |
GGTGGTGCAGATGAACTTCA | high | 10 | [8] |
GGGGCCACTAGGGACAGGAT | high | 13 | [2] |
GTCCCCTCCACCCCACAGTG | high | 14 | [2] |
GGGCACGGGCAGCTTGCCGG | high | 16 | [8] |
Average G/C (high tolerance) | | 12.4 | |
4. Model derivation
4.1. Calculation of △G’ of DNA-gRNA binding
The calculation method for △G(i) and △G’ is modified from the nearest-neighbor (NN) model introduced in [2].
5. Discussion
There may be three reasons for the variation in correlation across outputs 1-19.
First, Cas9 tolerates mismatches at the 5’ end of the gRNA. This is supported by all of the cited studies [1,2,3,4,7,8].
Second, the calculation of terminal energy is flawed. All terminal mismatches in RNA, and most in DNA, are stabilizing [5,6], and the NN model may fail to capture all of these stabilizing effects. Improving the energy calculation rules may therefore help to fix the negative correlation.
Third, the NN model is derived from binding-energy data for free DNA duplexes, whereas we use it to calculate Cas9-influenced RNA-DNA binding. We consider this the prime source of error in our model, and it may contribute to the unexpected correlation valley at △G(8) to △G(12). Alternatively, the valley may mean that Cas9 is indeed relatively insensitive to energy changes at these positions.