Team:WHU-China/templates/standardpage modelingCas9
From 2013.igem.org
(Difference between revisions)
Falseplanet (Talk | contribs) |
IgnatzZeng (Talk | contribs) |
||
Line 14: | Line 14: | ||
</br></br> | </br></br> | ||
- | This model aims at predicting the off-target rate of any Cas9-based system in vivo. | + | This model aims at predicting the off-target rate of any Cas9-based system in vivo. It has the following key ideas. </br></br> |
<div style="text-align:center"> | <div style="text-align:center"> | ||
<img src="https://static.igem.org/mediawiki/2013/2/25/WHUCas9total.png" /></br></div> | <img src="https://static.igem.org/mediawiki/2013/2/25/WHUCas9total.png" /></br></div> | ||
Line 32: | Line 32: | ||
2. Symbol table, Assumption and reasons.</b></h1></br> | 2. Symbol table, Assumption and reasons.</b></h1></br> | ||
<div style="text-align:center;width:100%;"> | <div style="text-align:center;width:100%;"> | ||
- | + | <table> | |
- | < | + | <tr><td class="topstike">Symbol</td><td class="topstike"></td></tr> |
- | < | + | <tr><td class="topstike">[ ]</td><td class="topstike">The symbol of concentration, i.e. [A] means the concentration of A</td></tr> |
- | < | + | <tr><td class="topstike">△G’</td><td class="topstike">Difference in modified Gibbs free energy. It is assumed to determine the binding constant between gRNA-Cas9 and DNA.</td></tr> |
- | </ | + | <tr><td class="topstike">△G(i)</td><td class="topstike">The calculated △G for the ith position of gRNA-DNA interaction</td></tr> |
+ | <tr><td class="topstike">a</td><td class="topstike">The input of △G’: a vector consisting of △G(1) to △G(21)</td></tr> | ||
+ | <tr><td class="topstike">b</td><td class="topstike">The constant representing all interaction in the binding process other than the gRNA-DNA interaction.</td></tr> | ||
+ | <tr><td class="topstike">ω</td><td class="topstike">The weight vector</td></tr> | ||
+ | <tr><td class="topstike">F()</td><td class="topstike">Relation function</td></tr> | ||
+ | <tr><td class="topstike">Ka</td><td class="topstike">Association constant of gRNA-Cas9 and DNA</td></tr> | ||
+ | <tr><td class="topstike">Kd</td><td class="topstike">Dissociation constant of gRNA-Cas9 and DNA</td></tr> | ||
+ | <tr><td class="topstike">[A]0</td><td class="topstike">the concentration of certain sequence in the pre-selection library</td></tr> | ||
+ | <tr><td class="topstike">[A]tot</td><td class="topstike">the concentration of all DNA sequence in the pre-selection library</td></tr> | ||
+ | <tr><td class="topstike">[C]</td><td class="topstike">the concentration of certain sequence in the post-selection library</td></tr> | ||
+ | <tr><td class="topstike">[C]tot</td><td class="topstike">the concentration of all DNA sequence in the post-selection library</td></tr> | ||
+ | <tr><td class="topstike">A’</td><td class="topstike">the number of certain sequence we sampled from the pre-selection library</td></tr> | ||
+ | <tr><td class="topstike">Atot’</td><td class="topstike">the number of all sequence we sampled from the pre-selection library</td></tr> | ||
+ | <tr><td class="topstike">P’</td><td class="topstike">the number of certain sequence we sampled from the post-selection library</td></tr> | ||
+ | <tr><td class="topstike">Ptot’</td><td class="topstike">the number of all sequence we sampled from the post-selection library</td></tr> | ||
+ | <tr><td class="topstike">θ</td><td class="topstike">Cas9 targeting efficiency</td></tr> | ||
+ | <tr><td class="topstike">S</td><td class="topstike">Substrate, DNA</td></tr> | ||
+ | <tr><td class="topstike">E</td><td class="topstike">Enzyme, gRNA-Cas9</td></tr> | ||
+ | <tr><td class="topstike">P</td><td class="topstike">Product, DNA with a double-strand break</td></tr> | ||
+ | <tr><td class="topstike">A</td><td class="topstike">The intact DNA duplex</td></tr> | ||
+ | <tr><td class="topstike">B</td><td class="topstike">the DNA molecule in which one of the two strands has been cleaved at the recognition site for the restriction enzyme</td></tr> | ||
+ | <tr><td class="topstike">C</td><td class="topstike">the DNA molecules in which both strands have been cleaved at the recognition site</td></tr> | ||
+ | <tr><td class="topstike">ka,kb</td><td class="topstike">The apparent first-order rate constants of the two cleavage steps of Cas9</td></tr> | ||
+ | <tr><td class="topstike">k1,k-1,kcat</td><td class="topstike">Reaction rate constants</td></tr> | ||
+ | <tr><td class="topstike">KM</td><td class="topstike">Michaelis-Menten constant</td></tr> | ||
+ | <tr><td class="topstike">R</td><td class="topstike">Gas constant</td></tr> | ||
+ | <tr><td class="topstike">T</td><td class="topstike">Absolute temperature</td></tr> | ||
+ | <tr><td class="topstike">pb</td><td class="topstike">Binding probability</td></tr> | ||
+ | <tr><td class="topstike">pc</td><td class="topstike">Cutting probability</td></tr> | ||
+ | <tr><td class="topstike">Abbreviation</td><td class="topstike"></td></tr> | ||
+ | <tr><td class="topstike">dCas9</td><td class="topstike">Deactivated Cas9, Cas9 inhibitor, a Cas9 with two mutations, D10A and H840A</td></tr> | ||
+ | <tr><td class="topstike">aCas9</td><td class="topstike">Cas9 activator, a dCas9 fused with an activator domain such as VP64, TAL or the omega subunit of RNAP.</td></tr> | ||
+ | <tr><td class="topstike">d/aCas9</td><td class="topstike">Deactivated Cas9, no matter whether it’s an activator or inhibitor</td></tr> | ||
+ | </table> | ||
<center><em>Table 1. Symbol table of Cas9Off Model</em></br></center> | <center><em>Table 1. Symbol table of Cas9Off Model</em></br></center> | ||
Line 43: | Line 76: | ||
3. The model is based on the nearest-neighbor (NN) model of base-pairing energy [5]. The NN model was built to calculate the thermodynamic energy of DNA strand interactions, but we employ it to model the gRNA-DNA interaction, which introduces some inherent flaws. The most prominent one arises when the RNA side is a U and the DNA side is a G: the NN model treats this as a T-G pair, which is not as energetically favorable as U-G [6]. However, no model for calculating DNA-RNA interaction energy is available yet, so we assume that the energy (△G(i)) calculated from the NN model is to some degree consistent with reality. In fact, [7] suggested a rough ordering of base-mismatch tolerance: CC&lt;UC&lt;AG&lt;AA&lt;GA&lt;CA&lt;UG&lt;CT&lt;GG&lt;UT&lt;AC&lt;GT, while the model suggests CC&lt;AC&lt;TC&lt;AA&lt;TT&lt;GA&lt;GT&lt;GG. </br> | 3. The model is based on the nearest-neighbor (NN) model of base-pairing energy [5]. The NN model was built to calculate the thermodynamic energy of DNA strand interactions, but we employ it to model the gRNA-DNA interaction, which introduces some inherent flaws. The most prominent one arises when the RNA side is a U and the DNA side is a G: the NN model treats this as a T-G pair, which is not as energetically favorable as U-G [6]. However, no model for calculating DNA-RNA interaction energy is available yet, so we assume that the energy (△G(i)) calculated from the NN model is to some degree consistent with reality. In fact, [7] suggested a rough ordering of base-mismatch tolerance: CC&lt;UC&lt;AG&lt;AA&lt;GA&lt;CA&lt;UG&lt;CT&lt;GG&lt;UT&lt;AC&lt;GT, while the model suggests CC&lt;AC&lt;TC&lt;AA&lt;TT&lt;GA&lt;GT&lt;GG. </br> | ||
4. We believe that employing a better energy-prediction model would improve the whole Cas9 off-target model. </br> | 4. We believe that employing a better energy-prediction model would improve the whole Cas9 off-target model. </br> | ||
- | 5. We assume △G’ takes up a form of <img src="https://static.igem.org/mediawiki/2013/d/dc/WHUVectorW.png"/> . Where “a” is an 1×21 vector that contain △G(1) to △G(21) as its value, ω is the weight vector. Only the impact of DNA-gRNA interaction (“a”) is counting as a variable, and the △G contributed by other interaction(eg. protein-DNA interaction) are considered as a constant b. This is also why this model cannot predict Cas9 off-target rate of a target without PAM(NGG), which interact with Cas9 rather than gRNA. F() is the function that relate | + | 5. We assume △G’ takes the form <img src="https://static.igem.org/mediawiki/2013/d/dc/WHUVectorW.png"/>, where “a” is a 1×21 vector containing △G(1) to △G(21) and ω is the weight vector. Only the DNA-gRNA interaction (“a”) is treated as a variable; the △G contributed by other interactions (e.g. protein-DNA interaction) is treated as a constant b. This is also why the model cannot predict the Cas9 off-target rate for a target without a PAM (NGG), which interacts with Cas9 rather than with the gRNA. F() is the function that relates <img src="https://static.igem.org/mediawiki/2013/b/bf/Refineca1.png" /> to △G’. </br> |
6. Both cleavage steps of Cas9 are assumed to be classic Michaelis-Menten enzyme reactions. </br> | 6. Both cleavage steps of Cas9 are assumed to be classic Michaelis-Menten enzyme reactions. </br> | ||
- | 7. The dCas9, aCas9 and normal Cas9 are assumed to share a same Ka with DNA, given they are guided by the same gRNA. This is reasonable as the only changes in | + | 7. dCas9, aCas9 and normal Cas9 are assumed to share the same Ka with DNA, given that they are guided by the same gRNA. This is reasonable because the only changes in Cas9 are D10A and H840A.</br></br></br> |
Line 52: | Line 85: | ||
3. Modeling result</b></h1></br> | 3. Modeling result</b></h1></br> | ||
</br> | </br> | ||
- | We employ a NN nearest neighbor model to calculate the △G(i) between gRNA and DNA on each NN position. From the first nucleotide of the target area of gRNA to the 20th, △G(i) of totally 21 position are calculated. We first proved the feasibility of our idea by calculating the correlation between △G(i) and cutting efficiency (employing data from [1]). </br> | + | We employ the nearest-neighbor (NN) model to calculate △G(i) between the gRNA and the DNA at each NN position. From the first nucleotide of the gRNA target region to the 20th, △G(i) is calculated for 21 positions in total. We first tested the feasibility of our idea by calculating the correlation between △G(i) and cutting efficiency (using the data for CLTA1, 2 and 3 from [1]). </br> |
<div style="text-align:center"> | <div style="text-align:center"> | ||
<img src="https://static.igem.org/mediawiki/2013/f/f6/%E6%96%B0%E5%9B%BE.jpg" width=700px height=300px /></br> | <img src="https://static.igem.org/mediawiki/2013/f/f6/%E6%96%B0%E5%9B%BE.jpg" width=700px height=300px /></br> | ||
Line 59: | Line 92: | ||
The result shows that, roughly, the closer a position is to the PAM, the larger the correlation. This finding is consistent with previous studies [1,2,3,4,7,8]. We therefore conclude that △G(i) does influence the targeting efficiency of Cas9.</br></br> | The result shows that, roughly, the closer a position is to the PAM, the larger the correlation. This finding is consistent with previous studies [1,2,3,4,7,8]. We therefore conclude that △G(i) does influence the targeting efficiency of Cas9.</br></br> | ||
- | + | However, the data from [1,2,3,4,8] also reveal that △G(i) is not proportional to targeting efficiency. In most cases of high single-mismatch tolerance, the correlation between △G(i) and targeting efficiency is not significant. These observations are summarized in the following table.</br></br> | |
- | + | <table> | |
- | + | <tr><td class="topstike">Sequence</td><td class="topstike">Single mismatch tolerance</td><td class="topstike">G/C</td><td class="topstike">Ref</td></tr> |
- | + | <tr><td class="topstike">TCATGCTGTTTCATATGATC</td><td class="topstike">low</td><td class="topstike">7</td><td class="topstike">[4]</td></tr> | |
- | + | <tr><td class="topstike">AACTTTCAGTTTAGCGGUCU</td><td class="topstike">low</td><td class="topstike">8</td><td class="topstike">[3]</td></tr> | |
- | + | <tr><td class="topstike">TGTGAAGAGCTTCACTGAGT</td><td class="topstike">low</td><td class="topstike">9</td><td class="topstike">[1]</td></tr> | |
- | + | <tr><td class="topstike">GATGCCGTTCTTCTGCTTGT</td><td class="topstike">low</td><td class="topstike">10</td><td class="topstike">[8]</td></tr> | |
- | + | <tr><td class="topstike">AGTCCTCATCTCCCTCAAGC</td><td class="topstike">low</td><td class="topstike">10</td><td class="topstike">[1]</td></tr> | |
- | + | <tr><td class="topstike">GAGATGATCGCCCCTTCTTC</td><td class="topstike">low</td><td class="topstike">11</td><td class="topstike">[2]</td></tr> | |
- | + | <tr><td class="topstike">CTCCCTCAAGCAGGCCCCGC</td><td class="topstike">low</td><td class="topstike">15</td><td class="topstike">[1]</td></tr> | |
- | < | + | <tr><td class="topstike"></td><td class="topstike"></td><td class="topstike">Ave. G/C 10</td><td class="topstike"></td></tr> |
- | < | + | <tr><td class="topstike">GCAGATGTAGTGTTTCCACA</td><td class="topstike">high</td><td class="topstike">9</td><td class="topstike">[1]</td></tr> |
+ | <tr><td class="topstike">GGTGGTGCAGATGAACTTCA</td><td class="topstike">high</td><td class="topstike">10</td><td class="topstike">[8]</td></tr> | ||
+ | <tr><td class="topstike">GGGGCCACTAGGGACAGGAT</td><td class="topstike">high</td><td class="topstike">13</td><td class="topstike">[2]</td></tr> | ||
+ | <tr><td class="topstike">GTCCCCTCCACCCCACAGTG</td><td class="topstike">high</td><td class="topstike">14</td><td class="topstike">[2]</td></tr> | ||
+ | <tr><td class="topstike">GGGCACGGGCAGCTTGCCGG</td><td class="topstike">high</td><td class="topstike">16</td><td class="topstike">[8]</td></tr> | ||
+ | <tr><td class="topstike"></td><td class="topstike"></td><td class="topstike">Ave. G/C 12.4</td><td class="topstike"></td></tr> | ||
+ | </table> | ||
<center><em>Table 2. Relation between G/C frequency and single-mismatch tolerance</em></br></center></br> | <center><em>Table 2. Relation between G/C frequency and single-mismatch tolerance</em></br></center></br> | ||
- | The mismatch tolerance is roughly determined from the data of the references, for details(pdf version) please <a href="https://static.igem.org/mediawiki/2013/2/22/WHUTargeting_analysis.pdf">click here</a>. A low tolerance sequence with single mismatch on at least 7 positions has significant performance drop. A high tolerance sequence with single mismatch at more than 16 position can perform as well as the original sequence in guiding Cas9 | + | The mismatch tolerance is roughly determined from the data in the references; for details (pdf version) please <a href="https://static.igem.org/mediawiki/2013/2/22/WHUTargeting_analysis.pdf">click here</a>. A low-tolerance sequence shows a significant performance drop when a single mismatch is placed at any of at least 7 positions. A high-tolerance sequence guides Cas9 as well as the original sequence when a single mismatch is placed at any of more than 16 positions.</br></br> |
The relationship between G/C frequency and single-mismatch tolerance can be explained by the fact that abundant G/C makes the gRNA-DNA binding more stable, so a single mismatch is not strong enough to disturb the binding. This suggests that F() may be a sigmoid function. However, determining this sigmoid function (and determining b) requires more specific experimental data on d/aCas9 binding kinetics, which are not available. </br></br> | The relationship between G/C frequency and single-mismatch tolerance can be explained by the fact that abundant G/C makes the gRNA-DNA binding more stable, so a single mismatch is not strong enough to disturb the binding. This suggests that F() may be a sigmoid function. However, determining this sigmoid function (and determining b) requires more specific experimental data on d/aCas9 binding kinetics, which are not available. </br></br> |
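The sigmoid argument can be illustrated with a small numerical sketch. Everything here is hypothetical: the logistic form of F(), the midpoint, the scale, the mismatch penalty and the two example △G’ values are made-up numbers chosen only to show the shape of the effect, not fitted parameters of the model.

```python
import math


def sigmoid_response(delta_g, midpoint=-20.0, scale=2.0):
    """Hypothetical F(): a logistic function of the modified free energy.

    More negative delta_g (stronger binding) gives a value closer to 1.
    midpoint and scale are illustrative, not fitted to any data.
    """
    return 1.0 / (1.0 + math.exp((delta_g - midpoint) / scale))


def relative_drop(delta_g, penalty):
    """Fractional loss of response when a mismatch adds `penalty` to delta_g."""
    before = sigmoid_response(delta_g)
    after = sigmoid_response(delta_g + penalty)
    return (before - after) / before


# The same assumed single-mismatch penalty applied to two assumed guides.
mismatch_penalty = 3.0
gc_rich_drop = relative_drop(-32.0, mismatch_penalty)  # stable, G/C-rich guide
gc_poor_drop = relative_drop(-22.0, mismatch_penalty)  # less stable guide

print(f"G/C-rich guide loses {gc_rich_drop:.1%} of its response")
print(f"G/C-poor guide loses {gc_poor_drop:.1%} of its response")
```

Under these assumptions the stable guide sits on the saturated plateau of the sigmoid, so the same penalty costs it far less than it costs the guide near the midpoint, which is the qualitative behaviour described for high-tolerance sequences.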
Revision as of 12:49, 28 October 2013
1. Overview
For a pdf version of the tandem promoter modeling part, click here
This model aims at predicting the off-target rate of any Cas9-based system in vivo. It has the following key ideas.
2. Symbol table, Assumption and reasons.
Symbol | |
[ ] | The symbol of concentration, i.e. [A] means the concentration of A |
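△G’ | Difference in modified Gibbs free energy. It is assumed to determine the binding constant between gRNA-Cas9 and DNA. |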
△G(i) | The calculated △G for the ith position of gRNA-DNA interaction |
a | The input of △G’: a vector consisting of △G(1) to △G(21) |
b | The constant representing all interaction in the binding process other than the gRNA-DNA interaction. |
ω | The weight vector |
F() | Relation function |
Ka | Association constant of gRNA-Cas9 and DNA |
Kd | Dissociation constant of gRNA-Cas9 and DNA |
[A]0 | the concentration of certain sequence in the pre-selection library |
[A]tot | the concentration of all DNA sequence in the pre-selection library |
[C] | the concentration of certain sequence in the post-selection library |
[C]tot | the concentration of all DNA sequence in the post-selection library |
A’ | the number of certain sequence we sampled from the pre-selection library |
Atot’ | the number of all sequence we sampled from the pre-selection library |
P’ | the number of certain sequence we sampled from the post-selection library |
Ptot’ | the number of all sequence we sampled from the post-selection library |
θ | Cas9 targeting efficiency |
S | Substrate, DNA |
E | Enzyme, gRNA-Cas9 |
P | Product, DNA with a double-strand break |
A | The intact DNA duplex |
B | the DNA molecule in which one of the two strands has been cleaved at the recognition site for the restriction enzyme |
C | the DNA molecules in which both strands have been cleaved at the recognition site |
ka,kb | The apparent first-order rate constants of the two cleavage steps of Cas9 |
k1,k-1,kcat | Reaction rate constants |
KM | Michaelis-Menten constant |
R | Gas constant |
T | Absolute temperature |
pb | Binding probability |
pc | Cutting probability |
Abbreviation | |
dCas9 | Deactivated Cas9, Cas9 inhibitor, a Cas9 with two mutations, D10A and H840A |
aCas9 | Cas9 activator, a dCas9 fused with an activator domain such as VP64, TAL or the omega subunit of RNAP. |
d/aCas9 | Deactivated Cas9, no matter whether it’s an activator or inhibitor |
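To show how the symbols a, ω, b, R, T, Ka and Kd fit together, here is a minimal sketch. It assumes that △G’ is the weighted sum ω·a + b (the page gives the exact form only as an image) and that △G’ maps to Ka through the standard relation △G = −RT ln Ka. The per-position energies, weights and constant b are placeholder values, and the function names are made up for this illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def modified_delta_g(a, omega, b):
    """Assumed form of dG': weighted sum of per-position dG(i) plus constant b."""
    if len(a) != len(omega):
        raise ValueError("a and omega must have the same length")
    return sum(w * g for w, g in zip(omega, a)) + b


def association_constant(delta_g, temperature=310.15):
    """Ka from the standard relation dG = -R*T*ln(Ka); dG in kcal/mol, T in K."""
    return math.exp(-delta_g / (R_KCAL * temperature))


# Placeholder inputs: 21 positions, uniform weights, arbitrary constant b.
a = [-1.5] * 21      # hypothetical dG(1)..dG(21) in kcal/mol
omega = [1.0] * 21   # hypothetical weight vector
b = 20.0             # hypothetical contribution of all other interactions

dg = modified_delta_g(a, omega, b)
ka = association_constant(dg)
kd = 1.0 / ka        # dissociation constant is the reciprocal of Ka

print(f"dG' = {dg:.1f} kcal/mol, Ka = {ka:.3e}, Kd = {kd:.3e}")
```

Raising the weights for positions near the PAM would make mismatches there change △G’, and hence Ka, more strongly, which is the role the weight vector plays in the model.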
3. Modeling result
We employ the nearest-neighbor (NN) model to calculate △G(i) between the gRNA and the DNA at each NN position. From the first nucleotide of the gRNA target region to the 20th, △G(i) is calculated for 21 positions in total. We first tested the feasibility of our idea by calculating the correlation between △G(i) and cutting efficiency (using the data for CLTA1, 2 and 3 from [1]).
Sequence | Single mismatch tolerance | G/C | Ref |
TCATGCTGTTTCATATGATC | low | 7 | [4] |
AACTTTCAGTTTAGCGGUCU | low | 8 | [3] |
TGTGAAGAGCTTCACTGAGT | low | 9 | [1] |
GATGCCGTTCTTCTGCTTGT | low | 10 | [8] |
AGTCCTCATCTCCCTCAAGC | low | 10 | [1] |
GAGATGATCGCCCCTTCTTC | low | 11 | [2] |
CTCCCTCAAGCAGGCCCCGC | low | 15 | [1] |
Ave. G/C 10 | |||
GCAGATGTAGTGTTTCCACA | high | 9 | [1] |
GGTGGTGCAGATGAACTTCA | high | 10 | [8] |
GGGGCCACTAGGGACAGGAT | high | 13 | [2] |
GTCCCCTCCACCCCACAGTG | high | 14 | [2] |
GGGCACGGGCAGCTTGCCGG | high | 16 | [8] |
Ave. G/C 12.4 |
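The G/C column can be recomputed directly from the sequences. The sketch below does this for the five high-tolerance sequences listed above; the function name `gc_count` is made up for this illustration, and the same function can be applied to the low-tolerance rows.

```python
def gc_count(sequence):
    """Count G and C bases in a guide sequence (case-insensitive)."""
    return sum(1 for base in sequence.upper() if base in "GC")


# High-tolerance sequences copied from the table above.
high_tolerance = [
    "GCAGATGTAGTGTTTCCACA",
    "GGTGGTGCAGATGAACTTCA",
    "GGGGCCACTAGGGACAGGAT",
    "GTCCCCTCCACCCCACAGTG",
    "GGGCACGGGCAGCTTGCCGG",
]

counts = [gc_count(seq) for seq in high_tolerance]
mean_gc = sum(counts) / len(counts)

for seq, n in zip(high_tolerance, counts):
    print(f"{seq}  G/C = {n}")
print(f"Average G/C = {mean_gc:.1f}")
```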