Team:WHU-China/templates/standardpage modelingCas9
From 2013.igem.org
(Difference between revisions)
IgnatzZeng (Talk | contribs) |
IgnatzZeng (Talk | contribs) |
||
Line 90: | Line 90: | ||
<img src="https://static.igem.org/mediawiki/2013/f/f6/%E6%96%B0%E5%9B%BE.jpg" width=700px height=300px /></br> | <img src="https://static.igem.org/mediawiki/2013/f/f6/%E6%96%B0%E5%9B%BE.jpg" width=700px height=300px /></br> | ||
</div> | </div> | ||
- | <center><em>Figure 1. Correlation map between △G(i) and Cas9 cutting efficiency</em></center></br> | + | <center><em><b>Figure 1. Correlation map between △G(i) and Cas9 cutting efficiency</b></em></center></br> |
The result shows that, roughly, the closer a position is to the PAM, the larger the correlation. This finding is consistent with previous studies [1,2,3,4,7,8]. We therefore conclude that △G(i) does influence the targeting efficiency of Cas9.</br></br> | The result shows that, roughly, the closer a position is to the PAM, the larger the correlation. This finding is consistent with previous studies [1,2,3,4,7,8]. We therefore conclude that △G(i) does influence the targeting efficiency of Cas9.</br></br> | ||
Line 150: | Line 150: | ||
<img src="https://static.igem.org/mediawiki/2013/0/04/WHUCas9gRNA.png" /> | <img src="https://static.igem.org/mediawiki/2013/0/04/WHUCas9gRNA.png" /> | ||
</div> | </div> | ||
- | <center><em>Figure 5. schematic picture of Cas9 digestion, modified from [1]</em></center></br> | + | <center><em><b>Figure 5. schematic picture of Cas9 digestion, modified from [1]</b></em></center></br> |
<b>Step1.</b> Set up the binding sequence</br> | <b>Step1.</b> Set up the binding sequence</br> | ||
The input is the 21 nt of the target immediately preceding the GG of the PAM, and the corresponding 21 nt of the potential off-target sequence. We need a 21 nt sequence rather than a 20 nt one because the NN model takes adjacent pairs of nucleotides as input: to account fully for the 20 nt targeting sequence of the gRNA, the 21st base must also be included. We explain how the inputs are processed using an example. Mismatched base pairs are highlighted in red. </br></br> | The input is the 21 nt of the target immediately preceding the GG of the PAM, and the corresponding 21 nt of the potential off-target sequence. We need a 21 nt sequence rather than a 20 nt one because the NN model takes adjacent pairs of nucleotides as input: to account fully for the 20 nt targeting sequence of the gRNA, the 21st base must also be included. We explain how the inputs are processed using an example. Mismatched base pairs are highlighted in red. </br></br> | ||
Line 172: | Line 172: | ||
Therefore, in the example, we consider the left end as a dangling end, the right end as a normal end. </br></br> | Therefore, in the example, we consider the left end as a dangling end, the right end as a normal end. </br></br> | ||
- | If a dangling end is determined, determine the first match position following the mismatch position. In the example, this will be position 2 (△G(2)). Set all dangling end position energy as 0, i.e. △G(1)=0, and calculate the first match according to Table | + | If a dangling end is determined, determine the first match position following the mismatch position. In the example, this will be position 2 (△G(2)). Set all dangling end position energy as 0, i.e. △G(1)=0, and calculate the first match according to Table 3, i.e. △G(2)=5’TC/G+3’GG/C=-0.58-0.44=-1.02 kcal/mol, </br></br> |
<div style="text-align:center"> | <div style="text-align:center"> | ||
<img src="https://static.igem.org/mediawiki/2013/e/e4/WHUDanglingend.png" /></br></div> | <img src="https://static.igem.org/mediawiki/2013/e/e4/WHUDanglingend.png" /></br></div> | ||
- | <center><em>Table | + | <center><em><b>Table 3. Nearest-neighbor model for terminal dangling ends next to Watson-Crick pairs in 1 M NaCl, modified from Table 3 of [5]</b></em></center></br> |
If no dangling end is present, determine whether the terminal pair is A-T. If it is, add a terminal AT penalty (+0.05 kcal/mol) to △G(i), and calculate all △G(i) according to Table 3. </br></br> | If no dangling end is present, determine whether the terminal pair is A-T. If it is, add a terminal AT penalty (+0.05 kcal/mol) to △G(i), and calculate all △G(i) according to Table 3. </br></br> | ||
<b>Step3.</b> Internal energy calculation</br> | <b>Step3.</b> Internal energy calculation</br> | ||
- | Calculate all position except for first match and dangling end position according to Table | + | Calculate all position except for first match and dangling end position according to Table 4, in our example, this set contains △G(3) to △G(19). </br> |
The result will be </br> | The result will be </br> | ||
△G(3)=-2.17 kcal/mol</br> | △G(3)=-2.17 kcal/mol</br> | ||
Line 188: | Line 188: | ||
<div style="text-align:center"> | <div style="text-align:center"> | ||
<img src="https://static.igem.org/mediawiki/2013/3/33/WHUPropagation.png" /></br></div> | <img src="https://static.igem.org/mediawiki/2013/3/33/WHUPropagation.png" /></br></div> | ||
- | <center><em>Table | + | <center><em><b>Table 4.Nearest-neighbor model, modified from Table 2 of [5]</b></em></center></br> |
<b>Step4.</b> Further analysis of internal loops and bulges. </br> | <b>Step4.</b> Further analysis of internal loops and bulges. </br> | ||
We will complete this step in the future. For the model V1.0, the algorithm will skip this step. </br></br> | We will complete this step in the future. For the model V1.0, the algorithm will skip this step. </br></br> |
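The dangling-end branch of Step 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the team's implementation: the function names are hypothetical, a dangling end is assumed to mean one or more consecutive mismatches at the left terminus, the example sequences are made up, and the lookup table holds only the two values quoted in the text (the rest of Table 3 would have to be filled in from [5]).

```python
# Only the two dangling-end values quoted in the text are listed here (kcal/mol).
# The remaining entries of Table 3 are NOT reproduced and must come from [5].
DANGLING_END_DG = {
    "5'TC/G": -0.58,
    "3'GG/C": -0.44,
}


def count_terminal_mismatches(target, off_target):
    """Count consecutive mismatched positions at the left end.

    Assumption: these positions are the ones treated as a dangling end.
    """
    count = 0
    for a, b in zip(target, off_target):
        if a == b:
            break
        count += 1
    return count


def first_match_energy(dangling_terms, table=DANGLING_END_DG):
    """Sum the dangling-end contributions for the first matched position."""
    return round(sum(table[term] for term in dangling_terms), 2)


def terminal_energies(target, off_target, dangling_terms):
    """Return {position: dG} (1-based) for dangling positions and the first match."""
    n_dangling = count_terminal_mismatches(target, off_target)
    # Every dangling-end position gets an energy of 0.
    energies = {i: 0.0 for i in range(1, n_dangling + 1)}
    if n_dangling:
        # The first match after the dangling end uses the dangling-end table.
        energies[n_dangling + 1] = first_match_energy(dangling_terms)
    return energies


# Hypothetical 6 nt sequences with a single mismatch at position 1.
example = terminal_energies("ATCGGA", "TTCGGA", ["5'TC/G", "3'GG/C"])
print(example)
```

With the two quoted terms, position 1 is set to 0 and position 2 receives -0.58 - 0.44 = -1.02 kcal/mol, matching the worked value for △G(2). The terminal AT penalty for the no-dangling-end case is not covered by this sketch.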
Revision as of 13:44, 28 October 2013
1. Overview
This model aims at predicting the off-target rate of any Cas9-based system in vivo. It has the following key ideas.
2. Symbol table, assumptions and reasons
Symbol | Meaning |
[ ] | Concentration, i.e. [A] means the concentration of A |
△G’ | Difference in modified Gibbs free energy. It is assumed to determine the binding constant between gRNA-Cas9 and DNA |
△G(i) | The calculated △G for the ith position of the gRNA-DNA interaction |
a | The input of △G’, a vector consisting of △G’(1) to △G’(21) |
b | The constant representing all interactions in the binding process other than the gRNA-DNA interaction |
ω | The weight vector |
F() | Relation function |
Ka | Association constant of gRNA-Cas9 and DNA |
Kd | Dissociation constant of gRNA-Cas9 and DNA |
[A]0 | The concentration of a given sequence in the pre-selection library |
[A]tot | The concentration of all DNA sequences in the pre-selection library |
[C] | The concentration of a given sequence in the post-selection library |
[C]tot | The concentration of all DNA sequences in the post-selection library |
A’ | The number of copies of a given sequence sampled from the pre-selection library |
Atot’ | The total number of sequences sampled from the pre-selection library |
P’ | The number of copies of a given sequence sampled from the post-selection library |
Ptot’ | The total number of sequences sampled from the post-selection library |
θ | Cas9 targeting efficiency |
S | Substrate, DNA |
E | Enzyme, gRNA-Cas9 |
P | Product, DNA with a double-strand break |
A | The intact DNA duplex |
B | The DNA molecule in which one of the two strands has been cleaved at the recognition site for the restriction enzyme |
C | The DNA molecule in which both strands have been cleaved at the recognition site |
ka,kb | The two apparent first-order rate constants of the two cleavage steps of Cas9 |
k1,k-1,kcat | Rate constants |
KM | Michaelis-Menten constant |
R | Gas constant |
T | Absolute temperature |
pb | Binding probability |
pc | Cutting probability |
Abbreviation | Meaning |
dCas9 | Deactivated Cas9 (Cas9 inhibitor), a Cas9 carrying the two mutations D10A and H840A |
aCas9 | Cas9 activator, a dCas9 fused to an activator domain such as VP64, TAL or the omega subunit of RNAP |
d/aCas9 | Deactivated Cas9, whether it is used as an activator or an inhibitor |
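For orientation, the symbols R, T, Ka and Kd in the table are linked by the standard thermodynamic relation below. This is the textbook form for an unmodified standard free energy, written here as an assumption-free reminder only; how the modified △G’, the weight vector ω and the constant b enter the model is not shown on this page and is not implied by this block.

```latex
K_d = \frac{1}{K_a}, \qquad \Delta G^{\circ} = -RT \ln K_a = RT \ln K_d
```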
3. Modeling result
We employ a nearest-neighbor (NN) model to calculate △G(i) between the gRNA and the DNA at each NN position. From the first nucleotide of the gRNA target region to the 20th, △G(i) is calculated for 21 positions in total. We first tested the feasibility of this idea by calculating the correlation between △G(i) and cutting efficiency, using the CLTA1, CLTA2 and CLTA3 data from [1].
Sequence | Single-mismatch tolerance | G/C | Ref |
TCATGCTGTTTCATATGATC | low | 7 | [4] |
AACTTTCAGTTTAGCGGUCU | low | 8 | [3] |
TGTGAAGAGCTTCACTGAGT | low | 9 | [1] |
GATGCCGTTCTTCTGCTTGT | low | 10 | [8] |
AGTCCTCATCTCCCTCAAGC | low | 10 | [1] |
GAGATGATCGCCCCTTCTTC | low | 11 | [2] |
CTCCCTCAAGCAGGCCCCGC | low | 15 | [1] |
Ave. G/C | | 10 | |
GCAGATGTAGTGTTTCCACA | high | 9 | [1] |
GGTGGTGCAGATGAACTTCA | high | 10 | [8] |
GGGGCCACTAGGGACAGGAT | high | 13 | [2] |
GTCCCCTCCACCCCACAGTG | high | 14 | [2] |
GGGCACGGGCAGCTTGCCGG | high | 16 | [8] |
Ave. G/C | | 12.4 | |
4. Model derivation
4.1. Calculation of △G’ of DNA-gRNA binding
The calculation of △G(i) and △G’ is modified from the nearest-neighbor (NN) model introduced in [2].
5. Discussion
There may be three reasons for the variation in correlation across outputs 1-19.
First, Cas9 tolerates mismatches at the 5’ end of the gRNA. This is supported by all of the cited studies [1,2,3,4,7,8].
Second, the calculation of terminal energy is flawed. All terminal mismatches in RNA, and most in DNA, are stabilizing [5,6], and the NN model may fail to capture all of these stabilizing effects. Improving the energy calculation rules may therefore help to fix the negative correlation.
Third, the NN model is derived from a database of binding energies for free DNA duplexes, whereas we applied it to RNA-DNA binding under the influence of Cas9. We consider this the main source of error in our model, and it may contribute to the odd correlation valley at △G(8)~△G(12). Alternatively, the valley may mean that Cas9 is indeed relatively insensitive to energy changes at these positions.
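The per-position correlation discussed above can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, Pearson correlation is assumed as the correlation measure (the page does not state which one was used), and the toy numbers are synthetic placeholders, not data from [1].

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def per_position_correlation(dg_rows, efficiency):
    """Correlate dG(i) with cutting efficiency separately for each position i.

    dg_rows: one row per sequence, each row holding dG(i) for all positions.
    efficiency: one cutting-efficiency value per sequence.
    """
    n_positions = len(dg_rows[0])
    return [
        pearson([row[i] for row in dg_rows], efficiency)
        for i in range(n_positions)
    ]


# Synthetic toy data: 3 sequences x 3 positions (NOT measured values).
toy_dg = [
    [-1.0, -2.0, -0.5],
    [-2.0, -1.5, -0.5],
    [-3.0, -1.0, -0.5],
]
toy_eff = [0.2, 0.4, 0.6]
toy_r = per_position_correlation(toy_dg, toy_eff)
print(toy_r)
```

Plotting the resulting list against position index would give a map of the same kind as Figure 1; a position whose △G(i) never varies across sequences yields NaN rather than a misleading zero.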