Team:Freiburg/Project/modeling

From 2013.igem.org

(Difference between revisions)

Revision as of 13:40, 4 October 2013

Modeling our dCAS

Introduction

We used a thermodynamic approach to model and characterize our system. It is based on various ordinary differential equations (ODE) which describe the behaviour of our network. Due to the limited measurement possibilities and the unwritten law, that you should at least measure half of the number of components of your network we started by using a small network with a limited amount of different components.

The Networks

1. dCAS-VP16

Our network includes four different components dCas9-VP16, a RNA complex (tracr/cr RNA), a RNA-dCas9-VP16 complex and the secreted alkaline phosphatase (SEAP). The RNA-complex is bound to dCas9-VP16 and the resulting complex binds the DNA, which leads to the production of SEAP.

Figure 1: Transcriptional Activation via dCas9-VP16:
The dCas9-VP16 fusion protein is guided to the desired DNA sequence by a co-expressed crRNA and tracrRNA. The binding of the gene recognition complex leads to an expression of SEAP.

Setting up the ODE

According to the graphical reaction network the ODE can be set up.

dCas9 is constitutively expressed by the CBh promoter and degraded proportionately to the current concentration. It is used to build the DNA recognition complex and is produced during complex decay.

The RNA-complex is built linearly. The production constant can be regarded as equivalent to the production constant of the lower expressed RNA (either crRNA or tracrRNA), since their expression limits the complex building. It is assumed that the RNA is degraded after DNA recognition complex decay and therefore the complex decay does not lead to an increase in the amount of RNA.

The DNA recognition complex is built when dCas9 and RNA meet and is degraded proportionately to the current DNA recognition complex concentration.

There is a leaky SEAP production and one that depends on the current concentration of the dCas9/RNA complex. This dependency is assumed to follow the Monod-kinetic. Because of the long half-life period (T₂ > 500 h) of SEAP we can neglect the SEAP decay [2, 3].

The parameters are:
k₁: linear production rate of Cas9 k₂: Cas9 degradation rate k₃: tracr/crRNA production rate k₄: tracr/crRNA degradation rate k₅: gene recognition complex building rate	k₆: cr/trRNA /Cas9 degradation rate k₇: SEAPs leaky production rate k₈: Complex dependent SEAP production rate k₉:

2. dCAS-KRAB

The dCas-KRAB sytem was modeled as a three component system. tetR as activator of gene production, dCas-KRAB as repressor and SEAP as final reporter protein.

Figure 2: Transcriptional respression via dCAS-KRAB:
The dCAS-KRAB fusion protein binds to the desired target sequence at a different locus than the tetR. tetR in turn binds to tetO and is assumed to repress SEAP production.

Setting up the ODE

According to the graphical reaction network the ODE can be set up.

dCas9 is constitutively expressed by the CBh promoter and degraded proportionately to the current concentration, similar to the activation model. It binds to the DNA and is released afterwards. Since the amount of transfected dCas9-plasmid is four times as high as that of dCas-KRAB the basal expression rate is equally assumed to be four times as high as that of the tetR.

tetR is assumed to have a similiar kinetic as Cas.

There is a leaky SEAP production and one that depends on the current concentration of dCas9 and tetR. This dependency is assumed to follow an allosteric inhibition process. Similar to the activation process the SEAP decay can be neglected because of the long half-life period (T₂ > 500 h) of SEAP [2, 3].

The parameters are:
k₁: linear production rate of Cas9 k₂: Cas9 degradation rate k₃: tetR production rate k₄: tetR degradation rate	k₅: SEAPs leaky production rate k₆: k₇: k₈:

Determining the parameters

By setting up the ODE a n-dimensional hypothesis space (n is the number of parameters) is generated. Determining the right parameter combination means finding a point in the space which fits the data best.
To find these parameters we used the maximum likelihood approach. The maximum likelihood hypothesis is the procedure which has the highest probability to generate the measured data. It was shown [3], using the maximum likelihood approach and assuming gaussion noise to the data (an assumption that holds in our case) leads to a least-square error minimization problem.

A minimization problem is an optimization problem. One searches for parameters (p0) for which the assumption holds, that the value of the function (f) at the point of the parameters is smaller than all other values. (f(p0)<=f(p)). In three dimensions the function can be regarded as a landscape and minimization as finding the deepest valley. Depending on the method one uses different problems arise. The most common problem is finding an only local minimum and not the global one (Fig. 3).

Figure 3: Example of a minimization problem
A 3D landscape is shown. Depending on the start position (the initial parameters), the minimum which is found can either be a local one or the global minimum.

To avoid this issue and to make sure a global minimum was found we started our minimization procedure using different start values for our parameters. To sample these parameters we used the latin hypercube sampling on a logarithmic scale (Fig. 4).

Figure 4: Illustration of the latin hypercube sampling in a two dimensional parameter space.
The number of initial parameter vectors is 5. Therefore the parameter space is divided in 25 subspaces. One possible parameter combination is shown here.

The number of different initial parameter settings is set to N and thus the parameter space is divided in N*N subspaces. For the initial parameter the values are chosen in a way that there is only one parameter in each row and column.

The resulting errors were plotted in an increasing order to be sure the global minimum could be found.

Data generation

dCas9 was quantified by using Western blots. SEAP as target protein that could be quantified by a SEAP assay. For more detailed information refer to our modeling notebook.

Fitting Procedure and Results

1. dCas9-VP16

Assuming the given ODE and using the fminsearch-function of matlab for various initial parameter vectors the fitting process resulted in one optimal parameter composition. Our measurement time started with a transfection. The first data were obtained after 6 h. As the initial concentrations were unknown all 14 parameters (including the initial concentrations) were assumed to obtain changing values during the fit. This procedure results in unlikely high initial concentrations. Although the transfection end point could not clearly be defined only a small amount of the components should be present within the cells after transfection . A second process in which the initial concentrations were assumed to be zero followed the first one. The change between fixed and variable parameters was easy to perform, because of an additional vector (qfit). This vector contained boolean values depending on whether the parameter was fixed or flexible during the fitting process. Moreover another parameter was required for adjusting the absolute values of the duplicates. The results of this second process are shown in table 1.

The model reflects the general construction of the network (Fig. 5). As assumed dCas9 converges asymptotically to a stable state and an exponential increase in the SEAP concentration can be observed. Both processes are reflected in the data. The model also shows a potential behavior of the non measured components, the free tracr/crRNA-complex and the gene recognition complex. The tracr/crRNA-complex reaches its steady state quickly. However there is no possibility to distinguish between the two different RNAs. There might be some differences in their expressions, especially because of the different promoters (crRNA expressed under U6-promoter; tracrRNA expressed under h1-promoter), however, the model does not show them.
The free gene recognition complex has not reached its steady state yet due to the small degradation rate.

Figure 5: Modeling Result:
Shown are the experimental results (purple square) in comparison to the model prediction values (cyan cross) for SEAP and Cas, as well as the model prediction for not measured components

Because of the fact that the fminsearch algorithm is not proved to converge to a minimum [1], we started at different points in the parameter space and therefore the probability of having found the global minimum is high (Fig. 6) .

Figure 6: Different error values plotted in increasing order.

Table 1: Overview of the parameter results of the dCas9-VP16 fitting process

parameter	value
linear production rate of Cas9	8.78E+02 [M/h]
Cas9 degradation rate	0.54 [1/h]
Tracr/crRNA production rate	0.73
Tracr/crRNA degradation rate	1.21E+08
Gene recognition complex building rate	1.61 [1/M]
cr/trRNA /Cas9 degradation rate	8.59E-26
SEAPs leaky production rate	4.05E-05 [U/l*h]
k₈	1.86E+08 [U/l*h]
k₉	1.06
Cas9 degradation rate	0.00 [M]
Tracr/crRNA -Komplex	0.00
gene recognition complex	0.00
SEAP	2.73E-09 [U/l]
least square error	159.3423

2. dCas-KRAB

The same procedure was done to find the parameters for dCas-KRAB. We started setting all parameters as variable and after that we set the initial concentrations to zero. Fitting with zero initial concentrations led to a lower least square error. As assumed before the dCas-KRAB reachs its steady state within the measured time periode, approxiametly at the same time as Cas9-VP16. However the expected stop of SEAP concentration arising could not be observed. The effect of reduction is because of the high Cas9 to tetR ratio to low.

Figure 7: Modeling Result:
Shown are the experimental results (purple square) in comparison to the model prediction values (cyan cross) for SEAP and Cas9, as well as the model prediction for the not measured component tetR.

This time we also fitted the different error values. (Fig.8) .

Figure 8: Different error values plotted in increasing order.

Table 2: Overview of the parameter results of the dCas9-KRAB fitting process

parameter	value
linear Cas production rate	7.04E-04 [M/h]
Cas degradation rate	1.55E+02 [1/h]
tetR production rate	5.59E+09
tetrR degradation rate	1.29E-04
SEAPs leaky production rate	2.70E-15 [U/l]
k₆	0.18 [U/l]
k₇	1.27E+00 [1/M]
k₈	4.30E+13
Cas9	0.00 [M]
tetR	0.00
SEAP	8.90E-19 [U/l]
Error	119.6221

The Code Files

References

(1) Lagarias, J. C., J. A. Reeds, M. H. Wright, and P. E. Wright. (1998). Convergence Properties of the Nelder-Mead Simplex Method in Low Dimensions." SIAM Journal of Optimization, Vol. 9, Number 1, 112–147 .

(2) Müller K, Engesser R, Metzger S, Schulz S, Kämpf MM, Busacker M, Steinberg T, Tomakidi P, Ehrbar M, Nagy F, Timmer J, Zubriggen MD, Weber W. (2013). A red/far-red light-responsive bi-stable toggle switch to control gene expression in mammalian cells. Nucleic acids research 41:e77. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3627562&tool=pmcentrez&rendertype=abstract. .

(3) Müller K, Engesser R, Metzger S, Schulz S, Kämpf MM, Busacker M, Steinberg T, Tomakidi P, Ehrbar M, Nagy F, Zubriggen MD, Weber W. (2013). A red / far-red light-responsive bi-stable toggle switch to control gene expression in mammalian cells Supplementary Information . Design and parameterization of the mathematical model.

@@ Line 207: / Line 207: @@
 To find these parameters we used the maximum likelihood approach. The maximum likelihood hypothesis is the procedure which has the highest probability to
-generate the measured data. It is shown <span id="refer"> <a href="#(3)"> [3]</a></span>, that using the maximum likelihood approach and assuming gaussion noise in the data (an assumption that holds in our case) leads to a
+generate the measured data. It was shown <span id="refer"> <a href="#(3)"> [3]</a></span>, using the maximum likelihood approach and assuming gaussion noise to the data (an assumption that holds in our case) leads to a least-square error minimization problem. </p>
-least-square error minimization problem. </p>
+<p>A minimization problem is an optimization problem. One searches for parameters (p0) for which the assumption holds, that the value of the function (f) at the point of the
-<p>A minimization problem is an optimization problem. You search for parameters (p0) for which holds, that the value of the function (f) at the point of the
+parameters is smaller than all other values. (f(p0)<=f(p)). In three dimensions the function can be regarded as a landscape and minimization as finding the deepest
-parameters is smaller than all other values. (f(p0)<=f(p)). In three dimensions the function can be thought as a landscape and minimization is finding the deepest
+valley. Depending on the method one uses different problems arise. The most common problem is finding an only local minimum and not the global one <span id="refer"> <a href="#Fig3">(Fig. 3)</a></span>. </p>
-valley. Depending on the method you use different problems arise. The most common problem is finding only a local minimum and not the global one <span id="refer"> <a href="#Fig3">(Fig. 3)</a></span>. </p>
 <div>
@@ Line 223: / Line 221: @@
 		</tr>
 		<tr>
-			<td> <b  >Figure 3: Example of a minimization problem.</b><br> Shown is a 3D landscape. Depending on the start position (the initial parameters), the found minimum is either a local or the global one.</td>
+			<td> <b  >Figure 3: Example of a minimization problem</b><br>A 3D landscape is shown. Depending on the start position (the initial parameters), the minimum which is found can either be a local one or the global minimum.</td>
 		</tr>
 	</tbody></table>
@@ Line 229: / Line 227: @@
-<p> To avoid this and
+<p> To avoid this issue and to make sure a global minimum was found we started our minimization procedure using different start values for our parameters. To sample these parameters we used the latin hypercube sampling on a logarithmic scale <span id="refer"> <a href="#Fig4">(Fig. 4)</a></span>. </p>
-to be sure to have found a global minimum we started our minimization procedure using different start values for our parameters. To sample these parameters we used
-the latin hypercube sampling on a logarithmic scale <span id="refer"> <a href="#Fig4">(Fig. 4)</a></span>. </p>
 <div>
@@ Line 240: / Line 234: @@
 			<td> <img id="bild" src="https://static.igem.org/mediawiki/2013/f/fa/Freiburg2013_Lhs_design_erklaerung.png"> </td>
-			<td> <b  >Figure 4: Illustration of the latin hypercube sampling in a two dimensional parameter space.</b><br> The number of initial parameter vectors is 5. Therefore the parameter space is divided in 25 subspaces. Shown is one possible parameter combination.</td>
+			<td> <b  >Figure 4: Illustration of the latin hypercube sampling in a two dimensional parameter space.</b><br> The number of initial parameter vectors is 5. Therefore the parameter space is divided in 25 subspaces. One possible parameter combination is shown here.</td>
 		</tr>
 	</tbody></table>
 </div>
-<p>The number of different initial parameter settings is set to N and thus the parameter space is divided in N*N subspaces. For the initial parameter the values are chosen so that there is only one parameter in each row and column. </p>
+<p>The number of different initial parameter settings is set to N and thus the parameter space is divided in N*N subspaces. For the initial parameter the values are chosen in a way that there is only one parameter in each row and column. </p>
-<p>The resulting errors we plotted in an increasing order to be sure to have found a global minimum.</p>
+<p>The resulting errors were plotted in an increasing order to be sure the global minimum could be found.</p>
 <p id="h3"> Data generation </p>
-<p> Cas9 is quantified by using Western blot and we used SEAP as target protein that can be quantified by a SEAP assay.
+<p> dCas9 was quantified by using Western blots. SEAP as target protein that could be quantified by a SEAP assay.
-For more detailed information refer our <a id="link" href="https://2013.igem.org/Team:Freiburg/Notebook/modeling">modeling notebook</a>. </p>
+For more detailed information refer to our <a id="link" href="https://2013.igem.org/Team:Freiburg/Notebook/modeling">modeling notebook</a>. </p>
 <p id="h3"> Fitting Procedure and Results </p>
-<p id="h4"> 1. dCas-VP16 </p>
+<p id="h4"> 1. dCas9-VP16 </p>
-<p>Assuming the given ODE and using the fminsearch-function implemented in matlab with various initial parameter vectors the fitting process results in one optimal parameter composition. Our  measurement time starts with the transfection, however the first data we got at time point 6, therefore all 14 parameters including the initial concentrations were set variable. This results in unlikely high initial concentration. Unlikely, because although the transfection end point is not clearly to define after transfection there should only be a small amount of the components. A second process in which the initial concentrations were assumed as zero followed the first one. The change between fixed and variable parameters was easily to perform, because of an additional vector (qfit). This vector contains boolean values depending on wether the parameter is fixed or flexible during the fitting process. Moreover another parameter is required for adjusting the absolute values of the duplicates. The results of this second process are shown. <span id="refer"> <a href="#Tab1">(Table 1)</a></span>. </p>
+<p>Assuming the given ODE and using the fminsearch-function of matlab for various initial parameter vectors the fitting process resulted in one optimal parameter composition. Our  measurement time started with a transfection. The first data were obtained after 6 h. As the initial concentrations were unknown all 14 parameters (including the initial concentrations) were assumed to obtain changing values during the fit. This procedure results in unlikely high initial concentrations. Although the transfection end point could not clearly be defined only a small amount of the components should be present within the cells after transfection . A second process in which the initial concentrations were assumed to be zero followed the first one. The change between fixed and variable parameters was easy to perform, because of an additional vector (qfit). This vector contained boolean values depending on whether the parameter was fixed or flexible during the fitting process. Moreover another parameter was required for adjusting the absolute values of the duplicates. The results of this second process are shown in <span id="refer"> <a href="#Tab1">table 1</a></span>. </p>
 <p>
-The model reflects the general construction of the network <span id="refer"> <a href="#Fig5">(Fig. 5)</a></span>.  As assumed Cas9 converges asymptotically to a stable state and there is an exponential increase in the SEAP concentration, both processes are reflected in the data.
+The model reflects the general construction of the network <span id="refer"> <a href="#Fig5">(Fig. 5)</a></span>.  As assumed dCas9 converges asymptotically to a stable state and an exponential increase in the SEAP concentration can be observed. Both processes are reflected in the data.
-The model also shows a potential behaviour of the not measured components, the free tracr/crRNA-complex and the gene recognition complex. The tracr/crRNA-complex reachs its steady state quickly.  There is however no possibility to distinguish between the two different RNAs. There might be some differences in their expressions, especially because of the different promoters (crRNA expressed under U6-promoter; tracrRNA expressed under h1-promoter), however the model won't show them. <br> The free gene recognition complex has, because of the small degradation rate, not reached its steady state yet.
+The model also shows a potential behavior of the non measured components, the free tracr/crRNA-complex and the gene recognition complex. The tracr/crRNA-complex reaches its steady state quickly.  However there is no possibility to distinguish between the two different RNAs. There might be some differences in their expressions, especially because of the different promoters (crRNA expressed under U6-promoter; tracrRNA expressed under h1-promoter), however, the model does not show them. <br> The free gene recognition complex has not reached its steady state yet due to the small degradation rate.
 </p>