Team:Manchester/Enzyme

From 2013.igem.org

(Difference between revisions)

Latest revision as of 02:23, 29 October 2013

Summary

Working with a pathway as large and uncharacterised as fatty acid biosynthesis presented many challenges, the most important of which was the lack of reliable, experimentally established kinetic values for many of the key reactions. Our solution was to create a model that explicitly acknowledges this lack of data and the resulting uncertainty, using Monte Carlo sampling from plausible parameter value distributions, which enabled us to produce model predictions with confidence intervals. We believe that this unusual and innovative modelling strategy can potentially serve as a general principled approach to handling parameter uncertainty in the future. Synthetic Biology will always operate at the cutting edge of current knowledge and thus will unavoidably face the challenge of uncertainty. Building models that explicitly acknowledge uncertainty will yield model predictions with specified confidence intervals, and thus will lead to more robust design strategies for a wide range of engineered cellular machines.

Aim

To use uncertainty modelling to model E. coli fatty acid biosynthesis.

Early modelling attempts using traditional methods were largely unsuccessful, due to the nature of the fatty acid biosynthesis pathway and the lack of experimentally defined kinetic values. Rather than use models that were arbitrary or lacked information, we decided to use a less traditional method, based on Monte Carlo sampling, that can give us a clear idea of what the uncertainty of our predictions might be. By embracing this uncertainty, we hoped to create a model with practical, representative results.

Objectives

To build the first kinetic model of fatty acid biosynthesis in E. coli using uncertainty modelling
To represent the fatty acid production in our system: E.c(oil)i
To identify areas of the pathway requiring further study in the lab

Introduction

Fatty acid biosynthesis is a process that occurs in all living organisms. Glucose is converted into acetyl-CoA through glycolysis and the pyruvate dehydrogenase reaction, and the acetyl-CoA is fed into the fatty acid biosynthesis pathway. Here it reacts with malonyl-CoA to form a four-carbon compound. The four-carbon compound is then reduced and dehydrated via four successive steps, executed, with the help of NADPH, by the enzymes indicated in Figure 1. Another malonyl-CoA then reacts with the resulting C4 body to form a C6 body, which is converted in the same manner as the previous C4 body. The same set of enzymes acts on the intermediates in every round of this cyclic pathway to ultimately produce fatty acids. From the initial reaction to the end products, the whole pathway comprises 43 reactions, about 60 metabolites and 267 parameters.

In synthetic biology two main classes of computational models are commonly used: constraint-based genome-scale models and differential-equation-based dynamic models. In our project, we were interested in the concentrations of compounds and their dynamic changes, as well as in the reactions with the highest control over the fatty acid synthesis pathway. As this analysis would not be possible with a purely constraint-based model, we chose to use a dynamic model. However, to use a dynamic model one needs to know the enzyme kinetic parameters, and these are often unknown or very unreliable. Uncertainty can be due to:

experimental uncertainty
in vitro measurements of enzyme kinetics are not always representative of in vivo conditions
compound concentrations often change dynamically

We wanted to account for the uncertainty in the fatty acid synthesis pathway parameter data by using a new “uncertainty modelling” approach, which can potentially serve as a principled approach to handling parameter uncertainty in the future.

Building models with incorporated acknowledgment of uncertainty will produce specified confidence intervals for all model predictions and thus could lead to robust design of engineered cellular machines of fatty acid synthesis and beyond.
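
The idea can also be written compactly. The notation below is generic and introduced here for illustration only; it is not taken from the original model. Here x is the vector of metabolite concentrations, N the stoichiometric matrix, v the vector of reaction rates and θ the vector of kinetic parameters. The Michaelis-Menten expression is just one example of a rate law, not necessarily the one used for any given reaction in the model.

```latex
\frac{d\mathbf{x}}{dt} = N\,\mathbf{v}(\mathbf{x};\boldsymbol{\theta}),
\qquad
v_j = \frac{V_{\max,j}\,x_i}{K_{m,j} + x_i},
\qquad
\boldsymbol{\theta}^{(k)} \sim p(\boldsymbol{\theta}),\quad k = 1,\dots,K
```

Each draw θ^(k) defines one plausible model. Solving it gives one prediction x^(k)(t), and the spread of these predictions across the K draws is what the intervals summarise.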

Method

Collected parameter values from literature

Using the online database BRENDA, we searched for published parameter values for every enzyme in the pathway. In our search we discovered that the published data on the value of parameters in the E. coli fatty acid biosynthesis pathway is limited. Hence, we decided we needed to take this into account in our model.

Categorized parameters into three groups

Group 1: ☑Mean ☑Standard Deviation
Group 2: ☑Mean ☐Standard Deviation
Group 3: ☐Mean ☐Standard Deviation

Filling in the missing information for each parameter

In the case of group one, both the mean and standard deviation were collected from the literature and used to determine the probability distribution. In the case of group two, we used the mean found in the literature and the standard deviation of all enzymes of the same class or subclass with known kinetic parameters. In the case of group three, we used both the mean and the standard deviation obtained from all enzymes of the same subclass to create a distribution. Here is a file with the distributions of enzyme classes and subclasses. To illustrate the generation of a distribution of plausible values for a parameter, we show an example parameter, DH_OHC4_OHC4, which is involved in the first reaction catalyzed by FabA. As this parameter was categorized into group three, we took the mean and standard deviation of the Km values listed for the same subclass (EC 4.2.1) and generated the following distribution of plausible values for the Km of DH_OHC4_OHC4:
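
As a minimal sketch of how the three groups translate into a sampling distribution, the Python below picks the mean and standard deviation according to what is available and then matches a log-normal distribution to them. The log-normal shape is an assumption made here so that sampled Km values stay positive; the text does not state which distribution family the team used. The function names and the numbers are hypothetical.

```python
import math
import random

def lognormal_params(mean, sd):
    """Return (mu, sigma) of the log-normal whose mean and SD match the inputs."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)

def parameter_distribution(lit_mean, lit_sd, class_mean, class_sd):
    """Choose mean and SD following the three groups described in the text.

    Group 1: literature mean and SD are both known.
    Group 2: literature mean only; SD borrowed from the enzyme (sub)class.
    Group 3: nothing known; mean and SD both borrowed from the (sub)class.
    """
    mean = lit_mean if lit_mean is not None else class_mean
    sd = lit_sd if lit_sd is not None else class_sd
    return lognormal_params(mean, sd)

# Hypothetical subclass statistics (mM) for illustration; not BRENDA values.
mu, sigma = parameter_distribution(None, None, class_mean=0.5, class_sd=0.4)

rng = random.Random(0)  # fixed seed so the draws are reproducible
samples = [rng.lognormvariate(mu, sigma) for _ in range(5)]
print(samples)
```

Passing a literature mean with `lit_sd=None` gives the group-two behaviour, and passing both gives group one.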

Random sampling values from each parameter distribution

Once each parameter had a probability distribution associated with it, we randomly sampled values from each plausible parameter distribution to use in our model simulation. First we constructed an initial model in COPASI, using appropriate enzyme kinetic equations. The rate equations used in our model can be generalised as follows:

We used these rate equations to complete our model of the fatty acid synthesis pathway by adding the thioesterase reactions required for the production of the final fatty acids, as well as the reactions catalyzed by FadD. Once our model was complete enough to represent our system, we exported it from COPASI in SBML format and converted it to a PySCeS-compatible file. PySCeS uses a set of non-linear differential equations to obtain both structural and kinetic information about the system from these randomly generated kinetic values. Below is pseudocode of our workflow; the full script to randomly sample the parameters and generate model predictions can be downloaded here.
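
The following is not the team's script. It is a minimal Python sketch of the same sample-then-simulate loop, written under several assumptions: a toy two-step pathway S -> I -> P stands in for the real 43-reaction model, irreversible Michaelis-Menten kinetics stand in for the actual rate laws, a simple Euler integrator stands in for PySCeS, and every parameter name and distribution is hypothetical.

```python
import math
import random

# Hypothetical log-normal distributions, given as (mu, sigma) on the log scale.
DISTRIBUTIONS = {
    "Vmax1": (math.log(0.05), 0.5),
    "Km1": (math.log(0.2), 0.8),
    "Vmax2": (math.log(0.05), 0.5),
    "Km2": (math.log(0.2), 0.8),
}

def sample_parameters(rng):
    """Draw one value per parameter from its distribution."""
    return {name: rng.lognormvariate(mu, sigma)
            for name, (mu, sigma) in DISTRIBUTIONS.items()}

def simulate(params, t_end=100.0, dt=0.1):
    """Integrate the toy pathway S -> I -> P with explicit Euler steps."""
    s, i, p = 1.0, 0.0, 0.0  # assumed initial concentrations (mM)
    for _ in range(int(round(t_end / dt))):
        v1 = params["Vmax1"] * s / (params["Km1"] + s)
        v2 = params["Vmax2"] * i / (params["Km2"] + i)
        # Cap the converted amount so a step can never drive a pool negative.
        d1 = min(v1 * dt, s)
        d2 = min(v2 * dt, i)
        s, i, p = s - d1, i + d1 - d2, p + d2
    return {"S": s, "I": i, "P": p}

def run_ensemble(n_models, seed=0):
    """Build n_models parameter sets and return one prediction per model."""
    rng = random.Random(seed)
    return [simulate(sample_parameters(rng)) for _ in range(n_models)]

predictions = run_ensemble(100)
print(len(predictions), "models simulated")
```

In the real workflow the `simulate` step would be replaced by loading the converted model into PySCeS, overwriting its parameters with the sampled values and running a time course.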

Creation of 1,000 models with distribution of predictions

We automated the generation of models to create a collection of 1,000 models. From there we were able to determine the uncertainty in our model predictions: instead of a single prediction, we have a distribution of predictions from a large collection of plausible models.
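
To show what "a distribution of predictions" can look like in practice, here is a small Python sketch that turns an ensemble of predicted values into a median and a 95% percentile interval. The percentile interval is one simple way to summarise the spread and is an assumption here; the text does not specify how the team computed its intervals. The ensemble values are made up.

```python
import random

def percentile(ordered, q):
    """Linear-interpolation percentile of a sorted list, q in [0, 1]."""
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac

def summarise(values, coverage=0.95):
    """Return (lower, median, upper) for the central `coverage` of the values."""
    ordered = sorted(values)
    tail = (1 - coverage) / 2
    return (percentile(ordered, tail),
            percentile(ordered, 0.5),
            percentile(ordered, 1 - tail))

# Stand-in ensemble: 1,000 made-up predictions (mM), not real model output.
rng = random.Random(42)
ensemble = [rng.lognormvariate(-1.0, 0.5) for _ in range(1000)]

low, median, high = summarise(ensemble)
print(f"median {median:.3f} mM, 95% interval [{low:.3f}, {high:.3f}] mM")
```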

Results

The metabolite concentrations were output as tables, as depicted in Figure 3. Each line represents one simulation, sampled at ten time points within 100 seconds. The complete data set from all simulations was then colour-coded according to concentration value. From this chart we generated a second table based on the ordinal data obtained from the colour coding, to make the data easier to work with and more visual. Figure 4 shows the summary of this qualitative concentration distribution for each metabolite: the brighter green a cell is, the more often simulations produced metabolite concentrations in the specified concentration interval. For example, the last metabolite in the table, C18CoA, is bright green because all 41 simulations gave concentrations between 0.01 and 1 mM. A clear pattern emerges from this table: except for the first six initial replenishing reactions, all metabolite concentrations fall within a small, reasonable range, mostly between 0.01 and 1 mM. Interestingly, in the reactions towards the end of the pathway, which remove the metabolites from the system and therefore give rise to stearic and palmitic acid (our desired products), the range of results appears to be noticeably narrower, despite the uncertainty.
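
The interval table described above can be sketched in a few lines of Python. This is an illustration only: the bin boundaries are assumed from the 0.01 to 1 mM range mentioned in the text, the metabolite name AcCoA and all concentration values are hypothetical, and the function names are invented here.

```python
from bisect import bisect_right
from collections import Counter

EDGES = [0.01, 1.0]  # mM; assumed bin boundaries (lower edge inclusive)
LABELS = ["< 0.01 mM", "0.01 - 1 mM", ">= 1 mM"]

def bin_label(conc):
    """Return the label of the concentration interval that contains conc."""
    return LABELS[bisect_right(EDGES, conc)]

def interval_counts(final_concs):
    """Map each metabolite to a Counter of interval labels across simulations."""
    return {metabolite: Counter(bin_label(c) for c in values)
            for metabolite, values in final_concs.items()}

# Hypothetical final concentrations (mM), one value per simulation.
final_concs = {
    "C18CoA": [0.2, 0.5, 0.05],
    "AcCoA": [0.001, 3.0, 0.4],
}

counts = interval_counts(final_concs)
for metabolite, counter in counts.items():
    print(metabolite, dict(counter))
```

Colouring each cell by its count divided by the number of simulations would then give the kind of summary table the text describes.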

The analysis of the data shows that metabolite concentrations stay within a small and reasonable range, which stabilises towards the end of the pathway. This supports the validity of our model and demonstrates that the uncertainty is not globally deleterious. Even though the model was working with highly uncertain data, the output remained within a valid range.

Upon analysing the degree of certainty in our model, and finding that it was at a level that we believe is suitable for further analysis, we were able to create a series of boxplots showing the range of values found within our simulations for species accumulation after 100 seconds. We focused on the longer chain fatty acids, which are the engineering target of our pathway. The order in which the species are shown in the box plots, Figure 5, is also the order in which they are formed. This is also shown in Figure 6, where the colour corresponds to the colour of the bar on the box plot.

These results further emphasise that although we created a model based on uncertain parameters, by embracing this uncertainty we have been able to make a model that gives us useful information – and that allows us to specify for every single prediction how certain we can be of getting it right, in particular towards the end of the pathway.

Similar data analysis was carried out on the rates of the reactions, shown in Figure 7. We focused on the reactions we had labelled AAT at the end of our pathway. These are thioesterase reactions directly responsible for the formation of palmitic and stearic acid. We can see that the rates for these reactions also fall within a relatively small range.

Conclusion

Kinetic pathway modelling demands abundant information on kinetic parameters. Our literature research, however, showed that these were either insufficiently available or subject to measurement error, so knowledge of parameter values is often uncertain. We therefore had to choose an approach able to deal with these limitations. Uncertainty modelling proved to be the most promising and useful tool for this. Even though the available data was limited, we managed to create a functioning kinetic model of the fatty acid synthesis pathway. To our knowledge this has not been done before, and it would not have been possible with a traditional approach.

A prime example of how our metabolic modelling work directly informed our experimental work is our decision to biobrick the fabA gene (encoding β-hydroxydecanoyl-ACP dehydrase, shown by the DH_OH reactions in this model). Our uncertainty model had shown us that we would need more kinetic data on key enzymes. The least characterised reaction was catalyzed by the product of the fabA gene; we therefore wished not only to biobrick this gene, but also to add a His-tag to purify the enzyme in order to experimentally gauge its activity.

However, having taken pains to ensure our model was as realistic as possible, inserting a His-tag that could affect the activity of the enzyme seemed at odds with our overall goal. Therefore, we used a further modelling technique to ensure the addition of this His-tag would have as little bearing on the activity of the enzyme as possible. This can be found here.

Future Applications: Potentials and Limitations

We believe that this approach could have a big impact on how Synthetic Biology is modelled in the future. It demonstrates that, by facing the uncertainty of modelling head-on and incorporating it into our approach in a principled manner, it is possible to produce valuable models. This is particularly important in the field of Synthetic Biology, where systems, even if well characterised in one organism, are unlikely to have the same parameters when expressed in another organism.

This approach gives us the ability to model complex and poorly experimentally measured systems, where previous attempts may have produced unrepresentative models. Since the Km values can be sampled from a distribution, the model can be used to determine outcomes that may not be obvious with the use of a single Km value.

However, it is important to note that this method of modelling may not be appropriate in every case. The largest limitation of our use of this method is that some of our simulations failed to reach steady state. This is likely to be a result of the random combination of parameter values: as the models were not fine-tuned, they will not always work. Nevertheless, we consider this a potential strength, as it clearly highlights possible break points in the system that require further analysis. We show this in our own studies of β-hydroxydecanoyl-ACP dehydrase, described above.
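
One simple way to flag simulations that have not settled is to look at the last few time points of each time course. The Python sketch below does this with a relative-change test. It is a heuristic written for illustration, not the steady-state solver used in PySCeS, and the tolerance, window size and example trajectories are all assumptions.

```python
def reached_steady_state(trajectory, rel_tol=1e-3, window=3, floor=1e-12):
    """True if no species varies by rel_tol or more over the last `window` points.

    `trajectory` is a list of dicts mapping species name to concentration,
    one dict per time point, in chronological order.
    """
    tail = trajectory[-window:]
    for species in tail[0]:
        values = [point[species] for point in tail]
        spread = max(values) - min(values)
        scale = max(max(abs(v) for v in values), floor)  # avoid dividing by zero
        if spread / scale >= rel_tol:
            return False
    return True

def fraction_not_steady(trajectories, **kwargs):
    """Fraction of simulations whose final time points are still changing."""
    failed = sum(not reached_steady_state(t, **kwargs) for t in trajectories)
    return failed / len(trajectories)

# Two made-up time courses: one settled, one still drifting.
settled = [{"P": 1.0} for _ in range(5)]
drifting = [{"P": float(k)} for k in range(1, 6)]

print(fraction_not_steady([settled, drifting]))
```

Grouping the flagged simulations by their sampled parameter values is one way to look for the break points mentioned above.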

Synthetic Biology operates at the cutting edge of current knowledge. Therefore, it will unavoidably face the challenge of uncertainty. Building models with incorporated acknowledgment of uncertainty will yield model predictions with specified confidence intervals, and thus will lead to more robust design strategies for a vast range of engineered cellular machines.

Appendices

The full spreadsheets with reaction rates and species concentrations can be found here:
Reaction Rates
Species Concentrations

Nomenclature of main metabolites


@@ Line 6: / Line 6: @@
 <head>
 <title> Safety </title>
+<script type='text/javascript'>
+function blocking(nr)
+      {
+        displayNew = (document.getElementById(nr).style.display == 'none') ? 'block' : 'none';
+         document.getElementById(nr).style.display = displayNew;
+      }
+</script>
 <style type="text/css">
@@ Line 85: / Line 96: @@
 padding:10px;
 background-color:white;
+-webkit-box-shadow: 0px 0px 5px 0px rgba(0,0,0,0.75);
+-moz-box-shadow: 0px 0px 5px 0px rgba(0,0,0,0.75);
+box-shadow: 0px 0px 5px 0px rgba(0,0,0,0.75);
+}
+.wrapper2
+{
+position:absolute;
+clear:both;
+width:940px;
+top:120px;
+left:0;
+padding:10px;
+background-color:#f2f2f2;
+-webkit-box-shadow: 0px 0px 5px 0px rgba(0,0,0,0.75);
+-moz-box-shadow: 0px 0px 5px 0px rgba(0,0,0,0.75);
+box-shadow: 0px 0px 5px 0px rgba(0,0,0,0.75);
+}
+.menu
+{
+margin:5px auto;
+width:900px;
+}
+.menu li
+{
+list-style:none;
+}
+.menu li #mlink
+{
+display:block;
+width:900px;
+text-decoration:none;
+margin-bottom:5px;
+font-family:Trebuchet MS;
+font-weight:bold;
+font-size:20px;
+color:white;
+background-color:#660099;
+padding:7px 5px 5px 5px;
+-webkit-border-radius: 10px;
+border-radius: 10px;
+-webkit-box-shadow: 0px 0px 5px 0px rgba(0,0,0,0.75);
+-moz-box-shadow: 0px 0px 5px 0px rgba(0,0,0,0.75);
+box-shadow: 0px 0px 5px 0px rgba(0,0,0,0.75);
+}
+.menu li a #date
+{
+margin-left:10px;
+margin-right:10px;
+}
+.menu li a #arrow
+{
+margin-left:10px;
+}
+#moretext
+{
+margin:0 auto;
+margin-bottom:5px;
+width:700px;
+font-family:Trebuchet MS;
+font-weight:bold;
+font-size:14px;
+color:white;
+background-color:#BDBDBD;
+padding:5px;
 -webkit-box-shadow: 0px 0px 5px 0px rgba(0,0,0,0.75);
@@ Line 240: / Line 326: @@
 box-shadow: 0px 0px 5px 0px rgba(0,0,0,0.75);
 }
 .block1 a, .block2 a, .block3 a, .block4 a, .block5 a, .block6 a, .block7 a, .block8 a, .block9 a, .block10 a, .block11 a, .block12 a, .block13 a
@@ Line 289: / Line 377: @@
 .block3 a
 {
+background:#bc80ea;
 display:block;
 float:left;
@@ Line 307: / Line 395: @@
 .block5 a
 {
-background:#bc80ea;
 display:block;
 float:left;
@@ Line 330: / Line 418: @@
 }
-.question1 a, .question2 a, .question3 a, .question4 a, .question5 a
+.question1 a, .question2 a, .question3 a, .question4 a, .question5 a, .question6 a, .question7 a, .question8 a
 {
 width:120px;
@@ Line 344: / Line 432: @@
 }
-.question1 a:hover, .question2 a:hover, .question3 a:hover, .question4 a:hover, .question5 a:hover
+.question1 a:hover, .question2 a:hover, .question3 a:hover, .question4 a:hover, .question5 a:hover, .question6 a:hover, .question7 a:hover, .question8 a:hover
 {
 background:#C0C0C0;
@@ Line 391: / Line 479: @@
 }
-#Q1,#Q2,#Q3,#Q4,#Q5
+.question6 a
+{
+margin-top:1px;
+float:left;
+display:block;
+padding:5px;
+}
+.question7 a
+{
+margin-top:1px;
+float:left;
+display:block;
+padding:5px;
+}
+.question8 a
+{
+margin-top:1px;
+float:left;
+display:block;
+padding:5px;
+}
+#Q1,#Q2,#Q3,#Q4,#Q5,#Q6,#Q7,#Q8
 {
 text-decoration:none;
@@ Line 430: / Line 546: @@
 </head>
-<body onLoad="hoverLink1(); hoverLink2(); hoverLink3(); hoverImage1(); hoverImage2(); hoverImage3();
+<body onLoad="showImage(); blocking('moretext');
-              hoverLink4(); hoverImage4(); hoverLink5(); hoverImage5();">
+hover1(); hover2(); hover3(); hover4(); hover5(); hover6(); hover7(); highlight(); ">
 <div class="header">
@@ Line 446: / Line 563: @@
             <div class="central">
-	     <a><img src="https://static.igem.org/mediawiki/2013/6/66/FABMan.png"></a>
+	     <a><img src="https://static.igem.org/mediawiki/2013/3/3c/EnzymeMan.png"></a>
 	   </div>
@@ Line 461: / Line 578: @@
            <div class="text3">
-<p> <b> <u> Summary </p> </u> </b>
+<p><a id="Q1"> <b> <u> Summary </p> </u> </b>
-<p>Working with a pathway as large and uncharacterised as the fatty acid biosynthesis presented many challenges, the most important of which was the lack of reliable, experimentally established kinetic values for many of the key reactions. Our solution was to create a model that explicitly acknowledges this lack of data and the resulting uncertainty, using Monte Carlo sampling from plausible parameter value distributions -- enabling us to produce <b>model predictions with confidence intervals</b>. We believe that this <b>unusual and innovative modelling strategy</b> can potentially serve as a general principled approach to handling parameter uncertainty in the future. Synthetic Biology will always operate at the cutting edge of current knowledge and thus will unavoidably face the challenge of uncertainty. Building models with incorporated acknowledgment of uncertainty will yield model predictions with specified confidence intervals, and thus will lead to more robust design strategies for a wide range of engineered cellular machines. </p>
+<p align="justify">Working with a pathway as large and uncharacterised as the fatty acid biosynthesis presented many challenges, the most important of which was the lack of reliable, experimentally established kinetic values for many of the key reactions. Our solution was to create a model that explicitly acknowledges this lack of data and the resulting uncertainty, using Monte Carlo sampling from plausible parameter value distributions -- enabling us to produce <b>model predictions with confidence intervals</b>. We believe that this <b>unusual and innovative modelling strategy</b> can potentially serve as a general principled approach to handling parameter uncertainty in the future. Synthetic Biology will always operate at the cutting edge of current knowledge and thus will unavoidably face the challenge of uncertainty. Building models with incorporated acknowledgment of uncertainty will yield model predictions with specified confidence intervals, and thus will lead to more robust design strategies for a wide range of engineered cellular machines. </p>
 </p>
-          </div>
-<div class="text3">
-            <p> <u> <b> Aim </u> </b> </p>
-<p> <b> To use uncertainty modelling to model E.coli fatty acid biosynthesis.</p> </b>
-<p> Early modelling attempts using traditional methods of modelling were largely unsuccessful, due to the the nature of the fatty acid biosynthesis pathway, and the lack of experimentally defined kinetic values. Rather than use models that were arbitrary or lacked information, we decided to use a less traditional method, based on Monte Carlo sampling, that can give us a clear idea of what the uncertainty of our predictions might be. By embracing this uncertainty, we hoped to create a model with practical, representative results. </p>
            </div>
 <div class="text3">
-             <p> <b> <u> Uncertainty </p> </b> </u>
+             <p><a id="Q2"> <u> <b> Aim </u> </b> </p>
-<p> In synthetic biology two main classes of computational models are commonly used: constraint-based genome-scale models and differential-equation-based dynamic models. In our project, we employed the latter approach, because we are interested in the concentrations of compounds and their dynamic changes, which cannot be predicted using purely constraint-based models. We also wanted to identify the reactions and corresponding enzymes with the highest control over the fatty acid synthesis pathway; again, this is not possible with constraint-based models. </p>
+<p> <b> To use uncertainty modelling to model <i>E. coli</i> fatty acid biosynthesis.</p> </b>
-<p> However, for a dynamic model one needs to know the enzyme kinetic parameters, and these are often unknown or very unreliable for enzymes of fatty acid biosynthesis. <b> We wanted to account for the resulting uncertainty using a new “uncertainty modelling” approach, which can potentially serve as a principled approach to handling parameter uncertainty in the future. </b> </p>
+<p align="justify"> Early modelling attempts using traditional methods of modelling were largely unsuccessful, due to the nature of the fatty acid biosynthesis pathway, and the lack of experimentally defined kinetic values. Rather than use models that were arbitrary or lacked information, we decided to use a less traditional method, based on Monte Carlo sampling, that can give us a clear idea of what the uncertainty of our predictions might be. By embracing this uncertainty, we hoped to create a model with practical, representative results. </p>
+            <p> <u> <b> Objectives </u> </b> </p>
+<p>  <ul>
+<li> To build the first kinetic model of fatty acid biosynthesis in <i>E. coli</i> using uncertainty modeling </li>
+<li> To represent the fatty acid production in our system: <i>E.c(oil)i</i> </li>
+<li> To identify areas of the pathway requiring further study in the lab </li>
+</ul>  </p>
-<p> Building models with incorporated acknowledgment of uncertainty will produce specified confidence intervals for all model predictions and thus could lead to robust design of engineered cellular machines of fatty acid synthesis and beyond.
-</p>
            </div>
 <div class="text3">
-             <p><b><u>Fatty Acid Synthesis </u></b></p>
+             <p><a id="Q3"> <b> <u> Introduction </p> </b> </u>
-<p> Fatty acid biosynthesis is a process that occurs in all living organisms. Glucose is converted into acetyl-CoA through the citric acid cycle, which is fed into the fatty acid biosynthesis pathway. Here it combines with malonyl-CoA to first form a five carbon compound. The five carbon compound is then being converted into a four carbon compound via four successive steps, executed by the enzymes as indicated in Figure XXX. To this resulting C4 body, another malonyl-CoA is added to form a C7 body - which is converted the same manner as the previous C5 body. A number of unchanging enzymes act on the intermediates of this cyclic pathway to ultimately produce fatty acids.  From the initial reaction to the end products the whole pathway numbers <b> 43 reactions, about 60 metabolites and 267 parameters. </b> </p>
+<p align="justify"> Fatty acid biosynthesis is a process that occurs in all living organisms. Glucose is converted into acetyl-CoA through glycolysis and the pyruvate dehydrogenase reaction, and the acetyl-CoA is fed into the fatty acid biosynthesis pathway. Here it reacts with malonyl-CoA to form a four carbon compound. The four carbon compound is then reduced and dehydrated via four successive steps, executed, with the help of NADPH, by the enzymes as indicated in Figure 1. To this resulting C4 body, another malonyl-CoA reacts to form a C6 body - which is converted in the same manner as the previous C4 body. A number of unchanging enzymes act on the intermediates of this cyclic pathway to ultimately produce fatty acids.  From the initial reaction to the end products the whole pathway numbers <b> 43 reactions, about 60 metabolites and 267 parameters. </b> </p>
-<img src="https://static.igem.org/mediawiki/2013/6/6f/FABcycleedited.png" width="900"/>
+<img src="https://static.igem.org/mediawiki/2013/2/21/Fabcyclefixed.png" width="900" height="400"/>
-<p id="footer"><b>Figure 5. Overlay of structures from 1 ns βHACdH simulation. Images of overlaid from the following respective time points: 0 ps, 250 ps, 500 ps, 750 ps and 1000 ps with the following colours indicating each individual image: Green, Blue, Purple, Orange and Grey, respectively.  Both the N-Terminal and C-Terminal, are specified (Dotted Box), with a zoom in on each respective terminal at an angle appropriate to visualise the positions of the terminals.
+<p id="footer"><b>Figure 1: Fatty Acid Biosynthesis Pathway, thioesterase reaction (tesA) and Δ 9, Δ12 desaturase reactions.
 </b></p>
+<p align="justify"> In synthetic biology two main classes of computational models are commonly used: constraint-based genome-scale models and differential-equation-based dynamic models. In our project, we were interested in the concentrations of compounds and their dynamic changes as well as the reactions with the highest control over the fatty acid synthesis pathway. As our analysis would not be possible with a purely constraint-based model, we chose to use a dynamic model. However, to use a dynamic model one needs to know the enzyme kinetic parameters, and these are often unknown or very unreliable for enzymes. Uncertainty can be due to:
+<ul>
+<li> experimental uncertainty</li>
+<li><i>in vitro</i> measurements of enzyme kinetics are not always representative of <i>in vivo</i> conditions</li>
+<li> compound concentrations often have dynamic changes </li>
+</ul>
+<b> We wanted to account for the uncertainty in the fatty acid synthesis pathway parameter data by using a new “uncertainty modelling” approach, which can potentially serve as a principled approach to handling parameter uncertainty in the future. </b> </p>
+<p align="justify"> Building models with incorporated acknowledgment of uncertainty will produce specified confidence intervals for all model predictions and thus could lead to robust design of engineered cellular machines of fatty acid synthesis and beyond.
+</p>
            </div>
 <div class="text3">
-<p> <b> <u> Approach </p> </b> </u>
+<p><a id="Q4"> <b> <u> Method </p> </b> </u>
 <img src="https://static.igem.org/mediawiki/2013/7/7b/Workflowmod.png" width="900"/>
-<p id="footer"><b>Figure 5. Overlay of structures from 1 ns βHACdH simulation. Images of overlaid from the following respective time points: 0 ps, 250 ps, 500 ps, 750 ps and 1000 ps with the following colours indicating each individual image: Green, Blue, Purple, Orange and Grey, respectively.  Both the N-Terminal and C-Terminal, are specified (Dotted Box), with a zoom in on each respective terminal at an angle appropriate to visualise the positions of the terminals.
+<p id="footer"><b>Figure 2: Schematic workflow representation for building the dynamic uncertainty model split in 5 successive steps. See text for details.
 </b></p>
+<ol>
+<p> <b> <li type="1">Collected parameter values from literature</li></p></b>
+<p align="justify">Using the online database BRENDA, we searched for published parameter values for every enzyme in the pathway. In our search we discovered that the published data on the value of parameters in the <i>E. coli</i> fatty acid biosynthesis pathway is limited. Hence, we decided we needed to take this into account in our model.</p><br>
+<p><b><li type="1">Categorized parameters into three groups </li></b>
+<b>Group 1:</b> 	☑Mean 	☑Standard Deviation<br>
+<b>Group 2:</b> 	☑Mean	☐Standard Deviation<br>
+<b>Group 3:</b> 	☐Mean	☐Standard Deviation<br><br></p>
+<p><b><li type="1"> Filling in the missing information for each parameter</li></p></b>
+<p align="justify">In the case of group one, both the mean and standard deviation were collected from the literature and used to determine the probability distribution. In the case of group two, we used the mean found in the literature and the standard deviation of all enzymes of the same class or subclass with known kinetic parameters. In the case of group three, we used both the mean and the standard deviation obtained from all enzymes of the same sub-class to create a distribution. <a href="https://static.igem.org/mediawiki/2013/d/db/Manchester_Probability_Distributions.pdf" target="_blank">Here is a file with the distributions of enzyme classes and subclasses.</a> To illustrate the generation of a distribution for plausible values of a parameter, we show an example parameter, DH_OHC4_OHC4, which is involved in the first reaction catalyzed by FabA. As this enzyme was categorized into group three, we took the mean and standard deviation for the Km values listed in the same subclass (EC 4.2.1) and generated the following distribution of plausible parameters for the Km of DH_OHC4_OHC4:
+<center><img src="https://static.igem.org/mediawiki/2013/8/8a/Exampleenzymemanchester.png"width="500" height="500"></center> <br>
+All reactions and corresponding parameters can be found in the following table. The source of each parameter value is hyperlinked in blue, clicking on the link will direct you to either the literature source or the table from which the BRENDA enzyme class data can be found. <br><br>
+<iframe width='850' height='900' frameborder='0' src='https://docs.google.com/spreadsheet/pub?key=0Ajbiu1uO_n5xdFFwdVptNFN0NXdHcDlIeFFrUjVRRmc&output=html&widget=true'></iframe><br><br></p>
+<p><b><li type="1">Random sampling values from each parameter distribution</li></p></b>
+<p align="justify">Once each parameter had a probability distribution associated with it, we randomly sampled values from each plausible parameter distribution to use in our model simulation. First we constructed an initial model in Copasi, using appropriate enzyme kinetic equations. The rate equations used in our model can be generalised as follows: <br><br>
+<center><img src="https://static.igem.org/mediawiki/2013/e/e0/Manre1.png" width="600" height="800"/></center>
+<center><img src="https://static.igem.org/mediawiki/2013/5/5b/Manre2.png" width="600" height="250"/></center><br><br></p>
+<p align="justify">We used these rate equations to complete our model of the fatty acid synthesis pathway by adding the thioesterase reactions required for the production of the final fatty acids as well as the reactions catalyzed by FadD. Once our model was complete enough to represent our system, we exported this from COPASI in SBML format and converted it to a PySCeS compatible file. PySCeS uses a set of non-linear differential equations to obtain both structural and kinetic information about the system from these randomly generated kinetic values. Below is pseudocode of our workflow, the <a href="https://static.igem.org/mediawiki/2013/a/ad/MancModellingScript.txt" target="_blank">full script to randomly sample the parameters and generate model predictions can be downloaded here.</a><br><br>
+<center><img src="https://static.igem.org/mediawiki/2013/3/3a/Manpc.png" width="500" height="500"/></center><br><br></p>
+<p><b><li type="1">Creation of 1,000 models with distribution of predictions </li></p></b>
+<p align="justify">We automated the generation of models to create a collection of 1,000 models. From there we were able to determine the uncertainty in our model predictions: instead of a single prediction, we have a distribution of predictions from a large collection of plausible models. </p>
+</ol>
-<p>To build our model, we first collected parameters from published literature and the online database BRENDA. In our search, we discovered that published data on the value of parameters in the E.coli fatty acid biosynthesis pathway is limited. Hence, we decided to take uncertainty into account by creating probability distribution for each individual parameter. The method used to determine the distribution depended on the information available on that parameter and the parameters were categorised into three groups. Group one contained parameters with both the mean and standard deviation determined experimentally and published in the literature. In the case of group one, both the mean and standard deviation were collected to determine the probability distribution. Group two contained parameters with neither the means nor the standard deviations available for the parameter. In the case of group two, we used the mean and standard deviation of all enzymes of the same class or subclass with known kinetic parameters. Group three consisted of parameters with known mean parameter value, but without standard deviation. In the case of group three, we used the standard deviation obtained from all enzymes of the same sub-class to create a distribution. The means and standard deviation of enzyme classes and sub-classes are defined <Enzymeclasses.pdf>..Once each parameter had a probability distribution associated with it, we randomly sampled values from each parameter distribution to run our model simulation.This was done by constructing an initial model in Copasi, using appropriate enzyme kinetic equations, and then exporting this in SBML format to PySCeS, in which a set of non-linear differential equations are used to obtain both structural and kinetic information about the system from these randomly generated kinetic values. This was repeated to create a collection of 1,000 models. 
From there we were able to determine the uncertainty in our model predictions: instead of a single prediction, we get a distribution of predictions from a large collection of plausible models. </p>
+</div>
-          </div>
 <div class="text3">
-<p> <b> <u> Results </p> </u> </b>
+<p><a id="Q5"> <b> <u> Results </p> </u> </b>
-The spreadsheets generated from our script can be found here:<br>
-Rates (LINK)<br>
-Species (LINK)<br>
 <center><img src="https://static.igem.org/mediawiki/2013/c/c1/ColourfulSpreadsheet.png" width="900" height="500"/></center>
-<p id="footer"><b>Figure 5. Overlay of structures from 1 ns βHACdH simulation. Images of overlaid from the following respective time points: 0 ps, 250 ps, 500 ps, 750 ps and 1000 ps with the following colours indicating each individual image: Green, Blue, Purple, Orange and Grey, respectively.  Both the N-Terminal and C-Terminal, are specified (Dotted Box), with a zoom in on each respective terminal at an angle appropriate to visualise the positions of the terminals.
+<p id="footer"><b> Figure 3: Short excerpt of the metabolite concentrations in fatty acid biosynthesis from multiple simulations, sampled at ten time points within 100 seconds. Colours indicate the concentration range (<b>Dark red:</b> >4; <b>Pink:</b> >2; <b>Light red:</b> >1; <b>White:</b> 1-0.01; <b>Light yellow:</b> <0.01; <b>Dark yellow:</b> <0.0001)
 </b></p>
 <br>
-<p>The concentrations of the metabolites was outputted in tables, as depicted in Figure XXX. Each line represents one simulation with ten different time points within 100 seconds. The whole data set of all simulations was then attributed with colours according concentration values (<b>Dark Red:</b> >4; <b>Pink:</b> > 2; <b>Light red:</b>> 1; <b>White:</b> 1-0.01; <b>Light yellow:</b> <0.01; <b>Dark yellow: </b><0.0001). Another table was generated out of this chart according to the ordinal data obtained from colouring the metabolite concentrations. This was done to further improve ease of work and making the data more visual. Figure XXX shows the summary of this qualitative concentration distribution for each metabolite. Again, the brighter green a cell is in colour, the more often simulations rendered metabolite concentrations in the specified concentration interval. For example, the last metabolite in the table C18CoA is bright green, because all 41 simulations rendered between 0.01 - 1 mM. Out of this table, a clear distribution becomes obvious: Except for the first six initial replenishing reactions, all metabolite concentrations are within a small reasonable range mostly between 0.01 - 1 mM. Interestingly, in the reaction towards the end of the pathway, which are responsible for removing the metabolites from the system and therefore give rise to stearic and palmitic acid (our desired products) the range of results appears to be significantly narrower, despite the uncertainty.</p>
+<p align="justify">The concentrations of the metabolites were output as tables, as depicted in Figure 3. Each line represents one simulation, with ten time points within 100 seconds. The whole data set of all simulations was then colour-coded according to concentration value. A second table was generated from this chart, based on the ordinal data obtained from colouring the metabolite concentrations. This made the data easier to work with and more visual. Figure 4 shows the summary of this qualitative concentration distribution for each metabolite. The brighter green a cell is, the more often the simulations yielded metabolite concentrations in the specified concentration interval. For example, the last metabolite in the table, C18CoA, is bright green because all 41 simulations yielded concentrations between 0.01 and 1 mM. A clear distribution emerges from this table: except for the first six initial replenishing reactions, all metabolite concentrations are within a small, reasonable range, mostly between 0.01 and 1 mM. Interestingly, in the reactions towards the end of the pathway, which are responsible for removing the metabolites from the system and therefore give rise to stearic and palmitic acid (our desired products), the range of results appears to be noticeably narrower, despite the uncertainty.</p>
-<p> The analysis of the data shows clearly, that due to a small and reasonable range of metabolite concentrations which stabilises towards the end  of the model, a high validity of our functioning model can be safely assumed and demonstrates that the uncertainty is not globally deleterious. Even though the model was working with high uncertainties in data, the output is always within a valid range.  </p>
+<p align="justify"> The analysis shows that the metabolite concentrations stay within a small, reasonable range that narrows towards the end of the pathway. This suggests that our model behaves plausibly and demonstrates that the uncertainty is not globally deleterious: even though the model was built on highly uncertain data, the output of the analysed simulations stays within a valid range.  </p>
 <center><img src="https://static.igem.org/mediawiki/2013/d/dc/Greenspreadsheet.png" width="639" height="721"/></center>
-<p id="footer"><b>Figure 5. Overlay of structures from 1 ns βHACdH simulation. Images of overlaid from the following respective time points: 0 ps, 250 ps, 500 ps, 750 ps and 1000 ps with the following colours indicating each individual image: Green, Blue, Purple, Orange and Grey, respectively.  Both the N-Terminal and C-Terminal, are specified (Dotted Box), with a zoom in on each respective terminal at an angle appropriate to visualise the positions of the terminals.
+<p id="footer"><b>Figure 4: Summary table of all analysed models, colour-coded according to the qualitative concentration distribution for each metabolite. The brighter the colour of a cell, the more often the simulations resulted in the specified concentration interval.
 </b></p>
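+<p align="justify">The colour coding and the summary table described above amount to binning each concentration and counting, per metabolite, how many simulations fall into each bin. The following Python sketch illustrates this under stated assumptions: the thresholds are taken from the legend of Figure 3, the treatment of values lying exactly on a boundary is our assumption, and the example concentrations (including the metabolite name ExampleMetabolite) are hypothetical rather than taken from our results.</p>

```python
from collections import Counter


def colour_class(conc):
    """Map a concentration to the colour class used in the Figure 3 legend."""
    if conc > 4:
        return "dark red"
    if conc > 2:
        return "pink"
    if conc > 1:
        return "light red"
    if conc >= 0.01:        # boundary handling is an assumption
        return "white"
    if conc >= 0.0001:      # boundary handling is an assumption
        return "light yellow"
    return "dark yellow"


def summarise(final_concentrations):
    """Count, for each metabolite, how many simulations land in each class."""
    return {
        metabolite: Counter(colour_class(c) for c in values)
        for metabolite, values in final_concentrations.items()
    }


# Hypothetical concentrations, one value per simulation.
example = {
    "C18CoA": [0.12, 0.40, 0.05],
    "ExampleMetabolite": [5.3, 0.8, 0.00002],
}
summary = summarise(example)
for metabolite, counts in summary.items():
    print(metabolite, dict(counts))
```

+<p align="justify">A metabolite whose counts are concentrated in a single class, as for C18CoA in this toy input, corresponds to a bright cell in the summary table.</p>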
-<p> Upon analysing the degree of certainty in our model, and finding that it was at a level that we believe is suitable for further analysis, we were able to create a series of boxplots showing the range of values found within our simulations for species accumulation after 100 seconds. We focused on the longer chain fatty acids, which are the engineering target of our pathway. The order in which the species are shown in the box plots, Figure XXX, is also the order in which they are formed. This is also shown in Figure XXX, where the colour corresponds to the colour of the bar on the box plot. </p>
+<p align="justify"> Having analysed the degree of certainty in our model and found it to be at a level that we believe is suitable for further analysis, we created a series of box plots showing the range of values found within our simulations for species accumulation after 100 seconds. We focused on the longer chain fatty acids, which are the engineering target of our pathway. The order in which the species are shown in the box plots, Figure 5, is also the order in which they are formed. This is also shown in Figure 6, where the colour corresponds to the colour of the bar on the box plot. </p>
 <center><img src="https://static.igem.org/mediawiki/2013/1/19/Boxplot.jpg" width="563" height="504"/></center>
-<p id="footer"><b>Figure 5. Overlay of structures from 1 ns βHACdH simulation. Images of overlaid from the following respective time points: 0 ps, 250 ps, 500 ps, 750 ps and 1000 ps with the following colours indicating each individual image: Green, Blue, Purple, Orange and Grey, respectively.  Both the N-Terminal and C-Terminal, are specified (Dotted Box), with a zoom in on each respective terminal at an angle appropriate to visualise the positions of the terminals.
+<p id="footer"><b>Figure 5: Box plots summarising the concentrations (mM) of the final metabolites in the modelled fatty acid synthesis pathway. Indicated colours correspond to the colours and metabolites shown in the excerpt of the KEGG pathway in Figure 6.
 </b></p>
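+<p align="justify">Each box in such a plot is drawn from a five-number summary of the simulated values for one species. The Python sketch below shows one way to compute it with the standard library; the function name five_number_summary and the example concentrations are hypothetical, and the quartile convention (the inclusive method of statistics.quantiles) is an assumption that may differ from the one used by the plotting software.</p>

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Hypothetical concentrations (mM) of one species after 100 seconds,
# one value per simulated model.
concentrations = [0.21, 0.35, 0.18, 0.40, 0.27, 0.31, 0.24]
lowest, q1, median, q3, highest = five_number_summary(concentrations)
print(f"min {lowest}, Q1 {q1:.3f}, median {median:.3f}, Q3 {q3:.3f}, max {highest}")
```

+<p align="justify">A narrow gap between the lower and upper quartile indicates that the sampled models largely agree on that prediction.</p>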
 <br>
 <center><img src="https://static.igem.org/mediawiki/2013/b/b8/ColourKegg.png" width="900" height="501"/></center>
-<p id="footer"><b>Figure 5. Overlay of structures from 1 ns βHACdH simulation. Images of overlaid from the following respective time points: 0 ps, 250 ps, 500 ps, 750 ps and 1000 ps with the following colours indicating each individual image: Green, Blue, Purple, Orange and Grey, respectively.  Both the N-Terminal and C-Terminal, are specified (Dotted Box), with a zoom in on each respective terminal at an angle appropriate to visualise the positions of the terminals.
+<p id="footer"><b>Figure 6: Excerpt of a simplified version of the fatty acid elongation pathway. Coloured boxes and their metabolites correspond to the metabolites indicated in Figure 5.
 </b></p>
-<p> These results further emphasise that although we created a model based on uncertain parameters, by embracing this uncertainty we have been able to make a model that gives us useful information – and that allows us to specify for every single prediction how certain we can be of getting it right, in particular towards the end of the pathway. </p>
+<p align="justify"> These results further emphasise that although we created a model based on uncertain parameters, by embracing this uncertainty we have been able to make a model that gives us useful information – and that allows us to specify for every single prediction how certain we can be of getting it right, in particular towards the end of the pathway. </p>
-<p> Similar data analysis was carried out on the rates of the reactions. We focused on the reactions we had labelled AAT at the end of our pathway. These are thioesterase reactions directly responsible for the formation of palmitic and stearic acid. We can see that the rates for these reactions also fall within a relatively small range. </p>
+<p align="justify"> Similar data analysis was carried out on the rates of the reactions, shown in Figure 7. We focused on the reactions we had labelled AAT at the end of our pathway. These are thioesterase reactions directly responsible for the formation of palmitic and stearic acid. We can see that the rates for these reactions also fall within a relatively small range. </p>
 <center><img src="https://static.igem.org/mediawiki/2013/e/ea/RateBoxPlot.jpg" width="497" height="432"/></center>
-<p id="footer"><b>Figure 5. Overlay of structures from 1 ns βHACdH simulation. Images of overlaid from the following respective time points: 0 ps, 250 ps, 500 ps, 750 ps and 1000 ps with the following colours indicating each individual image: Green, Blue, Purple, Orange and Grey, respectively.  Both the N-Terminal and C-Terminal, are specified (Dotted Box), with a zoom in on each respective terminal at an angle appropriate to visualise the positions of the terminals.
+<p id="footer"><b>Figure 7: Reaction rates of key reactions in the fatty acid synthesis pathway.
 </b></p>
            </div>
 <div class="text3">
-<p> <b> <u> Conclusion </p> </b> </u>
+<p><a id="Q6"> <b> <u> Conclusion </p> </b> </u>
-<p> Kinetic Pathway modelling demands abundant information of the kinetic parameters. Literature research, however, showed that these were not available sufficiently or involved measurement errors. Hence this knowledge of parameter values often is uncertain. Therefore, we had to choose an approach that is able to deal with these limitations. Uncertainty modelling proved to be the most promising and useful tool for this. Even though the available data was limited, we managed to create a functioning kinetic model of the fatty acid synthesis pathway. This has not been done before and would not have been possible with any traditional approach. </p>
+<p align="justify"> Kinetic pathway modelling demands abundant information about the kinetic parameters. Our literature research, however, showed that for many reactions these values were either not available or subject to measurement error, so our knowledge of the parameter values is often uncertain. We therefore had to choose an approach that is able to deal with these limitations, and uncertainty modelling proved to be the most promising and useful tool for this. Even though the available data was limited, we managed to create a functioning kinetic model of the fatty acid synthesis pathway. To our knowledge this has not been done before, and we believe it would not have been possible with a traditional approach. </p>
-<p> A prime example of how our metabolic modelling work directly informed our experimental work is in our decision to biobrick the FabA gene (encoding β-hydroxydecanoyl-ACP dehydrase, shown by the DH_OH reactions in this model). Our uncertainty model had shown us that we would need more kinetic data on key enzymes. The least characterised reaction was catalyzed by the product of the fabA gene, therefore we wished to not only biobrick this gene, but a His-tag to purify the enzyme in order to experimental gauge its activity. </p>
+<p align="justify"> A prime example of how our metabolic modelling work directly informed our experimental work is our decision to biobrick the fabA gene (encoding β-hydroxydecanoyl-ACP dehydrase, shown by the DH_OH reactions in this model). Our uncertainty model had shown us that we would need more kinetic data on key enzymes. The least characterised reaction was catalyzed by the product of the fabA gene; we therefore wished not only to biobrick this gene but also to add a His-tag, so that the enzyme could be purified and its activity gauged experimentally. </p>
+<p align="justify"> However, having taken pains to ensure our model was as realistic as possible, the idea of inserting a His-tag that could affect the activity of the enzyme seemed at odds with our overall goal. Therefore, we used a further modelling technique to ensure that the addition of this His-tag would have as little overall bearing on the activity of the enzyme as possible. This can be found <a href="https://2013.igem.org/Team:Manchester/FabProteinModel" target="_blank">here</a>.<br>
-<p> However, having taken pains to ensure our model was as realistic as possible, the idea of the insertion of a his-tag that could affect the activity of the enzyme seemed at odds to our overall goal. Therefore, we used further modelling technique to ensure the addition of this his tag would have as little overall bearing on the activity of the enzyme as possible. This can be found HERE (LINK TO MARCO PAGE).
            </div>
 <div class="text3">
-<p> <b> <u> Future Applications: Potentials and Limitations </b> </u> </p>
+<p><a id="Q7"> <b> <u> Future Applications: Potentials and Limitations </b> </u> </p>
-<p> We believe that this approach to modelling could have a big impact in terms of how Synthetic Biology is modelled in the future and demonstrates a method in which, by facing the uncertainty of modelling head-on and incorporating this into our approach in a principled manner, it is possible to produce valuable models. This is particularly important in the field of Synthetic Biology, where systems, even if well characterised in one organism, are unlikely to have the same parameters when expressed in another organism. </p>
+<p align="justify"> We believe that this approach could have a large impact on how Synthetic Biology systems are modelled in the future. It demonstrates that, by facing the uncertainty of modelling head-on and incorporating it into the approach in a principled manner, it is possible to produce valuable models. This is particularly important in the field of Synthetic Biology, where systems, even if well characterised in one organism, are unlikely to have the same parameters when expressed in another organism. </p>
-<p> This approach gives us the ability to model complex and poorly experimentally measured systems, where previous attempts may have produced unrepresentative models. Since the Km values can be sampled from a distribution, the model can be used to determine outcomes that may not be obvious with the use of a single Km value.  </p>
+<p align="justify"> This approach gives us the ability to model complex systems that are poorly characterised experimentally, where previous attempts may have produced unrepresentative models. Since the Km values are sampled from a distribution, the model can be used to reveal outcomes that may not be obvious when a single Km value is used.  </p>
-<p> However, it is important to note that this method of modelling may not be appropriate in every case. The largest limitation of our use of this method is the inability of some of our simulations to reach steady state. This is likely to be a result of the random combination of parameter values. As the models were not fine-tuned, they will not always work. Although, we consider this as a potential strength as we can clearly highlight possible break points in the system that require further analysis. We show this in our own studies of β-hydroxydecanoyl-ACP dehydrase, described above. </p>
+<p align="justify"> However, it is important to note that this method of modelling may not be appropriate in every case. The largest limitation of our use of this method is the inability of some of our simulations to reach steady state. This is likely to be a result of the random combination of parameter values: as the models were not fine-tuned, they will not always work. Nevertheless, we consider this a potential strength, as it clearly highlights possible break points in the system that require further analysis. We show this in our own studies of β-hydroxydecanoyl-ACP dehydrase, described above. </p>
-<p> Synthetic Biology operates at the cutting edge of current knowledge. Therefore, it will unavoidably face the challenge of uncertainty. Building models with incorporated acknowledgment of uncertainty will yield model predictions with specified confidence intervals, and thus will lead to more robust design strategies for a vast range of engineered cellular machines. </p>
+<p align="justify"> Synthetic Biology operates at the cutting edge of current knowledge. Therefore, it will unavoidably face the challenge of uncertainty. Building models with incorporated acknowledgment of uncertainty will yield model predictions with specified confidence intervals, and thus will lead to more robust design strategies for a vast range of engineered cellular machines. </p>
 </b></p>
@@ Line 564: / Line 711: @@
 <div class="text3">
-<p> <b> <u> Appendices </p> </u> </b>
+<p><a id="Q8"> <b> <u> Appendices </p> </u> </b>
+<p> The full spreadsheets with reaction rates and species concentrations can be found here:<br>
+<a href="https://static.igem.org/mediawiki/2013/e/ef/Rates.pdf" target="_blank">Reaction Rates</a><br>
+<a href="https://static.igem.org/mediawiki/2013/a/ac/Species.pdf">Species Concentrations</a><br>
+<br>
+<b>Nomenclature of main metabolites</b><br>
+<img src="https://static.igem.org/mediawiki/2013/8/84/BlueTable1.png" width="295" height="569"/><br>
+<img src="https://static.igem.org/mediawiki/2013/d/d5/BlueTable2.png" width="294" height="333"/><br>
+<br>
+<img src="https://static.igem.org/mediawiki/2013/9/9d/Greentable1.png" width="261" height="384"/><br>
+<img src="https://static.igem.org/mediawiki/2013/d/d1/Greentable2.png" width="261" height="143"/><br>
-<p> Here you can download all of the spreadsheets used in the creation of this model: </p>
-<p> The spreadsheets generated from our script can be found here:<br>
-Rates (LINK)<br>
-Species (LINK)<br>
            </div>
@@ Line 595: / Line 751: @@
                      <a href="https://2013.igem.org/Team:Manchester/Collaboration">MODELLING COLLABORATION</a>
                    </div>
                </div>
+                    <div class="rightbar">
+                  <div class="question1">
+                    <a href="#Q1">Summary</a>
+                  </div>
+                  <div class="question2">
+                    <a href="#Q2">Aim</a>
+                  </div>
+                   <div class="question3">
+                    <a href="#Q3">Introduction</a>
+                  </div>
+                  <div class="question4">
+                    <a href="#Q4">Method</a>
+                  </div>
+                  <div class="question5">
+                    <a href="#Q5">Results</a>
+                  </div>
+                  <div class="question6">
+                    <a href="#Q6">Conclusion</a>
+                  </div>
+                  <div class="question7">
+                    <a href="#Q7">Future Applications: Potentials and Limitations</a>
+                  </div>
+                  <div class="question8">
+                    <a href="#Q8">Appendices</a>
+                  </div>
+             </div>
 </div>
 </body>
 </html>