Team:KU Leuven/Project/Glucosemodel/MeS/Modelling

From 2013.igem.org

Revision as of 21:48, 1 October 2013 by SanderW (Talk | contribs)




This page describes the model that we built for the methyl salicylate part. Such a model can give an approximation of the resulting production rate and help identify the rate-limiting step in the system.

The methyl salicylate pathway contains the following reactions:

with:

  • PchA = Pyochelin A (isochorismate synthase)
  • PchB = Pyochelin B (isochorismate pyruvate lyase)
  • BSMT1 = Benzoate/salicylate carboxyl methyltransferase
  • SAM = S-adenosyl-L-methionine
  • SAH = S-adenosyl-L-homocysteine
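
As a sketch of the reaction scheme, assuming the standard enzymatic roles of the three proteins (PchA as isochorismate synthase, PchB as isochorismate pyruvate lyase, BSMT1 as the SAM-dependent methyltransferase), the pathway can be written as:

```latex
\begin{align*}
\text{chorismate} &\xrightarrow{\ \text{PchA}\ } \text{isochorismate} \\
\text{isochorismate} &\xrightarrow{\ \text{PchB}\ } \text{salicylate} + \text{pyruvate} \\
\text{salicylate} + \text{SAM} &\xrightarrow{\ \text{BSMT1}\ } \text{methyl salicylate} + \text{SAH}
\end{align*}
```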


At first, our intention was to model the entire pathway, from the implemented DNA sequence to the resulting production rate. To achieve this we need a mathematical representation of all the biological processes involved: the transcription rate, the mRNA degradation rate, the translation rate, the protein degradation rate and the enzyme kinetics.

We created a set of ordinary differential equations (ODEs) that represent every step in our pathway: transcription, translation and the chemical activity of the protein.

mRNA flux:



See the formulary below for the terminology used.
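
A minimal sketch of what such an mRNA balance looks like, assuming the usual production-minus-decay form and the symbols from the formulary (γ for the transcription rate, α for the mRNA degradation rate). The subscript notation is an assumption made for this illustration, and the exact equations of the model may differ in detail:

```latex
\begin{align*}
\frac{d\, m_{pchBA}}{dt} &= \gamma_{m,pchBA} \cdot pchBA_{gene} - \alpha_{m,pchBA} \cdot m_{pchBA} \\
\frac{d\, m_{BSMT1}}{dt} &= \gamma_{m,BSMT1} \cdot BSMT1_{gene} - \alpha_{m,BSMT1} \cdot m_{BSMT1}
\end{align*}
```

At steady state each mRNA level equals the transcription rate times the gene copy number, divided by the mRNA degradation rate.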

Comments:
The proteins pyochelin A (PchA) and pyochelin B (PchB) are encoded in the pchDCBA operon and are the enzymes responsible for salicylate biosynthesis. Serino et al. (1995) report that expression of the pchA gene appears to depend on the transcription and translation of the upstream pchB gene in P. aeruginosa. They also state: “Salicylate formation was demonstrated in an Escherichia coli entC mutant lacking isochorismate synthase when this strain expressed both the pchBA genes, but not when it expressed pchB alone”. Gaille, Reimmann and Haas (2003) confirm this: “The pchA gene is strictly co-expressed with the upstream pchB gene; without pchB being present in cis no expression of pchA can be observed”. Finally, Serino et al. (1995) also report that the pchB stop codon overlaps the presumed pchA start codon.

Therefore we conclude that transcription and translation of pchA and pchB are coupled, and we decided to use only one gene (pchBAgene) and only one mRNA molecule (mpchBA) for both proteins (PchA and PchB) in our model.

Protein flux:



See the formulary below for the terminology used.
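
The protein balance can be illustrated with a short sketch. It assumes the common form dP/dt = β·m − α·P (translation rate β, mRNA amount m, protein degradation rate α, as named in the formulary); the function names and all numerical values are hypothetical and chosen only for illustration.

```python
def protein_steady_state(beta, mrna, alpha):
    """Steady state of dP/dt = beta * mrna - alpha * P (protein molecules)."""
    return beta * mrna / alpha


def simulate_protein(beta, mrna, alpha, t_end, dt=1.0, p0=0.0):
    """Forward-Euler integration of dP/dt = beta * mrna - alpha * P.

    The mRNA amount is held constant, which mirrors the idea of plugging
    in a measured mRNA level instead of modelling transcription.
    """
    p = p0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        p += dt * (beta * mrna - alpha * p)
    return p


# Hypothetical illustration values, not measurements:
# 10 mRNA copies, 0.1 proteins/(s*mRNA), degradation rate 2e-4 per second.
p_ss = protein_steady_state(beta=0.1, mrna=10, alpha=2e-4)
p_long = simulate_protein(beta=0.1, mrna=10, alpha=2e-4, t_end=60000.0)
print(p_ss, p_long)
```

After many protein lifetimes the simulated amount approaches the analytical steady state, which is a quick sanity check on the sign and units of each term.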

Methyl salicylate synthesis:



See the formulary below for the terminology used.

Comments:

  • For our modeling purposes, we take the chorismate concentration as a pool.
  • For every reaction we assume Michaelis-Menten kinetics.
  • The division by NA · EcoliCellVolume in the numerator is necessary to convert the number of enzyme molecules to a molar concentration.
  • In equations [3.E] and [3.F], Km3a represents the Km of salicylate while Km3b represents the Km of SAM.
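
The points above can be sketched in code. This is a minimal illustration, not the model itself: the cell volume of 1e-15 L is an assumed rule-of-thumb value, and the two-substrate form (a product of two saturation terms) is one common simplification that may differ from equations [3.E] and [3.F]. All function names are hypothetical.

```python
AVOGADRO = 6.02214076e23       # molecules per mole
ECOLI_CELL_VOLUME_L = 1e-15    # assumed average E. coli cell volume (liters)


def molecules_to_molar(n_molecules, volume_l=ECOLI_CELL_VOLUME_L):
    """Convert a number of molecules per cell to a molar concentration."""
    return n_molecules / (AVOGADRO * volume_l)


def mm_rate(enzyme_molecules, kcat, substrate_molar, km_molar):
    """Single-substrate Michaelis-Menten rate in mol/(L*s)."""
    enzyme_molar = molecules_to_molar(enzyme_molecules)
    return kcat * enzyme_molar * substrate_molar / (km_molar + substrate_molar)


def mm_rate_two_substrates(enzyme_molecules, kcat, a_molar, km_a, b_molar, km_b):
    """Assumed bi-substrate form: product of two saturation terms.

    For BSMT1, a would be salicylate (Km3a) and b would be SAM (Km3b).
    """
    enzyme_molar = molecules_to_molar(enzyme_molecules)
    return (kcat * enzyme_molar
            * a_molar / (km_a + a_molar)
            * b_molar / (km_b + b_molar))


# One molecule in one E. coli cell corresponds to roughly 1.7 nM.
print(molecules_to_molar(1))
```

Keeping the unit conversion in its own function makes it explicit where the number of molecules ends and the concentration begins.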

Formulary:

For example, for BSMT1:
Name | Units | Description
BSMT1gene | # genes | Copy number (amount) of bsmt1 gene
mBSMT1 | # mRNA | Amount of bsmt1 mRNA
BSMT1 | # proteins | Amount of BSMT1 substance (protein/molecule)
γmBSMT1 | | Transcription rate of bsmt1 gene
αmBSMT1 | | Degradation rate of bsmt1 mRNA
βBSMT1 | | Translation rate of BSMT1
αBSMT1 | | Degradation rate of BSMT1 protein
kcat1 | | Turnover number
NA | | Avogadro constant
EcoliCellVolume | Liter | The average volume of one E. coli cell
Km | Molarity | Michaelis-Menten constant

SimBiology diagram:

We implemented this model in SimBiology, a MATLAB toolbox, resulting in the following diagram:


Of course, this model is useless without good parameters. The next section describes our search for decent parameters and the complications we ran into.

Copy number:

The first step in our model is to determine the number of gene copies available for transcription. In our system we start with 2 genes (the pchBA operon and bsmt1). They are not on the same plasmid, but both plasmids carry a pMB1 origin of replication. This ORI gives a copy number of 100 to 300 per cell. Therefore we assume 200 copies of each gene per cell.

Transcription:

An extensive literature survey taught us that predicting the transcription rate, and its dependence on the promoter, is very hard and even impossible without good data. The review article by Shiue and Prather (2012) describes this problem as follows: “due to the large sequence space and relative lack of understanding regarding polymerase-promoter interactions, the development of such predictive models remains a daunting task”. The recent discussions about stochastic gene expression also make quantitative predictions of mRNA production practically impossible.

In the past, many iGEM teams predicted their transcription rate using a formula introduced by NTU-Singapore in 2008:


We believe that this formula does not represent the transcription rate correctly, because:

  1. The reference given for the average transcription speed of 70 nt/s is no longer available. We searched for an average transcription rate ourselves but could not find a reliable value.
  2. The formula does not depend on promoter strength at all. This is remarkable, because the strength of a promoter is a measure of how often a transcript is initiated (Molecular Biology of the Gene, 7th edition). The stronger the promoter, the more transcripts are initiated per unit of time, and thus the higher the transcription rate.
  3. The number of nucleotides could indeed have some influence on the rate of transcription: the longer the gene, the greater the chance that the polymerase does not properly finish the transcript. However, we did not find any reference in the literature that uses gene length as one of the important parameters for determining the rate of transcription.

We hope that other iGEM teams in the future will refrain from using this formula, because it is not a realistic representation of the transcription rate.

In our case we decided to bypass the mRNA production step, because it is responsible for a large part of the uncertainty in our prediction. To reach our goal without needing the transcription rates, we tried to determine the in vivo mRNA concentrations using qPCR. This means that we drop formulas [1.A] and [1.B]. If you want to know more about how we tackled the qPCR, please go to our WETLAB part.

Translation:

Initiation is usually the most important rate-determining step of the translation process (McCarthy and Gualerzi, 1990). Combined with the fact that there is a negligible chance for premature disassembly of the ribosome and mRNA, only the rate of translation initiation has to be known in order to determine the rate of translation.

The initiation codon, the Shine-Dalgarno sequence, the identity of the base at position -3 and the occurrence of alternative ATGs (that do not serve as an initiation codon) are features known to be important for translation initiation (Barrick et al., 1994). When those are known, it should be possible to estimate the translation rate. It is however necessary to mention another feature that can be of particular importance for the initiation of translation: the occurrence of a secondary structure in the ribosome binding site. This can be regarded as an outlier though, since evolution has tuned ribosome binding sites such that they only rarely show this behavior. When it does occur, the rate of translation is much lower, since initiation requires the RBS to be unfolded (De Smit and van Duin, 1990).

Researchers at Pennsylvania State University quantified the relevant features and created a tool, the RBS calculator (Salis et al., 2009; Salis, 2011), that predicts the translation initiation rate when the mRNA sequence is known. Over a range of five orders of magnitude, the prediction should not differ from reality by more than a factor of 2.3 (Salis et al., 2009). The RBS determines the translation initiation rate, but the predicted value is relative to all other translated coding sequences (Salis, 2011). Since the RBS calculator uses the same scale for every calculation, the relative translation initiation rate of each of our proteins can be determined. To extract absolute rates it suffices to know the absolute translation initiation rate of a single gene. Ideally this would be a measured initiation rate for one of the genes in our construct. Such values are not available at this moment, but values from literature should give a reasonable result. The initiation rate of translation for the lacZ gene in the lac operon is approximately 0.31 initiations per second per mRNA copy (Kennell and Riezman, 1977), which we therefore used as a standard.

A first run through the tool yielded adequate results for both PchB and BSMT1, but not for PchA. A malfunctioning translation step could explain the lack of wintergreen scent when using the MIT 2006 brick (BBa_J45700): of this brick, only the BSMT1 step was proven to function, not the PchA and PchB steps. After communication with Dr. Salis himself, we used a different tool on the website, designed for operon structures. This was indeed the appropriate tool to quantify the translation initiation of pchBA, since the RBS of pchA lies at the end of the coding sequence of pchB. The output now showed a satisfactory translation rate for each of the proteins in E. coli. This ruled out the hypothesis that a low translation rate is responsible for the absence of salicylic acid when using the brick; for further elaboration on this topic we refer you to the methyl salicylate wetlab page. The results from the above-mentioned tool (https://salis.psu.edu/software/) for the lac operon and for the genes we want to clone into E. coli are listed in Table 1. The third column of this table shows the translation initiation rates computed using the literature value for the lac operon.


Gene | Translation initiation rate according to the RBS calculator (a.u.) | Translation initiation rate (initiations/(s·mRNA))
lacZ | 20579.19 | 0.3125
bsmt1 | 5587.09 | 0.085
pchB | 19288.23 | 0.293
pchA | 326970.82 | 4.965
Table 1. Translation rates, as computed with the Penn State University RBS calculator, using the MIT 2006 BioBrick (BBa_J45700).
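
The conversion from the relative RBS calculator output to absolute rates in Table 1 is a simple proportional scaling against the lacZ reference. A minimal sketch, using the values from Table 1 (the function and variable names are hypothetical):

```python
# lacZ reference value as listed in Table 1, in initiations/(s*mRNA).
LACZ_REFERENCE_RATE = 0.3125

# Relative translation initiation rates from the RBS calculator (a.u.).
rbs_calculator_au = {
    "lacZ": 20579.19,
    "bsmt1": 5587.09,
    "pchB": 19288.23,
    "pchA": 326970.82,
}


def absolute_initiation_rates(relative_au, reference_gene, reference_rate):
    """Scale relative rates so that the reference gene gets its known rate."""
    scale = reference_rate / relative_au[reference_gene]
    return {gene: au * scale for gene, au in relative_au.items()}


rates = absolute_initiation_rates(rbs_calculator_au, "lacZ", LACZ_REFERENCE_RATE)
for gene, rate in rates.items():
    print(f"{gene}: {rate:.4f} initiations/(s*mRNA)")
```

Because only the ratio to lacZ matters, any error in the literature value for lacZ shifts all three absolute rates by the same factor while leaving their relative order unchanged.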

Protein degradation:

The perceived degradation rate results not only from the breakdown of proteins, but also from the dilution due to cell growth. Every cell cycle the proteins are divided amongst the two resulting cells and the amount is thus effectively divided by two. We will look into both in order to conclude which effect dominates and what ranges are possible.

The breakdown part of protein degradation is highly dependent on the presence of a degradation signal, called a degron. These degrons can be hidden in a folded protein and become exposed, for example, after a stress reaction (Dougan et al., 2010). One of the best characterized and most important degrons is the N-degron, which is a destabilizing N-terminal residue. With this information, the laboratory of Varshavsky formulated the N-end rule, which relates the in vivo half-life of a protein to the identity of its N-terminal residue (Varshavsky, 1997).

The N-end rule is applicable to a wide range of organisms, from E. coli to plants and mammals (Dougan et al., 2010). We are of course interested in the E. coli N-end rule, described by Tobias et al. (1991) and Shrader et al. (1993). This rule states that if the N-terminal residue is arginine, lysine, leucine, phenylalanine, tyrosine or tryptophan, the protein has a half-life of only 2 minutes. Leucine, phenylalanine, tyrosine and tryptophan are primary destabilizing residues. Amino-terminal arginine and lysine, on the other hand, are secondary destabilizing residues in E. coli: they act through conjugation to a primary destabilizing residue, which again results in a half-life of only 2 minutes (Tobias et al., 1991). If the N-terminal residue is neither a primary nor a secondary destabilizing residue, the half-life of the protein exceeds 10 hours. We applied this rule to our proteins of interest; the results are displayed in Table 2.

Protein | AA sequence | Half-life
PchA | SRLAPLSQC … | >= 10 hours
PchB | PHPLTLLQI … | >= 10 hours
BSMT1 | EVVEVLHM … | >= 10 hours

Table 2: The resulting half-lives after using the N-end rule.
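
The lookup behind Table 2 can be sketched as a small function. It encodes only the coarse two-class rule as summarised above (six destabilizing N-terminal residues, everything else stable); the function name and the returned labels are hypothetical.

```python
# N-terminal residues that give a 2-minute half-life in E. coli according
# to the rule as summarised in the text: Arg, Lys, Leu, Phe, Tyr, Trp.
DESTABILIZING_RESIDUES = set("RKLFYW")


def n_end_half_life(sequence):
    """Return a coarse half-life class based on the first residue only."""
    n_terminal = sequence.strip().upper()[0]
    if n_terminal in DESTABILIZING_RESIDUES:
        return "2 minutes"
    return ">= 10 hours"


# Sequence starts as listed in Table 2.
proteins = {"PchA": "SRLAPLSQC", "PchB": "PHPLTLLQI", "BSMT1": "EVVEVLHM"}
half_lives = {name: n_end_half_life(seq) for name, seq in proteins.items()}
print(half_lives)
```

The function only inspects the first residue it is given, so the input must already be the mature N-terminus, as in Table 2.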

According to the N-end rule, the half-life of our proteins exceeds 10 hours. Comparing this value with the generation time of a single E. coli cell, we conclude that these proteins outlast many cell generations, so dilution by cell division dominates. Therefore we take the generation time as the value for our “protein degradation”. On the BioNumbers website, we found that a good rule of thumb for this generation time is around 3000 s, i.e. 50 min.
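
To make the comparison between breakdown and dilution concrete, both can be expressed as first-order rate constants. This is a sketch under the assumption that each process halves the protein amount once per half-life (10 hours for breakdown as a lower bound on the half-life, 3000 s for dilution); the variable names are hypothetical.

```python
import math


def first_order_rate(half_time_s):
    """Rate constant (per second) of a process with the given half-time."""
    return math.log(2) / half_time_s


# Breakdown: half-life of at least 10 hours, so this is an upper bound.
k_breakdown = first_order_rate(10 * 3600)
# Dilution: the protein amount per cell halves every generation (3000 s).
k_dilution = first_order_rate(3000)

k_effective = k_breakdown + k_dilution
dilution_share = k_dilution / k_effective

print(f"breakdown: {k_breakdown:.2e} /s, dilution: {k_dilution:.2e} /s")
print(f"share of dilution in total loss: {dilution_share:.2f}")
```

With these numbers dilution is at least twelve times faster than breakdown, which supports using the generation time alone as the effective degradation term.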

Concluding values:

Parameter | pchA | pchB | bsmt1
Copy number | 200 molecules | 200 molecules | 200 molecules
Transcription rate | / | / | /
mRNA degradation | / | / | /
Translation | 4.965 per s | 0.293 per s | 0.085 per s
Protein degradation | 50 min | 50 min | 50 min

Due to unforeseen circumstances with the qPCR, we were unfortunately unable to determine the real number of mRNA molecules for each protein in our system. (More about this qPCR story can be read in our wetlab part.) Since these amounts were the starting point of our model, we could not make any decent predictions or figure out the rate-limiting step.

Rather than running our model with totally unrealistic values (e.g. from the formula described in section 1 for the calculation of the transcription rate), which would produce totally unrealistic results, we opted not to use this model for any predictions. However, we think that our extensive literature study has been very instructive, and we hope that other iGEM teams can use this study (for example the RBS calculator) as a basis for their model.
