# Team:KU Leuven/Project/Glucosemodel/MeS/Modelling

iGem

### Secret garden

Congratulations! You've found our secret garden! Follow the instructions below and win a great prize at the World jamboree!

• A video shows that two of our team members are having great fun at our favourite company. Do you know the name of the second member that appears in the video?
• For one of our models we had to do very extensive computations. To prevent our own computers from overheating and to keep the temperature in our iGEM room at a normal level, we used a supercomputer. Which centre maintains this supercomputer? (Dutch abbreviation)
• We organised a symposium with a debate, some seminars and 2 iGEM project presentations. An iGEM team came all the way from the Netherlands to present their project. What is the name of their city?

Now put all of these in this URL:https://2013.igem.org/Team:KU_Leuven/(firstname)(abbreviation)(city), (loose the brackets and put everything in lowercase) and follow the very last instruction to get your special jamboree prize!

When we introduce new genes and pathways into our bacterium, several questions arise like for example: Does it influence its metabolism or growth rate? To answer this question we performed a Flux Balance Analysis (FBA) which can be found here.

Another important question could be: How much methyl salycilate (MeS) will be produced in the end? This question can be answered using the Kinetic Parameter Model, described on this page. By modelling the pathway leading to MeS we can get a good estimation of the average MeS production. Apart from that we can also take a closer look at the pathway and find the rate limiting steps. We can use this information to fine tune the MeS production the way we want it.

The methyl salicylate pathway contains the following reactions:

with:

• PchA = Pyochelin A
• PchB = Pyochelin B
• BSMT1 = Benzoate/Salicylate carboxyl methyltransferase
• SAH = Salicylate methyl ester

At first, our intention was to model the entire pathway from the implemented DNA sequence to the resulting production rate. This could be very useful to approximate the resulting production rate and to figure out the rate-limiting step. To achieve this we need a mathematical representation of all the relevant biological processes, including transcription rate, mRNA degradation rate, translation rate, protein degradation rate and enzyme kinetics.

We created a set of ordinary differential equations (ODEs) to represent every step in our pathway: transcription, translation and the chemical activity of the protein.

### mRNA flux:

See the formulary below for further information about the used terminology.

The proteins pyochelin A (PchA) and pyochelin B (PchB) are extracted from the pchDCBA operon and are the structural proteins responsible for salicylate biosynthesis. Serion et al. (1995) describes that the expression of the pchA gene appears to depend on the transcription and translation of the upstream pchB gene in P. aeruginosa. They also state “Salicylate formation was demonstrated in an Escherichia coli entC mutant lacking isochorismate synthase when this strain expressed both pchBA genes, but not when it expressed pchB alone”. This is also confirmed by Gaille, Reimman and Haas (2003): “The pchA gene is strictly co-expressed with the upstream pchB gene; without pchB being present in cis no expression of pchA can be observed”. Finally Serion et al. (1995) reports that the pchB stop codon overlaps the presumed pchA start codon.

Therefore we conclude that transcription and translation of pchA and pchB is coupled and we decided to use only one gene (pchBAgene), and only one mRNA molecule (mpchbA) for both proteins (PchA and PchB) in our model.

### Protein flux:

See the formulary below for further information about the used terminology.

### Methyl salicylate synthesis:

See the formulary below for further information about the used terminology.

• For our modeling purposes, we take the chorismate concentration as a pool.
• For every reaction we assume Michaelis-Menten kinetics.
• The division by NA. EcoliCellVolume in the numerator is necessary to convert the amount of molecules of our enzyme to a concentration.
• In equations [3.E] and [3.F ] Km3a represents the Km of salicylate while Km3b represents the Km of SAM.

### Formulary:

For example for BSMT1:
NameUnitsDescription
BSMT1gene# genesCopy number (amount) of bsmt1 gene
mBSMT1# mRNAAmount of bsmt1 mRNA
BSMT1# proteinsAmount of BSMT1 substance (protein/molecule)
γmBSMT1 Transcription rate of PchBA gene
αmBSMT1 Degradation rate of PchBA mRNA
βBSMT1 Translation rate of PchA
αBSMT1 Degradation rate of PchA protein
kcat1 Turnover number
EcoliCellVolumeLiterThe average volume of one E. coli cell
KmMolarityMichaelis-Menten constant

### Symbiology Diagram:

We have put this model in SimBiology, provided by MATLAB, resulting in the following diagram:

Of course this model is useless without any good parameters. In this next section you can read about our search for decent parameters and its complications.

### Copy number:

First we determine the number of genes transcribed in our model. We start with 2 genes (pchBA operon and bsmt1). They are not on the same plasmid but both carry a pMB1 origin of replication. This ORI has a copy number of 100 to 300 plasmids per cell. Therefore we assume an average of 200 gene copies per cell.

### Transcription:

An extensive literature survey revealed that it is difficult to predict transcription rate, particularly combined with the proper promoter dependence. It is even near impossible without good wet-lab data.
A recent review by Shiue and Prather (2012) describes this problem in the following way: “due to the large sequence space and relative lack of understanding regarding polymerase-promoter interactions, the development of such predictive models remains a daunting task”. Also recent discussions on stochastic gene expression suggest that reliable, quantitative predictions of mRNA production are a daunting task.

In the past, many iGEM teams predicted their transcription rate using a formula introduced by NTU-Singapore in 2008:

We fear that this formula is not a proper representation of transcription rate for a number of reasons :

1. The reference claiming an average transcription speed of 70 nt/s is no longer available. We tried to search for an average transcription rate ourselves and we can’t seem to find realistic values.
2. This formula does not take into account promoter strength. This is remarkable, because the strength of a promoter is a measure for how many times a transcript is initiated. (Watson et al., Molecular Biology of the Gene, 7th edition). The stronger your promoter, the more transcripts are initiated, the more the gene is transcribed in time and thus the higher transcription rate.
3. Gene length, aka the number of nucleotides involved, could influence transcription rate. The longer the gene, the higher the chance that the polymerase starts proofreading, slowing down the transcription rate. We did not find any reference in literature incorporating gene length as an important transcription rate parameter.

Summarized, we would strongly suggest to other iGEM teams to refrain from using this formula, because it is not a realistic representation of the transcription rate.

In our case, we observed most uncertainty in the transition from transcription to mRNA production. As an alternative to the modelled mRNA production step, we tried to determine in vivo mRNA concentrations using qPCR. This means that we will drop formulas [1.A] and [1.B]. If you want to know more on how we tackled the qPCR, please go to our WETLAB part.

### Translation:

Initiation is usually the most important rate-determining step of the translation process (McCarthy and Gualerzi, 1990). Combined with the fact that there is a negligible chance for premature disassembly of the ribosome and mRNA, only the rate of translation initiation has to be known to determine the rate of translation .

The initiation codon, the Shine-Dalgarno sequence, the identity of the base at position -3 and the occurrence of alternative ATGs (that do not serve as an initiation codon) are features known to be important for translation initiation (Barrick et al., 1994). When those are known it should be possible to make an estimation of the translation rate.

Pennsylvania State University was able to quantify the different relevant features and created a tool (Salis et al., 2009) (Salis, 2011) that predicts the translation rate when the mRNA sequence is known. Even within a range of five orders of magnitude the tool should not differ from the reality with a factor higher than 2.3 (Salis et al., 2009). The RBS determines the translation initiation rate, however, this is relative to all other translated coding sequences (Salis, 2011). Since the RBS calculator uses the same scale for every calculation, the relative translation initiation rate of each protein can thus be determined. An absolute translation initiation rate for only one gene suffices to extract absolute rates. To model this properly, we would require a translation initiation rate of one of the genes from our construct. At this moment, these values are not available from the wetlab, but values from literature should give a reasonable result. We have found that the initiation rate of translation for the lacZ gene in the lac operon is approximately 0.31 initiations per second per mRNA copy (Kennell and Riezman, 1977), which we consequently used as a standard.

A first run through the tool yielded adequate results for both PchB and BSMT1, however the output for PchA was unrealistically low. We realised the RBS for PchA was part of PchB, causing the low output. After communication with dr. Salis himself we settle on a different tool from his website, especially designed for operon structures. This was indeed the appropriate tool to quantify the translation initiation of the operon pchBA : the output now showed a satisfactory translation rate for each of the proteins in E. coli. We obtained results for the lac operon (as a control) and for the genes we want to clone into E. coli (https://salis.psu.edu/software/). They are listed in Table 1. The third column of this table shows the values of the translation initiation rate that are computed using the literature value from the lac operon.

GeneTranslation initiation rate according to the RBS calculator (a.u.)Translation initiation rate (initiations/(s.mRNA))
lacZ20579,190,3125
bsmt15587,090,085
pchB19288,230,293
pchA 326970,824,965
Table 1. Translation rates, as computed with the Penn State University RBS calculator, using the MIT 2006 BioBrick (BBa_J45700).

When MIT produced their MeS brick BBa_J45700 in 2006, it was meant to convert salicylate to methylsalicylate and produce a wintergreen scent in the process. However, the scent could only be detected when salicylate was added to the medium, proving that the BSMT1 equation functions yet the PchA and PchB equations may not function. Initially, it was thought that the RBS problems discussed above could be the cause but we could show this is most likely not the case : we entered the MIT brick sequence in the operon specific tool and obtained satisfactory data. Thus, at least in silico the pchBA operon could function, suggesting the lack of salicylate is caused by a different reason. For further elaboration on this topic we refer you to the methyl salicylate wetlab page.

The perceived degradation rate results not only from the breakdown of proteins, but also from the dilution due to cell growth. Every cell cycle the proteins are divided amongst the two resulting cells and the amount is thus effectively divided by two. We will look into both to conclude which effect dominates and what ranges are possible.

The breakdown part of the degradation of proteins is highly dependent on the presence of a degradation signal, called degron. These degrons could be hidden in a folded protein and could become exposed for example after a stress reaction (Dougan et al., 2010). One of the most characterized and important degrons is called the N-degron, which is a destabilizing N-terminal residue. With this information, the laboratory of Varshavsky has created the N-end rule, which relates the in vivo half-life of a protein to the identity of its N-terminal residue (Varshavsky, 1997).

The N-end rule is applicable to a wide range of organisms ranging from E. coli to plants and mammals (Dougan et al., 2010). Of course we are interested in the E. coli N-end rule, described by Tobias et al. (1991) and Shrader et al. (1993). This N-end rule states that if the N-terminal residue is arginine, lysine, leucine, phenylalanine, tyrosine or tryptophan, the protein will have a half-life of only 2 minutes. These amino acids are called primary destabilizing residues. On the other hand, amino terminal arginine and lysine are secondary destabilizing residues in E. coli. These residues conjugate to primary destabilizing residues, which again results in a half-life of only 2 minutes (Tobias et al., 1991). If the N-terminal residue is neither a primary nor a secondary destabilizing residue, the half-life of the proteins exceeds 10 hours. We applied this rule to the proteins of our interest, with the results displayed in Table 2.

ProteinAA-sequenceHalf-life
PchASRLAPLSQC …>= 10 hours
PchBPHPLTLLQI …>= 10 hours
BSMT1EVVEVLHM …>= 10 hours

Table 2: The resulting half-lifes after using the N-end rule.

According to the N-End rule the half-life of our proteins exceeds 10 hours. If we compare this value with the generation time of a single E. coli cell, we can conclude that these proteins live far longer than the cell itself . Therefore we will take this generation time as a value for our “protein degradation”. On the Bionumbers website, we found that a good rule of thumb for this generation time is around 3000s, which is 50 min.

### Concluding values:

pchApchBbsmt1
Copy number200 molecules200 molecules200 molecules
Transcription rate///
Translation4,965 per s0,293 per s0,085 per s

The limiting step (associated with the most uncertainty) in the kinetic parameter model is the transcription rate. We hoped to improve this aspect by using wetlab derived mRNA levels, obtained via qPCR studies. Due to the unforeseen circumstances with the qPCR we unfortunately were not able to integrate these wet-lab data for each protein in our system. More about this qPCR story can be read here.
These amounts were meant to reduce the uncertainty in the model. Rather than running our model with unrealistic values (eg. the transcription rate formula described in section 1) and add uncertainty to the data, we opted to not use this model at this point.
However, we think that our extensive literature study has been very instructive, and hope that other iGEM teams could use this study (for example the RBS calculator) as a basis for their model. We also want to emphasise the importance of the qPCR approach. Providing we can circumvent the problems associated with high copy number plasmids, wetlab data will offer more realistic values than currently used, modelled transcription values.

Barrick, D. et al. (1994). Quantitative analysis of ribosome binding sites in E. coli. Nucleic Acids Research, 22(7):1287-1295.
de Smit, M. H., and van Duin J. (1990). Secondary structure of the ribosome binding site determines translational efficiency: a quantitative anaylsis. PNAS, 87(19):7668-7672.
Dougan, D.A., Truscott, K.N., Zeth, K. (2010). The bacterial N-end rule pathway: expect the unexpected. Molecular Biology, 76(3):545–558.
Gaille, C., Reimman, C., and Haas, D. (2003). Isochorismate Synthase (PchA), the First and Rate-limiting Enzyme in Salicylate Biosynthesis of Pseudomonas aeruginosa. The Journal of Biological Chemistry, 278: 16893-16898.
Kennell, D., and Riezman, H. (1997). Transcription and translation initiation frequencies of the Escherichia coli lac operon. J Mol Biol., 114(1):1-21.
McCarthy, J. E. G., and Gualerzi, C. (1990). Translational control of prokaryotic gene expression. Trends in Genetics, 6:78-85.
Salis, H. M., Mirsky, E. A., Voigt, C. A. (2009). Automated design of synthetic ribosome binding sites to control protein expression. Nature Biotechnology, 27:946-950.
Salis, H. M. (2011). The Ribosome Binding Site Calculator. Methods in enzymology, 498.
Serino, L., et al. (1995). Structural genes for salicylate biosynthesis from chorismate in Pseudomonas Aeruginosa. Molecular & General Genetics, 249(2):217-228.
Shiue, E., Prather, K. L. J. (2012). Synthetic biology devices as tools for metabolic engineering. Biochemical Engineering Journal, 65:82-89.
Shrader, T. E., Tobias, J. W., Varshavsky, A. (1993). The N-End Rule in Escherichia coli: Cloning and Analysis of the Leucyl, Phenylalanyl-tRNA-Protein Transferase Gene aat. Journal of Bacteriology, 175(14):4364-4374.
Tobias, J. W., Shrader, T. E., Rocap, G., Varshavsky, A. (1991). The N-End Rule in Bacteria. Science,254:1374-1377. Varshavsky, A. (1997). The N-end rule pathway of protein degradation. Genes to Cell, 2:13–28.
Books:
Watson, J. D., Baker, T. A., Bell, S. P., Gann, A., Levine, M., Losick, R. (2013). Molecular Biology of the Gene (7th edition). Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.