Team:Wageningen UR/Modeling



“When I came out of school I didn't even think that modeling was a job.”


Producing a compound in a novel host at first requires investigation of the possibility to do so. Since the compounds required for biosynthesis of lovastatin (Acetyl-CoA, Malonyl-CoA, S-Adenosyl-L-Methionine and NADPH) occur naturally in metabolic routes such as the citric acid cycles and fatty acid synthesis pathways, all of the Aspergilli that are modeled have the potential ability to produce lovastatin when the required genes are introduced. Analysis and comparison of the different models allows for a broad insight in efficient biosynthesis strategies.


Modeling allows for more efficient design of experiments, such that feasibility and potential of genetic modifications can be assessed before actually setting foot in the lab. Via a system wide perspective modeling offers the investigation of gene knock-ins and knock-outs. To investigate the potential of lovastatin production in Aspergillus niger metabolic modeling offers a first insight. By comparing lovastatin production capacity in 4 different Aspergilli; Aspergillus nidulans , Aspergillus oryzae, Aspergillus terreus and A. niger, we assess the potential for lovastatin biosynthesis in these organisms. This approach deals with one of the primary challenges in systems biology; development and investigation of mathematical models of metabolic processes. Constraint-based genome scale metabolic models allow for identification of reactions and related genes that can be targeted to benefit lovastatin production. By modeling metabolic bottlenecks may be identified and by comparing the results for different models a solution may be found as well. Contrary to dynamic modeling we focus on constraint-based genome-scale metabolic modeling, which means that we analyze systems in steady-state. Imposing steady-state relationships between rates constrains the possible states the system can be in with respect to fluxes through the network, hence the name constraint-based modeling.


The aims are to assess whether A. niger is a suitable host for lovastatin production, and to explore whether the lovastatin production can be increased using genetic engineering (knock-ins & knock-outs) and/or media design.


In order to be able to compare the models we need to make the models consistent, meaning that we need to make sure that similar compounds and reactions have similar names in the different models. Since the origin of the models is not the same, and even in those that originate from the same research group, there are differences that complicate a comparative analysis. For example, the same metabolites can have a different name in the models. In our case this complicated addition of the lovastatin pathway in the different models, since the metabolites that are used as input of this pathway did not have similar names. After having generated a generic namespace for both reactions and metabolites we analysed the metabolic flux towards lovastatin using flux balance analysis (FBA) [1]. After having established that lovastatin production is feasible we need to define a medium composition. This medium composition should not contain components that cannot be used by all models, or else outcomes cannot be compared properly. Once this is done we can set medium conditions containing components that are accessible in all three models. Subsequently we determine gene knock-out strategies and compare the differences in results between the models.

Research methods

First of all we extract the models from their respective sources. Since the A. terreus model is in an excel format that is not readily imported into the COBRA toolbox we convert it into an xml file which can be. In order to do so and make the models consistent we make use of MetanetX , which is an initiative in trying to standardise metabolic models. The models that we investigate are those of A. terreus, A. niger, A. nidulans and A. oryzae [2][3][4][5]. After we have obtained all the models in xml format we make use of the COBRA toolbox within MATLAB [6][7]. The COBRA toolbox facilitates easy input of the metabolic model in the Systems Biology Markup Language (SBML) to perform these calculations in MATLAB [11]. Once the model has been expanded flux balance analysis allows for a genome-scale approach. After adding the lovastatin pathway the next thing we did was determine the exchange reactions of the models to see which of those the models have in common. Exchange reactions determine what metabolites can be either taken up from, or secreted into the medium. Changing medium conditions will allow us to obtain insight in effect of its compositions to deduce efficient production media. In the next step of the process we will determine the effect of single gene deletion strategies on lovastatin production. Finally we will also determine gene knock-out strategies using a multilayered optimisation structure. This bilevel optimisation problem involves two competing objectives, cellular growth and chemical production. The inner problem performs the flux allocation based on the optimisation of cellular growth. The outer problem then maximizes lovastatin production by restricting access to key reactions available to the optimization of cellular growth. OptKnock is such a tool and can be used to determine which gene knockouts should increase the metabolic flux towards lovastatin [10].

Lovastatin pathway

Lovastatin starts with the synthesis of dihydromonacolin L by a large iterative polyketide synthase (lovB) and an enoyl reductase (lovC). The iterative polyketide synthase starts with amalgamation of acetyl-CoA and malonyl-CoA in the first step, after which another malonyl-group is added at each subsequent step. At one point a methyl group is added, which is derived from S-adenosyl-methionine. Together LovB and LovC catalyse 35 reactions to form this intermediate of lovastatin. Also a Diels Alder cyclization occurs during the process, though this reaction occurs spontaneous. In order to model the biosynthesis of dihydromonacolin L, several steps have been lumped into a set of 8 reactions, because exact details of intermediates formed are unknown and this results in the highest level of detail possible.

Figure) Synthesis of dihydromonacolin L by LovB and LovC

In the next step dihydromonacolin L acid is converted to monacolin J acid via the intermediate monacolin L acid by the enzyme lovA (source).

Figure) Synthesis of monacolin J by LovA [9]

In parallel with this process there is another, very similar polyketide synthase LovF, that synthesizes the intermediate 2-metylbutyryl-CoA from the same starting substrates, acetyl-CoA and malonyl-CoA.

Figure) Synthesis of 2-methylbutyryl-CoA by LovF [8]

In the final step of the process, LovD amalgamates monacolin J and 2-metylbutyryl-CoA into the product lovastatin.

Figure) Synthesis of lovastatin by LovD [8]

We have now seen a schematic representation of the lovastatin biosynthesis pathway. However, it’s important, yet difficult, to identify the exact details of the individual reactions. One of this non-trivial details, it turns out, is the co-factor NADPH which is required by LovB.


In order to analyze the differences and compare the different models it is best to use a standardized namespace. Although MetanetX attempts to do so, it is not perfect and therefore one must carefully address the changed imposed by the system when one tries to upload a model. Another thing that needs to be taken into account is that the MATLAB script written is generic, such that when the models are modified the script should function properly as before.

Converting and improving the A. terreus model

The A. terreus model was obtained in excel format. The first step taken is to convert the Excel file (.xls) into .xml format such that the model can be accessed via MATLAB. For A. terreus the biomass reaction contained a great number of metabolites. By adding an additional compound to the reactions called 'BIOMASS' with a stoichiometric coefficient of 1, we can add an exchange reaction for biomass. This allows for convenient optimisation of this exchange reaction in FBA since this feature is now generic for all models.
The A. terreus model does already contain a pathway for lovastatin production, however as it turns out this pathway is not functional because it is not balanced; the first step in which they lump the 35 reactions performed by lovB and lovC is missing the co-factor NADPH on one side, while NADP+ is present on the other [5].

Standardising the models

Before proper conversion to the MNXM namespace from MetanetX can be finalized, uploading to model to MetanetX yields a mapping summary. This mapping summary includes information on the metabolites, compartments and reactions and their success of mapping. Since the system does not recognize all of them properly, the report has to be checked manually.

Figure) Mapping summary obtained after upload.

Mapping of metabolites is done on recognition of IDs and description. To check whether all metabolites were mapped correctly, mapping of those that do not have similar descriptions, IDs, or both, have to be manually curated from errors. Wrongly mapped metabolites had to be either set to their right respective MNXM id or else were given the prefix ‘my_’ to ensure proper mapping. Compartments
To ensure mapping to the right compartment, it was found useful to put all the compartment names as a suffix behind square brackets (link to script). In a few cases mapping to the wrong compartment still occurred with several compounds that were not yet in the MNXM namespace, therefore these compounds were manually modified (link to script). Assignment of metabolite compartments , manual correction via excel and new names to prevent mis mapping of metabolites. %metabolite compartments
model.mets = regexprep(model.mets,{'m$','e$','p$','([^\]])$'},{'[m]','[e]','[x]','$1[c]'});
[newmets,newMNXM] = import_excel;
%function model= newname(model,newmets,newMNXM); %function

function [mets,MNXM] = import_excel

%% Import data from spreadsheet
% Script for importing data from the following spreadsheet % Workbook: /Users/.../correction.xlsx
% Worksheet: Sheet1
% To extend the code for use with different selected data or a different
% spreadsheet, generate a function instead of a script.

% Auto-generated by MATLAB on 2013/09/05 16:25:41

%% Import the data
[~, ~, raw] = xlsread('/Users/.../model.xlsx','Sheet1');
raw = raw(:,1:3);
raw(cellfun(@(x) ~isempty(x) && isnumeric(x) && isnan(x),raw)) = {''};
cellVectors = raw(:,[1,2,3]);

%% Allocate imported array to column variable names
mets = cellVectors(:,1);
metNames = cellVectors(:,2);
MNXM = cellVectors(:,3);

%% Clear temporary variables
clearvars data raw cellVectors;

function model = newname(model,mets,MNXM)
mets = regexprep(mets,{'^M_','_[cmep]$'},{'',''});

for i=1:numel(mets)
        if strcmp(MNXM{i},'')
                model.mets = regexprep(model.mets,['^(' mets{i} '\[.*)'],'my_$1');
        elseif strncmp(MNXM(i),'MNXM',4)
               name = MNXM{i};
                model.mets = regexprep(model.mets,['^' mets{i} '\['],[name '[']);
                disp('unexpected entity')
Similar to the A. terreus model, both A. niger and A. nidulans don’t have a exchange reaction for biomass, which was added to allow for optimization of this parameter with the same script for all models. The last non-triviality that had to be taken into account was the balancing of the reactions. To do so the model adds protons to the side of the reaction where it deems they are missing. A serious downside of this feature is that when metabolites are not recognized are involved, it is not able to take into account the protons absorbed or lost by these compounds. Still however, it adds protons to balance the reaction, but these are not rightly added. click here for code . compartments assigned to an unknown compartment.         if i==4 %add biomass for terreus
                model = addMetabolites(model,{'BIOMASS[c]'},{''});
                model = removeRxns(model,{'BIOMASS'});
                rxnFormula = '71.4 h2o[c] + 0.14613 asp[c] + 0.15371 glu[c] + 0.22294 ala[c] + 0.17484 gly[c] + 0.10126 gln[c] + 71.4 atp[c] + 0.08887 asn[c] + 0.15713 pro[c] + 0.03297 cys[c] + 0.16063 arg[c] + 0.05566 met[c] + 0.2067 ser[c] + 0.03822 trp[c] + 0.15021 thr[c] + 0.06317 his[c] + 0.09533 phe[c] + 0.07351 tyr[c] + 0.26575 chit[c] + 0.02795 utp[c] + 0.3348 mnt[c] + 0.03051 gtp[c] + 0.0015 glycogen[c] + 0.90068 13glucan[c] + 0.076 gl[c] + 0.07966 pc[c] + 0.06168 tagly[c] + 0.03399 pe[c] + 0.02264 my_paa[c] + 0.01378 ps[c] + 0.02091 ctp[c] + 0.16131 val[c] + 0.23243 leu[c] + 0.03153 my_ffa[c] + 0.00431 dgtp[c] + 0.00398 datp[c] + 0.00431 dctp[c] + 0.00398 dttp[c] + 0.12286 ile[c] + 0.1126 lys[c] -> 71.4 h[c] + 71.4 pi[c] + 71.4 adp[c] + 1 BIOMASS[c]';
            model = addReaction(model,{'BIOMASS[c]','BIOMASSDEDUCED'},rxnFormula,[ ],[ ],[ ],[ ],1,'BIOMASS','',[ ],[ ],false);
        biomass = model.mets(strcmpi(model.mets,'biomass[c]'));
        if any(i==[2 3 4])
                model = addExchangeRxn(model,biomass,-1000,1000);
if i==1 %oryzae
        model = removeRxns(model,{'r1068','r1069','r499','r500',},false,false);
        rxnFormula = 'ATP[c] + GTP[c] -> AMP[c] + pppGp[x] +6 H[c]';
        model = addReaction(model,{'r1068','GTP diphosphokinase'},rxnFormula,[ ],[ ],[ ],[ ],1,'GTP diphosphokinase','',[ ],[ ],false);
        rxnFormula = 'H2O[c] + pppGp[x] + 4 H[c] <=> PI[c] + ppGp[x]';
        model = addReaction(model,{'r1069','Exopolyphosphatase'},rxnFormula,[ ],[ ],[ ],[ ],1,'Exopolyphosphatase','',[ ],[ ],false);
        rxnFormula = 'NADP[c] + 2 FERI[m] -> NADPH[c] + 2 FERO[m] + 4 H[m]';
        model = addReaction(model,{'r499','NADPH-cytochrome p-450 reductase'},rxnFormula,[ ],[ ],[ ],[ ],1,'NADPH-cytochrome p-450 reductase','',[ ],[ ],false);
        model = addReaction(model,{'r500','NADPH-cytochrome p-450 reductase'},rxnFormula,[ ],[ ],[ ],[ ],1,'NADPH-cytochrome p-450 reductase','',[ ],[ ],false);

if i==3 %niger
        model = removeRxns(model,{'312'})
        rxnFormula = '2 FERI[m] + NADP[m] -> 2 FERO[m] + NADPH[m] + 4 H[m]';
        model = addReaction(model,{'312','NADPH-ferrihemoprotein reductase (cprA)'},rxnFormula,[],[],[],[],1,'NADPH-ferrihemoprotein reductase (cprA)','',[],[],false);

if i==4 %terreus
        model = removeRxns(model,{'r0023'});
        rxnFormula = 'fad[m] + hpro[c] -> fadh2[m] + phc[c] + h[c]';
        model = addReaction(model,{'r0023','hydroxyproline oxidase'},rxnFormula,[],[],[],[],1,'hydroxyproline oxidase','',[],[],false);

Adding the lovastatin pathway

Since the lovastatin pathway is only present in the A. terreus model, but is not well balanced here, a function was written to add the pathway for lovastatin biosynthesis. First all the metabolites that are not yet in the model need to be added. Then the pathway upto the production of the intermediate dihydromonacolin L acid was added. This pathway consists of 35 reactions that have been lumped into 8 reactions. Together with the remaining 4 reactions required the total lovastatin biosynthesis pathway has been modeled in 14 steps. Finally also a exchange reaction for lovastatin was added to be able to optimise for its production.
The prefix 'Ex_' was added to all exchange reactions for easy recognition and an additional function was implemented that allows to easily change the optimisation criterion. Both the metabolites and reactions need to be added to the model. %add metabolites for lovastatin pathway
j =
h =
{'C4H5O','C6H7O','C9H13O','C11H17O','C13H19O','C13H19O','C15H23O','C17H27O2','C19H32O4','C19H32O4','C19H30O4','C19H30O5','C26H45N7O17P3S','C24H36O6'}; model = addMetabolites(model,j,h); %function
model = addLovPathway(model); %add reactions for lovastatin
model = addExchangeRxn(model,{'lovastatin_acid[c]'},0,1000); %add exchange reactions for lovastatin production
model.c(:) =0;
function model = addLovPathway(model)

metNames = {{'ACCOA','MALCOA', 'NADPH','diketide_LovBbound', 'NADP','COA','CO2','H2O'}, ...
        {'MALCOA','diketide_LovBbound', 'NADPH','triketide_LovBbound','NADP','COA','CO2','H2O'}, ...
        {'MALCOA','triketide_LovBbound','SAM','NADPH','tetraketide_LovBbound','SAH','NADP','COA','CO2','H2O'}, ...
        {'MALCOA','tetraketide_LovBbound','NADPH','pentaketide_LovBbound','NADP','COA','CO2','H2O'}, ...
        {'MALCOA','pentaketide_LovBbound','NADPH','hexaketide_LovBbound','NADP','COA','CO2','H2O'}, ...
        {'hexaketide_LovBbound','cyclic_hexaketide_LovBbound'}, ...
        {'MALCOA','cyclic_hexaketide_LovBbound','NADPH','heptaketide_LovBbound','NADP','COA','CO2','H2O'}, ...
        {'MALCOA','heptaketide_LovBbound','NADPH','octaketide_LovBbound','NADP','COA','CO2'}, ...
        {'MALCOA','octaketide_LovBbound','NADPH','H2O','nonaketide_LovBbound','NADP','COA','CO2'}, ...
        {'nonaketide_LovBbound','dihydromonacolin_L_acid'}, ...
        {'dihydromonacolin_L_acid','monacolin_L_acid'}, ...
        {'monacolin_L_acid','H2O','monacolin_J_acid'}, ...
        {'ACCOA','MALCOA','SAM','2-metylbutyryl-COA','COA','SAH','CO2','H2O'}, ...
        {'2-metylbutyryl-COA','monacolin_J_acid','lovastatin_acid','COA'} ...
stoich = {[-1 -1 -1 1 1 2 1 1], ...
        [-1 -1 -1 1 1 1 1 1], ...
        [-1 -1 -1 -2 1 1 2 1 1 1], ...
        [-1 -1 -2 1 2 1 1 1], ...
        [-1 -1 -1 1 1 1 1 1], ...
        [-1 1], ...
        [-1 -1 -2 1 2 1 1 1], ...
        [-1 -1 -1 1 1 1 1], ...
        [-1 -1 -1 1 1 1 1 1], ...
        [-1 1], ...
        [-1 1], ...
        [-1 -1 1], ...
        [-1 -1 -1 1 1 1 1 1], ...
        [-1 -1 1 1]};

for i=1:numel(stoich)
        metNames{i} = strcat(metNames{i},'[c]');
        metNames{i} = regexprep(metNames{i},' ','_');
        model = addReaction(model,['rxnName' num2str(i)],metNames{i},stoich{i});

Media composition

In order to find medium constraints of the models all lower and upper bounds are first set to the value of -1000 and 1000 arbitrary units, respectively. Then one by one the lower and upper limit of the reaction bounds are set to 0 for one reaction at a time. If the model is not able to optimise the reaction of interest this indicates that exchange of this component with the medium is a prerequisite. If a lower bound larger then 0 is required this indicates that the component needs to be present in the in silico medium. When an upper bound larger then one required this means the opposite, namely that detrimental accumulation of this metabolite occurs ( See script ). In order to define a proper in silico medium we need to know which exchange reactions the models have in common and which of these are of essential importance. The models have 75 exchange reactions ) in common. Since exchange reactions for oleic acid, linoleic acid and octadecanoic acid, which are essential requirements in the A. terreus model, were lacking in A. oryzae and A. niger model, these had to be added as well. Although the models are all different, the same input can now be used to optimise the lovastatin flux. [list,mnxmor,or,mnxmni,ni,mnxmte,te] = import_excel2;

model = readCbModel('/Users/Zarathustra/Desktop/Modeling/new_NIGER.xml');
model = readCbModel('/Users/Zarathustra/Desktop/Modeling/new_TERREUS.xml');

addReaction(model,rxnName,metaboliteList,stoichCoeffList,revFlag,lowerBound,upperBound, ...
uniqlist = unique(list);
%pre-allocate var
names = cell(numel(uniqlist),4);
names(:,1) = uniqlist;
[~,IA] = ismember(mnxmor,uniqlist);
names(IA(IA~=0),2) = or(IA~=0);
[~,IA] = ismember(mnxmni,uniqlist);
names(IA(IA~=0),3) = ni(IA~=0);
[~,IA] = ismember(mnxmte,uniqlist);
names(IA(IA~=0),4) = te(IA~=0);
inmodel = cellfun(@isempty,names(:,2:4));
share = ~any(inmodel,2);
shared = uniqlist(find(share~=0));
shared_reactions = names(share,:);
names = regexprep(names,{'R_','_LSQBKT_','_RSQBKT_'},{'','[',']'})
function model = mediumCheck(model);
er = find(findExcRxns(model));

for i=1:numel(er) = 0;
        check = optimizeCbModel(model);
        if check.f<10^-6
                display 'essential metabolite required, knowing:'
                [model.rxns(er(i)) model.rxnNames(er(i))]
        end = -1000;
for i=1:numel(er)
        check = optimizeCbModel(model);
        if check.f<10^m-6
                display 'detrimental accumulation of'
                [model.rxns(er(i)) model.rxnNames(er(i))]
        model.ub(er(i)) = 1000;

Genetic engineering strategies

In order to obtain insight in the genes that are most important for lovastatin production we performed single gene deletions. The five gene deletions that had the most profound effect on the lovastatin flux were determined. In all models it turns out that these genes are related to the central carbon metabolism. This result is not surprising and neither is the fact that this holds true for all models. More interesting perhaps are the results obtained with OptKnock. The suggested deletion strategies are include those that use components required for the production of lovastatin, such as acetyl CoA and malonyl CoA. The list also includes reactions with NADPH, which is the co-factor required for lovastatin biosynthesis.


Shared exchange reactions have been determined as well as gene deletion strategies. In the future media compositions can be changed to see if this increases the lovastatin biosynthesis. The same goes for the gene deletion strategies, as it will be interesting to see what the effect of the proposed deletions really is. Finally it would be nice to compare the outcomes of the different models to see if there are outcomes which are not intuitive.


1. Orth JD, Thiele I, Palsson BØ. What is flux balance analysis? Nat Biotechnol. 2010 Mar;28(3):245-8. doi: 10.1038/nbt.1614
2. Andersen MR, Nielsen ML, Nielsen J. Metabolic model integration of the bibliome, genome, metabolome and reactome of Aspergillus niger. Mol Syst Biol. 2008;4:178. doi: 10.1038/msb.2008.12. Epub 2008 Mar 25.
3. David H, Ozçelik IS, Hofmann G, Nielsen J. Analysis of Aspergillus nidulans metabolism at the genome-scale. BMC Genomics. 2008 Apr 11;9:163. doi: 10.1186/1471-2164-9-163. PubMed PMID: 18405346
4. Vongsangnak W, Olsen P, Hansen K, Krogsgaard S, Nielsen J. Improved annotation through genome-scale metabolic modeling of Aspergillus oryzae. BMC Genomics. 2008 May 23;9:245. doi: 10.1186/1471-2164-9-245
5. Liu J, Gao Q, Xu N, Liu L. Genome-scale reconstruction and in silico analysis of Aspergillus terreus metabolism. Mol Biosyst. 2013 Jul;9(7):1939-48. doi: 10.1039/c3mb70090a
6. Becker SA, Feist AM, Mo ML, Hannum G, Palsson BØ, Herrgard MJ. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox. Nat Protoc. 2007;2(3):727-38.
7. Schellenberger J, Que R, Fleming RM, Thiele I, Orth JD, Feist AM, Zielinski DC, Bordbar A, Lewis NE, Rahmanian S, Kang J, Hyduke DR, Palsson BØ. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat Protoc. 2011 Aug 4;6(9):1290-307. doi: 10.1038/nprot.2011.308.
8. Ames BD, Nguyen C, Bruegger J, Smith P, Xu W, Ma S, Wong E, Wong S, Xie X, Li JW, Vederas JC, Tang Y, Tsai SC. Crystal structure and biochemical studies of the trans-acting polyketide enoyl reductase LovC from lovastatin biosynthesis. Proc Natl Acad Sci U S A. 2012 Jul 10;109(28):11144-9. doi: 10.1073/pnas.1113029109.
9. Xu W, Chooi YH, Choi JW, Li S, Vederas JC, Da Silva NA, Tang Y. LovG: the thioesterase required for dihydromonacolin L release and lovastatin nonaketide synthase turnover in lovastatin biosynthesis. Angew Chem Int Ed Engl. 2013 Jun 17;52(25):6472-5. doi: 10.1002/anie.201302406.
10. Burgard AP, Pharkya P, Maranas CD. Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol Bioeng. 2003 Dec 20;84(6):647-57
11. Hucka M, Finney A, Sauro HM, Bolouri H, Doyle JC, Kitano H, Arkin AP, Bornstein BJ, Bray D, Cornish-Bowden A, Cuellar AA, Dronov S, Gilles ED, Ginkel M, Gor V, Goryanin II, Hedley WJ, Hodgman TC, Hofmeyr JH, Hunter PJ, Juty NS, Kasberger JL, Kremling A, Kummer U, Le Novère N, Loew LM, Lucio D, Mendes P, Minch E, Mjolsness ED, Nakayama Y, Nelson MR, Nielsen PF, Sakurada T, Schaff JC, Shapiro BE, Shimizu TS, Spence HD, Stelling J, Takahashi K, Tomita M, Wagner J, Wang J; SBML Forum. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics. 2003 Mar 1;19(4):524-31