Team:Heidelberg/Project Software

From 2013.igem.org

(Difference between revisions)
Line 100: Line 100:
                     </p>
                     </p>
-
                     <h2>Experiments</h2>
+
                   
 +
                     <h2>Results</h2>
 +
                    <h3>NRPSDesigner database structure</h3>
                     <p style="font-size:14px; text-align:justify">
                     <p style="font-size:14px; text-align:justify">
-
                         Our aim is to express delftibactin in E.coli. This will be achieved  by introducing three different plasmids which contain parts of the delftibactin-cluster [File:Del cluster.gb] ,a Methylmalonyl-CoA pathway, a Pptase which replaces the DelC-function and a permeability device for the export of the desired NRP.</p>
+
                         The NRPSDesigner is a knowledge-based software tool using stored information about NRPS pathways to predict the optimal domain sequence that is able to produce a user-defined NRP. The storage organization is of great importance for the functionality of the designer because of its dependence on a comprehensive description of the biological and biochemical properties of NRPSs. For this purpose we built a hierarchical database that comprises three layers of complexity (see Figure XX): i) the DNA level represented by all DNA coding sequences. They are directly linked to ii) their encoded NRPS domain, respectively. Finally iii) our database stores detailed information about the substrate and its potential modification of the corresponding domain. <br>
-
<ol>
+
Next to the tight links between these layers all of them also point at additional database entries that complete the needed information for the design algorithm. For example, a DNA coding sequence is linked not only to its product, the translated domain, but also to its origin (organism, plasmid etc.) Additionally, a coding sequence can also be connected to another coding sequence. This ‘parent’ sequence is a predecessor of the stored sequence that already underwent biosynthetic modification. On the domain level there is an upstream link to the coding sequence but also to the specific type of domain (e.g. thioesterase or the condensation domains). Some Domains, such as adenylation domains, also point at monomers, based on their substrate specificity. Subsequently, for these substrates we store their chirality, modification and if they are proteinogenic or not (e.g. glutamine and ornithine in Figure ?). To enable the NRPSDesigner to use information from outside of the database it is equipped with global identifiers. For organisms we saved the NCBI taxon id, while for BioBricks the unique identifier in the Parts registry. To integrate the content with other databases, we created for every layer a linkout entry that consists of a type and specific identifier. The linkout type includes a description of the corresponding resource, as well as a URL, which in combination with the specific identifier enables the cross-linking of each database entry to other resources. The most common linkout types are Norine and Pubchem IDs for the substrates, PFAM IDs for the domain types and GenBank identifiers for the coding sequences.<br>
-
<li>Methylmalonyl-CoA, ppTase & permeability device</li>
+
For visualization of the NRPS domains and the chemical structures of the substrates by Open Bable we added a JSON representation of each domain, based on the Pfam Graphics library (link) and the SDF (structure data file) format, respectively.  
-
<li>DelH</li>
+
 
-
<li>DelA-P - The rest of the genes of the Del-cluster</li>
+
-
Basic Strategy will be described in the following paragraphs. For further detailed experiments you can visit our LabJournal[Link to labjournal].
+
-
</ol>
+
-
<ol>
+
-
<li> Our first aim was to achieve a genomic integration of the genes that encode for components of the Methylmalonyl-CoA  pathway into E.coli. The presence of this pathway is required for the production of NRPs. Because the genomic integration turned out to be more challenging then expected a new strategy was developed. Therefore, two plasmids were created (pIK2) containing MethylmalonylCoA amplified from Streptomyces coeliolor and a ppTase amplified from Bacillus subtilis in the Biobrick Backbone pSB3C5 and the permeability device (BBa_I746200) for the outer membrane of E.coli was inserted in another plasmid (pIK1). Team Cambridge revealed in 2007  that Bba_I746200 is toxic. It was itherefore inserted into pIK2 between the two terminators driven by a weak promoter (BBa_J23114) and a weak RBS (Bba_B0030), yielding pIK8 with a total size of 9467 bp, which was inserted in DH10ß and BL21DE3 via electroporation.</li>
+
-
<li>
+
-
As the gene encoding DelH alone has a size of 18 kb we decided to clone and introduce this huge gene on a separate plasmid. The first restriction enzyme strategy was problematic because of DelH amplification and the low yield in the ligation. A new GibsonAssembly-strategy was performed and DelH amplified in smaller pieces. It seemed to appear the same problem of as in the pIK1 that E.coli is selecting out the mutated DelH-constructs or is activly mutating it for toxic reasons. A plasmid was designed with the same low copy promotor as in the pIK8 and a low copy RBS [BBa_]. Another shot was a plasmid without promotor so that E.coli has no need to express and mutate DelH. Finally DelH is going to be inserted in DH10ß and BL21 via electroporation.</li>  
+
-
<li>
+
-
DelA-P (the rest of the genes of the Del-cluster) [File:Del cluster.gb] was amplified with different primer combinations out of D.acidovorans, and a plasmid was created containing these genes on the pSB4K5 Backbone with lacI promotor and without mRFP. The plasmid size is .... and was transformed into DH10ß and BL21(DE3) via electroporation.</li>
+
-
</ol>
+
-
<p style="font-size:14px">
+
-
All three plasmid were then electroporated together into BL21 and are able to export delftibactin which reduces soluble gold-ions out of the solution when present in the media.
+
                     </p>
                     </p>
-
                     <h2>Results</h2>
+
                     <h3>NRPSDesigner database content</h3>
                     <p style="font-size:14px; text-align:justify">
                     <p style="font-size:14px; text-align:justify">
-
                        Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.s dyes), pharmaceutical development (such as antibiotics) and recycling (such as chelators).  
+
The NRPSDesigner is a knowledge-based software tool using stored information about NRPS pathways to predict the optimal domain sequence that is able to produce a user-defined NRP. The storage organization is of great importance for the functionality of the designer because of its dependence on a comprehensive description of the biological and biochemical properties of NRPSs. For this purpose we built a hierarchical database that comprises three layers of complexity (see Figure XX): i) the DNA level represented by all DNA coding sequences. They are directly linked to ii) their encoded NRPS domain, respectively. Finally iii) our database stores detailed information about the substrate and its potential modification of the corresponding domain. <br>
 +
Next to the tight links between these layers all of them also point at additional database entries that complete the needed information for the design algorithm. For example, a DNA coding sequence is linked not only to its product, the translated domain, but also to its origin (organism, plasmid etc.) Additionally, a coding sequence can also be connected to another coding sequence. This ‘parent’ sequence is a predecessor of the stored sequence that already underwent biosynthetic modification. On the domain level there is an upstream link to the coding sequence but also to the specific type of domain (e.g. thioesterase or the condensation domains). Some Domains, such as adenylation domains, also point at monomers, based on their substrate specificity. Subsequently, for these substrates we store their chirality, modification and if they are proteinogenic or not (e.g. glutamine and ornithine in Figure ?). To enable the NRPSDesigner to use information from outside of the database it is equipped with global identifiers. For organisms we saved the NCBI taxon id, while for BioBricks the unique identifier in the Parts registry. To integrate the content with other databases, we created for every layer a linkout entry that consists of a type and specific identifier. The linkout type includes a description of the corresponding resource, as well as a URL, which in combination with the specific identifier enables the cross-linking of each database entry to other resources. The most common linkout types are Norine and Pubchem IDs for the substrates, PFAM IDs for the domain types and GenBank identifiers for the coding sequences.<br>
 +
For visualization of the NRPS domains and the chemical structures of the substrates by Open Bable we added a JSON representation of each domain, based on the Pfam Graphics library (link) and the SDF (structure data file) format, respectively.  
 +
 
                     </p>
                     </p>
 +
                 </div>
                 </div>
                 <div class="col-sm-12 jumbotron">
                 <div class="col-sm-12 jumbotron">

Revision as of 11:04, 3 October 2013

NRPSDesigner. Design your own NRP.

Highlights

  • Transfer of the whole delftibactin NRPS pathway from D. acidovorans into E. coli
  • Novel approach for transferring a whole NRPS pathway more than 50 kb in size from one bacterial species into another
  • Optimization of the Gibson Cloning Strategy for the creation of large plasmids (over 30 kb in size) with high GC content
  • Precipitation of pure gold from electronic waste using delftibactin

Abstract

Non-ribosomal peptide (NRP) synthesis is a biochemical process of remarkable hierarchical organization. Vertically, it can be described stepwise, starting from the coding DNA sequence, which is translated into a giant enzyme that in turn catalyzes the actual NRP assembly. Horizontally, its complexity is established by a modular order of functional proteinogenic units. Given this systematic composition, a bioinformatic approach appears most suitable if we aim for the automated design of fully synthetic NRPs.
Here, we introduce a comprehensive software tool, the NRPSDesigner, which facilitates the prediction and synthesis of non-ribosomal peptide synthetases (NRPSs) that catalyze customized NRP assembly. The predictive power of the NRPSDesigner is based on a curated database storing information on about 200 NRPS modules, their DNA coding sequences and substrate specificities. It is used to calculate the optimal domain sequence according to the weighted phylogenetic distance between domain origins. Additionally, an integrated domain recognition algorithm allows for curated expansion of the database. To accelerate the process from in silico NRPS design towards experimental validation, we embedded Gibthon, the iGEM software tool of Cambridge 2010, for Gibson primer construction. With this framework we want to suggest a new standard for the fast and accurate computer-aided design of customized short peptides.

Introduction

Even though biological processes can be characterized by their physicochemical properties, they can also be translated into an abstract model of interconnected functional entities. This is exemplified by the central dogma of biology, which describes the information flow from DNA to RNA and finally to proteins. Synthetic biology has always tried to intervene at these levels of organization with the goal of systematically controlling the projected outcome. (A reference would also be good for the section above…) NRPSs carry this principle to extremes by adding yet another hierarchical level:
The modular proteinogenic complex sequentially synthesizes its own short non-ribosomal peptide (NRP). These peptides in turn are not limited by the standard set of proteinogenic amino acids; instead D-isoforms and diverse modifications can be utilized (reference). Nature has made great use of this system by creating versatile natural products such as antibiotics, metallophores or dyes. (reference)
Although little is known about the actual dynamic properties of the synthesis process, most of its logical rules are indeed understood: an NRP synthetase consists of a number of modules, each of which is responsible for adding one amino acid to the nascent peptide. A module can in turn be subdivided into domains, each with a distinct functionality.
This hierarchical organization demonstrates the large potential for the synthetic biology community: the exchange or combination of modules and domains from different organisms or different proteins has been repeatedly shown to produce fully functional NRPSs (reference). Not surprisingly, several bioinformatic approaches have put great effort into meticulously categorizing NRPSs and their functionality. For example, databases such as NRPS-PKS and ClusterMine360 describe the domain organization of diverse NRPSs, while the Norine database includes information about non-ribosomal peptides and their sequence of monomers. Also, many tools are capable of predicting domain sequences, substrate specificity and hence the putative product of a particular NRPS (references). Examples are NRPS-PKS and the PKS-NRPS analysis tool (hereafter referred to as the Maryland tool). While antiSMASH provides similar prediction capabilities, its scope is broader and covers many different secondary metabolite pathways.
However, as the understanding of the underlying biological processes and the methods for assembling diverse DNA constructs have improved, many novel software tools aim at the computer-aided design (CAD) of DNA sequences (reference). Such tools have been particularly valuable to the iGEM community, as they stimulate the design of more complicated, yet less error-prone biological devices. Two examples originating from the iGEM community are Clotho, a framework enabling the automated and computer-assisted design of synthetic biology constructs introduced by the Berkeley iGEM team of 2008 (link), and Gibthon, a web app created by the Cambridge iGEM team of 2010, which suggests primers for assembling a set of predefined DNA fragments using Gibson cloning.
Influenced by this development, we introduce here the NRPSDesigner, an integrated CAD software implemented to facilitate the design of customized synthetic NRPs. In particular, the NRPSDesigner includes the following features: Based on the NRPS-PKS database, we built a manually curated database that captures the biological complexity of NRPSs and stores information on about 200 NRPS modules, their coding sequences and substrate specificity. The database can be easily extended with curated content using automated domain prediction based on Hidden Markov Models. By applying this information, the NRPSDesigner can calculate an optimal sequence of domains based on simple evolutionary assumptions. To accelerate the process of testing this synthetic construct and eventually producing a customized peptide, we included additional assisting software in the framework. We offer to incorporate the necessary domains for combining the nascent peptide with an Indigoidine tag (link). Furthermore, the embedded Gibthon software automates the suggestion of primers necessary for the assembly of the predicted domains by Gibson cloning.
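The idea of choosing a domain sequence by the distance between domain origins can be illustrated with a small sketch. It is not the NRPSDesigner implementation: it assumes that each position of the target peptide has a list of candidate modules, that a `distance` function between origins is given, and that the cost of a design is the summed distance between adjacent picks. The names `Candidate`, `pick_modules`, `orgX` and `orgY` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str      # hypothetical module identifier
    organism: str  # origin of the module


def pick_modules(candidates_per_position, distance):
    """Pick one candidate per position so that the summed distance
    between the origins of adjacent picks is minimal (dynamic programming)."""
    if not candidates_per_position:
        return [], 0.0
    # best[i][j] = (lowest cost of a path ending in candidate j at position i,
    #               index of the chosen candidate at position i - 1)
    best = [[(0.0, None) for _ in candidates_per_position[0]]]
    for i in range(1, len(candidates_per_position)):
        row = []
        for cand in candidates_per_position[i]:
            options = [
                (best[i - 1][k][0] + distance(prev.organism, cand.organism), k)
                for k, prev in enumerate(candidates_per_position[i - 1])
            ]
            row.append(min(options))
        best.append(row)
    # Trace back from the cheapest candidate at the last position.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(best) - 1, -1, -1):
        path.append(candidates_per_position[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


def toy_distance(a, b):
    # Assumption: a made-up distance, 0 within an organism and 1 between two.
    return 0.0 if a == b else 1.0


positions = [
    [Candidate("A1", "orgX"), Candidate("B1", "orgY")],  # e.g. glutamine
    [Candidate("B2", "orgY")],                           # e.g. ornithine
    [Candidate("A3", "orgX"), Candidate("B3", "orgY")],  # third monomer
]
chosen, cost = pick_modules(positions, toy_distance)
```

In this toy input only `orgY` offers a module for the second position, so the cheapest design takes all three modules from `orgY`. A real distance would come from a phylogeny, and any weighting would enter through the `distance` function.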

Results

NRPSDesigner database structure

The NRPSDesigner is a knowledge-based software tool that uses stored information about NRPS pathways to predict the optimal domain sequence able to produce a user-defined NRP. Because the designer depends on a comprehensive description of the biological and biochemical properties of NRPSs, the organization of this stored information is of great importance for its functionality. For this purpose we built a hierarchical database that comprises three layers of complexity (see Figure XX): i) the DNA level, represented by all DNA coding sequences, which are directly linked to ii) the NRPS domains they encode; finally, iii) the database stores detailed information about the substrate of the corresponding domain and its potential modification.
In addition to the tight links between these layers, all of them also point to additional database entries that complete the information needed by the design algorithm. For example, a DNA coding sequence is linked not only to its product, the translated domain, but also to its origin (organism, plasmid etc.). Additionally, a coding sequence can be connected to another coding sequence. This ‘parent’ sequence is a predecessor of the stored sequence that already underwent biosynthetic modification. On the domain level there is an upstream link to the coding sequence, but also a link to the specific type of domain (e.g. thioesterase or condensation domains). Some domains, such as adenylation domains, also point to monomers, based on their substrate specificity. For these substrates we store their chirality, their modification and whether they are proteinogenic or not (e.g. glutamine and ornithine in Figure ?). To enable the NRPSDesigner to use information from outside the database, it is equipped with global identifiers. For organisms we saved the NCBI taxon ID, and for BioBricks the unique identifier in the Parts Registry. To integrate the content with other databases, we created for every layer a linkout entry that consists of a type and a specific identifier. The linkout type includes a description of the corresponding resource as well as a URL, which in combination with the specific identifier enables the cross-linking of each database entry to other resources. The most common linkout types are Norine and PubChem IDs for the substrates, Pfam IDs for the domain types and GenBank identifiers for the coding sequences.
For visualization of the NRPS domains, we added a JSON representation of each domain based on the Pfam Graphics library (link); for visualization of the chemical structures of the substrates by Open Babel, we added their structures in the SDF (structure data file) format.
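The layers and links described above can be sketched as a handful of record types. This is only an illustration under assumptions, not the actual NRPSDesigner schema: the class and field names are hypothetical, the linkout URL is a placeholder on example.org, and the taxon ID and sequences are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LinkoutType:
    description: str  # description of the external resource
    url: str          # base URL of the resource (placeholder below)


@dataclass
class Linkout:
    type: LinkoutType
    identifier: str

    def resolve(self) -> str:
        # Base URL plus specific identifier yields the cross-link.
        return self.type.url + self.identifier


@dataclass
class Origin:
    name: str
    taxon_id: Optional[int] = None  # NCBI taxon ID, for organisms
    part_id: Optional[str] = None   # Parts Registry identifier, for BioBricks


@dataclass
class CodingSequence:
    sequence: str
    origin: Origin
    parent: Optional["CodingSequence"] = None  # predecessor sequence, if any
    linkouts: List[Linkout] = field(default_factory=list)


@dataclass
class Substrate:
    name: str
    chirality: str
    proteinogenic: bool
    modification: Optional[str] = None
    linkouts: List[Linkout] = field(default_factory=list)


@dataclass
class Domain:
    domain_type: str                       # e.g. "adenylation", "thioesterase"
    cds: CodingSequence                    # upstream link to the DNA level
    substrate: Optional[Substrate] = None  # set only for domains with specificity


# Made-up example entries; the identifier and the taxon ID are placeholders.
pubchem = LinkoutType("PubChem compound (placeholder URL)",
                      "https://example.org/compound/")
ornithine = Substrate("ornithine", "L", proteinogenic=False,
                      linkouts=[Linkout(pubchem, "0000")])
organism = Origin("hypothetical organism", taxon_id=12345)
parent_cds = CodingSequence("ATGGCC", organism)
child_cds = CodingSequence("ATGGCA", organism, parent=parent_cds)
a_domain = Domain("adenylation", child_cds, substrate=ornithine)
```

Following the links from `a_domain` leads upstream to its coding sequence, the parent sequence and the origin, and downstream to the substrate and its linkout, whose `resolve()` method joins the base URL and the identifier.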

NRPSDesigner database content

