Team:Heidelberg/Project Software
From 2013.igem.org
Revision as of 11:47, 3 October 2013
NRPSDesigner. Design your own NRP.
Highlights
- Transfer of the whole delftibactin NRPS pathway from D. acidovorans into E. coli
- Novel approach for transferring a whole NRPS pathway more than 50 kb in size from one bacterial species into another
- Optimization of the Gibson Cloning Strategy for the creation of large plasmids (over 30 kb in size) with high GC content
- Precipitation of pure gold from electronic waste using delftibactin
Abstract
Non-ribosomal peptide (NRP) synthesis is a biochemical process of remarkable hierarchical organization. Vertically, it can be described stepwise, starting from the coding DNA sequence, which is translated into a giant enzyme that in turn catalyzes the actual NRP assembly. Horizontally, its complexity arises from a modular order of functional protein units. Given this systematic composition, a bioinformatic approach appears most suitable for the automated design of fully synthetic NRPs.
Here, we introduce a comprehensive software tool, the NRPSDesigner, which facilitates the prediction and synthesis of non-ribosomal peptide synthetases (NRPSs) that catalyze customized NRP assembly. The predictive power of the NRPSDesigner is based on a curated database storing information on about 200 NRPS modules, their DNA coding sequences and substrate specificities. It is used to calculate the optimal domain sequence according to the weighted phylogenetic distance between domain origins. Additionally, an integrated domain recognition algorithm allows for curated expansion of the database. To accelerate the process from in silico NRPS design to experimental validation, we embedded Gibthon, the iGEM software tool of Cambridge 2010, for Gibson primer construction. With this framework we want to suggest a new standard for the fast and accurate computer-aided design of customized short peptides.
Introduction
Even though biological processes can be characterized by their physicochemical properties, they can also be translated into an abstract model of interconnected functional entities. This is exemplified by the central dogma of biology, which describes the information flow from DNA to RNA and finally to proteins. Synthetic biology has always tried to intervene at these levels of organization with the goal of systematically controlling the projected outcome. (A reference would be good for the paragraph above as well…)
NRPSs carry this principle to extremes by adding yet another hierarchical level:
The modular protein complex sequentially synthesizes its own short non-ribosomal peptide (NRP). These peptides in turn are not limited to the standard set of proteinogenic amino acids; instead, D-isoforms and diverse modifications can be utilized (reference). Nature has made great use of this system by creating versatile natural products such as antibiotics, metallophores or dyes. (reference)
Although little is known about the actual dynamic properties of the synthesis process, most of its logical rules are understood: an NRP synthetase consists of a number of modules, each of which is responsible for adding one amino acid to the nascent peptide. A module can be further subdivided into domains, each with a distinct functionality.
This hierarchical organization offers large potential for the synthetic biology community: the exchange or combination of modules and domains from different organisms or different proteins has been repeatedly shown to produce fully functional NRPSs (reference). Not surprisingly, several bioinformatic approaches have put great effort into meticulously categorizing NRPSs and their functionality. For example, databases such as NRPS-PKS and ClusterMine360 describe the domain organization of diverse NRPSs, while the Norine database includes information about non-ribosomal peptides and their sequence of monomers. In addition, many tools are capable of predicting domain sequences, substrate specificity and hence the putative product of a particular NRPS (references). Examples include the NRPS-PKS tool and the PKS-NRPS analysis tool (hereafter referred to as the Maryland tool). While antiSMASH provides similar prediction capabilities, its scope is broader and covers many different secondary metabolite pathways.
As the understanding of the underlying biological processes and the methods for assembling diverse DNA constructs have improved, many novel software tools now aim at the computer-aided design (CAD) of DNA sequences (reference). Such tools have been particularly valuable to the iGEM community, as they stimulate the design of more complicated, yet less error-prone biological devices. Two examples originating from the iGEM community are Clotho, a framework enabling the automated and computer-assisted design of synthetic biology constructs, introduced by the Berkeley iGEM team of 2008 (link), and Gibthon, created as a web app by the Cambridge iGEM Team of 2010, which suggests primers for assembling a set of predefined DNA fragments using Gibson cloning.
Influenced by this development, we introduce here the NRPSDesigner, an integrated CAD software implemented to facilitate the design of customized synthetic NRPs. In particular, the NRPSDesigner includes the following features: Based on the NRPS-PKS database, we built a manually curated database that captures the biological complexity of NRPSs and stores information on about 200 NRPS modules, their coding sequences and substrate specificities. The database can be easily extended with curated content using automated domain prediction based on Hidden Markov Models. Using this information, the NRPSDesigner can calculate an optimal sequence of domains based on simple evolutionary assumptions. To accelerate the process of testing this synthetic construct and eventually producing a customized peptide, we included additional assisting software in the framework. We offer to incorporate the domains necessary for combining the nascent peptide with an indigoidine tag (link). Furthermore, the embedded Gibthon software automates the suggestion of primers necessary for the assembly of the predicted domains by Gibson cloning.
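The text does not spell out how the optimal domain sequence is computed, so the following is only a minimal sketch of one way to do it, assuming that each position of the desired NRP has a list of candidate source organisms and that the goal is to minimize the summed distance between the origins of consecutive domains. The function name `choose_origins`, the toy distance function and the organism labels are hypothetical and not part of the NRPSDesigner.

```python
from typing import Callable, List, Sequence, Tuple


def choose_origins(
    candidates: Sequence[Sequence[str]],
    distance: Callable[[str, str], float],
) -> Tuple[float, List[str]]:
    """Pick one origin per position so that the summed distance between
    consecutive origins is minimal (dynamic programming over positions).

    candidates[i] lists the origins able to supply a domain for position i.
    Returns (total_distance, chosen_origins).
    """
    if not candidates or any(len(options) == 0 for options in candidates):
        raise ValueError("every position needs at least one candidate")

    # best holds, for each candidate of the current position, the cheapest
    # path ending in that candidate as a (cost, path) pair.
    best = [(0.0, [origin]) for origin in candidates[0]]
    for options in candidates[1:]:
        new_best = []
        for origin in options:
            cost, path = min(
                ((prev_cost + distance(prev_path[-1], origin), prev_path)
                 for prev_cost, prev_path in best),
                key=lambda pair: pair[0],
            )
            new_best.append((cost, path + [origin]))
        best = new_best
    return min(best, key=lambda pair: pair[0])


def toy_distance(a: str, b: str) -> float:
    # Hypothetical distance: 0 within one organism, 1 between organisms.
    return 0.0 if a == b else 1.0


total, chosen = choose_origins(
    [["orgA", "orgB"], ["orgB"], ["orgA", "orgB"]], toy_distance
)
```

In this toy case all three positions can be served by the same organism, so the cheapest path stays within it. A real weighting scheme would replace `toy_distance` with distances derived from the taxonomy of the domain origins.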
Results
NRPSDesigner database structure
The NRPSDesigner is a knowledge-based software tool that uses stored information about NRPS pathways to predict the optimal domain sequence able to produce a user-defined NRP. Because the designer depends on a comprehensive description of the biological and biochemical properties of NRPSs, the organization of the stored data is of great importance for its functionality. For this purpose we built a hierarchical database that comprises three layers of complexity (see Figure XX): i) the DNA level, represented by all DNA coding sequences, which are directly linked to ii) the NRPS domains they encode. Finally, iii) our database stores detailed information about the substrate of the corresponding domain and its potential modification.
In addition to the tight links between these layers, each layer also points to additional database entries that complete the information needed by the design algorithm. For example, a DNA coding sequence is linked not only to its product, the translated domain, but also to its origin (organism, plasmid etc.). Additionally, a coding sequence can be connected to another coding sequence. This ‘parent’ sequence is the predecessor of a stored sequence that has already undergone biosynthetic modification. On the domain level there is an upstream link to the coding sequence but also to the specific type of domain (e.g. thioesterase or condensation domain). Some domains, such as adenylation domains, also point to monomers, based on their substrate specificity. For these substrates we store their chirality, their modification and whether they are proteinogenic or not (e.g. glutamine and ornithine in Figure ?). To enable the NRPSDesigner to use information from outside of the database, it is equipped with global identifiers. For organisms we saved the NCBI taxon ID, and for BioBricks the unique identifier in the Parts Registry. To integrate the content with other databases, we created for every layer a linkout entry that consists of a type and a specific identifier. The linkout type includes a description of the corresponding resource as well as a URL, which in combination with the specific identifier enables the cross-linking of each database entry to other resources. The most common linkout types are Norine and PubChem IDs for the substrates, Pfam IDs for the domain types and GenBank identifiers for the coding sequences.
For visualization, we added a JSON representation of each NRPS domain, based on the Pfam Graphics library (link), and store the chemical structures of the substrates in the SDF (structure data file) format for rendering with Open Babel.
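The three layers and the linkout mechanism described above can be pictured roughly as follows. This is a simplified sketch using plain Python dataclasses, not the actual NRPSDesigner schema: all class names, field names, the `{id}` URL placeholder convention and the example.org URL are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LinkoutType:
    name: str          # e.g. a resource such as Norine, PubChem, Pfam or GenBank
    description: str
    url_template: str  # hypothetical convention: "{id}" marks the identifier


@dataclass
class Linkout:
    type: LinkoutType
    identifier: str

    def url(self) -> str:
        # Combine the resource URL with the entry-specific identifier.
        return self.type.url_template.format(id=self.identifier)


@dataclass
class Origin:
    name: str                       # organism, plasmid etc.
    taxon_id: Optional[int] = None  # NCBI taxon ID, if the origin is an organism


@dataclass
class CodingSequence:               # layer i: DNA
    sequence: str
    origin: Origin
    parent: Optional["CodingSequence"] = None  # unmodified predecessor
    linkouts: List[Linkout] = field(default_factory=list)


@dataclass
class Substrate:                    # layer iii: substrate
    name: str
    chirality: str
    proteinogenic: bool
    modification: Optional[str] = None
    linkouts: List[Linkout] = field(default_factory=list)


@dataclass
class Domain:                       # layer ii: NRPS domain
    domain_type: str                # e.g. adenylation, condensation, thioesterase
    cds: CodingSequence
    start: int                      # DNA coordinates within the coding sequence
    end: int
    substrates: List[Substrate] = field(default_factory=list)


# Placeholder resource and identifier, purely for illustration.
example_db = LinkoutType("ExampleDB", "placeholder resource",
                         "https://example.org/entry/{id}")
ornithine = Substrate("ornithine", chirality="L", proteinogenic=False,
                      linkouts=[Linkout(example_db, "12345")])
```

Keeping the URL on the linkout type rather than on each entry means that a changed resource address only has to be updated in one place.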
NRPSDesigner database content
The information currently stored in the NRPS database was mainly retrieved from already published data (Link) and extended or changed according to our own experimental results. Because the NRPS-PKS library already provides the domain organization and the related substrate specificities, we mainly filled our database with its information and added the missing coding sequences. Accordingly, the positions of domain boundaries and linkers had to be converted from protein-specific coordinates to DNA coordinates. In some cases manual curation was necessary, because for example…: In table XX all changes to the original NRPS-PKS are listed and explained. In collaboration with the iGEM team Edinburgh, the database was extended by NRPS sequences the team has worked with during their project… In total our database contains curated information about…
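The coordinate conversion mentioned above amounts to simple arithmetic. The sketch below assumes 1-based inclusive coordinates, a coding sequence on the forward strand without introns, and a known start position of the coding sequence within the DNA record; the function name `protein_to_dna` and its parameters are hypothetical.

```python
from typing import Tuple


def protein_to_dna(aa_start: int, aa_end: int, cds_start: int = 1) -> Tuple[int, int]:
    """Convert a 1-based inclusive protein range into 1-based inclusive
    DNA coordinates.

    cds_start is the position of the first nucleotide of the start codon
    in the DNA record. Residue i covers the three nucleotides
    3*(i-1)+1 .. 3*i of the coding sequence.
    """
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("expected 1 <= aa_start <= aa_end")
    dna_start = cds_start + 3 * (aa_start - 1)
    dna_end = cds_start + 3 * aa_end - 1
    return dna_start, dna_end


# A domain spanning residues 11 to 20 of a coding sequence that starts at
# position 101 of its DNA record.
boundaries = protein_to_dna(11, 20, cds_start=101)
```

Coding sequences on the reverse strand would need the mirrored calculation, which is omitted here.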
Extension and validation of the database
One of the core requirements for our software is the ability to detect NRPS domains. This feature enables the automatic and standardized definition of domain boundaries and also facilitates the addition of new entries into the NRPSDesigner database by the community.
Based on the antiSMASH program, we established our own pipeline to maximize domain recognition specificity. antiSMASH is implemented in (Bio)Python and uses HMMER3, and could thus be easily integrated into our framework. To improve domain recognition, especially for the Adenylation/Oxidation/Adenylation (AOxA) domain of IndC, the adenylation domain HMM of antiSMASH (from Pfam, ID PF00501.21) was replaced by the HMM of the Maryland tool, which was constructed from the seed alignment for the A domain (special thanks to Prof. Jacques Ravel for providing us with the seed alignment). In addition, an HMM profile of the AOxA domain appearing in diverse indigoidine synthetases was constructed and added to the pipeline. Furthermore, thiolation domains have been split into two categories, depending on whether an epimerisation domain follows or not, as experimental evidence has shown that their functionality differs (link).
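The split of thiolation domains into two categories can be illustrated with a small post-processing step on an ordered list of recognized domains. This is a sketch only: the single-letter domain codes and the labels `T_before_E` and `T_standard` are hypothetical and do not reflect the names used in the pipeline.

```python
from typing import List


def split_thiolation_domains(domains: List[str]) -> List[str]:
    """Relabel each thiolation domain ("T") depending on whether the next
    domain in the synthetase is an epimerisation domain ("E")."""
    labelled = []
    for index, domain in enumerate(domains):
        if domain == "T":
            following = domains[index + 1] if index + 1 < len(domains) else None
            labelled.append("T_before_E" if following == "E" else "T_standard")
        else:
            labelled.append(domain)
    return labelled


# Two modules followed by a thioesterase (TE); only the second module
# carries an epimerisation domain.
labels = split_thiolation_domains(["A", "T", "C", "A", "T", "E", "TE"])
```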
To make extending the NRPSDesigner database easy, this domain recognition pipeline was integrated into a user-friendly interface. Users can add a description for each domain, change the specificity of an adenylation domain if it has not been predicted correctly, and define their own domain boundaries with the help of an integrated multiple sequence alignment against the domains of the same type already present in the database.
Guidance for cloning of NRPS constructs with Gibthon
Going beyond in silico predicted sequences of NRPS domains that produce a particular NRP, the NRPSDesigner also provides guidance for a cloning procedure based on Gibson assembly. This should make NRPSs more accessible to the synthetic biology community. One of the most popular tools for computer-aided primer design is Gibthon. Created as a web app by Bill Collins of the Cambridge iGEM Team of 2010, Gibthon suggests primers for the assembly of predefined DNA fragments (entered by the user or imported from the Parts Registry) using Gibson cloning. Since Gibthon was such a successful iGEM software project and is also written in Django, we decided to use it as our tool of choice for automated primer design.
Gibthon was integrated into the core NRPSDesigner GUI using a modular interface. Particular care was taken to keep Gibthon and the rest of the NRPSDesigner clearly separated, to enable the use of other primer design software such as J5 in the future. The integration strategy is the following: For each of the domains returned by the in silico prediction, the DNA sequence is extracted from the NRPSDesigner database. The resulting sequences are returned in a ring structure, which ensures that a minimal number of Gibson fragments has to be assembled. These sequences, together with metadata such as references and descriptions, are converted to the Gibthon database format and then copied into the Gibthon gene fragment table. The user then has access to the standard Gibthon interface to get an overview of the suggested primers. Like Gibthon, the NRPSDesigner is tightly linked with the Registry of Standard Biological Parts. Users can add their parts of choice using the automated Parts Registry import tool. However, some additional restrictions have been placed to ensure the integrity of the designed NRPS sequence: the user cannot insert a new fragment or part between NRPS domains; instead, it can only be placed after the thioesterase and before the initiation adenylation domain.
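Two of the steps above can be sketched in a few lines: merging neighbouring domains into as few Gibson fragments as possible, and restricting where an additional part may be inserted. This is not the NRPSDesigner or Gibthon code. It assumes that consecutive domains taken from the same coding sequence can be amplified as one fragment when their coordinates abut (optionally with a small gap), and that the ring is stored as a list starting with the initiation adenylation domain and ending with the thioesterase. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DomainHit:
    cds_id: str  # identifier of the source coding sequence
    start: int   # 1-based inclusive DNA coordinates
    end: int


@dataclass
class Fragment:
    cds_id: str
    start: int
    end: int


def merge_into_fragments(domains: List[DomainHit], max_gap: int = 0) -> List[Fragment]:
    """Merge consecutive domains from the same coding sequence into one
    fragment when at most max_gap nucleotides lie between them."""
    fragments: List[Fragment] = []
    for domain in domains:
        last = fragments[-1] if fragments else None
        if (last is not None
                and last.cds_id == domain.cds_id
                and 0 <= domain.start - last.end - 1 <= max_gap):
            last.end = domain.end  # extend the previous fragment
        else:
            fragments.append(Fragment(domain.cds_id, domain.start, domain.end))
    return fragments


def insertion_allowed(position: int, n_nrps_fragments: int) -> bool:
    """In a ring stored as a list, inserting at index 0 or at index
    n_nrps_fragments both place the new part between the thioesterase (last)
    and the initiation adenylation domain (first). Any other index would
    split the NRPS and is rejected."""
    return position in (0, n_nrps_fragments)


hits = [
    DomainHit("cdsA", 1, 300),
    DomainHit("cdsA", 301, 500),  # directly follows the previous domain
    DomainHit("cdsB", 10, 400),   # different source, so a new fragment starts
]
fragments = merge_into_fragments(hits)
```

With these inputs the three domains collapse into two fragments, so fewer primer pairs are needed than with one fragment per domain.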