Team:Heidelberg/Project Software

From 2013.igem.org

Revision as of 11:04, 3 October 2013 by Hetitus (Talk | contribs)

NRPSDesigner. Design your own NRP.

Highlights

  • Transfer of the whole delftibactin NRPS pathway from D. acidovorans into E. coli
  • Novel approach for transferring a whole NRPS pathway more than 50 kb in size from one bacterial species into another
  • Optimization of the Gibson Cloning Strategy for the creation of large plasmids (over 30 kb in size) with high GC content
  • Precipitation of pure gold from electronic waste using delftibactin

Abstract

Non-ribosomal peptide (NRP) synthesis is a biochemical process of remarkable hierarchical organization. Vertically, it can be described stepwise, starting from the coding DNA sequence, which is translated into a giant enzyme that in turn catalyzes the actual NRP assembly. Horizontally, its complexity arises from a modular arrangement of functional protein units. Because of this systematic composition, a bioinformatic approach appears most suitable for the automated design of fully synthetic NRPs.
Here, we introduce a comprehensive software tool, the NRPSDesigner, which facilitates the prediction and synthesis of non-ribosomal peptide synthetases (NRPSs) that catalyze customized NRP assembly. The predictive power of the NRPSDesigner is based on a curated database storing information on about 200 NRPS modules, their DNA coding sequences and their substrate specificities. It is used to calculate the optimal domain sequence according to the weighted phylogenetic distance between domain origins. Additionally, an integrated domain recognition algorithm allows for curated expansion of the database. To accelerate the process from in silico NRPS design towards experimental validation, we embedded Gibthon, the iGEM software tool of Cambridge 2010, for Gibson primer construction. With this framework we want to suggest a new standard for the fast and accurate computer-aided design of customized short peptides.

Introduction

Even though biological processes can be characterized by their physicochemical properties, they can also be translated into an abstract model of interconnected functional entities. This is exemplified by the central dogma of biology, which describes the information flow from DNA to RNA and finally to proteins. Synthetic biology has always tried to intervene at these levels of organization with the goal of systematically controlling the projected outcome. (A reference would be good for the section above as well…) NRPSs carry this principle to extremes by adding yet another hierarchical level:
The modular protein complex sequentially synthesizes its own short non-ribosomal peptide (NRP). These peptides in turn are not limited to the standard set of proteinogenic amino acids; instead, D-isoforms and diverse modifications can be utilized (reference). Nature has made great use of this system by creating versatile natural products such as antibiotics, metallophores or dyes. (reference)
Although little is known about the actual dynamic properties of the synthesis process, most of its logical rules are understood: an NRP synthetase consists of a number of modules, each of which is responsible for adding one amino acid to the nascent peptide. Each module can be further subdivided into domains, each with a distinct function.
This hierarchical organization demonstrates the large potential for the synthetic biology community: the exchange or combination of modules and domains from different organisms or different proteins has repeatedly been shown to produce fully functional NRPSs (reference). Not surprisingly, several bioinformatic approaches have put great effort into meticulously categorizing NRPSs and their functionality. For example, databases such as NRPS-PKS and ClusterMine360 describe the domain organization of diverse NRPSs, while the Norine database includes information about non-ribosomal peptides and their sequence of monomers. Also, many tools are capable of predicting domain sequences, substrate specificity and hence the putative product of a particular NRPS (references). For example, NRPS-PKS and the PKS-NRPS analysis tool (hereafter referred to as the Maryland tool) work in this direction. antiSMASH provides similar prediction capabilities, but its scope is broader and covers many different secondary metabolite pathways.
Moreover, as the understanding of the underlying biological processes and the methods for assembling diverse DNA constructs have improved, many novel software tools aim at the computer-aided design (CAD) of DNA sequences (reference). Such tools have been particularly valuable to the iGEM community, as they stimulate the design of more complicated, yet less error-prone biological devices. Two examples originating from the iGEM community are Clotho, a framework enabling the automated and computer-assisted design of synthetic biology constructs, introduced by the Berkeley iGEM team of 2008 (link), and Gibthon, a web app created by the Cambridge iGEM team of 2010, which suggests primers for assembling a set of predefined DNA fragments by Gibson cloning.
Influenced by this development, we introduce here the NRPSDesigner, an integrated CAD software implemented to facilitate the design of customized synthetic NRPs. In particular, the NRPSDesigner includes the following features: Based on the NRPS-PKS database, we built a manually curated database that captures the biological complexity of NRPSs and stores information on about 200 NRPS modules, their coding sequences and their substrate specificities. The database can easily be extended with curated content using automated domain prediction based on Hidden Markov Models. Using this information, the NRPSDesigner can calculate an optimal sequence of domains based on simple evolutionary assumptions. To accelerate the process of testing this synthetic construct and eventually producing a customized peptide, we added further assisting software to the framework. We offer to incorporate the domains necessary for combining the nascent peptide with an indigoidine tag (link). Furthermore, the embedded Gibthon software automates the suggestion of the primers necessary for assembling the predicted domains by Gibson cloning.
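The calculation of an optimal domain sequence can be illustrated with a small sketch. It is not the NRPSDesigner implementation: it assumes that each requested monomer has a list of candidate modules, that every module has a known origin, and that a pairwise distance between origins is given as a plain lookup table (the weighting mentioned in the abstract is not modelled). Under these assumptions, the chain with the smallest summed distance between adjacent modules can be found by dynamic programming. The function name `design_domain_sequence` and all module and organism names are hypothetical.

```python
from typing import Dict, List, Tuple


def design_domain_sequence(
    peptide: List[str],
    candidates: Dict[str, List[str]],
    origin: Dict[str, str],
    distance: Dict[Tuple[str, str], float],
) -> Tuple[List[str], float]:
    """Pick one candidate module per monomer so that the summed distance
    between the origins of adjacent modules is minimal."""
    if not peptide:
        raise ValueError("peptide must contain at least one monomer")
    for monomer in peptide:
        if not candidates.get(monomer):
            raise ValueError(f"no candidate module for monomer {monomer!r}")

    def dist(a: str, b: str) -> float:
        # Distance between the origins of two modules (symmetric lookup).
        oa, ob = origin[a], origin[b]
        if oa == ob:
            return 0.0
        if (oa, ob) in distance:
            return distance[(oa, ob)]
        return distance[(ob, oa)]

    # best[m] = (cost of the cheapest chain ending in module m, that chain)
    best = {m: (0.0, [m]) for m in candidates[peptide[0]]}
    for monomer in peptide[1:]:
        new_best = {}
        for m in candidates[monomer]:
            cost, chain = min(
                ((c + dist(prev, m), ch) for prev, (c, ch) in best.items()),
                key=lambda item: item[0],
            )
            new_best[m] = (cost, chain + [m])
        best = new_best

    cost, chain = min(best.values(), key=lambda item: item[0])
    return chain, cost


# Hypothetical toy data: five modules from two organisms.
demo_candidates = {"Val": ["m1", "m2"], "Orn": ["m3", "m4"], "Gln": ["m5"]}
demo_origin = {"m1": "orgA", "m2": "orgB", "m3": "orgB", "m4": "orgA", "m5": "orgA"}
demo_distance = {("orgA", "orgB"): 1.0}

print(design_domain_sequence(["Val", "Orn", "Gln"], demo_candidates,
                             demo_origin, demo_distance))
```

In this toy example the sketch prefers the chain whose modules all come from the same organism, which mirrors the idea that domains from closely related origins are more likely to work together.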

Results

NRPSDesigner database structure

The NRPSDesigner is a knowledge-based software tool that uses stored information about NRPS pathways to predict the optimal domain sequence for producing a user-defined NRP. Because the designer depends on a comprehensive description of the biological and biochemical properties of NRPSs, the organization of this storage is of great importance for its functionality. For this purpose we built a hierarchical database that comprises three layers of complexity (see Figure XX): i) the DNA level, represented by all DNA coding sequences; ii) the NRPS domains encoded by these sequences, to which they are directly linked; and iii) detailed information about the substrate of each domain and its potential modification.
Besides the tight links between these layers, all of them also point to additional database entries that complete the information needed by the design algorithm. For example, a DNA coding sequence is linked not only to its product, the translated domain, but also to its origin (organism, plasmid, etc.). Additionally, a coding sequence can be connected to another coding sequence. This ‘parent’ sequence is a predecessor of the stored sequence, which has already undergone biosynthetic modification. On the domain level there is an upstream link to the coding sequence, but also a link to the specific type of domain (e.g. thioesterase or condensation domain). Some domains, such as adenylation domains, also point to monomers, based on their substrate specificity. For these substrates we store their chirality, their modification and whether they are proteinogenic or not (e.g. glutamine and ornithine in Figure ?). To enable the NRPSDesigner to use information from outside the database, entries are equipped with global identifiers: for organisms we store the NCBI taxon ID, and for BioBricks the unique identifier in the Parts Registry. To integrate the content with other databases, we created for every layer a linkout entry that consists of a type and a specific identifier. The linkout type includes a description of the corresponding resource as well as a URL, which in combination with the specific identifier enables the cross-linking of each database entry to other resources. The most common linkout types are Norine and PubChem IDs for the substrates, Pfam IDs for the domain types and GenBank identifiers for the coding sequences.
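The layered structure described above can be summarized in a minimal sketch using Python dataclasses. It only illustrates the relationships named in the text; all class names, field names, the URL template and the example identifier are assumptions and do not reproduce the actual NRPSDesigner schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LinkoutType:
    description: str   # description of the external resource
    url_template: str  # hypothetical convention: "{id}" marks the identifier


@dataclass
class Linkout:
    type: LinkoutType
    identifier: str

    def url(self) -> str:
        # Combine the type's URL with the specific identifier.
        return self.type.url_template.format(id=self.identifier)


@dataclass
class Origin:
    name: str
    taxon_id: Optional[int] = None  # NCBI taxon ID for organisms
    part_id: Optional[str] = None   # Parts Registry identifier for BioBricks


@dataclass
class CodingSequence:  # layer i: DNA level
    sequence: str
    origin: Origin
    parent: Optional["CodingSequence"] = None  # predecessor sequence, if any
    linkouts: List[Linkout] = field(default_factory=list)


@dataclass
class Substrate:  # layer iii: substrate and modification
    name: str
    chirality: str  # "L" or "D"
    proteinogenic: bool
    modification: Optional[str] = None
    linkouts: List[Linkout] = field(default_factory=list)


@dataclass
class DomainType:
    name: str  # e.g. adenylation, condensation, thioesterase
    linkouts: List[Linkout] = field(default_factory=list)


@dataclass
class Domain:  # layer ii: NRPS domain
    cds: CodingSequence
    type: DomainType
    substrates: List[Substrate] = field(default_factory=list)
    linkouts: List[Linkout] = field(default_factory=list)


# Hypothetical example entry; the URL and the identifier are placeholders.
compound_db = LinkoutType("placeholder compound database",
                          "https://example.org/compound/{id}")
ornithine = Substrate("ornithine", chirality="L", proteinogenic=False,
                      linkouts=[Linkout(compound_db, "12345")])
cds = CodingSequence("ATGGCC", Origin("hypothetical organism"))
a_domain = Domain(cds, DomainType("adenylation"), substrates=[ornithine])

print(a_domain.substrates[0].linkouts[0].url())
```

Storing the URL once per linkout type and only the identifier per entry keeps the cross-links consistent when an external resource changes its address.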
To visualize the NRPS domains, we added a JSON representation of each domain based on the Pfam Graphics library (link). To visualize the chemical structures of the substrates with Open Babel, we added a representation of each substrate in the SDF (structure data file) format.
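As an illustration of such a JSON representation, the following sketch serializes a list of domains into a structure with a total length and a list of regions. The key names (`length`, `regions`, `text`, `start`, `end`, `colour`) loosely follow the Pfam domain graphics format as we recall it and should be treated as assumptions, as should the colours, the domain coordinates and the function name `domains_to_json`.

```python
import json
from typing import List, Tuple

# Hypothetical colour palette per domain type.
PALETTE = {"C": "#4a90d9", "A": "#e94f37", "PCP": "#3bb273"}
DEFAULT_COLOUR = "#999999"


def domains_to_json(protein_length: int,
                    domains: List[Tuple[str, int, int]]) -> str:
    """Serialize (name, start, end) tuples into a JSON domain graphic."""
    regions = []
    # Sort by start position so regions appear in N- to C-terminal order.
    for name, start, end in sorted(domains, key=lambda d: d[1]):
        if not 1 <= start <= end <= protein_length:
            raise ValueError(f"domain {name!r} lies outside the protein")
        regions.append({
            "text": name,
            "start": start,
            "end": end,
            "colour": PALETTE.get(name, DEFAULT_COLOUR),
        })
    return json.dumps({"length": protein_length, "regions": regions}, indent=2)


# Hypothetical module with condensation, adenylation and carrier domains.
print(domains_to_json(1000, [("A", 450, 850), ("C", 10, 430), ("PCP", 880, 950)]))
```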

NRPSDesigner database content

