Revision as of 18:51, 27 September 2013

Biosensor Mining


To profile aromatics in the environment comprehensively, our toolkit needs a collection of biosensors that sense diverse aromatic compounds. However, no such comprehensive collection is currently available. Noting the abundance of genomic and proteomic data in today's databases, we reasoned that large protein databases such as UniProt are gold mines for finding new BioBricks. This year, the Peking iGEM team developed a four-step bioinformatic mining method to screen feasible, well-characterized aromatics-sensing transcriptional regulators out of the protein database. The method consists of several computer programs that process the bulk data, followed by a manual adjustment step that further checks the reliability of the mining results:

Figure 1. Flow chart for mining aromatics-sensing transcriptional regulators from the UniProt database. Step 1: narrow the scope of proteins to transcription factors (TFs) in specific bacterial species. Step 2: screen for aromatics-related transcription factors. Step 3: select the aromatics-related transcription factors that have been studied in the most detail. Step 4: manually adjust the list to further evaluate the reliability of the selected transcription factors.

First (Step 1 in Fig. 1), we narrowed the scope of proteins to transcription factors of specific bacterial species. We chose Pseudomonas putida, Pseudomonas sp. and Pseudomonas nitroreducens as source organisms because they live in aromatics-rich environments, and E. coli and Bacillus subtilis because of their well-understood genetic contexts. We downloaded all 21,096 entries for transcription-regulation-related proteins of these five bacterial species from the protein database UniProt.

Second (Step 2 in Fig. 1), we screened for aromatics-related transcription factors by analyzing the downloaded entries with a computer program. The program searched every entry for a list of keywords (aromatic, benzene, phenol, phenyl, naphthalene, benzoic, benzaldehyde, tolyl, toluene, xylene, styrene) and scored the proteins: whenever a keyword appeared in a protein’s entry, the program added one point to that protein's score. After this step, 912 proteins with scores higher than 0 remained.
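The keyword scoring in Step 2 can be illustrated with a minimal C++ sketch. This is not the team's Protein analysis.cpp; the function names (ToLower, KeywordScore) are hypothetical, and the sketch assumes that matching is case-insensitive, that keywords match as substrings, and that each distinct keyword contributes at most one point per entry. The text does not specify these details.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Keywords listed in the text for Step 2.
const std::vector<std::string> kKeywords = {
    "aromatic", "benzene", "phenol", "phenyl", "naphthalene", "benzoic",
    "benzaldehyde", "tolyl", "toluene", "xylene", "styrene"};

// Return a lower-cased copy so that matching ignores case (an assumption).
std::string ToLower(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Score one database entry: one point for each distinct keyword that occurs
// anywhere in the entry text (substring match, assumed for illustration).
int KeywordScore(const std::string& entry_text) {
    const std::string lowered = ToLower(entry_text);
    int score = 0;
    for (const std::string& keyword : kKeywords) {
        if (lowered.find(keyword) != std::string::npos) {
            ++score;
        }
    }
    return score;
}
```

Under these assumptions, an entry is kept for Step 3 when KeywordScore returns a value greater than 0. Substring matching is a deliberate simplification: "phenyl" also matches "phenylalanine", which may or may not reflect the original program's behavior.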

Third (Step 3 in Fig. 1), we used another computer program to examine whether the transcription factors remaining after Step 2 were well studied. The program first excluded unnamed proteins that have only open reading frame numbers. Because proteins that have been characterized in E. coli are more likely to work well in our intended biosensor circuits (which operate in E. coli), the program then searched Google Scholar for the name of each remaining protein together with the keyword “E. coli” and added k/10 points to its score, where k is the number of citations. After this step, 60 proteins with scores higher than 10 points remained.
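The Step 3 scoring rule can be sketched in C++ as follows. This is an illustration only, not the team's New crawler.cpp: the Candidate struct, its fields, and the function names are hypothetical, the citation count k is assumed to have been fetched already (no crawling is shown), and k/10 is assumed to be a real-valued rather than an integer division.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one transcription factor that survived Step 2.
struct Candidate {
    std::string name;        // protein name
    bool has_assigned_name;  // false if only an ORF number is available
    int keyword_score;       // score from Step 2
    int citation_count;      // k: citations found for "<name> E. coli" (assumed pre-fetched)
};

// Total score after Step 3: keyword score plus k/10.
double TotalScore(const Candidate& c) {
    return c.keyword_score + c.citation_count / 10.0;
}

// Keep named proteins whose total score is strictly higher than the threshold
// (10 points in the text).
std::vector<Candidate> SelectWellStudied(const std::vector<Candidate>& candidates,
                                         double threshold = 10.0) {
    std::vector<Candidate> kept;
    for (const Candidate& c : candidates) {
        if (!c.has_assigned_name) {
            continue;  // unnamed, ORF-number-only proteins are excluded first
        }
        if (TotalScore(c) > threshold) {
            kept.push_back(c);
        }
    }
    return kept;
}
```

For example, a made-up candidate with a keyword score of 2 and 95 citations would receive 2 + 95/10 = 11.5 points and pass the filter, while an unnamed protein would be dropped regardless of its citation count.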

Finally (Step 4 in Fig. 1), we manually adjusted the list of 60 proteins to confirm their reliability. We excluded proteins that have no actual ability to sense aromatic compounds, as well as other likely false positives such as bacterial two-component systems, whose performance depends heavily on genetic context across bacterial species. In the end, 17 proteins were selected (Table 1). The entire mining process is summarized in Fig. 2.

Table 1. Aromatics-sensing transcriptional regulators mined from UniProt

Protein names	Sources	Reported Typical Inducers	Scores
XylS	Pseudomonas putida (Arthrobacter siderocapsulatus)	Benzoic acid	259
XylR	Pseudomonas putida (Arthrobacter siderocapsulatus)	m-Xylene	219
tyrR	Escherichia coli (strain K12)	tyrosine	160
nahR	Pseudomonas putida (Arthrobacter siderocapsulatus)	Salicylic acid	106
CapR	Pseudomonas putida (Arthrobacter siderocapsulatus)	phenol	80
hcaR	Escherichia coli (strain K12)	3-Phenyl-propionic acid	56
dmpR	Pseudomonas sp. (strain CF600)	phenol	43
pobR	Pseudomonas putida (Arthrobacter siderocapsulatus)	p-Hydroxybenzoic acid	29
CymR	Pseudomonas putida (Arthrobacter siderocapsulatus)	4-Isopropyl benzoate	23
PaaX	Escherichia coli (strain K12)	phenylacetyl-CoA	20
hpaR	Pseudomonas putida (Arthrobacter siderocapsulatus)	(3-Hydroxy-phenyl)-acetic acid	18
mhpR	Escherichia coli (strain K12)	(3-Hydroxy-phenyl)-propionic acid	18
phhR	Pseudomonas putida (Arthrobacter siderocapsulatus)	phenylalanine	16
bphS	Pseudomonas sp. (strain CF600)	2-hydroxy-6-oxo-6-phenylhexa-2,4-dienoic acid	16
HbpR	Pseudomonas nitroreducens	2-Hydroxybiphenyl	12
phcR	Pseudomonas putida (Arthrobacter siderocapsulatus)	phenol	11
yodB	Bacillus subtilis (strain 168)	2-methyl hydroquinone	11

In summary, using the four-step bioinformatic data mining method, we screened out a set of aromatics-sensing transcriptional regulators (Fig. 2). These 17 regulators are expected to be reliable and well studied.

We believe this method can also be applied to mine other types of BioBricks. Although our data mining method is conventional in bioinformatics, we consider such an approach highly instructive for routine synthetic biology research, because it strengthens our ability to mine rich collections of high-quality BioBricks from increasingly massive data in an automated manner.

In subsequent work, we will use these regulators as the core components for building a comprehensive set of biosensor circuits for aromatics detection.

Figure 2. Summary of the data mining process and screening criteria. The number of candidates remaining after each step is shown on the left face of the pyramid; the screening criteria are shown on the right.

Source Code

Source code for protein sorting: Protein analysis.cpp, Protein heap.cpp, Proteinheap.h.

Source code for internet crawler: New crawler.cpp, Protein heap.cpp, Proteinheap.h, heap_oper.cpp, heap.h.

@@ Line 366: / Line 366: @@
      <h1 id="SourceCodeTitle">Source Code</h1>
-     <p id="SourceCodeProA">Source code for protein analysis: <a href="https://static.igem.org/mediawiki/igem.org/b/b5/Peking2013_Mining_ProteinAnalysis.cpp.txt">Protein analysis.cpp,</a> <a href="https://static.igem.org/mediawiki/igem.org/1/1f/Peking2013_Mining_Protein_heap.cpp.txt">Protein heap.cpp,</a> <a href="https://static.igem.org/mediawiki/igem.org/b/bd/Peking2013_Mining_Proteinheap.h.txt">Proteinheap.h.</a></p>
+     <p id="SourceCodeProA">Source code for protein sorting: <a href="https://static.igem.org/mediawiki/igem.org/b/b5/Peking2013_Mining_ProteinAnalysis.cpp.txt">Protein analysis.cpp,</a> <a href="https://static.igem.org/mediawiki/igem.org/1/1f/Peking2013_Mining_Protein_heap.cpp.txt">Protein heap.cpp,</a> <a href="https://static.igem.org/mediawiki/igem.org/b/bd/Peking2013_Mining_Proteinheap.h.txt">Proteinheap.h.</a></p>
      <p id="SourceCodeCrawl">Source code for internet crawler: <a href="https://static.igem.org/mediawiki/igem.org/d/df/Peking2013_Mining_NewCrawler.cpp.txt">New crawler.cpp,</a> <a href="https://static.igem.org/mediawiki/igem.org/1/1f/Peking2013_Mining_Protein_heap.cpp.txt">Protein heap.cpp,</a> <a href="https://static.igem.org/mediawiki/igem.org/b/bd/Peking2013_Mining_Proteinheap.h.txt">Proteinheap.h,</a>
   <a href="https://static.igem.org/mediawiki/igem.org/7/7e/Peking2013_Mining_Heap_oper.cpp.txt">heap_oper.cpp,</a>

Team:Peking/Project/SensorMining

From 2013.igem.org
