Team:Peking/Project/SensorMining
From 2013.igem.org
Biosensor Mining
To comprehensively profile aromatics in the environment, our toolkit should be equipped with biosensors that respond to a variety of aromatic compounds. Large protein databases such as UniProt are rich in protein information, which makes them ideal gold mines for finding new BioBricks. How to reliably mine new BioBricks with desired functions, however, has not been studied yet. This year, the Peking iGEM team developed a four-step bioinformatic mining method to screen feasible, well-characterized aromatic-sensing transcriptional regulators out of the protein database UniProt. The method combines several computer programs that process the bulk data with a manual adjustment step that further safeguards the reliability of the mining results.
Figure 1. Flow chart of mining aromatic-sensing transcriptional regulators from the UniProt database. Step 1: narrowing the scope of proteins down to transcription factors (TFs) in specific bacterial species. Step 2: screening for aromatics-related transcription factors. Step 3: selecting the aromatics-related transcription factors that have been studied in most detail. Step 4: manual adjustment to further evaluate the reliability of the selected transcription factors.
First, we narrowed the scope of proteins down to transcription factors in specific bacterial species. We chose Pseudomonas putida, Pseudomonas sp. and Pseudomonas nitroreducens as source organisms because they live in aromatics-rich environments, and E. coli and Bacillus subtilis because their genetic backgrounds are well characterized. We downloaded all 21,096 entries for transcription-regulation-related proteins of these five bacterial species from the protein database UniProt.
Second, we screened for aromatics-related transcription factors by analyzing the downloaded entries with a computer program. The program searched every entry for a list of keywords (aromatic, benzene, phenol, phenyl, naphthalene, benzoic, benzaldehyde, tolyl, toluene, xylene, styrene) and scored the proteins: each time a keyword appeared in a protein’s entry, the program added one point to that protein's score. After this step, 912 proteins with a score above 0 remained.
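The keyword rating in step two can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual program: the function names, the example entries and their accession labels are hypothetical, and it assumes one point per distinct keyword found by case-insensitive substring matching (the text does not say whether repeated occurrences count more than once).

```python
# Keyword list taken from the description of step two.
KEYWORDS = [
    "aromatic", "benzene", "phenol", "phenyl", "naphthalene", "benzoic",
    "benzaldehyde", "tolyl", "toluene", "xylene", "styrene",
]


def keyword_score(entry_text, keywords=KEYWORDS):
    """Return one point for each keyword that occurs in the entry text.

    Assumption: matching is a case-insensitive substring search, and a
    keyword counts once no matter how often it appears.
    """
    text = entry_text.lower()
    return sum(1 for keyword in keywords if keyword in text)


def screen_entries(entries):
    """Keep only the entries that score more than 0 points."""
    scores = {acc: keyword_score(text) for acc, text in entries.items()}
    return {acc: score for acc, score in scores.items() if score > 0}


# Made-up entries for illustration only; they are not real UniProt records.
example_entries = {
    "ENTRY_A": "Transcriptional activator of the phenol degradation operon; "
               "also responds to toluene derivatives.",
    "ENTRY_B": "Regulator of lactose metabolism.",
    "ENTRY_C": "Positive regulator responding to benzoic acid.",
}

selected = screen_entries(example_entries)
print(selected)
```

Substring matching is a deliberate simplification: it lets "phenyl" also catch names such as "phenylalanine", which widens the net at this stage and leaves stricter judgment to the later steps.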
Third, we used another computer program to examine whether the transcription factors remaining after step two were well studied. The program first excluded unnamed proteins that are identified only by an open reading frame number. Because proteins that have been characterized in E. coli are more likely to work well in our host species, the program then searched Google Scholar for the name of each remaining protein together with the keyword “E coli” and added k/10 points to the protein's score (k is the number of papers in the search result). After this step, 60 proteins with a score above 10 remained.
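The rating update in step three can be sketched as follows. This is an illustration under stated assumptions rather than the team's program: the paper counts are passed in as a plain dictionary (how they were collected from Google Scholar is not described, so no search call is shown), the regular expression for ORF-only names is a hypothetical heuristic, and the protein names and numbers are made up.

```python
import re

# Hypothetical heuristic for names that are only ORF/locus numbers,
# such as "ORF_0042" or "b1234". The real exclusion rule is not specified.
ORF_ONLY = re.compile(r"[A-Za-z]{1,4}_?\d{3,}")


def is_named(protein_name):
    """Return True if the name is more than a bare ORF number."""
    return ORF_ONLY.fullmatch(protein_name) is None


def updated_score(keyword_points, paper_count):
    """Add k/10 points, where k is the number of papers found."""
    return keyword_points + paper_count / 10


def select_well_studied(keyword_points, paper_counts, threshold=10):
    """Drop ORF-only names, update the scores, keep scores above threshold."""
    selected = {}
    for name, points in keyword_points.items():
        if not is_named(name):
            continue  # unnamed protein: excluded before any literature search
        score = updated_score(points, paper_counts.get(name, 0))
        if score > threshold:
            selected[name] = score
    return selected


# Made-up inputs: points from step two and paper counts per protein name.
keyword_points = {"RegA": 2, "RegB": 1, "ORF_0042": 3}
paper_counts = {"RegA": 150, "RegB": 40, "ORF_0042": 500}

well_studied = select_well_studied(keyword_points, paper_counts)
print(well_studied)
```

Here the hypothetical RegA reaches 2 + 150/10 = 17 points and is kept, RegB reaches only 5 points and is dropped, and ORF_0042 is excluded before scoring because it has no proper name.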
Finally, we manually reviewed the 60 selected proteins to verify their feasibility. We excluded proteins that regulate aromatic degradation pathways without actually responding to aromatic compounds, as well as proteins that belong to two-component systems. In the end, 17 proteins were selected (Table 1).
Table 1. Proteins selected after manual adjustment
Protein names | Sources | Reported Typical Inducers | Scores |
---|---|---|---|
XylS | Pseudomonas putida (Arthrobacter siderocapsulatus) | Benzoic acid | 259 |
XylR | Pseudomonas putida (Arthrobacter siderocapsulatus) | m-Xylene | 219 |
tyrR | Escherichia coli (strain K12) | tyrosine | 160 |
nahR | Pseudomonas putida (Arthrobacter siderocapsulatus) | Salicylic acid | 106 |
CapR | Pseudomonas putida (Arthrobacter siderocapsulatus) | phenol | 80 |
hcaR | Escherichia coli (strain K12) | 3-Phenyl-propionic acid | 56 |
dmpR | Pseudomonas sp. (strain CF600) | phenol | 43 |
pobR | Pseudomonas putida (Arthrobacter siderocapsulatus) | p-Hydroxybenzoic acid | 29 |
CymR | Pseudomonas putida (Arthrobacter siderocapsulatus) | 4-Isopropyl benzoate | 23 |
PaaX | Escherichia coli (strain K12) | phenylacetyl-CoA | 20 |
hpaR | Pseudomonas putida (Arthrobacter siderocapsulatus) | (3-Hydroxy-phenyl)-acetic acid | 18 |
mhpR | Escherichia coli (strain K12) | (3-Hydroxy-phenyl)-propionic acid | 18 |
phhR | Pseudomonas putida (Arthrobacter siderocapsulatus) | phenylalanine | 16 |
bphS | Pseudomonas sp. (strain CF600) | 2-hydroxy-6-oxo-6-phenylhexa-2,4-dienoic acid | 16 |
HbpR | Pseudomonas nitroreducens | 2-Hydroxybiphenyl | 12 |
phcR | Pseudomonas putida (Arthrobacter siderocapsulatus) | phenol | 11 |
yodB | Bacillus subtilis (strain 168) | 2-methyl hydroquinone | 11 |
The Peking iGEM team has successfully screened out a set of feasible aromatic sensors using the four-step sieving method. Because the method transfers readily to other targets and can process large volumes of data, we believe it will also be useful for mining other kinds of BioBricks in this age of information explosion.
Figure 2. Sieving conditions and results of each step. The number of proteins selected after each step is shown on the left face of the pyramid; the sieving conditions are shown on the right.
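To make the narrowing at each level of the pyramid concrete, the counts reported in the text (21,096 downloaded entries, then 912, 60 and 17 proteins) can be converted into per-step retention rates. The short Python sketch below only does this arithmetic; the function and variable names are illustrative.

```python
# Protein counts reported in the text for each stage of the sieve.
STAGES = [
    ("Step 1: downloaded entries", 21096),
    ("Step 2: keyword screening", 912),
    ("Step 3: literature screening", 60),
    ("Step 4: manual adjustment", 17),
]


def retention_rates(stages):
    """Return the fraction of proteins kept by each step after the first."""
    counts = [count for _, count in stages]
    return [kept / before for before, kept in zip(counts, counts[1:])]


rates = retention_rates(STAGES)
for (label, count), rate in zip(STAGES[1:], rates):
    print(f"{label}: {count} proteins kept ({rate:.2%} of the previous step)")

overall = STAGES[-1][1] / STAGES[0][1]
print(f"Overall: {overall:.3%} of the downloaded entries were selected")
```

The two automated steps each keep only a few percent of their input, while the manual step keeps roughly a quarter of the 60 proteins it receives.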