Team:TU-Munich/Modeling/Protein Predictions

From 2013.igem.org

Revision as of 21:08, 4 October 2013 by AndiB (Talk | contribs)


Prediction of Protein Structures and Functions

Structural properties of effector proteins are often important for their function, so it is advantageous to know them. For example, it is necessary to know whether the termini are accessible for protein fusion or whether the protein is functional as a multimer. For this reason, a structure-based search was performed in the [http://www.rcsb.org/pdb/home/home.do Protein Data Bank]. As the number of solved structures is still limited, a promising approach is to look for homologous proteins whose crystal structures have been solved.

Searching for Homologous Structures using HHpred

The search for homologous structures was performed using the freely accessible web server HHpred ([http://www.ncbi.nlm.nih.gov/pubmed/15980461 Söding et al., 2005]). The DNA sequences of the BioBricks were translated into amino acid sequences using the AutoAnnotator and then entered into the search field. The results for all proteins investigated in our project are shown in Table 1.
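The translation step can be illustrated with a short sketch. This is not the AutoAnnotator itself; it is a minimal stand-in that assumes the BioBrick coding sequence starts in frame 1 and uses the standard genetic code. The function names `translate` and `to_fasta` and the demo sequence are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding DNA sequence in frame 1 up to the first stop codon.

    Raises KeyError if a codon contains characters other than A, C, G, T.
    """
    dna = "".join(dna.upper().split())  # drop whitespace and line breaks
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


def to_fasta(name, protein, width=60):
    """Format a protein sequence as FASTA text for pasting into a search field."""
    lines = [">" + name]
    lines += [protein[i:i + width] for i in range(0, len(protein), width)]
    return "\n".join(lines)


# Hypothetical demo sequence, not one of the BioBricks from Table 1.
print(to_fasta("demo", translate("ATGGCTAAATAA")))
```

The resulting FASTA text is the kind of input a homology search server accepts in its sequence field.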

{|
|+ Table 1: Predicted Structures
! Protein !! BioBrick !! PDB code !! Identity !! Similarity !! Structure
|-
| XylE || <partinfo>BBa_E0040</partinfo> || [http://www.rcsb.org/pdb/explore.do?structureId=3hpy 3hpy_A] || 50% || 0.939 || TUM13 small XylE.png
|-
| Laccase || <partinfo>BBa_K863000</partinfo> || [http://www.rcsb.org/pdb/explore.do?structureId=2wsd 2wsd_A] || 68% || 1.223 || TUM13 small Laccase.png
|-
| NanoLuc || <partinfo>BBa_K1159001</partinfo> || [http://www.rcsb.org/pdb/explore.do?structureId=3ppt 3ppt_A] || 21% || 0.359 || TUM13 small NanoLuc.png
|-
| EreB || <partinfo>BBa_K1159000</partinfo> || [http://www.rcsb.org/pdb/explore.do?structureId=3b55 3b55_A] || 19% || 0.318 || TUM13 small EreB.png
|-
| SpyCatcher || <partinfo>BBa_K1159200</partinfo> || [http://www.rcsb.org/pdb/explore.do?structureId=2x5p 2x5p_A] || 97% || 1.298 || TUM13 small SpyCatcher.png
|-
| PP1 || <partinfo>BBa_K1159004</partinfo> || [http://www.rcsb.org/pdb/explore.do?structureId=3e7a 3e7a_A] || 96% || 1.593 || TUM13 Physco-lifecycle.png
|-
| GFP || <partinfo>BBa_K1159311</partinfo> || [http://www.rcsb.org/pdb/explore.do?structureId=2WUR 2WUR] || 98% || 1.477 || TUM13 small GFP.png
|-
| Glutathione transferase / DDT dehydrochlorinase || <partinfo>BBa_K620000</partinfo> || [http://www.rcsb.org/pdb/explore.do?structureId=3F6D 3F6D] || 68% || 1.155 || TUM13 small GST.png
|-
| SERK-TM || <partinfo>BBa_E0040</partinfo> || [http://www.rcsb.org/pdb/explore.do?structureId=2ks1 2ks1_B] || 24% || 0.233 || TUM13 Physco-lifecycle.png
|-
| TEV Protease || Commercial reagent || [http://www.rcsb.org/pdb/explore.do?structureId=1Q31 1Q31] || n.d. || n.d. || TUM13 Physco-lifecycle.png
|-
| Streptavidin || Commercial reagent || [http://www.rcsb.org/pdb/explore.do?structureId=3RY2 3RY2] || n.d. || n.d. || TUM13 Physco-lifecycle.png
|}

Results

The homology search showed that for some effector proteins very closely related proteins with a solved structure exist, whereas for others no structure of a related protein has been solved so far. For example, very similar protein structures with an identity above 90% are available for SpyCatcher, PP1 and GFP. Other effector proteins, such as XylE, the laccase and the DDT dehydrochlorinase, have related proteins whose structures still give good hints for structural questions about the effector proteins. For some other effector proteins, the only solved structures show a very weak identity to our protein of interest, so that only the rough fold can be expected to agree. One example of such a structurally unknown protein is NanoLuc, which was identified from shrimp this year and whose structure has not been solved so far. Other examples of proteins without a determined structure are the erythromycin esterase (EreB) and the transmembrane domain of the SERK receptor.
The structures obtained in this part were used for the planning of experiments. In addition, a homology model of the laccase was built to investigate the probability that this BioBrick contains disulphide bridges. Most importantly, the homologous structures were used as illustrations, as shown in one of our How-Tos on animated GIFs.
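The grouping described in these results can be reproduced from the identity values in Table 1. The following is a minimal sketch: the 90% cutoff is taken from the text, whereas the 40% cutoff between "related" and "rough fold only" and the names `classify` and `group_by_class` are assumptions made for illustration.

```python
# Sequence identity (%) of the best structural hit, copied from Table 1.
IDENTITY = {
    "XylE": 50,
    "Laccase": 68,
    "NanoLuc": 21,
    "EreB": 19,
    "SpyCatcher": 97,
    "PP1": 96,
    "GFP": 98,
    "DDT dehydrochlorinase": 68,
    "SERK-TM": 24,
}


def classify(identity, close=90, related=40):
    """Assign a hit to one of three classes.

    `close` follows the text (identity above 90%); `related` is an assumed cutoff.
    """
    if identity > close:
        return "close homolog"
    if identity >= related:
        return "related"
    return "rough fold only"


def group_by_class(identities):
    """Return a dict mapping each class to the sorted list of protein names."""
    groups = {}
    for protein, identity in identities.items():
        groups.setdefault(classify(identity), []).append(protein)
    return {label: sorted(names) for label, names in groups.items()}


for label, names in group_by_class(IDENTITY).items():
    print(label + ": " + ", ".join(names))
```

With these cutoffs the three groups match the ones named in the text: SpyCatcher, PP1 and GFP as close homologs, XylE, the laccase and the DDT dehydrochlorinase as related, and NanoLuc, EreB and SERK-TM as cases where only the rough fold can be expected to agree.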

Analysis of Receptor Sequences – Choosing the right template

For several purposes in our project we needed a synthetic receptor that enables us to place protein domains on the intracellular or extracellular side of the cell membrane. As templates we investigated several plant receptors from the well-understood dicotyledon Arabidopsis thaliana and from Physcomitrella patens, the moss we are currently using as our chassis. The A. thaliana receptors have the advantage that their transgenic expression has been demonstrated successfully (Ref.), whereas the P. patens receptors carry less risk of failing to function in the moss, which is evolutionarily distant from A. thaliana (Ref).
As many receptors were available as possible templates for our synthetic receptor, we applied bioinformatic methods to evaluate their suitability. Three examples from this work, ERF, FLS2 and SERK, are shown (see Table 2).

{|
|+ Table 2: Examined Receptors
! Receptor !! Organism !! Length (aa) !! Sequence reference !! Literature reference
|-
| ERF || A. thaliana || 1031 || [http://www.ncbi.nlm.nih.gov/protein/NP_197548.1 NP_197548.1] ||
|-
| FLS2 || A. thaliana || 1173 || [http://www.ncbi.nlm.nih.gov/protein/NP_199445.1 NP_199445.1] ||
|-
| SERK || P. patens || 625 || [http://www.ncbi.nlm.nih.gov/protein/XP_001759122.1 XP_001759122.1] || [http://www.freidok.uni-freiburg.de/volltexte/5390/pdf/Lienhart_Dissertation_2008.pdf Lienhart, 2007]
|}

Prediction of Signal Peptides

Figure 3:

Introduction: The first analyses were performed to identify the signal peptide, which is bound by the cellular signal recognition particle and leads to the translocation of the polypeptide into the ER. The signal peptide is subsequently cleaved off by a signal peptidase at a distinct site. The analysis of the signal peptides was carried out using the [http://www.cbs.dtu.dk/services/SignalP SignalP 4.1 Server].
Results: Signal peptides were predicted for different receptors. The prediction is illustrated for the three examples mentioned above, for each of which a signal peptide could be identified (see Fig. 3).
The figure shows the N-terminal sequence of the receptors together with three scores: (1) the C-score (raw cleavage site score) in red, (2) the S-score (signal peptide score) in green and (3) the Y-score (combined cleavage site score) in blue.
The C-score indicates the most probable cleavage site for the signal peptidase. A cleavage site could be identified for all receptors shown, although the result for the SERK receptor is ambiguous, with two possible cleavage sites. According to the algorithm, the amino acid with the highest C-score is predicted to be the first amino acid of the cleaved receptor. The S-score was developed to distinguish residues that lie within a signal peptide from residues that belong to the mature receptor. For all receptors, this score is high for the first 23-28 amino acids, identifying these residues as the signal peptide, and then drops quickly to low values. The residue at the steepest drop is the predicted border between the N-terminal signal peptide and the receptor. The Y-score, the third parameter, is the geometric mean of the C-score and the slope of the S-score. It illustrates that the first two parameters agree well in identifying the signal peptide in all three receptors shown.
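How the three scores relate can be shown with a small sketch. It assumes the commonly cited definition of the Y-score as the geometric mean of the C-score and the slope of the S-score, with the slope estimated as the mean S-score of the `d` residues before a position minus the mean of the `d` residues from that position onward. The window width `d`, the function names and the score values are illustrative assumptions, not SignalP's actual parameters or output.

```python
import math


def s_slope(s, i, d):
    """Drop of the S-score at position i: mean before i minus mean from i onward."""
    before = s[max(0, i - d):i]
    after = s[i:i + d]
    if not before or not after:
        return 0.0
    return sum(before) / len(before) - sum(after) / len(after)


def y_scores(c, s, d=3):
    """Geometric mean of the C-score and the (non-negative) S-score slope."""
    return [math.sqrt(c[i] * max(0.0, s_slope(s, i, d))) for i in range(len(c))]


def predicted_cleavage(c, s, d=3):
    """0-based index of the residue predicted to be the first of the mature protein."""
    ys = y_scores(c, s, d)
    return max(range(len(ys)), key=ys.__getitem__)


# Made-up scores for an 8-residue toy sequence (not real SignalP output).
c_score = [0.1, 0.1, 0.1, 0.1, 0.2, 0.8, 0.1, 0.1]
s_score = [0.9, 0.9, 0.9, 0.9, 0.9, 0.2, 0.1, 0.1]
print(predicted_cleavage(c_score, s_score))  # prints 5
```

In the toy data the C-score peak and the steepest fall of the S-score coincide at index 5, so the Y-score is highest there. When the two disagree, as for a sequence with two candidate cleavage sites, the Y-score favours the C-score peak that lies closer to the S-score drop.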
Discussion: It can be concluded that all three depicted receptors seem to contain a sequence that functions as a signal peptide. For many of the predicted receptors in the genome of P. patens, this prediction did not yield a positive result. With respect to the signal peptide, all receptors mentioned would be suitable as a template for our synthetic receptor. However, the SERK receptor is favoured after this analysis, as its signal peptide is most likely to be recognized by the cellular machinery of P. patens and therefore bears the smallest risk of failure.

Prediction of Transmembrane Regions

Figure 4:

Introduction: Besides the identification of the signal peptide, it was important to identify transmembrane regions within the receptors, because we wanted to use a type I receptor as a template, which contains an N-terminal extracellular domain, a transmembrane region and a C-terminal intracellular domain (see Localisation page). For this analysis, the prediction tool [http://www.cbs.dtu.dk/services/TMHMM TMHMM] was applied to several different receptors. The results are again depicted for the receptors ERF, FLS2 and SERK.
Results: The analysis yielded a signal peptide and a single transmembrane domain for all depicted receptors (see Fig. 4). The prediction of the transmembrane region was equally clear for all examined receptors, whereas the signal peptide was most clearly predicted for the SERK receptor.
Discussion:
From the point of view of membrane topology, all investigated receptors would be good blueprints for our synthetic receptor. As the SERK receptor yields the clearest prediction, it is the favoured template, all the more so as it is derived from Physcomitrella patens. The only problem with this prediction is that the N-terminal portion of this receptor is predicted to be located extracellularly. This prediction was simple to falsify, because the SERK receptor contains a C-terminal kinase domain, which is known to be involved in signal transduction and has to be located intracellularly to fulfil its purpose.
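The plausibility check described in this discussion can be sketched in a few lines. The sketch assumes a topology string in the style of the TMHMM short output (for example `o5-27i240-262o`, where `i` and `o` mark inside and outside segments and the number pairs are predicted helices). The example strings and function names are hypothetical and are not the actual predictions for SERK.

```python
import re

SIDE_NAMES = {"i": "inside", "o": "outside"}


def parse_topology(topology):
    """Split a topology string into its side markers and helix ranges."""
    sides = re.findall(r"[io]", topology)
    helices = [(int(a), int(b)) for a, b in re.findall(r"(\d+)-(\d+)", topology)]
    return sides, helices


def termini_location(topology):
    """Return the predicted location of the N-terminus and the C-terminus."""
    sides, _ = parse_topology(topology)
    return SIDE_NAMES[sides[0]], SIDE_NAMES[sides[-1]]


def kinase_is_intracellular(topology):
    """A C-terminal kinase domain is plausible only if the C-terminus is inside."""
    return termini_location(topology)[1] == "inside"


# Hypothetical example: the signal peptide is also counted as a helix,
# which places both termini on the outside.
print(termini_location("o5-27i240-262o"))
print(kinase_is_intracellular("o5-27i240-262o"))
# Hypothetical example with a single helix and the C-terminus inside.
print(kinase_is_intracellular("o240-262i"))
```

A prediction that places the C-terminal kinase domain outside the cell fails this check and can be rejected on biological grounds, which is the reasoning applied to the SERK receptor above.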

Choice of the SERK Receptor

Finally, it was decided to use the SERK receptor as the template for our synthetic receptor. The final receptor was designed in RFC[25], which allows in-frame protein fusions. The final constructs contained the SERK signal peptide (<partinfo>BBa_K1159303</partinfo>), an extracellularly located effector protein, the transmembrane domain of the SERK receptor (<partinfo>BBa_K1159305</partinfo>), a short linker and a GFP to investigate the cellular localisation of our receptor by fluorescence microscopy. For the signal peptide

<partinfo>BBa_K1159303</partinfo>
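The in-frame requirement for such a fusion can be checked with a short sketch. It assumes that joining two RFC[25] parts leaves the six-base scar ACCGGC (Thr-Gly) between them, and it uses short placeholder sequences instead of the real part sequences. The function names are hypothetical.

```python
SCAR = "ACCGGC"  # assumed RFC[25] fusion scar, encodes Thr-Gly
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse(sequences):
    """Join coding sequences with the scar between neighbouring parts."""
    return SCAR.join(seq.upper() for seq in sequences)


def in_frame_problems(parts):
    """Return a list of problems that would break the reading frame of the fusion.

    `parts` is a list of (name, coding sequence) tuples in N- to C-terminal order.
    """
    problems = []
    for name, seq in parts:
        if len(seq) % 3 != 0:
            problems.append(name + ": length is not a multiple of 3")
    fusion = fuse(seq for _, seq in parts)
    codons = [fusion[i:i + 3] for i in range(0, len(fusion) - 2, 3)]
    for number, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:  # a stop before the last codon truncates the fusion
            problems.append("internal stop at codon " + str(number))
    return problems


# Placeholder sequences in the order described above (not the real parts).
construct = [
    ("SERK signal peptide", "ATGGCT"),
    ("effector", "GCTGCT"),
    ("SERK transmembrane domain", "CTGCTG"),
    ("linker", "GGTTCT"),
    ("GFP", "AAATAA"),
]
print(in_frame_problems(construct))  # prints []
```

A part whose length is not a multiple of three shifts every downstream codon, so the check reports both the offending part and any premature stop codon that results.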

References:

[http://www.ncbi.nlm.nih.gov/pubmed/15980461 Söding et al., 2005] Söding J, Biegert A, Lupas AN. (2005). The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W244-8.
[http://www.freidok.uni-freiburg.de/volltexte/5390/pdf/Lienhart_Dissertation_2008.pdf Lienhart, 2007] Lienhart Otmar. Untersuchungen zu einem Somatic-Embryogenesis-Receptor-like-Kinase-Homolog in Physcomitrella patens (Hedw.) B.S.G. PhD thesis, Freiburg University.