Team:TU-Munich/Modeling/Protein Predictions

From 2013.igem.org

(Difference between revisions)

Revision as of 23:23, 4 October 2013

Prediction of Protein Structures and Functions

Structural properties of effector proteins are often important for their function, so it is advantageous to know them. For example, it is necessary to know whether the termini are accessible for protein fusion or whether the protein is only functional as a multimer. For this reason, a structure-based search was performed in the Protein Data Bank (PDB). As the number of solved structures is still limited, a promising approach is to look for homologous proteins whose crystal structures have been solved.

Searching for Homologous Structures using HHpred

The search for homologous structures was performed using the freely accessible web server HHpred [Söding et al., 2005]. The coding sequences of the BioBricks were translated into amino acid sequences using the AutoAnnotator and were then inserted into the search field. The results for all proteins investigated in our project are shown in table 1.
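The translation step that precedes the HHpred search can be sketched as follows. This is a minimal illustration using the standard genetic code, not the AutoAnnotator's actual implementation; the function name `translate` and the example sequences are assumptions made for this sketch.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (last base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence (frame 1) up to the first stop codon.

    Hypothetical helper: it only accepts the letters A, C, G and T and
    raises KeyError for anything else.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Made-up toy sequence, not one of the BioBricks.
    print(translate("ATGGCCTAA"))
```

The resulting one-letter amino acid string is the kind of input that is pasted into the search field of a homology server.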

Table 1: Predicted Structures
Protein	BioBrick	PDB-code	Identity	Similarity
XylE	BBa_K147002	3hpy_A	50%	0.939
Laccase	BBa_K1159002	2wsd_A	68%	1.223
NanoLuc	BBa_K1159001	3ppt_A	21%	0.359
EreB	BBa_K1159000	3b55_A	19%	0.318
SpyCatcher	BBa_K1159200	2x5p_A	97%	1.298
PP1	BBa_K1159004	3e7a_A	96%	1.593
GFP	BBa_K1159311	2WUR	98%	1.477
Glutathione transferase / DDT Dehydrochlorinase	BBa_K620000	3F6D	68%	1.155
SERK-TM	BBa_K1159305	2ks1_B	24%	0.233
TEV Protease	Commercial reagent	1Q31	n.d.	n.d.
Streptavidin	Commercial reagent	3RY2	n.d.	n.d.

Results

The homology search showed that some effector proteins have very closely related proteins with a solved structure, whereas for others no structure of a related protein has been solved so far. For example, very similar protein structures are available for SpyCatcher, PP1 and GFP, each with a sequence identity above 90%. Other effector proteins such as XylE, the Laccase or the DDT Dehydrochlorinase have related proteins (50% to 68% identity) whose structures still give good hints for answering structural questions. For some effector proteins, the only solved structures show very weak identity to our proteins of interest, so only the rough fold can be inferred. One example of such a structurally unknown protein is NanoLuc, a highly engineered protein that derives from shrimp and was released only this year. Its structure has not been solved so far. Other examples of structurally unknown proteins are the Erythromycin Esterase (EreB) and the transmembrane domain of the SERK receptor.
The structures obtained here were used to design our experiments. Homology modeling of the Laccase was performed to estimate the probability that it contains disulphide bridges. Furthermore, the resulting homologous structures were used as illustrations, as shown in one of our How-Tos about animated GIFs.
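The grouping of the hits from table 1 into close, related and weak templates can be sketched in a few lines. The identity values are copied from table 1; the 90% boundary follows the text, whereas the 30% boundary between "related" and "weak" is an assumption chosen for this sketch and not a value stated by the team.

```python
# Sequence identities (in percent) taken from table 1.
IDENTITIES = {
    "XylE": 50,
    "Laccase": 68,
    "NanoLuc": 21,
    "EreB": 19,
    "SpyCatcher": 97,
    "PP1": 96,
    "GFP": 98,
    "DDT Dehydrochlorinase": 68,
    "SERK-TM": 24,
}


def classify_template(identity: float, close: float = 90.0, weak: float = 30.0) -> str:
    """Sort a hit into one of three groups.

    `close` follows the "above 90%" wording of the text; `weak` is a
    hypothetical cutoff used only for illustration.
    """
    if identity > close:
        return "close homolog"
    if identity >= weak:
        return "related"
    return "weak"


def group_hits(identities: dict) -> dict:
    """Return a mapping from group name to the sorted list of proteins in it."""
    groups = {"close homolog": [], "related": [], "weak": []}
    for protein, identity in identities.items():
        groups[classify_template(identity)].append(protein)
    return {name: sorted(members) for name, members in groups.items()}


if __name__ == "__main__":
    for name, members in group_hits(IDENTITIES).items():
        print(name, members)
```

With these cutoffs the grouping reproduces the three classes described in the results: SpyCatcher, PP1 and GFP as close homologs, XylE, Laccase and the DDT Dehydrochlorinase as related, and NanoLuc, EreB and SERK-TM as weak.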

Analysis of Receptor Sequences – Choosing the right template

For several purposes of our project, we needed a synthetic receptor that enables us to present protein domains on the intracellular or extracellular side of the cell membrane. We investigated several plant receptors from the well-characterized dicotyledon Arabidopsis thaliana and from the moss Physcomitrella patens, which we are currently using as our chassis. The receptors from Arabidopsis thaliana have the advantage that their transgenic expression has been demonstrated successfully (Ref.), whereas the receptors from Physcomitrella patens bear less risk of not working in the evolutionarily distant moss (Ref).
Because many different receptors were available as templates for our synthetic receptor, we used bioinformatic methods to evaluate their suitability. The three examples ERF, FLS2 and SERK shown in table 2 resulted from this evaluation.

Table 2: Examined Receptors
Receptor	Organism	Length (aa)	Sequence reference	Literature reference
ERF	A. thaliana	1031	NP_197548.1
FLS2	A. thaliana	1173	NP_199445.1
SERK	P. patens	625	XP_001759122.1	Lienhart, 2007

Prediction of Signal Peptides

Figure 3:

Introduction
The first analysis was performed to identify a signal peptide, which is bound by the cellular signal recognition particle and leads to the translocation of the bound polypeptide into the endoplasmic reticulum. The signal peptide is afterwards cleaved off by a signal peptidase at a distinct site. The prediction of the signal peptide and its cleavage site was carried out using the SignalP 4.1 Server.

Results
The prediction of the signal peptide was performed for different receptors and is illustrated for the three examples mentioned above (see fig. 3).
The figure shows the N-terminal sequence of the receptors, together with three scores:
(1) The C-Score (raw cleavage site score) in red.
(2) The S-Score (signal peptide score) in green.
(3) The Y-Score (combined cleavage site score) in blue.

The C-Score indicates the most probable site at which the signal peptidase cleaves. It was possible to identify the most probable cleavage site for all shown receptors, although the cleavage site of the SERK receptor was ambiguous. According to the algorithm, the amino acid with the highest C-Score is predicted to be the first amino acid of the cleaved (mature) receptor.
The S-Score was developed to distinguish amino acids that belong to a signal peptide from those that belong to the mature receptor. It is high for the first 23 to 28 amino acids of all receptors, identifying these residues as the signal peptide. The amino acid residue at the steepest decrease of the S-Score is the predicted border between the N-terminal signal peptide and the receptor.
The Y-Score combines the two preceding scores: it is the geometric average of the C-Score and the slope of the S-Score. It illustrates that the first two scores agree well on the position of the signal peptide in all three illustrated receptors.
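How a combined score of this kind can be derived from the two others is sketched below. SignalP describes the Y-Score as the geometric average of the C-Score and the slope of the S-Score; the window size `d`, the handling of the sequence ends and the toy score values are assumptions of this sketch and are not taken from the SignalP 4.1 implementation.

```python
from math import sqrt
from statistics import mean


def y_scores(c, s, d=3):
    """Combine per-residue C- and S-scores into a Y-like score.

    The slope of S at position i is approximated as the mean S-score of
    the d residues before i minus the mean of the d residues from i on.
    Negative slopes are clipped to zero; the first position gets 0.0
    because it has no preceding residues.
    """
    y = []
    for i in range(len(c)):
        before = s[max(0, i - d):i]
        after = s[i:i + d]
        if not before or not after:
            y.append(0.0)
            continue
        slope = max(mean(before) - mean(after), 0.0)
        y.append(sqrt(c[i] * slope))
    return y


def predicted_first_mature_residue(c, s, d=3):
    """Return the 0-based index with the highest Y-like score."""
    y = y_scores(c, s, d)
    return max(range(len(y)), key=y.__getitem__)


if __name__ == "__main__":
    # Toy data: S drops after five residues, C peaks at the same place.
    s = [0.9] * 5 + [0.1] * 5
    c = [0.1] * 5 + [0.8] + [0.1] * 4
    print(predicted_first_mature_residue(c, s))
```

A position only scores highly if a C-Score peak coincides with a steep drop of the S-Score, which is why the combined score is more robust than either score alone.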

Discussion
Summarizing these parameters, it can be concluded that all three pictured receptors seem to contain a sequence that works as a signal peptide. For many of the other predicted receptors in the genome of Physcomitrella patens, the prediction did not yield a positive result. With regard to the signal peptide, all three receptors would be suitable as a template for our synthetic receptor. The predicted data show that the SERK receptor is favorable for our application, because its signal peptide is recognized with the highest statistical confidence and therefore bears the smallest risk of failure.

Prediction of Transmembrane Regions

Figure 4:

Introduction
Besides the identification of the signal peptide, it was very important to identify transmembrane regions within the receptors, because we wanted to use a type I receptor as a template, which contains an N-terminal extracellular domain, a transmembrane region and a C-terminal intracellular domain (see Localization page). For this analysis, the prediction tool TMHMM was applied to several different receptors. Again, the most suitable receptors were ERF, FLS2 and SERK.

Results
The analysis yields a signal peptide and a single transmembrane domain for all the depicted receptors (see fig. 4). The estimated reliability of the prediction of the transmembrane region was equally good for all examined receptors, whereas the signal peptide was predicted best for the SERK receptor.

Discussion
From the point of view of membrane topology, all the investigated receptors would be suitable blueprints for our synthetic receptor. As the SERK receptor yields the best prediction, it was chosen as the preferred template. Another reason to choose the SERK receptor was that it is derived from Physcomitrella patens. The only problem concerning this prediction is that the N-terminus of this receptor is predicted to be oriented extracellularly. The falsification of this prediction was simple, as the SERK receptor contains a C-terminal kinase domain, which is known to be involved in signal transduction.
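The type I criterion used here (extracellular N-terminal domain, one transmembrane region, intracellular C-terminal domain) can be checked mechanically once a topology prediction is available. The sketch below assumes a hypothetical per-residue encoding of the mature protein, with `o` for outside, `M` for membrane and `i` for inside; this encoding is an illustration only and is not the output format of TMHMM.

```python
import re

# Hypothetical encoding: 'o' = extracellular, 'M' = membrane, 'i' = intracellular.
TYPE_ONE_PATTERN = re.compile(r"o+M+i+")
MEMBRANE_RUN = re.compile(r"M+")


def count_transmembrane_segments(topology: str) -> int:
    """Count contiguous runs of membrane residues."""
    return len(MEMBRANE_RUN.findall(topology))


def is_type_one(topology: str) -> bool:
    """True if the topology is: outside, one membrane pass, inside.

    The string is expected to describe the mature protein, i.e. with the
    signal peptide already removed.
    """
    return TYPE_ONE_PATTERN.fullmatch(topology) is not None


if __name__ == "__main__":
    # Made-up toy topologies, not predictions for the receptors above.
    for topology in ("ooooMMMMiiii", "iiiiMMMMoooo", "ooMMiiMMoo"):
        print(topology, count_transmembrane_segments(topology), is_type_one(topology))
```

A receptor whose signal peptide is mistaken for a second membrane helix would fail this check, which is one reason to remove the predicted signal peptide before judging the topology.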

Choice of the SERK Receptor

Finally, we decided to use the SERK receptor as a template to generate our synthetic receptor. The final receptor was designed in the RFC[25] standard, which allows in-frame protein fusions. The final constructs were designed to contain the SERK signal peptide (BBa_K1159303), an extracellularly located effector protein, the transmembrane domain of the SERK receptor (BBa_K1159305), a short linker and a GFP, in order to investigate the cellular localization of our receptor by fluorescence microscopy.
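The in-frame requirement for such a modular construct can be sketched as a simple check: every part and every junction must have a length divisible by three so that the reading frame is preserved across the whole fusion. The six-base scar `ACCGGC` (Thr-Gly) used as the default junction is an assumption of this sketch that should be verified against the RFC[25] document, and the part sequences are made-up placeholders, not the real BioBrick sequences.

```python
def fuse_in_frame(parts, scar="ACCGGC"):
    """Join coding parts with a scar and verify the reading frame.

    `scar` defaults to an assumed RFC[25]-style junction; pass a different
    value if the assembly standard in use leaves another scar.
    """
    if len(scar) % 3 != 0:
        raise ValueError("scar length must be a multiple of three")
    for index, part in enumerate(parts):
        if len(part) == 0 or len(part) % 3 != 0:
            raise ValueError(f"part {index} would shift the reading frame")
    return scar.join(part.upper() for part in parts)


if __name__ == "__main__":
    # Placeholder fragments standing in for signal peptide, effector,
    # transmembrane domain, linker and GFP.
    placeholder_parts = ["ATGGCC", "GGATCC", "CTGCTG", "GGCAGC", "AAAGGC"]
    construct = fuse_in_frame(placeholder_parts)
    print(construct, len(construct) % 3 == 0)
```

Checking each part separately makes it easy to see which module would break the frame, rather than only detecting a problem in the final construct.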

References:

[Söding et al., 2005] Söding J, Biegert A, Lupas AN. (2005). The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 33(Web Server issue):W244-8.
[Lienhart, 2007] Lienhart O. (2007). Untersuchungen zu einem Somatic-Embryogenesis-Receptor-like-Kinase-Homolog in Physcomitrella patens (Hedw.) B.S.G. PhD thesis, Freiburg University.

@@ Line 60: / Line 60: @@
 |96%
 |1.593
-|[[File:Blanko.png|85px]]
+|[[File:Blanko2.png|85px]]
 |-
 |GFP
@@ Line 81: / Line 81: @@
 |24%
 |0.233
-|[[File:Blanko.png|85px]]
+|[[File:Blanko2.png|85px]]
 |-
 |TEV Protease
@@ Line 88: / Line 88: @@
 |n.d.
 |n.d.
-|[[File:Blanko.png|85px]]
+|[[File:Blanko2.png|85px]]
 |-
 |Streptavidin
@@ Line 95: / Line 95: @@
 |n.d.
 |n.d.
-|[[File:Blanko.png|85px]]
+|[[File:Blanko2.png|85px]]
 |-
 |}