Team:TU-Delft/NovelPeptides
From 2013.igem.org
Novel Peptides
The antimicrobial peptide (AMP) field is growing rapidly in response to the demand for novel antimicrobial agents. In particular, AMPs are promising candidates in the fight against antibiotic-resistant pathogens due to their low toxicity and broad range of activity. Antimicrobial peptides are generally between 12 and 50 amino acids long. They include two or more positively charged residues, provided by arginine, lysine or, in acidic environments, histidine, and a large proportion of hydrophobic residues.
Because AMPs are an active research area, both the knowledge and the experimentally validated data are increasing rapidly. We decided to use these data to create novel peptides that are highly toxic to S. aureus but have low toxicity to E. coli. The method that was developed is described in the following sections.
Data and Feature extraction
The necessary data were acquired from CAMP, the Collection of Anti-Microbial Peptides database. The database contains 3789 records with MIC (minimum inhibitory concentration) values, but only the experimentally validated records that target both E. coli and S. aureus were taken into account. The acquired records were separated into 4 classes based on their MIC values:
- class 0: Toxic for both S. aureus and E. coli
- class 1: Toxic for S. aureus but not for E. coli
- class 2: Toxic for E. coli but not for S. aureus
- class 3: Non-toxic for both E. coli and S. aureus
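The class assignment can be sketched in a few lines of Python. The page does not state the MIC cutoff that separates "toxic" from "non-toxic", so the `toxic_below` parameter and its default of 32 μM are purely hypothetical, as is the function name.

```python
def assign_class(mic_s_aureus, mic_e_coli, toxic_below=32.0):
    """Map a pair of MIC values (in μM) to one of the four classes.

    toxic_below is a hypothetical cutoff: a peptide is treated as toxic
    for an organism when its MIC is at or below this value.
    """
    toxic_sa = mic_s_aureus <= toxic_below
    toxic_ec = mic_e_coli <= toxic_below
    if toxic_sa and toxic_ec:
        return 0  # toxic for both
    if toxic_sa:
        return 1  # toxic for S. aureus only (the class of interest)
    if toxic_ec:
        return 2  # toxic for E. coli only
    return 3      # non-toxic for both


# A peptide with MIC = 8 μM for S. aureus and 128 μM for E. coli
label = assign_class(mic_s_aureus=8, mic_e_coli=128)
```

With this assumed cutoff, a peptide that is active against S. aureus at low concentration but needs a high concentration against E. coli lands in class 1.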
The next step was feature extraction for each of the collected peptides. The resulting number of features per sequence is 21 [1][2][3]. The attributes for each peptide are either general, such as the length of the sequence, or specific to AMP properties. They are listed below:
- length
- charge
- proline frequency
- glycine frequency
- frequency of hydrophobic residues
- hydropathy
- C terminus
- N terminus
- polarity
The N and C termini were each described by only 3 positions, because the peptides differ in length.
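A minimal sketch of how some of these features could be computed is given here. It is not the team's extraction code: the Kyte-Doolittle scale for hydropathy, the simple charge count (K and R as +1, D and E as -1, histidine and the termini ignored) and all function names are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy scale (assumed; the page does not name a scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def net_charge(seq):
    # Simplified charge: count K/R as +1 and D/E as -1.
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def mean_hydropathy(seq):
    # Average hydropathy per residue (often called the GRAVY score).
    return sum(KYTE_DOOLITTLE[a] for a in seq) / len(seq)


def extract_features(seq, terminus_len=3):
    seq = seq.upper()
    return {
        "length": len(seq),
        "charge": net_charge(seq),
        "proline_freq": seq.count("P") / len(seq),
        "glycine_freq": seq.count("G") / len(seq),
        "hydropathy": mean_hydropathy(seq),
        "n_terminus": seq[:terminus_len],   # first 3 residues
        "c_terminus": seq[-terminus_len:],  # last 3 residues
    }


features = extract_features("FLPILGVARKGLL")
```

Treating each terminus as a fixed 3-residue window gives every peptide the same number of attributes regardless of its length.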
Rule Learning
After creating the final data set, we used the machine learning toolkit WEKA, which contains a collection of machine learning algorithms for data mining tasks. In our case, we decided to use the NNge algorithm to perform association rule mining [4].
Association rule mining is a method for discovering interesting relations between variables in data sets. In this way, it is possible to discover rules that represent the class of interest and use them to create our novel peptides. Some of the rules identified can be seen in Figure 1.
Based on the rules discovered, one can conclude, for example, that a peptide belongs to class 1 if it has a charge of 1-4 or 10 and a hydropathy between -0.37 and 1.82 (dependent on the peptide length). Moreover, the first 3 amino acids are either FLP or GLL and the last 3 amino acids are RLL, GLL or FGL. The amino acid sequence between the N and C terminus has to contain 0-2 prolines, 0-4 or 7 prolines and particular frequencies of specific hydrophobic residues. Finally, any other amino acid may appear in the sequence between the N and C terminus, as its presence is not important for AMPs, provided that the peptide properties (hydropathy, for example) are still satisfied.
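A rule of this kind can be represented as a set of allowed values or ranges per feature and checked mechanically. The sketch below encodes only the parts of the example rule that are stated unambiguously above. The dictionary layout, the feature names and the `satisfies` helper are hypothetical, and the hydropathy in the example was computed with the Kyte-Doolittle scale, which is an assumption.

```python
# Example class 1 rule: sets list allowed values, tuples are (min, max) ranges.
CLASS1_RULE = {
    "charge": {1, 2, 3, 4, 10},
    "hydropathy": (-0.37, 1.82),
    "n_terminus": {"FLP", "GLL"},
    "c_terminus": {"RLL", "GLL", "FGL"},
    "prolines_between_termini": (0, 2),
}


def satisfies(features, rule):
    """Return True if every feature lies inside the rule's allowed region."""
    for name, allowed in rule.items():
        value = features[name]
        if isinstance(allowed, tuple):
            low, high = allowed
            if not low <= value <= high:
                return False
        elif value not in allowed:
            return False
    return True


# Feature values for the sequence FLPLLASLFSRLL
# (hydropathy under the assumed Kyte-Doolittle scale).
staphycine = {
    "charge": 1,
    "hydropathy": 1.73,
    "n_terminus": "FLP",
    "c_terminus": "RLL",
    "prolines_between_termini": 0,
}
matches = satisfies(staphycine, CLASS1_RULE)
```

A peptide with a different N terminus, for instance GFG, fails this particular rule even if all of its other features fall inside the allowed ranges. It may still be covered by another rule of the same class.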
Model Evaluation
To evaluate the performance of our model, we investigated its ability to correctly predict, or separate, the classes. For that reason, accuracy, precision, recall and F-measure were computed. A brief explanation of each measure is given below.
- Accuracy: the overall correctness of the model
- Precision: percentage of positive predictions that are correct
- Recall: true positive rate (percentage of positive cases that are detected)
- F-measure: a measure that combines precision and recall (their harmonic mean)
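These definitions can be made concrete with a small sketch that computes the measures for one class from lists of true and predicted labels. The label lists are invented toy data, not the team's results, and the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    # Fraction of all predictions that match the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall and F-measure with `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy labels for the four classes (0-3); not real data.
y_true = [0, 0, 1, 1, 1, 2, 3, 3]
y_pred = [0, 1, 1, 1, 0, 2, 3, 3]

overall = accuracy(y_true, y_pred)
precision, recall, f_measure = per_class_metrics(y_true, y_pred, positive=1)
```

In a multi-class setting such as this one, each class is scored in turn as the "positive" class against all the others.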
In our case, we obtained the following results:
Accuracy: 94.4149 %
Detailed Accuracy by class
Class | Precision | Recall | F-measure |
---|---|---|---|
1 | 0.955 | 0.986 | 0.97 |
2 | 0.917 | 0.611 | 0.733 |
3 | 0.963 | 0.867 | 0.912 |
4 | 0.737 | 0.875 | 0.8 |
Weighted Avg. | 0.945 | 0.944 | 0.942 |
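As a quick consistency check, the per-class F-measures in the table can be recomputed from the reported precision and recall, assuming the usual definition of the F-measure as their harmonic mean. The variable names are illustrative, and the class labels are copied from the table as they appear.

```python
# (precision, recall, reported F-measure) per class, copied from the table.
REPORTED = {
    "1": (0.955, 0.986, 0.97),
    "2": (0.917, 0.611, 0.733),
    "3": (0.963, 0.867, 0.912),
    "4": (0.737, 0.875, 0.8),
}


def f_measure(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Absolute difference between the recomputed and the reported F-measure.
deviations = {
    label: abs(f_measure(p, r) - f) for label, (p, r, f) in REPORTED.items()
}
consistent = all(d < 0.001 for d in deviations.values())
```

All four recomputed values agree with the table to within rounding of the third decimal place.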
Final Created Peptides
The generated rules were used to create our final peptides. First of all, we decided to create peptides that are 13 amino acids long in order to avoid post-translational modification. The next step was to set the amino acids for the N and C terminus, because these proved to be of great importance for the toxicity and selectivity of the peptides. We also set the number of prolines, glycines and specific hydrophobic amino acids to satisfy the rules, because the proportion of these amino acids proved to be of great importance for AMPs. The rest of the amino acids were chosen so as to satisfy the remaining rules. We also took the hydrophobic nature of the peptides into consideration: we tried to design them so that they satisfy the rules without being highly hydrophobic. In that way we aimed to keep the peptides non-toxic to humans, as toxicity to humans is directly influenced by the peptide's hydrophobic nature.
Finally, it was also important to ensure that the synthesized peptides would have a high probability of working. For that reason, once the peptides had been created we also checked them against the aforementioned criteria.
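The design procedure described above can be read as a generate-and-filter loop: fix the termini, fill in the middle and keep only the sequences that pass the checks. The sketch below illustrates that idea under several assumptions and is not the procedure the team followed. The random filling of the middle positions, the set of residues counted as hydrophobic, the 0.7 cap on the hydrophobic fraction and all names are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFW")  # assumed definition of "hydrophobic"


def net_charge(seq):
    # Simplified charge: K/R count as +1, D/E as -1.
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def hydrophobic_fraction(seq):
    return sum(a in HYDROPHOBIC for a in seq) / len(seq)


def design_candidates(n_term, c_term, length=13, count=5,
                      max_hydrophobic=0.7, max_prolines=2,
                      seed=0, max_attempts=100_000):
    """Sample middles at random and keep sequences that pass the filters."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    middle_len = length - len(n_term) - len(c_term)
    candidates = []
    for _ in range(max_attempts):
        if len(candidates) == count:
            break
        middle = "".join(rng.choice(AMINO_ACIDS) for _ in range(middle_len))
        seq = n_term + middle + c_term
        if (1 <= net_charge(seq) <= 4
                and seq.count("P") <= max_prolines
                and hydrophobic_fraction(seq) <= max_hydrophobic):
            candidates.append(seq)
    return candidates


candidates = design_candidates("FLP", "GLL")
```

Every returned candidate is 13 residues long, starts with FLP, ends with GLL and stays below the assumed hydrophobicity cap. A real design would also have to satisfy the remaining learned rules.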
The amino acid sequences of the peptides and their properties are shown below.
- Peptidor: GFGLCKNKAFGLL
Figure 2: Peptidor properties
Figure 3: Peptidor amino acid composition
Peptidor was found to be similar to the MIRJA antimicrobial peptide (E-value: 6.5). That peptide does not target E. coli but it does target Gram-positive bacteria.
We also ran the SVM classifier of the CAMP database to predict the antimicrobial nature of the peptide:
Sequence Id | Class | Probability |
---|---|---|
Unknown | AMP | 0.961 |
- Derpini: FLPILGVARKGLL
Figure 4: Derpini properties
Figure 5: Derpini amino acid composition
Derpini was found to be similar to both Vespid chemotactic peptide 5h and Temporin-1CSb (E-value: 3.6). Temporin is an AMP with MIC = 128 μM for E. coli and MIC = 8 μM for S. aureus. The other AMP is inactive against E. coli but active against S. aureus.
After running the SVM classifier in CAMP, the peptide was predicted to be antimicrobial:
Sequence Id | Class | Probability |
---|---|---|
Unknown | AMP | 0.955 |
- Staphycine: FLPLLASLFSRLL
Figure 6: Staphycine properties
Figure 7: Staphycine amino acid composition
Staphycine was found to be similar to Temporin-1CSb (E-value: 0.011).
Temporin has MIC = 70 μM for E. coli and MIC = 2 μM for S. aureus.
The SVM classifier in CAMP gave the following prediction:
Sequence Id | Class | Probability |
---|---|---|
Unknown | AMP | 0.862 |
Our lab team tested the synthesized peptides in the lab. Peptidor worked well and Staphycine worked as expected, whereas Derpini did not work at all. The MICs of the newly synthesized peptides were determined experimentally and are presented in the following figures.
Figure 8: MICs of Peptidor
Figure 9: MICs of Staphycine
For more information, check our lab pages!
Discussion
As observed, there is a large set of generated rules, with some rules overlapping between the classes. It is highly probable that one peptide failed to work for this reason. The model has limitations, which stem not only from the small size of the experimentally validated data set but also from the limited number of samples in the class of interest compared to the other classes. In the future, the model could be improved by performing better feature selection and/or using different algorithms. However, this requires that all currently available data be experimentally validated and that more data be included in the current databases.