Team:XMU Software/Project

From 2013.igem.org

Revision as of 09:41, 23 October 2013 by YuezhenChen

PROJECT
Our project includes two independent software tools: the brick worker and E' NOTE. The former is a software suite for evaluating and optimizing BioBricks, i.e., promoters, RBSs, protein coding sequences and terminators. E' NOTE is a web application that serves as an assistant for experiments. Functions such as experiment recording and experimental template customization make the experimental process easier and more enjoyable.
RBS-decoder


Abstract

The efficiency of translation in bacteria is greatly influenced by the binding affinity between the ribosome and the RBS, which can be quantified as RBS strength. Characterizing an RBS sequence experimentally can be very laborious, and our software offers a computational alternative. RBS-decoder is a software tool for evaluating RBS strength and locating SD sequences. It uses the same method as the promoter part, the position weight matrix (PWM), to calculate the similarity between an RBS sequence and the position frequency matrix of SD sequences, and then transforms this similarity into the relative strength of the RBS sequence.


Background

Translational efficiency in Escherichia coli is generally determined at the stage of initiation. The principal mRNA sequence elements that affect the kinetics of ternary initiation complex formation (30S-mRNA-fMet-tRNA) are the SD sequence and the start codon (ATG). The SD sequence base-pairs with an RNA molecule that forms part of the bacterial ribosome (the 16S rRNA), while the start codon base-pairs with the initiator tRNA, which is bound to the ribosome. In addition to the SD sequence and the start codon, the spacer between them also influences RBS strength: the two elements need to be positioned approximately 6-7 nucleotides apart so that both can make contact with the appropriate parts of the ribosome complex [1].


Introduction

How do bacterial Ribosome Binding Sites work?

The bacterial ribosome binds to particular sequences on an mRNA, primarily the SD sequence and the start codon (ATG). The SD sequence base-pairs with an RNA molecule that forms part of the bacterial ribosome (the 16S rRNA), while the start codon base-pairs with the initiator tRNA, which is bound to the ribosome. In addition, these two sequences need to be positioned approximately 6-7 nucleotides apart so that both can make contact with the appropriate parts of the ribosome complex [1].

The Shine-Dalgarno sequence

Figure 1 The RBS sequence logo representing the sequences of 149 RBSs from E. coli. The height of each letter represents the frequency of the base at that location. From Tom Schneider, "A Gallery of Sequence Logos".

The end of the 16S rRNA that is free to bind with the mRNA includes the sequence 5′–ACCUCC–3′. The complementary sequence, 5′–GGAGGU–3′, named the Shine-Dalgarno sequence, can be found in whole or in part in many bacterial mRNAs. Very roughly speaking, ribosome binding sites with purine-rich sequences (A's and G's) close to the Shine-Dalgarno sequence will lead to high rates of translation initiation, whereas sequences that are very different from the Shine-Dalgarno sequence will lead to low or negligible translation rates. You can see how common the sequence is by looking at the RBS sequence logo in Figure 1 (where the height of a letter indicates the frequency of the letter at that location).
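The complementarity described above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical and are not taken from RBS-decoder. It derives the Shine-Dalgarno consensus as the reverse complement of the 16S rRNA tail and counts how many positions of a candidate mRNA window match it.

```python
# Watson-Crick pairing for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq.upper()))


def count_sd_matches(window: str, consensus: str) -> int:
    """Count positions where an equal-length mRNA window matches the consensus."""
    if len(window) != len(consensus):
        raise ValueError("window and consensus must have the same length")
    return sum(a == b for a, b in zip(window.upper(), consensus))


anti_sd = "ACCUCC"                          # 3' tail of the 16S rRNA, from the text
sd_consensus = reverse_complement_rna(anti_sd)
print(sd_consensus)                         # GGAGGU
print(count_sd_matches("UAAGGA", sd_consensus))
```

A plain match count like this ignores that some positions are more conserved than others, which is what the position frequency matrix in the Algorithms section accounts for.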


Algorithms

RBS strength is greatly influenced by the SD sequence, to which the 16S rRNA of the ribosome binds, so in principle the strength can be estimated from the binding free energy between the SD sequence and the 16S rRNA. We first designed a program that calculates this binding free energy, but the results showed that the correlation between the free energy and RBS strength is rather weak (R² = 0.5517). We therefore looked for another algorithm with better accuracy.

We were inspired by the strength prediction algorithm used in the promoter part, in which the similarity of a sequence to the sigma factor's PWM is closely linked to the binding affinity between the protein and the DNA sequence. We obtained the position frequency matrix of E. coli SD sequences and used the PWM method (described in detail in the promoter part) to calculate the similarity between an RBS sequence and this matrix. Unlike in the promoter case, the start codon and the length of the spacer between the SD sequence and the start codon both act as constraints when locating the SD sequence: the spacer is confined to 3-16 bp and the start codon to ATG, TTG or GTG.

As in the prediction of promoter strength, the spacer length also contributes to the RBS strength. The optimal spacer length is 7 bp, and the spacer score is calculated using the same algorithm as in the promoter part [2]. The weight of the spacer's influence on the strength is derived from the promoter strength algorithm, in which the weight ratio of the total MSS to the spacer is 0.29:0.71. Because the total MSS of a promoter is the sum of two motifs whereas the SD sequence is a single motif, the weight ratio between the MSS of the SD sequence and the spacer is 0.29:0.355.
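The constraints and weights described above can be sketched as follows. The start codons, the 3-16 bp spacer range, the 7 bp optimum and the 0.29:0.355 ratio come from the text. Everything else is an assumption made for illustration: the 5-nt motif length (chosen to match the frequency matrix below), the linear `spacer_score` penalty (the actual spacer score follows the promoter-part algorithm, which is not reproduced here), the normalization by the sum of the weights, and all function names.

```python
START_CODONS = ("ATG", "TTG", "GTG")    # allowed start codons (from the text)
MIN_SPACER, MAX_SPACER = 3, 16          # allowed spacer range in bp (from the text)
OPTIMAL_SPACER = 7                      # optimal spacer in bp (from the text)
W_MSS, W_SPACER = 0.29, 0.355           # weight ratio MSS : spacer (from the text)
SD_LEN = 5                              # assumption: 5-nt SD motif


def candidate_sd_sites(seq, sd_len=SD_LEN):
    """Yield (sd_start, spacer, codon_start) for every placement that
    satisfies the start-codon and spacer-length constraints."""
    seq = seq.upper()
    for codon_start in range(len(seq) - 2):
        if seq[codon_start:codon_start + 3] not in START_CODONS:
            continue
        for spacer in range(MIN_SPACER, MAX_SPACER + 1):
            sd_start = codon_start - spacer - sd_len
            if sd_start >= 0:
                yield sd_start, spacer, codon_start


def spacer_score(spacer):
    """Hypothetical penalty: 1.0 at the optimum, falling linearly to 0.0
    at the allowed spacer length furthest from it."""
    worst = max(OPTIMAL_SPACER - MIN_SPACER, MAX_SPACER - OPTIMAL_SPACER)
    return 1.0 - abs(spacer - OPTIMAL_SPACER) / worst


def combined_score(mss, spacer):
    """Weighted mean of the motif similarity (0..1) and the spacer score.
    Dividing by the weight sum is an assumption; the text gives only a ratio."""
    return (W_MSS * mss + W_SPACER * spacer_score(spacer)) / (W_MSS + W_SPACER)


# Toy sequence: a 5-nt motif, a 7-nt spacer, then ATG.
toy = "AGGAG" + "ACATATC" + "ATG"
for site in candidate_sd_sites(toy):
    print(site)
```

In a full pipeline each candidate window would be scored against the SD matrix, and the placement with the highest combined score would be reported as the SD site.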

Nucleotide frequencies for the RBS model
Position  1      2      3      4      5
T         0.161  0.050  0.012  0.071  0.115
C         0.077  0.037  0.012  0.025  0.046
A         0.681  0.105  0.105  0.861  0.164
G         0.077  0.808  0.960  0.043  0.659
Figure 2 The RBS nucleotide position frequency matrix [3].
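To make the matrix concrete, the sketch below turns the frequencies in Figure 2 into a log-odds PWM and scans a sequence for its best-matching 5-mer. It is not the RBS-decoder implementation: the uniform background of 0.25, the log2-odds weighting, the rescaling of the raw score to the range 0 to 1, and the function names are all assumptions made for illustration.

```python
import math

# Position frequency matrix copied from Figure 2 (positions 1-5).
PFM = {
    "T": [0.161, 0.050, 0.012, 0.071, 0.115],
    "C": [0.077, 0.037, 0.012, 0.025, 0.046],
    "A": [0.681, 0.105, 0.105, 0.861, 0.164],
    "G": [0.077, 0.808, 0.960, 0.043, 0.659],
}
MOTIF_LEN = 5
BACKGROUND = 0.25  # assumption: uniform base composition


def build_pwm(pfm, background=BACKGROUND):
    """Convert frequencies to log2-odds weights against the background."""
    return {base: [math.log2(f / background) for f in freqs]
            for base, freqs in pfm.items()}


PWM = build_pwm(PFM)
# Highest and lowest raw scores any 5-mer can reach, used for rescaling.
MAX_SCORE = sum(max(PWM[b][i] for b in PWM) for i in range(MOTIF_LEN))
MIN_SCORE = sum(min(PWM[b][i] for b in PWM) for i in range(MOTIF_LEN))


def similarity(window):
    """Score a 5-nt DNA window and rescale to 0..1 (1.0 = matrix consensus)."""
    if len(window) != MOTIF_LEN:
        raise ValueError("window must be exactly 5 nt long")
    raw = sum(PWM[base][i] for i, base in enumerate(window.upper()))
    return (raw - MIN_SCORE) / (MAX_SCORE - MIN_SCORE)


def best_window(seq):
    """Return (start_index, window, similarity) of the best-scoring 5-mer."""
    seq = seq.upper()
    if len(seq) < MOTIF_LEN:
        raise ValueError("sequence is shorter than the motif")
    candidates = [(i, seq[i:i + MOTIF_LEN])
                  for i in range(len(seq) - MOTIF_LEN + 1)]
    start, window = max(candidates, key=lambda c: similarity(c[1]))
    return start, window, similarity(window)


print(best_window("TTTAGGAGGTAAAA"))
```

The sketch accepts only A, C, G and T. The consensus implied by the matrix is AGGAG, so that window receives a similarity of 1.0.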

Results

We tested RBS-decoder on the RBS sequences listed in the iGEM Registry that have experimentally determined relative strengths [4]. The correlation between the RBS strength predicted by our software and the measured relative strength is strong, with a coefficient of determination (R²) of 0.8039.
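For readers who want to repeat this kind of comparison on their own data, a coefficient of determination can be computed as sketched below. The sketch assumes that R² means the squared Pearson correlation of a simple linear fit, which the text does not state explicitly. The function name is hypothetical and the numbers are toy values, not the Registry measurements.

```python
def r_squared(predicted, measured):
    """Squared Pearson correlation between two equal-length samples."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("R^2 is undefined for a constant sample")
    return cov * cov / (var_p * var_m)


# Toy values for illustration only; they are not experimental data.
toy_predicted = [1.0, 2.0, 3.0, 4.0]
toy_measured = [1.0, 3.0, 2.0, 4.0]
print(r_squared(toy_predicted, toy_measured))
```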

Figure 3 The correlation between actual RBS strength and predicted strength

Future work

Because experimental data are scarce, the relative weights of the SD sequence and the spacer length are currently only roughly determined, which may undermine the accuracy of RBS strength prediction. To improve the program, we will try to obtain more reliable experimental data so that the weights used in our algorithm can be determined accurately, which should raise the accuracy of RBS-decoder.

In the next version of RBS-decoder, the software will display the secondary structure of the RBS sequence, and we will also include SD sequence data from other species so that RBS strength can be predicted for a wider range of species.


References

[1] Ma, J.; Campbell, A.; Karlin, S., Correlations between Shine-Dalgarno sequences and gene features such as predicted expression levels and operon structures. Journal of Bacteriology 2002, 184 (20), 5733-5745.
[2] Noguchi, H.; Taniguchi, T.; Itoh, T., MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes. DNA Research 2008, 15 (6), 387-396.
[3] Lukashin, A. V.; Borodovsky, M., GeneMark.hmm: new solutions for gene finding. Nucleic Acids Research 1998, 26 (4), 1107-1115.
[4] http://parts.igem.org/Ribosome_Binding_Sites/Prokaryotic/Constitutive/Community_Collection.