Team:TU-Munich/Results/Software

From 2013.igem.org

Revision as of 16:50, 9 August 2013 by ChristopherW (Talk | contribs)


The AutoAnnotator

Introduction to the Idea behind our AutoAnnotator

Figure 1:

The Parts Registry contains a wide range of interesting protein-coding BioBricks, but there is no standardized way of presenting basic information about them. This is a pity, because once the open reading frame has been identified, many parameters of the protein can be computed automatically, e.g. its molecular mass, theoretical pI or codon quality for different organisms. We have developed a tool that identifies the open reading frame of a BioBrick, analyzes the sequence and the encoded protein, and exports the results as a single table that can easily be integrated into the part description (and the team wikis).
This lets users see basic information about a BioBrick at a glance in a standardized table, which saves time, facilitates the comparison of BioBricks and improves the annotation of the parts. The AutoAnnotator can also be used for planning new BioBricks, because it computes the relevant parameters in a single place, so there is no need to gather the information from several different websites.
Try it out: The AutoAnnotator!

Overview

The AutoAnnotator is a web-based tool compiling information about encoded proteins from the DNA sequence. It performs the following steps:

  1. Input: When a BioBrick number is entered, the AutoAnnotator imports the nucleotide sequence directly from the Registry database.
    Alternatively, a nucleotide sequence can be entered directly. This is required for new BioBricks that are not in the Registry yet, but it is also helpful for planning new BioBricks.
  2. Finding the Open Reading Frame: To determine the Open Reading Frame (ORF), the algorithm first tries to determine which BioBrick assembly standard the BioBrick follows. If necessary (e.g. for an [http://parts.igem.org/Assembly_standard_25 RFC25] Brick), nucleotides are added to the sequence. The ORF is then determined by taking the first start codon and the first matching in-frame stop codon.
  3. Computation of Parameters: The codon usage for different organisms, i.e. whether the preferred codons are used or not (which contributes to the level of gene expression), is computed directly from the nucleotide sequence. After the DNA sequence has been translated into its amino acid sequence, several parameters of the encoded protein are determined, namely the amino acid composition, the number of charged amino acids, the atomic composition, the molecular mass, the isoelectric point (pI) and the extinction coefficient of the protein. For more information on each of these see below. In addition, the sequence is compared to a list of sequence features, such as binding sites or cleavage sites.
  4. Presentation of the Computed Data: The data is then put together into a concise, structured HTML table and displayed to the user. The code producing the table is displayed underneath it, so the table can be integrated into any wiki, part description or other website with a single copy and paste.

Import of BioBrick Sequences

When a BioBrick number is entered, the AutoAnnotator uses the [http://parts.igem.org/DAS_-_Distributed_Annotation_System Registry DAS] interface to load the nucleotide sequence from the Registry database. Such a cross-domain request is blocked by most browsers for security reasons. To allow it, we use an [http://james.padolsey.com/javascript/cross-domain-requests-with-jquery/ extension] to the .ajax() method in jQuery written by James Padolsey. The extension relies on [http://developer.yahoo.com/yql/ YQL] (Yahoo! Query Language), a service by Yahoo!, to route the request via their servers, which resolves the security issue and allows the Annotator to read the information from the Registry.

Determination of the Open Reading Frame

The first step is to work out the Assembly Standard of the BioBrick, since parts of the coding sequence may lie in the prefix or suffix. As of version 1.0, the most common standards, [http://parts.igem.org/Assembly_standard_10 RFC 10] and [http://parts.igem.org/Assembly_standard_25 RFC 25], are supported. Then the first start codon ATG is located and the first corresponding in-frame stop codon is determined. The region delimited by these two codons is taken to be the open reading frame.
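
The search for the first start codon and the first in-frame stop codon can be sketched as follows. This is a minimal illustration and not the AutoAnnotator source: the function name findOrf is hypothetical, and the detection of the assembly standard and the completion of prefix or suffix nucleotides are left out.

```javascript
// Hypothetical helper: locate the first ATG and the first in-frame stop codon after it.
function findOrf(dna) {
  // Normalise the input: upper case, keep only the four nucleotide letters.
  const seq = dna.toUpperCase().replace(/[^ACGT]/g, "");
  const stops = new Set(["TAA", "TAG", "TGA"]);

  const start = seq.indexOf("ATG");
  if (start === -1) return null; // no start codon found

  // Walk codon by codon in the reading frame set by the start codon.
  for (let i = start + 3; i + 3 <= seq.length; i += 3) {
    if (stops.has(seq.slice(i, i + 3))) {
      // 'start' is the 0-based index of the A in ATG, 'end' is one past the stop codon.
      return { start: start, end: i + 3, orf: seq.slice(start, i + 3) };
    }
  }
  return null; // no in-frame stop codon found
}

console.log(findOrf("ccATGAAATAGgg"));
```

In this sketch the returned ORF includes the stop codon; whether the stop codon is counted as part of the ORF is a convention and would have to match the rest of the analysis.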

Recognising Sequence Features

Several useful building blocks are frequently integrated into BioBricks, such as tags for analytical purposes or cleavage and docking sites for protein interaction. We have put together a list of such common sequence features. The AutoAnnotator automatically searches for them, lists the features that occur and marks them in the amino acid sequence. For the currently supported features please see the Feature List. If you have any suggestions for other interesting features, please get in touch and we will add them.
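
One simple way to implement such a search is a plain substring scan over the amino acid sequence. The sketch below assumes exact motif matching; the function name findFeatures and the three example motifs are illustrative and are not taken from the AutoAnnotator's Feature List (the TEV site in particular is simplified to its first six residues).

```javascript
// Illustrative feature list (assumption, not the AutoAnnotator's own list).
const FEATURES = [
  { name: "His6 tag", motif: "HHHHHH" },
  { name: "Strep-tag II", motif: "WSHPQFEK" },
  { name: "TEV cleavage site", motif: "ENLYFQ" }
];

// Return every occurrence of every motif with 1-based start and end positions.
function findFeatures(protein, features) {
  const hits = [];
  for (const f of features) {
    let pos = protein.indexOf(f.motif);
    while (pos !== -1) {
      hits.push({ name: f.name, start: pos + 1, end: pos + f.motif.length });
      pos = protein.indexOf(f.motif, pos + 1); // continue after this hit
    }
  }
  return hits;
}

console.log(findFeatures("MAWSHPQFEKGSHHHHHH", FEATURES));
```

The positions returned here could then be used to highlight the matching residues when the sequence is rendered.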

Computation of Parameters

Amino acid counting, atomic composition and molecular weight

Counting the amino acids is straightforward once the amino acid sequence is known. From the amino acid composition, the atomic composition can then be calculated by summing the atomic composition of each residue (e.g. given [http://www.matrixscience.com/help/aa_help.html here]) and adding one water molecule (for the free N- and C-terminal ends). Similarly, the molecular weight is obtained by adding up the individual residue weights (using the [http://web.expasy.org/findmod/findmod_masses.html#AA average isotopic masses]) and again adding the molecular weight of one water molecule.
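
The counting and the mass calculation can be sketched as below. The function names are hypothetical, and the average residue masses are typed in here as an assumption of what the linked ExPASy table contains, so they should be checked against that table before being relied upon.

```javascript
// Average residue masses in Da (assumed values; verify against the linked table).
const RESIDUE_MASS = {
  A: 71.0788, R: 156.1875, N: 114.1038, D: 115.0886, C: 103.1388,
  E: 129.1155, Q: 128.1307, G: 57.0519, H: 137.1411, I: 113.1594,
  L: 113.1594, K: 128.1741, M: 131.1926, F: 147.1766, P: 97.1167,
  S: 87.0782, T: 101.1051, W: 186.2132, Y: 163.1760, V: 99.1326
};
const WATER_MASS = 18.01524; // one H2O for the free N- and C-terminus

// Count how often each amino acid occurs in the sequence.
function countAminoAcids(protein) {
  const counts = {};
  for (const aa of protein) {
    if (!(aa in RESIDUE_MASS)) throw new Error("Unknown amino acid: " + aa);
    counts[aa] = (counts[aa] || 0) + 1;
  }
  return counts;
}

// Sum the residue masses weighted by their counts and add one water molecule.
function molecularWeight(protein) {
  const counts = countAminoAcids(protein);
  let mass = WATER_MASS;
  for (const aa in counts) {
    mass += counts[aa] * RESIDUE_MASS[aa];
  }
  return mass;
}

console.log(countAminoAcids("MKK"), molecularWeight("GG").toFixed(2));
```

The atomic composition can be computed in the same way by replacing the mass table with a table of atom counts per residue.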

Theoretical pI

The theoretical pI is the isoelectric point of the protein, ignoring the effects of folding, which cannot be computed reliably. By definition, the isoelectric point of a protein is the pH value at which the overall charge of the protein is zero, so we need to relate the pH value to the charges of the amino acids. For an acid group HA this is done with the Henderson-Hasselbalch equation, where pKa is the negative decadic logarithm of the acid dissociation constant:

$$\mathrm{pH} = \mathrm{p}K_a + \log_{10}\frac{[\mathrm{A}^-]}{[\mathrm{HA}]}$$

We can rearrange this to get the fraction of molecules that are deprotonated and therefore negatively charged:

$$\frac{[\mathrm{A}^-]}{[\mathrm{A}^-] + [\mathrm{HA}]} = \frac{1}{1 + 10^{\mathrm{p}K_a - \mathrm{pH}}}$$

Analogously, by regarding HB+ as an acid, where B is a base, we obtain the fraction of positively charged molecules:

$$\frac{[\mathrm{HB}^+]}{[\mathrm{B}] + [\mathrm{HB}^+]} = \frac{1}{1 + 10^{\mathrm{pH} - \mathrm{p}K_a}}$$

These fractions can also be regarded as a "fractional charge", because they give the average charge over all molecules of this type. By adding up the fractional charge of each amino acid (residues that are neither basic nor acidic contribute no charge) and those of the N- and C-terminal groups, we can determine the charge of the protein at a specific pH. The dissociation constants were taken from [http://www.ncbi.nlm.nih.gov/pubmed/8125050 Bjellqvist et al., 1993] and [http://www.ncbi.nlm.nih.gov/pubmed/8055880 Bjellqvist et al., 1994]. These are the values used by the [http://web.expasy.org/protparam/ ExPASy ProtParam Tool] and are also shown in the table below:

Positively charged groups

Group                                           pKa
Lysine residue                                  10.00
Arginine residue                                12.00
Histidine residue                                5.98
N-terminal -NH2 (unless specified otherwise)     7.50
N-terminal -NH2 on Alanine                       7.59
N-terminal -NH2 on Methionine                    7.00
N-terminal -NH2 on Serine                        6.93
N-terminal -NH2 on Proline                       8.36
N-terminal -NH2 on Threonine                     6.82
N-terminal -NH2 on Valine                        7.44
N-terminal -NH2 on Glutamic acid                 7.70

Negatively charged groups

Group                                           pKa
C-terminal -COOH                                 3.55
Aspartic acid residue                            4.05
Glutamic acid residue                            4.45
Cysteine residue                                 9.00
Tyrosine residue                                10.00
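
Summing the fractional charges with the pKa values from the table can be sketched as follows. This is an illustration under the assumptions stated in the comments, not the AutoAnnotator source; the function name netCharge and the constant names are hypothetical.

```javascript
// pKa values copied from the table above (one-letter amino acid codes).
const POSITIVE_PKA = { K: 10.0, R: 12.0, H: 5.98 };
const NEGATIVE_PKA = { D: 4.05, E: 4.45, C: 9.0, Y: 10.0 };
const N_TERM_PKA = { A: 7.59, M: 7.0, S: 6.93, P: 8.36, T: 6.82, V: 7.44, E: 7.7 };
const N_TERM_DEFAULT = 7.5;
const C_TERM_PKA = 3.55;

// Average charge of a basic group (HB+) at the given pH.
function positiveFraction(pKa, pH) {
  return 1 / (1 + Math.pow(10, pH - pKa));
}

// Average magnitude of the charge of an acidic group (A-) at the given pH.
function negativeFraction(pKa, pH) {
  return 1 / (1 + Math.pow(10, pKa - pH));
}

// Net charge of an unfolded, unmodified protein at the given pH.
function netCharge(protein, pH) {
  const first = protein[0];
  const nTermPKa = first in N_TERM_PKA ? N_TERM_PKA[first] : N_TERM_DEFAULT;
  let charge = positiveFraction(nTermPKa, pH) - negativeFraction(C_TERM_PKA, pH);
  for (const aa of protein) {
    if (aa in POSITIVE_PKA) charge += positiveFraction(POSITIVE_PKA[aa], pH);
    else if (aa in NEGATIVE_PKA) charge -= negativeFraction(NEGATIVE_PKA[aa], pH);
  }
  return charge;
}

console.log(netCharge("KKKK", 7.0).toFixed(2));
```

All cysteines are treated as free thiols here, which is one of the simplifications of the theoretical pI.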

Now all that remains is to find the pH at which the total charge is zero. This is most easily done with the bisection method, which works because the net charge decreases monotonically with increasing pH. Start at pH 7.0 and determine the charge there. If it is positive, the pI must be greater than 7.0, so we only consider the upper half of the pH range. If it is negative, we continue with the lower half. Within the chosen subinterval we again evaluate the charge at the midpoint and choose a subinterval accordingly. Each iteration halves the remaining range of pH values, so the theoretical isoelectric point can be determined up to the required precision by continuing until the remaining range is smaller than that precision. Note, however, that the pKa values are only estimates that depend on the experimental procedure (so you will find many different values in the literature), and that modifications of the protein and the formation of disulfide bridges affect the isoelectric point significantly, so it does not make sense to choose a precision finer than 0.01.
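
The bisection itself can be sketched independently of the charge model. In the sketch below the function name bisectPI is hypothetical, the pH range 0 to 14 is an assumption (it makes the first midpoint 7.0, as described above), and toyCharge is a deliberately simple charge function with only an N-terminal group (pKa 7.5) and a C-terminal group (pKa 3.55).

```javascript
// Find the pH at which chargeAt(pH) crosses zero, assuming that the charge
// decreases monotonically with pH and that the root lies between 0 and 14.
function bisectPI(chargeAt, precision) {
  let low = 0.0;
  let high = 14.0;
  while (high - low > precision) {
    const mid = (low + high) / 2;
    if (chargeAt(mid) > 0) {
      low = mid;  // still positively charged: the pI lies at higher pH
    } else {
      high = mid; // neutral or negatively charged: the pI lies at lower pH
    }
  }
  return (low + high) / 2;
}

// Toy charge function: one basic group (pKa 7.5) and one acidic group (pKa 3.55).
function toyCharge(pH) {
  return 1 / (1 + Math.pow(10, pH - 7.5)) - 1 / (1 + Math.pow(10, 3.55 - pH));
}

console.log(bisectPI(toyCharge, 0.01).toFixed(2));
```

For the toy function the two fractional charges cancel exactly halfway between the two pKa values, at pH 5.525, which gives a simple check that the search converges to the right place.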

Extinction coefficient at 280 nm

The calculation of the extinction coefficient of a protein at 280 nm from its amino acid composition is straightforward ([http://www.ncbi.nlm.nih.gov/pubmed/2610349 Gill and von Hippel, 1989]). The only residues absorbing at this wavelength are those of Tyrosine, Tryptophan and Cystine (which consists of two Cysteines forming a disulfide bridge). The extinction coefficient is then given by

$$E(\mathrm{Prot}) = \mathrm{Numb}(\mathrm{Tyr}) \cdot \mathrm{Ext}(\mathrm{Tyr}) + \mathrm{Numb}(\mathrm{Trp}) \cdot \mathrm{Ext}(\mathrm{Trp}) + \mathrm{Numb}(\mathrm{Cystine}) \cdot \mathrm{Ext}(\mathrm{Cystine})$$

where Numb(amino acid) is the number of occurrences of that amino acid in the protein and Ext(amino acid) is its molar extinction coefficient at 280 nm. Since the number of disulfide bridges that actually form cannot be calculated from the sequence, two values are reported: one under the assumption that all Cysteines are reduced, i.e. that there are no disulfide bridges, and one assuming that every Cysteine is oxidized and hence part of a disulfide bridge.
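
The two values can be computed as in the following sketch. The function name extinctionCoefficients is hypothetical, and the default molar extinction coefficients (in M^-1 cm^-1) are an assumption based on values commonly used by tools such as ExPASy ProtParam; they are not taken from this page, so other published values can be passed in instead.

```javascript
// Returns the extinction coefficient at 280 nm for the fully reduced and the
// fully oxidized protein. 'ext' holds the per-residue coefficients.
function extinctionCoefficients(protein, ext) {
  // Assumed default coefficients in M^-1 cm^-1 (not taken from the source page).
  const e = ext || { Y: 1490, W: 5500, cystine: 125 };
  let nTyr = 0, nTrp = 0, nCys = 0;
  for (const aa of protein) {
    if (aa === "Y") nTyr++;
    else if (aa === "W") nTrp++;
    else if (aa === "C") nCys++;
  }
  // No disulfide bridges: only Tyr and Trp absorb.
  const reduced = nTyr * e.Y + nTrp * e.W;
  // Every Cys paired: two cysteines form one cystine, an odd one is left over.
  const oxidized = reduced + Math.floor(nCys / 2) * e.cystine;
  return { reduced: reduced, oxidized: oxidized };
}

console.log(extinctionCoefficients("WYCC"));
```

With an odd number of Cysteines one residue necessarily stays unpaired, which is why the number of cystines is rounded down.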

Codon usage (CAI)

Export of the Computed Parameters

Export!


How to use the BioBrick-AutoAnnotator

Text

Programming of the BioBrick-Autoannotator

The Annotator is a JavaScript program that uses the jQuery library and an [http://james.padolsey.com/javascript/cross-domain-requests-with-jquery/ extension] to the .ajax() method in jQuery written by James Padolsey (see also Import of BioBrick Sequences above), and creates a table in HTML code.

Source code of the BioBrick-Autoannotator

source code



Application of our Software-tool

Annotation by TU-Munich 2013 Team

Annotation by other Teams

References:

[http://www.ncbi.nlm.nih.gov/pubmed/6327079 Edens et al., 1984]; [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC340524/ Sharp and Li, 1987]; [http://www.ncbi.nlm.nih.gov/pubmed/8125050 Bjellqvist et al., 1993]; [http://www.ncbi.nlm.nih.gov/pubmed/8055880 Bjellqvist et al., 1994]; [http://www.ncbi.nlm.nih.gov/pubmed/2610349 Gill and von Hippel, 1989]

  1. [http://www.ncbi.nlm.nih.gov/pubmed/8125050 Bjellqvist et al., 1993] Bjellqvist, B., Hughes, G.J., Pasquali, Ch., Paquet, N., Ravier, F., Sanchez, J.-Ch., Frutiger, S. and Hochstrasser, D.F. (1993). The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences. Electrophoresis, 14:1023-1031.
  2. [http://www.ncbi.nlm.nih.gov/pubmed/8055880 Bjellqvist et al., 1994] Bjellqvist, B., Basse, B., Olsen, E. and Celis, J.E. (1994). Reference points for comparisons of two-dimensional maps of proteins from different human cell types defined in a pH scale where isoelectric points correlate with polypeptide compositions. Electrophoresis, 15:529-539.
  3. [http://www.ncbi.nlm.nih.gov/pubmed/2610349 Gill and von Hippel, 1989] Gill, S.C. and von Hippel, P.H. (1989). Calculation of protein extinction coefficients from amino acid sequence data. Anal. Biochem., 182:319-326.
  4. [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC340524/ Sharp and Li, 1987] Sharp, P.M. and Li, W.H. (1987). The Codon Adaptation Index - a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res., 15(3):1281-1295.




