Team:USTC-Software/Project/Method

From 2013.igem.org

(Difference between revisions)
 
(24 intermediate revisions not shown)
Line 1: Line 1:
{{USTC-Software/hidden}}
{{USTC-Software/hidden}}
-
{{USTC-Software/header}}
+
{{USTC-Software/header2}}
<!--{{USTC-Software/project}}-->
<!--{{USTC-Software/project}}-->
Line 29: Line 29:
<body>
<body>
-
<!--div id="direction">
+
<div id="direction">
         <ul>
         <ul>
-
           <li><a href="#abstract" class="button">Abstract</a>
+
                             
-
           <li><a href="#Fetch_Database" class="button">Fetch Database</a>
+
           <!--li><a href="#sequence" class="button">2.1 Sequence</a></br>
-
           <li><a href="#Alignment_Analyze" class="button">Alignment Analyze</a>
+
                <a href="#nwa" class="button" id="subbutton">2.1.1 Needleman-Wunsch Algorithm</a></br>
-
           <li><a href="#New_Network_Construction" class="button">New Network Construction</a>
+
                <a href="#asg" class="button" id="subbutton">2.1.2 A Supplementary Game</a>
-
          <li><a href="#Network_Model" class="button">Network Model</a>
+
          </li>
-
          <li><a href="#Predict" class="button">Predict</a>
+
 
-
          <li><a href="#Database" class="button">Database</a>           
+
           <li><a href="#filtering" class="button">2.2 Filtering</a></br>
 +
                <a href="#rn" class="button" id="subbutton">2.2.1 Random Noise</a></br>
 +
                <a href="#filter" class="button" id="subbutton">2.2.2 Filter</a>
 +
        </li>
 +
 
 +
           <li><a href="#rc" class="button">2.3 Regulation Calculation</a></li>
 +
 
 +
           <li><a href="#main" class="button">Top</a></li-->
 +
 
 +
 
 +
        <li><a href="#Fetch_Database" class="button">Database</a></li>
 +
        <li><a href="#Alignment_Analyze" class="button">Operon Theory and Regulatory Model</a></li>
 +
        <li><a href="#fa" class="button">Forward Analysis</a></li>
 +
        <li><a href="#ra" class="button">Reverse Analysis</a></li>
 +
         <li><a href="#reference" class="button">Reference</a></li>
 +
        <li><a href="#main" class="button">Top</a></li>
 +
 
         </ul>
         </ul>
</div>
</div>
Line 45: Line 61:
         var href = $(this).attr("href");
         var href = $(this).attr("href");
         var pos = $(href).offset().top - 100;
         var pos = $(href).offset().top - 100;
-
         $("html,body").animate({scrollTop: pos}, 1500);
+
         $("html,body").animate({scrollTop: pos}, 1500);//the smaller the quicker
         return false;
         return false;
     });
     });
});
});
-
</script-->
+
</script>
        
        
Line 59: Line 75:
   <h1 align="justify">Methodologies</h1>
   <h1 align="justify">Methodologies</h1>
   <p align="justify">To simulate how the GRN works and to analyze how it changes after an exogenous gene is imported, the software employs several advanced algorithms and classical methods: the Binary Tree method, the Needleman-Wunsch Algorithm, the Decision Tree method, the Hill Equation and the PSO Algorithm.</br><br/>
   <p align="justify">To simulate how the GRN works and to analyze how it changes after an exogenous gene is imported, the software employs several advanced algorithms and classical methods: the Binary Tree method, the Needleman-Wunsch Algorithm, the Decision Tree method, the Hill Equation and the PSO Algorithm.</br><br/>
 +
 +
<img src="https://static.igem.org/mediawiki/2013/6/6b/USTC_Software_FLOW.png" style="width:1000px;"/></br>
 +
There are four parts of methodologies: Database, Operon Theory and Regulatory Model, Forward Analysis and Reverse Analysis.
There are four parts of methodologies: Database, Operon Theory and Regulatory Model, Forward Analysis and Reverse Analysis.
   </p>
   </p>
Line 70: Line 89:
<h2>Database</h2>
<h2>Database</h2>
<div id="jobs_container">
<div id="jobs_container">
-
        <div class="jobs_trigger"><strong>Abstract</strong></div>
+
        <div class="jobs_trigger" id="dbabstract"><strong>Abstract</strong></div>
<div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">To simulate and analyze a genetic regulatory network (GRN), we need to build an array of objects that stores the complete information of each gene: the regulation relationships between genes, the gene sequences, the promoter sequences and so on. However, it is hard to find an online database that contains all the information we need in a single file. RegulonDB offers downloadable files about the regulation between transcription factors (TFs) and genes, as well as files with gene information, transcription unit information and promoter information. All of these files have been placed under “source data” in the root directory of our software. They contain all the information the simulation needs, and a fetching module extracts and integrates the data in four steps: fetch regulation relationships, fetch gene information, fetch promoter information and integrate the information above.
<div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">To simulate and analyze a genetic regulatory network (GRN), we need to build an array of objects that stores the complete information of each gene: the regulation relationships between genes, the gene sequences, the promoter sequences and so on. However, it is hard to find an online database that contains all the information we need in a single file. RegulonDB offers downloadable files about the regulation between transcription factors (TFs) and genes, as well as files with gene information, transcription unit information and promoter information. All of these files have been placed under “source data” in the root directory of our software. They contain all the information the simulation needs, and a fetching module extracts and integrates the data in four steps: fetch regulation relationships, fetch gene information, fetch promoter information and integrate the information above.
</p>
</p>
Line 76: Line 95:
   <div id="jobs_container">
   <div id="jobs_container">
-
        <div class="jobs_trigger"><strong>Fetch Regulation</strong></div>
+
        <div class="jobs_trigger" id="fetch"><strong>Fetch Regulation</strong></div>
<div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">The GRN is built from two kinds of files: <a class="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_tf.txt"> TF to TF</a> and <a class="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_gene.txt">TF to Gene</a>. Since the database on regulation between TFs and genes contains only one-way interactions, the GRN matrix is rectangular.</br></br>
<div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">The GRN is built from two kinds of files: <a class="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_tf.txt"> TF to TF</a> and <a class="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_gene.txt">TF to Gene</a>. Since the database on regulation between TFs and genes contains only one-way interactions, the GRN matrix is rectangular.</br></br>
First, the software reads the regulation relationships between TFs. It skips the RegulonDB documentation at the head of each file and then reads, one by one, the names of the regulating and regulated TFs, which are also the names of their genes. At the same time, it numbers the genes and stores their names in an array of objects holding the genetic data. </br></br>
First, the software reads the regulation relationships between TFs. It skips the RegulonDB documentation at the head of each file and then reads, one by one, the names of the regulating and regulated TFs, which are also the names of their genes. At the same time, it numbers the genes and stores their names in an array of objects holding the genetic data. </br></br>
&nbsp;&nbsp;The format of regulation database:</br>
&nbsp;&nbsp;The format of regulation database:</br>
&nbsp;&nbsp;&nbsp;&nbsp;TF_name &nbsp;&nbsp;&nbsp;TF_name &nbsp;&nbsp;&nbsp;+/-/+-</br></br>
&nbsp;&nbsp;&nbsp;&nbsp;TF_name &nbsp;&nbsp;&nbsp;TF_name &nbsp;&nbsp;&nbsp;+/-/+-</br></br>
-
<img src="https://static.igem.org/mediawiki/2013/6/69/USTC_Software_TT.jpg"/>
+
<div align="center"><img src="https://static.igem.org/mediawiki/2013/6/69/USTC_Software_TT.jpg"/></div></br>
The TF-to-TF regulations are placed in a square matrix in which each row is a regulator and each column is a regulated TF. To make our GRN as complete as possible, the regulations between TFs and genes are then added to the matrix. Because these interactions are one-way, all TFs must be read first to fill in the regulators (rows); the TF-to-gene regulations are then completed in the same way as the TF-to-TF ones. </br></br>
The TF-to-TF regulations are placed in a square matrix in which each row is a regulator and each column is a regulated TF. To make our GRN as complete as possible, the regulations between TFs and genes are then added to the matrix. Because these interactions are one-way, all TFs must be read first to fill in the regulators (rows); the TF-to-gene regulations are then completed in the same way as the TF-to-TF ones. </br></br>
&nbsp;&nbsp;The format of regulation database:</br>
&nbsp;&nbsp;The format of regulation database:</br>
&nbsp;&nbsp;&nbsp;&nbsp;TF_name &nbsp;&nbsp;&nbsp;Gene_name &nbsp;&nbsp;&nbsp;+/-/+-</br></br>
&nbsp;&nbsp;&nbsp;&nbsp;TF_name &nbsp;&nbsp;&nbsp;Gene_name &nbsp;&nbsp;&nbsp;+/-/+-</br></br>
-
<img src="https://static.igem.org/mediawiki/2013/4/47/USTC_Software_TG.jpg"/>
+
<div align="center"><img src="https://static.igem.org/mediawiki/2013/4/47/USTC_Software_TG.jpg"/></div></br>
Finally, the regulatory matrix, whose rows represent regulating genes (TFs) and whose columns represent regulated genes (TFs and genes), is written to a file called “old_GRN” in the root directory. The values in the GRN matrix encode the regulations: “1” means activation, “-1” means repression and “0” means no relationship. Some regulations have been identified as both positive and negative, depending on the experimental environment. Our software therefore picks out those uncertain genes and stores them in a file named “uncertain_database”.</br></br>
Finally, the regulatory matrix, whose rows represent regulating genes (TFs) and whose columns represent regulated genes (TFs and genes), is written to a file called “old_GRN” in the root directory. The values in the GRN matrix encode the regulations: “1” means activation, “-1” means repression and “0” means no relationship. Some regulations have been identified as both positive and negative, depending on the experimental environment. Our software therefore picks out those uncertain genes and stores them in a file named “uncertain_database”.</br></br>
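The matrix construction described above can be sketched as follows. This is a minimal illustration, not the team's implementation: the function name `build_grn`, the tuple layout of the input rows and the choice to leave “+-” entries at 0 while recording them separately are all assumptions made for the example.

```python
def build_grn(tf_tf_rows, tf_gene_rows):
    """Build a TF x (TF + gene) matrix with entries 1, -1 or 0.

    Each input row is a (regulator, target, effect) tuple, where effect
    is "+", "-" or "+-" (hypothetical in-memory form of the file rows).
    """
    sign = {"+": 1, "-": -1}
    tfs, genes = [], []

    # Read every TF first so that each regulator gets a row index.
    for regulator, target, _ in tf_tf_rows:
        for name in (regulator, target):
            if name not in tfs:
                tfs.append(name)
    for regulator, _, _ in tf_gene_rows:
        if regulator not in tfs:
            tfs.append(regulator)

    # Remaining targets that are not TFs become extra columns.
    for _, target, _ in tf_gene_rows:
        if target not in tfs and target not in genes:
            genes.append(target)

    columns = tfs + genes
    index = {name: i for i, name in enumerate(columns)}
    matrix = [[0] * len(columns) for _ in tfs]
    uncertain = []

    for regulator, target, effect in tf_tf_rows + tf_gene_rows:
        if effect in sign:
            matrix[index[regulator]][index[target]] = sign[effect]
        else:
            # Assumption: dual ("+-") regulations stay 0 and are listed apart.
            uncertain.append((regulator, target))
    return tfs, columns, matrix, uncertain
```

Because the TFs occupy the first columns, a TF's column index doubles as its row index, which gives the rectangular TF x (TF + gene) shape described in the text.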
&nbsp;&nbsp;The format of uncertain database:</br>
&nbsp;&nbsp;The format of uncertain database:</br>
Line 94: Line 113:
                 </div>
                 </div>
-
<div class="jobs_trigger"><strong> Fetch Gene Info</strong></div>
+
<div class="jobs_trigger" id="fgi"><strong> Fetch Gene Info</strong></div>
<div class="jobs_item" style="display: none;"><p align="justify">
<div class="jobs_item" style="display: none;"><p align="justify">
All gene information is stored in a file named gene_info, which can be downloaded <a class="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/Gene_sequence.txt">here</a>. To pick out the genes in the GRN as fast as possible, all gene information is stored in a “map”. A “map” works like a dictionary whose words are gene names and whose definitions are the corresponding gene information. Because the “map” is searched with the binary tree method, looking up a wanted “word” in the “dictionary” is very fast. In our tests, lookup with the binary tree method built into the “map” was 720 times faster than the traversal method.</br></br>
All gene information is stored in a file named gene_info, which can be downloaded <a class="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/Gene_sequence.txt">here</a>. To pick out the genes in the GRN as fast as possible, all gene information is stored in a “map”. A “map” works like a dictionary whose words are gene names and whose definitions are the corresponding gene information. Because the “map” is searched with the binary tree method, looking up a wanted “word” in the “dictionary” is very fast. In our tests, lookup with the binary tree method built into the “map” was 720 times faster than the traversal method.</br></br>
&nbsp;&nbsp;The format of Gene Info database:</br>
&nbsp;&nbsp;The format of Gene Info database:</br>
&nbsp;&nbsp;&nbsp;&nbsp;ID_assigned_by_RegulonDB &nbsp;&nbsp;&nbsp;Gene_name &nbsp;&nbsp;&nbsp;Left_end_position &nbsp;&nbsp;&nbsp;Right_end_position &nbsp;&nbsp;&nbsp;DNA_strand &nbsp;&nbsp;&nbsp;Product_type &nbsp;&nbsp;&nbsp;&nbsp;Product_name &nbsp;&nbsp;&nbsp;Start_codon_sequence&nbsp;&nbsp;&nbsp;  Stop_codon_sequence &nbsp;&nbsp;&nbsp;Gene_sequence</br></br>
&nbsp;&nbsp;&nbsp;&nbsp;ID_assigned_by_RegulonDB &nbsp;&nbsp;&nbsp;Gene_name &nbsp;&nbsp;&nbsp;Left_end_position &nbsp;&nbsp;&nbsp;Right_end_position &nbsp;&nbsp;&nbsp;DNA_strand &nbsp;&nbsp;&nbsp;Product_type &nbsp;&nbsp;&nbsp;&nbsp;Product_name &nbsp;&nbsp;&nbsp;Start_codon_sequence&nbsp;&nbsp;&nbsp;  Stop_codon_sequence &nbsp;&nbsp;&nbsp;Gene_sequence</br></br>
-
<img src="https://static.igem.org/mediawiki/2013/4/45/USTC_Software_GI.jpg"/>
+
<div align="center"><img src="https://static.igem.org/mediawiki/2013/4/45/USTC_Software_GI.jpg"/></div></br>
The key of the map is the gene name, which is looked up using the names read earlier into the regulation matrix. With the binary tree method, the software quickly finds each gene's information and stores it in the corresponding object. This information includes the gene ID, left position, right position, gene description and gene sequence. The gene ID is used to link to RegulonDB's gene details; the left position is used to find the gene's transcription unit; the right position is used to work out the number of bases; the gene description is used to distinguish RNA from protein products; and the sequence is used to predict regulation by alignment.
The key of the map is the gene name, which is looked up using the names read earlier into the regulation matrix. With the binary tree method, the software quickly finds each gene's information and stores it in the corresponding object. This information includes the gene ID, left position, right position, gene description and gene sequence. The gene ID is used to link to RegulonDB's gene details; the left position is used to find the gene's transcription unit; the right position is used to work out the number of bases; the gene description is used to distinguish RNA from protein products; and the sequence is used to predict regulation by alignment.
Line 107: Line 126:
                  
                  
                  
                  
-
             <div class="jobs_trigger"> <strong>Fetch Promoter Info</strong></div>
+
             <div class="jobs_trigger" id="fpi"> <strong>Fetch Promoter Info</strong></div>
        <div class="jobs_item" style="display: none;"><p align="justify">All promoter information is stored in a file named promoter_info, which can be downloaded <a class="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/PromoterSet.txt">here</a>. We also need transcription unit (TU) information, because the promoter files do not list all the genes downstream of each promoter. The “TU Info” file, which can be downloaded <a class="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/TUSet.txt">here</a>, contains the starting position of each TU and its promoter name. Our software reads the starting positions into an integer array. Using the left position taken from the gene info, it then finds the unit the gene belongs to by binary search (the dichotomy method) and stores the promoter name in the corresponding object.</br></br>
        <div class="jobs_item" style="display: none;"><p align="justify">All promoter information is stored in a file named promoter_info, which can be downloaded <a class="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/PromoterSet.txt">here</a>. We also need transcription unit (TU) information, because the promoter files do not list all the genes downstream of each promoter. The “TU Info” file, which can be downloaded <a class="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/TUSet.txt">here</a>, contains the starting position of each TU and its promoter name. Our software reads the starting positions into an integer array. Using the left position taken from the gene info, it then finds the unit the gene belongs to by binary search (the dichotomy method) and stores the promoter name in the corresponding object.</br></br>
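The search for the transcription unit could look like the sketch below. It assumes that the TU starting positions are sorted in ascending order and that a gene belongs to the unit with the greatest starting position not exceeding the gene's left position; the function name `find_promoter` and the sample values are hypothetical.

```python
from bisect import bisect_right


def find_promoter(tu_starts, tu_promoters, gene_left):
    """Return the promoter of the unit that contains gene_left, or None.

    tu_starts    -- sorted list of TU starting positions (integers)
    tu_promoters -- promoter names, parallel to tu_starts
    gene_left    -- left end position of the gene
    """
    # bisect_right gives the insertion point; the unit just before it
    # is the last one that starts at or before gene_left.
    i = bisect_right(tu_starts, gene_left) - 1
    return tu_promoters[i] if i >= 0 else None


# Hypothetical data: three units starting at 100, 500 and 900.
starts = [100, 500, 900]
promoters = ["p1", "p2", "p3"]
print(find_promoter(starts, promoters, 650))  # p2
```

A binary search of this kind needs about log2(n) comparisons per gene instead of up to n for a linear scan.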
&nbsp;&nbsp;The format of TU info database:</br>
&nbsp;&nbsp;The format of TU info database:</br>
&nbsp;&nbsp;&nbsp;&nbsp;Operon_name &nbsp;&nbsp;&nbsp;Unit_name &nbsp;&nbsp;&nbsp;promoter_name &nbsp;&nbsp;&nbsp;Transcription_start_site ......</br></br>
&nbsp;&nbsp;&nbsp;&nbsp;Operon_name &nbsp;&nbsp;&nbsp;Unit_name &nbsp;&nbsp;&nbsp;promoter_name &nbsp;&nbsp;&nbsp;Transcription_start_site ......</br></br>
-
<img src="https://static.igem.org/mediawiki/2013/1/1e/USTC_Software_TI.jpg"/>
+
<div align="center"><img src="https://static.igem.org/mediawiki/2013/1/1e/USTC_Software_TI.jpg"/></div></br>
Promoter information is fetched in the same way as gene information. Our software stores the promoter information from the file named “promoter_info” in a “map”, from which the promoter sequence can be picked out by searching for the promoter name with the binary tree method.</br></br>
Promoter information is fetched in the same way as gene information. Our software stores the promoter information from the file named “promoter_info” in a “map”, from which the promoter sequence can be picked out by searching for the promoter name with the binary tree method.</br></br>
&nbsp;&nbsp;The format of Promoter Info database:</br>
&nbsp;&nbsp;The format of Promoter Info database:</br>
&nbsp;&nbsp;&nbsp;&nbsp;Promoter_ID_assigned_by_RegulonDB &nbsp;&nbsp;&nbsp;Promoter_name</br></br>
&nbsp;&nbsp;&nbsp;&nbsp;Promoter_ID_assigned_by_RegulonDB &nbsp;&nbsp;&nbsp;Promoter_name</br></br>
-
<img src="https://static.igem.org/mediawiki/2013/8/8a/USTC_Software_PI.jpg"/>
+
<div align="center"><img src="https://static.igem.org/mediawiki/2013/8/8a/USTC_Software_PI.jpg"/></div></br>
The promoter sequence is used by the alignment method in the next module, which predicts the regulation pattern of exogenous genes.
The promoter sequence is used by the alignment method in the next module, which predicts the regulation pattern of exogenous genes.
</p>          </div>   
</p>          </div>   
Line 121: Line 140:
                  
                  
                
                
-
<div class="jobs_trigger"> <strong>Integration</strong></div>
+
<div class="jobs_trigger" id="Integration"> <strong>Integration</strong></div>
<div class="jobs_item" style="display: block;"><p align="justify">                     
<div class="jobs_item" style="display: block;"><p align="justify">                     
Our software integrates all the gene information it has picked out and generates a file named “all_info” (all information about genes), which the graphical output interface reads. Meanwhile, the array of objects containing all the information is kept in memory, which greatly improves the computing speed of our software.</br></br>
Our software integrates all the gene information it has picked out and generates a file named “all_info” (all information about genes), which the graphical output interface reads. Meanwhile, the array of objects containing all the information is kept in memory, which greatly improves the computing speed of our software.</br></br>
Line 230: Line 249:
-
<h2>Forward Analysis</h2>
+
<h2 id="fa">Forward Analysis</h2>
-
<div class="jobs_trigger"><strong>Construct New GRN</strong></div>
+
<div class="jobs_trigger" id="cng"><strong>Construct New GRN</strong></div>
   <div class="jobs_item" style="display: none;">
   <div class="jobs_item" style="display: none;">
-
     <h3>1 User Input</h3>
+
     <h3 id="ui">1 User Input</h3>
     <p align="justify">
     <p align="justify">
       Some regulations can be obtained from experiments. If users know the otherwise unknown regulation between the new gene and existing ones, they can set those interactions manually, so that they do not need to be modeled. These regulations are used in the later simulation.
       Some regulations can be obtained from experiments. If users know the otherwise unknown regulation between the new gene and existing ones, they can set those interactions manually, so that they do not need to be modeled. These regulations are used in the later simulation.
     </p>
     </p>
     <h3>2 Similarity Analysis</h3>
     <h3>2 Similarity Analysis</h3>
-
     <p align="justify"><b>2.1 Sequence</b></br>
+
     <p align="justify"><div id="sequence"><h4>2.1 Sequence</h4></div></br>
-
       <h4>2.1.1 Needleman-Wunsch Algorithm</h4>
+
       <div id="nwa"><h5>2.1.1 Needleman-Wunsch Algorithm</h5></div>
       The Needleman-Wunsch algorithm was first published in 1970 by Saul B. Needleman and Christian D. Wunsch. It performs a global alignment of two sequences and is mostly used in bioinformatics to align protein or nucleotide sequences. Our software applies this algorithm to the alignment of DNA and amino acid sequences.<br/><br/>
       The Needleman-Wunsch algorithm was first published in 1970 by Saul B. Needleman and Christian D. Wunsch. It performs a global alignment of two sequences and is mostly used in bioinformatics to align protein or nucleotide sequences. Our software applies this algorithm to the alignment of DNA and amino acid sequences.<br/><br/>
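For orientation, a compact sketch of the Needleman-Wunsch algorithm is given here. The scoring values (match = 1, mismatch = -1, gap = -1) are illustrative assumptions and not necessarily the scores used by the software; the function name is also chosen only for this example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming table, including gap-only borders.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


# Three matches and one gap under the assumed scores: 3 - 1 = 2.
print(needleman_wunsch("ACGT", "AGT")[0])  # 2
```

The table has (n + 1) x (m + 1) cells, so time and memory both grow with the product of the two sequence lengths.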
Line 296: Line 315:
                               CGAGAC - - GT - - -
                               CGAGAC - - GT - - -
       </em></strong></p>
       </em></strong></p>
-
       <h4>2.1.2 A Supplementary Game</h4>
+
       <div id="asg"><h5>2.1.2 A Supplementary Game</h5></div>
       <p align="justify">The rows and columns in the GRN matrix can be regarded as vectors containing the regulated or the regulating information. The behavior similarity of two units can be described by the dot product of two regulated vectors or two regulating vectors. Biologists usually think the more similar two sequences are, the more likely they have similar behaviors. Whether the ratio of genes with similar behaviors is positively correlated with gene similarity is essential to our project. So we obtained 1.6 million sets of data by pairwise alignment of all the 1748 units in the GRN of K-12. Each set of data consists of gene similarity and behavior similarity. The result is analyzed and plotted in the figure. The linear fit shows that the ratio is positively correlated with the similarity.</p><br/>
       <p align="justify">The rows and columns in the GRN matrix can be regarded as vectors containing the regulated or the regulating information. The behavior similarity of two units can be described by the dot product of two regulated vectors or two regulating vectors. Biologists usually think the more similar two sequences are, the more likely they have similar behaviors. Whether the ratio of genes with similar behaviors is positively correlated with gene similarity is essential to our project. So we obtained 1.6 million sets of data by pairwise alignment of all the 1748 units in the GRN of K-12. Each set of data consists of gene similarity and behavior similarity. The result is analyzed and plotted in the figure. The linear fit shows that the ratio is positively correlated with the similarity.</p><br/>
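The behavior similarity defined above is a plain dot product, as this small sketch shows. The function name and the two example vectors are hypothetical; in the software the vectors would be rows (regulating) or columns (regulated) of the GRN matrix.

```python
def behavior_similarity(u, v):
    """Dot product of two regulation vectors with entries 1, -1 or 0."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(u, v))


# Two hypothetical regulating vectors (rows of the GRN matrix).
unit_a = [1, -1, 0, 1]
unit_b = [1, 1, 0, 1]
print(behavior_similarity(unit_a, unit_b))  # 1
```

Each shared target with the same sign adds 1, each shared target with opposite signs subtracts 1, and targets regulated by only one of the two units contribute nothing.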
Line 302: Line 321:
       <p><strong>Figure 4.</strong>Linear fit of ratio-similarity relationship.</p></div>
       <p><strong>Figure 4.</strong>Linear fit of ratio-similarity relationship.</p></div>
       <p align="justify">Although there are examples that a slight change in DNA sequence will significantly change the property of the gene, for example, sickle-cell disease, the influence is usually determined by the location and scale of the mutation. So the result is still convincing to some degree.</p>
       <p align="justify">Although there are examples that a slight change in DNA sequence will significantly change the property of the gene, for example, sickle-cell disease, the influence is usually determined by the location and scale of the mutation. So the result is still convincing to some degree.</p>
-
  <p>
+
 
-
     <b>2.2 Filtering</b></p>
+
     <div id="filtering"><h4>2.2 Filtering</h4></div>
-
     <h4>2.2.1 Random Noise</h4>
+
     <div id="rn"><h5>2.2.1 Random Noise</h5></div>
     <p class="bodytext"></p><p align="justify">Normally, the similarity of two sequences will not be zero. Some computational
     <p class="bodytext"></p><p align="justify">Normally, the similarity of two sequences will not be zero. Some computational
experiments were carried out to study the random sequence similarities. We randomly
experiments were carried out to study the random sequence similarities. We randomly
Line 313: Line 332:
<img src="https://static.igem.org/mediawiki/igem.org/8/89/USTC_Software_Figure_4.png" />
<img src="https://static.igem.org/mediawiki/igem.org/8/89/USTC_Software_Figure_4.png" />
<p><strong>Figure 5.</strong> Random similarity distribution</p></div>
<p><strong>Figure 5.</strong> Random similarity distribution</p></div>
-
     <h4>2.2.2 Filter</h4>
+
     <div id="filter"><h5>2.2.2 Filter</h5></div>
     <p align="justify">We need the genes highly similar to the exogenous one to interact with it. The program will
     <p align="justify">We need the genes highly similar to the exogenous one to interact with it. The program will
align the exogenous gene(query) with genes in the network(subject) and get the original
align the exogenous gene(query) with genes in the network(subject) and get the original
Line 328: Line 347:
An example of filtering and consistency is presented in “Example”.
An example of filtering and consistency is presented in “Example”.
</p>
</p>
-
     <p><b>2.3 Regulation Calculation</b></p>
+
     <div id="rc"><h4>2.3 Regulation Calculation</h4></div>
     <p align="justify">Consider a three-unit network whose units interact with each other as shown in the figure.
     <p align="justify">Consider a three-unit network whose units interact with each other as shown in the figure.
Their regulation is described by the GRN matrix.</p>
Their regulation is described by the GRN matrix.</p>
<div align="center"><img src="https://static.igem.org/mediawiki/igem.org/8/8a/USTC_Software_Figure_5.png" />
<div align="center"><img src="https://static.igem.org/mediawiki/igem.org/8/8a/USTC_Software_Figure_5.png" />
-
<p align="justify"><strong>Figure 6.</strong> Example network and its GRN matrix.</p></div>
+
<p align="center"><strong>Figure 6.</strong> Example network and its GRN matrix.</p></div>
Line 363: Line 382:
<img src="https://static.igem.org/mediawiki/igem.org/c/c5/USTC_Software_Figure_7.png" />
<img src="https://static.igem.org/mediawiki/igem.org/c/c5/USTC_Software_Figure_7.png" />
</div>
</div>
-
<p><strong>Figure 8.</strong> Construct New GRN</p>
+
<p align="center"><strong>Figure 8.</strong> Construct New GRN</p>
     <h3>3 Clustering</h3>
     <h3>3 Clustering</h3>
     <p>
     <p>
Line 371: Line 390:
     </p>
     </p>
   </div>
   </div>
-
<div class="jobs_trigger"><strong>Network Model</strong></div>
+
<div class="jobs_trigger" id="nm"><strong>Network Model</strong></div>
   <div class="jobs_item" style="display: none;">
   <div class="jobs_item" style="display: none;">
<p align="justify">Network analysis includes finding stable condition of network, adding new gene, finding new stable condition and changes from original condition to new condition. We use densities of materials to describe network condition. If all material densities are time-invariant, we can say the network condition is stable.</p>
<p align="justify">Network analysis includes finding stable condition of network, adding new gene, finding new stable condition and changes from original condition to new condition. We use densities of materials to describe network condition. If all material densities are time-invariant, we can say the network condition is stable.</p>
Line 398: Line 417:
   </div>
   </div>
-
<div class="jobs_trigger"><strong>Evaluate Network</strong></div>
+
<div class="jobs_trigger" id="en"><strong>Evaluate Network</strong></div>
   <div class="jobs_item" style="display: none;">
   <div class="jobs_item" style="display: none;">
<p align="justify">Record the original stable condition, set new material density to 0 and this is the new initial density vector. Solve new equations and record density vectors before the new condition is stable and store these data in a text file.</br></br>
<p align="justify">Record the original stable condition, set new material density to 0 and this is the new initial density vector. Solve new equations and record density vectors before the new condition is stable and store these data in a text file.</br></br>
Line 414: Line 433:
-
<h2>Reverse Analysis</h2>
+
<h2 id="ra">Reverse Analysis</h2>
-
<div class="jobs_trigger"><strong>Virtual Gene</strong></div>
+
 
 +
<div class="jobs_trigger" id="vg"><strong>Virtual Gene</strong></div>
   <div class="jobs_item" style="display: none;">
   <div class="jobs_item" style="display: none;">
<p align="justify">Reverse analysis starts from the same idea as constructing a new GRN: we create a virtual gene that stands in for the gene the user wants to obtain. Computationally, this means adding a row and a column to the GRN matrix.</p>
<p align="justify">Reverse analysis starts from the same idea as constructing a new GRN: we create a virtual gene that stands in for the gene the user wants to obtain. Computationally, this means adding a row and a column to the GRN matrix.</p>
 +
</div>
-
 
+
<div class="jobs_trigger" id="er"><strong>Expression Range</strong></div>
-
  </div>
+
-
<div class="jobs_trigger"><strong>Expression Range</strong></div>
+
   <div class="jobs_item" style="display: none;">
   <div class="jobs_item" style="display: none;">
<p align="justify">Before prediction, the experimenter enters the target genes and whether the expression of each should be raised or lowered. At most six target genes can be specified.</br></br>
<p align="justify">Before prediction, the experimenter enters the target genes and whether the expression of each should be raised or lowered. At most six target genes can be specified.</br></br>
The strongest and weakest possible expression levels must be determined before the extreme cases are used as target expressions. To find them, we model the GRN's steady state under a large number of random regulation values between -1 and 1; we ran this 1000 times to obtain the range of gene expression. The expression of genes not picked by the user, on the other hand, should stay as stable as possible. The initial expression strength is calculated by modeling the original GRN with the Hill equation.
The strongest and weakest possible expression levels must be determined before the extreme cases are used as target expressions. To find them, we model the GRN's steady state under a large number of random regulation values between -1 and 1; we ran this 1000 times to obtain the range of gene expression. The expression of genes not picked by the user, on the other hand, should stay as stable as possible. The initial expression strength is calculated by modeling the original GRN with the Hill equation.
</p>
</p>
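<p align="justify">The sampling procedure can be sketched roughly as follows. The Hill-type response (<code>k</code>, <code>n</code>, <code>basal</code>), the damped fixed-point iteration and the convention that <code>grn[i][j]</code> is the effect of gene <code>j</code> on gene <code>i</code> are all assumptions made for illustration; the model actually used by the software may differ.</p>

```python
import random


def hill_response(signal, k=0.5, n=2, basal=0.5):
    """Map a net regulatory signal to an expression level in [0, 1]."""
    s = abs(signal) ** n
    if signal >= 0:
        # Activation raises expression from the basal level toward 1.
        return basal + (1.0 - basal) * s / (k ** n + s)
    # Repression lowers expression from the basal level toward 0.
    return basal * k ** n / (k ** n + s)


def steady_state(grn, steps=200, damping=0.5):
    """Approximate the steady state by damped fixed-point iteration."""
    x = [0.5] * len(grn)
    for _ in range(steps):
        target = [hill_response(sum(w * xj for w, xj in zip(row, x)))
                  for row in grn]
        x = [(1 - damping) * xi + damping * ti for xi, ti in zip(x, target)]
    return x


def expression_range(grn, runs=1000, seed=0):
    """Sample random regulations of the last (virtual) gene in [-1, 1]
    and record the lowest and highest expression seen for every gene."""
    rng = random.Random(seed)
    n_genes = len(grn)
    v = n_genes - 1  # index of the virtual gene (assumed to be last)
    lows = [float("inf")] * n_genes
    highs = [float("-inf")] * n_genes
    for _ in range(runs):
        trial = [list(row) for row in grn]
        for i in range(v):
            trial[i][v] = rng.uniform(-1.0, 1.0)
        x = steady_state(trial)
        lows = [min(a, b) for a, b in zip(lows, x)]
        highs = [max(a, b) for a, b in zip(highs, x)]
    return list(zip(lows, highs))
```

<p align="justify">Each returned pair is the (weakest, strongest) expression observed for one gene, which bounds the target values a user can sensibly request.</p>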
 +
</div>
-
 
+
<div class="jobs_trigger" id="pso"><strong>Particle Swarm Optimization</strong></div>
-
  </div>
+
-
<div class="jobs_trigger"><strong>Particle Swarm Optimization</strong></div>
+
   <div class="jobs_item" style="display: none;">
   <div class="jobs_item" style="display: none;">
<p align="justify">
<p align="justify">
Line 438: Line 456:
We continually revise the coefficients of the PSO algorithm with a machine-learning method, so that the simulation stays accurate while using a fast PSO particle-motion equation. Our software also filters the resulting regulatory values so that the output is more intuitive.
</p>
</p>
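<p align="justify">For orientation, here is a minimal sketch of a textbook particle-motion update with an inertia weight. It is not the tuned "fast" equation described above; the coefficient values (<code>inertia</code>, <code>c1</code>, <code>c2</code>), the search bounds and the stand-in objective are assumptions chosen only to make the example run.</p>

```python
import random


def pso_minimize(objective, dim, n_particles=20, iterations=100,
                 inertia=0.7, c1=1.5, c2=1.5, bounds=(-1.0, 1.0), seed=0):
    """Minimize objective(position) with a basic particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # each particle's best position
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull to own best + pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move, then clamp to the regulation range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val
```

<p align="justify">In reverse analysis the position vector would hold candidate regulation values in [-1, 1], and the objective would measure how far the simulated expression is from the target expression.</p>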
 +
</div>
-
 
+
<div class="jobs_trigger" id="lot"><strong>Locate Optimal Target</strong></div>
-
  </div>
+
-
<div class="jobs_trigger"><strong>Locate Optimal Target</strong></div>
+
   <div class="jobs_item" style="display: none;">
   <div class="jobs_item" style="display: none;">
<p align="justify">To make it easier to choose a suitable gene from the series of regulatory values, our software keeps only the pronounced regulations. A regulation value lies between -1 and 1, where -1 means a negative effect and 1 means a positive effect. The software filters out every value whose absolute value is lower than 0.9. Because a difference in regulatory intensity of less than 0.1 has very little effect on the stable expression, the final regulation is reported as a Boolean value.</br></br>
The format of regulatory prediction in “Result”:</br>
Gene_name->Gene_name    regulation(+/-)
</p>
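<p align="justify">The filtering step can be sketched as below. The function name, the dictionary input and the tab separator are assumptions for illustration, and the gene names in the usage example are placeholders.</p>

```python
def locate_targets(regulations, threshold=0.9):
    """Keep regulations with |value| >= threshold and report only their sign.

    regulations maps (regulator, target) pairs to values in [-1, 1].
    Each returned line follows "Gene_name->Gene_name<TAB>+/-".
    """
    lines = []
    for (regulator, target), value in regulations.items():
        if abs(value) < threshold:
            continue  # too weak to count as a clear regulation
        sign = "+" if value > 0 else "-"
        lines.append(f"{regulator}->{target}\t{sign}")
    return lines
```

<p align="justify">For example, <code>locate_targets({("virtual", "araC"): 0.95, ("virtual", "crp"): 0.4})</code> keeps only the first pair and reports it as a positive regulation.</p>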
 +
</div>
 +
<h2 id="reference">Reference</h2>
 +
 +
<p align="justify">
 +
 +
Lei Z, Dai Y. Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction[J]. BMC bioinformatics, 2006, 7(1): 491.</br></br>
 +
 +
 +
Ramoni M F, Sebastiani P, Kohane I S. Cluster analysis of gene expression dynamics[J]. Proceedings of the National Academy of Sciences, 2002, 99(14): 9121-9126.</br></br>
 +
 +
Thieffry D, Huerta A M, Pérez‐Rueda E, et al. From specific gene regulation to genomic networks: a global analysis of transcriptional regulation in Escherichia coli[J]. Bioessays, 1998, 20(5): 433-440.</br></br>
 +
 +
 +
Eberhart R, Kennedy J. A new optimizer using particle swarm theory[C]//Micro Machine and Human Science, 1995. MHS'95., Proceedings of the Sixth International Symposium on. IEEE, 1995: 39-43.</br></br>
 +
 +
 +
Jacob F, Perrin D, Sánchez C, et al. L'opéron: groupe de gènes à expression coordonnée par un opérateur [CR Acad. Sci. Paris 250 (1960) 1727–1729][J]. Comptes rendus biologies, 2005, 328(6): 514-520.</br></br>
 +
 +
 +
Needleman S B, Wunsch C D. A general method applicable to the search for similarities in the amino acid sequence of two proteins[J]. Journal of molecular biology, 1970, 48(3): 443-453.</br></br>
 +
 +
 +
Gama-Castro S, Jiménez-Jacinto V, Peralta-Gil M, et al. RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation[J]. Nucleic acids research, 2008, 36(suppl 1): D120-D124.</br></br>
 +
 +
Martínez-Antonio A, Collado-Vides J. Identifying global regulators in transcriptional regulatory networks in bacteria[J]. Current opinion in microbiology, 2003, 6(5): 482-489.</br></br>
 +
 +
 +
Salgado H, Moreno-Hagelsieb G, Smith T F, et al. Operons in Escherichia coli: genomic analyses and predictions[J]. Proceedings of the National Academy of Sciences, 2000, 97(12): 6652-6657.</br></br>
 +
 +
 +
Thieffry D, Salgado H, Huerta A M, et al. Prediction of transcriptional regulatory sites in the complete genome sequence of Escherichia coli K-12[J]. Bioinformatics, 1998, 14(5): 391-400.
 +
 +
</p>
 +
 +
<div class="jobs_trigger" style="display:none;"></div>
 +
<div class="jobs_item" style="display: none;"><p></p></div>
-
  </div>
 
</body>
</body>
</html>
</html>

Latest revision as of 00:46, 29 October 2013


Methodologies


To simulate how the GRN works and to analyze how it changes after an exogenous gene is imported, the software employs several advanced algorithms and classical methods: the Binary Tree method, the Needleman-Wunsch Algorithm, the Decision Tree method, the Hill Equation and the PSO Algorithm.


The methodologies fall into four parts: Database, Operon Theory and Regulatory Model, Forward Analysis and Reverse Analysis.

Database

Abstract
Fetch Regulation
Fetch Gene Info
Fetch Promoter Info
Integration

Our software integrates all the gene information we extracted and writes it to a file named “all_info” (all information about genes), which the graphical output interface reads. Meanwhile, the array of objects holding this information is kept in memory, which greatly improves the computing speed of our software.

  The format of all_info database:
    No.    promoter_sequence    gene_sequence    gene_name    ID    left_position    right_position    promoter_name     description
The fetching module generates three files: old_GRN, all_info and uncertain_database.
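
As an illustration of how one all_info record could be read into an in-memory object, here is a minimal sketch. The tab separator, the integer types for No. and the positions, and the names GeneRecord and parse_all_info_line are assumptions; only the column order comes from the format above.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    no: int
    promoter_sequence: str
    gene_sequence: str
    gene_name: str
    gene_id: str
    left_position: int
    right_position: int
    promoter_name: str
    description: str


def parse_all_info_line(line, sep="\t"):
    """Parse one all_info line into a GeneRecord (separator is assumed)."""
    # Split at most 8 times so the free-text description stays in one piece.
    parts = line.rstrip("\n").split(sep, 8)
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}")
    no, prom_seq, gene_seq, name, gene_id, left, right, prom_name, desc = parts
    return GeneRecord(int(no), prom_seq, gene_seq, name, gene_id,
                      int(left), int(right), prom_name, desc)
```

Keeping a list of such records in memory lets later steps look up sequences and positions without re-reading the file.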

Operon Theory and Regulatory Model

Operon Theory
Regulatory Model
Similarity and Homology

Forward Analysis

Construct New GRN
Network Model
Evaluate Network

Reverse Analysis

Virtual Gene
Expression Range
Particle Swarm Optimization
Locate Optimal Target
