Team:USTC-Software/Project/Method

From 2013.igem.org

(Difference between revisions)
Line 32: Line 32:
         <ul>
                                
-           <li><a href="#2.1" class="button">2.1 Sequence</a></br>
+           <li><a href="#sequence" class="button">2.1 Sequence</a></br>
-                 <a href="#ui" class="button" id="subbutton">2.1.1 Needleman-Wunsch Algorithm</a></br>
+                 <a href="#nwa" class="button" id="subbutton">2.1.1 Needleman-Wunsch Algorithm</a></br>
-                 <a href="#2.1.2" class="button" id="subbutton">2.1.2 A Supplementary Game</a>
+                 <a href="#asg" class="button" id="subbutton">2.1.2 A Supplementary Game</a>
           </li>
-           <li><a href="#2.2" class="button">2.2 Filtering</a></br>
+           <li><a href="#filtering" class="button">2.2 Filtering</a></br>
-                 <a href="#2.2.1" class="button" id="subbutton">2.2.1 Random Noise</a></br>
+                 <a href="#rn" class="button" id="subbutton">2.2.1 Random Noise</a></br>
-                 <a href="#2.2.2" class="button" id="subbutton">2.2.2 Filter</a>
+                 <a href="#filter" class="button" id="subbutton">2.2.2 Filter</a>
         </li>
-           <li><a href="#2.3" class="button">2.3 Regulation Calculation</a></li>
+           <li><a href="#rc" class="button">2.3 Regulation Calculation</a></li>
           <li><a href="#main" class="button">Top</a></li>
Line 249: Line 249:
     </p>
     <h3>2 Similarity Analysis</h3>
-     <p align="justify"><div id="2.1"><h4>2.1 Sequence</h4></div></br>
+     <p align="justify"><div id="sequence"><h4>2.1 Sequence</h4></div></br>
-       <div id="2.1.1"><h5>2.1.1 Needleman-Wunsch Algorithm</h5></div>
+       <div id="nwa"><h5>2.1.1 Needleman-Wunsch Algorithm</h5></div>
       The Needleman-Wunsch algorithm was first published in 1970 by Saul B. Needleman and Christian D. Wunsch. It performs a global alignment of two sequences and is mostly used in bioinformatics to align protein or nucleotide sequences. Our software applies this algorithm to align DNA and amino acid sequences.<br/><br/>
Line 307: Line 307:
                               CGAGAC - - GT - - -
       </em></strong></p>
-       <div id="2.1.2"><h5>2.1.2 A Supplementary Game</h5></div>
+       <div id="asg"><h5>2.1.2 A Supplementary Game</h5></div>
       <p align="justify">The rows and columns of the GRN matrix can be regarded as vectors carrying each unit's regulated or regulating information. The behavior similarity of two units can be described by the dot product of their two regulated vectors or of their two regulating vectors. Biologists generally assume that the more similar two sequences are, the more likely they are to behave similarly. Whether the ratio of genes with similar behaviors is positively correlated with gene similarity is therefore essential to our project. To test this, we obtained 1.6 million data sets by pairwise alignment of all 1748 units in the GRN of E. coli K-12. Each data set consists of a gene similarity and a behavior similarity. The results are analyzed and plotted in the figure, and the linear fit shows that the ratio is positively correlated with the similarity.</p><br/>
Line 314: Line 314:
       <p align="justify">Although there are examples where a slight change in DNA sequence significantly changes the properties of a gene (sickle-cell disease, for example), the effect of a mutation usually depends on its location and scale. The result is therefore still convincing to some degree.</p>
-     <div id="2.2"><h4>2.2 Filtering</h4></div>
+     <div id="filtering"><h4>2.2 Filtering</h4></div>
-     <div id="2.2.1"><h5>2.2.1 Random Noise</h5></div>
+     <div id="rn"><h5>2.2.1 Random Noise</h5></div>
     <p class="bodytext"></p><p align="justify">The similarity of two sequences is normally not zero, even for unrelated sequences. We carried out
computational experiments to study the similarity between random sequences. We randomly
Line 324: Line 324:
<img src="https://static.igem.org/mediawiki/igem.org/8/89/USTC_Software_Figure_4.png" />
<p><strong>Figure 5.</strong> Random similarity distribution</p></div>
-     <div id="2.2.2"><h5>2.2.2 Filter</h5></div>
+     <div id="filter"><h5>2.2.2 Filter</h5></div>
     <p align="justify">We need the genes that are highly similar to the exogenous gene to determine how it interacts with the network. The program
aligns the exogenous gene (query) with the genes in the network (subjects) and obtains the original
Line 339: Line 339:
An example of filtering and consistency is presented in “Example”.
</p>
-     <div id="2.3"><h4>2.3 Regulation Calculation</h4></div>
+     <div id="rc"><h4>2.3 Regulation Calculation</h4></div>
     <p align="justify">Consider a three-unit network whose units interact with each other as shown in the figure.
The regulation is described by the GRN matrix.</p>
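Section 2.1.1 above describes the Needleman-Wunsch global alignment. The Python sketch below fills the dynamic-programming score table and traces back one optimal alignment. The scoring values (match +1, mismatch -1, linear gap -1) and the identity-style `identity` helper are our assumptions for illustration; this page does not state the scores or the similarity measure the software actually uses.

```python
"""Minimal Needleman-Wunsch global alignment (illustrative sketch).

ASSUMPTION: match=+1, mismatch=-1, gap=-1 are example scores only.
"""


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # a[i-1] aligned to a gap
            left = score[i][j - 1] + gap  # b[j-1] aligned to a gap
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right cell to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of alignment columns with identical, non-gap residues (hypothetical measure)."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return same / len(aligned_a) if aligned_a else 0.0
```

For example, `needleman_wunsch("ACGT", "AGT")` returns `(2, "ACGT", "A-GT")`, and `identity("ACGT", "A-GT")` is 0.75. The same code works for amino acid strings; real protein alignments normally use a substitution matrix such as BLOSUM62 instead of a flat mismatch score.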

Revision as of 01:22, 28 October 2013



Methodologies

To simulate how the gene regulatory network (GRN) works and to analyze how it changes after an exogenous gene is imported, the software employs several advanced algorithms and classical methods: the Binary Tree method, the Needleman-Wunsch algorithm, the Decision Tree method, the Hill equation and the PSO (particle swarm optimization) algorithm.
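The overview names the Hill equation, but this page does not show the exact form the software uses. As a hedged reference only, the textbook Hill input functions for an activator and a repressor are sketched below; the parameter names `k` (concentration giving the half-maximal effect) and `n` (Hill coefficient) are our own.

```python
def hill_activation(x: float, k: float, n: float) -> float:
    """Activator input function x^n / (k^n + x^n); ranges from 0 toward 1.

    k: concentration giving half-maximal activation; n: Hill coefficient.
    """
    if x < 0 or k <= 0 or n <= 0:
        raise ValueError("need x >= 0, k > 0, n > 0")
    return x**n / (k**n + x**n)


def hill_repression(x: float, k: float, n: float) -> float:
    """Repressor input function k^n / (k^n + x^n); falls from 1 toward 0."""
    if x < 0 or k <= 0 or n <= 0:
        raise ValueError("need x >= 0, k > 0, n > 0")
    return k**n / (k**n + x**n)
```

Both functions equal 0.5 when `x == k` and always sum to 1; a larger `n` makes the switch between "off" and "on" steeper. For instance, with `k=1.0` and `n=2`, activation at `x=2.0` is 0.8 and repression is 0.2.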

The methodology has four parts: Database, Operon Theory and Regulatory Model, Forward Analysis, and Reverse Analysis.

Database

Abstract
Fetch Regulation
Fetch Gene Info
Fetch Promoter Info
Integration

Our software integrates all the gene information it fetches and writes it to a file named “all_info” (all information about genes), which the graphical output interface reads. Meanwhile, the array of objects holding all this information is kept in memory, which greatly improves the computing speed of the software.

  The format of the all_info database:
    No.    promoter_sequence    gene_sequence    gene_name    ID    left_position    right_position    promoter_name     description
The fetching module generates three files: old_GRN, all_info and uncertain_database.
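The column order of all_info is listed above, but the delimiter is not. The sketch below assumes tab-separated columns with the free-text description last, and integer positions; both are assumptions. `GeneRecord`, `parse_all_info_line` and `load_all_info` are hypothetical names. `load_all_info` returns an in-memory list of objects, mirroring the description above.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    """One all_info row, in the documented column order."""
    no: int
    promoter_sequence: str
    gene_sequence: str
    gene_name: str
    gene_id: str
    left_position: int
    right_position: int
    promoter_name: str
    description: str


def parse_all_info_line(line: str) -> GeneRecord:
    # ASSUMPTION: tab-separated columns. maxsplit=8 lets only the last
    # column (description) contain further tabs or free text.
    fields = line.rstrip("\n").split("\t", 8)
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    no, prom_seq, gene_seq, name, gid, left, right, prom_name, desc = fields
    return GeneRecord(int(no), prom_seq, gene_seq, name, gid,
                      int(left), int(right), prom_name, desc)


def load_all_info(path: str) -> list[GeneRecord]:
    """Read every non-blank row into an in-memory list of records."""
    with open(path, encoding="utf-8") as fh:
        return [parse_all_info_line(line) for line in fh if line.strip()]
```

Keeping the records in a list matches the "array of objects" described above. A dictionary keyed by `gene_name` would be a natural addition if lookups by name dominate.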

Operon Theory and Regulatory Model

Operon Theory
Regulatory Model
Similarity and Homology
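Sections 2.2.1 (Random Noise) and 2.2.2 (Filter) explain that even random sequences have non-zero similarity, so a raw alignment score has to be compared with a random background before a network gene counts as similar to the exogenous query. The exact cut-off is not given in this excerpt. The sketch below therefore uses a hypothetical rule: keep a subject if its similarity exceeds the random-pair mean plus `z` standard deviations. To keep the example short, it also substitutes Python's `difflib.SequenceMatcher.ratio()` for the Needleman-Wunsch-based similarity.

```python
import random
import statistics
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # STAND-IN measure: difflib's ratio (2 * matched chars / total length),
    # not the Needleman-Wunsch similarity used by the software.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def random_background(length: int, trials: int = 200, seed: int = 0):
    """Mean and standard deviation of similarity between random DNA pairs.

    ASSUMPTION: uniform base composition, both sequences of the given length.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        a = "".join(rng.choice("ACGT") for _ in range(length))
        b = "".join(rng.choice("ACGT") for _ in range(length))
        scores.append(similarity(a, b))
    return statistics.mean(scores), statistics.stdev(scores)


def filter_subjects(query: str, subjects: dict[str, str], z: float = 3.0, seed: int = 0):
    """Return {name: similarity} for subjects scoring above the random background.

    HYPOTHETICAL rule: threshold = mean + z * stdev of random-pair similarity.
    """
    mean, sd = random_background(len(query), seed=seed)
    threshold = mean + z * sd
    kept = {}
    for name, seq in subjects.items():
        s = similarity(query, seq)
        if s > threshold:
            kept[name] = s
    return kept
```

Estimating the background for the query's own length matters because random-pair similarity depends on sequence length. A fuller version would also match the subject lengths and base composition.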

Forward Analysis

Construct New GRN
Network Model
Evaluate Network
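Sections 2.1.2 and 2.3 describe the network as a GRN matrix whose rows and columns act as "regulated" and "regulating" vectors, with behavior similarity given by a dot product. The sketch below assumes a hypothetical sign convention (`GRN[i][j]` is +1 if unit j activates unit i, -1 if it represses it, 0 otherwise) and a made-up three-unit example, because the figure and the exact encoding are not reproduced here.

```python
# HYPOTHETICAL convention (not stated on this page):
#   GRN[i][j] = +1 if unit j activates unit i, -1 if unit j represses unit i, else 0.
# Row i is then unit i's "regulated" vector; column j is unit j's "regulating" vector.
UNITS = ["A", "B", "C"]
GRN = [
    [0, 0, -1],  # A is repressed by C
    [0, 0, 0],   # B is not regulated by any unit
    [1, 1, 0],   # C is activated by A and by B
]


def regulated_vector(grn: list[list[int]], i: int) -> list[int]:
    """Row i: how every unit regulates unit i."""
    return list(grn[i])


def regulating_vector(grn: list[list[int]], j: int) -> list[int]:
    """Column j: how unit j regulates every unit."""
    return [row[j] for row in grn]


def dot(u: list[int], v: list[int]) -> int:
    return sum(x * y for x, y in zip(u, v))


def behavior_similarity(grn: list[list[int]], i: int, j: int, by: str = "regulated") -> int:
    """Dot product of two units' regulated (row) or regulating (column) vectors."""
    if by == "regulated":
        return dot(regulated_vector(grn, i), regulated_vector(grn, j))
    if by == "regulating":
        return dot(regulating_vector(grn, i), regulating_vector(grn, j))
    raise ValueError("by must be 'regulated' or 'regulating'")
```

Here A and B both activate C and nothing else, so `behavior_similarity(GRN, 0, 1, by="regulating")` is 1. The page does not say whether the vectors are normalized before the dot product, so this sketch leaves them raw.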

Reverse Analysis

Virtual Gene
Expression Range
Particle Swarm Optimization
Locate Optimal Target
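Reverse Analysis lists particle swarm optimization, and the overview names the PSO algorithm, but this page gives neither the objective function nor the parameters. As an illustration only, the sketch below is a basic global-best PSO in Python. The inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size, the iteration count and the toy quadratic objective in the usage note are all our assumptions, not the tool's settings.

```python
import random


def pso_minimize(f, bounds, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over a box with a basic global-best particle swarm.

    bounds: list of (low, high) pairs, one per dimension.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]            # each particle's best position so far
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best

    for _ in range(iters):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                lo, hi = bounds[d]
                # Move, then clamp the particle back into the search box.
                pos[k][d] = min(max(pos[k][d] + vel[k][d], lo), hi)
            val = f(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[k][:], val
    return gbest, gbest_val
```

For example, minimizing `lambda p: (p[0] - 0.3) ** 2 + (p[1] + 1.2) ** 2` over `[(-5, 5), (-5, 5)]` converges close to `[0.3, -1.2]`. Clamping positions to `bounds` is one simple way to keep particles inside a box-shaped search space; reflecting or randomly re-seeding out-of-range particles are common alternatives.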