Team:USTC-Software/Project/Method

From 2013.igem.org

(Difference between revisions)
Line 233: Line 233:
<div class="jobs_trigger"><strong>Construct New GRN</strong></div>
<div class="jobs_trigger"><strong>Construct New GRN</strong></div>
   <div class="jobs_item" style="display: none;">
   <div class="jobs_item" style="display: none;">
-
     <h3>User Input</h3>
+
     <h3>1 User Input</h3>
     <p align="justify">
       The regulation of some genes can be determined experimentally. If users already know, from experiment, a regulation between the new gene and the existing ones that is missing from the network, they can set that interaction manually, with no need for the model. These regulations are used in the later simulation.
     </p>
-
     <h3>Similarity Analysis</h3>
+
     <h3>2 Similarity Analysis</h3>
-
     <p align="justify"><b>1.Sequence</b><br/>
+
     <p align="justify"><b>2.1 Sequence</b><br/>
-
       <h4>Needleman-Wunsch Algorithm</h4>
+
       <h4>2.1.1 Needleman-Wunsch Algorithm</h4>
       The Needleman-Wunsch algorithm was first published in 1970 by Saul B. Needleman and Christian D. Wunsch. It performs a global alignment of two sequences and is widely used in bioinformatics to align protein or nucleotide sequences. Our software applies this algorithm to align DNA and amino acid sequences.<br/><br/>
Line 296: Line 296:
                               CGAGAC - - GT - - -
                               CGAGAC - - GT - - -
       </em></strong></p>
       </em></strong></p>
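       <p align="justify">As an illustration of the global alignment described above, the following is a minimal Python sketch of the Needleman-Wunsch dynamic program with traceback. The function name and the scoring values (match = 1, mismatch = -1, gap = -1) are assumptions chosen for the example; they are not taken from the gNAP source code, which may use a different scoring scheme.</p>

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (score, aligned_a, aligned_b).

    The scoring values are illustrative assumptions, not gNAP's settings.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

       <p align="justify">For example, <code>needleman_wunsch("ACGT", "AGT")</code> returns a score of 2 with the alignment <code>ACGT</code> / <code>A-GT</code>. Because the alignment is global, both output strings always have the same length and cover each input from end to end.</p>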
-
       <h4>A Supplementary Game</h4>
+
       <h4>2.1.2 A Supplementary Game</h4>
       <p align="justify">The rows and columns of the GRN matrix can be regarded as vectors containing the regulated or the regulating information. The behavior similarity of two units can be described by the dot product of their two regulated vectors or of their two regulating vectors. Biologists usually assume that the more similar two sequences are, the more likely they are to behave similarly. Whether the ratio of genes with similar behaviors is positively correlated with gene similarity is essential to our project, so we obtained 1.6 million sets of data by pairwise alignment of all 1748 units in the GRN of K-12. Each set of data consists of a gene similarity and a behavior similarity. The result is analyzed and plotted in the figure. The linear fit shows that the ratio is positively correlated with the similarity.</p><br/>
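       <p align="justify">The dot-product measure of behavior similarity can be sketched in a few lines of Python. The orientation of the matrix is an assumption made for this example (row <em>i</em> holds how unit <em>i</em> is regulated, column <em>j</em> holds how unit <em>j</em> regulates the others), and the function names and the small matrix are hypothetical, not taken from gNAP.</p>

```python
def regulated_similarity(grn, i, j):
    """Dot product of rows i and j (assumed: row = how a unit is regulated)."""
    return sum(x * y for x, y in zip(grn[i], grn[j]))


def regulating_similarity(grn, i, j):
    """Dot product of columns i and j (assumed: column = how a unit regulates)."""
    return sum(row[i] * row[j] for row in grn)


# Hypothetical 3-unit GRN matrix: +1 activation, -1 repression, 0 no regulation.
example_grn = [
    [0, 1, -1],
    [0, 1, -1],
    [1, 0, 0],
]

# Units 0 and 1 are regulated identically, so their rows have a large dot product.
same_regulation = regulated_similarity(example_grn, 0, 1)
# Units 1 and 2 act on the same targets with opposite signs, giving a negative value.
opposite_action = regulating_similarity(example_grn, 1, 2)
```

       <p align="justify">With this matrix, <code>same_regulation</code> is 2 and <code>opposite_action</code> is -2. A raw dot product grows with the number of nonzero entries, so a normalized variant such as cosine similarity may be preferable when units differ widely in how many interactions they have.</p>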
Line 303: Line 303:
       <p align="justify">Although there are cases in which a slight change in the DNA sequence significantly changes the properties of a gene (sickle-cell disease, for example), the influence usually depends on the location and scale of the mutation. So the result is still convincing to some degree.</p>
   <p>
   <p>
-
     <b>2.Filtering</b></p>
+
     <b>2.2 Filtering</b></p>
-
     <h4>Random Noise</h4>
+
     <h4>2.2.1 Random Noise</h4>
     <p class="bodytext"></p><p align="justify">Normally, the similarity of two sequences will not be zero. Some computational
     <p class="bodytext"></p><p align="justify">Normally, the similarity of two sequences will not be zero. Some computational
experiments were carried out to study the random sequence similarities. We randomly
experiments were carried out to study the random sequence similarities. We randomly
Line 313: Line 313:
<img src="https://static.igem.org/mediawiki/igem.org/8/89/USTC_Software_Figure_4.png" />
<img src="https://static.igem.org/mediawiki/igem.org/8/89/USTC_Software_Figure_4.png" />
<p><strong>Figure 5.</strong> Random similarity distribution</p></div>
<p><strong>Figure 5.</strong> Random similarity distribution</p></div>
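<p align="justify">The point that two unrelated sequences still show nonzero similarity can be illustrated with a small simulation. The sketch below is a simplification and not the procedure used for the figure: it measures ungapped, position-by-position identity between random DNA sequences instead of an alignment-based similarity, and all names, sequence lengths, and sample counts are assumptions. For four equally likely bases the expected identity is 0.25; a gapped alignment would typically report a higher value.</p>

```python
import random


def random_dna(length, rng):
    """Return a random DNA string with equally likely bases."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def random_identity_samples(pairs=200, length=500, seed=0):
    """Identity of `pairs` independent random sequence pairs (seeded for repeatability)."""
    rng = random.Random(seed)
    return [
        identity(random_dna(length, rng), random_dna(length, rng))
        for _ in range(pairs)
    ]


samples = random_identity_samples()
mean_identity = sum(samples) / len(samples)
```

<p align="justify">The spread of <code>samples</code> around the mean gives a rough noise floor: a similarity that is not clearly above this range cannot be distinguished from chance.</p>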
-
     <h4>Filter</h4>
+
     <h4>2.2.2 Filter</h4>
     <p align="justify">We need the genes highly similar to the exogenous one to interact with it. The program will
     <p align="justify">We need the genes highly similar to the exogenous one to interact with it. The program will
align the exogenous gene (query) with genes in the network (subject) and get the original
align the exogenous gene (query) with genes in the network (subject) and get the original
Line 328: Line 328:
An example of filtering and consistency is presented in “Example”.
</p>
</p>
-
     <p><b>3.Regulation Calculation</b></p>
+
     <p><b>2.3 Regulation Calculation</b></p>
     <p align="justify">Consider a three-unit network whose units interact with each other as shown in the figure.
The regulation is described by the GRN matrix.</p>
Line 364: Line 364:
</div>
</div>
<p><strong>Figure 8.</strong> Construct New GRN</p>
<p><strong>Figure 8.</strong> Construct New GRN</p>
-
     <h3>Clustering</h3>
+
     <h3>3 Clustering</h3>
     <p>
     <p>
       Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics.<br/><br/>

Revision as of 14:43, 27 October 2013

Methodologies

To simulate how the GRN works and to analyze how it changes after an exogenous gene is imported, the software employs several advanced algorithms and classical methods. These include the binary tree method, the Needleman-Wunsch algorithm, the decision tree method, the Hill equation, and the PSO (particle swarm optimization) algorithm.

The methodologies fall into four parts: Database, Operon Theory and Regulatory Model, Forward Analysis, and Reverse Analysis.

Database

Abstract
Fetch Regulation
Fetch Gene Info
Fetch Promoter Info
Integration

Our software integrates all the gene information we extracted and generates a file named “all_info” (all information about genes) for the graphical output interface to read. Meanwhile, the array of objects containing this information is kept in memory, which greatly improves the computing speed of our software.

  The format of all_info database:
    No.    promoter_sequence    gene_sequence    gene_name    ID    left_position    right_position    promoter_name     description
The fetching module generates three files: old_GRN, all_info and uncertain_database.
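
The record layout above maps naturally onto one object per gene. The following Python sketch shows one way such a record could be parsed and held in memory. It assumes that the fields are tab-separated and appear in the listed order; the delimiter, the class and function names, and the sample line are all hypothetical and are not taken from gNAP.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    """One row of all_info, with fields in the order listed in the format line."""
    no: int
    promoter_sequence: str
    gene_sequence: str
    gene_name: str
    gene_id: str
    left_position: int
    right_position: int
    promoter_name: str
    description: str


def parse_all_info_line(line):
    """Parse one line of all_info (assumed tab-separated) into a GeneRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    no, prom_seq, gene_seq, name, gene_id, left, right, prom_name, desc = fields
    return GeneRecord(
        int(no), prom_seq, gene_seq, name, gene_id,
        int(left), int(right), prom_name, desc,
    )


# Made-up sample line for illustration only; it is not real database content.
sample_line = "1\tTTGACA\tATGAAATAA\tgeneA\tID0001\t100\t250\tgeneAp\texample record\n"
record = parse_all_info_line(sample_line)
```

Parsing every line into a list of such objects once at startup corresponds to the in-memory array of objects described above, so later steps can look fields up by name without re-reading the file.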

Operon Theory and Regulatory Model

Operon Theory
Regulatory Model
Similarity and Homology

Forward Analysis

Construct New GRN
Network Model
Evaluate Network

Reverse Analysis

Virtual Gene
Expression Range
Particle Swarm Optimization
Locate Optimal Target