Team:USTC-Software/Project/Method
From 2013.igem.org
Line 249: | Line 249: | ||
</p> | </p> | ||
<h3>2 Similarity Analysis</h3> | <h3>2 Similarity Analysis</h3> | ||
- | <p align="justify" id="2.1"><h4>2.1 Sequence</h4></br> | + | <p align="justify"><div id="2.1"><h4>2.1 Sequence</h4></div></br> |
- | < | + | <div id="2.1.1"><h5>2.1.1 Needleman-Wunsch Algorithm</h5></div> |
The Needleman-Wunsch algorithm was first published in 1970 by Saul B. Needleman and Christian D. Wunsch. It performs a global alignment of two sequences and is mostly used in bioinformatics to align protein or nucleotide sequences. Our software applies this algorithm to the alignment of DNA and amino acid sequences.<br/><br/> | The Needleman-Wunsch algorithm was first published in 1970 by Saul B. Needleman and Christian D. Wunsch. It performs a global alignment of two sequences and is mostly used in bioinformatics to align protein or nucleotide sequences. Our software applies this algorithm to the alignment of DNA and amino acid sequences.<br/><br/> | ||
Line 307: | Line 307: | ||
CGAGAC - - GT - - - | CGAGAC - - GT - - - | ||
</em></strong></p> | </em></strong></p> | ||
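The global alignment described in 2.1.1 can be sketched as follows. This is a minimal illustration, not the team's implementation: the function name `needleman_wunsch` and the scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen for the example, since the scoring scheme actually used by the software is not stated here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    The scoring parameters are illustrative assumptions.
    """
    n, m = len(a), len(b)

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming table
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Because the alignment is global, both sequences are aligned end to end and gaps are inserted wherever that raises the total score; a similarity measure can then be derived from the aligned pair, for example as the fraction of identical columns.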
- | < | + | <div id="2.1.2"><h5>2.1.2 A Supplementary Game</h5></div> |
<p align="justify">The rows and columns in the GRN matrix can be regarded as vectors containing the regulated or the regulating information. The behavior similarity of two units can be described by the dot product of two regulated vectors or two regulating vectors. Biologists usually think the more similar two sequences are, the more likely they have similar behaviors. Whether the ratio of genes with similar behaviors is positively correlated with gene similarity is essential to our project. So we obtained 1.6 million sets of data by pairwise alignment of all the 1748 units in the GRN of K-12. Each set of data consists of gene similarity and behavior similarity. The result is analyzed and plotted in the figure. The linear fit shows that the ratio is positively correlated with the similarity.</p><br/> | <p align="justify">The rows and columns in the GRN matrix can be regarded as vectors containing the regulated or the regulating information. The behavior similarity of two units can be described by the dot product of two regulated vectors or two regulating vectors. Biologists usually think the more similar two sequences are, the more likely they have similar behaviors. Whether the ratio of genes with similar behaviors is positively correlated with gene similarity is essential to our project. So we obtained 1.6 million sets of data by pairwise alignment of all the 1748 units in the GRN of K-12. Each set of data consists of gene similarity and behavior similarity. The result is analyzed and plotted in the figure. The linear fit shows that the ratio is positively correlated with the similarity.</p><br/> | ||
Line 313: | Line 313: | ||
<p><strong>Figure 4.</strong> Linear fit of ratio-similarity relationship.</p></div> | <p><strong>Figure 4.</strong> Linear fit of ratio-similarity relationship.</p></div> | ||
<p align="justify">Although a slight change in a DNA sequence can sometimes significantly change the properties of a gene, as in sickle-cell disease, the influence is usually determined by the location and scale of the mutation. So the result is still convincing to some degree.</p> | <p align="justify">Although a slight change in a DNA sequence can sometimes significantly change the properties of a gene, as in sickle-cell disease, the influence is usually determined by the location and scale of the mutation. So the result is still convincing to some degree.</p> | ||
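The behavior similarity used in 2.1.2, the dot product of two rows or two columns of the GRN matrix, can be sketched as below. The matrix values, the function names and the choice of which axis holds the regulating units are hypothetical; the text only says that rows and columns carry the regulated or the regulating information.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def row_similarity(grn, i, k):
    """Behavior similarity of units i and k, compared by their rows."""
    return dot(grn[i], grn[k])


def column_similarity(grn, i, k):
    """Behavior similarity of units i and k, compared by their columns."""
    col_i = [row[i] for row in grn]
    col_k = [row[k] for row in grn]
    return dot(col_i, col_k)


# Hypothetical 3-unit GRN matrix: +1 activation, -1 repression, 0 no interaction
grn = [
    [0, 1, -1],
    [0, 1, -1],
    [1, 0, 0],
]

if __name__ == "__main__":
    print(row_similarity(grn, 0, 1))     # units 0 and 1 have identical rows
    print(column_similarity(grn, 1, 2))  # columns 1 and 2 have opposite signs
```

With signed entries, a positive product indicates that two units share interactions of the same sign, a negative product indicates opposite interactions, and zero indicates no shared partners.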
- | + | ||
- | < | + | <div id="2.2"><h4>2.2 Filtering</h4></div> |
- | < | + | <div id="2.2.1"><h5>2.2.1 Random Noise</h5></div> |
<p class="bodytext"></p><p align="justify">Normally, the similarity of two sequences will not be zero. Some computational | <p class="bodytext"></p><p align="justify">Normally, the similarity of two sequences will not be zero. Some computational | ||
experiments were carried out to study the random sequence similarities. We randomly | experiments were carried out to study the random sequence similarities. We randomly | ||
Line 324: | Line 324: | ||
<img src="https://static.igem.org/mediawiki/igem.org/8/89/USTC_Software_Figure_4.png" /> | <img src="https://static.igem.org/mediawiki/igem.org/8/89/USTC_Software_Figure_4.png" /> | ||
<p><strong>Figure 5.</strong> Random similarity distribution</p></div> | <p><strong>Figure 5.</strong> Random similarity distribution</p></div> | ||
- | < | + | <div id="2.2.2"><h5>2.2.2 Filter</h5></div> |
<p align="justify">We need the genes highly similar to the exogenous one to interact with it. The program will | <p align="justify">We need the genes highly similar to the exogenous one to interact with it. The program will | ||
align the exogenous gene(query) with genes in the network(subject) and get the original | align the exogenous gene(query) with genes in the network(subject) and get the original | ||
Line 339: | Line 339: | ||
An example of filtering and consistency is presented in “Example”. | An example of filtering and consistency is presented in “Example”. | ||
</p> | </p> | ||
- | < | + | <div id="2.3"><h4>2.3 Regulation Calculation</h4></div> |
<p align="justify">Consider a three-unit network whose units interact with each other as shown in the figure. | <p align="justify">Consider a three-unit network whose units interact with each other as shown in the figure. | ||
The regulation is described by the GRN matrix.</p> | The regulation is described by the GRN matrix.</p> |
Revision as of 01:15, 28 October 2013
Methodologies
To simulate how the GRN works and to analyze how it changes after an exogenous gene is imported, the software employs several advanced algorithms and classical methods: the Binary Tree method, the Needleman-Wunsch algorithm, the Decision Tree method, the Hill equation and the PSO algorithm.
The methodology has four parts: Database, Operon Theory and Regulatory Model, Forward Analysis and Reverse Analysis.
Database
Our software integrates all the gene information we selected and generates a file named “all_info” (all information about genes) for the graphical output interface to read. Meanwhile, the array of objects containing all this information is kept in computer memory, which greatly improves the computing speed of the software. Each record in the all_info database has the following fields, in order: No., promoter_sequence, gene_sequence, gene_name, ID, left_position, right_position, promoter_name, description. The fetching module generates three files: old_GRN, all_info and uncertain_database.
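As an illustration of the record layout listed above, here is a minimal sketch that reads one all_info line into an in-memory object. The tab delimiter, the `GeneRecord` class, the `parse_record` function and the sample values are all assumptions made for this example; the actual file delimiter and the software's own data structures are not specified here.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    """One all_info record; field order follows the format listed above."""
    number: int
    promoter_sequence: str
    gene_sequence: str
    gene_name: str
    gene_id: str
    left_position: int
    right_position: int
    promoter_name: str
    description: str


def parse_record(line):
    """Parse one line, assuming (hypothetically) tab-separated fields."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}")
    return GeneRecord(
        number=int(parts[0]),
        promoter_sequence=parts[1],
        gene_sequence=parts[2],
        gene_name=parts[3],
        gene_id=parts[4],
        left_position=int(parts[5]),
        right_position=int(parts[6]),
        promoter_name=parts[7],
        description=parts[8],
    )


if __name__ == "__main__":
    # Made-up sample line, for illustration only
    sample = "\t".join(
        ["1", "TTGACA", "ATGAAA", "geneX", "ID0001", "100", "400",
         "geneXp", "hypothetical example gene"]
    )
    print(parse_record(sample))
```

Keeping a list of such objects in memory, as the section describes, lets later steps look up sequences and positions without re-reading the file.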