Team:USTC-Software/Project/Method
From 2013.igem.org
To find the highest-scoring alignment, the algorithm allocates a two-dimensional matrix F indexed by the two sequences. The score in row i, column j is denoted by Fij. There is one column for each character in sequence A and one row for each character in sequence B. Therefore, if we align sequences of lengths n and m, the memory used is O(nm).
As the algorithm proceeds, each Fij is set to the optimal score among its three possible sources (Match, Insert and Delete).
After the matrix F has been computed, Fnm is the maximum score among all possible alignments.
To see the optimal alignment, trace back from Fnm by comparing the three possible sources (Match, Insert and Delete). If Match, Aj and Bi are aligned; if Insert, Bi is aligned with a gap; if Delete, Aj is aligned with a gap. There may be more than one optimal alignment.
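The matrix fill and trace-back described above can be sketched in Python. This is a minimal illustration of a Needleman-Wunsch style global alignment, not the team's implementation: the function name, the scoring values (match = 1, mismatch = -1, gap = -1) and the extra leading row and column for gaps are assumptions made for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment.

    Scoring values are illustrative assumptions, not taken from the source.
    """
    rows, cols = len(b) + 1, len(a) + 1
    # F[i][j] is the best score for aligning b[:i] with a[:j].
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap

    def pair_score(i, j):
        return match if a[j - 1] == b[i - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + pair_score(i, j),  # Match: Aj aligned with Bi
                F[i - 1][j] + gap,                   # Insert: Bi aligned with a gap
                F[i][j - 1] + gap,                   # Delete: Aj aligned with a gap
            )

    # Trace back from the last cell, following one optimal source per step.
    out_a, out_b = [], []
    i, j = len(b), len(a)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + pair_score(i, j):
            out_a.append(a[j - 1])
            out_b.append(b[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append("-")
            out_b.append(b[i - 1])
            i -= 1
        else:
            out_a.append(a[j - 1])
            out_b.append("-")
            j -= 1
    return F[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))
```

When several sources tie, this sketch prefers Match, then Insert, then Delete, so it returns only one of the possibly many optimal alignments.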
Figure 4. Linear fit of ratio-similarity relationship.
Although there are cases in which a slight change in the DNA sequence significantly changes the properties of a gene, for example sickle-cell disease, the influence usually depends on the location and scale of the mutation. So the result is still convincing to some degree.
Revision as of 15:39, 27 September 2013
Methodologies
To simulate how the gene regulatory network (GRN) works and to analyze how it changes after an exogenous gene is imported, the software employs several algorithms and classical methods: the Binary Tree method, the Needleman-Wunsch algorithm, the Decision Tree method, the Hill equation and the Particle Swarm Optimization (PSO) algorithm. The methodology has five parts: Fetch Database, Alignment Analyze, New Network Construction, Network Model and Predict.
Fetch Database
Our software integrates all the gene information we selected and generates a file named “all_info” (all information about genes), which the graphical output interface reads. Meanwhile, an array of objects containing the same information is kept in memory, which greatly improves the computing speed of the software. The format of the all_info database is: No. promoter_sequence gene_sequence gene_name ID left_position right_position promoter_name description. The fetching module generates three files: old_GRN, all_info and uncertain_database.
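As a rough illustration of how one all_info record could be loaded into an in-memory object, here is a small Python sketch. The field order follows the format listed above; the tab separator, the integer types for No. and the positions, and the names `GeneRecord` and `parse_record` are assumptions, since the source does not specify them.

```python
from dataclasses import dataclass

FIELD_COUNT = 9  # number of columns in the all_info format listed above


@dataclass
class GeneRecord:
    no: int
    promoter_sequence: str
    gene_sequence: str
    gene_name: str
    gene_id: str
    left_position: int
    right_position: int
    promoter_name: str
    description: str


def parse_record(line, sep="\t"):
    """Parse one all_info line; the separator is an assumed convention."""
    # Split at most FIELD_COUNT - 1 times so the description may contain sep.
    parts = line.rstrip("\n").split(sep, FIELD_COUNT - 1)
    if len(parts) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(parts)}")
    no, prom_seq, gene_seq, name, gid, left, right, prom_name, desc = parts
    return GeneRecord(int(no), prom_seq, gene_seq, name, gid,
                      int(left), int(right), prom_name, desc)
```

Keeping the parsed records in a list gives the in-memory array of objects the paragraph mentions, while the file itself remains available to the graphical interface.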
Operon Theory and Regulatory Model
New Network Construction
Consider a three-unit network whose units interact with each other as shown in the figure. The regulation is described by the GRN matrix.
Figure 6. Example network and its GRN matrix.
If D is the exogenous unit, we can obtain three similarity data sets of D with the units in the original GRN: promoter sequence similarity, gene sequence similarity and amino acid sequence similarity.
Constructing the new network is equivalent to adding a new column and a new row to the original matrix.
Figure 7. Mathematical Equivalence
When filling the column, D is compared with the regulators of the unit in each row. The
regulations in the row are considered separately and marked as “positive group” and
“negative group”. The average similarity of each group indicates how close the exogenous
unit is to that group. D is assumed to take the regulatory direction (positive or negative)
of the group with the larger average similarity. The regulatory intensity is the weighted
average regulation of the chosen group. The weight here is the amino acid sequence similarity.
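The column-filling rule can be sketched for a single row as follows. This is an illustrative Python reading of the paragraph above, not the team's code: the function name `infer_regulation`, the separate `similarity` argument for the group averages (the source does not say which similarity is averaged), and the tie-break in favour of the positive group are assumptions.

```python
def infer_regulation(regulations, similarity, aa_similarity):
    """Estimate how D regulates one unit from that unit's known regulators.

    regulations[k]   : signed regulation of regulator k on the unit (one GRN row)
    similarity[k]    : similarity of D to regulator k, used for the group averages
    aa_similarity[k] : amino acid sequence similarity of D to regulator k (weights)
    """
    triples = list(zip(regulations, similarity, aa_similarity))
    positive = [t for t in triples if t[0] > 0]
    negative = [t for t in triples if t[0] < 0]

    def mean_similarity(group):
        return sum(s for _, s, _ in group) / len(group) if group else 0.0

    # D takes the direction of the group it resembles more (ties: positive, assumed).
    if mean_similarity(positive) >= mean_similarity(negative):
        chosen = positive
    else:
        chosen = negative

    total_weight = sum(w for _, _, w in chosen)
    if total_weight == 0:
        return 0.0  # no usable regulators: assume no regulation
    # Intensity is the weighted average regulation of the chosen group.
    return sum(r * w for r, _, w in chosen) / total_weight
```

Calling this once per row of the original matrix would yield the entries of the new column for D.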
There are two conditions when filling the new row:
1. There are units having the same promoter as the exogenous unit.
2. There are no units having the same promoter as the exogenous unit.
In condition 1, the units sharing the same promoter with the new member are picked out,
and the following steps are the same as in the construction of the column. The difference is
that the similarity used here is the gene sequence similarity. As explained in the regulation
model part, the promoter is the main regulatory region, but the following sequence is also
considered. Since the promoter is the same, we focus on the gene sequences.
In condition 2, the process is almost the same as constructing the new column. Promoter
similarity is used because the promoter is the main regulatory region.
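The choice between the two conditions can be sketched as a small selection step. This Python fragment is only one interpretation of the text: the function name, the dictionary layout and the idea that condition 2 compares D with every unit in the original network are assumptions.

```python
def select_row_candidates(promoters, new_promoter):
    """Pick the units to compare with D and the similarity type to use.

    promoters    : dict mapping unit name -> promoter name (hypothetical layout)
    new_promoter : promoter name of the exogenous unit D
    """
    same_promoter = [unit for unit, p in promoters.items() if p == new_promoter]
    if same_promoter:
        # Condition 1: the promoter is identical, so compare gene sequences.
        return same_promoter, "gene"
    # Condition 2: no shared promoter, so fall back to promoter similarity.
    return list(promoters), "promoter"
```

The returned units and similarity type would then feed the same group-averaging procedure used for the new column.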
Figure 8. Construct New GRN
Network Model
Network analysis includes finding the stable condition of the network, adding a new gene, finding the new stable condition and determining the changes from the original condition to the new one. We use the densities of materials to describe the network condition. If all material densities are time-invariant, the network condition is stable.
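The stability criterion (all densities time-invariant) can be illustrated with a small simulation. The sketch below is a toy model and not the team's network model: the rate law (basal production plus Hill-type activation or repression minus linear decay), all parameter values and the function names are assumptions chosen only to show how a steady state can be detected numerically.

```python
def hill(x, k=1.0, n=2):
    """Hill activation term; k and n are illustrative values."""
    return x ** n / (k ** n + x ** n)


def derivatives(x, grn, basal=0.1, decay=1.0):
    """Rate of change of each density under a signed GRN matrix (row = target)."""
    rates = []
    for i, xi in enumerate(x):
        production = basal
        for j, w in enumerate(grn[i]):
            if w > 0:
                production += w * hill(x[j])             # activation by unit j
            elif w < 0:
                production += -w * (1.0 - hill(x[j]))    # repression by unit j
        rates.append(production - decay * xi)
    return rates


def find_steady_state(x0, grn, dt=0.01, tol=1e-9, max_steps=200_000):
    """Integrate with explicit Euler steps until every density stops changing."""
    x = list(x0)
    for _ in range(max_steps):
        dx = derivatives(x, grn)
        if max(abs(d) for d in dx) < tol:
            return x  # all densities are (numerically) time-invariant
        x = [max(0.0, xi + dt * di) for xi, di in zip(x, dx)]
    raise RuntimeError("no steady state reached within max_steps")
```

Running `find_steady_state` on the original matrix and again on the extended matrix would give the two conditions whose difference the paragraph refers to.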
Predict
In some cases, an exogenous gene is imported to enhance or suppress the expression of specific genes in the engineered bacterium itself, but it is hard to choose an appropriate regulatory gene. Our software analyzes the GRN forward and also searches backward with an optimization algorithm to give users a reference for this choice. It considers not only direct regulation but also the global GRN. At the same time, global prediction makes it possible to control the expression of multiple genes in the network. The backward search is performed by the Particle Swarm Optimization (PSO) algorithm.
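To give a feel for the optimization step, here is a minimal, generic PSO sketch in Python. It is not the team's implementation: the function name, the inertia and acceleration coefficients, the fixed random seed and the toy objective in the usage note are assumptions. In the software's setting the objective would presumably measure how far the simulated expression is from the desired one.

```python
import random


def pso(objective, bounds, n_particles=20, n_iters=200,
        w=0.729, c1=1.49445, c2=1.49445, seed=0):
    """Minimize objective over box bounds with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # personal best positions
    best_val = [objective(p) for p in pos]         # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]     # global best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and keep it inside the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val
```

For example, `pso(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2, [(-5.0, 5.0), (-5.0, 5.0)])` searches for the point closest to the hypothetical target (1, -2).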
Database
This file contains the regulation between transcription factors.
This file contains the regulation between transcription factors and genes.
This file contains the information about all genes in E. coli K-12.
This file contains the information about all promoters in E. coli K-12.
This file contains the information about all transcription units in E. coli K-12.