Team:USTC-Software/Project/Method
From 2013.igem.org
Line 53: | Line 53: | ||
<div id="abstract"> | <div id="abstract"> | ||
<h1 align="justify">Methodologies</h1> | <h1 align="justify">Methodologies</h1> | ||
- | <p align="justify">In order to simulate the | + | <p align="justify">In order to simulate the GRN's working and analyze the changing after exogenous gene imported, some advanced algorithms and classical methods are employed in the software. These algorithms and methods include Binary Tree method, Needle-Wunsch Algorithm, Decision Tree method, Hill Equation and PSO Algorithm.</br><br/> |
There are five parts of methodologies: Fetch Database, Alignment Analyze, New Network Construction, Network Model and Predict. | There are five parts of methodologies: Fetch Database, Alignment Analyze, New Network Construction, Network Model and Predict. | ||
</p> | </p> | ||
Line 73: | Line 73: | ||
<div class="jobs_trigger"><strong>Fetch Regulation</strong></div> | <div class="jobs_trigger"><strong>Fetch Regulation</strong></div> | ||
<div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">In GRN, there are two kinds of files: <a id="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_tf.txt">TF to TF</a> and <a id="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_gene.txt">TF to Gene</a>. Since the database about the regulation between TFs and Genes contains only one-way interaction, the matrix of GRN is a rectangle.</br></br> | <div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">In GRN, there are two kinds of files: <a id="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_tf.txt">TF to TF</a> and <a id="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_gene.txt">TF to Gene</a>. Since the database about the regulation between TFs and Genes contains only one-way interaction, the matrix of GRN is a rectangle.</br></br> | ||
- | First of all, read the regulation relationship of TFs. Our software filters the documentation of RegulonDB on the head of all files and then reads the name of regulate and regulated TF, which is also the name of its genes, one by one. In the same time, our software numerates the genes and stores their names into an | + | First of all, read the regulation relationship of TFs. Our software filters the documentation of RegulonDB on the head of all files and then reads the name of regulate and regulated TF, which is also the name of its genes, one by one. In the same time, our software numerates the genes and stores their names into an objects' array of genetic data. </br></br> |
The format of regulation database:</br> | The format of regulation database:</br> | ||
TF_name TF_name +/-/+-</br></br> | TF_name TF_name +/-/+-</br></br> | ||
- | The regulation of TFs has been put into a square matrix whose row is the regulator and column is the one regulated by. To make our GRN as complete as possible, the regulation between TF and genes has joined into the matrix. The one-way interaction results that we must read the TF in order to fulfill the regulator before completing the TF to | + | The regulation of TFs has been put into a square matrix whose row is the regulator and column is the one regulated by. To make our GRN as complete as possible, the regulation between TF and genes has joined into the matrix. The one-way interaction results that we must read the TF in order to fulfill the regulator before completing the TF to gene's regulation in the same way of TF to TF. </br></br> |
The format of regulation database:</br> | The format of regulation database:</br> | ||
TF_name Gene_name +/-/+-</br></br> | TF_name Gene_name +/-/+-</br></br> | ||
Line 95: | Line 95: | ||
ID_assigned_by_RegulonDB Gene_name Left_end_position Right_end_position DNA_strand Product_type Product_name Start_codon_sequence Stop_codon_sequence Gene_sequence</br></br> | ID_assigned_by_RegulonDB Gene_name Left_end_position Right_end_position DNA_strand Product_type Product_name Start_codon_sequence Stop_codon_sequence Gene_sequence</br></br> | ||
- | The label of the map vector is gene name which will be picked out based on the names read in regulation matrix before. It is really fast using the binary tree method to find the specific genetic information and store them into a specific object. Those information includes gene ID, left position, right position, gene description and gene sequence. The gene ID is used to link to | + | The label of the map vector is gene name which will be picked out based on the names read in regulation matrix before. It is really fast using the binary tree method to find the specific genetic information and store them into a specific object. Those information includes gene ID, left position, right position, gene description and gene sequence. The gene ID is used to link to RegulonDB's gene details; The left position is used to find its specific transcription unit; The right position is used to figure out the base amount; The description of genes is used to distinguish the RNA and protein; The sequence is used to predict the regulation by alignment. |
</p> | </p> | ||
Line 103: | Line 103: | ||
<div class="jobs_trigger"> <strong>Fetch Promoter Info</strong></div> | <div class="jobs_trigger"> <strong>Fetch Promoter Info</strong></div> | ||
- | <div class="jobs_item" style="display: none;"><p align="justify">All promoter information has been deposited into a file named promoter_info which could be downloaded <a id="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/PromoterSet.txt">here</a>. But we also need transcription unit information because the information files about promoter do not contain all | + | <div class="jobs_item" style="display: none;"><p align="justify">All promoter information has been deposited into a file named promoter_info which could be downloaded <a id="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/PromoterSet.txt">here</a>. But we also need transcription unit information because the information files about promoter do not contain all genes' names backward. “TU Info” file, which can be downloaded <a id="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/TUSet.txt">here</a>, contains the starting position of each TU and its promoter name. Our software picks out the starting position into a integer array. Using the left position picked out in gene info, our software would find out which unit the gene belongs to through dichotomy method and then stores the name of promoter into corresponding object.</br></br> |
The format of TU info database:</br> | The format of TU info database:</br> | ||
Operon_name Unit_name promoter_name Transcription_start_site ......</br></br> | Operon_name Unit_name promoter_name Transcription_start_site ......</br></br> | ||
- | The principle of fetching information of promoters is same as fetching | + | The principle of fetching information of promoters is same as fetching genes's. Our software stores the promoter information from the file named “promoter_info” in a “map” which could be used to pick out the promoter sequence by searching promoter name through binary tree method.</br></br> |
The format of Promoter Info database:</br> | The format of Promoter Info database:</br> | ||
Promoter_ID_assigned_by_RegulonDB Promoter_name</br></br> | Promoter_ID_assigned_by_RegulonDB Promoter_name</br></br> | ||
- | The sequence of promoter will be used in the alignment method in next module which could make a prediction of exogenous | + | The sequence of promoter will be used in the alignment method in next module which could make a prediction of exogenous genes' regulation pattern. |
</p> </div> | </p> </div> | ||
Line 118: | Line 118: | ||
<div class="jobs_trigger"> <strong>Integration</strong></div> | <div class="jobs_trigger"> <strong>Integration</strong></div> | ||
<div class="jobs_item" style="display: block;"><p align="justify"> | <div class="jobs_item" style="display: block;"><p align="justify"> | ||
- | Our software integrates all information we picked out about genes and generates a file named “all_info” —— all information about genes —— for the output graphical | + | Our software integrates all information we picked out about genes and generates a file named “all_info” —— all information about genes —— for the output graphical interface's reading. In the meanwhile, the array of objects containing all information has been stored in computer memory which greatly improve the computing speed of our software.</br></br> |
The format of all_info database:</br> | The format of all_info database:</br> | ||
No. promoter_sequence gene_sequence gene_name ID left_position right_position promoter_name description</br> | No. promoter_sequence gene_sequence gene_name ID left_position right_position promoter_name description</br> | ||
Line 142: | Line 142: | ||
<div class="jobs_item" style="display: none;"><p class="bodytext"></p> | <div class="jobs_item" style="display: none;"><p class="bodytext"></p> | ||
<p align="justify">In genetics, an operon is a functioning unit of genomic DNA containing a cluster of genes | <p align="justify">In genetics, an operon is a functioning unit of genomic DNA containing a cluster of genes | ||
- | under the control of a single regulatory signal or promoter. | + | under the control of a single regulatory signal or promoter. The genes contained in the |
- | operon are either expressed together or not at all. | + | operon are either expressed together or not at all. Several genes must be both cotranscribed |
and co-regulated to define an operon.<br /><br /> | and co-regulated to define an operon.<br /><br /> | ||
- | The first time | + | The first time "operon" was proposed is in a paper of French Academic Science, 1960. |
The lac operon of the model bacterium E. coli was discovered and provides a typical | The lac operon of the model bacterium E. coli was discovered and provides a typical | ||
example of operon function. It consists a promoter, an operator, three structural genes and | example of operon function. It consists a promoter, an operator, three structural genes and | ||
Line 359: | Line 359: | ||
regulations in the row are consider separately and marked as “positive group” and | regulations in the row are consider separately and marked as “positive group” and | ||
“negative group”. The average similarity of each group represents the distance between | “negative group”. The average similarity of each group represents the distance between | ||
- | the exogenous unit and the group. D is supposed to have the larger | + | the exogenous unit and the group. D is supposed to have the larger one's regulatory |
direction(positive or negative). The regulatory intensity is the weight average regulation of | direction(positive or negative). The regulatory intensity is the weight average regulation of | ||
the chose group. The weight here is the amino acid sequence similarity.<br /><br /> | the chose group. The weight here is the amino acid sequence similarity.<br /><br /> | ||
Line 399: | Line 399: | ||
<img src="https://static.igem.org/mediawiki/2013/e/e0/USTC_Software_1.png" style="width:600px;"/> | <img src="https://static.igem.org/mediawiki/2013/e/e0/USTC_Software_1.png" style="width:600px;"/> | ||
<br/></br> | <br/></br> | ||
- | The left side of the equation is the derivative x(density) on t(time).”qi”,”pi”,”ri”,”mi”,”ni” are parameters, which determine the intensity of regulation."ri" is degradation rate. Mji is exponent. M is a matrix whose dimensions are equivalent to R's. Mji is 0 or ranges from 0.5 to 1.2 or ranges from -1.2 to 0.5. For the material of original network, if Rji=1,Mji ranges from 0.5 to 1.2;if Rji=-1, Mji ranges from -1.2 to -0.5; if Rji=2;Mji ranges from -1.2 to 0.5 or 0.5 to 1. These | + | The left side of the equation is the derivative x(density) on t(time).”qi”,”pi”,”ri”,”mi”,”ni” are parameters, which determine the intensity of regulation."ri" is degradation rate. Mji is exponent. M is a matrix whose dimensions are equivalent to R's. Mji is 0 or ranges from 0.5 to 1.2 or ranges from -1.2 to 0.5. For the material of original network, if Rji=1,Mji ranges from 0.5 to 1.2;if Rji=-1, Mji ranges from -1.2 to -0.5; if Rji=2;Mji ranges from -1.2 to 0.5 or 0.5 to 1. These Mjis' absolute values are given randomly by program. If Rji=0, Mji=0. |
</br>For the new material, | </br>For the new material, | ||
<br/></br> | <br/></br> | ||
Line 410: | Line 410: | ||
<div class="jobs_trigger"><strong>Find Stable Network Condition</strong></div> | <div class="jobs_trigger"><strong>Find Stable Network Condition</strong></div> | ||
<div class="jobs_item" style="display: none;"><p align="justify"> | <div class="jobs_item" style="display: none;"><p align="justify"> | ||
- | Stable condition is the condition in which densities are time-invariant. We store material densities in a vector and solve the differential equations with | + | Stable condition is the condition in which densities are time-invariant. We store material densities in a vector and solve the differential equations with Euler's formula, which is like below |
<br/></br> | <br/></br> | ||
<img src="https://static.igem.org/mediawiki/2013/e/e6/USTC_Software_3.png" style="width:600px;"/> | <img src="https://static.igem.org/mediawiki/2013/e/e6/USTC_Software_3.png" style="width:600px;"/> | ||
Line 454: | Line 454: | ||
<div class="jobs_trigger"><strong>Predict Abstract</strong></div> | <div class="jobs_trigger"><strong>Predict Abstract</strong></div> | ||
- | <div class="jobs_item" style="display: block;"><p align="justify">In some cases, importing exogenous gene is for enhancing or suppressing the expression of some specific genes in engineered bacteria itself. But it is hard to choose an appropriate regulatory gene. Our software analyzes the GRN forward as well as simulates by optimization algorithm backward for giving a reference of choosing to the users. Our software not only focused on the direct regulation but also focused on the global GRN. In the same time, controlling the expression of multiple genes in network has been realized by global prediction. | + | <div class="jobs_item" style="display: block;"><p align="justify">In some cases, importing exogenous gene is for enhancing or suppressing the expression of some specific genes in engineered bacteria itself. But it is hard to choose an appropriate regulatory gene. Our software analyzes the GRN forward as well as simulates by optimization algorithm backward for giving a reference of choosing to the users. Our software not only focused on the direct regulation but also focused on the global GRN. In the same time, controlling the expression of multiple genes in network has been realized by global prediction. What's more, Particle Swarm Optimization (PSO) Algorithm makes it possible.</p> |
</div> | </div> | ||
Line 460: | Line 460: | ||
<div class="jobs_trigger"><strong>Input Target</strong></div> | <div class="jobs_trigger"><strong>Input Target</strong></div> | ||
<div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">Before prediction, the expression of specific genes which the experimenter needs should be input into our software as well as the improvement or depression. The number of target gene is SEVEN at most.</br></br> | <div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">Before prediction, the expression of specific genes which the experimenter needs should be input into our software as well as the improvement or depression. The number of target gene is SEVEN at most.</br></br> | ||
- | It is a must that figuring out the strongest and weakest expression strength before inputting the extreme cases into the target expression. The way to find out the strongest and weakest expression is modeling the | + | It is a must that figuring out the strongest and weakest expression strength before inputting the extreme cases into the target expression. The way to find out the strongest and weakest expression is modeling the GRN's steady state by a large amount of random regulation from -1 and 1. On the other hand, the expression of genes unpicked by the users should be stable as much as possible. The initial strength of expression is calculated by modeling the original GRN with Hill's equation. |
</p> | </p> | ||
</div> | </div> | ||
Line 466: | Line 466: | ||
<div class="jobs_trigger"><strong>Particle Swarm Optimization</strong></div> | <div class="jobs_trigger"><strong>Particle Swarm Optimization</strong></div> | ||
<div class="jobs_item" style="display: none;"><p align="justify"> | <div class="jobs_item" style="display: none;"><p align="justify"> | ||
- | For getting the best regulation, our software uses PSO algorithm based on 30 particles to simulate the | + | For getting the best regulation, our software uses PSO algorithm based on 30 particles to simulate the GRN's changing. First of all, the interactions of regulator and regulated-by have been put into those particles in random so that each particle will have the whole set of regulation. Secondly, the variance between target expressions and stable expression of new GRN have been regarded as the optimize requirements in PSO algorithm. As a result, the minimal variance of 30 particles is the global optimum and the minimal variance of the procession in one particle is the local optimum. Then, taking a step towards global and local optimum as well as considering the inertia and perturbation avoids falling into the sub-optimal condition.</br></br> |
At last, when the variance of expression reaches an acceptable range, our software picks out and saves the best global optimum particle following by the movement of those particles stop.</br></br> | At last, when the variance of expression reaches an acceptable range, our software picks out and saves the best global optimum particle following by the movement of those particles stop.</br></br> | ||
We constantly revises the factors in PSO algorithm by machine learning method for accurate simulation with a fast PSO particle-motion equation. At the same time, our software also filter the result of regulatory value which is more intuitive. | We constantly revises the factors in PSO algorithm by machine learning method for accurate simulation with a fast PSO particle-motion equation. At the same time, our software also filter the result of regulatory value which is more intuitive. |
Revision as of 16:56, 27 September 2013
Methodologies
To simulate how the gene regulatory network (GRN) works and to analyze how it changes after an exogenous gene is imported, the software employs several advanced algorithms and classical methods: the binary tree method, the Needleman-Wunsch algorithm, the decision tree method, the Hill equation and the Particle Swarm Optimization (PSO) algorithm.
The methodology has five parts: Fetch Database, Alignment Analyze, New Network Construction, Network Model and Predict.
Fetch Database
Our software integrates all the gene information it has picked out and generates a file named “all_info” (all information about genes) for the graphical output interface to read. Meanwhile, the array of objects containing all the information is kept in computer memory, which greatly improves the computing speed of our software.
The format of the all_info database:
No. promoter_sequence gene_sequence gene_name ID left_position right_position promoter_name description
The fetching module generates three files: old_GRN, all_info and uncertain_database.
Operon Theory and Regulatory Model
New Network Construction
Consider a three-unit network whose units interact with each other as shown in the figure. The regulation is described by the GRN matrix.
Figure 6. Example network and its GRN matrix.
If D is the exogenous unit, we can obtain three sets of similarity data between D and the units in the original GRN:
The construction is equivalent to adding a new column and a new row to the original matrix.
Figure 7. Mathematical Equivalence
When filling the column, D is compared with the regulators of the unit in each row. The
regulations in the row are considered separately and marked as the “positive group” and the
“negative group”. The average similarity of each group represents the distance between
the exogenous unit and the group. D is assumed to take the regulatory direction
(positive or negative) of the group with the larger average similarity. The regulatory
intensity is the weighted average of the regulations in the chosen group, where the weight
is the amino acid sequence similarity.
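The column-filling rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's implementation: the function name `predict_regulation`, the list-based inputs and the tie-breaking rule (the positive group wins a tie) are hypothetical.

```python
def predict_regulation(similarities, regulations):
    """Predict the signed regulation of exogenous unit D on one target unit.

    similarities[i]: sequence similarity between D and the i-th known regulator
    regulations[i]:  signed regulation of that regulator on the target
    Returns a signed intensity; 0.0 means no regulation could be inferred.
    """
    # Split the known regulations into the positive and the negative group.
    positive = [(s, r) for s, r in zip(similarities, regulations) if r > 0]
    negative = [(s, r) for s, r in zip(similarities, regulations) if r < 0]

    def average_similarity(group):
        return sum(s for s, _ in group) / len(group) if group else 0.0

    # D takes the direction of the group it resembles more on average.
    # Assumption: on a tie the positive group is chosen.
    if average_similarity(positive) >= average_similarity(negative):
        chosen = positive
    else:
        chosen = negative

    total_weight = sum(s for s, _ in chosen)
    if total_weight == 0:
        return 0.0
    # Intensity: similarity-weighted average of the chosen group's regulations.
    return sum(s * r for s, r in chosen) / total_weight
```

For example, with similarities `[0.9, 0.8, 0.3]` and regulations `[1.0, 0.5, -1.0]`, the positive group has the larger average similarity, so the predicted regulation is positive and equals the weighted average of `1.0` and `0.5`.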
There are two conditions when filling the new row:
1. There are units having the same promoter as the exogenous unit.
2. There are no units having the same promoter as the exogenous unit.
In condition 1, the units sharing the same promoter with the new member are picked out,
and the following steps are the same as in the construction of the column. The difference is
that the similarity used here is the gene sequence similarity. As explained in the regulation
model part, the promoter is the main regulatory region, but the following sequence is also
considered. Since the promoter is the same here, we focus on the gene sequences.
In condition 2, the process is almost the same as constructing the new column. Promoter
similarity is used because the promoter is the main regulatory region.
Figure 8. Construct New GRN
Network Model
Network analysis consists of finding the stable condition of the network, adding the new gene, finding the new stable condition and computing the changes from the original condition to the new one. We use the densities of the materials to describe the network condition. If all material densities are time-invariant, the network condition is stable.
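A minimal sketch of this stability test follows. It assumes the densities are advanced with forward Euler steps (the page mentions Euler's formula for solving the equations) and treats the network as stable once every derivative is close to zero. The function `find_steady_state`, its parameters and the toy derivative `toy_deriv` are hypothetical. The toy model is a plain production-minus-degradation system, not the software's Hill-type equation.

```python
def find_steady_state(deriv, x0, dt=0.01, tol=1e-9, max_steps=1_000_000):
    """Integrate dx/dt = deriv(x) with forward Euler until densities stop changing.

    deriv: function mapping a list of densities to a list of time derivatives
    x0:    initial densities
    tol:   the state counts as stable when every |dx/dt| is below this value
    """
    x = list(x0)
    for _ in range(max_steps):
        dx = deriv(x)
        # Forward Euler step: x(t + dt) = x(t) + dt * dx/dt
        x = [xi + dt * di for xi, di in zip(x, dx)]
        if max(abs(di) for di in dx) < tol:
            return x
    raise RuntimeError("no stable condition found within max_steps")


def toy_deriv(x):
    # Hypothetical two-material model: constant production, linear degradation.
    return [2.0 - 0.5 * x[0], 1.0 - 0.5 * x[1]]


steady = find_steady_state(toy_deriv, [0.0, 0.0])
```

For the toy model, the densities settle near 4.0 and 2.0, where production balances degradation. A real run would pass the network's own rate equation as `deriv`.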
Predict
In some cases, an exogenous gene is imported to enhance or suppress the expression of specific genes in the engineered bacterium itself, but it is hard to choose an appropriate regulatory gene. Our software analyzes the GRN forward and also simulates backward with an optimization algorithm to give users a reference for this choice. It considers not only the direct regulation but also the global GRN. At the same time, global prediction allows the expression of multiple genes in the network to be controlled. The Particle Swarm Optimization (PSO) algorithm makes this possible.
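To make the backward search concrete, here is a generic PSO sketch, not the software's actual particle-motion equation. The inertia and attraction coefficients, the search range of -1 to 1, the function names and the toy objective are all assumptions. In the real setting the objective would be the deviation between the target expressions and the stable expressions of the new GRN.

```python
import random


def pso(objective, dim, n_particles=30, iters=200, lo=-1.0, hi=1.0,
        inertia=0.7, c_local=1.5, c_global=1.5, seed=0):
    """Minimize objective over [lo, hi]^dim with a basic particle swarm."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # each particle's own best
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]     # best over the whole swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # Inertia plus random pulls toward the particle's best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_local * rng.random() * (best_pos[i][d] - pos[i][d])
                             + c_global * rng.random() * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Hypothetical target: squared deviation from three desired regulation values.
TARGET = [0.5, -0.3, 0.8]


def toy_objective(p):
    return sum((a - b) ** 2 for a, b in zip(p, TARGET))


best, value = pso(toy_objective, dim=3)
```

The random factors in the velocity update play the role of the perturbation that helps the swarm avoid settling in a sub-optimal condition.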
Database
This file contains the regulation between transcription factors.
This file contains the regulation between transcription factors and genes.
This file contains information about all genes in E. coli K-12.
This file contains information about all promoters in E. coli K-12.
This file contains information about all transcription units in E. coli K-12.