Team:USTC-Software/Project/Method
From 2013.igem.org
(10 intermediate revisions not shown) | |||
Line 32: | Line 32: | ||
<ul> | <ul> | ||
- | <li><a href="# | + | <!--li><a href="#sequence" class="button">2.1 Sequence</a></br> |
- | <a href="# | + | <a href="#nwa" class="button" id="subbutton">2.1.1 Needleman-Wunsch Algorithm</a></br> |
- | <a href="# | + | <a href="#asg" class="button" id="subbutton">2.1.2 A Supplementary Game</a> |
</li> | </li> | ||
- | <li><a href="# | + | <li><a href="#filtering" class="button">2.2 Filtering</a></br> |
- | <a href="# | + | <a href="#rn" class="button" id="subbutton">2.2.1 Random Noise</a></br> |
- | <a href="# | + | <a href="#filter" class="button" id="subbutton">2.2.2 Filter</a> |
</li> | </li> | ||
- | <li><a href="# | + | <li><a href="#rc" class="button">2.3 Regulation Calculation</a></li> |
- | <li><a href="#main" class="button">Top</a></li> | + | <li><a href="#main" class="button">Top</a></li--> |
+ | |||
+ | |||
+ | <li><a href="#Fetch_Database" class="button">Database</a></li> | ||
+ | <li><a href="#Alignment_Analyze" class="button">Operon Theory and Regulatory Model</a></li> | ||
+ | <li><a href="#fa" class="button">Forward Analysis</a></li> | ||
+ | <li><a href="#ra" class="button">Reverse Analysis</a></li> | ||
+ | <li><a href="#reference" class="button">Reference</a></li> | ||
+ | <li><a href="#main" class="button">Top</a></li> | ||
</ul> | </ul> | ||
Line 68: | Line 76: | ||
<p align="justify">To simulate how the GRN works and to analyze how it changes after an exogenous gene is imported, the software employs several algorithms and classical methods: the binary tree method, the Needleman-Wunsch algorithm, the decision tree method, the Hill equation and the PSO algorithm.</br><br/> | <p align="justify">To simulate how the GRN works and to analyze how it changes after an exogenous gene is imported, the software employs several algorithms and classical methods: the binary tree method, the Needleman-Wunsch algorithm, the decision tree method, the Hill equation and the PSO algorithm.</br><br/> | ||
- | <img src="https://static.igem.org/mediawiki/2013/6/6b/USTC_Software_FLOW.png" style="width:1000px;"/> | + | <img src="https://static.igem.org/mediawiki/2013/6/6b/USTC_Software_FLOW.png" style="width:1000px;"/></br> |
The methodology has four parts: Database, Operon Theory and Regulatory Model, Forward Analysis and Reverse Analysis. | The methodology has four parts: Database, Operon Theory and Regulatory Model, Forward Analysis and Reverse Analysis. | ||
Line 92: | Line 100: | ||
The format of regulation database:</br> | The format of regulation database:</br> | ||
TF_name TF_name +/-/+-</br></br> | TF_name TF_name +/-/+-</br></br> | ||
- | <img src="https://static.igem.org/mediawiki/2013/6/69/USTC_Software_TT.jpg"/> | + | <div align="center"><img src="https://static.igem.org/mediawiki/2013/6/69/USTC_Software_TT.jpg"/></div></br> |
The TF-to-TF regulations are stored in a square matrix whose rows are the regulators and whose columns are the regulated TFs. To make our GRN as complete as possible, the regulations between TFs and genes are also added to the matrix. Because this interaction is one-way, the software must first read the TFs to fill in the regulators, and then reads the TF-to-gene regulations in the same way as the TF-to-TF ones. </br></br> | The TF-to-TF regulations are stored in a square matrix whose rows are the regulators and whose columns are the regulated TFs. To make our GRN as complete as possible, the regulations between TFs and genes are also added to the matrix. Because this interaction is one-way, the software must first read the TFs to fill in the regulators, and then reads the TF-to-gene regulations in the same way as the TF-to-TF ones. </br></br> | ||
The format of regulation database:</br> | The format of regulation database:</br> | ||
TF_name Gene_name +/-/+-</br></br> | TF_name Gene_name +/-/+-</br></br> | ||
- | <img src="https://static.igem.org/mediawiki/2013/4/47/USTC_Software_TG.jpg"/> | + | <div align="center"><img src="https://static.igem.org/mediawiki/2013/4/47/USTC_Software_TG.jpg"/></div></br> |
Finally, a regulatory matrix whose rows represent the regulating genes (TFs) and whose columns represent the regulated genes (TFs and genes) is written to a file called “old_GRN” in the root directory. The values in the GRN matrix encode the regulations: “1” means activation, “-1” means repression and “0” means no relationship. Some regulations have been identified as both positive and negative, depending on the experimental environment. Our software therefore picks out those uncertain genes and stores them in a file named “uncertain_database”.</br></br> | Finally, a regulatory matrix whose rows represent the regulating genes (TFs) and whose columns represent the regulated genes (TFs and genes) is written to a file called “old_GRN” in the root directory. The values in the GRN matrix encode the regulations: “1” means activation, “-1” means repression and “0” means no relationship. Some regulations have been identified as both positive and negative, depending on the experimental environment. Our software therefore picks out those uncertain genes and stores them in a file named “uncertain_database”.</br></br> | ||
The format of uncertain database:</br> | The format of uncertain database:</br> | ||
Line 110: | Line 118: | ||
The format of Gene Info database:</br> | The format of Gene Info database:</br> | ||
ID_assigned_by_RegulonDB Gene_name Left_end_position Right_end_position DNA_strand Product_type Product_name Start_codon_sequence Stop_codon_sequence Gene_sequence</br></br> | ID_assigned_by_RegulonDB Gene_name Left_end_position Right_end_position DNA_strand Product_type Product_name Start_codon_sequence Stop_codon_sequence Gene_sequence</br></br> | ||
- | <img src="https://static.igem.org/mediawiki/2013/4/45/USTC_Software_GI.jpg"/> | + | <div align="center"><img src="https://static.igem.org/mediawiki/2013/4/45/USTC_Software_GI.jpg"/></div></br> |
The key of the map is the gene name, which is picked out from the names read earlier into the regulation matrix. The binary tree method makes it fast to find a gene's information and store it in an object. This information includes the gene ID, left position, right position, gene description and gene sequence. The gene ID is used to link to RegulonDB's gene details; the left position is used to find the gene's transcription unit; the right position is used to work out the number of bases; the description is used to distinguish RNA from protein; and the sequence is used to predict regulation by alignment. | The key of the map is the gene name, which is picked out from the names read earlier into the regulation matrix. The binary tree method makes it fast to find a gene's information and store it in an object. This information includes the gene ID, left position, right position, gene description and gene sequence. The gene ID is used to link to RegulonDB's gene details; the left position is used to find the gene's transcription unit; the right position is used to work out the number of bases; the description is used to distinguish RNA from protein; and the sequence is used to predict regulation by alignment. | ||
Line 122: | Line 130: | ||
The format of TU info database:</br> | The format of TU info database:</br> | ||
Operon_name Unit_name promoter_name Transcription_start_site ......</br></br> | Operon_name Unit_name promoter_name Transcription_start_site ......</br></br> | ||
- | <img src="https://static.igem.org/mediawiki/2013/1/1e/USTC_Software_TI.jpg"/> | + | <div align="center"><img src="https://static.igem.org/mediawiki/2013/1/1e/USTC_Software_TI.jpg"/></div></br> |
Promoter information is fetched in the same way as gene information. Our software reads the promoter information from the file named “promoter_info” into a “map”, from which a promoter sequence can be retrieved by searching for the promoter name with the binary tree method.</br></br> | Promoter information is fetched in the same way as gene information. Our software reads the promoter information from the file named “promoter_info” into a “map”, from which a promoter sequence can be retrieved by searching for the promoter name with the binary tree method.</br></br> | ||
The format of Promoter Info database:</br> | The format of Promoter Info database:</br> | ||
Promoter_ID_assigned_by_RegulonDB Promoter_name</br></br> | Promoter_ID_assigned_by_RegulonDB Promoter_name</br></br> | ||
- | <img src="https://static.igem.org/mediawiki/2013/8/8a/USTC_Software_PI.jpg"/> | + | <div align="center"><img src="https://static.igem.org/mediawiki/2013/8/8a/USTC_Software_PI.jpg"/></div></br> |
The promoter sequence is used by the alignment method in the next module, which predicts the regulation pattern of exogenous genes. | The promoter sequence is used by the alignment method in the next module, which predicts the regulation pattern of exogenous genes. | ||
</p> </div> | </p> </div> | ||
Line 249: | Line 257: | ||
</p> | </p> | ||
<h3>2 Similarity Analysis</h3> | <h3>2 Similarity Analysis</h3> | ||
- | <p align="justify" id=" | + | <p align="justify"><div id="sequence"><h4>2.1 Sequence</h4></div></br> |
- | < | + | <div id="nwa"><h5>2.1.1 Needleman-Wunsch Algorithm</h5></div> |
The Needleman-Wunsch algorithm was first published in 1970 by Saul B. Needleman and Christian D. Wunsch. It performs a global alignment of two sequences and is mostly used in bioinformatics to align protein or nucleotide sequences. Our software applies this algorithm to the alignment of DNA and amino acid sequences.<br/><br/> | The Needleman-Wunsch algorithm was first published in 1970 by Saul B. Needleman and Christian D. Wunsch. It performs a global alignment of two sequences and is mostly used in bioinformatics to align protein or nucleotide sequences. Our software applies this algorithm to the alignment of DNA and amino acid sequences.<br/><br/> | ||
Line 307: | Line 315: | ||
CGAGAC - - GT - - - | CGAGAC - - GT - - - | ||
</em></strong></p> | </em></strong></p> | ||
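<p align="justify">The global alignment described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's own implementation: the function name <code>needleman_wunsch</code> and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example, since the scoring scheme used by the software is not stated here.</p>

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b.

    Returns (score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions, not the parameters used by the software.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

<p align="justify">Under these assumed scores, <code>needleman_wunsch("ACGT", "AGT")</code> places a single gap opposite the unmatched base. A similarity measure could then be derived from the score or from the fraction of matching columns; which normalization the software uses is not specified here.</p>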
- | < | + | <div id="asg"><h5>2.1.2 A Supplementary Game</h5></div> |
<p align="justify">The rows and columns in the GRN matrix can be regarded as vectors containing the regulating or the regulated information. The behavior similarity of two units can be described by the dot product of two regulated vectors or two regulating vectors. Biologists usually assume that the more similar two sequences are, the more likely they are to behave similarly. Whether the ratio of genes with similar behaviors is positively correlated with gene similarity is essential to our project, so we obtained 1.6 million sets of data by pairwise alignment of all the 1748 units in the GRN of K-12. Each set of data consists of a gene similarity and a behavior similarity. The result is analyzed and plotted in the figure. The linear fit shows that the ratio is positively correlated with the similarity.</p><br/> | <p align="justify">The rows and columns in the GRN matrix can be regarded as vectors containing the regulating or the regulated information. The behavior similarity of two units can be described by the dot product of two regulated vectors or two regulating vectors. Biologists usually assume that the more similar two sequences are, the more likely they are to behave similarly. Whether the ratio of genes with similar behaviors is positively correlated with gene similarity is essential to our project, so we obtained 1.6 million sets of data by pairwise alignment of all the 1748 units in the GRN of K-12. Each set of data consists of a gene similarity and a behavior similarity. The result is analyzed and plotted in the figure. The linear fit shows that the ratio is positively correlated with the similarity.</p><br/> | ||
Line 313: | Line 321: | ||
<p><strong>Figure 4.</strong>Linear fit of ratio-similarity relationship.</p></div> | <p><strong>Figure 4.</strong>Linear fit of ratio-similarity relationship.</p></div> | ||
<p align="justify">Although a slight change in a DNA sequence can significantly change the properties of a gene, as in sickle-cell disease, the influence usually depends on the location and scale of the mutation. So the result is still convincing to some degree.</p> | <p align="justify">Although a slight change in a DNA sequence can significantly change the properties of a gene, as in sickle-cell disease, the influence usually depends on the location and scale of the mutation. So the result is still convincing to some degree.</p> | ||
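<p align="justify">The behavior similarity used in this section, the dot product of two regulating vectors (rows) or two regulated vectors (columns) of the GRN matrix, can be illustrated as follows. The helper names and the 3×3 toy matrix are hypothetical and exist only for this sketch; they are not taken from the software or from RegulonDB.</p>

```python
def regulating_vector(grn, i):
    """Row i of the GRN matrix: what unit i does to every other unit."""
    return list(grn[i])


def regulated_vector(grn, j):
    """Column j of the GRN matrix: what every regulator does to unit j."""
    return [row[j] for row in grn]


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(u, v))


def behavior_similarity(grn, a, b, regulated=True):
    """Dot product of the regulated (column) or regulating (row) vectors of a and b."""
    pick = regulated_vector if regulated else regulating_vector
    return dot(pick(grn, a), pick(grn, b))


# Toy matrix (made up for illustration): 1 = activation, -1 = repression, 0 = none.
toy_grn = [
    [0, 1, 1],
    [0, 0, -1],
    [1, 1, 0],
]
```

<p align="justify">With this encoding, two units regulated in the same direction by a shared regulator contribute +1 to the dot product, and opposite regulations contribute -1, so a larger value indicates more similar behavior.</p>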
- | + | ||
- | < | + | <div id="filtering"><h4>2.2 Filtering</h4></div> |
- | < | + | <div id="rn"><h5>2.2.1 Random Noise</h5></div> |
<p class="bodytext"></p><p align="justify">Normally, the similarity of two sequences will not be zero. Some computational | <p class="bodytext"></p><p align="justify">Normally, the similarity of two sequences will not be zero. Some computational | ||
experiments were carried out to study the random sequence similarities. We randomly | experiments were carried out to study the random sequence similarities. We randomly | ||
Line 324: | Line 332: | ||
<img src="https://static.igem.org/mediawiki/igem.org/8/89/USTC_Software_Figure_4.png" /> | <img src="https://static.igem.org/mediawiki/igem.org/8/89/USTC_Software_Figure_4.png" /> | ||
<p><strong>Figure 5.</strong> Random similarity distribution</p></div> | <p><strong>Figure 5.</strong> Random similarity distribution</p></div> | ||
- | < | + | <div id="filter"><h5>2.2.2 Filter</h5></div> |
<p align="justify">We need the genes highly similar to the exogenous one to interact with it. The program will | <p align="justify">We need the genes highly similar to the exogenous one to interact with it. The program will | ||
align the exogenous gene(query) with genes in the network(subject) and get the original | align the exogenous gene(query) with genes in the network(subject) and get the original | ||
Line 339: | Line 347: | ||
An example of filtering and consistency is presented in “Example”. | An example of filtering and consistency is presented in “Example”. | ||
</p> | </p> | ||
- | < | + | <div id="rc"><h4>2.3 Regulation Calculation</h4></div> |
<p align="justify">If there is a three-unit network and they interact with each other as it is shown in the figure. | <p align="justify">If there is a three-unit network and they interact with each other as it is shown in the figure. | ||
The regulation is described by the GRN matrix.</p> | The regulation is described by the GRN matrix.</p> | ||
Line 459: | Line 467: | ||
</div> | </div> | ||
+ | <h2 id="reference">Reference</h2> | ||
+ | |||
+ | <p align="justify"> | ||
+ | |||
+ | Lei Z, Dai Y. Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction[J]. BMC bioinformatics, 2006, 7(1): 491.</br></br> | ||
+ | |||
+ | |||
+ | Ramoni M F, Sebastiani P, Kohane I S. Cluster analysis of gene expression dynamics[J]. Proceedings of the National Academy of Sciences, 2002, 99(14): 9121-9126.</br></br> | ||
+ | |||
+ | Thieffry D, Huerta A M, Pérez‐Rueda E, et al. From specific gene regulation to genomic networks: a global analysis of transcriptional regulation in Escherichia coli[J]. Bioessays, 1998, 20(5): 433-440.</br></br> | ||
+ | |||
+ | |||
+ | Eberhart R, Kennedy J. A new optimizer using particle swarm theory[C]//Micro Machine and Human Science, 1995. MHS'95., Proceedings of the Sixth International Symposium on. IEEE, 1995: 39-43.</br></br> | ||
+ | |||
+ | |||
+ | Jacob F, Perrin D, Sánchez C, et al. L'opéron: groupe de gènes à expression coordonnée par un opérateur [CR Acad. Sci. Paris 250 (1960) 1727–1729][J]. Comptes rendus biologies, 2005, 328(6): 514-520.</br></br> | ||
+ | |||
+ | |||
+ | Needleman S B, Wunsch C D. A general method applicable to the search for similarities in the amino acid sequence of two proteins[J]. Journal of molecular biology, 1970, 48(3): 443-453.</br></br> | ||
+ | |||
+ | |||
+ | Gama-Castro S, Jiménez-Jacinto V, Peralta-Gil M, et al. RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation[J]. Nucleic acids research, 2008, 36(suppl 1): D120-D124.</br></br> | ||
+ | |||
+ | Martınez-Antonio A, Collado-Vides J. Identifying global regulators in transcriptional regulatory networks in bacteria[J]. Current opinion in microbiology, 2003, 6(5): 482-489.</br></br> | ||
+ | |||
+ | |||
+ | Salgado H, Moreno-Hagelsieb G, Smith T F, et al. Operons in Escherichia coli: genomic analyses and predictions[J]. Proceedings of the National Academy of Sciences, 2000, 97(12): 6652-6657.</br></br> | ||
+ | |||
+ | |||
+ | Thieffry D, Salgado H, Huerta A M, et al. Prediction of transcriptional regulatory sites in the complete genome sequence of Escherichia coli K-12[J]. Bioinformatics, 1998, 14(5): 391-400. | ||
+ | |||
+ | </p> | ||
<div class="jobs_trigger" style="display:none;"></div> | <div class="jobs_trigger" style="display:none;"></div> | ||
- | + | <div class="jobs_item" style="display: none;"><p></p></div> | |
</body> | </body> | ||
</html> | </html> |
Latest revision as of 00:46, 29 October 2013
Methodologies
To simulate how the GRN works and to analyze how it changes after an exogenous gene is imported, the software employs several algorithms and classical methods: the binary tree method, the Needleman-Wunsch algorithm, the decision tree method, the Hill equation and the PSO (particle swarm optimization) algorithm.
The methodology has four parts: Database, Operon Theory and Regulatory Model, Forward Analysis and Reverse Analysis.
Database
Our software integrates all the gene information it has collected and generates a file named “all_info” (all information about genes) for the graphical output interface to read. Meanwhile, the array of objects containing this information is kept in computer memory, which greatly improves the computing speed of our software.
The format of the all_info database:
No. promoter_sequence gene_sequence gene_name ID left_position right_position promoter_name description
The fetching module generates three files: old_GRN, all_info and uncertain_database.
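The all_info record layout listed above can be read back into objects along the following lines. This is a minimal sketch under stated assumptions: the fields are assumed to be tab-separated (the delimiter is not given in the text), the names GeneRecord, parse_all_info_line and index_by_gene_name are hypothetical, and a Python dict stands in for the binary-tree-based “map” that the software itself uses.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    """One all_info record; field order follows the format listed in the text."""
    number: int
    promoter_sequence: str
    gene_sequence: str
    gene_name: str
    gene_id: str
    left_position: int
    right_position: int
    promoter_name: str
    description: str


def parse_all_info_line(line, sep="\t"):
    """Parse one record. The tab separator is an assumption, not a documented fact."""
    # Split at most 8 times so a description containing the separator stays whole.
    fields = line.rstrip("\n").split(sep, 8)
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    no, prom_seq, gene_seq, name, gene_id, left, right, prom_name, desc = fields
    return GeneRecord(int(no), prom_seq, gene_seq, name, gene_id,
                      int(left), int(right), prom_name, desc)


def index_by_gene_name(lines):
    """Build a name -> record lookup (a dict here, in place of a tree-based map)."""
    records = (parse_all_info_line(l) for l in lines if l.strip())
    return {rec.gene_name: rec for rec in records}
```

Keeping the parsed records in such an in-memory index is what lets later modules look up a gene's sequence or positions by name without re-reading the file.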
Operon Theory and Regulatory Model
Forward Analysis
Reverse Analysis