From 2013.igem.org

(Difference between revisions)

Revision as of 02:41, 26 September 2013

Take a gNAP before wearing your gloves! Genetic Network Analyze and Predict

The sketch and final GUI of gNAP!

We compare the results of our software with gene expression profiles from the literature.

We are USTC-Software!

Methodologies

To simulate how the GRN works and to analyze how it changes after an exogenous gene is imported, the software employs several advanced algorithms and classical methods: the binary tree method, the Needleman-Wunsch algorithm, the decision tree method, the Hill equation and the PSO algorithm.
The methodology has five parts: Fetch Database, Alignment Analyze, New Network Construction, Network Model and Predict.

Fetch Database

Fetch Database Abstract

Fetching Regulation

Fetching Gene Info

Fetching Promoter Info

Integration

Our software integrates all the gene information it has picked out and generates a file named “all_info” (all information about genes) for the output graphical interface to read. Meanwhile, the array of objects containing all of this information is kept in memory, which greatly improves the computing speed of our software.

The format of all_info database:
No. promoter_sequence gene_sequence gene_name ID left_position right_position promoter_name description
The fetching module generates three files: old_GRN, all_info and uncertain_database.
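As a rough illustration of how the regulation records could be turned into the old_GRN matrix and the uncertain_database entries, here is a minimal Python sketch. It is not the team's implementation: the function name `build_grn`, the treatment of `#` lines as RegulonDB header documentation, and the choice to leave uncertain (`+-`) entries as 0 in the matrix are all assumptions made for the example. Only the record layout (regulator, target, sign), the 1 / -1 / 0 coding and the `? A->B` format come from the page itself.

```python
def build_grn(lines):
    """Build a regulator-by-target matrix from 'regulator target sign' records.

    Assumptions (hypothetical, for illustration only): header lines start
    with '#', fields are whitespace-separated, and '+-' records are reported
    as uncertain and left as 0 in the matrix.
    """
    sign_value = {"+": 1, "-": -1}
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header documentation
        fields = line.split()
        if len(fields) >= 3:
            records.append(tuple(fields[:3]))

    regulators = sorted({reg for reg, _, _ in records})
    # Columns hold every regulated unit: TFs as well as ordinary genes.
    targets = sorted({tgt for _, tgt, _ in records} | set(regulators))
    row = {name: i for i, name in enumerate(regulators)}
    col = {name: j for j, name in enumerate(targets)}

    matrix = [[0] * len(targets) for _ in regulators]
    uncertain = []
    for regulator, target, sign in records:
        if sign in sign_value:
            matrix[row[regulator]][col[target]] = sign_value[sign]
        else:
            # '+-': the direction depends on the experimental environment.
            uncertain.append(f"? {regulator}->{target}")
    return regulators, targets, matrix, uncertain
```

Calling `build_grn` on the lines of the TF-to-TF file followed by the lines of the TF-to-gene file would give a rectangular matrix with one row per TF and one column per TF or gene.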

Alignment Analyze

An example

Models

Prediction Model

Mathematical Description of The Network

Sequence similarity

New Network Construction

Filter

Construct A New Regulated Vector

Construct A New Regulating Vector

A Supplementary Game: Test of The Model

The behavior similarity of two units can be described by the dot product of their two regulated vectors or of their two regulating vectors. A more intuitive measure is the angle between the two vectors. However, the gene regulatory network contains some zero vectors, for which the angle is undefined; a zero vector usually means that the unit acts only as a target or only as a regulator.
[Pic. 4 GRN matrix, target vector, regulator vector and their dot product]
We tested this hypothesis by analyzing all 1748 regulation units of Escherichia coli K-12 recorded in RegulonDB. Pairwise comparison of these units yielded about 1.6 million sets of data, each consisting of promoter sequence similarity, protein-coding sequence similarity and behavior similarity. We hoped to find structure in the data that supports our hypothesis, and the data do show a tendency relating sequence similarity to behavior similarity (Pic. 2).
[Pic. 2 Sequence similarity and behavior similarity]
Sequence similarity is plotted on the x axis and behavior similarity on the y axis. Sequence similarity is continuous-valued (from 0 to 1), whereas behavior similarity is discrete-valued: for vectors of dimension N it lies between -N and N. In the results, promoter sequence similarity is mainly distributed from 0.4 to 0.6, protein-coding sequence similarity from 0 to 0.7, and behavior similarity from -3 to 5. As shown in Picture 4, high behavior similarity tends to occur together with high sequence similarity. The peak behavior similarity, 17, appears where the sequence similarity is 0.537. When the behavior similarity is fixed, for example at 8, the dots become denser as the sequence similarity increases.
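The two similarity measures described in this section can be written down in a few lines. The sketch below is illustrative only; the function names are hypothetical, and returning `None` for zero vectors is just one possible way to handle units whose vector angle is undefined.

```python
import math


def behavior_similarity(u, v):
    """Dot product of two regulation vectors with entries in {-1, 0, 1}.

    For vectors of dimension N the result is an integer between -N and N.
    """
    return sum(a * b for a, b in zip(u, v))


def angle_similarity(u, v):
    """Cosine of the angle between two regulation vectors.

    Returns None when either vector is all zeros (a unit that only
    regulates or is only regulated), because the angle is then undefined.
    """
    norm = math.sqrt(behavior_similarity(u, u)) * math.sqrt(behavior_similarity(v, v))
    if norm == 0:
        return None
    return behavior_similarity(u, v) / norm
```

For example, the vectors `[1, -1, 0, 1]` and `[1, 1, 0, 1]` agree on two regulations and disagree on one, so their dot product is 1 and their cosine is 1/3.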

Network Model

Network Model Abstract

Network analysis consists of finding the stable condition of the network, adding a new gene, finding the new stable condition, and recording the changes from the original condition to the new one. We use the densities of materials to describe the network condition. If all material densities are time-invariant, the network condition is stable.
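To make "stable condition" concrete, the following Python sketch integrates a small network with explicit Euler steps until no density changes any more. The exact equations used by the software are given only as images on this page, so the rate law here is an assumed, generic Hill-type form (activation `x^n / (1 + x^n)`, repression `1 / (1 + x^n)`, linear degradation), and the names `derivative`, `find_steady_state`, `p`, `r` and `n` are hypothetical. Only the convention that `R[j][i]` describes the effect of material i on material j follows the text.

```python
def derivative(x, R, p, r, n=1.0):
    """Assumed Hill-type rate law (illustrative, not the team's equations).

    x[j]    density of material j (must stay non-negative)
    R[j][i] effect of material i on material j: >0 enhances, <0 represses
    p[j]    maximal production rate, r[j] degradation rate
    """
    dx = []
    for j in range(len(x)):
        rate = p[j]
        for i in range(len(x)):
            xn = x[i] ** n
            if R[j][i] > 0:
                rate *= xn / (1.0 + xn)   # activating Hill term
            elif R[j][i] < 0:
                rate *= 1.0 / (1.0 + xn)  # repressing Hill term
        dx.append(rate - r[j] * x[j])
    return dx


def find_steady_state(x0, R, p, r, dt=0.01, tol=1e-8, max_steps=1_000_000):
    """Explicit Euler iteration until every density is time-invariant."""
    x = list(x0)
    for _ in range(max_steps):
        dx = derivative(x, R, p, r)
        x = [xi + dt * di for xi, di in zip(x, dx)]
        if max(abs(di) for di in dx) < tol:
            return x
    raise RuntimeError("no steady state reached within max_steps")
```

With one unregulated material that represses a second one, `find_steady_state([0.0, 0.0], [[0, 0], [-1, 0]], [1.0, 1.0], [1.0, 1.0])` settles near densities 1.0 and 0.5 under these assumed equations.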

Hill Equations

Find Stable Network Condition

Find Changes From Original Stable Condition to New Condition

Predict

Predict Abstract

In some cases, an exogenous gene is imported to enhance or suppress the expression of specific genes in the engineered bacterium itself, but it is hard to choose an appropriate regulatory gene. Our software analyzes the GRN forward and also works backward with an optimization algorithm, giving users a reference for this choice. It considers not only direct regulation but also the global GRN. At the same time, global prediction allows the expression of multiple genes in the network to be controlled. The Particle Swarm Optimization (PSO) algorithm is what makes this possible.
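The backward search can be pictured with a textbook particle swarm optimizer. The sketch below is a generic PSO, not the team's tuned variant: the coefficient values, the clamping of positions to [-1, 1], the fixed iteration count and the function name `pso` are assumptions, and the extra perturbation term mentioned on this page is omitted. In the real software the objective would be the deviation between the target expressions and the simulated stable expressions; here any callable that scores a candidate regulation vector can be passed in.

```python
import random


def pso(objective, dim, n_particles=30, iters=200, lo=-1.0, hi=1.0,
        inertia=0.7, c_local=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over [lo, hi]^dim with a basic particle swarm."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]             # each particle's own best
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # best over the whole swarm

    for _ in range(iters):
        for k in range(n_particles):
            for d in range(dim):
                # Inertia plus random pulls toward the local and global optima.
                vel[k][d] = (inertia * vel[k][d]
                             + c_local * rng.random() * (best_pos[k][d] - pos[k][d])
                             + c_global * rng.random() * (g_pos[d] - pos[k][d]))
                pos[k][d] = min(hi, max(lo, pos[k][d] + vel[k][d]))
            val = objective(pos[k])
            if val < best_val[k]:
                best_val[k], best_pos[k] = val, pos[k][:]
                if val < g_val:
                    g_val, g_pos = val, pos[k][:]
    return g_pos, g_val
```

A toy objective such as `lambda v: sum((a - b) ** 2 for a, b in zip(v, [0.5, -0.25]))` is enough to try it out; the returned position should lie close to the chosen target vector.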

Input Target

Particle Swarm Optimization

Filter

@@ Line 71: / Line 71: @@
    <div id="jobs_container">
 	         <div class="jobs_trigger"><strong>Fetching Regulation</strong></div>
-		 		<div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">In GRN, there are two kinds of files: <a id="out" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_tf.txt">TF to TF</a> and <a id="out" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_gene.txt">TF to Gene</a>. Since the database about the regulation between TFs and Genes contains only one-way interaction, the matrix of GRN is a rectangle.</br>
+		 		<div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">In GRN, there are two kinds of files: <a id="out" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_tf.txt">TF to TF</a> and <a id="out" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_gene.txt">TF to Gene</a>. Since the database about the regulation between TFs and Genes contains only one-way interaction, the matrix of GRN is a rectangle.</br></br>
-First of all, the software reads the regulation relationships between TFs. It skips the RegulonDB documentation at the head of each file and then reads, one by one, the names of the regulating and regulated TFs, which are also the names of their genes. At the same time, it numbers the genes and stores their names in an array of objects holding the genetic data. </br>
+First of all, the software reads the regulation relationships between TFs. It skips the RegulonDB documentation at the head of each file and then reads, one by one, the names of the regulating and regulated TFs, which are also the names of their genes. At the same time, it numbers the genes and stores their names in an array of objects holding the genetic data. </br></br>
 The format of regulation database:</br>
 TF_name &nbsp;&nbsp;&nbsp;TF_name &nbsp;&nbsp;&nbsp;+/-/+-</br></br>
-The regulation of TFs is put into a square matrix whose rows are the regulators and whose columns are the regulated TFs. To make our GRN as complete as possible, the regulation between TFs and genes is also added to the matrix. Because the interaction is one-way, the TF-to-TF file must be read first to fill in the regulators, and the TF-to-gene regulation is then added in the same way. The format of regulation database:</br>
+The regulation of TFs is put into a square matrix whose rows are the regulators and whose columns are the regulated TFs. To make our GRN as complete as possible, the regulation between TFs and genes is also added to the matrix. Because the interaction is one-way, the TF-to-TF file must be read first to fill in the regulators, and the TF-to-gene regulation is then added in the same way. </br></br>
+The format of regulation database:</br>
 TF_name &nbsp;&nbsp;&nbsp;Gene_name &nbsp;&nbsp;&nbsp;+/-/+-</br></br>
-Finally, a regulatory matrix whose rows represent the regulating genes (TFs) and whose columns represent the regulated genes (TFs and genes) is written to a file called “old_GRN” in the root directory. The values in the GRN matrix are regulations: “1” means positive activation, “-1” means repression and “0” means no relationship. Some regulations have been identified as both positive and negative, depending on the experimental environment. Our software therefore picks out those uncertain genes and stores them in a file named “uncertain_database”.</br>
+Finally, a regulatory matrix whose rows represent the regulating genes (TFs) and whose columns represent the regulated genes (TFs and genes) is written to a file called “old_GRN” in the root directory. The values in the GRN matrix are regulations: “1” means positive activation, “-1” means repression and “0” means no relationship. Some regulations have been identified as both positive and negative, depending on the experimental environment. Our software therefore picks out those uncertain genes and stores them in a file named “uncertain_database”.</br></br>
 The format of uncertain database:</br>
 ? &nbsp;&nbsp;&nbsp;Gene_name->Gene_name</br></br>
@@ Line 89: / Line 90: @@
 				<div class="jobs_trigger"><strong> Fetching Gene Info</strong></div>
 				<div class="jobs_item" style="display: none;"><p align="justify">
-All gene information is deposited in a file named gene_info, which can be downloaded here[]. To pick out the genes in the GRN as fast as possible, all genetic information is stored in a “map”. A “map” is like a dictionary whose words are gene names and whose definitions are the genetic information. With the binary tree method, searching for the wanted “word” in the “dictionary” is very fast. In our tests, lookup with the built-in binary-tree “map” was 720 times faster than the traversal method.</br>
+All gene information is deposited in a file named gene_info, which can be downloaded here[]. To pick out the genes in the GRN as fast as possible, all genetic information is stored in a “map”. A “map” is like a dictionary whose words are gene names and whose definitions are the genetic information. With the binary tree method, searching for the wanted “word” in the “dictionary” is very fast. In our tests, lookup with the built-in binary-tree “map” was 720 times faster than the traversal method.</br></br>
 The format of Gene Info database:</br>
 ID_assigned_by_RegulonDB &nbsp;&nbsp;&nbsp;Gene_name &nbsp;&nbsp;&nbsp;Left_end_position &nbsp;&nbsp;&nbsp;Right_end_position &nbsp;&nbsp;&nbsp;DNA_strand &nbsp;&nbsp;&nbsp;Product_type &nbsp;&nbsp;&nbsp;Product_name &nbsp;&nbsp;&nbsp;Start_codon_sequence&nbsp;&nbsp;&nbsp;  Stop_codon_sequence &nbsp;&nbsp;&nbsp;Gene_sequence</br></br>
@@ Line 101: / Line 102: @@
               <div class="jobs_trigger"> <strong>Fetching Promoter Info</strong></div>
-		        <div class="jobs_item" style="display: none;"><p align="justify">All promoter information is deposited in a file named promoter_info, which can be downloaded here[]. We also need transcription unit (TU) information, because the promoter information files do not contain the names of all the corresponding genes. The “TU Info” file, which can be downloaded here[], contains the starting position of each TU and its promoter name. Our software reads the starting positions into an integer array. Using the left position taken from the gene info, it finds which unit a gene belongs to by the dichotomy (binary search) method and then stores the promoter name in the corresponding object.</br>
+		        <div class="jobs_item" style="display: none;"><p align="justify">All promoter information is deposited in a file named promoter_info, which can be downloaded here[]. We also need transcription unit (TU) information, because the promoter information files do not contain the names of all the corresponding genes. The “TU Info” file, which can be downloaded here[], contains the starting position of each TU and its promoter name. Our software reads the starting positions into an integer array. Using the left position taken from the gene info, it finds which unit a gene belongs to by the dichotomy (binary search) method and then stores the promoter name in the corresponding object.</br></br>
 The format of TU info database:</br>
 Operon_name &nbsp;&nbsp;&nbsp;Unit_name &nbsp;&nbsp;&nbsp;promoter_name &nbsp;&nbsp;&nbsp;Transcription_start_site ......</br></br>
-The principle of fetching promoter information is the same as for genes. Our software stores the promoter information from the file named “promoter_info” in a “map”, which is used to pick out the promoter sequence by searching for the promoter name with the binary tree method.</br>
+The principle of fetching promoter information is the same as for genes. Our software stores the promoter information from the file named “promoter_info” in a “map”, which is used to pick out the promoter sequence by searching for the promoter name with the binary tree method.</br></br>
 The format of Promoter Info database:</br>
 Promoter_ID_assigned_by_RegulonDB &nbsp;&nbsp;&nbsp;Promoter_name</br></br>
@@ Line 116: / Line 117: @@
 				<div class="jobs_trigger"> <strong>Integration</strong></div>
 				<div class="jobs_item" style="display: block;"><p align="justify">
-Our software integrates all information we picked out about genes and generates a file named “all_info” —— all information about genes —— for the output graphical interface’s reading. In the meanwhile, the array of objects containing all information has been stored in computer memory which greatly improve the computing speed of our software.
+Our software integrates all information we picked out about genes and generates a file named “all_info” —— all information about genes —— for the output graphical interface’s reading. In the meanwhile, the array of objects containing all information has been stored in computer memory which greatly improve the computing speed of our software.</br></br>
 The format of all_info database:</br>
 No. &nbsp;&nbsp;&nbsp;promoter_sequence &nbsp;&nbsp;&nbsp;gene_sequence &nbsp;&nbsp;&nbsp;gene_name &nbsp;&nbsp;&nbsp;ID &nbsp;&nbsp;&nbsp;left_position &nbsp;&nbsp;&nbsp;right_position &nbsp;&nbsp;&nbsp;promoter_name &nbsp;&nbsp;&nbsp;description</br>
@@ Line 138: / Line 139: @@
    <div id="jobs_container">
 	         <div class="jobs_trigger"><strong>An example</strong></div>
-		 		<div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">We would like to start with a simple example. A cell operates like a basketball team. Imagine you are a manager of a team who wants to bring in some talent players making up a “big three” and build a champion-potential team this season. Before you pay the sky-high bills for the “big three”, you can evaluate the effect of the talent introduction. New members‘ records are good reference, but not the whole thing.</br>
+		 		<div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">We would like to start with a simple example. A cell operates like a basketball team. Imagine you are a manager of a team who wants to bring in some talent players making up a “big three” and build a champion-potential team this season. Before you pay the sky-high bills for the “big three”, you can evaluate the effect of the talent introduction. New members‘ records are good reference, but not the whole thing.</br></br>
-There are various factors influencing the effect of introduction. Let’s carefully choose one of the most profound aspects and focus on the relationship of the members. In the original team, you are familiar with all players’ characteristics, their roles in the team and the coach’s style, i.e. you have the information of the original player interaction network. </br>
+There are various factors influencing the effect of introduction. Let’s carefully choose one of the most profound aspects and focus on the relationship of the members. In the original team, you are familiar with all players’ characteristics, their roles in the team and the coach’s style, i.e. you have the information of the original player interaction network. </br></br>
-You know Alex is a good shooter and Bob is a strong centre forward. Carl is your target player. Carl is famous for his shooting skills and appears dominant in the court. In other words, Carl shows more obvious similarity with Alex but a low level of similarity with Bob. Then, in the new player interaction network, Carl might play a role 80% like Alex and 20% like Bob. He is similar with Alex and Bob, however different. That’s an analysis and prediction at the global point of view.</br>
+You know Alex is a good shooter and Bob is a strong centre forward. Carl is your target player. Carl is famous for his shooting skills and appears dominant in the court. In other words, Carl shows more obvious similarity with Alex but a low level of similarity with Bob. Then, in the new player interaction network, Carl might play a role 80% like Alex and 20% like Bob. He is similar with Alex and Bob, however different. That’s an analysis and prediction at the global point of view.</br></br>
-[ Pic.1 Alex, Bob and new member Carl]</br>
+[ Pic.1 Alex, Bob and new member Carl]</br></br>
 Just like the basketball team example, researchers often need to insert exogenous genes(new players) into a cell(original team) to achieve a specific goal(win the champion). In the past, the behaviors of exogenous genes are mainly speculated by wet lab experiments. Now we are trying to give an answer before wearing laboratory gloves. </br>
@@ Line 266: / Line 267: @@
 	         <div class="jobs_trigger"><strong>Hill Equations</strong></div>
-		 		<div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">Regulation relationships in the genetic network include positive regulation, negative regulation, positive-or-negative regulation and no regulation. We store the regulation relationships in a matrix R, where Rji is the entry in row j and column i. For a material of the original network, Rji=1 means material i enhances material j, Rji=-1 means material i represses material j, Rji=0 means material i has no influence on material j, and Rji=2 means material i enhances or represses material j. For the new material, Rji ranges from -1 to 1. Rji>0 means the probability of positive regulation is Rji; Rji<0 means the probability of negative regulation is -Rji; Rji=0 means there is no regulation from i to j.;
+		 		<div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">Regulation relationships in the genetic network include positive regulation, negative regulation, positive-or-negative regulation and no regulation. We store the regulation relationships in a matrix R, where Rji is the entry in row j and column i. For a material of the original network, Rji=1 means material i enhances material j, Rji=-1 means material i represses material j, Rji=0 means material i has no influence on material j, and Rji=2 means material i enhances or represses material j. For the new material, Rji ranges from -1 to 1. Rji>0 means the probability of positive regulation is Rji; Rji<0 means the probability of negative regulation is -Rji; Rji=0 means there is no regulation from i to j.
 We use Hill equations to describe the intensity of regulation. The equations are as follows:
-<br/>
+<br/></br>
 <img src="https://static.igem.org/mediawiki/2013/e/e0/USTC_Software_1.png" style="width:600px;"/>
-<br/>
+<br/></br>
-The left side of the equation is the derivative of x (density) with respect to t (time). “qi”, “pi”, “ri”, “mi” and “ni” are parameters that determine the intensity of regulation; "ri" is the degradation rate. Mji is the exponent. M is a matrix with the same dimensions as R. Mji is 0, or ranges from 0.5 to 1.2, or ranges from -1.2 to -0.5. For a material of the original network, if Rji=1, Mji ranges from 0.5 to 1.2; if Rji=-1, Mji ranges from -1.2 to -0.5; if Rji=2, Mji ranges from -1.2 to -0.5 or from 0.5 to 1.2. The absolute values of these Mji are assigned randomly by the program. If Rji=0, Mji=0.	For the new material,
+The left side of the equation is the derivative of x (density) with respect to t (time). “qi”, “pi”, “ri”, “mi” and “ni” are parameters that determine the intensity of regulation; "ri" is the degradation rate. Mji is the exponent. M is a matrix with the same dimensions as R. Mji is 0, or ranges from 0.5 to 1.2, or ranges from -1.2 to -0.5. For a material of the original network, if Rji=1, Mji ranges from 0.5 to 1.2; if Rji=-1, Mji ranges from -1.2 to -0.5; if Rji=2, Mji ranges from -1.2 to -0.5 or from 0.5 to 1.2. The absolute values of these Mji are assigned randomly by the program. If Rji=0, Mji=0.
-<br/>
+</br>For the new material,
+<br/></br>
 <img src="https://static.igem.org/mediawiki/2013/6/64/USTC_Software_2.png"/>
-<br/>
+<br/></br>
 </p>
@@ Line 282: / Line 284: @@
 				<div class="jobs_item" style="display: none;"><p align="justify">
 The stable condition is the condition in which the densities are time-invariant. We store the material densities in a vector and solve the differential equations with Euler’s method, shown below:
-<br/>
+<br/></br>
 <img src="https://static.igem.org/mediawiki/2013/e/e6/USTC_Software_3.png" style="width:600px;"/>
-<br/>
+<br/></br>
 We know the network will eventually become stable, so every material density has a limit.
@@ Line 293: / Line 295: @@
               <div class="jobs_trigger"> <strong>Find Changes From Original Stable Condition to New Condition</strong></div>
-		        <div class="jobs_item" style="display: none;"><p align="justify">Record the original stable condition and set the density of the new material to 0; this gives the new initial density vector. Solve the new equations, recording the density vectors until the new condition becomes stable, and store these data in a text file.</br>
+		        <div class="jobs_item" style="display: none;"><p align="justify">Record the original stable condition and set the density of the new material to 0; this gives the new initial density vector. Solve the new equations, recording the density vectors until the new condition becomes stable, and store these data in a text file.</br></br>
 To evaluate the new network, we introduce the grading system.
-<br/>
+<br/></br>
 <img src="https://static.igem.org/mediawiki/2013/3/32/USTC_Software_4.png" style="width:600px;"/>
 <img src="https://static.igem.org/mediawiki/2013/b/bc/USTC_Software_5.png" style="width:600px;"/>
-<br/>
+<br/></br>
-"xi" and "Xi" are the densities of material i, where i is not the new material, and "ny" is the number of materials. The closer the new densities are to the original ones, the less influence the cell endures. In general, cells close to the original cell are more likely to survive than those that differ greatly from it. That is the idea behind the grading system.</br>
+"xi" and "Xi" are the densities of material i, where i is not the new material, and "ny" is the number of materials. The closer the new densities are to the original ones, the less influence the cell endures. In general, cells close to the original cell are more likely to survive than those that differ greatly from it. That is the idea behind the grading system.</br></br>
 We ran the model many times and found that “AbsValue” ranges from 0 to 370, so "ScoreA" ranges from 0 to 4.9. We take the integer part and count it in an array with five sections. We generate 100 or 200 matrices M from matrix R and run the original and new networks for each M, which gives 100 or 200 "ScoreA" values. The section that contains the most "ScoreA" values is the final score.
 </p>
@@ Line 330: / Line 332: @@
 	         <div class="jobs_trigger"><strong>Input Target</strong></div>
-		 		<div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">Before prediction, the experimenter enters the specific genes of interest into our software and states whether the expression of each should be enhanced or repressed. At most SEVEN target genes can be entered.</br>
+		 		<div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">Before prediction, the experimenter enters the specific genes of interest into our software and states whether the expression of each should be enhanced or repressed. At most SEVEN target genes can be entered.</br></br>
 The strongest and weakest expression strengths must be determined before the extreme cases are entered as target expressions. They are found by modeling the GRN’s steady state under a large number of random regulations between -1 and 1. The expression of genes not picked by the user should stay as stable as possible. The initial expression strength is calculated by modeling the original GRN with the Hill equations.
 </p>
@@ Line 337: / Line 339: @@
 				<div class="jobs_trigger"><strong>Particle Swarm Optimization</strong></div>
 				<div class="jobs_item" style="display: none;"><p align="justify">
-To find the best regulation, our software uses a PSO algorithm with 30 particles to simulate the changes in the GRN. First, the interactions between regulators and regulated genes are assigned to the particles at random, so that each particle holds a whole set of regulations. Second, the variance between the target expressions and the stable expressions of the new GRN is used as the objective to minimize in the PSO algorithm. The minimal variance over the 30 particles is the global optimum, and the minimal variance over the history of one particle is its local optimum. Each particle then takes a step toward the global and local optima, with inertia and a perturbation included to avoid being trapped in a sub-optimal condition.</br>
+To find the best regulation, our software uses a PSO algorithm with 30 particles to simulate the changes in the GRN. First, the interactions between regulators and regulated genes are assigned to the particles at random, so that each particle holds a whole set of regulations. Second, the variance between the target expressions and the stable expressions of the new GRN is used as the objective to minimize in the PSO algorithm. The minimal variance over the 30 particles is the global optimum, and the minimal variance over the history of one particle is its local optimum. Each particle then takes a step toward the global and local optima, with inertia and a perturbation included to avoid being trapped in a sub-optimal condition.</br></br>
-Finally, when the variance of expression reaches an acceptable range, the particles stop moving and our software picks out and saves the global best particle.</br>
+Finally, when the variance of expression reaches an acceptable range, the particles stop moving and our software picks out and saves the global best particle.</br></br>
 We constantly revise the factors in the PSO algorithm by a machine learning method, for accurate simulation with a fast PSO particle-motion equation. Our software also filters the resulting regulatory values so that the result is more intuitive.
 </p>

Team:USTC-Software/Project/Method
