Team:USTC-Software/Project/Method
From 2013.igem.org
<div class="jobs_trigger"><strong> Fetching Gene Info</strong></div>
<div class="jobs_item" style="display: none;"><p align="justify">
All gene information is stored in a file named gene_info, which can be downloaded <a id="out" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/Gene_sequence.txt">here</a>. To pick out the genes in the GRN as quickly as possible, all gene information is loaded into a “map”. A “map” works like a dictionary in which the words are gene names and the definitions are the corresponding gene records. Because the built-in “map” uses the binary tree method, searching for a “word” in the “dictionary” is very fast. In our tests, lookup with the built-in “map” was 720 times faster than a linear traversal.</br></br>
The format of Gene Info database:</br>
ID_assigned_by_RegulonDB Gene_name Left_end_position Right_end_position DNA_strand Product_type Product_name Start_codon_sequence Stop_codon_sequence Gene_sequence</br></br>
<div class="jobs_trigger"> <strong>Fetching Promoter Info</strong></div>
<div class="jobs_item" style="display: none;"><p align="justify">All promoter information is stored in a file named promoter_info, which can be downloaded <a id="out" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/PromoterSet.txt">here</a>. We also need transcription unit (TU) information, because the promoter files do not list the names of all the genes downstream of each promoter. The “TU Info” file, which can be downloaded <a id="out" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/TUSet.txt">here</a>, contains the start position of each TU and its promoter name. Our software reads the start positions into an integer array. Using the left-end position taken from the gene info, it finds the unit each gene belongs to by binary search (the dichotomy method) and then stores the promoter name in the corresponding object.</br></br>
The format of TU info database:</br>
Operon_name Unit_name promoter_name Transcription_start_site ......</br></br>
Revision as of 02:49, 26 September 2013
Methodologies
To simulate how the GRN works and to analyze how it changes after an exogenous gene is imported, the software employs several advanced algorithms and classical methods: the binary tree method, the Needleman-Wunsch algorithm, the decision tree method, the Hill equation and the PSO algorithm. The methodology has five parts: Fetch Database, Alignment Analyze, New Network Construction, Network Model and Predict.
Fetch Database
Our software integrates all the information it has picked out about the genes and generates a file named “all_info” (all information about genes) for the graphical output interface to read. Meanwhile, the array of objects containing all this information is kept in computer memory, which greatly improves the computing speed of our software.
The format of all_info database:
No. promoter_sequence gene_sequence gene_name ID left_position right_position promoter_name description
The fetching module generates three files: old_GRN, all_info and uncertain_database.
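The gene lookup and transcription-unit search of the fetching step can be sketched in Python as follows. This is a minimal illustration, not the project's own code: it assumes the Gene Info file is tab-separated with `#` comment lines, the sample records and promoter names are made up, and the rule "a gene belongs to the last unit that starts at or before its left-end position" is a simplifying assumption. Note also that a Python `dict` is a hash table rather than a binary tree; it stands in here for the “map”, while `bisect` performs the binary search over the sorted start positions.

```python
from bisect import bisect_right

# Field order taken from the Gene Info format given in the text.
GENE_FIELDS = [
    "id", "name", "left", "right", "strand",
    "product_type", "product_name", "start_codon", "stop_codon", "sequence",
]

def parse_gene_line(line):
    """Split one tab-separated Gene Info record into a dict (assumed layout)."""
    record = dict(zip(GENE_FIELDS, line.rstrip("\n").split("\t")))
    record["left"] = int(record["left"])
    record["right"] = int(record["right"])
    return record

def build_gene_map(lines):
    """Index the records by gene name, skipping blank and comment lines."""
    gene_map = {}
    for line in lines:
        if line.strip() and not line.startswith("#"):
            record = parse_gene_line(line)
            gene_map[record["name"]] = record
    return gene_map

def find_unit(tu_starts, tu_promoters, left_position):
    """Binary search for the last unit starting at or before left_position."""
    index = bisect_right(tu_starts, left_position) - 1
    return tu_promoters[index] if index >= 0 else None

# Made-up records for illustration only.
sample = [
    "# comment line",
    "G0001\tgeneA\t100\t400\tforward\tpolypeptide\tprotein A\tATG\tTAA\tATGAAATAA",
    "G0002\tgeneB\t900\t1300\treverse\tpolypeptide\tprotein B\tATG\tTGA\tATGCCCTGA",
]
genes = build_gene_map(sample)

# Hypothetical transcription units: sorted start positions and promoter names.
tu_starts = [50, 850, 2000]
tu_promoters = ["pA", "pB", "pC"]

for gene in genes.values():
    gene["promoter"] = find_unit(tu_starts, tu_promoters, gene["left"])
```

Looking up `genes["geneB"]` is then a single keyed access, and each promoter assignment costs a logarithmic number of comparisons instead of a scan over every unit.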
Alignment Analyze
New Network Construction
The behavior similarity of two units can be described by the dot product of their two regulated vectors or of their two regulating vectors. A more intuitive measure is the angle between the two vectors. However, some vectors in the gene regulatory network are zero vectors, for which the angle is undefined; a zero vector usually means that the unit acts only as a target or only as a regulator.
[Pic. 4 GRN matrix, target vector, regulator vector and their dot product]
We tested the hypothesis by analyzing all 1748 regulation units of Escherichia coli K-12 recorded in RegulonDB. Pairwise comparison of all these units yielded about 1.6 million sets of data. Each set consists of the promoter sequence similarity, the protein coding sequence similarity and the behavior similarity. We hoped to find some structure in the data that supports our hypothesis, and the data do show a tendency relating sequence similarity to behavior similarity (Pic. 2).
[Pic. 2 Sequence similarity and behavior similarity]
Sequence similarity is plotted on the x axis and behavior similarity on the y axis. Sequence similarity is continuous-valued (from 0 to 1), while behavior similarity is discrete-valued: its value depends on the dimension N of the vectors and lies between -N and N. According to the results, promoter sequence similarity is mainly distributed from 0.4 to 0.6, protein coding sequence similarity from 0 to 0.7 and behavior similarity from -3 to 5. As shown in Pic. 2, high behavior similarity tends to occur together with high sequence similarity. The peak value of behavior similarity, 17, appears where sequence similarity is 0.537. When the behavior similarity is fixed, for example at 8, the dots become denser as the sequence similarity increases.
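The two similarity measures described above can be sketched in Python. The 4-unit matrix below is hypothetical, and encoding activation, repression and no regulation as +1, -1 and 0 is an assumption chosen to match the statement that the dot product is an integer between -N and N.

```python
import math

def dot(u, v):
    """Dot product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine of the angle between u and v, or None if either is a zero vector."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0 or norm_v == 0:
        return None  # the angle is undefined for a zero vector
    return dot(u, v) / (norm_u * norm_v)

# Hypothetical network: grn[i][j] is the effect of unit i on unit j
# (+1 activation, -1 repression, 0 no regulation).
grn = [
    [0, 1, 1, -1],
    [0, 0, 1, -1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]

def regulating_vector(i):
    """Row i: how unit i acts on every unit."""
    return grn[i]

def regulated_vector(j):
    """Column j: how every unit acts on unit j."""
    return [row[j] for row in grn]

# Units 0 and 1 regulate two targets in the same way.
similarity_01 = dot(regulating_vector(0), regulating_vector(1))

# Unit 2 regulates nothing, so its regulating vector is zero.
angle_02 = cosine(regulating_vector(0), regulating_vector(2))

# Units 2 and 3 are regulated in opposite ways by the same regulators.
opposite_23 = cosine(regulated_vector(2), regulated_vector(3))
```

Returning `None` for zero vectors keeps units that act only as targets or only as regulators from being silently counted as "orthogonal".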
Network Model
Network analysis consists of four steps: finding the stable condition of the network, adding the new gene, finding the new stable condition, and determining how the network changes from the original condition to the new one. We describe the condition of the network by the densities of its materials. If all material densities are time-invariant, the network is in a stable condition.
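The search for a stable condition before and after adding a gene can be sketched in Python. Everything concrete here is an assumption made for illustration: the two-gene network, the rate constants, the multiplicative way the added repressor enters the Hill term, and the use of forward Euler integration. The only elements taken from the text are the Hill equation and the criterion that all densities stop changing.

```python
def hill_activation(x, vmax, k, n):
    """Hill equation for an activator at density x."""
    return vmax * x**n / (k**n + x**n)

def hill_repression(x, vmax, k, n):
    """Hill equation for a repressor at density x."""
    return vmax * k**n / (k**n + x**n)

def make_derivatives(repressor_level=0.0):
    """Toy network: A is made at a constant rate, A activates B,
    and an optional exogenous repressor (held constant) damps B."""
    def derivatives(state):
        a, b = state
        da = 1.0 - 0.5 * a
        production = hill_activation(a, vmax=2.0, k=1.0, n=2)
        production *= hill_repression(repressor_level, vmax=1.0, k=1.0, n=2)
        db = production - 1.0 * b
        return [da, db]
    return derivatives

def find_stable_state(derivatives, state, dt=0.01, tol=1e-9, max_steps=1_000_000):
    """Integrate with forward Euler until every density is time-invariant."""
    for _ in range(max_steps):
        rates = derivatives(state)
        if max(abs(rate) for rate in rates) < tol:
            return state
        state = [s + dt * rate for s, rate in zip(state, rates)]
    raise RuntimeError("no stable condition reached")

# Original network, then the same network with the exogenous repressor added.
before = find_stable_state(make_derivatives(), [0.0, 0.0])
after = find_stable_state(make_derivatives(repressor_level=1.0), before)
change = [new - old for old, new in zip(before, after)]
```

In this toy case the density of B settles near 1.6 without the repressor and near 0.8 with it, so `change` reports a drop of about 0.8 for B and essentially none for A.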
Predict
In some cases, an exogenous gene is imported to enhance or suppress the expression of specific genes in the engineered bacterium itself, but it is hard to choose an appropriate regulatory gene. Our software analyzes the GRN in the forward direction and also works backward with an optimization algorithm, so that it can suggest candidate regulatory genes to the user. It considers not only direct regulation but also the global GRN. Through this global prediction, the expression of multiple genes in the network can be controlled at the same time. The Particle Swarm Optimization (PSO) algorithm is what makes this possible.
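The backward search can be illustrated with a generic PSO in Python. This is a textbook variant under stated assumptions, not the project's implementation: the objective `target_error`, the two Hill-type response curves, the desired expression levels (2.0 and 0.5), the search bounds and the swarm parameters are all hypothetical.

```python
import random

def target_error(position):
    """Toy objective: squared distance between simulated and desired expression."""
    x0, x1 = position
    activated = 4.0 * x0**2 / (1.0 + x0**2)  # gene 1, Hill activation
    repressed = 4.0 / (1.0 + x1**2)          # gene 2, Hill repression
    return (activated - 2.0) ** 2 + (repressed - 0.5) ** 2

def pso(objective, dim, low, high, particles=30, steps=300,
        inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize objective inside the box [low, high]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]           # best position seen by each particle
    best_val = [objective(p) for p in pos]
    g = min(range(particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # best position seen by the swarm
    for _ in range(steps):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], low), high)
            value = objective(pos[i])
            if value < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], value
                if value < g_val:
                    g_pos, g_val = pos[i][:], value
    return g_pos, g_val

best_position, best_error = pso(target_error, dim=2, low=0.0, high=5.0)
```

Because the objective sums the errors of both genes, one run searches for regulator strengths that bring several expression levels to their targets together, which is the sense in which the prediction is global.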