Team:USTC-Software/Project/Method

From 2013.igem.org

(Difference between revisions)
Line 72: Line 72:
   <div id="jobs_container">
   <div id="jobs_container">
        <div class="jobs_trigger"><strong>Fetch Regulation</strong></div>
        <div class="jobs_trigger"><strong>Fetch Regulation</strong></div>
-
<div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">In GRN, there are two kinds of files: <a id="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_tf.txt">TF to TF</a> and <a id="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_gene.txt">TF to Gene</a>. Since the database about the regulation between TFs and Genes contains only one-way interaction, the matrix of GRN is a rectangle.</br></br>
+
<div class="jobs_item" style="display: none;"><p class="bodytext"></p><p align="justify">In GRN, there are two kinds of files: <a class="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_tf.txt">TF to TF</a> and <a class="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/network_tf_gene.txt">TF to Gene</a>. Since the database about the regulation between TFs and Genes contains only one-way interaction, the matrix of GRN is a rectangle.</br></br>
First of all, read the regulation relationship of TFs. Our software filters the documentation of RegulonDB on the head of all files and then reads the name of regulate and regulated TF, which is also the name of its genes, one by one. In the same time, our software numerates the genes and stores their names into an objects' array of genetic data. </br></br>
First of all, read the regulation relationship of TFs. Our software filters the documentation of RegulonDB on the head of all files and then reads the name of regulate and regulated TF, which is also the name of its genes, one by one. In the same time, our software numerates the genes and stores their names into an objects' array of genetic data. </br></br>
&nbsp;&nbsp;The format of regulation database:</br>
&nbsp;&nbsp;The format of regulation database:</br>
Line 91: Line 91:
<div class="jobs_trigger"><strong> Fetch Gene Info</strong></div>
<div class="jobs_trigger"><strong> Fetch Gene Info</strong></div>
<div class="jobs_item" style="display: none;"><p align="justify">
<div class="jobs_item" style="display: none;"><p align="justify">
-
All gene information has been deposited into a file named gene_info which could be downloaded <a id="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/Gene_sequence.txt">here</a>. In order of picking out the genes in GRN as fast as possible, all genetic information are stored in a “map”. “Map” is just like a dictionary yet its words are names of genes and its descriptions of words are replaced by genetic information. By using binary tree method, it is very fast to search the “word” wanted in the “dictionary”. As tested, the speed of binary tree method built-in “map” function is 720 times faster than traversal method.</br></br>
+
All gene information has been deposited into a file named gene_info which could be downloaded <a class="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/Gene_sequence.txt">here</a>. In order of picking out the genes in GRN as fast as possible, all genetic information are stored in a “map”. “Map” is just like a dictionary yet its words are names of genes and its descriptions of words are replaced by genetic information. By using binary tree method, it is very fast to search the “word” wanted in the “dictionary”. As tested, the speed of binary tree method built-in “map” function is 720 times faster than traversal method.</br></br>
&nbsp;&nbsp;The format of Gene Info database:</br>
&nbsp;&nbsp;The format of Gene Info database:</br>
&nbsp;&nbsp;&nbsp;&nbsp;ID_assigned_by_RegulonDB &nbsp;&nbsp;&nbsp;Gene_name &nbsp;&nbsp;&nbsp;Left_end_position &nbsp;&nbsp;&nbsp;Right_end_position &nbsp;&nbsp;&nbsp;DNA_strand &nbsp;&nbsp;&nbsp;Product_type &nbsp;&nbsp;&nbsp;&nbsp;Product_name &nbsp;&nbsp;&nbsp;Start_codon_sequence&nbsp;&nbsp;&nbsp;  Stop_codon_sequence &nbsp;&nbsp;&nbsp;Gene_sequence</br></br>
&nbsp;&nbsp;&nbsp;&nbsp;ID_assigned_by_RegulonDB &nbsp;&nbsp;&nbsp;Gene_name &nbsp;&nbsp;&nbsp;Left_end_position &nbsp;&nbsp;&nbsp;Right_end_position &nbsp;&nbsp;&nbsp;DNA_strand &nbsp;&nbsp;&nbsp;Product_type &nbsp;&nbsp;&nbsp;&nbsp;Product_name &nbsp;&nbsp;&nbsp;Start_codon_sequence&nbsp;&nbsp;&nbsp;  Stop_codon_sequence &nbsp;&nbsp;&nbsp;Gene_sequence</br></br>
Line 103: Line 103:
                  
                  
             <div class="jobs_trigger"> <strong>Fetch Promoter Info</strong></div>
             <div class="jobs_trigger"> <strong>Fetch Promoter Info</strong></div>
-
        <div class="jobs_item" style="display: none;"><p align="justify">All promoter information has been deposited into a file named promoter_info which could be downloaded <a id="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/PromoterSet.txt">here</a>. But we also need transcription unit information because the information files about promoter do not contain all genes' names backward. “TU Info” file, which can be downloaded <a id="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/TUSet.txt">here</a>, contains the starting position of each TU and its promoter name. Our software picks out the starting position into a integer array. Using the left position picked out in gene info, our software would find out which unit the gene belongs to through dichotomy method and then stores the name of promoter into corresponding object.</br></br>
+
        <div class="jobs_item" style="display: none;"><p align="justify">All promoter information has been deposited into a file named promoter_info which could be downloaded <a class="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/PromoterSet.txt">here</a>. But we also need transcription unit information because the information files about promoter do not contain all genes' names backward. “TU Info” file, which can be downloaded <a class="content" href="http://regulondb.ccg.unam.mx/menu/download/datasets/files/TUSet.txt">here</a>, contains the starting position of each TU and its promoter name. Our software picks out the starting position into a integer array. Using the left position picked out in gene info, our software would find out which unit the gene belongs to through dichotomy method and then stores the name of promoter into corresponding object.</br></br>
&nbsp;&nbsp;The format of TU info database:</br>
&nbsp;&nbsp;The format of TU info database:</br>
&nbsp;&nbsp;&nbsp;&nbsp;Operon_name &nbsp;&nbsp;&nbsp;Unit_name &nbsp;&nbsp;&nbsp;promoter_name &nbsp;&nbsp;&nbsp;Transcription_start_site ......</br></br>
&nbsp;&nbsp;&nbsp;&nbsp;Operon_name &nbsp;&nbsp;&nbsp;Unit_name &nbsp;&nbsp;&nbsp;promoter_name &nbsp;&nbsp;&nbsp;Transcription_start_site ......</br></br>

Revision as of 09:51, 19 October 2013

Slide

Take a gNAP before wearing your gloves! Genetic Network Analyze and Predict
The sketch and final GUI of gNAP!
We compare the result of our software with gene expression profile in literature.
We are USTC-Software!

Methodologies


To simulate how the gene regulatory network (GRN) works and to analyze how it changes after an exogenous gene is imported, the software employs several established algorithms and methods: the binary tree method, the Needleman-Wunsch algorithm, the decision tree method, the Hill equation and the Particle Swarm Optimization (PSO) algorithm.

The methodology consists of five parts: Fetch Database, Alignment Analyze, New Network Construction, Network Model and Predict.

Fetch Database

Fetch Database Abstract
Fetch Regulation
Fetch Gene Info
Fetch Promoter Info
Integration

Our software integrates all the gene information it has collected and writes it to a file named “all_info” (all information about genes), which the graphical output interface reads. Meanwhile, the array of objects containing all this information is kept in computer memory, which greatly improves the computing speed of our software.

  The format of all_info database:
    No.    promoter_sequence    gene_sequence    gene_name    ID    left_position    right_position    promoter_name     description
The fetching module generates three files: old_GRN, all_info and uncertain_database.
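The Fetch Promoter Info step assigns each gene to a transcription unit with a dichotomy (binary) search over the TU start positions. The sketch below illustrates that lookup in Python under stated assumptions: the function name `find_promoter`, the sample positions and promoter names are hypothetical, the start positions are assumed to be sorted in ascending order, and a gene is assumed to belong to the unit with the nearest start at or before its left position (strand orientation is ignored).

```python
from bisect import bisect_right


def find_promoter(tu_starts, tu_promoters, gene_left):
    """Return the promoter of the TU that starts closest at or before gene_left.

    tu_starts    -- TU start positions, sorted ascending (assumption)
    tu_promoters -- promoter names, tu_promoters[i] belongs to tu_starts[i]
    gene_left    -- left end position of the gene, as read from gene info
    Returns None when the gene lies before every known TU start.
    """
    # bisect_right halves the search interval at each step (binary search).
    index = bisect_right(tu_starts, gene_left) - 1
    return tu_promoters[index] if index >= 0 else None


# Hypothetical data, for illustration only.
tu_starts = [100, 500, 1200]
tu_promoters = ["pA", "pB", "pC"]
print(find_promoter(tu_starts, tu_promoters, 650))
```

Each lookup needs about log2(N) comparisons for N transcription units, instead of the N comparisons of a linear scan.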

Operon Theory and Regulatory Model

Operon Theory
Regulatory Model
Similarity and Homology
Needleman-Wunsch Algorithm
A Supplementary Game
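As an illustration of the Needleman-Wunsch algorithm named above, the following Python sketch computes the optimal global alignment score of two sequences by dynamic programming. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions chosen for the example, not the parameters used by the software.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diagonal,          # align a[i-1] with b[j-1]
                           prev[j] + gap,     # gap in b
                           cur[j - 1] + gap)) # gap in a
        prev = cur
    return prev[-1]


print(needleman_wunsch_score("ACGT", "AGT"))
```

A score like this can be normalized, for instance by the score of a sequence aligned with itself, to obtain a similarity value; the normalization actually used by the software is not stated here.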

New Network Construction

Random Noise
Filter
Construct new GRN

Consider a three-unit network whose units interact with each other as shown in the figure. The regulation is described by the GRN matrix.

Figure 6. Example network and its GRN matrix.

If D is the exogenous unit, we can obtain three sets of similarity data between D and the units in the original GRN:

  • Promoter sequence similarity
  • Gene sequence similarity
  • Amino acid sequence similarity

    The construction is equivalent to adding a new column and a new row to the original matrix.

    Figure 7. Mathematical Equivalence

    When filling the column, D is compared with the regulators of the unit in each row. The regulations in the row are considered separately and marked as the “positive group” and the “negative group”. The average similarity of each group represents the distance between the exogenous unit and that group. D is assumed to have the regulatory direction (positive or negative) of the group with the larger average similarity. The regulatory intensity is the weighted average of the regulations in the chosen group, where the weight is the amino acid sequence similarity.
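The column-filling rule can be sketched in Python as follows. This is only one reading of the paragraph above: the function name, the sample numbers, the use of the same similarity values for both grouping and weighting, and the tie-break in favor of the positive group are assumptions.

```python
def new_entry(regulations, similarities):
    """Estimate how strongly D regulates one unit.

    regulations  -- signed strengths of the unit's existing regulators
                    (positive = activation, negative = repression)
    similarities -- similarity between D and each of those regulators
    """
    pairs = list(zip(regulations, similarities))
    positive = [(r, s) for r, s in pairs if r > 0]
    negative = [(r, s) for r, s in pairs if r < 0]

    def mean_similarity(group):
        return sum(s for _, s in group) / len(group) if group else 0.0

    # D takes the direction of the group it resembles more on average.
    # Ties go to the positive group (assumption).
    chosen = positive if mean_similarity(positive) >= mean_similarity(negative) else negative

    total_weight = sum(s for _, s in chosen)
    if total_weight == 0:
        return 0.0  # no regulators, or no similarity to any of them
    # Similarity-weighted average of the regulations in the chosen group.
    return sum(r * s for r, s in chosen) / total_weight


# Hypothetical row: two activators and one repressor.
print(new_entry([0.8, -0.5, 0.4], [0.9, 0.3, 0.6]))
```

In this example the positive group has the larger average similarity (0.75 versus 0.3), so the result is the weighted average of 0.8 and 0.4, which is 0.64. The same routine can be reused for the new row by passing gene sequence or promoter similarities instead.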

    There are two conditions when filling the new row:
    1. Some units have the same promoter as the exogenous unit.
    2. No unit has the same promoter as the exogenous unit.

    In condition 1, the units sharing the same promoter with the new member are picked out, and the following steps are the same as for the construction of the column. The difference is that the similarity used here is the gene sequence similarity. As explained in the regulatory model part, the promoter is the main regulatory region, but the sequence that follows it is also considered. Since the promoter is the same here, we focus on the gene sequences.

    In condition 2, the process is almost the same as constructing the new column. Promoter similarity is used because the promoter is the main regulatory region.

    Figure 8. Construct New GRN

    Network Model

    Network Model Abstract

    Network analysis includes finding the stable condition of the network, adding a new gene, finding the new stable condition, and determining the changes from the original condition to the new one. We use the densities of materials to describe the network condition. If all material densities are time-invariant, the network condition is stable.

    Hill Equations
    Find Stable Network Condition
    Find Changes From Original Stable Condition to New Condition
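A minimal Python sketch of finding a stable condition with Hill-type regulation is given below. The model is an assumption made for illustration: each unit has a constant basal production, every nonzero entry grn[i][j] adds a Hill term driven by unit j (activating if positive, repressing if negative), and every unit decays at rate 1. The parameter values, function names and the two-unit example are hypothetical.

```python
def hill(x, k=1.0, n=2.0, activating=True):
    """Hill function: fraction of maximal effect at regulator density x."""
    xn, kn = x ** n, k ** n
    return xn / (kn + xn) if activating else kn / (kn + xn)


def steady_state(grn, x0, basal=0.5, dt=0.01, tol=1e-9, max_steps=200_000):
    """Integrate the densities with Euler steps until they stop changing.

    grn[i][j] is the signed regulation of unit i by unit j (0 = none).
    """
    x = list(x0)
    for _ in range(max_steps):
        dx = []
        for i, row in enumerate(grn):
            production = basal
            for j, w in enumerate(row):
                if w != 0:
                    production += abs(w) * hill(x[j], activating=(w > 0))
            dx.append(production - x[i])  # production minus decay
        if max(abs(d) for d in dx) < tol:
            return x  # all densities are (numerically) time-invariant
        x = [xi + dt * d for xi, d in zip(x, dx)]
    raise RuntimeError("no stable condition found within max_steps")


# Hypothetical two-unit network: unit 0 activates unit 1.
grn = [[0.0, 0.0],
       [1.0, 0.0]]
stable = steady_state(grn, x0=[0.0, 0.0])
print(stable)
```

To study the effect of a new gene under this sketch, one would extend the matrix by one row and one column, rerun `steady_state` starting from the old stable condition, and compare the two results.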

    Predict

    Predict Abstract

    In some cases, an exogenous gene is imported to enhance or suppress the expression of specific genes in the engineered bacterium itself, but it is hard to choose an appropriate regulatory gene. Our software analyzes the GRN forward and also simulates backward with an optimization algorithm to give users a reference for this choice. It considers not only direct regulation but also the global GRN. Global prediction also makes it possible to control the expression of multiple genes in the network at the same time. The backward search is carried out with the Particle Swarm Optimization (PSO) algorithm.
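The following Python sketch shows a basic PSO loop of the kind that could drive such a backward search. It is a generic textbook variant, not the software's implementation: the inertia and acceleration coefficients, the function names, and the toy objective `mismatch` (squared distance to a hypothetical target expression vector) are assumptions. In the real setting the objective would compare the simulated stable condition with the user's target.

```python
import random


def pso(objective, dim, bounds, n_particles=30, n_iter=200,
        inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize objective over a box with a basic particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]             # best position seen by each particle
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]  # best position seen by the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Move and clamp to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Toy stand-in objective: distance to a hypothetical target expression vector.
target = [0.3, 0.8]


def mismatch(x):
    return sum((a - b) ** 2 for a, b in zip(x, target))


best, err = pso(mismatch, dim=2, bounds=(0.0, 1.0))
print(best, err)
```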

    Input Target
    Particle Swarm Optimization
    Filter

    Database

    TF-TF

    This file contains the regulation between Transcription Factors.

    TF-Gene

    This file contains the regulation between Transcription Factors and Genes.

    Gene Info

    This file contains the information about all genes in E. coli K-12.

    Promoter Info

    This file contains the information about all promoters in E. coli K-12.

    TU Info

    This file contains the information about all Transcription Units in E. coli K-12.