Team:Shenzhen BGIC 0101/Tutorial

From 2013.igem.org

(Difference between revisions)
 
(36 intermediate revisions not shown)
Line 1: Line 1:
-
{{:Team:Shenzhen BGIC 0101/Templates/Header}}
+
{{:Team:Shenzhen_BGIC_0101/Templates/Header}}
-
<html lang="en">
+
<html>
-
    <head>
+
<script src="https://2013.igem.org/Team:Shenzhen_BGIC_0101/aaa?action=raw&ctype=text/javascript"></script>
-
<script type="text/javascript" src="https://2013.igem.org/Team:Shenzhen_BGIC_0101/js/Modernizrjs?action=raw&ctype=text/js"></script>
+
-
    </head>
+
-
+
-
    <body>
+
-
        <div class="container2">
+
-
<section class="tabs">
+
<body>
-
            <input id="tab-1" type="radio" name="radio-set" class="tab-selector-1" checked="checked" />
+
-
        <label for="tab-1" class="tab-label-1">NeoChr</label>
+
-
+
-
            <input id="tab-2" type="radio" name="radio-set" class="tab-selector-2" />
+
-
        <label for="tab-2" class="tab-label-2">NucleoMod</label>
+
-
+
-
            <input id="tab-3" type="radio" name="radio-set" class="tab-selector-3" />
+
-
        <label for="tab-3" class="tab-label-3">SegmMan</label>
+
-
+
-
            <input id="tab-4" type="radio" name="radio-set" class="tab-selector-4" />
+
-
        <label for="tab-4" class="tab-label-4">Others</label>
+
-
           
+
-
    <div class="clear-shadow"></div>
+
-
+
-
        <div class="content2">
+
-
+
-
+
-
+
-
        <div class="content-1">
+
-
<h1>NeoChr </h1>
+
-
<p>The NeoChr module helps users manually pick related genes from different pathways, logically rewire the relationships between genes*, and replace genes with higher-scoring orthologs*. First, it lets users define gene order and orientation by drag and drop. Second, it decouples any overlapping genes so that all genes are non-redundant. Finally, it adds chromosome features to build a new chromosome and displays it in JBrowse. In addition, users can drag a window in JBrowse and delete any gene in that window.<br/>
+
-
Note: <br/>
+
-
*These functions are not yet available; please wait for version 2.<br/>
+
-
**You can also add anything here, including your own watermark.<br/></p>
+
-
<h2> Plugin Scripts </h2>
+
-
<p>This module contains three plugins: Decouple.pl, Add.pl and Delete.pl.</p>
+
-
<h3>1.1 Decouple.pl</h3>
+
-
<p>This plugin decouples genes whose regions overlap. Overlapping genes can be decoupled if they meet the following conditions: (1) where two genes overlap, the 5’-UTR of the latter gene does not cover the start codon (ATG) of the former gene; (2) the start coordinate of the overlapping region lies within the coding DNA sequence (CDS) of the gene to be decoupled; (3) the decoupling site in the CDS has a synonymous codon available as a replacement. After decoupling, the non-redundant genes are used to generate a GFF file and a FASTA file.</p>
+
-
<b><p>1.1.1 Internal operation </p></b>
+
-
<p>First, the plugin extracts the base sequences from the genome file according to the gene order list and records the gene order. It then records the annotation information from the species GFF file; if the GFF file does not annotate the UTRs, the plugin extends each gene's CDS by 600 bp upstream as the 5’-UTR and by 100 bp downstream as the 3’-UTR.<br/>
+
-
Second, the plugin detects overlapping genes on the same chromosome. When overlapping genes are detected, it checks whether the start of the overlap lies within a CDS and identifies whether that site is in phase 0, 1 or 2.<br/>
+
-
Third, the plugin attempts a synonymous codon substitution to break the start codon that lies inside the CDS. It prints whether each gene was decoupled successfully, for example:<br/>
+
-
<img src="https://static.igem.org/mediawiki/2013/e/e0/T1-1.png" alt="data" style="width: 750px" /><br/>
+
-
The non-redundant genes are then generated.<br/>
+
-
Finally, the plugin joins the non-redundant genes in the given gene order to construct a new chromosome.
+
-
</p>
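<p>The overlap check described above can be sketched as follows. This is a minimal illustration in Python, not the plugin's Perl code: the <code>Gene</code> class, the function names and the example coordinates are hypothetical, and only the 600 bp and 100 bp extensions come from the text.</p>

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    strand: str     # "+" or "-"
    cds_start: int  # 1-based, inclusive
    cds_end: int    # 1-based, inclusive


def extend(gene, upstream=600, downstream=100):
    """Return (start, end) of the gene region including the assumed UTRs."""
    if gene.strand == "+":
        start, end = gene.cds_start - upstream, gene.cds_end + downstream
    else:
        # On the minus strand the 5'-UTR lies at the higher coordinates.
        start, end = gene.cds_start - downstream, gene.cds_end + upstream
    return max(1, start), end


def overlapping_pairs(genes, upstream=600, downstream=100):
    """List the pairs of genes whose extended regions overlap."""
    regions = sorted((extend(g, upstream, downstream), g.name) for g in genes)
    pairs = []
    for i, ((_, end1), name1) in enumerate(regions):
        for (start2, _), name2 in regions[i + 1:]:
            if start2 > end1:
                break  # regions are sorted by start, so no later one overlaps
            pairs.append((name1, name2))
    return pairs
```

<p>With these assumptions, two genes whose CDSs are 500 bp apart are still reported as overlapping, because the 5’-UTR extension of the second gene reaches into the 3’-UTR extension of the first.</p>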
+
-
<b><p>1.1.2 Example</p></b>
+
-
<p>The plugin accepts the gene order list in two input forms:<br/>
+
-
1. Passing the gene order list as a string:<br/>
+
-
    perl GeneDecouple.pl --species saccharomyces_cerevisiae_chr --list_format string --gene_order="YAL054C -,YAL038W +,YBR019C -,YBR145W +,YCL040W +,YCR012W +,YCR105W +,YDL168W +,YPL017C -,YIL177C -,YIL177W-A +,YIL172C -,YIL171W-A +," --geneset_dir ../gene_set --upstream_extend 600 --downstream_extend 100 --neo_chr_gff neochr.gff --neo_chr_fa neochr.fa<br/>
+
-
2. Passing the gene order list as a file:<br/>
+
-
perl GeneDecouple.pl --species saccharomyces_cerevisiae_chr --list_format file --gene_order gene_ordre.list --geneset_dir ../gene_set --upstream_extend 600 --downstream_extend 100 --neo_chr_gff neochr.gff --neo_chr_fa neochr.fa
+
-
</p>
+
-
<b><p>1.1.3 Parameters </p></b>
+
-
<table border="1">
+
-
<tr>
+
-
<th>Parameter</th>
+
-
<th>Description</th>
+
-
<th>Default</th>
+
-
<th>Selectable range</th>
+
-
</tr>
+
-
  <tr>
+
-
    <th>list_format</th>
+
-
    <th>set the input form of the gene order list</th>
+
-
<th>string</th>
+
-
<th>string/file</th>
+
-
  </tr>
+
-
 
+
-
  <tr>
+
-
    <th>gene_order</th>
+
-
    <th>set the input gene order list (including pathway genes and additional genes)</th>
+
-
<th></th>
+
-
<th></th>
+
-
  </tr>
+
-
 
+
-
  <tr>
+
-
    <th>geneset_dir</th>
+
-
    <th>set the species annotation directory</th>
+
-
<th></th>
+
-
<th></th>
+
-
  </tr>
+
-
 
+
-
  <tr>
+
-
    <th>upstream_extend</th>
+
-
    <th>set the length of the upstream extension (bp)</th>
+
-
<th>600</th>
+
-
<th></th>
+
-
  </tr>
+
-
 
+
-
  <tr>
+
-
    <th>downstream_extend</th>
+
-
    <th>set the length of the downstream extension (bp)</th>
+
-
<th>100</th>
+
-
<th></th>
+
-
  </tr>
+
-
 
+
-
  <tr>
+
-
    <th>neo_chr_gff</th>
+
-
    <th>set the name of output neochr gff file</th>
+
-
<th></th>
+
-
<th></th>
+
-
  </tr>
+
-
 
+
-
  <tr>
+
-
    <th>neo_chr_fa</th>
+
-
    <th>set the name of output neochr fasta file</th>
+
-
<th></th>
+
-
<th></th>
+
-
  </tr>
+
-
  <tr>
+
-
    <th>help</th>
+
-
    <th>Show help information</th>
+
-
<th></th>
+
-
<th></th>
+
-
  </tr>
+
-
</table>
+
-
</p><br/>
+
-
    <b><p>1.1.4 The format of output file</p></b>
+
-
<p>The output files contain the decoupled genes in standard GFF and FASTA formats.<br/>
+
-
&nbsp;&nbsp;1. decoupled GFF file<br/></p>
+
-
<img src="https://static.igem.org/mediawiki/2013/e/e5/T1-2.png" alt="data" style="width: 750px" /><br/>
+
-
&nbsp;&nbsp;2. decoupled FASTA file<br/>
+
-
<img src="https://static.igem.org/mediawiki/2013/b/b2/T1-3.png" alt="data" style="width: 750px" /><br/>
+
-
    <h3>1.2 Add.pl </h3>
+
<br/><br/>
-
<p>This plugin adds the loxPsym sequence and the user-defined left and right telomeres, centromere and autonomously replicating sequence (ARS) to the FASTA and GFF files generated by Decouple.pl.</p>
+
<section class="container">
-
    <b><p>1.2.1 Internal operation </p></b>
+
<div class="one-third column">
-
<p>The plugin inserts loxPsym after the first 3 bp of the 3’-UTR of each gene, and adds the telomeres, centromere and ARS in the following layout:<br/>
+
-
<b>left_telomere + gene1 + centromere + gene2 + ARS + gene3 + right_telomere</b><br/>
+
<a href="https://2013.igem.org/Team:Shenzhen_BGIC_0101/Tutorial/neochr" target="_blank">
-
The distance between the centromere and the ARS is less than 30 kb.<br/>
+
<img src="https://static.igem.org/mediawiki/2013/f/f9/Neochr_icon-purple.png" />
-
Finally, the user can view the chromosome with its newly added features in JBrowse.
+
        </a>
-
</p>
+
       
-
    <b><p>1.2.2 Example </p></b>
+
<h1>NeoChromosome</h1>
-
<p>perl 04.Add.pl --loxp loxPsym.feat --left_telomere UTC_left.feat --right_telomere UTC_right.feat --ars chromosome_I_ARS108.feature --centromere chromosome_I_centromere.feat --chr_gff neochr.gff --chr_seq neochr.fa --neochr_seq neochr.final.fa --neochr_gff neochr.final.gff<br/><br/>
+
<a href="https://static.igem.org/mediawiki/2013/9/9e/Tutorial-Neochr.pdf" target="_self"><h4>Download</h4></a>
-
Every feature file uses a four-line format, for example:<br/>
+
</div>
-
&nbsp;&nbsp;name = site_specific_recombination_target_region<br/>
+
-
&nbsp;&nbsp;type = loxPsym<br/>
+
-
&nbsp;&nbsp;source = BIO<br/>
+
-
&nbsp;&nbsp;sequence = ATAACTTCGTATAATGTACATTATACGAAGTTAT<br/>
+
-
Note: the first line is the detailed name of the feature, the second line is its type, the third line is its source and the last line is its sequence.
+
-
</p>
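<p>As an illustration of the four-line format, the sketch below reads one feature file into a dictionary. It is written in Python for brevity and is not part of Add.pl; the function name <code>parse_feature</code> is hypothetical, and the only assumption about the format is the <code>key = value</code> layout shown in the example above.</p>

```python
REQUIRED_KEYS = {"name", "type", "source", "sequence"}


def parse_feature(text):
    """Parse a four-line 'key = value' feature description into a dict."""
    feature = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        feature[key.strip()] = value.strip()
    missing = REQUIRED_KEYS - feature.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return feature
```

<p>Rejecting a file with a missing field early is a design choice of this sketch; how Add.pl itself reacts to incomplete feature files is not documented here.</p>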
+
-
<b><p>1.2.3 Parameters</p></b>
+
-
<pre>
+
-
Parameter Description Default Selectable range
+
-
loxp set the sequence of loxp ATAACTTCGTATAATGTATGCTATACGAAGTTAT
+
-
left_telomere set the sequence of left telomere
+
-
right_telomere set the sequence of right telomere
+
-
chr_gff set the input neochr_gff file
+
-
chr_seq set the input neochr_fa file
+
-
neochr_seq set the name of the output neochr_fa file with loxPs and telomeres added
+
-
neochr_gff set the name of the output neochr_gff file with loxPs and telomeres added
+
-
</pre>
+
-
    <b><p>1.2.4 The format of output</p></b>
+
<div class="one-third column">
-
<p>The output files describe the chromosome with the added features in standard GFF and FASTA formats.<br/>
+
-
1. added features GFF file<br/>
+
<a href="https://2013.igem.org/Team:Shenzhen_BGIC_0101/Tutorial/nucleomod" target="_blank">
-
<img src="https://static.igem.org/mediawiki/2013/e/e0/T1-4.png" alt="data" style="width: 750px" ></a>
+
<img src="https://static.igem.org/mediawiki/2013/e/e7/Nucleomod_icon-purple.png" />
-
</p>
+
</a>
 +
     
 +
<h1>NucleoModifier</h1>
 +
 +
<a href="https://static.igem.org/mediawiki/2013/9/96/Tutorial-NucleoMod.pdf" target="_self"><h4>Download</h4></a>
 +
</div>
-
<h3>1.3 Delete.pl </h3>
 
-
<p>This plugin modifies the GFF and FASTA files generated by Add.pl: the user drags a window in JBrowse and deletes any gene in that window.</p>
 
-
    <b><p>1.3.1 Internal operation </p></b>
 
-
<p>First, the user drags a window with the mouse over the chromosome with added features shown in JBrowse, and JBrowse displays all the genes in this window. Second, the user decides which genes should be deleted from the new chromosome, and the plugin deletes them from the GFF file and modifies the FASTA file at the same time.</p>
 
-
<b><p>1.3.2 Example </p></b>
 
-
<p>perl 05.delete.pl --delete="YAL054C,YAL038W" --neochr_gff neochr.refine.final.gff --neochr_fa neochr.refine.final.fa --slim_gff neochr.refine.delete.gff --slim_fa neochr.refine.delete.fa </p>
 
-
    <b><p>1.3.3 Parameters </p></b>
 
-
<p><pre>
 
-
Parameter Description Default Selectable range
 
-
delete Set the list of genes to be deleted
 
-
neochr_gff Set the input GFF file which is generated by Add.pl
 
-
neochr_fa Set the input FASTA file which is generated by Add.pl
 
-
slim_gff Set the output GFF file
 
-
slim_fa Set the output FASTA file </pre></p>
 
-
    <b><p>1.3.4 The format of output</p></b>
+
<div class="one-third column">
-
<p>The output files describe the chromosome after gene deletion in standard GFF and FASTA formats.</p>
+
-
   
+
<a href="https://2013.igem.org/Team:Shenzhen_BGIC_0101/Tutorial/segmman" target="_blank">
-
</div>
+
<img src="https://static.igem.org/mediawiki/2013/9/9e/Segman_icon-purple.png" />
-
+
</a>
-
+
       
-
+
<h1>Segmentation</h1>
-
+
-
+
<a href="https://static.igem.org/mediawiki/2013/a/a8/Tutorial-Segmentation.pdf" target="_self"><h4>Download</h4></a>
-
+
        <br/><br/><br/>
-
+
</div>
-
+
-
+
-
+
-
+
-
        <div class="content-2">
+
-
<h1>NucleoMod </h1>
+
-
<p>NucleoMod can modify CDS based on synonymous mutation. It has 5 applications. Firstly, NucleoMod is used to design CRISPR sites on NeoChr so that we can silence the wild type genes. Secondly, it can erase specific enzyme sites according to the users' selection. Thirdly, users can create an enzyme site in selected region of specific genes. Fourthly, it can optimize the codon efficiency to increase the expression level. Finally, it can smash the tandem repeat bases to reduce the synthesis difficulty.</p>
+
-
            <h2> Plugins </h2>
+
-
<p>This module contains 5 plugins: CRISPR design, erase enzyme site, create enzyme site, codon optimization, repeat smash. All plugins are included in the main program.</p>
+
-
            <h3>2.1 CRISPR design</h3>
+
-
<p>This plugin designs CRISPR sites in NeoChr genes so that the wild-type genes can be silenced. We use blast+ to ensure that the CRISPR sites are unique. If you run more than one plugin at the same time, this plugin runs first and passes its data to the next plugin. Otherwise it generates a new FASTA file for the sequence and a GFF file for the annotation.</p>
+
-
            <b><p>2.1.1 Internal operation</p></b>
+
-
<p>First, this plugin extracts the sequence and annotation from the NeoChr FASTA file and GFF3 file, respectively. A regular expression is used to find the 23 bp basic structure of a CRISPR site: a leading ‘G’, followed by 20 arbitrary bases, followed by ‘GG’. All matching sequences and their loci are recorded in an array. <br/>
+
-
Second, blast+ is used to check whether the 12 bp sequence (from the 9th to the 20th base) is unique in the wild-type genome. Only unique sites are kept. <br/>
+
-
Third, synonymous substitution is applied to change one base between the 9th and 20th bases of the CRISPR structure. The result is recorded in the GFF as an element of the gene. If -verbose is set, the number of designed sites is reported on STDOUT.<br/>
+
-
Finally, if this plugin is the last module, the sequence and annotation information will be recreated in FASTA and GFF format.</p>
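<p>The site search in the first step can be sketched with a single regular expression. The Python code below is an illustration only, not the plugin's implementation: the function names are hypothetical, it scans just the strand it is given, and the blast+ uniqueness check and the synonymous substitution are left out.</p>

```python
import re

# 'G', then 20 arbitrary bases, then 'GG': 23 bp in total.
# The lookahead lets overlapping candidate sites be reported as well.
CRISPR_SITE = re.compile(r"(?=(G[ACGT]{20}GG))")


def find_crispr_sites(seq):
    """Return (1-based start, 23 bp site) for every match on this strand."""
    return [(m.start() + 1, m.group(1)) for m in CRISPR_SITE.finditer(seq.upper())]


def seed(site):
    """Return bases 9 to 20 of a site, the 12 bp checked for uniqueness."""
    return site[8:20]
```

<p>Each 12 bp <code>seed</code> would then be searched against the wild-type genome, and only sites whose seed is unique would be kept.</p>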
+
-
            <b><p>2.1.2 Example</p></b>
+
-
<p>The plugin can be run on its own or together with other plugins:<br/>
+
-
Run CRISPR design plugin only:<br/>
+
-
perl NucleoMod.pl -inputfa NeoChr.fa -inputgff NeoChr.gff -outputgff new_annotation.gff -outputfa new_chr.fa -crisprnum 2 -database saccharomyces_cerevisiae_chr.fa</p>
+
-
            <b><p>2.1.3 Parameters</p></b>
+
-
<table>
+
-
<tr><td>Parameter</td><td>Description</td><td>Default</td><td>Selectable range</td></tr>
+
-
<tr><td>inputfa</td><td>The NeoChr sequence file in FASTA format</td><td></td><td>string</td></tr>
+
-
<tr><td>inputgff</td><td>The NeoChr annotation file in GFF3 format</td><td></td><td>string</td></tr>
+
-
<tr><td>outputgff</td><td>Output of new chromosome annotation in GFF3 format</td><td></td><td>string</td></tr>
+
-
<tr><td>outputfa</td><td>Output of new chromosome sequence in FASTA format</td><td></td><td>string</td></tr>
+
-
<tr><td>verbose</td><td>Output the detailed information in STDOUT</td><td>none</td><td>option</td></tr>
+
-
<tr><td>crisprnum</td><td>Number of CRISPR sites to design per gene</td><td></td><td>Int (>0)</td></tr>
+
-
<tr><td>database</td><td>The sequence of reference genome, used as blast+ database</td><td></td><td>string</td></tr>
+
-
<tr><td>help</td><td>Show help information</td><td></td><td></td></tr>
+
-
</table>
+
-
            <b><p>2.1.4 The format of output file</p></b>
+
-
<p>The output files are standard GFF and FASTA format files.<br/>
+
-
1. GFF file<br/>
+
-
<img src="https://static.igem.org/mediawiki/2013/2/26/T2-1.png" /><br/>
+
-
2. FASTA file<br/>
+
-
<img src="https://static.igem.org/mediawiki/2013/d/df/T2-2.png" /><br/>
+
-
3. Detailed information in STDOUT<br/>
+
-
<img src="https://static.igem.org/mediawiki/2013/f/fb/T2-3.png" />
+
-
</p>
+
-
            <h3>2.2 Erase enzyme site</h3>
+
-
<p>Given a list of restriction enzyme information, this plugin will erase the restriction sites in every gene. If you are using more than one plugin at the same time, this plugin will start after CRISPR design and deliver the data to next plugin. Otherwise it will generate a new fasta file for sequence and gff file for annotation.</p>
+
-
            <b><p>2.2.1 Internal operation</p></b>
+
-
<p>The enzyme information is extracted first. (If the -biobrickstandard parameter is set, the plugin also removes EcoRI, XbaI, SpeI, PstI and NotI sites.) Each recognition site is converted to a regular expression and searched for in the CDS regions.
+
-
Once a restriction site is matched, synonymous substitution method will be applied to try to erase the enzyme site. When the substitution is finished, the plugin will restart the next search from 1 base after the last matched position.
+
-
If this plugin is the last module, the sequence and annotation information will be recreated in FASTA and GFF format.</p>
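<p>Converting a recognition site into a regular expression might look like the sketch below. It is a Python illustration under stated assumptions, not the plugin's code: the function names are hypothetical, the cut marker <code>/</code> from the enzyme list is simply dropped, degenerate bases are expanded with the standard IUPAC nucleotide codes, and only the given strand is searched.</p>

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def site_to_regex(site):
    """Turn a site such as 'G/GATCC' into a compiled regular expression."""
    bases = site.upper().replace("/", "")
    body = "".join(IUPAC[base] for base in bases)
    return re.compile("(?=(" + body + "))")  # lookahead finds overlapping hits


def find_enzyme_sites(seq, site):
    """Return the 0-based start of every occurrence of the site in seq."""
    return [m.start() for m in site_to_regex(site).finditer(seq.upper())]
```

<p>Every reported position inside a CDS would then be a candidate for synonymous substitution.</p>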
+
-
            <b><p>2.2.2 Example</p></b>
+
-
<p>perl NucleoMod.pl -inputfa NeoChr.fa -inputgff NeoChr.gff -outputgff new_annotation.gff -outputfa new_chr.fa -biobrickstandard [-delenzymelist enzyme.list ]<br/>
+
-
<br/>
+
-
Format of enzyme.list:<br/>
+
-
Company  enzyme_name  enzyme_site  …<br/>
+
-
  Eg. NEB      BamHI        G/GATCC</p>
+
-
            <b><p>2.2.3 Parameters</p></b>
+
-
<table>
+
-
<tr><td>Parameter</td><td>Description</td><td>Default</td><td>Selectable range</td></tr>
+
-
<tr><td>inputfa</td><td>The NeoChr sequence file in FASTA format</td><td></td><td>string</td></tr>
+
-
<tr><td>inputgff</td><td>The NeoChr annotation file in GFF3 format</td><td></td><td>string</td></tr>
+
-
<tr><td>outputgff</td><td>Output of new chromosome annotation in GFF3 format</td><td></td><td>string</td></tr>
+
-
<tr><td>outputfa</td><td>Output of new chromosome sequence in FASTA format</td><td></td><td>string</td></tr>
+
-
<tr><td>verbose</td><td>Output the detailed information in STDOUT</td><td>none</td><td>option</td></tr>
+
-
<tr><td>biobrickstandard</td><td>Erase the biobrick standard enzyme site</td><td>none</td><td>option</td></tr>
+
-
<tr><td>delenzymelist</td><td>File listing the enzymes whose sites are to be erased</td><td></td><td>string</td></tr>
+
-
<tr><td>detail</td><td>Show the erased enzyme site in new gff</td><td>none</td><td>option</td></tr>
+
-
<tr><td>help</td><td>Show help information</td><td></td><td></td></tr>
+
-
</table>
+
</section>
-
            <b><p>2.2.4 The format of output</p></b>
+
-
<p>The output files are standard GFF and FASTA format.<br/>
+
-
1. GFF file<br/>
+
-
2. FASTA file<br/>
 
-
3. Detailed information in STDOUT<br/>
+
<section class="container">
-
<img src="https://static.igem.org/mediawiki/2013/e/ea/T2-4.png" /><br/>
+
<div class="one-third column">
-
</p>
+
-
            <h3>2.3 Create enzyme site</h3>
+
<a href="https://2013.igem.org/Team:Shenzhen_BGIC_0101/Tutorial/olsdesigner" target="_blank">
-
<p>Given a list of restriction enzyme information, this plugin can create a new enzyme site in specific region of selected gene. If you are using more than one plugin at the same time, this plugin will start after erase enzyme site and deliver the data to next plugin. Otherwise it will generate a new fasta file for sequence and gff file for annotation.</p>
+
<img src="https://static.igem.org/mediawiki/2013/e/ee/Olsdesigner.png" />
-
            <b><p>2.3.1 Internal operation</p></b>
+
</a>
-
<p>First, information of enzyme site will be extracted. According to 3 reading frames, a searching tree will be constructed and converted to regular expression.
+
-
The plugin will search the selected regions and then change the sequence to enzyme site by synonymous substitution method.
+
-
If this plugin is the last module, the sequence and annotation information will be recreated in FASTA and GFF format.</p>
+
-
            <b><p>2.3.2 Example</p></b>
+
-
<p>perl NucleoMod.pl -inputfa NeoChr.fa -inputgff NeoChr.gff -outputgff new_annotation.gff -outputfa new_chr.fa -addenzymelist enzyme.list -addenzymeconfig gene_id,start_pos,end_pos,enzyme_name</p>
+
-
            <b><p>2.3.3 Parameters</p></b>
+
-
<p>
+
-
<table>
+
-
<tr><td>Parameter</td><td>Description</td><td>Default</td><td>Selectable range</td></tr>
+
-
<tr><td>inputfa</td><td>The NeoChr sequence file in FASTA format</td><td></td><td>string</td></tr>
+
-
<tr><td>inputgff</td><td>The NeoChr annotation file in GFF3 format</td><td></td><td>string</td></tr>
+
-
<tr><td>outputgff</td><td>Output of new chromosome annotation in GFF3 format</td><td></td><td>string</td></tr>
+
-
<tr><td>outputfa</td><td>Output of new chromosome sequence in FASTA format</td><td></td><td>string</td></tr>
+
-
<tr><td>verbose</td><td>Output the detailed information in STDOUT</td><td>none</td><td>option</td></tr>
+
-
<tr><td>addenzymelist</td><td>The file of enzyme to get enzyme site information</td><td></td><td>string</td></tr>
+
-
<tr><td>addenzymeconfig</td><td>An array of strings specifying the enzyme and regions</td><td></td><td>string,int,int,string</td></tr>
+
-
<tr><td>help</td><td>Show help information</td><td></td><td></td></tr>
+
-
</table>
+
-
</p>
+
-
            <b><p>2.3.4 The format of output</p></b>
+
-
<p>The output files are standard GFF and FASTA format.<br/>
+
-
1. GFF file<br/>
+
-
2. FASTA file<br/>
+
<h1>OLS Designer</h1>
 +
 +
<a href="https://static.igem.org/mediawiki/2013/2/26/Tutorial-OLSDesigner.pdf" target="_self"><h4>Download</h4></a>
 +
</div>
-
3. Detailed information in STDOUT<br/>
+
<div class="one-third column">
-
<img src="https://static.igem.org/mediawiki/2013/7/75/T2-5.png" /><br/>
+
-
 
+
<a href="https://2013.igem.org/Team:Shenzhen_BGIC_0101/Tutorial/kgml" target="_blank">
-
</p>
+
<img src="https://static.igem.org/mediawiki/2013/0/02/Otherspuple.png" />
-
            <h3>2.4 Codon optimization</h3>
+
</a>
-
<p>Given a codon priority list, this plugin is used to optimize the codon so that we can increase the expression of selected genes. If you are using more than one plugin at the same time, this plugin will start after create enzyme site and deliver the data to next plugin. Otherwise it will generate a new fasta file for sequence and gff file for annotation.</p>
+
     
-
            <b><p>2.4.1 Internal operation</p></b>
+
<h1>Kgml2Jason</h1>
-
<p>The codons for each amino acid are divided into three ranks: best, normal and worst. Every codon of a selected gene is checked to see whether it is in the best rank. Codons in the normal or worst rank are changed to a best-rank codon by synonymous substitution.
+
-
If this plugin is the last module, the sequence and annotation information will be recreated in FASTA and GFF format.</p>
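<p>The rank-based replacement can be illustrated with the short Python sketch below. It is not the plugin's implementation: the function name and the <code>priority</code> dictionary are hypothetical, the layout of CodonPriority.txt is not assumed, and the example ranks used with it are invented for illustration rather than taken from real codon usage.</p>

```python
def optimize_codons(cds, priority):
    """Replace each codon with the best-ranked synonymous codon.

    `priority` maps an amino acid to its codons, ordered best first
    (a hypothetical in-memory form of a codon priority list).
    Codons that are not listed are left unchanged.
    """
    best = {codon: ranked[0] for ranked in priority.values() for codon in ranked}
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return "".join(best.get(codon, codon) for codon in codons) + cds[usable:]
```

<p>Because every replacement stays within the codon group of one amino acid, the encoded protein is unchanged.</p>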
+
<a href="https://static.igem.org/mediawiki/2013/f/fa/Turorial-Kgml2Jason.pdf" target="_self"><h4>Download</h4></a>
-
            <b><p>2.4.2 Example</p></b>
+
</div>
-
<p>perl NucleoMod.pl -inputfa NeoChr.fa -inputgff NeoChr.gff -outputgff new_annotation.gff -outputfa new_chr.fa -codonoptimize CodonPriority.txt -optimizeallgene [-optimizegenelist gene1,gene2,gene3 ]</p>
+
-
            <b><p>2.4.3 Parameters</p></b>
+
-
<p>
+
-
<table>
+
-
<tr><td>Parameter</td><td>Description</td><td>Default</td><td>Selectable range</td></tr>
+
-
<tr><td>inputfa</td><td>The NeoChr sequence file in FASTA format</td><td></td><td>string</td></tr>
+
-
<tr><td>inputgff</td><td>The NeoChr annotation file in GFF3 format</td><td></td><td>string</td></tr>
+
-
<tr><td>outputgff</td><td>Output of new chromosome annotation in GFF3 format</td><td></td><td>string</td></tr>
+
-
<tr><td>outputfa</td><td>Output of new chromosome sequence in FASTA format</td><td></td><td>string</td></tr>
+
-
<tr><td>verbose</td><td>Output the detailed information in STDOUT</td><td>none</td><td>option</td></tr>
+
-
<tr><td>codonoptimize</td><td>Codon priority list to get the ranking information</td><td></td><td>string</td></tr>
+
-
<tr><td>optimizeallgene</td><td>Optimize all genes in inputgff</td><td></td><td>option</td></tr>
+
-
<tr><td>optimizegenelist</td><td>A list of genes to optimize, separated by commas</td><td></td><td>string,string,string,...</td></tr>
+
-
<tr><td>detail</td><td>Show the optimization sequence in new gff</td><td>none</td><td>option</td></tr>
+
-
<tr><td>help</td><td>Show help information</td><td></td><td></td></tr>
+
-
</table>
+
-
</p>
+
-
            <b><p>2.4.4 The format of output</p></b>
+
-
<p>The output files are standard GFF and FASTA format.<br/>
+
-
1. GFF file<br/>
+
-
2. FASTA file<br/>
+
-
3. Detailed information in STDOUT<br/>
+
-
<img src="https://static.igem.org/mediawiki/2013/a/a2/T2-6.png" />
+
-
</p>
+
-
            <h3>2.5 Repeat smash</h3>
+
-
<p>This plugin scans the CDS regions for tandem repeat bases. Synonymous substitution is applied to break long runs of tandem repeat bases and so reduce the difficulty of synthesis. If you run more than one plugin at the same time, this plugin runs last and then generates a new FASTA file for the sequence and a GFF file for the annotation.</p>
+
-
            <b><p>2.5.1 Internal operation</p></b>
+
-
<p>A regular expression is used to find runs of tandem repeat bases longer than the specified length (usually longer than 5 bp). Starting from the third base of the matched sequence, synonymous substitution is applied to break the run.
+
-
If the substitution succeeds and the remaining sequence is still longer than the cutoff, the plugin moves on 3 bases and repeats the process.
+
-
The sequence and annotation information will be recreated in FASTA and GFF format.</p>
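<p>The repeat search can be sketched with a back-reference in a regular expression. The Python code below is an illustration only and makes two assumptions that the text does not confirm: "tandem repeat bases" is read as a run of one repeated base, and a run is reported when it is at least as long as the cutoff. The function name is hypothetical, and the substitution step is not shown.</p>

```python
import re


def find_repeat_runs(seq, cutoff=5):
    """Return (0-based start, length, base) for runs of one base >= cutoff."""
    if cutoff < 1:
        raise ValueError("cutoff must be at least 1")
    # One base followed by at least (cutoff - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (cutoff - 1))
    return [(m.start(), m.end() - m.start(), m.group(1))
            for m in pattern.finditer(seq.upper())]
```

<p>Each reported run inside a CDS would then be broken by swapping one of its codons for a synonymous codon.</p>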
+
-
            <b><p>2.5.2 Example</p></b>
+
-
<p>perl NucleoMod.pl -inputfa NeoChr.fa -inputgff NeoChr.gff -outputgff new_annotation.gff -outputfa new_chr.fa -repeatsmash 5</p>
+
-
            <b><p>2.5.3 Parameters</p></b>
+
-
<p>
+
-
<table>
+
-
<tr><td>Parameter</td><td>Description</td><td>Default</td><td>Selectable range</td></tr>
+
-
<tr><td>inputfa</td><td>The NeoChr sequence file in FASTA format</td><td></td><td>string</td></tr>
+
-
<tr><td>inputgff</td><td>The NeoChr annotation file in GFF3 format</td><td></td><td>string</td></tr>
+
-
<tr><td>outputgff</td><td>Output of new chromosome annotation in GFF3 format</td><td></td><td>string</td></tr>
+
-
<tr><td>outputfa</td><td>Output of new chromosome sequence in FASTA format</td><td></td><td>string</td></tr>
+
-
<tr><td>verbose</td><td>Output the detailed information in STDOUT</td><td>none</td><td>option</td></tr>
+
-
<tr><td>repeatsmash</td><td>Tandem repeat bases longer than or equal to this cutoff will be smashed</td><td></td><td>int</td></tr>
+
-
<tr><td>detail</td><td>Show the repeat smash result in new gff</td><td>none</td><td>option</td></tr>
+
-
<tr><td>help</td><td>Show help information</td><td></td><td></td></tr>
+
-
</table>
+
-
</p>
+
-
            <b><p>2.5.4 The format of output</p></b>
+
-
<p>
+
-
The output files are standard GFF and FASTA format.<br/>
+
-
1. GFF file<br/>
+
-
2. FASTA file<br/>
+
-
3. Detailed information in STDOUT<br/>
+
-
<img src="https://static.igem.org/mediawiki/2013/8/83/T2-7.png" />
+
-
</p>
+
-
    </div>
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
+
-
        <div class="content-3">
+
-
<h1> SegmMan </h1>
+
-
<p>This module cuts the chromosome into pieces of different sizes and adds Gibson, Golden Gate or homologous adaptors to them so that they can be assembled into a whole chromosome experimentally.</p>
+
-
          <h2>Plugin Scripts</h2>
+
-
<br/>
+
-
          <h3>3-1. 01.whole2mega.pl</h3>
+
-
<p>This utility splits the whole chromosome (at least 90 kbp long) into segments of about 30 kbp and adds homologous overlaps and adaptors, so that these fragments can be integrated into a whole chromosome experimentally.</p>
+
-
          <b><p>Internal operation</p></b>
+
-
<p>First, this utility searches for the locations of the centromere and the ARSs (autonomously replicating sequences). The minimal distance between the centromere and an ARS should NOT be larger than a defined megachunk, which is about 30 kbp long. <br/>
+
-
Second, this utility cuts out the first 30 kbp sequence window containing the centromere and its adjacent ARS, and then adds two original markers and the left and right telomeres to this megachunk.<br/>
+
-
Third, this utility continues to cut further megachunks outward from the original one toward both ends. These megachunks are not independent: adjacent megachunks overlap by about 1 kbp. Moreover, each newly split window is given only one marker, used alternately, and only a left or a right telomere.<br/>
+
-
The output file is then processed by 02.globalREmarkup.pl<br/>
+
-
For more information about segmentation design, please refer to the page ASSEMBLY DESIGN PRINCIPLE .</p>
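<p>The windowing scheme can be sketched as follows. This Python code is a simplified illustration, not the logic of 01.whole2mega.pl: the function name and the <code>first_start</code> argument are hypothetical, coordinates are 0-based and half-open, and markers, telomeres and the search for the centromere and ARS are left out. Only the 30 kbp chunk length and the 1 kbp overlap come from the text.</p>

```python
def megachunk_windows(chrom_len, first_start, chunk=30000, overlap=1000):
    """Tile a chromosome with overlapping windows, growing from a first window.

    `first_start` is where the first window (the one assumed to hold the
    centromere and its adjacent ARS) begins.
    """
    if not 0 <= overlap < chunk:
        raise ValueError("overlap must be non-negative and smaller than chunk")
    windows = [(first_start, min(first_start + chunk, chrom_len))]

    # Extend toward the right end; each window starts `overlap` before
    # the previous window ends.
    end = windows[0][1]
    while end < chrom_len:
        start = end - overlap
        end = min(start + chunk, chrom_len)
        windows.append((start, end))

    # Extend toward the left end in the same way.
    start = first_start
    while start > 0:
        end = start + overlap
        start = max(0, end - chunk)
        windows.append((start, end))

    return sorted(windows)
```

<p>In this sketch the windows at the chromosome ends may be shorter than a full megachunk; how the real utility treats such remainders is not described here.</p>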
+
-
          <b><p>Example (command line)</p></b>
+
-
<p>perl 01.whole2mega.pl -gff sce_chrI.gff -fa sce_chr01.fa -ol 1000 -ck 30000 -m1 LEU2 -m2 URA3 -m3 HIS3 -m4 TRP1 -ot sce_chrI.mega</p>
+
-
          <b><p>Parameters</p></b>
+
-
<table><tbody>
+
-
<tr><td>Parameter</td><td>Description</td><td>Default</td><td>Option</td></tr>
+
-
<tr><td>gff</td><td>The GFF file of the chromosome to be parsed for restriction enzyme sites</td><td></td><td></td></tr>
+
-
<tr><td>fa</td><td>The FASTA file of the chromosome to be parsed for restriction enzyme sites
+
-
(the chromosome must be longer than 90 kbp)</td><td></td><td></td></tr>
+
-
<tr><td>ol</td><td>The length of overlap between megachunks</td><td>1000bp</td><td></td></tr>
+
-
<tr><td>ck</td><td>The length of megachunks</td><td>30kbp</td><td></td></tr>
+
-
<tr><td>m1</td><td>The first marker used alternately for selection</td><td>LEU2 (1797bp)</td><td>LEU2/URA3/HIS3/TRP1</td></tr>
+
-
<tr><td>m2</td><td>The second marker used alternately for selection
+
-
</td><td>URA3 (1112bp)</td><td>LEU2/URA3/HIS3/TRP1</td></tr>
+
-
<tr><td>m3</td><td>The first marker originally residing in the first 30k segment</td><td>HIS3 (1774bp)</td><td>LEU2/URA3/HIS3/TRP1</td></tr>
+
-
<tr><td>m4</td><td>The second marker originally residing in the first 30k segment</td><td>TRP1 (1467bp)</td><td>LEU2/URA3/HIS3/TRP1</td></tr>
+
-
<tr><td>ot</td><td>The output file </td><td>Prefix(fa filename)+ suffix(.mega)</td><td></td></tr>
+
-
</tbody></table>
+
-
          <b><p>The format of output:</p></b>
+
        <div class="one-third column">
-
<p>The output file is stored in /the path where you install GENOVO/Result/ 01.whole2mega.<br/>
+
-
In addition, the process state and result are printed to the screen.<br/>
+
<a href="https://2013.igem.org/Team:Shenzhen_BGIC_0101/Tutorial/videoturorial" target="_blank">
-
1. Screen output<br/>
+
<img src="https://static.igem.org/mediawiki/2013/c/c1/Vedior.png" />
-
2. 01.state <br/>
+
</a>
-
&nbsp;Stores the segmentation information<br/>
+
     
-
<table><tbody>
+
<h1>Video Tutorial</h1>
-
<tr><td>Megachunk_ID</td><td>Corresponding location in the designed chromosome</td></tr>
+
-
<tr><td>Part ID</td><td>Location in the segmentation</td></tr>
+
</div>
-
</tbody></table>
+
</section>
-
<img src="https://static.igem.org/mediawiki/2013/c/c1/T3-1.png" /><br/>
+
<br/><br/>
-
3. *.mega<br/>
+
</body>
-
&nbsp;Stores the FASTA sequences of the 30k segments<br/>
+
-
<img src="https://static.igem.org/mediawiki/2013/f/f0/T3-2.png" />
+
-
</p>
+
-
          <h3>3-2. 02.globalREmarkup.pl</h3>
+
-
<p>This utility parses the existing restriction enzyme sites in the chromosome.</p>
+
-
          <b><p>Internal operation</p></b>
+
-
<p>After the user defines the list of enzymes, this utility searches for existing restriction enzyme sites along both the plus and minus strands of the chromosome.<br/>
+
-
In addition, we tried to find all potential restriction enzyme sites, so that unusual restriction enzyme sites could be created to allow segmentation to proceed. Because this approach has low efficiency, we are still working on it.<br/>
+
-
The output file is then processed by 03.mega2chunk2mini.pl<br/>
+
-
For more information about segmentation design, please refer to the page ASSEMBLY DESIGN PRINCIPLE .
+
-
</p>
+
-
          <b><p>Example (command line)</p></b>
+
-
<p>perl 02.globalREmarkup.pl -sg 01.whole2mega/sce_chrI.mega -re standard_and_IIB -ct Standard.ct -ot sce_chrI.mega.parse</p>
+
-
          <b><p>Parameters</p></b>
+
-
<p>
+
-
<table><tbody>
+
-
<tr><td>Parameter</td><td>Description</td><td>Default</td><td>Option</td></tr>
+
-
<tr><td>sg</td><td>The FASTA file of the 30k segments, the output of 01.whole2mega.pl</td><td></td><td></td></tr>
+
-
<tr><td>ps</td><td>The markup file of the 30k segmentation, the output of 02.globalREmarkup.pl</td><td></td><td></td></tr>
+
-
<tr><td>re</td><td>The list of restriction enzyme sites. It is divided according to different criteria, such as type (IIP, IIA, IIB) and cost (standard, nonexpensive).</td><td>Standard_and_IIB</td><td>IIP/IIA/IIB/Standard/
+
-
Nonexpensive/
+
-
Standard_IIB
+
-
Nonexpensive_IIB</td></tr>
+
-
<tr><td>a2</td><td>2k to 10k assembly strategy (Gibson or Goldengate)</td><td>Gibson</td><td>Gibson/ Goldengate</td></tr>
+
-
<tr><td>a10</td><td>10k to 30k assembly strategy (Gibson or Goldengate)</td><td>Goldengate</td><td>Gibson/ Goldengate</td></tr>
+
-
<tr><td>ckmax2</td><td>The maximum length of minichunks</td><td>2200 bp</td><td></td></tr>
+
-
<tr><td>ckmin2</td><td>The minimum length of minichunks </td><td>1800 bp</td><td></td></tr>
+
-
<tr><td>cknum</td><td>The number of minichunks in a chunk</td><td>5</td><td></td></tr>
+
-
 
+
-
</tbody></table>
+
-
Codon table list:<br/>
+
-
1 The Standard Code<br/>
+
-
2 The Vertebrate Mitochondrial Code<br/>
+
-
3 The Yeast Mitochondrial Code<br/>
+
-
4 The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code<br/>
+
-
5 The Invertebrate Mitochondrial Code<br/>
+
-
6 The Ciliate, Dasycladacean and Hexamita Nuclear Code<br/>
+
-
7 The Echinoderm and Flatworm Mitochondrial Code<br/>
+
-
8 The Euplotid Nuclear Code<br/>
+
-
9 The Bacterial, Archaeal and Plant Plastid Code<br/>
+
-
10 The Alternative Yeast Nuclear Code<br/>
+
-
11 The Ascidian Mitochondrial Code<br/>
+
-
12 The Alternative Flatworm Mitochondrial Code<br/>
+
-
13 Blepharisma Nuclear Code<br/>
+
-
14 Chlorophycean Mitochondrial Code<br/>
+
-
15 Trematode Mitochondrial Code<br/>
+
-
16 Scenedesmus Obliquus Mitochondrial Code<br/>
+
-
17 Thraustochytrium Mitochondrial Code<br/>
+
-
18 Pterobranchia Mitochondrial Code<br/>
+
-
19 Candidate Division SR1 and Gracilibacteria Code<br/>
+
-
</p>
+
-
          <p>The format of output</p>
+
-
<p>The output file is stored in /the path where you installed GENOVO/Result/02.globalREmarkup.<br/>
+
-
In addition, the process state and result are printed to the screen.<br/>
+
-
1. Screen output<br/>
+
-
2. *.parse<br/>
+
-
Stores the existing enzyme recognition sites in the megachunks<br/>
+
-
Enzyme ID Start End Recognition site Real site<br/>
+
-
<img src="https://static.igem.org/mediawiki/2013/b/bf/T3-3.png" />
+
-
</p>
+
-
          <b><p>3-3. 03.chunk_30k_10k_2k.pl</p></b>
+
-
<p>This utility produces 2k minichunks with Gibson adaptors and 10k chunks with Goldengate adaptors.</p>
+
-
          <p>Internal operation</p>
+
-
<p>This utility segments each megachunk (the 30k segment passed with -sg, together with its markup file passed with -ps) into 2k minichunks with Gibson assembly adaptors, so that they can be assembled into 10k chunks.<br/>
+
-
First, the script searches locally for restriction enzyme sites that do not occur in the sequence, then decides the size of the minichunks according to the user's requirements, and adds the same Gibson adaptor to both sides of each minichunk.
+
-
Second, the script defines the start and end points of the chunks as the user requests and designs Goldengate assembly adaptors for the chunks.<br/>
+
-
The output file can be sent to a gene synthesis company after manual review and double-checking.<br/>
+
-
For more information about segmentation design, please refer to the page ASSEMBLY DESIGN PRINCIPLE.</p>
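<p>The size arithmetic behind cutting a chunk into overlapping minichunks can be sketched as below. This is a simplified illustration under stated assumptions, not the algorithm of the script: it only balances piece sizes between ckmin2 and ckmax2 with a fixed overlap (ol2), and ignores restriction sites, melting temperature, and loxPsym distance, which the real tool also considers. The function name split_with_overlap is hypothetical.</p>

```python
def split_with_overlap(length, ckmin=1800, ckmax=2200, overlap=40):
    """Split `length` bp into pieces of ckmin..ckmax bp sharing `overlap` bp.

    Returns 1-based inclusive (start, end) pairs. Uses the smallest number
    of pieces whose sizes all fall within the allowed range.
    """
    for n in range(1, length + 1):
        # n pieces with n-1 shared overlaps must sum to this many bases.
        total = length + (n - 1) * overlap
        base, extra = divmod(total, n)
        largest = base + (1 if extra else 0)
        if largest < ckmin:
            break  # adding more pieces only makes them smaller
        if base >= ckmin and largest <= ckmax:
            pieces = []
            start = 1
            for i in range(n):
                size = base + (1 if i < extra else 0)  # spread the remainder
                end = start + size - 1
                pieces.append((start, end))
                start = end - overlap + 1  # next piece re-reads the overlap
            return pieces
    raise ValueError("no split satisfies the size limits")


if __name__ == "__main__":
    for start, end in split_with_overlap(10000):
        print(start, end, end - start + 1)
```

<p>With the default limits, a 10,000 bp chunk yields five minichunks of 2,032 bp each, which agrees with the default of five minichunks per chunk (cknum).</p>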
+
-
          <b><p>Example (command line)</p></b>
+
-
<p>perl 03.mega2chunk2mini.pl -re standard_and_IIB -sg 01.whole2mega/sce_chr01_0.mega -ps 02.globalREmarkup/sce_chr01_0.parse  -ot 03.mega2chunk2mini</p>
+
-
          <b><p>Parameters</p></b>
+
-
<p>
+
-
<table><tbody>
+
-
<tr><td></td><td></td><td>default</td><td>Option</td></tr>
+
-
<tr><td>sg</td><td>The fasta file of the 30k segmentation, the output of 01.wh2mega.pl</td><td></td><td></td></tr>
+
-
<tr><td>ps</td><td>The markup file of the 30k segmentation, the output of 02.globalREmarkup.pl</td><td></td><td></td></tr>
+
-
<tr><td>re</td><td>The list of restriction enzyme sites. The lists are divided by different standards: type (IIP, IIA, IIB), cost (standard, nonexpensive), etc.</td><td>Standard_and_IIB</td><td>IIP/IIA/IIB/Standard/
+
-
Nonexpensive/
+
-
Standard_IIB
+
-
Nonexpensive_IIB</td></tr>
+
-
<tr><td>a2</td><td>2k to 10k assembly strategy (Gibson or Goldengate)</td><td>Gibson</td><td>Gibson/ Goldengate</td></tr>
+
-
<tr><td>a10</td><td>10k to 30k assembly strategy (Gibson or Goldengate)</td><td>Goldengate</td><td>Gibson/ Goldengate</td></tr>
+
-
<tr><td>ckmax2</td><td>The maximum length of minichunks</td><td>2200 bp</td><td></td></tr>
+
-
<tr><td>ckmin2</td><td>The minimum length of minichunks </td><td>1800 bp</td><td></td></tr>
+
-
<tr><td>cknum</td><td>The number of minichunks in a chunk</td><td>5</td><td></td></tr>
+
-
 
+
-
</tbody>
+
-
</table>
+
-
If parameter a2 is Gibson, then there are additional parameters:<br/>
+
-
<table><tbody>
+
-
<tr><td>ol2</td><td>The length of overlap</td><td>40 bp</td><td></td></tr>
+
-
<tr><td>tmax2</td><td>The maximum melting temperature of the overlap of minichunks</td><td>60℃</td><td></td></tr>
+
-
<tr><td>tmin2</td><td>The minimum melting temperature of the overlap of minichunks</td><td>56℃</td><td></td></tr>
+
-
<tr><td>fe2</td><td>The minimum free energy of the overlap of minichunks</td><td>-3</td><td></td></tr>
+
-
<tr><td>ex2</td><td>The type of exonuclease used for minichunks</td><td>T5</td><td>T5/T3</td></tr>
+
-
<tr><td>lo2</td><td>The minimum distance between minichunks overlap and loxpsym</td><td>40 bp</td><td></td></tr>
+
-
<tr><td>en2</td><td>The type of enzyme flanking minichunks</td><td>IIP</td><td></td></tr>
+
-
<tr><td>et2</td><td></td><td></td><td></td></tr>
+
-
<tr><td>ep2</td><td>The maximum unit price of enzyme used in minichunks digestion</td><td>0.5 $/unit</td><td></td></tr>
+
-
 
+
-
</tbody>
+
-
</table>
+
-
If parameter a10 is Goldengate, then there are additional parameters:<br/>
+
-
<table><tbody>
+
-
<tr><td>en10</td><td>The type of enzyme flanking chunks</td><td>IIB</td><td>IIA/IIB</td></tr>
+
-
<tr><td>et10</td><td>The temperature of enzyme used in chunks digestion</td><td>37℃</td><td></td></tr>
+
-
 
+
-
</tbody>
+
-
</table>
+
-
</p>
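<p>To illustrate how the tmin2 and tmax2 limits could act as a filter on candidate overlaps, here is a minimal sketch. It assumes the basic GC-content estimate Tm = 64.9 + 41 × (G + C − 16.4) / N, which is a common approximation for oligonucleotides longer than about 13 nt; the tutorial does not state which melting temperature method the script actually uses, so both the formula choice and the function names are assumptions.</p>

```python
def basic_tm(seq):
    """Estimate melting temperature (deg C) from GC content.

    Assumed formula (not confirmed for the tool):
    Tm = 64.9 + 41 * (G + C - 16.4) / N
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def overlap_ok(seq, tmin=56.0, tmax=60.0):
    """Return True if the estimated Tm lies within [tmin, tmax]."""
    return tmin <= basic_tm(seq) <= tmax


if __name__ == "__main__":
    candidate = "AT" * 15 + "GC" * 5  # 40 bp overlap with 10 G/C bases
    print(round(basic_tm(candidate), 2), overlap_ok(candidate))
```

<p>A candidate overlap that fails the check would be shifted or resized before the minichunk boundaries are fixed; how the real script resolves such cases is not described here.</p>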
+
-
          <b><p>The format of output</p></b>
+
-
<p>The output file is stored in /the path where you installed GENOVO/Result/03.mega2chunk2mini.<br/>
+
-
In addition, the process state and result are printed to the screen.<br/>
+
-
1. Screen output<br/>
+
-
2. *.2kstate<br/>
+
-
Stores the states of the minichunks.<br/>
+
-
<table><tr><td>Left IIP enzyme site</td><td>Right IIP enzyme site</td><td>Start</td><td>End</td><td>Size of minichunks</td><td>Melting temperature of overlap</td></tr>
+
-
</table>
+
-
<img src="https://static.igem.org/mediawiki/2013/b/bf/T3-3.png" /><br/>
+
-
3. *.10kstate<br/>
+
-
Stores the states of the chunks.<br/>
+
-
<table>
+
-
<tr><td>Left IIB enzyme site</td><td>Right IIB enzyme site</td><td>Start</td><td>End</td><td>Size of chunks</td></tr>
+
-
</table>
+
-
<img src="https://static.igem.org/mediawiki/2013/a/ad/T3-4.png" /><br/>
+
-
4. *.mini<br/>
+
-
Stores the fasta sequences of the designed minichunks.<br/>
+
-
<img src="https://static.igem.org/mediawiki/2013/0/0f/T3-5.png" /><br/>
+
-
</p>
+
-
    </div>
+
-
+
-
+
-
+
-
+
-
+
-
+
-
    <div class="content-4">
+
-
<h2>Others</h2>
+
-
+
-
<h3>Get in touch</h3>
+
-
+
-
    </div>
+
-
        </div>
+
-
</section>
+
-
        </div>
+
-
    </body>
+
</html>
</html>

Latest revision as of 21:39, 28 October 2013