Team:Shenzhen BGIC 0101/Tutorial/nucleomod

From 2013.igem.org

Latest revision as of 05:25, 28 October 2013






Tutorial



NucleoMod

NucleoMod modifies coding sequences (CDS) through synonymous mutations. It has five applications. First, it designs CRISPR sites on NeoChr so that the wild-type genes can be silenced. Second, it erases specific enzyme sites selected by the user. Third, it creates an enzyme site in a selected region of a specific gene. Fourth, it optimizes codon usage to increase the expression level. Finally, it breaks up tandem repeat bases to reduce synthesis difficulty.

Plugins

This module contains 5 plugins: CRISPR design, erase enzyme site, create enzyme site, codon optimization, repeat smash. All plugins are included in the main program.

2.1 CRISPR design

This plugin designs CRISPR sites for NeoChr genes so that the wild-type genes can be silenced. It uses blast+ to ensure that each CRISPR site is unique. If more than one plugin is used at the same time, this plugin runs first and passes its data to the next plugin. Otherwise, it generates a new FASTA file for the sequence and a GFF file for the annotation.

2.1.1 Internal operation

First, the plugin extracts the sequence and the annotation from the NeoChr FASTA file and GFF3 file, respectively. A regular expression is applied to find the basic 23 bp structure of a CRISPR site: a leading ‘G’, followed by 20 arbitrary bases, followed by ‘GG’. All matching sequences and their loci are recorded in an array.
Second, blast+ is used to check whether the 12 bp sequence (bases 9 to 20) is unique in the wild-type genome. Only unique sites are retained.
Third, synonymous substitution is applied to change one base between the 9th and 20th bases of the CRISPR structure. The result is recorded in the GFF file as an element of the gene. If -verbose is set, the number of designed sites is reported on STDOUT.
Finally, if this plugin is the last module, the sequence and annotation are written out in FASTA and GFF format.
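
The pattern search in the first step can be sketched in a few lines of Python. This is only an illustration, not the plugin's own code (NucleoMod is a Perl program): the function name find_crispr_sites is hypothetical, only the given strand is scanned, and the lookahead used to report overlapping candidates is an assumption about the intended behaviour.

```python
import re

# G, then 20 arbitrary bases, then GG (23 bp in total).
# The lookahead makes overlapping candidates visible as well (assumption).
CRISPR_SITE = re.compile(r"(?=(G[ACGT]{20}GG))")

def find_crispr_sites(seq):
    """Return (start, site, seed) tuples; start is 1-based."""
    sites = []
    for match in CRISPR_SITE.finditer(seq.upper()):
        site = match.group(1)
        seed = site[8:20]  # bases 9 to 20, the 12 bp checked with blast+
        sites.append((match.start() + 1, site, seed))
    return sites

example = "AA" + "G" + "ACGT" * 5 + "GG" + "TT"
for start, site, seed in find_crispr_sites(example):
    print(start, site, seed)
```

Each 12 bp seed returned here is what would then be passed to blast+ for the uniqueness check described in the second step.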

2.1.2 Example

There are two input forms for executing the plugin.
To run the CRISPR design plugin only:
perl NucleoMod.pl -inputfa NeoChr.fa -inputgff NeoChr.gff -outputgff new_annotation.gff -outputfa new_chr.fa -crisprnum 2 -database saccharomyces_cerevisiae_chr.fa

2.1.3 Parameters

Parameter | Description | Default | Selectable range
inputfa | The NeoChr sequence file in FASTA format | | string
inputgff | The NeoChr annotation file in GFF3 format | | string
outputgff | Output of new chromosome annotation in GFF3 format | | string
outputfa | Output of new chromosome sequence in FASTA format | | string
verbose | Output the detailed information to STDOUT | none | option
crisprnum | Number of CRISPR sites to design per gene | | Int (>0)
database | The sequence of the reference genome, used as the blast+ database | | string
help | Show help information | |

2.1.4 The format of output file

The output files are standard GFF and FASTA format files.
1. GFF file
2. FASTA file
3. Detailed information in STDOUT

2.2 Erase enzyme site

Given a list of restriction enzymes, this plugin erases their restriction sites in every gene. If more than one plugin is used at the same time, this plugin runs after CRISPR design and passes its data to the next plugin. Otherwise, it generates a new FASTA file for the sequence and a GFF file for the annotation.

2.2.1 Internal operation

The enzyme information is extracted. (If the -biobrickstandard parameter is set, the EcoRI, XbaI, SpeI, PstI and NotI sites are removed as well.) Each recognition site is converted to a regular expression and searched for in the CDS regions. Once a restriction site is matched, synonymous substitution is applied to try to erase it. When the substitution is finished, the plugin resumes the search one base after the last matched position. If this plugin is the last module, the sequence and annotation are written out in FASTA and GFF format.
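
Converting a recognition site into a regular expression might look like the following Python sketch. It is not taken from NucleoMod. It assumes that the "/" in a site such as G/GATCC only marks the cut position and can be dropped, that degenerate bases follow the IUPAC nucleotide codes, and that only the given strand is searched. The names site_to_regex and find_sites are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def site_to_regex(site):
    """Turn a recognition site such as 'G/GATCC' into a compiled pattern."""
    bases = site.upper().replace("/", "")  # drop the cut-position marker
    body = "".join(IUPAC[base] for base in bases)
    return re.compile("(?=(" + body + "))")  # lookahead keeps overlapping hits

def find_sites(seq, site):
    """Return the 1-based start of every occurrence of the site in seq."""
    return [m.start() + 1 for m in site_to_regex(site).finditer(seq.upper())]

print(find_sites("AAGGATCCTTGGATCC", "G/GATCC"))
```

A site that is not its own reverse complement would also have to be searched on the opposite strand, which this sketch leaves out.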

2.2.2 Example

perl NucleoMod.pl -inputfa NeoChr.fa -inputgff NeoChr.gff -outputgff new_annotation.gff -outputfa new_chr.fa -biobrickstandard [-delenzymelist enzyme.list]

Format of enzyme.list:
Company enzyme_name enzyme_site …
  E.g. NEB BamHI G/GATCC
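
Reading such a file could be sketched as below. This assumes whitespace-separated columns and uses only the first three of them, since the remaining columns are not specified here. The function name parse_enzyme_list is hypothetical.

```python
def parse_enzyme_list(lines):
    """Map enzyme_name -> {'company': ..., 'site': ...} from enzyme.list lines."""
    enzymes = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:  # skip blank or incomplete lines
            continue
        company, name, site = fields[:3]  # later columns are ignored (unspecified)
        enzymes[name] = {"company": company, "site": site}
    return enzymes

print(parse_enzyme_list(["NEB      BamHI        G/GATCC", ""]))
```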

2.2.3 Parameters

Parameter | Description | Default | Selectable range
inputfa | The NeoChr sequence file in FASTA format | | string
inputgff | The NeoChr annotation file in GFF3 format | | string
outputgff | Output of new chromosome annotation in GFF3 format | | string
outputfa | Output of new chromosome sequence in FASTA format | | string
verbose | Output the detailed information to STDOUT | none | option
biobrickstandard | Erase the BioBrick standard enzyme sites | none | option
delenzymelist | File listing the enzymes whose sites are to be deleted | | string
detail | Show the erased enzyme sites in the new GFF | none | option
help | Show help information | |

2.2.4 The format of output

The output files are standard GFF and FASTA format.
1. GFF file
2. FASTA file
3. Detailed information in STDOUT

2.3 Create enzyme site

Given a list of restriction enzymes, this plugin can create a new enzyme site in a specific region of a selected gene. If more than one plugin is used at the same time, this plugin runs after erase enzyme site and passes its data to the next plugin. Otherwise, it generates a new FASTA file for the sequence and a GFF file for the annotation.

2.3.1 Internal operation

First, the enzyme site information is extracted. A search tree is constructed according to the three reading frames and converted to a regular expression. The plugin then searches the selected regions and changes the matching sequence into the enzyme site by synonymous substitution. If this plugin is the last module, the sequence and annotation are written out in FASTA and GFF format.
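
The idea of introducing a site through synonymous changes only can be illustrated with a brute-force Python sketch. Note that this swaps in a simpler technique than the search tree and regular expression described above: it enumerates the synonymous codon combinations for every possible position of the site. It assumes the standard genetic code, a CDS that starts in frame at its first base, and a site made of plain A/C/G/T bases. The function name create_site is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}

SYNONYMS = {}
for codon, amino in CODON_TABLE.items():
    SYNONYMS.setdefault(amino, []).append(codon)

def create_site(cds, site, start, end):
    """Try to place `site` inside cds[start-1:end] (1-based, inclusive).

    Only synonymous codon changes are allowed. Returns the modified CDS,
    or None if no placement is possible.
    """
    cds = cds.upper()
    for pos in range(start - 1, end - len(site) + 1):
        first = pos // 3 * 3                        # start of first codon touched
        last = (pos + len(site) - 1) // 3 * 3 + 3   # end of last codon touched
        if last > len(cds):
            break
        codons = [cds[i:i + 3] for i in range(first, last, 3)]
        choices = [SYNONYMS[CODON_TABLE[c]] for c in codons]
        for combo in product(*choices):
            window = "".join(combo)
            offset = pos - first
            if window[offset:offset + len(site)] == site:
                return cds[:first] + window + cds[last:]
    return None

print(create_site("ATGGGTTCTTAA", "GGATCC", 1, 12))
```

In the example, the codons GGT (Gly) and TCT (Ser) are replaced by GGA and TCC, which yields the site GGATCC without changing the encoded protein.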

2.3.2 Example

perl NucleoMod.pl -inputfa NeoChr.fa -inputgff NeoChr.gff -outputgff new_annotation.gff -outputfa new_chr.fa -addenzymelist enzyme.list -addenzymeconfig gene_id,start_pos,end_pos,enzyme_name

2.3.3 Parameters

Parameter | Description | Default | Selectable range
inputfa | The NeoChr sequence file in FASTA format | | string
inputgff | The NeoChr annotation file in GFF3 format | | string
outputgff | Output of new chromosome annotation in GFF3 format | | string
outputfa | Output of new chromosome sequence in FASTA format | | string
verbose | Output the detailed information to STDOUT | none | option
addenzymelist | The enzyme file from which enzyme site information is read | | string
addenzymeconfig | An array of strings specifying the enzyme and regions | | string,int,int,string
help | Show help information | |

2.3.4 The format of output

The output files are standard GFF and FASTA format.
1. GFF file
2. FASTA file
3. Detailed information in STDOUT

2.4 Codon optimization

Given a codon priority list, this plugin optimizes codons to increase the expression of selected genes. If more than one plugin is used at the same time, this plugin runs after create enzyme site and passes its data to the next plugin. Otherwise, it generates a new FASTA file for the sequence and a GFF file for the annotation.

2.4.1 Internal operation

Codons encoding the same amino acid are separated into three ranks: best, normal and worst. Every codon of each selected gene is checked to see whether it is in the best rank. Codons in the normal or worst rank are changed to a best-rank codon by synonymous substitution. If this plugin is the last module, the sequence and annotation are written out in FASTA and GFF format.
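
A minimal Python sketch of this replacement step follows. The format of CodonPriority.txt is not described here, so the sketch simply assumes that the best-rank codon per amino acid is already available as a dictionary. The entries in HYPOTHETICAL_BEST are placeholders for illustration, not real yeast codon preferences, and optimize_codons is a hypothetical name.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}

def optimize_codons(cds, best_codon):
    """Replace every codon by the best-rank codon of its amino acid.

    Amino acids without an entry in best_codon keep their original codon.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    out = []
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        out.append(best_codon.get(CODON_TABLE[codon], codon))
    out.append(cds[usable:])  # keep any trailing partial codon unchanged
    return "".join(out)

# Placeholder ranking for illustration only (assumption).
HYPOTHETICAL_BEST = {"G": "GGT", "S": "TCT"}
print(optimize_codons("ATGGGATCCTAA", HYPOTHETICAL_BEST))
```

The substitution stays synonymous only if every codon in the dictionary really encodes the amino acid it is listed under, so a real implementation would want to validate the priority list first.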

2.4.2 Example

perl NucleoMod.pl -inputfa NeoChr.fa -inputgff NeoChr.gff -outputgff new_annotation.gff -outputfa new_chr.fa -codonoptimize CodonPriority.txt -optimizeallgene [-optimizegenelist gene1,gene2,gene3 ]

2.4.3 Parameters

Parameter | Description | Default | Selectable range
inputfa | The NeoChr sequence file in FASTA format | | string
inputgff | The NeoChr annotation file in GFF3 format | | string
outputgff | Output of new chromosome annotation in GFF3 format | | string
outputfa | Output of new chromosome sequence in FASTA format | | string
verbose | Output the detailed information to STDOUT | none | option
codonoptimize | Codon priority list that provides the ranking information | | string
optimizeallgene | Optimize all genes in inputgff | | option
optimizegenelist | A list of genes to optimize, separated by commas | | string,string,string,...
detail | Show the optimized sequence in the new GFF | none | option
help | Show help information | |

2.4.4 The format of output

The output files are standard GFF and FASTA format.
1. GFF file
2. FASTA file
3. Detailed information in STDOUT

2.5 Repeat smash

This plugin scans the CDS regions for tandem repeat bases. Synonymous substitution is applied to break long runs of tandem repeat bases and so reduce synthesis difficulty. If more than one plugin is used at the same time, this plugin runs last and then generates a new FASTA file for the sequence and a GFF file for the annotation.

2.5.1 Internal operation

A regular expression is used to find tandem repeat bases longer than the specified length (usually longer than 5 bp). Starting from the third base of the matched sequence, synonymous substitution is applied to break the tandem repeat. If the substitution succeeds and the remaining sequence is still longer than the cutoff, the plugin moves on to the next 3 bases and repeats the process. The sequence and annotation are then written out in FASTA and GFF format.
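
The following Python sketch shows one way such a search and repair could work. It is an assumption-laden illustration rather than the plugin's algorithm: "tandem repeat bases" is read as a run of one repeated base, a run counts when its length is greater than or equal to the cutoff (as the repeatsmash parameter description states), and the repair is a simple greedy pass over the codons instead of the step-by-three procedure described above. The names find_runs and smash_repeats are hypothetical.

```python
import re
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}

SYNONYMS = {}
for codon, amino in CODON_TABLE.items():
    SYNONYMS.setdefault(amino, []).append(codon)

def find_runs(seq, cutoff):
    """Return (1-based start, run) for every single-base run of length >= cutoff."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (cutoff - 1))
    return [(m.start() + 1, m.group(0)) for m in pattern.finditer(seq)]

def run_bases(seq, cutoff):
    """Total number of bases that sit inside over-long runs."""
    return sum(len(run) for _, run in find_runs(seq, cutoff))

def smash_repeats(cds, cutoff):
    """Greedy pass: swap a codon for a synonym whenever that shrinks the runs."""
    seq = cds.upper()
    for first in range(0, len(seq) - len(seq) % 3, 3):
        best = seq
        for alt in SYNONYMS[CODON_TABLE[seq[first:first + 3]]]:
            candidate = seq[:first] + alt + seq[first + 3:]
            if run_bases(candidate, cutoff) < run_bases(best, cutoff):
                best = candidate
        seq = best
    return seq

print(find_runs("ATGAAAAAAGGTTAA", 5))
print(smash_repeats("ATGAAAAAAGGTTAA", 5))
```

In the example, the first AAA (Lys) codon is changed to AAG, which splits the six-base run of A without altering the protein.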

2.5.2 Example

perl NucleoMod.pl -inputfa NeoChr.fa -inputgff NeoChr.gff -outputgff new_annotation.gff -outputfa new_chr.fa -repeatsmash 5

2.5.3 Parameters

Parameter | Description | Default | Selectable range
inputfa | The NeoChr sequence file in FASTA format | | string
inputgff | The NeoChr annotation file in GFF3 format | | string
outputgff | Output of new chromosome annotation in GFF3 format | | string
outputfa | Output of new chromosome sequence in FASTA format | | string
verbose | Output the detailed information to STDOUT | none | option
repeatsmash | Tandem repeat bases with a length greater than or equal to this cutoff will be smashed | | int
detail | Show the repeat smash result in the new GFF | none | option
help | Show help information | |

2.5.4 The format of output

The output files are standard GFF and FASTA format.
1. GFF file
2. FASTA file
3. Detailed information in STDOUT