Team:Shenzhen BGIC 0101/Tutorial/segmman
From 2013.igem.org
Latest revision as of 05:24, 28 October 2013
Tutorial
SegmMan
This module cuts a chromosome into pieces of different sizes and adds Gibson, Goldengate, or homologous adaptors to them, so that they can be assembled into the whole chromosome experimentally.
Plugin Scripts
3-1. 01.whole2mega.pl
This utility splits a whole chromosome (at least 90 kbp long) into segments of about 30 kbp and adds homologous overlaps and adaptors, so that the fragments can be reassembled into the whole chromosome experimentally.
Internal operation
First, this utility searches for the locations of the centromere and the ARSs (autonomously replicating sequences). The minimal distance between the centromere and an ARS should NOT be larger than a defined megachunk, which is about 30 kbp long.
Second, this utility cuts out the first 30k sequence window, which contains the centromere and its adjacent ARS, and then adds the two original markers and the left and right telomeres to this megachunk.
Third, this utility continues to cut further megachunks outward from the first one toward both ends. These megachunks are not independent: adjacent megachunks overlap by about 1 kbp. Each of these new windows is given only one marker, alternating between the two selection markers, and only a left or a right telomere.
The output file is then processed by 02.globalREmarkup.pl.
For more information about segmentation design, please refer to the page ASSEMBLY DESIGN PRINCIPLE.
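The windowing described in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of 01.whole2mega.pl: the function name `split_into_megachunks` and the argument `first_start` (the start of the window that contains the centromere and its adjacent ARS) are hypothetical, coordinates are 0-based and half-open, and markers and telomeres are not modelled.

```python
def split_into_megachunks(chrom_len, first_start, chunk=30000, overlap=1000):
    """Return (start, end) windows that cover the chromosome.

    Hypothetical helper: the first window starts at `first_start`; further
    windows are cut outward to both ends, and each one shares `overlap`
    bases with its neighbour. Coordinates are 0-based, half-open.
    """
    if chunk <= overlap:
        raise ValueError("chunk must be longer than overlap")
    if not 0 <= first_start < chrom_len:
        raise ValueError("first_start must lie inside the chromosome")

    first_end = min(first_start + chunk, chrom_len)
    windows = [(first_start, first_end)]

    # Walk toward the right end of the chromosome.
    end = first_end
    while end < chrom_len:
        start = end - overlap
        end = min(start + chunk, chrom_len)
        windows.append((start, end))

    # Walk toward the left end of the chromosome.
    start = first_start
    while start > 0:
        end = start + overlap
        start = max(end - chunk, 0)
        windows.insert(0, (start, end))

    return windows


if __name__ == "__main__":
    for start, end in split_into_megachunks(117000, 29000):
        print(start, end)
```

With a 117 kbp sequence and a first window starting at 29000, the sketch yields four 30 kbp windows, each sharing 1000 bp with its neighbour. When the length does not divide evenly, the outermost windows come out shorter; how the real script treats such ends is not stated here.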
Example (command line)
perl 01.whole2mega.pl -gff sce_chrI.gff -fa sce_chr01.fa -ol 1000 -ck 30000 -m1 LEU2 -m2 URA3 -m3 HIS3 -m4 TRP1 -ot sce_chrI.mega
Parameters
Parameter | Description | Default | Option
---|---|---|---
gff | The GFF file of the chromosome to be parsed for restriction enzyme sites | |
fa | The FASTA file of the chromosome to be parsed for restriction enzyme sites (the chromosome must be longer than 90 kbp) | |
ol | The length of the overlap between megachunks | 1000 bp |
ck | The length of megachunks | 30 kbp |
m1 | The first marker used alternately for selection | LEU2 (1797 bp) | LEU2/URA3/HIS3/TRP1
m2 | The second marker used alternately for selection | URA3 (1112 bp) | LEU2/URA3/HIS3/TRP1
m3 | The first marker originally residing in the first 30k segment | HIS3 (1774 bp) | LEU2/URA3/HIS3/TRP1
m4 | The second marker originally residing in the first 30k segment | TRP1 (1467 bp) | LEU2/URA3/HIS3/TRP1
ot | The output file | Prefix (fa filename) + suffix (.mega) |
The format of output:
The output files are stored in /the path where you installed GENOVO/Result/01.whole2mega.
In addition, the process state and result are printed to the screen.
1. Screen output
2. 01.state
Stores the segmentation information.
Megachunk_ID | Corresponding location in the designed chromosome
---|---
Part ID | Location in the segmentation
3. *.mega
Stores the FASTA sequences of the 30k segments.
3-2. 02.globalREmarkup.pl
This utility parses the existing restriction enzyme sites in the chromosome.
Internal operation
After the user defines the list of enzymes, this utility searches for existing restriction enzyme sites along the chromosome on both the plus strand and the minus strand.
We also tried to find all potential restriction enzyme sites, so that unusual restriction enzyme sites could be created to let segmentation proceed. Because this search is inefficient, we are still working on it.
The output file is then processed by 03.mega2chunk2mini.pl.
For more information about segmentation design, please refer to the page ASSEMBLY DESIGN PRINCIPLE.
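A search on both strands, as described above, can be sketched as follows. This is a minimal illustration and not the code of 02.globalREmarkup.pl: the function names are hypothetical, the two enzymes in the usage example (EcoRI, BsaI) are chosen only for demonstration, and degenerate bases in recognition sites are not handled. A hit on the minus strand is found by searching the plus strand for the reverse complement of the recognition site.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, enzymes):
    """Return (enzyme, start, end, strand) hits, sorted by start position.

    Hypothetical helper: `enzymes` maps an enzyme name to its recognition
    site. Coordinates are 1-based and inclusive on the plus strand.
    """
    seq = seq.upper()
    hits = []
    for name, site in enzymes.items():
        site = site.upper()
        patterns = [("+", site)]
        rc = reverse_complement(site)
        if rc != site:
            # Non-palindromic sites must also be searched on the minus strand.
            patterns.append(("-", rc))
        for strand, pattern in patterns:
            pos = seq.find(pattern)
            while pos != -1:
                hits.append((name, pos + 1, pos + len(pattern), strand))
                pos = seq.find(pattern, pos + 1)
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    demo = "AAGAATTCTTGGTCTCAAGAGACCTT"
    for hit in find_sites(demo, {"EcoRI": "GAATTC", "BsaI": "GGTCTC"}):
        print(*hit)
```

Palindromic sites such as GAATTC are reported once, on the plus strand, because the same positions match on both strands.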
Example (command line)
perl 02.globalREmarkup.pl -sg 01.whole2mega/sce_chrI.mega -re standard_and_IIB -ct Standard.ct -ot sce_chrI.mega.parse
Parameters
Parameter | Description | Default | Option
---|---|---|---
sg | The FASTA file of the 30k segmentation, the output of 01.whole2mega.pl | |
ps | The markup file of the 30k segmentation, the output of 02.globalREmarkup.pl | |
re | The restriction enzyme site list. It is divided by different criteria: type (IIP, IIA, IIB), cost (standard, nonexpensive), etc. | Standard_and_IIB | IIP/IIA/IIB/Standard/Nonexpensive/Standard_IIB/Nonexpensive_IIB
a2 | 2k to 10k assembly strategy (Gibson or Goldengate) | Gibson | Gibson/Goldengate
a10 | 10k to 30k assembly strategy (Gibson or Goldengate) | Goldengate | Gibson/Goldengate
ckmax2 | The maximum length of minichunks | 2200 bp |
ckmin2 | The minimum length of minichunks | 1800 bp |
cknum | The number of minichunks in a chunk | 5 |
Codon table list:
1. The Standard Code
2. The Vertebrate Mitochondrial Code
3. The Yeast Mitochondrial Code
4. The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code
5. The Invertebrate Mitochondrial Code
6. The Ciliate, Dasycladacean and Hexamita Nuclear Code
7. The Echinoderm and Flatworm Mitochondrial Code
8. The Euplotid Nuclear Code
9. The Bacterial, Archaeal and Plant Plastid Code
10. The Alternative Yeast Nuclear Code
11. The Ascidian Mitochondrial Code
12. The Alternative Flatworm Mitochondrial Code
13. Blepharisma Nuclear Code
14. Chlorophycean Mitochondrial Code
15. Trematode Mitochondrial Code
16. Scenedesmus Obliquus Mitochondrial Code
17. Thraustochytrium Mitochondrial Code
18. Pterobranchia Mitochondrial Code
19. Candidate Division SR1 and Gracilibacteria Code
The format of output
The output files are stored in /the path where you installed GENOVO/Result/02.globalREmarkup.
In addition, the process state and result are printed to the screen.
1. Screen output
2. *.parse
Stores the existing enzyme recognition sites in the megachunks.
Enzyme ID | Start | End | Recognition site | Real site
---|---|---|---|---
3-3. 03.chunk_30k_10k_2k.pl
This utility produces 2k minichunks with Gibson adaptors and 10k chunks with Goldengate adaptors.
Internal operation
This utility segments the megachunk produced by 01.whole2mega.pl into 2k minichunks with Gibson assembly adaptors, so that they can be assembled into 10k chunks.
First, the script searches locally for restriction enzyme sites that do not occur in the sequence, then decides the size of the minichunks according to the user's requirements, and adds two identical Gibson adaptors, one to each side of every minichunk.
Second, the script defines the start and end points of the chunks as requested by the user and designs Goldengate assembly adaptors for the chunks.
The output file can be sent to a gene synthesis company after manual review and double-checking.
For more information about segmentation design, please refer to the page ASSEMBLY DESIGN PRINCIPLE.
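The size constraint on minichunks can be illustrated with a small Python sketch. It is not the algorithm of 03.mega2chunk2mini.pl, which also considers restriction sites and overlap melting temperature. It only shows how a chunk can be divided into `cknum` overlapping pieces whose lengths stay within `ckmin2` and `ckmax2`. The function name is hypothetical, the defaults mirror the documented parameter defaults, and coordinates are 0-based and half-open.

```python
def split_into_minichunks(chunk_len, cknum=5, ckmin=1800, ckmax=2200, overlap=40):
    """Return (start, end) minichunk coordinates inside one chunk.

    Hypothetical helper: pieces are made as equal as possible, and adjacent
    pieces share `overlap` bases. Raises ValueError if the resulting sizes
    fall outside [ckmin, ckmax].
    """
    # Every junction between two pieces is counted twice, once per piece.
    total = chunk_len + (cknum - 1) * overlap
    base, extra = divmod(total, cknum)
    sizes = [base + 1 if i < extra else base for i in range(cknum)]
    if min(sizes) < ckmin or max(sizes) > ckmax:
        raise ValueError("minichunk sizes fall outside the allowed range")

    pieces = []
    start = 0
    for size in sizes:
        pieces.append((start, start + size))
        start += size - overlap  # step back so the next piece overlaps
    return pieces


if __name__ == "__main__":
    for start, end in split_into_minichunks(10000):
        print(start, end, end - start)
```

For a 10000 bp chunk with the default settings, each of the five pieces is 2032 bp long and the last piece ends exactly at position 10000.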
Example (command line)
perl 03.mega2chunk2mini.pl -re standard_and_IIB -sg 01.whole2mega/sce_chr01_0.mega -ps 02.globalREmarkup/sce_chr01_0.parse -ot 03.mega2chunk2mini
Parameters
Parameter | Description | Default | Option
---|---|---|---
sg | The FASTA file of the 30k segmentation, the output of 01.whole2mega.pl | |
ps | The markup file of the 30k segmentation, the output of 02.globalREmarkup.pl | |
re | The restriction enzyme site list. It is divided by different criteria: type (IIP, IIA, IIB), cost (standard, nonexpensive), etc. | Standard_and_IIB | IIP/IIA/IIB/Standard/Nonexpensive/Standard_IIB/Nonexpensive_IIB
a2 | 2k to 10k assembly strategy (Gibson or Goldengate) | Gibson | Gibson/Goldengate
a10 | 10k to 30k assembly strategy (Gibson or Goldengate) | Goldengate | Gibson/Goldengate
ckmax2 | The maximum length of minichunks | 2200 bp |
ckmin2 | The minimum length of minichunks | 1800 bp |
cknum | The number of minichunks in a chunk | 5 |

If parameter a2 is Gibson, then there are additional parameters:

Parameter | Description | Default | Option
---|---|---|---
ol2 | The length of the overlap | 40 bp |
tmax2 | The maximum melting temperature of the overlap of minichunks | 60℃ |
tmin2 | The minimum melting temperature of the overlap of minichunks | 56℃ |
fe2 | The minimum free energy of the overlap of minichunks | -3 |
ex2 | The type of exonuclease used for minichunks | T5 | T5/T3
lo2 | The minimum distance between the minichunk overlap and loxPsym | 40 bp |
en2 | The type of enzyme flanking minichunks | IIP |
et2 | | |
ep2 | The maximum unit price of the enzyme used in minichunk digestion | 0.5 $/unit |

If parameter a10 is Goldengate, then there are additional parameters:

Parameter | Description | Default | Option
---|---|---|---
en10 | The type of enzyme flanking chunks | IIB | IIA/IIB
et10 | The temperature of the enzyme used in chunk digestion | 37℃ |
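To make the role of tmin2 and tmax2 concrete, the sketch below estimates the melting temperature of an overlap and checks it against the window. The method used by the script itself is not documented here, so the formula is an assumption: a common GC-content approximation for oligonucleotides longer than 13 nt, Tm = 64.9 + 41 × (G + C − 16.4) / N. The function names are hypothetical, and the default bounds mirror the documented defaults.

```python
def overlap_tm(seq):
    """Estimate Tm (in ℃) from GC content.

    Assumed approximation, not necessarily what the script uses:
    Tm = 64.9 + 41 * (G + C - 16.4) / N
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def tm_in_window(seq, tmin=56.0, tmax=60.0):
    """Return True if the estimated Tm lies within [tmin, tmax]."""
    return tmin <= overlap_tm(seq) <= tmax


if __name__ == "__main__":
    overlap = "ATATGCAT" * 5  # 40 nt, 10 of them G or C
    print(round(overlap_tm(overlap), 2), tm_in_window(overlap))
```

A nearest-neighbour thermodynamic model would give more accurate values. The GC-based estimate is used here only because it is short enough to read at a glance.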
The format of output
The output files are stored in /the path where you installed GENOVO/Result/03.mega2chunk2mini.
In addition, the process state and result are printed to the screen.
1. Screen output
2. *.2kstate
Stores the state of the minichunks.
Left IIP enzyme site | Right IIP enzyme site | Start | End | Size of minichunks | Melting temperature of overlap
---|---|---|---|---|---
3. *.10kstate
Stores the state of the chunks.
Left IIB enzyme site | Right IIB enzyme site | Start | End | Size of chunks
---|---|---|---|---
4. *.mini
Stores the FASTA sequences of the designed minichunks.