Team:Shenzhen BGIC 0101/Tutorial/segmman

From 2013.igem.org






Tutorial



SegmMan

This module cuts a chromosome into pieces of different sizes and adds Gibson, Golden Gate, or homologous-recombination adaptors to them, so that the pieces can be assembled back into the whole chromosome experimentally.

Plugin Scripts


3-1. 01.whole2mega.pl

This utility splits a whole chromosome (at least 90 kbp long) into segments of about 30 kbp (megachunks) and adds homologous overlaps and adaptors, so that these fragments can be reassembled into the whole chromosome experimentally.

Internal operation

First, this utility locates the centromere and the ARSs (autonomously replicating sequences). The minimal distance between the centromere and an ARS must NOT exceed the defined megachunk length (about 30 kbp).
Second, it cuts out the first 30 kbp window containing the centromere and its adjacent ARS, and adds the two original markers (m3 and m4) and the left and right telomeres to this megachunk.
Third, it continues cutting further megachunks outward from the first one toward both chromosome ends. These megachunks are not independent: adjacent megachunks overlap by about 1 kbp. Each of these new windows receives only one marker (m1 or m2, alternately) and only a left or a right telomere.
The output file is then processed by 02.globalREmarkup.pl.
For more information about segmentation design, please refer to the ASSEMBLY DESIGN PRINCIPLE page.
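The three steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not GENOVO's actual code: coordinates are 0-based and half-open, the core window is centred on the centromere and its nearest ARS, and each outward megachunk reuses the last `ol` bp of its neighbour as the homologous overlap. The helper names (`choose_core_start`, `plan_megachunks`, `assign_markers`) are hypothetical, and `assign_markers` assumes the m1/m2 alternation is applied separately on each side of the core window.

```python
from typing import List, Tuple


def choose_core_start(cen: int, ars: List[int], chrom_len: int,
                      ck: int = 30000) -> int:
    """Start of a ck-long window holding the centromere and its nearest ARS."""
    nearest = min(ars, key=lambda a: abs(a - cen))
    lo, hi = min(cen, nearest), max(cen, nearest)
    if hi - lo >= ck:
        raise ValueError("centromere-ARS distance exceeds the megachunk length")
    # Centre the centromere/ARS pair in the window, then clamp to the chromosome.
    start = lo - (ck - (hi - lo)) // 2
    return max(0, min(start, chrom_len - ck))


def plan_megachunks(chrom_len: int, core_start: int, ck: int = 30000,
                    ol: int = 1000) -> List[Tuple[int, int]]:
    """Core window first, then windows toward the right end, then the left end."""
    if ck <= ol:
        raise ValueError("megachunk length must exceed the overlap")
    core = (core_start, min(core_start + ck, chrom_len))
    windows = [core]
    start, end = core
    while end < chrom_len:             # walk toward the right telomere
        start = end - ol               # last ol bp become the homologous overlap
        end = min(start + ck, chrom_len)
        windows.append((start, end))
    start, end = core
    while start > 0:                   # walk toward the left telomere
        end = start + ol
        start = max(end - ck, 0)
        windows.append((start, end))
    return windows


def assign_markers(n_outward: int, m1: str = "LEU2", m2: str = "URA3") -> List[str]:
    """Alternate the two selection markers over successive outward megachunks."""
    return [m1 if i % 2 == 0 else m2 for i in range(n_outward)]


if __name__ == "__main__":
    # Hypothetical 100 kbp chromosome with the centromere at 50 kbp.
    core = choose_core_start(cen=50000, ars=[20000, 55000, 80000], chrom_len=100000)
    print(core, plan_megachunks(100000, core))
```

For this hypothetical chromosome the core window is (37500, 67500), followed by (66500, 96500) and (95500, 100000) on the right and (8500, 38500) and (0, 9500) on the left; each neighbouring pair shares exactly 1000 bp.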

Example (command line)

perl 01.whole2mega.pl -gff sce_chrI.gff -fa sce_chr01.fa -ol 1000 -ck 30000 -m1 LEU2 -m2 URA3 -m3 HIS3 -m4 TRP1 -ot sce_chrI.mega

Parameters

Parameter | Description | Default | Options
gff | The GFF file of the chromosome to be processed | |
fa | The FASTA file of the chromosome to be processed (the chromosome must be longer than 90 kbp) | |
ol | The length of the overlap between megachunks | 1000 bp |
ck | The length of megachunks | 30 kbp |
m1 | The first selection marker, used alternately | LEU2 (1797 bp) | LEU2/URA3/HIS3/TRP1
m2 | The second selection marker, used alternately | URA3 (1112 bp) | LEU2/URA3/HIS3/TRP1
m3 | The first marker originally residing in the first 30 kbp segment | HIS3 (1774 bp) | LEU2/URA3/HIS3/TRP1
m4 | The second marker originally residing in the first 30 kbp segment | TRP1 (1467 bp) | LEU2/URA3/HIS3/TRP1
ot | The output file | Prefix (FASTA filename) + suffix (.mega) |

Output format

The output files are stored in <GENOVO installation path>/Result/01.whole2mega/.
In addition, the process state and results are printed to the screen.
1. Screen output
2. 01.state: stores the segmentation information:
Megachunk_ID | Corresponding location in the designed chromosome
Part ID | Location in the segmentation
3. *.mega: stores the FASTA sequences of the 30 kbp segments.

3-2. 02.globalREmarkup.pl

This utility parses the existing restriction enzyme sites in the chromosome.

Internal operation

After the user defines the list of enzymes, this utility searches for existing restriction enzyme sites on both the plus and minus strands of the chromosome.
We also tried to identify all potential restriction enzyme sites, so that some unusual sites could be created to enable segmentation. Because this approach was inefficient, it is still under development.
The output file is then processed by 03.mega2chunk2mini.pl.
For more information about segmentation design, please refer to the ASSEMBLY DESIGN PRINCIPLE page.
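The two-strand search described above can be sketched in Python. This is an illustrative sketch, not the script's implementation: recognition sites are assumed to use IUPAC degenerate base codes, coordinates are assumed to be 1-based and inclusive, and "Real site" is assumed to mean the bases actually matched in the sequence. The function names and the output tuple layout are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# IUPAC degenerate bases -> regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
         "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# (enzyme, start, end, recognition site, real site, strand)
Hit = Tuple[str, int, int, str, str, str]


def revcomp(site: str) -> str:
    """Reverse complement that also handles IUPAC degenerate codes."""
    return site.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, enzymes: Dict[str, str]) -> List[Hit]:
    """Scan both strands of seq for each enzyme's recognition site."""
    seq = seq.upper()
    hits: List[Hit] = []
    for name, site in enzymes.items():
        site = site.upper()
        for strand, motif in (("+", site), ("-", revcomp(site))):
            if strand == "-" and motif == site:
                continue  # palindromic site: the plus-strand scan already found it
            # Zero-width lookahead so overlapping occurrences are all reported.
            pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
            for m in pattern.finditer(seq):
                start = m.start() + 1
                # Minus-strand hits report the plus-strand bases that matched.
                hits.append((name, start, start + len(motif) - 1, site, m.group(1), strand))
    return sorted(hits, key=lambda h: (h[1], h[0]))
```

Scanning the reverse complement of the site against the plus strand is equivalent to scanning the minus strand directly, and skipping palindromic sites avoids reporting every EcoRI-like site twice.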

Example (command line)

perl 02.globalREmarkup.pl -sg 01.whole2mega/sce_chrI.mega -re standard_and_IIB -ct Standard.ct -ot sce_chrI.mega.parse

Parameters

Parameter | Description | Default | Options
sg | The FASTA file of the 30 kbp segments (the output of 01.whole2mega.pl) | |
ps | The markup file of the 30 kbp segments (the output of 02.globalREmarkup.pl) | |
re | The restriction enzyme list, grouped by standard, type (IIP, IIA, IIB), cost (standard, nonexpensive), etc. | Standard_and_IIB | IIP/IIA/IIB/Standard/Nonexpensive/Standard_IIB/Nonexpensive_IIB
a2 | Assembly strategy from 2 kbp to 10 kbp (Gibson or Goldengate) | Gibson | Gibson/Goldengate
a10 | Assembly strategy from 10 kbp to 30 kbp (Gibson or Goldengate) | Goldengate | Gibson/Goldengate
ckmax2 | The maximum length of minichunks | 2200 bp |
ckmin2 | The minimum length of minichunks | 1800 bp |
cknum | The number of minichunks in a chunk | 5 |


Codon table list:
1. The Standard Code
2. The Vertebrate Mitochondrial Code
3. The Yeast Mitochondrial Code
4. The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code
5. The Invertebrate Mitochondrial Code
6. The Ciliate, Dasycladacean and Hexamita Nuclear Code
7. The Echinoderm and Flatworm Mitochondrial Code
8. The Euplotid Nuclear Code
9. The Bacterial, Archaeal and Plant Plastid Code
10. The Alternative Yeast Nuclear Code
11. The Ascidian Mitochondrial Code
12. The Alternative Flatworm Mitochondrial Code
13. Blepharisma Nuclear Code
14. Chlorophycean Mitochondrial Code
15. Trematode Mitochondrial Code
16. Scenedesmus Obliquus Mitochondrial Code
17. Thraustochytrium Mitochondrial Code
18. Pterobranchia Mitochondrial Code
19. Candidate Division SR1 and Gracilibacteria Code

Output format

The output files are stored in <GENOVO installation path>/Result/02.globalREmarkup/.
In addition, the process state and results are printed to the screen.
1. Screen output
2. *.parse: stores the existing enzyme recognition sites in the megachunks, with the columns:
Enzyme ID | Start | End | Recognition site | Real site

3-3. 03.chunk_30k_10k_2k.pl

This utility produces 2 kbp minichunks with Gibson adaptors and 10 kbp chunks with Golden Gate adaptors.

Internal operation

This utility segments the megachunks produced by 01.whole2mega.pl into 2 kbp minichunks with Gibson assembly adaptors, so that they can be assembled into 10 kbp chunks.
First, the script searches locally for restriction enzymes whose sites are absent from the sequence, then determines the minichunk sizes according to the user's requirements and adds two identical Gibson adaptors, one to each side of every minichunk. Second, it defines the start and end points of the chunks as the user specifies and designs Golden Gate assembly adaptors for them.
After manual review and double-checking, the output file can be sent to a gene synthesis company.
For more information about segmentation design, please refer to the ASSEMBLY DESIGN PRINCIPLE page.
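The size bookkeeping for minichunks and chunks can be sketched in Python as below. This is a simplified sketch, not the script's algorithm: it ignores restriction-site constraints on where boundaries may fall and simply spaces minichunk starts evenly, so that every minichunk lies within ckmin2..ckmax2 and neighbours share exactly ol2 bp. The function names are hypothetical; the parameter names mirror ckmin2, ckmax2, ol2 and cknum from the table in this section.

```python
import math
from typing import List, Tuple

Span = Tuple[int, int]  # 0-based, half-open (start, end)


def split_minichunks(length: int, ckmin2: int = 1800, ckmax2: int = 2200,
                     ol2: int = 40) -> List[Span]:
    """Split [0, length) into near-equal minichunks; neighbours share ol2 bp."""
    # Each minichunk contributes at most (ckmax2 - ol2) bp of new sequence.
    n = max(1, math.ceil((length - ol2) / (ckmax2 - ol2)))
    step = (length - ol2) / n
    minis = [(round(i * step), round((i + 1) * step) + ol2) for i in range(n)]
    for start, end in minis:
        if not ckmin2 <= end - start <= ckmax2:
            raise ValueError(f"minichunk {start}-{end} violates the size limits")
    return minis


def group_into_chunks(minis: List[Span], cknum: int = 5) -> List[Tuple[int, int, List[Span]]]:
    """Group consecutive minichunks, cknum at a time, into (start, end, members)."""
    groups = [minis[i:i + cknum] for i in range(0, len(minis), cknum)]
    return [(g[0][0], g[-1][1], g) for g in groups]
```

With the defaults, a 10 kbp stretch splits into five 2032 bp minichunks, (0, 2032), (1992, 4024), (3984, 6016), (5976, 8008) and (7968, 10000), which form a single chunk spanning (0, 10000). A full 30 kbp megachunk gives 14 minichunks, so with cknum = 5 the last chunk holds only four; the real tool may resolve this differently.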

Example (command line)

perl 03.mega2chunk2mini.pl -re standard_and_IIB -sg 01.whole2mega/sce_chr01_0.mega -ps 02.globalREmarkup/sce_chr01_0.parse -ot 03.mega2chunk2mini

Parameters

Parameter | Description | Default | Options
sg | The FASTA file of the 30 kbp segments (the output of 01.whole2mega.pl) | |
ps | The markup file of the 30 kbp segments (the output of 02.globalREmarkup.pl) | |
re | The restriction enzyme list, grouped by standard, type (IIP, IIA, IIB), cost (standard, nonexpensive), etc. | Standard_and_IIB | IIP/IIA/IIB/Standard/Nonexpensive/Standard_IIB/Nonexpensive_IIB
a2 | Assembly strategy from 2 kbp to 10 kbp (Gibson or Goldengate) | Gibson | Gibson/Goldengate
a10 | Assembly strategy from 10 kbp to 30 kbp (Gibson or Goldengate) | Goldengate | Gibson/Goldengate
ckmax2 | The maximum length of minichunks | 2200 bp |
ckmin2 | The minimum length of minichunks | 1800 bp |
cknum | The number of minichunks in a chunk | 5 |
If parameter a2 is Gibson, the following additional parameters apply:
ol2 | The length of the overlap | 40 bp |
tmax2 | The maximum melting temperature of the minichunk overlap | 60 °C |
tmin2 | The minimum melting temperature of the minichunk overlap | 56 °C |
fe2 | The minimum free energy of the minichunk overlap | -3 |
ex2 | The type of exonuclease used for minichunks | T5 | T5/T3
lo2 | The minimum distance between the minichunk overlap and loxPsym | 40 bp |
en2 | The type of enzyme flanking minichunks | IIP |
et2 | | |
ep2 | The maximum unit price of the enzyme used in minichunk digestion | 0.5 $/unit |
If parameter a10 is Goldengate, the following additional parameters apply:
en10 | The type of enzyme flanking chunks | IIB | IIA/IIB
et10 | The temperature of the enzyme used in chunk digestion | 37 °C |
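The local search for absent enzyme sites, filtered by the en2 and ep2 parameters above, can be sketched in Python. This is a minimal sketch under assumptions: the `Enzyme` record, its fields, and the prices in the example are hypothetical placeholders (the tool's own enzyme list format is not shown on this page), and sites are limited to plain A/C/G/T without degenerate bases.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Enzyme:
    """Hypothetical enzyme record used only for this illustration."""
    name: str
    site: str       # recognition site, plain A/C/G/T only in this sketch
    enz_type: str   # e.g. "IIP", "IIA", "IIB"
    price: float    # $ per unit (placeholder values)


def revcomp(s: str) -> str:
    return s.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def flanking_candidates(seq: str, enzymes: List[Enzyme], en2: str = "IIP",
                        ep2: float = 0.5) -> List[str]:
    """Enzymes of type en2, priced at most ep2, with no site on either strand."""
    seq = seq.upper()
    return [e.name for e in enzymes
            if e.enz_type == en2 and e.price <= ep2
            and e.site.upper() not in seq and revcomp(e.site) not in seq]


if __name__ == "__main__":
    enzymes = [Enzyme("EcoRI", "GAATTC", "IIP", 0.1),
               Enzyme("BamHI", "GGATCC", "IIP", 0.1),
               Enzyme("NotI", "GCGGCCGC", "IIP", 0.9)]
    print(flanking_candidates("AAGAATTCAA", enzymes))
```

Here EcoRI is rejected because its site occurs in the sequence and NotI because its placeholder price exceeds ep2, leaving BamHI as the only candidate for flanking this minichunk.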

Output format

The output files are stored in <GENOVO installation path>/Result/03.mega2chunk2mini/.
In addition, the process state and results are printed to the screen.
1. Screen output
2. *.2kstate: stores the minichunk states, with the columns:
Left IIP enzyme site | Right IIP enzyme site | Start | End | Size of minichunks | Melting temperature of overlap
3. *.10kstate: stores the chunk states, with the columns:
Left IIB enzyme site | Right IIB enzyme site | Start | End | Size of chunks
4. *.mini: stores the FASTA sequences of the designed minichunks.