Team:Shenzhen BGIC 0101/Tutorial/olsdesigner

From 2013.igem.org

(Difference between revisions)
(Created page with "{{:Team:Shenzhen_BGIC_0101/Templates/Header}} <html> <body> <b><h1>Tutorial</h1></b> <br/> <hr style="color:#7380AE; height:2px;" /> <h1>OLS Designer</h1> <h2>Description of thi...")
Line 7: Line 7:
<h1>OLS Designer</h1>
<h1>OLS Designer</h1>
-
<h2>Description of this tool :<h2>
+
<h2>Description of this tool :</h2>
<p>This tool deconstructs a given set of genes to generate the sequences of oligonucleotides for synthesis on a chip.The purpose of this script is computationally designing the DNA chip for amplifying the oligonucleotide subpools, and assembling 500- to 800-bp constructs.</p>
<p>This tool deconstructs a given set of genes to generate the sequences of oligonucleotides for synthesis on a chip.The purpose of this script is computationally designing the DNA chip for amplifying the oligonucleotide subpools, and assembling 500- to 800-bp constructs.</p>
Line 17: Line 17:
<p>Each construct to be built must be split up into short overlapping fragments. Each frag-ment must, in turn, be flanked by the assembly- and plate-specific subpool priming sequences, as well as restriction sites for removing the priming sequences.</p>
<p>Each construct to be built must be split up into short overlapping fragments. Each frag-ment must, in turn, be flanked by the assembly- and plate-specific subpool priming sequences, as well as restriction sites for removing the priming sequences.</p>
-
<br/>
 
<p style="text-align:center;"><img src="https://static.igem.org/mediawiki/2013/2/2f/Ols1.png" alt="data" style="width: 750px" /></p><<br/>
<p style="text-align:center;"><img src="https://static.igem.org/mediawiki/2013/2/2f/Ols1.png" alt="data" style="width: 750px" /></p><<br/>
-
<br/>
 
<p>The gene-coding regions of the oligonucleotides within each assembly subpool partially overlap, allowing them to be assembled into the full-length construct using a high-fidelity polymerase. The gene-coding region is flanked by BtsI cut sites that permit enzymatic removal of the subpool-specific priming sites. The gene-coding region is also flanked by a pair of assembly-specific priming sites, which are shared by all the oligonucleotides within a particular assembly subpool. The assembly-specific priming sites are, in turn, flanked by a pair of plate-specific priming sites common to all the oligonucleotides within a particular plate-specific subpool. So there have two module in the script: one is ols pool generation script. </p>
<p>The gene-coding regions of the oligonucleotides within each assembly subpool partially overlap, allowing them to be assembled into the full-length construct using a high-fidelity polymerase. The gene-coding region is flanked by BtsI cut sites that permit enzymatic removal of the subpool-specific priming sites. The gene-coding region is also flanked by a pair of assembly-specific priming sites, which are shared by all the oligonucleotides within a particular assembly subpool. The assembly-specific priming sites are, in turn, flanked by a pair of plate-specific priming sites common to all the oligonucleotides within a particular plate-specific subpool. So there have two module in the script: one is ols pool generation script. </p>
<br/>
<br/>
Line 31: Line 29:
<p>There is one configuration entry per file. Make sure that you set:<br/>
<p>There is one configuration entry per file. Make sure that you set:<br/>
initialPlateNum(current entry) = initialPlateNum(previous entry) + (number of plates used up by sequences in previous entry)</p>
initialPlateNum(current entry) = initialPlateNum(previous entry) + (number of plates used up by sequences in previous entry)</p>
-
<br/>
 
<h3>c) cd to script directory in terminal and type “python gasp.py <configuration file >”</h3>
<h3>c) cd to script directory in terminal and type “python gasp.py <configuration file >”</h3>
Line 52: Line 49:
<br/>
<br/>
-
<h3>Example:<h3>
+
<h3>Example:</h3>
<p>Usage : perl Get_configFile.pl <option> <output file name><br/>
<p>Usage : perl Get_configFile.pl <option> <output file name><br/>
Perl Get_configFile.pl  -a 1 -b True -c 20 -d 2 -f 11 -i 4 -l input-seqs/yeast_chr1_3_16.all_bb.fasta -n 4 -u 100 -o test<br/>
Perl Get_configFile.pl  -a 1 -b True -c 20 -d 2 -f 11 -i 4 -l input-seqs/yeast_chr1_3_16.all_bb.fasta -n 4 -u 100 -o test<br/>
Line 60: Line 57:
<table border="1">
<table border="1">
<tr> <th></th> <th></th> <th>default</th></tr>
<tr> <th></th> <th></th> <th>default</th></tr>
-
<tr><th>a <th>plate position index of primer set when using fixed primer set</th> <th>0</th></tr>
+
<tr><th> a <th>plate position index of primer set when using fixed primer set</th> <th>0</th></tr>
-
<tr><th>b <th>true if all seqs in file get same primer set</th> <th>false</th></tr>
+
<tr><th> b <th>true if all seqs in file get same primer set</th> <th>false</th></tr>
-
<tr><th>c <th>Leeway in junction position that is allowed in searching for acceptable overlaps</th> <th>10</tr>
+
<tr><th> c <th>Leeway in junction position that is allowed in searching for acceptable overlaps</th> <th>10</tr>
-
<tr><th>d <th>plate # of primer set when using fixed primer set</th> <th>1</th></tr>
+
<tr><th> d <th>plate # of primer set when using fixed primer set</th> <th>1</th></tr>
-
<tr><th>f <th>length of oligo can be + or - this #</th> <th>10</th></tr>
+
<tr><th> f <th>length of oligo can be + or - this #</th> <th>10</th></tr>
-
<tr><th>h <th>Print help information</th> <th></th> </tr>
+
<tr><th> h <th>Print help information</th> <th></th> </tr>
-
<tr><th>i <th>96-well plates holding assembled constructs are numbered starting with this number (never set to 1)</th> <th>3</th></tr>
+
<tr><th> i <th>96-well plates holding assembled constructs are numbered starting with this number (never set to 1)</th> <th>3</th></tr>
-
<tr><th>l <th>the location of a fasta file containing seqs for the desired constructs</th> <th></th> </tr>
+
<tr><th> l <th>the location of a fasta file containing seqs for the desired constructs</th> <th></th> </tr>
-
<tr><th>n <th>Reject a decomposition if it contains a self-dimer with free energy in kcal/mol below this number</th> <th>3</th></tr>
+
<tr><th> n <th>Reject a decomposition if it contains a self-dimer with free energy in kcal/mol below this number</th> <th>3</th></tr>
-
<tr><th>u <th>oligoSizeMax</th> <th>200</th></tr>
+
<tr><th> u <th>oligoSizeMax</th> <th>200</th></tr>
-
<tr><th>o <th>The name of project</th> <th></th> </tr>
+
<tr><th> o <th>The name of project</th> <th></th> </tr>
</table>
</table>
Line 117: Line 114:
<p><pre>
<p><pre>
{
{
-
     "initialPlateNum": 4, # 96-well plates holding assembled constructs are numbered starting with this number  
+
     "initialPlateNum":  
-
     "buildSequencesFile": "input-seqs/yeast_chr1_3_16.all_bb.fasta", # the location of a fasta file containing seqs for the desired constructs
+
4, # 96-well plates holding assembled constructs are numbered starting with this number  
-
     "primerOutputFile": "output-files/primer-output.txt", # the location of a txt file which will contain primer sequence outputs
+
     "buildSequencesFile":  
-
     "oligoOutputFile": "output-files/oligo-output.fasta", # the location of a fasta file which will contain oligo sequence outputs for the OLS pool
+
"input-seqs/yeast_chr1_3_16.all_bb.fasta", # the location of a fasta file containing seqs for the desired constructs
-
     "RESpacing": [ # list of offsets of enzyme cut sites from the end of the corresponding enzyme recognition sites, with enzymes ordered as in REVector
+
     "primerOutputFile":  
-
        5,  
+
"output-files/primer-output.txt", # the location of a txt file which will contain primer sequence outputs
-
        2,  
+
     "oligoOutputFile":  
-
        5,  
+
"output-files/oligo-output.fasta", # the location of a fasta file which will contain oligo sequence outputs for the OLS pool
-
        4
+
     "RESpacing":  
-
    ],  
+
[ # list of offsets of enzyme cut sites from the end of the corresponding enzyme recognition sites, with enzymes ordered as in REVector
-
     "REVector": [ # list of restriction enzymes that gasp will search through if SearchForRe is set to "True"
+
5,  
-
        "BsaI",  
+
2,  
-
        "BtsI",  
+
5,  
-
        "BsmBI",  
+
4
-
        "BspQI"
+
],  
-
    ],  
+
     "REVector":  
-
     "SearchForRE": "True", # set this to "False" if you wish to control the exact restriction site which is used, specified by REToUse. Examples: usually set to "False" for DNA origami             scaffold sequences, usually set to "True" for synthetic genes for in-vivo use.
+
[ # list of restriction enzymes that gasp will search through if SearchForRe is set to "True"
-
     "REToUse": "", # Restriction enzyme to use if SearchForRE is set to "False", e.g., "BsaI". Leave blank if SearchForRE is set to "True". Usually set to "BsaI" for Shih lab DNA
+
"BsaI",  
-
                    origami scaffold sequences.
+
"BtsI",  
-
     "forwardPrimersLibraryFile": "primer-library/forward_finalprimers.fasta", # Fixed library of ~ 3000 orthogonal forward primers
+
"BsmBI",  
-
     "reversePrimersLibraryFile": "primer-library/reverse_finalprimers.fasta", # Fixed library of ~ 3000 orthogonal reverse primers
+
"BspQI"
-
     "avgoverlapsize": 20, # Average length of overlap region between adjacent oligos
+
],  
-
     "deltaGThresholdForOverlaps": -3, # Overlap is rejected if its hybridization free energy in kcal/mol is below this number
+
     "SearchForRE":  
-
     "selfDimersThreshold": 3, # Reject a decomposition if it contains a self-dimer with free energy in kcal/mol below this number
+
"True", # set this to "False" if you wish to control the exact restriction site which is used, specified by REToUse.
-
     "insertionSizeToKillRESite": 2, # Leave this set to 2 for now
+
        Examples: usually set to "False" for DNA origami scaffold sequences, usually set to "True" for synthetic genes for in-vivo use.
-
     "lengthleeway": 10, # Leeway in oligo length that is allowed in searching for acceptable overlaps
+
     "REToUse":  
-
     "overlaptemps": [ # Overlap regions must have a melting temperature in this range
+
"", # Restriction enzyme to use if SearchForRE is set to "False", e.g., "BsaI". Leave blank if SearchForRE is set to "True".  
-
        55,  
+
Usually set to "BsaI" for Shih lab DNA origami scaffold sequences.
-
        65
+
     "forwardPrimersLibraryFile":  
-
    ],  
+
"primer-library/forward_finalprimers.fasta", # Fixed library of ~ 3000 orthogonal forward primers
-
     "positionleeway": 10 # Leeway in junction position that is allowed in searching for acceptable overlaps
+
     "reversePrimersLibraryFile":  
 +
"primer-library/reverse_finalprimers.fasta", # Fixed library of ~ 3000 orthogonal reverse primers
 +
     "avgoverlapsize":  
 +
20, # Average length of overlap region between adjacent oligos
 +
     "deltaGThresholdForOverlaps":  
 +
-3, # Overlap is rejected if its hybridization free energy in kcal/mol is below this number
 +
     "selfDimersThreshold":  
 +
3, # Reject a decomposition if it contains a self-dimer with free energy in kcal/mol below this number
 +
     "insertionSizeToKillRESite":  
 +
2, # Leave this set to 2 for now
 +
     "lengthleeway":  
 +
10, # Leeway in oligo length that is allowed in searching for acceptable overlaps
 +
     "overlaptemps":  
 +
[ # Overlap regions must have a melting temperature in this range
 +
55,  
 +
65
 +
],  
 +
     "positionleeway":  
 +
10 # Leeway in junction position that is allowed in searching for acceptable overlaps
}
}
</pre></p>
</pre></p>

Revision as of 06:26, 28 October 2013






Tutorial



OLS Designer

Description of this tool :

This tool deconstructs a given set of genes to generate the sequences of oligonucleotides for synthesis on a chip. The purpose of this script is to computationally design the DNA chip for amplifying the oligonucleotide subpools and assembling 500- to 800-bp constructs.


The major stages of the synthesis pipeline are computational design, chip synthesis, serial PCRs that isolate the oligonucleotides necessary to build each construct, and assembly of the constructs. The key principle is that well-designed primers can amplify a desired subset of oligonucleotides and thereby dilute the undesired DNA to the point where it does not interfere with the downstream gene assembly reaction. Most of the scripts were developed by Nikolai Eroshenko et al. (2009) of the Harvard School of Engineering and Applied Sciences, Cambridge, Massachusetts. We have automated these design steps with Biopython scripts (Cock et al., 2009).


Internal operation:

Each construct to be built must be split up into short overlapping fragments. Each fragment must, in turn, be flanked by the assembly- and plate-specific subpool priming sequences, as well as restriction sites for removing the priming sequences.
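The splitting step can be illustrated with a minimal Python sketch. It assumes a fixed fragment length and a fixed overlap; the real scripts move the junction positions around to satisfy melting-temperature and free-energy constraints, so this only shows the basic idea. The function name and the example sequence are hypothetical.

```python
def split_with_overlap(seq, fragment_len, overlap):
    """Split seq into fragments of at most fragment_len bases,
    where each fragment shares `overlap` bases with the next one.
    Simplification: fixed sizes, no thermodynamic checks."""
    if not 0 <= overlap < fragment_len:
        raise ValueError("overlap must be non-negative and smaller than fragment_len")
    step = fragment_len - overlap
    fragments = []
    start = 0
    while True:
        fragments.append(seq[start:start + fragment_len])
        if start + fragment_len >= len(seq):
            break  # the last fragment reaches the end of the construct
        start += step
    return fragments


# Made-up sequence, for illustration only.
construct = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCC"
for fragment in split_with_overlap(construct, fragment_len=16, overlap=4):
    print(fragment)
```

Each printed fragment ends with the same bases that start the next one, which is what lets the polymerase fuse neighbouring oligos during assembly.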


The gene-coding regions of the oligonucleotides within each assembly subpool partially overlap, allowing them to be assembled into the full-length construct using a high-fidelity polymerase. The gene-coding region is flanked by BtsI cut sites that permit enzymatic removal of the subpool-specific priming sites. The gene-coding region is also flanked by a pair of assembly-specific priming sites, which are shared by all the oligonucleotides within a particular assembly subpool. The assembly-specific priming sites are, in turn, flanked by a pair of plate-specific priming sites common to all the oligonucleotides within a particular plate-specific subpool. The script therefore has two modules: one is the OLS pool generation script, and the other is the primer-design script.


Basic information is available in the following publication and its supplemental materials: Sriram Kosuri, Nikolai Eroshenko, Emily LeProust, Michael Super, Jeffrey Way, Jin Billy Li & George Church. Scalable gene synthesis by selective amplification of DNA pools from high-fidelity microchips. Nature Biotechnology (2010) 28:1295. doi:10.1038/nbt.1716
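The nested layout of one chip oligo can be sketched in Python as follows. This is an illustration of the ordering described above, not the scripts' actual code. All primer sequences are placeholders, every argument is assumed to be written 5'→3' as it appears on the synthesized strand, and the BtsI recognition sequence is assumed to be GCAGTG (check the enzyme supplier's documentation). The spacing between recognition site and cut site is ignored here.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def build_chip_oligo(coding, re_site, assembly_fwd, assembly_rev, plate_fwd, plate_rev):
    """Concatenate the parts of one oligo from the outside in:
    plate primer, assembly primer, restriction site, coding region,
    then the mirror image on the other side."""
    return (
        plate_fwd
        + assembly_fwd
        + re_site
        + coding
        + reverse_complement(re_site)  # the site faces inward on both sides
        + assembly_rev
        + plate_rev
    )


# Placeholder sequences, for illustration only.
oligo = build_chip_oligo(
    coding="ATGGCTAGCAAAGGAGAAG",
    re_site="GCAGTG",          # assumed BtsI recognition sequence
    assembly_fwd="CCTTGGAACC",
    assembly_rev="GGTTAACCGG",
    plate_fwd="TTGACCTTGA",
    plate_rev="ATCCGGATAC",
)
print(oligo)
```

Amplifying with the plate-specific primers first and the assembly-specific primers second recovers one assembly subpool, and the restriction digest then removes everything outside the coding region.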


To run the script separately :

a) Place input files in the input-seqs directory

b) Edit the configuration file plate-based-assembly-from-ols-pool-config

There is one configuration entry per file. Make sure that you set:
initialPlateNum(current entry) = initialPlateNum(previous entry) + (number of plates used up by sequences in previous entry)

c) cd to the script directory in a terminal and type “python gasp.py <configuration file>”

d) Check the outputs, which take the form of:

oligo files (the oligo sequences generated for each input file),
primer files (the primer list for each input file),
report files (a listing, build sequence by build sequence, of the primer sets to use for each input file)
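The initialPlateNum rule in step b) can be sketched in Python as follows. The sketch assumes that each construct occupies one well of a 96-well plate, so an entry with n constructs uses ceil(n / 96) plates; this is an assumption, and the report files are the authoritative source for the number of plates actually used. The function names are hypothetical.

```python
import math

WELLS_PER_PLATE = 96  # assumption: one assembled construct per well


def plates_used(num_constructs, wells_per_plate=WELLS_PER_PLATE):
    """Number of plates needed to hold num_constructs assemblies."""
    return math.ceil(num_constructs / wells_per_plate)


def assign_initial_plate_nums(constructs_per_entry, first_plate):
    """Return the initialPlateNum for each configuration entry, in order.
    Each entry starts where the previous entry's plates end."""
    if first_plate < 2:
        raise ValueError("initialPlateNum must never be 1 (plate 1 is reserved)")
    plate_nums = []
    next_plate = first_plate
    for num_constructs in constructs_per_entry:
        plate_nums.append(next_plate)
        next_plate += plates_used(num_constructs)
    return plate_nums


# Three input files with 100, 30 and 200 constructs, starting at plate 4.
print(assign_initial_plate_nums([100, 30, 200], first_plate=4))  # [4, 6, 7]
```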



Workflow for gene synthesis from high-fidelity DNA microchips :


Shown here are the major steps and approximate timings of the entire gene (1G) synthesis process. The branch point reflects the choice of whether USER/DpnII processing (left branch after oligo synthesis) or type IIS enzymatic processing (right branch) is used for removing the amplification sites. The process outlines the final form of the optimized protocols. The times given in parentheses are estimates that account for the time involved in both setting up and running the reactions.


Script: Get_configFile.pl

This script helps the user create the configuration file: it takes its parameters from the UI and then runs automatically.


Example:

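Usage: perl Get_configFile.pl <option> <output file name>
perl Get_configFile.pl -a 1 -b True -c 20 -d 2 -f 11 -i 4 -l input-seqs/yeast_chr1_3_16.all_bb.fasta -n 4 -u 100 -o test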


Parameters:

Option | Description | Default
-a | plate position index of primer set when using fixed primer set | 0
-b | true if all seqs in file get same primer set | false
-c | Leeway in junction position that is allowed in searching for acceptable overlaps | 10
-d | plate # of primer set when using fixed primer set | 1
-f | length of oligo can be + or - this # | 10
-h | Print help information |
-i | 96-well plates holding assembled constructs are numbered starting with this number (never set to 1) | 3
-l | the location of a fasta file containing seqs for the desired constructs |
-n | Reject a decomposition if it contains a self-dimer with free energy in kcal/mol below this number | 3
-u | oligoSizeMax | 200
-o | The name of project |
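The relationship between these options and the configuration file can be sketched in Python as follows. The mapping from option letters to configuration keys is inferred from matching descriptions in this tutorial, not taken from Get_configFile.pl itself, and the function name is hypothetical. Options with no obviously matching key (-a, -b, -d, -o) are left out of the sketch.

```python
import json

# Inferred mapping: option letter -> configuration key (an assumption).
OPTION_TO_KEY = {
    "c": "positionleeway",
    "f": "lengthleeway",
    "i": "initialPlateNum",
    "l": "buildSequencesFile",
    "n": "selfDimersThreshold",
    "u": "oligoSizeMax",
}

# Defaults as listed in the parameter table.
DEFAULTS = {"c": 10, "f": 10, "i": 3, "n": 3, "u": 200}


def options_to_config(options):
    """Merge user options over the defaults and rename them to config keys."""
    merged = dict(DEFAULTS)
    merged.update(options)
    if "l" not in merged:
        raise ValueError("option -l (input fasta file) is required")
    if merged["i"] == 1:
        raise ValueError("option -i must never be set to 1")
    return {OPTION_TO_KEY[opt]: value
            for opt, value in merged.items() if opt in OPTION_TO_KEY}


# Values taken from the usage example above.
config = options_to_config({
    "c": 20, "f": 11, "i": 4, "n": 4, "u": 100,
    "l": "input-seqs/yeast_chr1_3_16.all_bb.fasta",
})
print(json.dumps(config, indent=4))
```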

UI design


In our project, we want to help users design a new genome with the three modules we provide, and then use this script to design the oligonucleotides and the priming sequences. So once users have designed the genome they need, the next step is to choose this script in our software’s UI.


Dependencies

UNAFold and Biopython are required to run the Python scripts in this package.


Please read the following installation notes before running the script.

Prerequisites: UNAFold and Biopython.


1. UNAFold:
http://dinamelt.rit.albany.edu/download.php
2. Biopython:
http://biopython.org/DIST/docs/install/Installation.html


Help page:

The description of software:

GASP: Gene Assembly by Subpool PCR. This set of scripts designs oligonucleotides that can be used to synthesize genes from high-complexity DNA pools.


Parameter description

The parameters, which are described in detail below, may have to be further adjusted if the DNA will be processed using methods that deviate from the workflow described here.

initialPlateNum: 96-well plates of assemblies will be numbered sequentially starting at this value. This should never be set to 1, as plate #1 is reserved for construction primers.

avgoverlapsize: Each construct will be broken up into assembly oligos that will be fused using a polymerase. The fusion reaction requires priming through overlaps between neighboring oligos. This setting specifies the mean length of the overlap region.

deltaGThresholdForOverlaps: Rejects any overlaps with a secondary structure that has a hybridization free energy less than the value specified (in units of kcal/mol).

selfDimersThreshold: Rejects assembly oligos that have any self-dimerization configurations with a hybridization free energy less than the value specified (arbitrary units).

lengthleeway: Sets allowable variation in the length of the overlap regions.

positionleeway: Sets allowable variation in the assembly oligo junction position. Increasing this value results in a less constrained search space, but increases the computation time and increases variation in synthesized oligonucleotides’ lengths.

oligoSizeMax: The maximum oligo size that will be designed. This includes the full-length oligos that include the coding region, the restriction enzyme processing site, and the assembly-specific and plate-specific priming sites. This value should typically be constrained by the commercial synthesis platform used. Note that many of the oligos will be shorter than this maximum value.

seqsToAvoidInOverlapRegion: Specifies positions to be avoided in the overlap between neighboring assembly oligos. This should usually be left blank, but can be used in specialized applications, such as constructing proteins with known repeated regions.
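How the thresholds above act as filters can be sketched in Python as follows. The sketch assumes that the melting temperature and the free energies have already been computed elsewhere (the real scripts rely on UNAFold for this). The function name is hypothetical, and the default values are the ones from the example configuration in this tutorial.

```python
def overlap_is_acceptable(tm_celsius, overlap_delta_g, self_dimer_delta_g,
                          overlaptemps=(55, 65),
                          delta_g_threshold=-3,
                          self_dimers_threshold=3):
    """Return True if an overlap passes all three filters.
    Free energies are in kcal/mol; values below a threshold are rejected."""
    tm_low, tm_high = overlaptemps
    if not tm_low <= tm_celsius <= tm_high:
        return False  # melting temperature outside the allowed window
    if overlap_delta_g < delta_g_threshold:
        return False  # secondary structure in the overlap is too stable
    if self_dimer_delta_g < self_dimers_threshold:
        return False  # self-dimer is too stable
    return True


# Made-up numbers, for illustration only.
print(overlap_is_acceptable(60.0, -1.5, 4.0))   # passes all filters
print(overlap_is_acceptable(70.0, -1.5, 4.0))   # fails on melting temperature
print(overlap_is_acceptable(60.0, -5.0, 4.0))   # fails on overlap free energy
```

When a candidate junction fails, a search of this kind would shift the junction within positionleeway and try again, which is why larger leeway values enlarge the search space.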


Example :

EXPLANATION OF CONFIG FILE:

{
    "initialPlateNum": 
		4, # 96-well plates holding assembled constructs are numbered starting with this number 
    "buildSequencesFile": 
		"input-seqs/yeast_chr1_3_16.all_bb.fasta", # the location of a fasta file containing seqs for the desired constructs
    "primerOutputFile": 
		"output-files/primer-output.txt", # the location of a txt file which will contain primer sequence outputs
    "oligoOutputFile": 
		"output-files/oligo-output.fasta", # the location of a fasta file which will contain oligo sequence outputs for the OLS pool
    "RESpacing": 
		[ # list of offsets of enzyme cut sites from the end of the corresponding enzyme recognition sites, with enzymes ordered as in REVector
			5, 
			2, 
			5, 
			4
		], 
    "REVector": 
		[ # list of restriction enzymes that gasp will search through if SearchForRE is set to "True"
			"BsaI", 
			"BtsI", 
			"BsmBI", 
			"BspQI"
		], 
    "SearchForRE": 
		"True", # set this to "False" if you wish to control the exact restriction site which is used, specified by REToUse.
		# Examples: usually set to "False" for DNA origami scaffold sequences, usually set to "True" for synthetic genes for in-vivo use.
    "REToUse": 
		"", # Restriction enzyme to use if SearchForRE is set to "False", e.g., "BsaI". Leave blank if SearchForRE is set to "True". 
		# Usually set to "BsaI" for Shih lab DNA origami scaffold sequences.
    "forwardPrimersLibraryFile": 
		"primer-library/forward_finalprimers.fasta", # Fixed library of ~ 3000 orthogonal forward primers
    "reversePrimersLibraryFile": 
		"primer-library/reverse_finalprimers.fasta", # Fixed library of ~ 3000 orthogonal reverse primers
    "avgoverlapsize": 
		20, # Average length of overlap region between adjacent oligos
    "deltaGThresholdForOverlaps": 
		-3, # Overlap is rejected if its hybridization free energy in kcal/mol is below this number
    "selfDimersThreshold": 
		3, # Reject a decomposition if it contains a self-dimer with free energy in kcal/mol below this number
    "insertionSizeToKillRESite": 
		2, # Leave this set to 2 for now
    "lengthleeway": 
		10, # Leeway in oligo length that is allowed in searching for acceptable overlaps
    "overlaptemps": 
		[ # Overlap regions must have a melting temperature in this range
			55, 
			65
		], 
    "positionleeway": 
		10 # Leeway in junction position that is allowed in searching for acceptable overlaps
}
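The annotated file above is JSON with "#" comments added for explanation, and a standard JSON parser does not accept such comments. The following minimal Python sketch strips the comments and then parses the result. How gasp.py itself reads the file is not shown in this tutorial, so this is an assumption. Every comment must start with "#"; comment text that continues on a following line without its own "#" has to be prefixed or removed first. The function names are hypothetical.

```python
import json


def strip_hash_comment(line):
    """Remove a trailing '#' comment, ignoring '#' inside double-quoted strings.
    Simplification: escaped quotes inside strings are not handled."""
    in_string = False
    for index, char in enumerate(line):
        if char == '"':
            in_string = not in_string
        elif char == "#" and not in_string:
            return line[:index]
    return line


def load_annotated_config(text):
    """Parse JSON text that carries '#' comments."""
    cleaned = "\n".join(strip_hash_comment(line) for line in text.splitlines())
    return json.loads(cleaned)


# Shortened example in the same layout as the annotated file.
example = """
{
    "initialPlateNum":
        4, # plates are numbered starting with this number
    "REVector":
        [ # enzymes to search through
            "BsaI",
            "BtsI"
        ],
    "positionleeway":
        10 # leeway in junction position
}
"""
config = load_annotated_config(example)
print(config["initialPlateNum"], config["REVector"])
```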

Result:

The first e-mail will contain a report with: (1) the sequences to be synthesized on the DNA chip, in FASTA format; (2) the plate-specific, position-specific, and construction primers needed to build the set of assemblies; (3) the plate-specific, position-specific, and construction primers that correspond to each individual assembly. The second e-mail will contain a FASTA file with the sequences that should be synthesized on the DNA chip.

Appendix:

Figure 1. Shown here is the format of the input file for this program.

Figure 2. SHORTENED_yeast_chr1_3_16all_bb-oligo-output.fasta

Figure 3. SHORTENED_yeast_chr1_3_16all_bb-primer-output.txt