<p><b>Figure 3.</b> This alignment shows one of the 19 bp target sequences we picked in the EHEC Shiga toxin II gene. Notice how it starts with a T and is pyrimidine-rich. This is the target sequence for the TALE <a href="http://parts.igem.org/wiki/index.php?title=Part:BBa_K1189033" >BBa_K1189033</a>.</p>
For the DNA detector in our system, we decided to use Transcription Activator-Like Effectors (TALEs). These are naturally occurring proteins that bind specifically to DNA. TALEs are an advantageous tool in synthetic biology because they can be modified to recognize any chosen DNA sequence, as long as it starts with a thymine. In our case, we want them to bind to a sequence in enterohemorrhagic E. coli (EHEC). We have engineered TALEs that are designed to bind segments of the stx2 gene in the EHEC genome (BBa_K1189032 and BBa_K1189033). To find out more about how we designed our TALEs, click here. This summer we also worked with TALEs from the iGEM Parts Registry to build and test our constructs for our proof of concept. To see what we accomplished with our TALEs, check out our results section. To learn more about TALE proteins, see our background section.
Background
Transcription Activator-Like Effectors (TALEs) are proteins produced by bacteria of the genus Xanthomonas and secreted into plant cells. These naturally occurring TALEs play a key role in bacterial infection, as they are responsible for upregulation of the host genes required for pathogenic growth and expansion (Mussolino & Cathomen, 2012).
All TALEs are made up of three main parts: the N-terminus, the DNA binding domain, and the C-terminus. The N-terminus contains a type III secretion signal (T3SS), which allows the proteins to be translocated from the bacterium into the plant cell. The C-terminus contains nuclear localization signals (NLS) and an acidic activation domain (AAD). The DNA binding domain, also termed the repeat region, mediates DNA recognition through tandem repeats of 33 to 35 amino acid residues each (Bogdanove et al., 2010). The binding domain usually comprises 15.5 to 19.5 repeats (Figure 1). The last repeat, closest to the C-terminus, is called the “half-repeat” because it is generally only 20 amino acids long. Although the repeats have conserved sequences, polymorphisms are found at residues 12 and 13, known as the “repeat-variable di-residue” (RVD). Each RVD recognizes one position in the target and can be specific for a single nucleotide or accept several; 19.5 repeat units therefore target a specific 20-nucleotide sequence in the DNA (Mussolino & Cathomen, 2012).
When in contact with the DNA, the TALE aligns its N-terminus with the 5’ end of the target and its C-terminus with the 3’ end. Each repeat is made of two alpha helices connected by a three-residue loop, two residues of which form the RVD. Although both amino acids 12 and 13 contribute to base specificity, the TALE-DNA interaction occurs through intermolecular bonds between residue 13 and the target base in the major groove of the DNA. Residue 12 plays a role in stabilizing the RVD loop (Meckler et al., 2013).
Over 20 different RVDs have been identified in TAL effectors. However, four of them account for 75% of the repeats: HD, NG, NI and NN (Bogdanove et al., 2010). Quantitative analysis of DNA-TALE interactions by Meckler et al. (2013) revealed that the RVDs contribute to binding affinity in the following order (the letter in brackets is the base that the RVD binds): NG (T) > HD (C) ~ NN (G) >> NI (A) > NK (G). NG, specific to thymine, and HD, specific to cytosine, are strong RVDs. NN binds both guanine and adenine, but it prefers guanine. NK also interacts with guanine, but with roughly 1000-fold (10^3-fold) lower affinity. NI is specific for adenine, but it has low affinity compared to strong RVDs such as NG and HD (Meckler et al., 2013). Although less common, another naturally occurring RVD, NH, has been described as binding strongly to guanine (Cong et al., 2013). NS binds to any of the four bases and is present in naturally occurring TALEs such as AvrBs3 from Xanthomonas campestris (Boch et al., 2009).
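The base-to-RVD correspondence described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used to build our parts: the function name, the example sequence, and the `skip_leading_t` option are hypothetical, and the option reflects an assumption that the leading thymine is not encoded by a repeat.

```python
# One commonly used RVD per base, taken from the list in the text above.
# NN is used for G here; the text mentions NH as another guanine-binding RVD.
RVD_FOR_BASE = {"T": "NG", "C": "HD", "A": "NI", "G": "NN"}


def rvds_for_target(target, skip_leading_t=True):
    """Return the list of RVDs that would read a target sequence 5' to 3'.

    Assumption: when skip_leading_t is True, the initial T is treated as a
    requirement of the target site and is not assigned a repeat.
    """
    target = target.upper()
    if not target.startswith("T"):
        raise ValueError("TALE target sequences must start with a thymine")
    bases = target[1:] if skip_leading_t else target
    unknown = set(bases) - set(RVD_FOR_BASE)
    if unknown:
        raise ValueError(f"unexpected characters in target: {sorted(unknown)}")
    return [RVD_FOR_BASE[base] for base in bases]


# Hypothetical short sequence for illustration; not one of our real targets.
print(" ".join(rvds_for_target("TCCTGA")))
```

Swapping the entry for G in `RVD_FOR_BASE` is enough to explore alternatives such as NH or NK.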
In addition to RVDs, the DNA binding affinity is also subject to polarity effects. Point mutations at the 5’ end of the target sequence affect TALE-DNA recognition more than those at the 3’ end (Meckler et al., 2013). Taking this into consideration, recommendations for TALE design include incorporating strong RVDs close to the N-terminus (Streubel et al., 2012).
Because TALEs are modular and easy to engineer, they can be designed to bind virtually any DNA sequence, making them a powerful tool in synthetic biology. They have been used extensively for genome editing (Mussolino et al., 2011): fused to the DNA cleavage domains of endonucleases, they serve as custom restriction enzymes (Beurdeley et al., 2013). These engineered proteins are termed TALENs, or Transcription Activator-Like Effector Nucleases. TALENs are also used for gene knockout because they promote gene disruption (Bogdanove & Voytas, 2011). The Slovenia 2012 iGEM team designed TAL effector-based regulators by adding repressor and activator domains to TALEs to control gene expression.
Our team, however, proposes an innovative application for TAL effectors: using TALEs as a nucleotide biosensor. We engineered TALEs to detect enterohemorrhagic bacteria in feces of BBa_K782004 in cattle populations. As sensors, TALEs can bind to specific regions of the Shiga toxin II gene (stx2) and capture the DNA of interest from a feces sample, making it available to a second TALE whose binding domain is specific for another region of stx2. This second TALE is connected to a reporter, which makes the TALE-DNA interaction visible within a short period of time.
Engineered TALEs
Figure 2. This alignment shows one of the 19 bp target sequences we picked in the EHEC Shiga toxin II gene. Notice how it starts with a T and is pyrimidine-rich. This is the target sequence for the TALE BBa_K1189033.
To design the TALEs that bind to EHEC, we had to take a few things into consideration. First, we had to identify regions that would be both optimal for TALE binding and specific to EHEC. After a thorough literature search, we determined that the Shiga toxin II gene, stx2, is found in all EHEC organisms. Additionally, we ensured that the TALE binding sequences were not found in organisms surrounding the cattle, such as the grass the cattle consume, the cow’s own genome, or commensal microorganisms in the cow gut. The stx2 gene produces the Shiga toxin that causes illness in humans. Shiga toxin is an AB5 toxin: its five B subunits bind to globotriaosylceramide (Gb3), after which the A subunit is internalized by the host cell and cleaves ribosomal RNA. Cattle do not have Gb3 as a surface marker and therefore carry EHEC asymptomatically. Since TALEs do not have 100% specificity, we designed a two-TALE system to increase specificity and reduce false positives. The two TALEs bind to two different regions of the stx2 gene.
In addition to maximizing specificity, we also tried to design both TALEs to have the highest possible binding affinity. Since our final system will be a strip that is dipped into several solutions, the TALEs must bind the DNA strongly enough to keep it in place on the strip. If our TALEs do not bind strongly to the DNA, the system will produce false negatives. To increase the binding affinity of the TALEs, we picked pyrimidine-rich regions of the stx2 gene. The relative binding affinities of the different RVDs for their respective nucleotides are as follows: NG (1) > NN (0.18) ~ HD (0.16) >> NI (0.0016) > NK (0.00016), where they bind to T, G, C, A, and G, respectively (Meckler et al., 2013). Therefore, picking regions rich in thymines and cytosines greatly enhances the binding affinity of the TALE for the DNA segment.
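As a rough illustration of this selection step, the sketch below scans a sequence for 19 bp windows that start with a T and ranks them by pyrimidine content. It is a simplified stand-in for how candidates could be shortlisted, not a record of our actual workflow: the function name is hypothetical, the input is a made-up toy sequence rather than stx2, only the given strand is scanned, and pyrimidine fraction is used as a crude proxy for affinity.

```python
def candidate_sites(sequence, length=19):
    """List windows of `length` bases that start with T, richest in C/T first.

    Returns tuples of (start index, window, pyrimidine fraction).
    """
    sequence = sequence.upper()
    sites = []
    for start in range(len(sequence) - length + 1):
        window = sequence[start:start + length]
        if window[0] != "T":
            continue  # TALE targets must begin with a thymine
        pyrimidines = sum(base in "CT" for base in window)
        sites.append((start, window, pyrimidines / length))
    # Highest pyrimidine fraction first; ties keep their 5'-to-3' order.
    sites.sort(key=lambda site: site[2], reverse=True)
    return sites


# Toy sequence for illustration only; this is not the stx2 gene.
toy_sequence = "GATTCCTCTTCCATCTTCCTTGAGGATCCAGTTCAGAGTGAGG"
for start, window, fraction in candidate_sites(toy_sequence)[:3]:
    print(start, window, round(fraction, 2))
```

Candidates shortlisted this way would still need the specificity checks described in the following paragraphs.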
Another important factor in the selection of the two TALE target sequences was the distance between them. The distance had to be great enough that the TALE and the FerriTALE could bind without steric hindrance, but short enough to reduce the possibility of the DNA shearing between the two target sequences and separating them. For our system to work, the two regions of DNA that the TALEs bind must remain contiguous, i.e., on the same stretch of DNA. Therefore, we picked target sequences as close to each other as possible to decrease the chance of a break between the two target sites. We determined that the ideal distance between the two binding sites was 200 bp.
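The spacing criterion can be expressed as a small ranking step. The sketch below is illustrative only: the function name and the example positions are hypothetical, and it assumes the distance is measured from the end of the first site to the start of the second, which the text above does not specify.

```python
from itertools import combinations


def rank_site_pairs(starts, site_length=19, target_gap=200):
    """Rank pairs of non-overlapping sites by how close their gap is to target_gap.

    Assumption: the gap is counted edge to edge, from the last base of the
    upstream site to the first base of the downstream site.
    Returns tuples of (upstream start, downstream start, gap in bp).
    """
    pairs = []
    for upstream, downstream in combinations(sorted(starts), 2):
        gap = downstream - (upstream + site_length)
        if gap < 0:
            continue  # the two sites overlap, so both TALEs could not bind
        pairs.append((upstream, downstream, gap))
    pairs.sort(key=lambda pair: abs(pair[2] - target_gap))
    return pairs


# Hypothetical start positions of candidate sites within a gene.
for upstream, downstream, gap in rank_site_pairs([0, 100, 219, 500])[:2]:
    print(upstream, downstream, gap)
```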
The TALEs were designed to be extremely specific. Based on the considerations explained above, a number of possible target sites were selected. To determine the most specific pair, we conducted BLAST searches on each candidate TALE target sequence separately. As expected, we observed a large number of alignments with EHEC strains. We also found some partial alignments in non-EHEC organisms, which we screened to determine whether those organisms can be found in the environment of the cattle. For example, consider the two selected target sites to be [1] and [2]. If [1] had a 90% alignment to a region in the human genome, we checked whether [2] could also be found in the human genome. If it was, we then checked whether [1] and [2] were on the same chromosome; if they were on different chromosomes, meaning they were not physically attached, the system would not register a false positive. If they were on the same chromosome, we determined which end of each target sequence aligned with the similar DNA sequence. This is important because TALEs are polar and bind significantly more strongly to the 5’ end of their target sequence than to the 3’ end (Meckler et al., 2013). Therefore, if the alignment included the 0th to 10th nucleotide of the target sequence, that specific combination of two target sites could be ruled out. To get a better idea of how we designed the TALEs, please refer to Figure 2 below.
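The screening logic for one non-EHEC genome can be summarized as a short decision function. This is a sketch of the reasoning above rather than a tool we ran: the `Hit` record and function names are hypothetical, the alignment coordinates are assumed to have been read off the BLAST results by hand, and it assumes a pair is only ruled out when the hits for both sites cover nucleotides 0 to 10 of their targets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """A partial alignment of one target site in a non-EHEC genome (hypothetical record)."""
    chromosome: str
    target_start: int  # first aligned position within the target site, 0-based
    target_end: int    # last aligned position within the target site, inclusive


def covers_five_prime(hit: Hit, five_prime_end: int = 10) -> bool:
    """True if the alignment includes positions 0 through five_prime_end of the target."""
    return hit.target_start == 0 and hit.target_end >= five_prime_end


def pair_should_be_ruled_out(hit1: Optional[Hit], hit2: Optional[Hit]) -> bool:
    """Apply the checks from the text to the hits for sites [1] and [2]."""
    if hit1 is None or hit2 is None:
        return False  # only one site is present, so both TALEs cannot bind
    if hit1.chromosome != hit2.chromosome:
        return False  # the sites are not physically linked
    # Assumption: both alignments must cover the affinity-critical 5' region.
    return covers_five_prime(hit1) and covers_five_prime(hit2)


# Made-up hits for illustration; these are not real BLAST results.
print(pair_should_be_ruled_out(Hit("chr1", 0, 15), Hit("chr1", 0, 18)))
print(pair_should_be_ruled_out(Hit("chr1", 0, 15), Hit("chr7", 0, 18)))
```

A stricter screen could rule out a pair when either hit covers the 5’ region; that variant only changes the final `and` to `or`.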
Figure 3 below shows one of the target sequences we picked and how it is common to many EHEC organisms.