From 2013.igem.org

Revision as of 03:01, 22 October 2013 by Azzucoloto (Talk | contribs)

Detector

For the DNA detector in our system, we decided to use Transcription Activator-Like Effectors (TALEs). These are naturally occurring proteins that bind specifically to DNA. TALEs are an advantageous tool in synthetic biology because they can be modified to recognize any chosen DNA sequence, as long as it starts with a thymine. In our case, we want them to bind to a sequence in enterohemorrhagic E. coli (EHEC). We have engineered TALEs that are designed to bind segments of the stx2 gene in the EHEC genome (BBa_K1189032 and BBa_K1189033). To find out more about how we designed our TALEs, click here. This summer we also worked with TALEs from the iGEM Parts Registry to build and test our constructs for our proof of concept. To see what we accomplished with our TALEs, check out our results section. To learn more about TALE proteins, see our background section.

Background

Transcription Activator-Like Effectors (TALEs) are proteins produced by bacteria of the genus Xanthomonas and secreted into plant cells. These naturally occurring TALEs play a key role in bacterial infection, as they are responsible for upregulation of the host genes required for pathogenic growth and expansion (Mussolino & Cathomen, 2012).

All TALEs are made up of three main parts: the N-terminus, the DNA binding domain, and the C-terminus. The N-terminus contains a type III secretion signal (T3SS), which allows the proteins to be translocated from the bacterium into the plant cell. The C-terminus contains nuclear localization signals (NLS) and an acidic activation domain (AAD). The DNA binding domain, also termed the repeat region, mediates DNA recognition through tandem repeats of 33 to 35 amino acid residues each (Bogdanove et al., 2010). The binding domain usually comprises 15.5 to 19.5 single repeats (Figure 1). The last repeat, close to the C-terminus, is called the “half-repeat” because it is generally 20 amino acids in length. Although the modules have conserved sequences, polymorphisms are found in residues 12 and 13, also known as “repeat-variable di-residues” (RVDs). Each RVD can be specific for one nucleotide or for several, and each repeat recognizes one base, so 19.5 repeat units target a specific 20-nucleotide sequence in the DNA (Mussolino & Cathomen, 2012).

**Figure 1.** (A) Schematic representation of a TAL effector with the DNA binding domain in red. (B) 3D structure of TALEs obtained from our team’s work in Autodesk Maya. To learn more about our modeling, click **here**.

When in contact with the DNA, the TALE aligns its N-terminus with the 5' end of the DNA and its C-terminus with the 3' end. Each repeat is made of two alpha helices connected by a three-residue loop, two residues of which form the RVD. Although both amino acids 12 and 13 are responsible for base specificity, the TALE-DNA interaction happens through intermolecular bonds between residue 13 and the target base in the major groove of the DNA. Residue 12 plays a role in stabilizing the RVD loop (Meckler et al., 2013).

Over 20 different RVDs have been identified in TAL effectors. However, four of them appear in 75% of the repeats: HD, NG, NI and NN (Bogdanove et al., 2010). Quantitative analysis of DNA-TALE interactions by Meckler et al. (2013) revealed that the binding affinity is affected by the RVDs in the following order (the letters in brackets show the base that the RVD binds to): NG (T) > HD (C) ~ NN (G) >> NI (A) > NK (G). NG, specific to thymine, and HD, specific to cytosine, are strong RVDs. NN binds both guanine and adenine, but it prefers guanine. NK also interacts with guanine, but with roughly 1,000-fold (10^3-fold) lower affinity. NI is specific for adenine, but it has low affinity compared to strong RVDs such as NG and HD (Meckler et al., 2013). Although less common, another naturally occurring RVD, NH, has been described as binding strongly to guanine (Cong et al., 2013). NS binds to any of the four bases and is present in naturally occurring TALEs such as AvrBs3 from Xanthomonas campestris (Boch et al., 2009).
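The RVD code summarized above can be written down as a small lookup table. The following Python snippet is only a minimal sketch for illustration, not a tool we used: the names `RVD_BASES` and `rvd_matches` are hypothetical, and the check treats binding as a simple yes/no per position, ignoring the affinity differences discussed above.

```python
# Base preferences of the RVDs mentioned in the text above.
RVD_BASES = {
    "NG": "T",
    "HD": "C",
    "NN": "GA",    # prefers G, but also binds A
    "NI": "A",
    "NK": "G",
    "NH": "G",
    "NS": "ACGT",  # binds any of the four bases
}


def rvd_matches(rvds, dna):
    """Return True if every base in dna is one that the paired RVD can bind."""
    dna = dna.upper()
    if len(rvds) != len(dna):
        raise ValueError("need exactly one RVD per base")
    return all(base in RVD_BASES[rvd] for rvd, base in zip(rvds, dna))


# Made-up five-repeat example, purely for illustration.
example_rvds = ["NG", "HD", "NN", "NI", "NS"]
print(rvd_matches(example_rvds, "TCGAT"))  # NN paired with its preferred G
print(rvd_matches(example_rvds, "TCAAC"))  # NN paired with A, NS with C
print(rvd_matches(example_rvds, "TCTAT"))  # NN paired with T, which it does not bind
```

Because NN and NS accept more than one base, an array that contains them matches several DNA sequences, which is one reason such RVDs reduce specificity.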

In addition to RVDs, the DNA binding affinity is also subject to polarity effects. Point mutations at the 5’ end of the target sequence affect TALE-DNA recognition more than those at the 3’ end (Meckler et al., 2013). Taking this into consideration, recommendations for TALE design include incorporating strong RVDs close to the N-terminus (Streubel et al., 2012).

Because TAL effectors can be engineered to bind virtually any DNA sequence, they represent a powerful tool in synthetic biology. They have been extensively used in gene modulation by fusing an activator or a repressor to their C-terminus. The Slovenia 2012 iGEM team designed and created repressor TAL effectors by adding KRAB repressor domains, and activator TALEs by fusing a VP16 activation domain.

Besides gene regulation, TALEs can be fused with DNA cleavage domains of endonucleases and serve as restriction enzymes (Beurdeley et al., 2013). These engineered proteins are termed TALENs or Transcription Activator-Like Effector Nucleases. TALENs can also be used in gene knockout as they are able to promote gene disruption (Bogdanove & Voytas, 2011).

Our team, however, proposes an innovative application for TAL effectors: using TALEs as a nucleotide biosensor. We engineered TALEs to detect enterohemorrhagic bacteria in feces from cattle populations. As sensors, TALEs can bind to specific regions of the Shiga toxin II gene (stx2) and capture the DNA of interest from a feces sample, making it available for a second TALE, whose binding domain is specific for another region of stx2. This second TALE is connected to a reporter, which makes the TALE-DNA interaction visible within a short period of time.

Engineered TALEs

In designing the TALEs, an important decision was to identify the target gene that would allow our system to detect enterohemorrhagic E. coli. We determined that stx2 is a gene found in all enterohemorrhagic organisms. This gene is responsible for production of Shiga toxin, which is the factor that causes illness in humans. We designed two TALEs that bind to two different regions of the stx2 gene. Recall that the TALE-capture system detects DNA; therefore, if the TALEs are not designed to bind exclusively to the stx2 gene, they could produce a false positive by binding to other DNA sequences similar to the target sequence, such as that of a type of grass consumed by the cow, the cow’s own DNA, or another type of microorganism living inside the cow’s gut. TALEs are not perfect: they can bind to a DNA segment if its sequence is close enough to their target site. Therefore, we designed two TALEs for two separate segments of the stx2 gene, which should dramatically increase the specificity of our system.

In addition to maximizing specificity, we also tried to design both TALEs to have the highest possible binding affinity. The immobilized TALE has to be able to capture and hold a relatively large DNA strand in place even when the large mobile complex also docks onto the DNA. The mobile complex consists of a ferritin nanoparticle fused to twelve TALEs, which bind to the other EHEC target region, so the mobile TALE has to be able to keep a very large complex in place when it binds to the DNA. Thus, if our TALEs do not bind strongly to the DNA, the system would produce false negatives. To increase the binding affinity of the TALEs, we picked the pyrimidine-rich regions of the stx2 gene. The relative binding affinity of different RVDs to their respective nucleotides is as follows: NG (1) > NN (0.18) ~ HD (0.16) >> NI (0.0016) > NK (0.00016), where they bind to T, G, C, A, and G, respectively (Meckler et al., 2013). Therefore, picking regions rich in thymines and cytosines would greatly enhance the binding affinity of the TALE to the DNA segment. Although NN also has a relatively high binding affinity, it was avoided as much as possible, as NN can also bind adenine.
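To make the reasoning about pyrimidine-rich sites concrete, the sketch below ranks candidate sites using the relative affinities quoted above. It is an illustration under stated assumptions, not the procedure we followed: the function names, the one-RVD-per-base mapping (with NN for guanine), and the additive log-score are all hypothetical simplifications, and polarity effects are ignored. The example sequences are made up.

```python
import math

# Relative affinities quoted in the text (Meckler et al., 2013), keyed by RVD.
RELATIVE_AFFINITY = {"NG": 1.0, "NN": 0.18, "HD": 0.16, "NI": 0.0016, "NK": 0.00016}

# Assumed design rule for this sketch: one RVD per base, NN used for G.
BASE_TO_RVD = {"T": "NG", "C": "HD", "G": "NN", "A": "NI"}


def pyrimidine_fraction(site):
    """Fraction of bases in the site that are C or T."""
    site = site.upper()
    return sum(base in "CT" for base in site) / len(site)


def log_affinity_score(site):
    """Sum of log10 relative affinities; an illustrative heuristic only."""
    return sum(math.log10(RELATIVE_AFFINITY[BASE_TO_RVD[b]]) for b in site.upper())


def rank_sites(sites):
    """Order candidate sites from highest to lowest heuristic score."""
    return sorted(sites, key=log_affinity_score, reverse=True)


# Made-up candidates: one pyrimidine-rich, one purine-rich.
candidates = ["TAGAAGGAGA", "TCTTCCTCTT"]
for site in rank_sites(candidates):
    print(site, round(pyrimidine_fraction(site), 2), round(log_affinity_score(site), 2))
```

Under this heuristic each adenine costs far more than each cytosine or guanine, which mirrors why regions rich in thymine and cytosine were preferred.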

Another important factor in the selection of the two TALE target sequences was the distance between them on the DNA. The distance had to be great enough that the proteins fused to one TALE would not block the target sequence of the other TALE. On the other hand, because DNA can be sheared into smaller pieces by physical factors or cut by endonucleases, we had to be careful that the sequences were not too far apart. In order for our system to work, the two regions of DNA that the TALEs bind to must remain contiguous, i.e., be on the same stretch of DNA. Therefore, within that constraint, we picked target sequences as close to each other as possible to decrease the chances of a cut between the two target sites. Our target sites are 200 bp apart.
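The spacing constraint can be checked with a few lines of code. The sketch below is a hedged illustration: `site_spacing` and `spacing_ok` are hypothetical helper names, only the first occurrence of each site on the given strand is considered, and the minimum and maximum gaps are parameters the user must choose (the text only states that our final sites are about 200 bp apart). The test sequence is made up.

```python
def site_spacing(sequence, site_1, site_2):
    """Number of bases between two sites, or None if a site is missing or they overlap."""
    sequence = sequence.upper()
    start_1 = sequence.find(site_1.upper())
    start_2 = sequence.find(site_2.upper())
    if start_1 == -1 or start_2 == -1:
        return None
    # Order the two sites by their start position along the sequence.
    first, second = sorted([(start_1, len(site_1)), (start_2, len(site_2))])
    gap = second[0] - (first[0] + first[1])
    return gap if gap >= 0 else None


def spacing_ok(gap, min_gap, max_gap):
    """True if the gap exists and lies within the chosen bounds (inclusive)."""
    return gap is not None and min_gap <= gap <= max_gap


# Made-up sequence: two short sites separated by 200 filler bases.
toy_sequence = "TCCT" + "A" * 200 + "TTCC"
gap = site_spacing(toy_sequence, "TCCT", "TTCC")
print(gap, spacing_ok(gap, min_gap=100, max_gap=300))
```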

The TALEs were designed to be extremely specific. Based on the considerations explained above, a number of possible target sites were selected. To determine the most specific pair, we conducted BLAST searches on each candidate TALE target sequence separately. As expected, we observed a huge number of alignments with EHEC strains. We also found some partial alignments in non-EHEC organisms, which were screened to determine if they could be problematic. For example, consider the two selected target sites to be [1] and [2]. If [1] had a 90% alignment to a region in the Homo sapiens genome, we checked whether [2] could also be found in the Homo sapiens genome. If it was found, we then checked whether both [1] and [2] were on the same chromosome; if they were on different chromosomes, meaning they were not physically attached, the system would not detect them as a false positive. If they were on the same chromosome, we determined which end of the target sequences aligned with the similar DNA sequence. This is important because TALEs are polar and bind significantly more strongly to the 5’ end of their target sequence than to the 3’ end. Therefore, if the alignment included the 0th to 10th nucleotides of the target sequence, that specific combination of two target sites could be ruled out. To get a better idea of how we designed the TALEs, please refer to Figure 2 below.
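The screening logic described above can be summarized as a short decision function. This is a sketch under assumptions rather than the exact procedure: the `Hit` record and `pair_ruled_out` function are hypothetical names, the hits are assumed to be pre-filtered to non-EHEC organisms, and we assume here that a pair is ruled out only when both sites have 5'-covering alignments on the same chromosome of the same organism. The hit data are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One partial BLAST alignment of a target site in a non-EHEC genome."""
    site: int             # 1 or 2, i.e. target site [1] or [2]
    organism: str
    chromosome: str
    covers_5prime: bool   # alignment includes nucleotides 0-10 of the target


def pair_ruled_out(hits):
    """True if some chromosome carries 5'-covering hits for both target sites."""
    sites_by_location = {}
    for hit in hits:
        if hit.covers_5prime:
            key = (hit.organism, hit.chromosome)
            sites_by_location.setdefault(key, set()).add(hit.site)
    return any(sites == {1, 2} for sites in sites_by_location.values())


# Invented example: both sites hit the same organism, but on different chromosomes.
example_hits = [
    Hit(site=1, organism="Homo sapiens", chromosome="chr3", covers_5prime=True),
    Hit(site=2, organism="Homo sapiens", chromosome="chr7", covers_5prime=True),
]
print(pair_ruled_out(example_hits))
```

A stricter variant could rule out a pair as soon as either site has a 5'-covering hit on a shared chromosome; which rule is appropriate depends on how much off-target binding the assay tolerates.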

**Figure 2.** This flow chart shows the process used to design TALEs specific to the *stx2* gene. First we picked two target sequences ([1] and [2]) based on the considerations mentioned in the text. Then we followed this flowchart to make sure the combination of the two target sites would not produce a false positive.

The figure below shows one of the target sequences we picked and how it is common among some EHEC organisms as an example.

**Figure 3.** This alignment shows one of the target sequences we picked. Notice how it starts with a T and is pyrimidine rich. This is the target sequence for **BBa_K1189033**.

Proof of Concept

The TALEs are a fundamental part of our project. In order to have a functioning system for E. coli detection, we need proteins that will successfully recognize and bind our DNA. However, before we can actually create and use TALEs to bind to enterohemorrhagic E. coli DNA, we need a proof of concept. On the iGEM Parts Registry we found TALEs that Slovenia's 2012 iGEM team synthesized for a previous project. TALEs are very large proteins and are costly to synthesize. Therefore, we decided to use these TALEs to test our system. We also saw this as an opportunity to use and build upon parts made by former iGEM teams. Thus, we ordered three of their TALEs: TALE A (BBa_K782004), TALE B (BBa_K782006), and TALE D (BBa_K782005).

We used TALE A (BBa_K782004) and TALE B (BBa_K782006) to build our constructs and test our system, as it requires the use of two different TALE proteins. To test our TALEs, we had to synthesize the target sequences that they would recognize and bind to. We constructed a series of target sequences to test the binding affinity: some were the TALEs' exact target sites (BBa_K1189004, BBa_K1189005), while others had specific base pair alterations of the target site. These mutations will allow us to test the binding affinity of the TALE when it encounters a similar but not identical target sequence. Consequently, we can determine the specificity with which the proteins bind and how they might respond to DNA not belonging to enterohemorrhagic E. coli. This will help us define how specific we expect our final system to be.

When we sequenced TALE B (BBa_K782006), we discovered that it had a small mutated segment. We expected a sequence of AGCAATGGG at the repeat-variable di-residue of the second repeat. However, the sequence was actually TCCCACGAC. This meant that the required base at this position of the target sequence was a C, and not a T as the Parts Registry web page indicates. The two TALEs also had a Kozak sequence at the start of the coding sequence. We took out the Kozak sequence before using TALE A and TALE B to construct BBa_K1189000 and BBa_K1189001.
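The effect of this mutation can be verified by translating the two nine-nucleotide segments. The sketch below is illustrative only: it assumes the nine nucleotides are in frame and that the last two encoded residues are the RVD (residues 12 and 13), and it uses a partial codon table containing just the six standard-code codons needed here. The names `translate` and `rvd_from_segment` are hypothetical.

```python
# Partial standard codon table: only the codons that occur in the two segments.
CODONS = {
    "AGC": "S", "AAT": "N", "GGG": "G",
    "TCC": "S", "CAC": "H", "GAC": "D",
}

# Base recognized by each RVD, as described in the Background section.
RVD_TARGET = {"NG": "T", "HD": "C"}


def translate(dna):
    """Translate an in-frame DNA segment using the partial codon table."""
    dna = dna.upper()
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def rvd_from_segment(dna):
    """Assume the last two encoded residues of the segment form the RVD."""
    return translate(dna)[-2:]


for label, segment in [("expected", "AGCAATGGG"), ("observed", "TCCCACGAC")]:
    rvd = rvd_from_segment(segment)
    print(label, translate(segment), rvd, RVD_TARGET[rvd])
```

The expected segment encodes Ser-Asn-Gly (RVD NG, which binds T), while the observed segment encodes Ser-His-Asp (RVD HD, which binds C), consistent with the change in the required target base.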

Since our system uses two TALEs to function, we also made a plasmid containing both [A] (the TALE A target site) and [B] (the TALE B target site), with some junk nucleotides in between, so that the distance between [A] and [B] is about the same as the distance between the two target sites of the TALEs we engineered for EHEC detection (about 200 bp).

We also incorporated TALE A (BBa_K782004) and TALE B (BBa_K782006) in our other constructs. We synthesized J04500+His+TALE A+link+Kcoil (BBa_K1189029) and J04500+His+TALE B+link+Kcoil (BBa_K1189030). These two parts have a Kcoil fused to the TALE. This allows our TALEs to bind in vitro to our Prussian blue reporter, which is fused to an Ecoil.

As a backup plan, we considered using beta-lactamase as a reporter. Therefore, we synthesized J04500+His+TALA+Link+Blac (BBa_K1189031). In all of J04500+His+TALA+link+Kcoil (BBa_K1189029), J04500+His+TALB+link+Kcoil (BBa_K1189030), and J04500+His+TALA+Link+Blac (BBa_K1189031), we tried to further improve the TALEs by codon optimizing them for E. coli, so that they are more useful to us and to the general iGEM community. We also took out the nuclear localization sequence (NLS) on the C-terminus of the TALEs, since E. coli does not have a nucleus.

These parts were also designed so that they could be used for a wide variety of applications. There is a KasI restriction site incorporated immediately following the TALE. This introduces the chemically inert amino acids alanine and glycine, which are what the linker is made of and which would not disrupt the functionality of either the TALE or the protein it is fused to. This allows other teams to easily switch out the TALE in the part for their own engineered TALE. This means that our system can potentially be used as a platform technology. By designing a TALE specific to any DNA target site, anyone can use this technology to sense their DNA of interest.

These are the constructs we developed using Parts Registry TALE A (BBa_K782004) and TALE B (BBa_K782006):

**Figure 4.** Part **BBa_K1189004**. This construct has the TALE target site for TALE A (**BBa_K782004**). The target sequence was added to the RFP gene in a pSB1C3 backbone.

**Figure 5.** Part **BBa_K1189005**. This construct has the TALE target site for TALE B (**BBa_K782006**). The target sequence was added to the RFP gene in a pSB1C3 backbone.

**Figure 8.** Part **K1189023**. TALE B was originally designed by Slovenia's 2012 iGEM team (**BBa_K782006**). However, Slovenia's team made the part for use in eukaryotic cells. We modified the part so that it can be used in *E. coli* by removing the Kozak sequence and nuclear localization sequence, and by codon optimizing it for *E. coli*. In addition, the sequence of TALE B reported by Slovenia was not completely accurate. We fixed the mutation in their sequence.

**Figure 9.** Part **K1189000**. TALE A was originally designed by Slovenia's 2012 iGEM team (**BBa_K782004**). TALE A was built downstream of BBa_J04500, an IPTG-inducible promoter with RBS. Since this gene was used in eukaryotic cells by Slovenia's iGEM team, it had a Kozak sequence at its start. We removed this Kozak sequence to allow for the expression of the protein in *E. coli*.

**Figure 10.** Part **BBa_K1189001**. TALE B was originally designed by Slovenia's 2012 iGEM team (**BBa_K782006**). TALE B was built downstream of BBa_J04500, an IPTG-inducible promoter with RBS. Since this gene was used in eukaryotic cells by Slovenia's iGEM team, it had a Kozak sequence at its start. We removed this Kozak sequence to allow for the expression of the protein in *E. coli*.

**Figure 11.** Part **BBa_K1189029**. TALE A fused to a K coil. E/K coils are synthetic coiled-coil domains designed specifically to bind to each other with high affinity and specificity.

Results

TALE Target Sequences

We produced specific DNA target sequences for both TALE A ([A]) and TALE B ([B]). We designed primers that would produce these target sequences, together with about 480 bp of junk DNA (to facilitate cloning), in a PCR reaction. Our target sequences were inserted into a pSB1C3 backbone. Our verification digest of the target sequences is shown in Figure 4 below. The backbone is the highest band at around 2070 bp, and our target sequences are represented by the lowest bands at around 500 bp. These target sequences were sent for sequencing and have been sequence verified.

Once the individual target sequences were made, we designed new primers to create a construct containing both target sequences together. We did a KAPA PCR to produce our linear product. The PCR product consisted of the two target sequences, with about 200 bp of junk DNA in between, and BioBrick cut sites on either side. The results of the PCR are shown in Figure 5 below. We then put our linear PCR product into a pSB1C3 backbone. This construct was then transformed and miniprepped. It was sent for sequencing and has been sequence verified. Because TALE B was discovered to have a mutation, we have two versions of this construct. One has the original TALE B target sequence, which we created before we realized there was a mutation. The other contains the new target sequence we created for the mutated TALE B.

**Figure 5.** KAPA HiFi PCR products of TALE target sequences. These are the target sequences for TALE A and for both the mutated and non-mutated TALE B.

Protein Purification

After getting our DNA constructs ready, we moved on to expressing and purifying the proteins. We first did a mini expression of J04500+His+TALE A (BBa_K1189000) and J04500+His+TALE B (BBa_K1189001). We were able to observe expression of TALE A and TALE B once we followed the procedure outlined in Meckler et al. (2013). Figure 6 shows the western blot we performed on the crude lysates of TALE A and TALE B. Subsequently, we did a large-scale expression and purification through a Ni-NTA column.

**Figure 6.** Western blot of crude lysate of **BBa_J04500**+**TALE A** and **BBa_J04500**+**TALE B**.

Following the same conditions for large-scale protein expression and purification, we were able to purify TALE A+link+Kcoil (BBa_K1189029) and J04500+His+TALA+Link+Blac (BBa_K1189031). Figure 7 shows the western blot we performed on TALE A+link+Kcoil (BBa_K1189029), and Figure 8 shows the western blot for J04500+His+TALA+Link+Blac (BBa_K1189031). Figures 9 and 10 show the SDS-PAGE gels we ran on the elutions collected after Ni-NTA purification of TALE A+link+Kcoil (BBa_K1189029) and J04500+His+TALA+Link+Blac (BBa_K1189031), respectively.

**Figure 7.** Western blot of TALE A+link+Kcoil (**BBa_K1189029**).

**Figure 8.** Western blot of J04500+His+TALA+Link+Blac (**BBa_K1189031**).

**Figure 9.** SDS-PAGE gel of **pSB1C3**-**BBa_J04500**-His-**TALE A**-link-Kcoil. Bands were expected at 82 kDa.

**Figure 10.** SDS-PAGE gel of **pSB1C3**-**BBa_J04500**-His-**TALE A**-link-Blac.

We initially used TALE A (BBa_K782004) and TALE B (BBa_K782006) from the Parts Registry and added a lacI-regulated promoter and RBS (BBa_J04500) upstream of each. However, since these TALEs were initially used in eukaryotic cells, they have a Kozak sequence at the start of the coding sequence, which prevents the expression of these proteins in E. coli. To improve the parts and make them useful to everyone using E. coli as a chassis, we removed the Kozak sequence from the TALEs. All the parts that we submitted to the registry have the Kozak sequence removed and can be expressed in E. coli, as shown above. The western blot below shows expression in E. coli of TALE B with the Kozak sequence (BBa_J04500 + BBa_K782006) and without it (BBa_K1189001).

**Figure 11.** Western blot of crude samples of TALE B with and without the Kozak sequence.

Team:Calgary/Project/OurSensor/Detector