Engineered TALEs

Our system had to be designed to detect only EHEC (enterohemorrhagic E. coli). We found that Stx2 is the gene common to all EHEC organisms. This gene is responsible for the production of Shiga toxin, the factor that causes illness in humans. We therefore decided to design two TALEs that bind to two regions of the Stx2 gene. Because our TALE-capture system detects DNA, TALEs that do not bind exclusively to the Stx2 gene could produce a false positive by binding to other DNA in the sample, such as DNA from grass the cow has eaten, the cow's own DNA, or DNA from another microorganism living in the cow's gut. Furthermore, TALEs are not perfectly specific: they can bind a DNA segment whose sequence is close enough to their target site. This is especially problematic if a piece of DNA is highly similar to the nucleotides at the 5' end of a TALE's target site.

In addition to specificity, we aimed to design TALEs with as high a binding affinity as possible. This matters for both TALEs. The immobilized TALE has to hold a relatively large DNA fragment in place and keep holding it when the very large mobile complex (a TALE attached to a ferritin nanoparticle or to beta-lactamase) also docks onto the DNA. The mobile TALE is fused to a ferritin nanoparticle, which in turn is fused to eleven other TALEs, so the mobile TALE must also hold a very large complex in place once it binds the DNA. If our TALEs do not bind strongly to the DNA, the system would therefore produce false negatives.

To increase the binding affinity of the TALEs, we picked pyrimidine-rich regions of the Stx2 gene. The relative binding affinities of the RVDs for their respective nucleotides are: NG (1) > NN (0.18) ≈ HD (0.16) >> NI (0.0016) > NK (0.00016). NG, NN, HD, NI, and NK bind to T, G, C, A, and G, respectively (Meckler et al. 2013). Picking regions rich in thymine and cytosine should therefore greatly increase the binding affinity of the TALE for its DNA target. Although NN also has a relatively high binding affinity, we avoided it as much as possible, because NN can also bind adenine and would therefore decrease specificity.
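To make the affinity argument concrete, the sketch below translates a candidate target site into RVDs and scores it by the average of the relative affinities listed above. This is an illustrative simplification, not the method we used: averaging relative affinities is an assumption (binding energetics are not simply additive), the function names are hypothetical, and the example sequences are toy strings, not Stx2 sequences. The `g_rvd` parameter reflects the choice, discussed above, of avoiding NN for guanine.

```python
from statistics import mean

# Relative affinities of each RVD for its cognate base (Meckler et al. 2013),
# normalized so that NG = 1.
RVD_AFFINITY = {"NG": 1.0, "NN": 0.18, "HD": 0.16, "NI": 0.0016, "NK": 0.00016}


def rvds_for(target: str, g_rvd: str = "NK") -> list[str]:
    """Translate a target site (written 5'->3') into RVDs.

    g_rvd selects NK or NN for guanine; NK is the default here because NN
    can also bind adenine (hypothetical helper, for illustration only).
    """
    code = {"T": "NG", "C": "HD", "A": "NI", "G": g_rvd}
    return [code[base] for base in target.upper()]


def pyrimidine_fraction(target: str) -> float:
    """Fraction of the site made of pyrimidines (C or T)."""
    return sum(base in "CT" for base in target.upper()) / len(target)


def mean_affinity(target: str, g_rvd: str = "NK") -> float:
    """Crude proxy score: average relative affinity of the site's RVDs."""
    return mean(RVD_AFFINITY[rvd] for rvd in rvds_for(target, g_rvd))


def rank_sites(candidates: list[str], g_rvd: str = "NK") -> list[str]:
    """Order candidate sites from highest to lowest proxy affinity."""
    return sorted(candidates, key=lambda s: mean_affinity(s, g_rvd), reverse=True)


if __name__ == "__main__":
    toy_sites = ["AAGAGGAA", "TTCTCCTT"]  # toy strings, not Stx2 sequences
    for site in rank_sites(toy_sites):
        print(site, rvds_for(site), round(pyrimidine_fraction(site), 2),
              round(mean_affinity(site), 4))
```

The pyrimidine-rich toy site ranks first because every T contributes NG (relative affinity 1) and every C contributes HD (0.16), while A and G contribute the much weaker NI and NK.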

Another important factor in picking the target sequences of the two TALEs was the distance between them. The sites have to be far enough apart that the proteins fused to one TALE do not block the target sequence of the other. At the same time, we tried to pick two target sequences that are as close to each other as possible: DNA can be sheared into smaller pieces by physical forces or cut by endonucleases, and for our system to detect EHEC, the two regions of DNA that the TALEs bind must remain attached to each other. Keeping the target sites close together decreases the chance of a break between them.
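The spacing trade-off can be expressed as a small search: among candidate sites, choose the pair whose gap is as small as possible while still being at least some minimum distance. The sketch below assumes a hypothetical `min_gap` parameter standing in for "far enough that the fused proteins do not block the other site"; the text does not give a number, so the values used here are placeholders, and the `Site` type and function names are illustrative only.

```python
from itertools import combinations
from typing import NamedTuple, Optional


class Site(NamedTuple):
    """A candidate TALE target site on the gene (hypothetical type)."""
    name: str
    start: int   # 0-based position of the first nucleotide of the site
    length: int  # number of nucleotides in the site

    @property
    def end(self) -> int:
        """0-based position just past the last nucleotide of the site."""
        return self.start + self.length


def gap(a: Site, b: Site) -> int:
    """Nucleotides separating two sites (negative if they overlap)."""
    first, second = sorted((a, b), key=lambda s: s.start)
    return second.start - first.end


def closest_pair(sites: list[Site], min_gap: int) -> Optional[tuple[Site, Site]]:
    """Return the pair with the smallest gap that is still >= min_gap.

    min_gap is an assumed placeholder for the space the fused proteins need;
    a small gap also lowers the chance of a break between the two sites.
    """
    valid = [(gap(a, b), a, b) for a, b in combinations(sites, 2)
             if gap(a, b) >= min_gap]
    if not valid:
        return None
    _, a, b = min(valid, key=lambda t: t[0])
    return a, b


if __name__ == "__main__":
    candidates = [Site("s1", 0, 18), Site("s2", 25, 18), Site("s3", 100, 18)]
    print(closest_pair(candidates, min_gap=20))
```

Overlapping sites produce a negative gap and are therefore excluded by any non-negative `min_gap`.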

The TALEs were designed to be extremely specific. Based on the considerations explained above, we selected a number of possible target sites. To find the most specific pair, we ran BLAST searches on each TALE target sequence separately. As expected, we observed a large number of alignments with EHEC strains. We also found partial alignments in non-EHEC organisms, which we screened to determine whether they could be problematic. Let's call the two selected target sites [1] and [2]. If [1] had a 90% alignment to a region of the Homo sapiens genome, we checked whether [2] could also be found in the Homo sapiens genome. If it was, we checked whether [1] and [2] are on the same chromosome; if they are on different chromosomes, they are not physically linked, so the system cannot produce a false positive from them. If they were on the same chromosome, we determined which part of [1] or [2] aligned with the match. Remember that TALEs are polar: they bind much more strongly to the 5' end of their target sequence than to the 3' end. Therefore, if the alignment included nucleotides 0 to 10 of the target sequence, that combination of target sites was ruled out. For a better idea of how we designed the TALEs, refer to the figure below.
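The screening rule above can be sketched as a filter over alignment hits for the two sites in one off-target genome. This is a hedged reconstruction, not our actual pipeline: the `Hit` record and function names are hypothetical, hits are assumed to be already parsed (for example from BLAST tabular output, whose query coordinates are 1-based and would need 1 subtracted), the 90% identity threshold comes from the example in the text, and we assume a pair is ruled out when either site's alignment reaches into the 5' window (positions 0 to 10).

```python
from dataclasses import dataclass

FIVE_PRIME_LAST = 10  # last 0-based position of the 5' window (positions 0..10)


@dataclass
class Hit:
    """One alignment of a TALE target site to an off-target genome (hypothetical)."""
    chromosome: str
    query_start: int  # 0-based first aligned position within the target site
    query_end: int    # 0-based last aligned position within the target site
    identity: float   # fraction of identical positions, e.g. 0.90


def covers_five_prime(hit: Hit, last_pos: int = FIVE_PRIME_LAST) -> bool:
    """True if the aligned part of the target overlaps the 5' window."""
    return hit.query_start <= last_pos


def pair_is_risky(hits1: list[Hit], hits2: list[Hit], min_identity: float = 0.9) -> bool:
    """Rule out a site pair if strong hits for both sites share a chromosome
    and at least one of those hits covers the 5' end of its target site."""
    strong1 = [h for h in hits1 if h.identity >= min_identity]
    strong2 = [h for h in hits2 if h.identity >= min_identity]
    for h1 in strong1:
        for h2 in strong2:
            same_chromosome = h1.chromosome == h2.chromosome
            if same_chromosome and (covers_five_prime(h1) or covers_five_prime(h2)):
                return True
    return False


if __name__ == "__main__":
    site1_hits = [Hit("chr1", 0, 17, 0.90)]
    site2_hits = [Hit("chr1", 12, 17, 0.95)]
    print(pair_is_risky(site1_hits, site2_hits))
```

Hits on different chromosomes never rule a pair out, mirroring the reasoning that unlinked sites cannot be captured together by the two TALEs.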