The Travelling Salesman Problem

MapMen is Team Alberta's endeavour to create a biological computer capable of solving Travelling Salesman Problems (TSP), a general form of problem that asks:

Given a set of cities, and a list of the distances between each pair of those cities, what is the shortest possible route that travels to every city exactly once and then returns to the origin city?

In Biological Terms

To take advantage of the considerable information processing power of biology, we must create biochemical analogues for all of the components of the original problem. We begin by assigning a symbolic reporter gene or selectable marker gene to each of the six paths in our example map. The assignment is arbitrary; which gene is used for which path does not matter as long as it creates some kind of observable phenotype. For example, the path connecting the northeast city and the northwest city is symbolized by a gene coding for ampicillin resistance. The path from the northeast to city to the southeast meanwhile, is symbolized by a gene coding for kanamycin resistance.

Now, just as the travelling salesman might describe taking a particular route to his customers – leaving from the northeast origin city and taking the road to the southwest, then turning up the north road from there, then back down the southeast, before returning northward home from, we can also represent that series of paths as a sequence of genes: ChlorR, RFP, GFP, and KanR.

Finding the Path

Just as a computer using a brute force approach to the travelling salesman problem would test every route and tally the sum distance of the paths travelled, our biocomputer assembles a multitude of plasmids, representing every possible combination of paths, through a set of ligations that represent the legs of the salesman’s journey. In the first leg, we depart from the origin city along any of the available roads: AmpR, ChlorR, or KanR. In the plasmid production, we begin with a combinatorial ligation – one where all three of the genes are being ligated with origins of replication, but only one of the three is able to ligate onto each Ori, resulting in a collection of incomplete plasmids that each includes just one of the three genes.

Following the first gene ligation, a short linker piece is added, signifying the city that has been reached – all of the linkers are identical, there is no differentiation of routes based on this component. In the second leg of the journey, we perform a ligation with three more genes, representing all of the possible paths the salesman might take next – three colour genes. Again, each growing plasmid can only incorporate one of the new genes, so we obtain a random assortment of incomplete plasmids that each include one antibiotic resistance from the first ligation and one colour gene from the third ligation.

The third leg of the journey is identical to the second, incorporating an additional colour gene, and the final leg uses the three antibiotic genes once again. At the end of this process, we will have obtained a randomized distribution of 76 different kinds of plasmids representing all of the different possible permutations of the genes, capable of producing 18 unique phenotypes, and representing the six bi-directional solutions to the four-city travelling salesman problem.

MapMakers at Work

Once the plasmids have been produced, how can we use them to identify the correct solution? The answer is in the frequency with which each plasmid occurs: We create bias in the ligations during assembly so that genes that represent shorter paths are more likely to be incorporated into the plasmids, meaning in turn that the final product plasmids are more likely to be composed of several short-path genes. As a result, although we still expect to see plasmids representing every possible route, those plasmids that represent the shortest routes through the map will be much more common than plasmids representing lengthy routes. By identifying the most abundant plasmid, we identify the shortest route and thus the optimal solution.

The bias is introduced through the concentration of the reactants in each ligation. If a gene is meant to represent a short path, its concentration in the ligation will be higher than the others, and conversely, genes that are meant to represent lengthy paths will be included in lower concentrations than the others. As a result, at any given ligation step, the gene symbolizing the shortest available path is more likely to be incorporated. Among the product plasmids, genes are incorporated at a rate inversely proportional to their length. The concentration used for each gene in the ligation is described by the expression: –ln[Gene]∝(path distance)

To identify which of the plasmids is most frequent, the products are transformed into a bacterial culture. Because each of the genes produces an easily interpreted phenotype, this allows us to infer the most frequently transformed plasmid from the frequency of the colonies produced on agar plates. The plasmid carried by any given colony on the plates below can be readily identified by the combination of the colony colour (purple colonies have RFP and aCP, orange colonies have RFP and GFP, and green colonies have GFP and aCP), and the antibiotic resistances it exhibits, which are inferred from the plate on which it is growing. As an example, a purple colony growing on the kan/amp plate must have the Amp-Red-Blue-Kan-plasmid shown in the first column below.

Many of the colonies do not correspond to any valid route. For instance, an orange colony on the kan/amp plate has the plasmid Amp-Red-Green-Kan, which does not describe a valid circuit visiting each city exactly once – but because we can tell that this is an invalid solution, we would not count this colony when assessing the frequency of the valid plasmids. Any four-city travelling salesman problem has just three unique solutions, and so there are only three phenotypes that must be counted: purple kan/amp, orange kan/chlor, and green amp/chlor. The colonies of these three phenotypes are counted and compared, and the most frequently occurring plasmid is the optimal solution!

Calculating Gene Concentration