Team:Lethbridge/human practices

From 2013.igem.org

Human Practices

Biosecurity and DNA Synthesis

Current Sequence Screening Methods

In the last 5 years, there has been increased recognition of the powers of gene synthesis. It is now easy and affordable to look up genetic sequences for nearly any organism, design an expression construct, and order that gene from a synthesis company. This allows for the creative projects we see each year at the iGEM jamborees, but it also allows those with malevolent intentions and adequate knowledge to easily order genes that may pose a hazard to others.

The recognition of this potential has led members of governments and large synthesis companies to try to establish a framework for screening these synthesis orders to ensure that potentially hazardous sequences stay in the hands of those who would use them for legitimate research purposes. This effort to regulate the gene synthesis industry has largely come from within. In the late 2000s, both Europe's International Association of Synthetic Biology (IASB) and North America's International Gene Synthesis Consortium (IGSC) put forth reports on the state of synthesis order screening as well as a set of best practices to follow [1-2]. These bodies are made up of individuals from the major gene synthesis companies in each region as well as experts from major universities.

Both groups outline a very similar approach to screening these orders for legitimacy. This entails a two-part approach that first compares the ordered sequence to sequences on a list of known biohazardous agents and second verifies the legitimacy of the customer and their intended use of the final product. In both reports, the sequence screening uses existing pathogen databases, such as the US Select Agents and Toxins List or the Australia Group List, as well as internal pathogen databases, and BLASTs the submitted sequence against these regulated ones. This first step in screening is conducted automatically. If the similarity between the submitted sequence and one of the sequences on these lists exceeds the specified threshold, human investigation is used to further characterize the sequence [2].
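The two-part flow described above can be sketched in a few lines. This is a minimal illustration and not the procedure of any company or consortium: the shared k-mer fraction is a crude stand-in for a BLAST comparison, and the names `Order`, `best_hit`, `triage`, the threshold of 0.5, and the default k of 12 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class Order:
    sequence: str            # DNA sequence submitted for synthesis
    customer_verified: bool  # outcome of the separate customer check


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_hit(query: str, controlled: Dict[str, str], k: int) -> Tuple[Optional[str], float]:
    """Fraction of query k-mers found in the closest controlled entry.

    Hypothetical stand-in for a BLAST search against a regulated database.
    """
    query_kmers = kmers(query, k)
    best_name, best_score = None, 0.0
    if not query_kmers:
        return best_name, best_score
    for name, reference in controlled.items():
        score = len(query_kmers & kmers(reference, k)) / len(query_kmers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


def triage(order: Order, controlled: Dict[str, str],
           threshold: float = 0.5, k: int = 12) -> str:
    """Automatic first pass: flag sequence hits, then check the customer."""
    _, score = best_hit(order.sequence, controlled, k)
    if score >= threshold:
        return "human_review"      # similarity exceeds the threshold
    if not order.customer_verified:
        return "verify_customer"   # identity and affiliation still unchecked
    return "proceed"
```

In this sketch an order that matches a controlled entry always goes to human review, regardless of customer status, which mirrors the statement that automated hits are followed by human investigation.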

Customer screening is arguably the most important aspect of the current gene synthesis security strategies. It is possible that ordering sequences that could be considered hazardous is necessary for research applications and adequate customer screening could determine if this sequence was going to someone at a research facility for legitimate use. European and North American groups recommend collection of the name, mailing address, and institutional affiliation of the customer to ensure that they are individuals working in verifiable positions within companies or academic institutions [1-2]. This information is then independently verified and checked against a number of national and international lists of individuals of concern, such as the US Specially Designated Nationals list.

Although these protocols have been put forth by consortium members in both Europe and North America, and the US Department of Health and Human Services has published its own set of guidelines, all of these measures are voluntary [3]. There are no penalties for synthesis companies that do not screen the sequences or customers they deal with, outside of restrictions on international shipment of dual-use goods. This lack of legal regulation has the potential to allow dangerous sequences into the hands of malevolent individuals if any company decides to loosen its security criteria in order to save time or money in processing an order.

Potential Weaknesses of Current Screening Procedures

Although companies included in the IASB and IGSC adhere to the regulations of the Code of Conduct for Best Practices in Gene Synthesis or the Harmonized Screening Protocol, respectively [1-2], these protocols have a few potential weaknesses. Both of these protocols require that all synthesis orders are at minimum screened against a regulated pathogen database. However, these lists are by no means complete and there is a chance that potentially hazardous sequences can be ordered and synthesized without any efforts made to investigate the source of the order. This is currently one of the major weaknesses of screening protocols, and efforts are being made to compile a list of data from organisms on the Select Agents list, the Australia Group List, and other national lists of regulated pathogens. Once complete, this list will provide a more comprehensive database of potential pathogenic and toxic organism sequences as a step toward higher biosecurity.

The following are a few other weaknesses associated with current screening protocols. First, the IASB requires its member companies to screen orders of at least 200 base pairs in length, but larger sequences could be ordered as a series of short oligonucleotides, from one company or from multiple companies, and so bypass the screening process entirely. Though it can be more difficult to get direct database hits for shorter sequences, including these types of orders in the screening procedure is still feasible and may only require extra processing time for human investigation of these database matches. Second, though a legitimate customer can be approved for ordering hazardous sequences, the synthesis company cannot be sure of the final end user. There is no way to ensure that the customer does not ship the product to a third-party user that has not been investigated. Finally, and perhaps the most concerning weakness of current screening protocols, is the accountability of DNA synthesis companies. While most of the larger synthesis companies are members of the IASB or IGSC, complying with the standards mandated by these groups is still only a voluntary practice. There are no regulations in place that require a synthesis company to screen its orders for hazardous sequences or to follow up with customer investigations of suspicious orders [4]. Even for orders that do not give a direct match to a hazardous sequence, any additional steps to associate function with the sequence are at the discretion of the company. Minshull and Wagner (representing DNA2.0 and GENEART) suggest that synthesis companies should be subject to routine “tests” of their screening protocols by their respective government bodies to ensure that they are complying with screening protocols and using the most up-to-date screening databases [5].

How elements of our project were used to examine synthesis screening procedures

Our project involves the characterization of pseudoknot RNA secondary structural motifs. These motifs can be used to express dual-coding gene sequences to give protein products whose expression can be regulated by the pseudoknot’s ability to induce ribosomal frameshifting. This method of coding can allow for the expression of a protein which may be encoded by fragments in alternating reading frames. This technology adds another level of complexity in terms of screening for controlled sequences, in that the protein produced from a synthesized construct may not be the product of translating a gene in one continuous reading frame.

It was our goal to investigate the ability of DNA synthesis companies to identify hazardous sequences in their screening procedures in the presence of frameshifting elements. A series of hazardous sequences containing intervening pseudoknots were designed and tested by two of the leading synthesis companies in North America in their standard screening procedures. These constructs contained all the necessary components to form a dangerous protein product, with DNA segments allocated into different reading frames and successively frameshifted using pseudoknots. The results from this screening test indicate that the current screening methods are successful at identifying hazardous sequences that have been “hidden” in multiple reading frames. The companies expressed their support of our efforts to investigate loopholes and problems in current screening procedures with regard to this new type of technology.

Possible Methods for Bypassing Screening

Codon redundancy

Codon redundancy in the genetic code refers to having multiple codons that code for a single amino acid. This redundancy allows for the DNA sequence of a protein to be changed without altering the resulting amino acid sequence. By utilizing codon redundancy, bioterrorists could drastically change the known DNA sequence of a harmful virus or protein. Fortunately, synthesis companies screen both the DNA sequence and the translated protein sequence of each order submitted for synthesis, and in this way would still be able to identify a harmful sequence that had been changed using codon redundancy. However, this method in conjunction with others, such as frameshifting elements or those listed below, could potentially be used to bypass the DNA and amino acid sequence screening performed by synthesis companies.
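A small worked example shows why the protein-level comparison matters. The sketch below uses two hand-written, harmless toy sequences (assumptions for illustration, not taken from our constructs) that differ at half of their nucleotide positions yet translate to the same six-residue peptide under the standard genetic code. The helper names `translate` and `nucleotide_identity` are likewise illustrative.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def nucleotide_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two toy sequences built from synonymous codons (harmless placeholder peptide).
SEQ_A = "ATGCTGAGCCGTGGCGCG"
SEQ_B = "ATGTTATCAAGAGGTGCA"

if __name__ == "__main__":
    print(nucleotide_identity(SEQ_A, SEQ_B))     # DNA-level similarity is reduced
    print(translate(SEQ_A) == translate(SEQ_B))  # protein-level match is preserved
```

A screen that compared only nucleotides would see the two sequences as quite different, while a screen that also compares translations sees an exact match, which is the behaviour described above.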

Utilizing conserved and non-conserved regions of proteins

Homologous proteins are those that are derived from the same ancestor; however, the two proteins do not have to share 100% amino acid identity. Multiple sequence alignments of amino acid sequences of homologous proteins from different organisms can be used to identify functionally important residues in a protein by indicating which residues are absolutely conserved, semi-conserved, and non-conserved. This would allow an individual to alter a controlled protein sequence by changing all or some of the conserved and semi-conserved residues to residues with similar physicochemical properties. In addition, all or some of the non-conserved residues could be substituted with essentially any other amino acid without risking loss of the protein’s function. This method, in combination with utilizing codon redundancy, would allow for more drastic alterations to be made to both the DNA and protein sequence from a pathogenic organism that could bypass screening procedures.

Using “custom” tRNAs

A more complicated means for bypassing screening procedures by decoupling protein sequence from function would be to use a highly engineered system with non-canonical tRNAs. An organism could be designed that uses engineered aminoacyl-tRNA synthetases that recognize non-cognate tRNAs and therefore aminoacylate the tRNA with the incorrect amino acid. By using this alternative genetic code in the engineered organism, the DNA sequence from a pathogenic organism could be altered almost beyond recognition while still producing the protein of interest.

Do-it-yourself synthesis

As time progresses, DNA synthesizers are becoming more affordable to research labs and independent users. Initially this may seem like a good thing, but there are tremendous dangers associated with this development. Owning a synthesizer bypasses screening procedures directly: without the need for a synthesis company, the owner has unrestricted access to synthesize whatever sequence they choose. This would make any techniques to bypass the screening methods of synthesis companies unnecessary. As a result, regulations may need to be put in place to limit or restrict access to DNA synthesizers. This could be done, for example, by requiring the owner to upload any sequences they synthesize to a governing body that will scan them for harmful sequences, or by installing software that will screen sequences before allowing them to be synthesized. A combination of these two methods, as well as additional advances in screening procedures, is crucial to ensure the safety of the general public.

Changes recommended for screening protocols

Though commendable biosecurity efforts have been put forward by major international synthesis companies, these groups are aware that standard protocols may not be enough to mitigate the risk of the synthesis and delivery of hazardous sequences. In the IASB Code of Conduct for Best Practices in Gene Synthesis, all member companies are mandated to take part in ongoing efforts to refine and improve the current screening technologies by establishing a review committee to update and expand the Code of Conduct as new or changing threats emerge, maintaining open communication with member companies through the exchange of research and literature searches, and regularly collaborating on best practices and new screening ideas [2]. While these practices are important for synthesis companies to implement, DNA synthesis is becoming less expensive and more accessible to non-professionals. According to Minshull and Wagner, “[a]nyone who is sufficiently motivated could synthesize the gene for a toxin or even an entire viral genome using readily available reagents and without ever going near a specialized synthesizer” [5]. With molecular biology equipment becoming available through avenues such as eBay and other online dealers, individuals with limited molecular biology experience could realistically synthesize their own DNA sequences within the next few years [4]. Screening protocols could thereafter become obsolete. Until then, further steps are required to assure the public, government, and research community that biosecurity is being upheld to the highest standards possible. This may involve expanding the use of online forums, such as VIREP (Virulence Factor Information Repository), to allow researchers to deposit and access information about genes and organisms. Additionally, government regulations may need to be implemented that require all synthesis companies to adhere to standard practices and implement human investigation of suspicious orders [6]. This may best be achieved through the integration of both the IASB and IGSC protocols into an industry-wide Code of Conduct.

Testing the System

Learning to Be Bad

This year, we focused on the implications our frameshifting project might have on biosecurity. In thinking about the ways our pseudoknots could be used to do new, exciting things in synthetic biology, we came up with a use that is more frightening than exciting. Bioterrorism.

The idea is this: There are guidelines put forward by a number of industry groups on how DNA synthesis orders should be screened to ensure no biohazardous sequences get into the hands of the wrong people. The standard protocol for screening sequences involves taking the submitted DNA sequences and translating all six reading frames, then using BLAST to compare the DNA and amino acid sequences to those of organisms on a list of controlled agents.
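The six-frame translation step can be sketched as follows. This is a minimal illustration of the general technique, assuming the standard genetic code and plain A/C/G/T input; it is not the pipeline used by any synthesis company, and the function names are chosen for this example only. Each resulting translation would then be compared against the controlled-agent database.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an A/C/G/T string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate from the first base; trailing partial codons are dropped."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna: str) -> dict:
    """Translate three offsets on each strand, keyed '+1'..'+3' and '-1'..'-3'."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```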

Our pseudoknot enables the ribosome to switch frames mid-translation, essentially splitting the entire protein amongst as many reading frames as there are pseudoknots. If someone were to split a protein from the Ebola virus into small fragments distributed across the reading frames, could they bypass this initial automatic screening step?

Putting our White Hats On

To investigate this potential for abuse of our project, we worked together with major North American synthesis companies to see if we could fool their screening methods using our frameshifting elements. We designed sequences in which the percentage of codon changes and the size of the coding fragments between copies of our PK401 pseudoknot were varied, and submitted them to the synthesis companies we had partnered with. There is a full description of the sequences and a link to the raw data files below.

Sequence ID Number	Sequence Origin	Total Length (bp)	Codon Changes (%)	Length between PK (bp)
1	CFP	966	25	180
2	Staph-ORF	1869	0	210
3	Ricin	2392	25	198
4	Staph-ORF	2450	25	102
5	CFP	966	16	210
6	CFP	966	0	180
7	Staph-ORF	2450	0	102
8	Ricin	2392	0	198
9	Ebola Matrix Protein	1031	0	0
10	CFP	966	16	180
11	Staph-ORF	2450	0	102
12	CFP	966	0	210
13	Staph-ORF	1869	20	210
14	Ricin	3139	25	99
15	Staph-ORF	1869	25	210
16	CFP	966	25	210
17	Ricin	3139	0	99
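
For readers who want to work with the test set programmatically, the table can be transcribed and summarized as below. The rows are copied from the table; the tuple layout and the helper names are assumptions for this sketch, and treating a spacing of 0 bp as "no pseudoknot elements" follows the description of sequence 9 in the results.

```python
from collections import Counter

# (id, origin, total length in bp, codon changes in %, length between PK in bp)
ROWS = [
    (1, "CFP", 966, 25, 180),
    (2, "Staph-ORF", 1869, 0, 210),
    (3, "Ricin", 2392, 25, 198),
    (4, "Staph-ORF", 2450, 25, 102),
    (5, "CFP", 966, 16, 210),
    (6, "CFP", 966, 0, 180),
    (7, "Staph-ORF", 2450, 0, 102),
    (8, "Ricin", 2392, 0, 198),
    (9, "Ebola Matrix Protein", 1031, 0, 0),
    (10, "CFP", 966, 16, 180),
    (11, "Staph-ORF", 2450, 0, 102),
    (12, "CFP", 966, 0, 210),
    (13, "Staph-ORF", 1869, 20, 210),
    (14, "Ricin", 3139, 25, 99),
    (15, "Staph-ORF", 1869, 25, 210),
    (16, "CFP", 966, 25, 210),
    (17, "Ricin", 3139, 0, 99),
]


def count_by_origin(rows):
    """Number of submitted sequences per sequence origin."""
    return Counter(row[1] for row in rows)


def ids_without_pseudoknots(rows):
    """IDs whose spacing column is 0, read here as 'no pseudoknot elements'."""
    return [row[0] for row in rows if row[4] == 0]


def ids_with_codon_changes(rows):
    """IDs of sequences in which some percentage of codons was changed."""
    return [row[0] for row in rows if row[3] > 0]


if __name__ == "__main__":
    print(count_by_origin(ROWS))
    print(ids_without_pseudoknots(ROWS))
    print(ids_with_codon_changes(ROWS))
```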

Click here for sequences used

Can We Sleep at Night?

These sequences were sent to the companies and screened for hazardous elements. One company managed to detect all of the “threats” on the first level of screening. According to them, their next steps would be to do a review of the “threat” sequences using a group of human experts while interviewing the customer to determine their background, shipping and payment information, and the intended use of the synthesized DNA.

Another company simply analyzed the sequences to determine if they could actually construct the DNA if we were to order it. All of the sequences containing the pseudoknot elements were flagged as containing high repeats, but sequence 9, the Ebola matrix protein with no pseudoknot elements, was determined to be ready for synthesis. They did not make it clear whether or not there would be another level of screening to determine the origin of the sequences.

Based on these results, current industry standard screening protocols appear to be sufficient to detect biosecurity threats, even with codon changes and the distribution of coding sequences amongst many reading frames. What is still a cause for concern is the strictness with which these protocols are applied. There is no legal requirement to execute biosecurity screens on DNA synthesis orders; all of the proposed protocols are currently voluntary guidelines. This could allow companies to relax their security protocols and may increase the potential of a serious bioterrorism threat coming to fruition.

In order to make sure that the act of releasing the results of our study did not pose a security threat in itself, we consulted with Edward You. Edward is a representative of the FBI’s WMD department. Many of the industry guidelines for screening call for collaboration between the synthesis companies and government agencies responsible for responding to bioterrorism threats.

What About Non-conserved Residues?

Along with codon changes, it is possible to decouple sequence from function by altering the amino acid sequence. Using a multiple sequence alignment, the regions of proteins that appear essential for function can be determined, and it may be possible to alter the amino acids at the non-conserved residues.

Ricin is a toxic protein produced in the castor bean. It is composed of two chains: chain A, which degrades the 28S rRNA and prevents protein translation, and chain B, which binds to surface sugar moieties and allows the protein access to the cellular machinery. Both of these chains have many functional analogues that allow for multiple sequence alignments and the determination of conserved residues. To test whether the combination of amino acid sequence alterations and the implementation of our PK401 pseudoknot has an impact on biosecurity, we have designed altered Ricin chain A and chain B sequences.

Sequence ID Number	Sequence Origin	Total Length (bp)	Amino Acids Changed	Length between PK (bp)
1	CFP	966	0	180
2	Chain B	1389	22	100
3	Chain A	2167	60	50
4	Chain B	864	22	200
5	CFP	966	0	210
6	CFP	966	0	180
7	Chain B	2065	22	50
8	Chain A	1491	60	100
9	Chain A	1154	60	200
10	CFP	966	0	180
11	Chain A	814	60	0
12	Chain B	800	22	0

Click here for sequences used

These sequences have been submitted to our industry partners and we are awaiting the screening results.

References

[1] International Gene Synthesis Consortium. Harmonized screening protocol: gene sequence & customer screening to promote biosecurity. http://www.genesynthesisconsortium.org/wp-content/uploads/2012/02/IGSC-Harmonized-Screening-Protocol1.pdf (2009).

[2] International Association Synthetic Biology. Code of conduct for best practices in gene synthesis. http://www.ia-sb.eu/tasks/sites/synthetic-biology/assets/File/pdf/iasb_code_of_conduct_final.pdf (2009).

[3] U.S. Department of Health and Human Services. Screening Framework Guidance for Providers of Synthetic Double-Stranded DNA. http://www.phe.gov/Preparedness/legal/guidance/syndna/Documents/syndna-guidance.pdf (2010).

[4] Maurer S. M., Fischer M., Schwer H., Stähler C., Stähler P., & Bernauer H. S. Working paper: making commercial biology safer: what the gene synthesis industry has learned about screening customers and orders. http://gspp.berkeley.edu/iths/Maurer_IASB_Screening.pdf (2009).

[5] Minshull J. & Wagner R. Nat. Biotechnol. 27, 800-801 (2009).

[6] Fischer M. & Maurer S. M. Nat. Biotechnol. 28, 20-22 (2010).
