Team:UCL/Modeling/Bioinformatics

From 2013.igem.org

(Difference between revisions)

Revision as of 10:13, 12 September 2013

A BIOINFORMATICS APPROACH

Finding New Parts

Bioinformatics creates and enhances methods for storing, retrieving, organising and analysing biological data. We decided to take a completely new approach in our dry lab work and look into bioinformatic approaches to studying Alzheimer’s disease (AD).

The rationale behind this is simple. To make a genetic circuit in a synthetic biological construct as effective as possible in a medical application, we may need to target key dysfunctional genes within the affected biological system. There are many risk factors for AD, so predicting the key ‘driver genes’ and the group of proteins with which they interact is invaluable in deciding what our construct should produce in order to mitigate AD. The idea is that bioinformatics work can feed back into synthetic biology. Although we did not have the time to demonstrate this full circle, we feel bioinformatics can have a place in iGEM, helping teams to decide which dysfunctional genes to target in medical projects.

Bioinformatics and Alzheimer’s Disease

Recent progress in characterising AD has led to the identification of dozens of highly interconnected genetic risk factors, yet it is likely that many more remain undiscovered (Soler-Lopez et al. 2011), and the elucidation of their roles in AD could prove pivotal in beating the condition. AD is genetically complex [internal link to neuropathology page], linked with many defects, both mutational and of susceptibility. These defects produce alterations in the molecular interactions of cellular pathways, the collective effect of which may be gauged through the structure of the protein network (Zhang et al. 2013). In other words, there is a strong link between protein connectivity and the disease phenotype. AD arises from the downstream interplay between genetic and non-genetic alterations in the human protein interaction network (Zhang et al. 2013).

In all pathologies, the most common way to predict driver genes is to target the genes that are most frequently altered. However, this approach misses rarely altered genes, which comprise the majority of genetic defects leading to, for example, carcinogenesis and arguably AD. This is partly because alterations to different genes within a single protein module can lead to the same disease phenotype. Thus, identification may best be attempted at the modular level. It is also important to note correlation events between modules. Simply put, many rare gene alterations that influence the module they belong to, together with co-altered modules, can collectively generate the disease pathology (Gu et al. 2013).

Our Programme

Under the guidance and tutelage of Dr Tammy Cheng from the Biomolecular Modelling (BMM) lab at Cancer Research UK, team member Alexander Bates wrote, in Python, a network analysis programme based on a method devised by Gu et al. and originally applied to the study of glioblastoma (brain cancer). The programme tries to reveal driver genes and co-altered functional modules for AD. The analysis procedure involves mapping altered genes (mutations, amplifications, repressions, etc.) in patient microRNA data to the protein interaction network (PIT), which currently accounts for 48,480 interactions between 10,982 human genes. The result is termed the ‘AD altered network’, and it is searched with the algorithm suggested by Gu et al. (which has been re-coded from scratch).
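
The mapping step can be sketched roughly as follows. This is a minimal illustration and not the team's actual code: the function name `build_altered_network`, the input shapes, and the rule that an interaction is kept only when both partners are altered in at least one patient are all assumptions, and the gene and patient labels are toy placeholders.

```python
from collections import defaultdict


def build_altered_network(interactions, altered_by_patient):
    """Return an adjacency map restricted to altered genes.

    interactions       -- iterable of (gene_a, gene_b) pairs (hypothetical format)
    altered_by_patient -- dict: patient id -> set of altered genes (hypothetical format)

    Assumption: an interaction is kept only if both genes are altered
    in at least one patient sample.
    """
    altered_genes = set().union(*altered_by_patient.values())
    network = defaultdict(set)
    for gene_a, gene_b in interactions:
        if gene_a == gene_b:
            continue  # ignore self-interactions
        if gene_a in altered_genes and gene_b in altered_genes:
            network[gene_a].add(gene_b)
            network[gene_b].add(gene_a)
    return dict(network)


# Toy input: placeholder labels, not real interaction data.
toy_interactions = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
toy_alterations = {"p1": {"g1", "g2"}, "p2": {"g2", "g3"}}
toy_network = build_altered_network(toy_interactions, toy_alterations)
```

Storing the network as a dictionary of neighbour sets keeps the later "direct interaction partners" lookup cheap, which matters when the full network has tens of thousands of interactions.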

Two modules (‘G1’ and ‘G2’ in the equation) are assumed to be co-altered if any gene within each module is altered in a proportion of AD sufferers, and genes in the two modules are often altered together. For two modules, G1 and G2, we must calculate the probability, P, of observing by chance at least as many samples in the patient gene expression data that simultaneously carry alterations in both gene sets as are actually observed.

‘n’ is the total number of patient samples, ‘a’ is the number of patients with alterations in both G1 and G2, ‘b’ is the number of patients with alterations in only G1, ‘c’ is the number of patients with alterations in only G2, and ‘d’ is the number of patients with alterations in neither set. The co-alteration score, S, is defined below. A high score indicates that the two modules tend to be altered together in AD.
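
The counts a, b, c and d form a 2×2 contingency table, so one plausible way to compute P and S is sketched below. The exact formulas used in the programme are not reproduced here, so this sketch assumes a one-sided Fisher's exact (hypergeometric tail) test for P, with the tail including the observed count, and assumes S = −log10(P). The function names are hypothetical.

```python
from math import comb, log10


def contingency_counts(module_1, module_2, altered_by_patient):
    """Count patients altered in both modules (a), only G1 (b), only G2 (c), neither (d)."""
    a = b = c = d = 0
    for altered_genes in altered_by_patient.values():
        hit_1 = bool(altered_genes & module_1)
        hit_2 = bool(altered_genes & module_2)
        if hit_1 and hit_2:
            a += 1
        elif hit_1:
            b += 1
        elif hit_2:
            c += 1
        else:
            d += 1
    return a, b, c, d


def _tail_and_total(a, b, c, d):
    """Exact integer numerator and denominator of the hypergeometric tail."""
    n = a + b + c + d
    in_g1 = a + b  # patients with any alteration in G1
    in_g2 = a + c  # patients with any alteration in G2
    tail = sum(comb(in_g1, k) * comb(n - in_g1, in_g2 - k)
               for k in range(a, min(in_g1, in_g2) + 1))
    return tail, comb(n, in_g2)


def co_alteration_p(a, b, c, d):
    """Assumed P: chance of seeing at least `a` doubly altered patients."""
    tail, total = _tail_and_total(a, b, c, d)
    return tail / total


def co_alteration_score(a, b, c, d):
    """Assumed S = -log10(P), computed from exact integers to avoid underflow."""
    tail, total = _tail_and_total(a, b, c, d)
    return log10(total) - log10(tail)
```

With this definition a smaller P gives a larger S, which matches the statement that a high score indicates modules that tend to be altered together.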

Fig. 1 depicts the search algorithm. It searches co-altered module pairs for the gene combinations within them that have the greatest co-alteration scores. In step 1, it methodically chooses two modules from the AD altered network and two seed genes, one from each of these modules. The ellipsoids denote direct interaction partners for these genes. These are added to the seeds to make temporary module pairs. The dashed line represents co-alteration. In step 2, the co-alteration score for each temporary module pair is calculated. Only the pair with the maximal S score is retained for subsequent searching. This maximal group becomes the new seed group in step 3. In step 4, temporary modules are again derived, this time from the seed group of step 3, and the pair with the maximum score is kept. In step 5, the algorithm must determine whether or not this group of genes is going to continue to expand. Each new addition, save for the original two starting seeds, is removed in turn and S is recalculated. If S becomes smaller in any of these configurations, we loop through steps 3 to 5 again. Otherwise, if all configurations give the same S value as the gene groups chosen in step 4, the process stops, on the assumption that we have reached the maximal module size for the two starting seeds.
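
The greedy expansion described above can be approximated with the sketch below. It is a simplified variant rather than the procedure itself: instead of the step 5 removal check, it stops as soon as the best single-gene addition no longer raises S. The scoring function assumes S = −log10 of a one-sided Fisher's exact P-value, and all names (`score`, `grow_module_pair`) and the toy data are hypothetical.

```python
from math import comb, log10


def score(module_1, module_2, altered_by_patient):
    """Assumed co-alteration score: -log10 of a hypergeometric tail probability."""
    a = b = c = d = 0
    for genes in altered_by_patient.values():
        hit_1, hit_2 = bool(genes & module_1), bool(genes & module_2)
        if hit_1 and hit_2:
            a += 1
        elif hit_1:
            b += 1
        elif hit_2:
            c += 1
        else:
            d += 1
    n, in_g1, in_g2 = a + b + c + d, a + b, a + c
    tail = sum(comb(in_g1, k) * comb(n - in_g1, in_g2 - k)
               for k in range(a, min(in_g1, in_g2) + 1))
    return log10(comb(n, in_g2)) - log10(tail)


def grow_module_pair(seed_1, seed_2, network, altered_by_patient):
    """Greedily add interaction partners to either module while S increases."""
    module_1, module_2 = {seed_1}, {seed_2}
    best = score(module_1, module_2, altered_by_patient)
    while True:
        candidates = []
        for side, module, other in ((1, module_1, module_2), (2, module_2, module_1)):
            # Direct interaction partners of the current module.
            partners = set().union(*(network.get(gene, set()) for gene in module))
            for gene in sorted(partners - module - other):
                trial_1 = module_1 | {gene} if side == 1 else module_1
                trial_2 = module_2 | {gene} if side == 2 else module_2
                candidates.append(
                    (score(trial_1, trial_2, altered_by_patient), side, gene))
        if not candidates:
            break  # nothing left to add
        top_score, side, gene = max(candidates)
        if top_score <= best + 1e-12:
            break  # simplified stopping rule: no addition improves S
        (module_1 if side == 1 else module_2).add(gene)
        best = top_score
    return module_1, module_2, best


# Toy data with placeholder labels, not real patient data.
toy_network = {"s1": {"x"}, "x": {"s1"}, "s2": {"y"}, "y": {"s2"}}
toy_patients = {"p1": {"s1", "s2"}, "p2": {"x", "y"}, "p3": {"s1", "y"},
                "p4": {"x", "s2"}, "p5": set(), "p6": set(),
                "p7": set(), "p8": set()}
modules = grow_module_pair("s1", "s2", toy_network, toy_patients)
```

Adding one gene per round keeps the sketch short; the procedure in the text instead evaluates whole temporary module pairs at each step, so the two can grow differently on real data.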

The P-values of the co-altered modules this algorithm identifies are adjusted using the Benjamini–Hochberg procedure, and those with an FDR < 10% are kept. If a pair of co-altered modules shares more than 50% of its genes with another pair, the pair with the lower S score is discarded. We should be left with modules that frequently exhibit significant co-alteration in AD patients, and whose gene products are therefore likely to be biochemically significant in the disease state.
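
The filtering step might look something like the sketch below. The Benjamini–Hochberg adjustment follows the standard step-up procedure; the overlap rule is an assumption, since the text does not say how the 50% is measured (here, shared genes relative to the smaller of the two pairs). The record layout (`genes`, `p`, `s`) and the function names are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def filter_module_pairs(pairs, fdr=0.10, max_overlap=0.5):
    """Keep significant, non-redundant module pairs.

    pairs -- list of dicts with keys 'genes' (set of genes in both modules),
             'p' (P-value) and 's' (co-alteration score); hypothetical layout.
    """
    q_values = benjamini_hochberg([pair["p"] for pair in pairs])
    significant = [pair for pair, q in zip(pairs, q_values) if q < fdr]
    kept = []
    # Visit the highest-scoring pairs first so the lower-scoring
    # member of an overlapping couple is the one discarded.
    for pair in sorted(significant, key=lambda item: item["s"], reverse=True):
        redundant = any(
            len(pair["genes"] & other["genes"])
            > max_overlap * min(len(pair["genes"]), len(other["genes"]))
            for other in kept
        )
        if not redundant:
            kept.append(pair)
    return kept
```

Sorting by S before the overlap check means a single pass is enough: each pair only needs to be compared against higher-scoring pairs that have already been kept.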

@@ Line 45: / Line 45: @@
 <div class="full_page">
-<div class="small_image_right" style="background-image:url('https://static.igem.org/mediawiki/2013/8/89/BioInformatics.gif');height:250px;width:258px"></div>"></div>
+<div class="small_image_right" style="background-image:url('https://static.igem.org/mediawiki/2013/8/89/BioInformatics.gif');height:250px;width:250px"></div>
 <p class="major_title">A BIOINFORMATICS APPROACH</p>
 <p class="minor_title">Finding New Parts</p>