Team:Shenzhen BGIC 0101/Tutorial/kgml
From 2013.igem.org
Tutorial
Presentation from KGML
This module will grab genes’ details in different pathways, which from KEGG with KEGG Makeup Language (KGML) file and export genes list and relationship of genes. The goal here is to visualize the pathway and rebuild it in the level of genes.
Scripts
1. keggid_convert_gene.pl
This utility can convert KEGGID which in KGML file into genes’ name and rewrite KGML file.
Internal operation
First, this utility will change the pathway’s name into KEGG database names, and then open the file with the entire list of genes, push them in hash.
Second, this utility will read the original pathway’s xml file in and replace KEGGID with gene’s name one by one. Furthermore, it will change type element all into “gene”.
Thirdly, it will be the substitution of original pathway’s xml file.
Example
perl keggid_convert_gene.pl ko04010
The format of output:
It will rewrite the original pathway’s xml file, if we have following statement: convert.py This utility will read in KGML file which have been rewritten before, and grab genes’ information, such as genes’ name, genes’ relationship and then convert these into JSON. Internal operation First, this utility will use the parameter –f to determine the specified file and read it in. The output file will use the file name that put forward. Example python convert.py –f sce04111 Parameters: The format of output: The output file is stored in the path where you running this program.
Second, this utility convert KGML file into JSON and grab the information of genes, such as the reactions of genes and the relationship between genes.
Third, this utility continue to integrate the information above into two files, ‘gene.json’ and ‘relation.json’, which can be use directly in rewrite gene’s pathway.
default -f/--file read KGML from FILENAME(omit '.xml'), produce two files: gene list and relation
1. _gene.json
type:
geneID:the unique identification of geneortholog KO (orthology group) enzyme Enzyme reaction Reaction gene gene product (mostly a protein) group a complex of gene products (mostly a protein complex)
reaction:name the KEGGID of this gene type the type of this gene
2. _relation.json
relations:name the KEGGID of this reaction reversible true: reversible reaction; false: irreversible reaction substrates KEGGID of substrate node products the KEGGID of product node related-reactions relate to another pathway or gene
entry1&2:type The type of this relation[ ECrel, PPrel. GErel, PCrel, maplink] subtype Interaction/relation information[activation/inhibition] entry1 The first (from) entry that defines this relation entry2 The second (to) entry that defines this relation
entry ID The KEGGID of node which takes part in this relation type Have only two options: [gene/group] name The KEGGID of this gene group The node is a complex of gene products (mostly a protein complex)
Shortcoming
We can’t automatic acquisition KGML file in the KEGG API, all the demo we have show need to be downloaded before. You can get the entire list of one database genes through KEGG API, just like http://rest.kegg.jp/list/ko , it shows the entire list of orthology genes. The download method about KGML files shows in “How to finish this plun-in” part. Some genes will relate to another pathway but it doesn’t shows in the pathway so we are failed to grab the relationships between genes and pathways automatically. So we added two gene-pathway relations in ko04010 demo manually, TP53 gene connected with ko04115: P53 signaling pathway and NLK gene connected with ko04310: WNT signaling pathway. We look forward to the improvement of this plun-in through these disadvantages.
How to finish this plun-in?
KEGG is a database resource for understanding high-level functions and utilities of the biological system, and KGML is an XML presentation of the KEGG pathway database, which enables automatic drawing of KEGG pathways and provides facilities for computational analysis and modeling of gene/protein networks and chemical networks. Here is the data structure of KGML.
It’s really complex and it will bother you to understand the KGML file! Do not worry, I will show you how to understand a KGML file and then how to convert it into JSON.
First, how to find or download KGML file?
Method:
Download KGML" link for each pathway map.
If you choose a pathway with prefix “map”, you can’t find the download link in the page, that’s
because it can be generating almost ko/rn/ec/org files.
Such as “map04111”, it has no link for download. But if you change the “map” into “sce” in
the URL, you can get the file.
KEGG API:
Take “sce04111” as an example, you can download the KGML file via the this.
Second, the KGML file is difficult to find out the relationship through the data structure show above.
Here is the simple but straightforward tutorial to teach you how to understand a KGML file.
Take “sce04111.xml” as an example. We can simplify the data structure as:
The entry element can be path/ko/ec/rn/cpd/gl/org/group, enzyme/protein/gene will have relation and gene will also have compound and reactions. We choose one relation to be example:
It means that gene (YCL061C) have activation effect in gene (YPL153C).By the way, through the graphics elements, we can definitely rewrite the connection between these two genes.
Third, thought we know how to get KGML file and can understand it but the crucial problem is, how to grab the information of genes and convert it into JSON.
Here we use Python programming Language, install the library “lxml” for processing XML and HTML, and we import JSON library for convert.
Fourth, you may ask: If there have any other software to do such job, that is read KGML files, convert it and rewrite it?
Of course, and we indeed tried but failed for it’s not open source or the original source is difficult for modification.
Actually, if you want to do some visualize pathway job, Cytoscape is a good choice and it also have cytoscape.js for drawing.
Finally, we indeed done plentiful preparatory work, maybe in the end it’s not useful in this software but it can expand our horizons.