Team:Shenzhen BGIC 0101/Tutorial/kgml
From 2013.igem.org
Tutorial
Kgml2Jason
Kgml2Jason grab genes’ details in different pathways, which from KEGG with KEGG Makeup Language (KGML) file and export genes list and relationship of genes. The goal here is to visualize the pathway and rebuild it in the level of genes.
Scripts
convert.py
This utility will read in KGML file which have been rewritten before, and grab genes’ information, such as genes’ name, genes’ relationship and then convert these into JSON.
Internal operation
First, this utility will use the parameter –f to determine the specified file and read it in. The output file will use the file name that put forward.
Second, this utility convert KGML file into JSON and grab the information of genes, such as the reactions of genes and the relationship between genes.
Third, this utility continue to integrate the information above into two files, ‘gene.json’ and ‘relation.json’, which can be use directly in rewrite gene’s pathway.
Example
python convert.py –f sce04111
Parameters:
default | |
---|---|
-f/--file | read KGML from FILENAME(omit '.xml'), produce two files: gene list and relation |
The format of output:
The output file is stored in the path where you running this program.
1. _gene.json
type:
ortholog | KO (orthology group) |
---|---|
enzyme | Enzyme |
reaction | Reaction |
gene | gene product (mostly a protein) |
group | a complex of gene products (mostly a protein complex) |
name | the KEGGID of this gene |
---|---|
type | the type of this gene |
name | the KEGGID of this reaction |
---|---|
reversible | true: reversible reaction; false: irreversible reaction |
substrates | KEGGID of substrate node |
products | the KEGGID of product node |
related-reactions | relate to another pathway or gene |
type | The type of this relation[ ECrel, PPrel. GErel, PCrel, maplink] |
---|---|
subtype | Interaction/relation information[activation/inhibition] |
entry1 | The first (from) entry that defines this relation |
entry2 | The second (to) entry that defines this relation |
entry ID | The KEGGID of node which takes part in this relation |
---|---|
type | Have only two options: [gene/group] |
name | The KEGGID of this gene |
group | The node is a complex of gene products (mostly a protein complex) |
Shortcoming
We can’t automatic acquisition KGML file in the KEGG API, all the demo we have show need to be downloaded before. You can get the entire list of one database genes through KEGG API, just like http://rest.kegg.jp/list/ko , it shows the entire list of orthology genes. The download method about KGML files shows in “How to finish this plun-in” part. Some genes will relate to another pathway but it doesn’t shows in the pathway so we are failed to grab the relationships between genes and pathways automatically. So we added two gene-pathway relations in ko04010 demo manually, TP53 gene connected with ko04115: P53 signaling pathway and NLK gene connected with ko04310: WNT signaling pathway. We look forward to the improvement of this plun-in through these disadvantages.
How to finish this plun-in?
KEGG is a database resource for understanding high-level functions and utilities of the biological system, and KGML is an XML presentation of the KEGG pathway database, which enables automatic drawing of KEGG pathways and provides facilities for computational analysis and modeling of gene/protein networks and chemical networks. Here is the data structure of KGML.
It’s really complex and it will bother you to understand the KGML file! Do not worry, I will show you how to understand a KGML file and then how to convert it into JSON.
First, how to find or download KGML file?
Method:
Download KGML" link for each pathway map.
If you choose a pathway with prefix “map”, you can’t find the download link in the page, that’s
because it can be generating almost ko/rn/ec/org files.
Such as “map04111”, it has no link for download. But if you change the “map” into “sce” in
the URL, you can get the file.
KEGG API:
Take “sce04111” as an example, you can download the KGML file via the this.
Second, the KGML file is difficult to find out the relationship through the data structure show above.
Here is the simple but straightforward tutorial to teach you how to understand a KGML file.
Take “sce04111.xml” as an example. We can simplify the data structure as:
The entry element can be path/ko/ec/rn/cpd/gl/org/group, enzyme/protein/gene will have relation and gene will also have compound and reactions. We choose one relation to be example:
It means that gene (YCL061C) have activation effect in gene (YPL153C).By the way, through the graphics elements, we can definitely rewrite the connection between these two genes.
Third, thought we know how to get KGML file and can understand it but the crucial problem is, how to grab the information of genes and convert it into JSON.
Here we use Python programming Language, install the library “lxml” for processing XML and HTML, and we import JSON library for convert.
Fourth, you may ask: If there have any other software to do such job, that is read KGML files, convert it and rewrite it?
Of course, and we indeed tried but failed for it’s not open source or the original source is difficult for modification.
Actually, if you want to do some visualize pathway job, Cytoscape is a good choice and it also have cytoscape.js for drawing.
Finally, we indeed done plentiful preparatory work, maybe in the end it’s not useful in this software but it can expand our horizons.