Team:Shenzhen BGIC 0101/Tutorial/kgml




Kgml2Jason grab genes’ details in different pathways, which from KEGG with KEGG Makeup Language (KGML) file and export genes list and relationship of genes. The goal here is to visualize the pathway and rebuild it in the level of genes.


This utility will read in KGML file which have been rewritten before, and grab genes’ information, such as genes’ name, genes’ relationship and then convert these into JSON.

Internal operation

First, this utility will use the parameter –f to determine the specified file and read it in. The output file will use the file name that put forward.
Second, this utility convert KGML file into JSON and grab the information of genes, such as the reactions of genes and the relationship between genes.
Third, this utility continue to integrate the information above into two files, ‘gene.json’ and ‘relation.json’, which can be use directly in rewrite gene’s pathway.


python –f sce04111


-f/--fileread KGML from FILENAME(omit '.xml'), produce two files: gene list and relation

The format of output:

The output file is stored in the path where you running this program.
1. _gene.json

orthologKO (orthology group)
genegene product (mostly a protein)
groupa complex of gene products (mostly a protein complex)
geneID:the unique identification of gene
namethe KEGGID of this gene
typethe type of this gene
namethe KEGGID of this reaction
reversibletrue: reversible reaction; false: irreversible reaction
substratesKEGGID of substrate node
productsthe KEGGID of product node
related-reactionsrelate to another pathway or gene
2. _relation.json relations:
typeThe type of this relation[ ECrel, PPrel. GErel, PCrel, maplink]
subtypeInteraction/relation information[activation/inhibition]
entry1The first (from) entry that defines this relation
entry2The second (to) entry that defines this relation
entry IDThe KEGGID of node which takes part in this relation
typeHave only two options: [gene/group]
nameThe KEGGID of this gene
groupThe node is a complex of gene products (mostly a protein complex)


We can’t automatic acquisition KGML file in the KEGG API, all the demo we have show need to be downloaded before. You can get the entire list of one database genes through KEGG API, just like , it shows the entire list of orthology genes. The download method about KGML files shows in “How to finish this plun-in” part. Some genes will relate to another pathway but it doesn’t shows in the pathway so we are failed to grab the relationships between genes and pathways automatically. So we added two gene-pathway relations in ko04010 demo manually, TP53 gene connected with ko04115: P53 signaling pathway and NLK gene connected with ko04310: WNT signaling pathway. We look forward to the improvement of this plun-in through these disadvantages.

How to finish this plun-in?

KEGG is a database resource for understanding high-level functions and utilities of the biological system, and KGML is an XML presentation of the KEGG pathway database, which enables automatic drawing of KEGG pathways and provides facilities for computational analysis and modeling of gene/protein networks and chemical networks. Here is the data structure of KGML.
It’s really complex and it will bother you to understand the KGML file! Do not worry, I will show you how to understand a KGML file and then how to convert it into JSON.
First, how to find or download KGML file?
Download KGML" link for each pathway map.
If you choose a pathway with prefix “map”, you can’t find the download link in the page, that’s because it can be generating almost ko/rn/ec/org files.
Such as “map04111”, it has no link for download. But if you change the “map” into “sce” in the URL, you can get the file.
KEGG API: Take “sce04111” as an example, you can download the KGML file via the this.
Second, the KGML file is difficult to find out the relationship through the data structure show above.
Here is the simple but straightforward tutorial to teach you how to understand a KGML file.
Take “sce04111.xml” as an example. We can simplify the data structure as:

The entry element can be path/ko/ec/rn/cpd/gl/org/group, enzyme/protein/gene will have relation and gene will also have compound and reactions. We choose one relation to be example:

It means that gene (YCL061C) have activation effect in gene (YPL153C).By the way, through the graphics elements, we can definitely rewrite the connection between these two genes.
Third, thought we know how to get KGML file and can understand it but the crucial problem is, how to grab the information of genes and convert it into JSON.
Here we use Python programming Language, install the library “lxml” for processing XML and HTML, and we import JSON library for convert.
Fourth, you may ask: If there have any other software to do such job, that is read KGML files, convert it and rewrite it?
Of course, and we indeed tried but failed for it’s not open source or the original source is difficult for modification. Actually, if you want to do some visualize pathway job, Cytoscape is a good choice and it also have cytoscape.js for drawing.
Finally, we indeed done plentiful preparatory work, maybe in the end it’s not useful in this software but it can expand our horizons.