Team:Heidelberg/Tempaltes/iGEM42-W-9b

From 2013.igem.org

Revision as of 00:44, 5 October 2013 by JuliaS1992 (Talk | contribs)


Text analysis drafts

For the text analysis we used Python and its NLTK platform (see [http://nltk.org]). The only text corpus used was the stopword corpus.
Regardless of which analysis was performed, a "stemmer" was first run on the abstracts to reduce every word to its basic form (its stem). The top words were then found by simple counting, and the information content was estimated as the proportion of words in the whole abstract that belong to the stopword corpus.
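The counting steps described above can be illustrated with a minimal sketch. It is not the code we ran: `toy_stem` and the small `STOPWORDS` set are hypothetical stand-ins so that the example runs with the standard library only, and both excluding stopwords from the top words and measuring the stopword proportion on unstemmed tokens are assumptions.

```python
import re
from collections import Counter

# Tiny stand-in stopword set (assumption); the analysis used NLTK's stopword corpus.
STOPWORDS = {"the", "of", "a", "in", "and", "is", "to", "was", "for"}


def toy_stem(word):
    """Crude suffix stripper, a placeholder for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(text, n=3, stem=toy_stem, stopwords=STOPWORDS):
    """Count stemmed tokens and return the n most frequent ones.

    Dropping stopwords before counting is an assumption of this sketch.
    """
    stems = [stem(t) for t in tokenize(text) if t not in stopwords]
    return Counter(stems).most_common(n)


def stopword_fraction(text, stopwords=STOPWORDS):
    """Return the share of tokens that are stopwords (0.0 for empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in stopwords for t in tokens) / len(tokens)
```

With NLTK installed and its stopword corpus downloaded, one could pass `nltk.stem.PorterStemmer().stem` as `stem` and `set(nltk.corpus.stopwords.words("english"))` as `stopwords`; the choice of the Porter stemmer is itself an assumption, since the text does not name the stemmer.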
For the extraction of the MeSH terms, a list of terms in synthetic biology from .... was used. Here the stemmer was applied to both the abstract and the word list, and the matches were again counted.
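The term matching can be sketched in the same spirit. Again this is only an illustration under stated assumptions: `toy_stem` is a hypothetical placeholder for the real stemmer, the example term list is made up, and only single-word terms are handled.

```python
import re
from collections import Counter


def toy_stem(word):
    """Crude suffix stripper, a placeholder for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_term_matches(abstract, terms, stem=toy_stem):
    """Stem both the abstract and the term list, then count the matches.

    Returns a Counter keyed by the original (unstemmed) term.
    Only single-word terms are supported in this sketch.
    """
    # Map each stemmed term back to the term as written in the list.
    stemmed_terms = {stem(term.lower()): term for term in terms}
    counts = Counter()
    for token in re.findall(r"[a-z]+", abstract.lower()):
        stemmed = stem(token)
        if stemmed in stemmed_terms:
            counts[stemmed_terms[stemmed]] += 1
    return counts
```

Because both sides are stemmed, "Promoters" in an abstract would be counted as a match for a list entry "promoter". Multi-word terms would need matching on sequences of stems, which this sketch leaves out.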