Team:Heidelberg/Tempaltes/iGEM42-W-9b

From 2013.igem.org

Text analysis drafts

For the text analysis we used Python and its NLTK platform (see [1]). The only text corpus used was the stopword corpus.
Regardless of the analysis performed, a "stemmer" was first run on the abstracts, reducing every word to its basic form (its stem). The top words and the information content were both calculated by simple counting; for the information content, the proportion of words from the stopword corpus among all words was determined.
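The counting steps described above can be sketched roughly as follows. This is a minimal illustration, not the code actually used: `toy_stem` and `TOY_STOPWORDS` are hypothetical stand-ins so that the example runs without NLTK data, and dropping stopwords before counting the top words is an assumption. With NLTK installed, `nltk.stem.PorterStemmer().stem` and `nltk.corpus.stopwords.words("english")` could be passed in instead.

```python
import re
from collections import Counter

# Hypothetical stand-in for the NLTK stopword corpus (assumption).
TOY_STOPWORDS = {"the", "of", "a", "in", "and", "is", "are", "to"}


def toy_stem(word):
    """Very crude suffix stripper, for illustration only (not a real stemmer)."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(text, stem=toy_stem, stopwords=TOY_STOPWORDS, n=3):
    """Count stemmed words and return the n most frequent ones.

    Stopwords are removed first; this filtering is an assumption.
    """
    tokens = [t for t in tokenize(text) if t not in stopwords]
    return Counter(stem(t) for t in tokens).most_common(n)


def stopword_proportion(text, stopwords=TOY_STOPWORDS):
    """Return the fraction of tokens that belong to the stopword set."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in stopwords for t in tokens) / len(tokens)


abstract = "The genes are cloned and the gene is expressed in cells"
print(top_words(abstract, n=1))
print(stopword_proportion(abstract))
```

Passing the stemmer and the stopword set as parameters keeps the counting logic independent of which stemmer or corpus is plugged in.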
For the extraction of the meshterms, a list of synthetic biology terms compiled by Paul Oldham and his colleagues (Oldham, P., Hall, S., & Burton, G. (2012). Synthetic Biology: Mapping the Scientific Landscape. PLoS ONE, 7(4), e34368. doi:10.1371/journal.pone.0034368) was used. Here the stemmer was applied to both the abstract and the term list, and the matches were again counted.
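The matching step might look something like the sketch below. It assumes that multi-word terms are matched as consecutive stemmed tokens, which the description above does not specify. `toy_stem` is again a hypothetical placeholder for a real stemmer such as NLTK's `PorterStemmer`, and the example terms are made up for illustration rather than taken from the Oldham et al. list.

```python
import re
from collections import Counter


def toy_stem(word):
    """Very crude suffix stripper, for illustration only (not a real stemmer)."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stem_phrase(phrase, stem):
    """Tokenize a phrase and return its stemmed tokens as a tuple."""
    return tuple(stem(t) for t in re.findall(r"[a-z]+", phrase.lower()))


def count_term_matches(abstract, terms, stem=toy_stem):
    """Count how often each term occurs in the abstract after stemming both.

    A term matches where its stemmed tokens appear consecutively
    in the stemmed abstract (assumption).
    """
    tokens = stem_phrase(abstract, stem)
    counts = Counter()
    for term in terms:
        key = stem_phrase(term, stem)
        if not key:
            continue
        n = len(key)
        hits = sum(tokens[i:i + n] == key for i in range(len(tokens) - n + 1))
        if hits:
            counts[term] = hits
    return counts


# Illustrative inputs only; not taken from the actual term list.
abstract = "Synthetic genomes and genetic circuits: a genetic circuit was built."
terms = ["genetic circuit", "synthetic genome", "biobrick"]
matches = count_term_matches(abstract, terms)
print(matches)
```

Because both sides are stemmed, "genetic circuits" in the abstract still matches the list entry "genetic circuit".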