Team:Heidelberg/Tempaltes/iGEM42-W-9b
From 2013.igem.org
JuliaS1992 (Talk | contribs) |
Latest revision as of 02:12, 5 October 2013
Text analysis drafts
For the text analysis we used Python and its NLTK platform (see [http://nltk.org]). The only text corpus used was the stopword corpus.
Regardless of the analysis performed, a "stemmer" was first run on the abstracts; it reduces every word to its basic form (its stem). The topwords were determined by simple counting, and the information content was determined from the proportion of the text that belongs to the stopword corpus.
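The counting steps described above can be sketched roughly as follows. This is a minimal illustration and not the team's actual script: the choice of NLTK's PorterStemmer, the regular-expression tokenizer, the function names, the tiny stand-in stopword set, and the decision to drop stopwords before counting topwords are all assumptions, since the text does not specify them.

```python
import re
from collections import Counter

from nltk.stem import PorterStemmer  # assumption: the text does not name the stemmer

# Assumption: a tiny stand-in list so the sketch runs without extra downloads.
# The full list is available as nltk.corpus.stopwords.words("english")
# after running nltk.download("stopwords").
STOPWORDS = {"the", "a", "of", "and", "in", "is", "to", "are", "for"}

_stemmer = PorterStemmer()


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(text, n=5):
    """Return the n most frequent stems, ignoring stopwords (hypothetical helper)."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    stems = [_stemmer.stem(t) for t in tokens]
    return Counter(stems).most_common(n)


def stopword_fraction(text):
    """Return the share of tokens that are stopwords (hypothetical helper)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(t in STOPWORDS for t in tokens) / len(tokens)


abstract = "The genes in the cell are genes of the plasmid."
print(top_words(abstract, 3))
print(stopword_fraction(abstract))
```

A high stopword fraction would indicate an abstract with comparatively few content-bearing words; how the team turned this proportion into an information content score is not stated here.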
For the extraction of the meshterms, a list of synthetic biology terms compiled by Paul Oldham and his colleagues (Oldham, P., Hall, S., & Burton, G. (2012). Synthetic Biology: Mapping the Scientific Landscape. PLoS ONE, 7(4), e34368. doi:10.1371/journal.pone.0034368) was used. The stemmer was applied to both the abstract and the term list, and the matches were then counted.
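The matching step could look something like the sketch below, which stems both the abstract and each term and then counts how often the stemmed term occurs as a consecutive run of stems. The PorterStemmer, the helper names, and the example terms are assumptions for illustration only; the example terms are not taken from the Oldham et al. list.

```python
import re
from collections import Counter

from nltk.stem import PorterStemmer  # assumption: the text does not name the stemmer

_stemmer = PorterStemmer()


def stem_phrase(phrase):
    """Lowercase a phrase, split it into words and stem each word."""
    words = re.findall(r"[a-z]+", phrase.lower())
    return tuple(_stemmer.stem(w) for w in words)


def count_term_matches(abstract, terms):
    """Count occurrences of each (possibly multi-word) term in the abstract.

    Both sides are stemmed first, so "circuits" matches the term "circuit".
    Terms that never occur are left out of the result.
    """
    stems = stem_phrase(abstract)
    counts = Counter()
    for term in terms:
        key = stem_phrase(term)
        if not key:
            continue
        n = len(key)
        hits = sum(1 for i in range(len(stems) - n + 1) if stems[i:i + n] == key)
        if hits:
            counts[term] = hits
    return counts


# Hypothetical terms, chosen only to demonstrate the matching.
terms = ["synthetic biology", "genetic circuit", "biobrick"]
abstract = ("Genetic circuits are central to synthetic biology. "
            "Each genetic circuit was tested.")
print(count_term_matches(abstract, terms))
```

Stemming both sides with the same stemmer means the comparison does not depend on what the stems look like, only on the abstract and the term list reducing to the same forms.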