Team:Heidelberg/Tempaltes/iGEM42-W-3b
From 2013.igem.org
JuliaS1992
Latest revision as of 00:24, 5 October 2013
Reimplementation of scraping
Because the scraping scripts we started with had a rather long run time and a complicated organisation, we rewrote them from scratch. For each year, the spiders use the central team list (https://igem.org/Team_List?year=%d) and results (https://igem.org/Results?year=%d) pages to process all teams in one pass, instead of opening, editing and closing the JSON results file for every single team. Within each team page, HtmlXPathSelectors are used to extract the individual values. Since this approach depends on every page having exactly the same structure, only the two central pages can be scraped easily with these detailed selectors.
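The per-year, single-pass idea can be sketched as follows. This is a minimal illustration, not the team's actual spider: it swaps Scrapy's HtmlXPathSelector for the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, so that it runs without dependencies. It assumes a hypothetical, well-formed XHTML team table (`<table id="teams">` with the team name and region in the first two cells); the real igem.org markup may differ, and all function names are illustrative.

```python
import json
import xml.etree.ElementTree as ET

# URL templates from the text; %d is filled with the competition year.
TEAM_LIST_URL = "https://igem.org/Team_List?year=%d"
RESULTS_URL = "https://igem.org/Results?year=%d"


def central_urls(year):
    """Return the two central pages visited for one year."""
    return TEAM_LIST_URL % year, RESULTS_URL % year


def parse_team_rows(xhtml):
    """Extract one record per table row with ElementTree's XPath subset.

    Assumption (hypothetical markup): well-formed XHTML containing a
    <table id="teams"> whose rows hold name and region in the first two
    <td> cells. Header rows use <th>, so they yield no <td> cells.
    """
    root = ET.fromstring(xhtml)
    teams = {}
    for row in root.iterfind(".//table[@id='teams']/tr"):
        cells = ["".join(td.itertext()).strip() for td in row.findall("td")]
        if len(cells) >= 2:  # skip header rows and malformed rows
            teams[cells[0]] = {"region": cells[1]}
    return teams


def scrape_year(year, xhtml):
    """Collect all teams of one year in memory, keyed by team name."""
    return {"year": year, "teams": parse_team_rows(xhtml)}


def write_results(records, path):
    """Write the JSON file once instead of reopening it for every team."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, indent=2, sort_keys=True)
```

For example, `write_results([scrape_year(2013, page)], "teams.json")` would write all teams of 2013 in a single file operation, where `page` is the downloaded team list. Collecting every team in one in-memory dictionary is what replaces the per-team open/edit/close cycle; like any detailed XPath selector, the expression stops matching as soon as the table layout changes.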