Team:Heidelberg/Tempaltes/iGEM42-W-3b

From 2013.igem.org


Latest revision as of 00:24, 5 October 2013

Reimplementation of scraping

As the scraping scripts we started out with had a rather long run time and were organised in a rather complicated way, the whole thing was rewritten. Instead of opening, editing and closing the JSON results file for every single team, the spiders now go through all teams at once, using two central pages per year: the team list (https://igem.org/Team_List?year=%d) and the results page (https://igem.org/Results?year=%d). Within these pages, HtmlXPathSelectors are used to extract the individual values for each team. Because these detailed selectors depend on the exact page structure, only the two central pages can easily be scraped this way.
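
The following is a minimal sketch of this "central pages" approach, not the team's actual spider. It uses only the Python standard library instead of Scrapy: it fills the year into the two URL templates from the text, pulls the cell values out of each page with XPath-style paths, merges both pages into one dictionary, and writes the JSON file once at the end. The table ids (`teams`, `results`), the column layout, the sample markup and all function names are hypothetical. Also, ElementTree supports only a subset of XPath and needs well-formed XHTML, whereas the original spiders used Scrapy's HtmlXPathSelector on the real iGEM pages.

```python
import json
import xml.etree.ElementTree as ET

# URL templates from the text; "%d" is filled in with the competition year.
TEAM_LIST_URL = "https://igem.org/Team_List?year=%d"
RESULTS_URL = "https://igem.org/Results?year=%d"

# Invented sample markup standing in for the two central pages (hypothetical).
SAMPLE_TEAM_LIST = """<html><body><table id="teams">
<tr><td>Team_A</td><td>Europe</td></tr>
<tr><td>Team_B</td><td>Asia</td></tr>
</table></body></html>"""
SAMPLE_RESULTS = """<html><body><table id="results">
<tr><td>Team_A</td><td>Gold</td></tr>
</table></body></html>"""


def central_urls(years):
    """Return (year, team list URL, results URL) for every year to crawl."""
    return [(year, TEAM_LIST_URL % year, RESULTS_URL % year) for year in years]


def table_rows(xhtml, table_id):
    """Yield the stripped cell texts of each row in <table id=table_id>.

    Uses ElementTree's limited XPath subset, so the input must be
    well-formed XHTML; real pages would need an HTML-tolerant parser.
    """
    root = ET.fromstring(xhtml)
    for row in root.findall(".//table[@id='%s']/tr" % table_id):
        yield [(cell.text or "").strip() for cell in row.findall("td")]


def scrape_year(team_list_xhtml, results_xhtml):
    """Merge both central pages of one year into {team: {field: value}}."""
    teams = {}
    # Hypothetical column layout of the team list: name, region.
    for cells in table_rows(team_list_xhtml, "teams"):
        if len(cells) >= 2:
            teams[cells[0]] = {"region": cells[1]}
    # Hypothetical column layout of the results page: name, medal.
    for cells in table_rows(results_xhtml, "results"):
        if len(cells) >= 2:
            teams.setdefault(cells[0], {})["medal"] = cells[1]
    return teams


def write_results(path, data_by_year):
    """Write all teams of all years in a single pass, not once per team."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data_by_year, fh, indent=2, sort_keys=True)


if __name__ == "__main__":
    for year, team_url, results_url in central_urls([2013]):
        print(year, team_url, results_url)
    print(json.dumps(scrape_year(SAMPLE_TEAM_LIST, SAMPLE_RESULTS), sort_keys=True))
```

Fetching the pages (for example with Scrapy, as in the original spiders) is left out so that the sketch runs offline. The point of the design is that each page is parsed once per year and the results file is opened only once, instead of once per team.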

Last edited by JuliaS1992 on 5 October 2013.