Team:Heidelberg/Tempaltes/iGEM42-W-20c
From 2013.igem.org
Latest revision as of 01:24, 5 October 2013
Data update and completion
Many automatic and manual corrections were applied to the data. These included synchronising track and award names, as well as fixing duplicated or missing awards.
Name synchronisation
Track and award names were not kept consistent over the years, even though their meaning stayed the same. One example is the extension of the 2007 Energy track to the Food or Energy track in 2008. The same applies to other combined tracks, but there were also minor differences that needed fixing, for example medals starting with upper- or lower-case letters. Most of these were resolved using regular expressions when converting the data from JSON to R. See table 23.2 for all synchronisations made.
Championship awards

| Regular expression | Replacement | Covered occurrences |
|---|---|---|
| Grand Prize | Grand Prize | |
| (1st)\|(First) Runner Up | 1st Runner Up | |
| (2nd)\|(Second) Runner Up | 2nd Runner Up | |
| Environment | Best Environment Project | |
| Energy | Best Food & Energy Project | |
| Health | Best Health & Medicine Project | |
| Foundational | Best Foundational Advance Project | |
| New Application | Best New Application Project | |
| Part, Natural | Best New BioBrick Part, Natural | differentiating between great teams on this level |
| Best Model | Best Model | |
| Information Processing | Best Information Processing Project | |
| Software Tool | Best Software | Best Software Tools is merged with Best Software |
| Presentation | Best Presentation | |

Regional awards

| Regular expression | Replacement | Covered occurrences |
|---|---|---|
| Grand Prize | Grand Prize | Regional prizes always end |
| Finalist | Regional Finalist | |
| Human Practices | Best Human Practices Advance | |
| Experimental Measurement | Best Experimental Measurement Approach | |
| Model | Best Model | |
| Device, Engineered | Best New BioBrick Device, Engineered | |
| Part, Natural | Best New BioBrick Part, Natural | |
| Standard | Best New Standard | |
| Poster | Best Poster | |
| Presentation | Best Presentation | |
| Wiki | Best Wiki | |
| Safety | Safety Commendation | |

Medals, Regions, Tracks

| Regular expression | Replacement | Covered occurrences |
|---|---|---|
| [Bb]ronze | Bronze | upper- or lower-case medals |
| [Ss]ilver | Silver | |
| [Gg]old | Gold | |
| America | America | all regions on the American continents were merged |
| US | America | |
| Canada | America | |
| Medic | Health & Medicine | |
| Energy | Food & Energy | |
| Foundational | Foundational Advance | |
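The rule tables above boil down to applying a list of (pattern, replacement) pairs to every award name during the JSON-to-R conversion. The team did this in R; the following Python sketch only illustrates the idea, with a small, non-exhaustive subset of the rules:

```python
import re

# Illustrative subset of the synchronisation rules from the tables above.
# The real pipeline applied the full rule set in R during JSON-to-R conversion.
AWARD_RULES = [
    (r"(1st)|(First) Runner Up", "1st Runner Up"),
    (r"(2nd)|(Second) Runner Up", "2nd Runner Up"),
    (r"[Bb]ronze", "Bronze"),
    (r"[Gg]old", "Gold"),
]

def normalise_award(name):
    """Return the canonical award name, or the input if no rule matches."""
    for pattern, replacement in AWARD_RULES:
        if re.search(pattern, name):
            return replacement
    return name

print(normalise_award("First Runner Up"))  # -> 1st Runner Up
print(normalise_award("bronze"))           # -> Bronze
```

Because the patterns are searched (not anchored), a rule like `Energy` would also cover variants such as "Food or Energy", which is exactly how the combined tracks were unified.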
Updated scoring function
After the in-depth curation of all award names, the scoring function had to be updated: some awards had never been included in the scoring because they were only rarely given, and others had been lost due to the naming inconsistencies. After updating the score list, the conversion to RData was rerun and the scoring was sanity-checked by reviewing the list of the "best" teams.
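The actual score values and the R implementation are not given in the text; the following Python sketch, with made-up weights, only shows why a missing entry in the score list silently loses an award rather than raising an error:

```python
# Hypothetical award weights -- the real values are not stated in the text.
AWARD_SCORES = {
    "Grand Prize": 10,
    "1st Runner Up": 8,
    "Best New BioBrick Part, Natural": 5,  # a rarely given award, previously missing
    "Gold": 3,
}

def team_score(awards):
    # An award absent from AWARD_SCORES contributes 0 instead of raising,
    # so forgotten or misnamed awards silently drop out of the score.
    return sum(AWARD_SCORES.get(a, 0) for a in awards)

teams = {"TeamA": ["Grand Prize", "Gold"], "TeamB": ["Gold"]}
best = sorted(teams, key=lambda t: team_score(teams[t]), reverse=True)
print(best[0])  # -> TeamA
```

Reviewing the top of such a ranking is a quick plausibility check: if a famously successful team is missing, an award name or score entry is probably still wrong.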
Manual data curation
For some reason the 2011 championship awards were entirely missing from the standard results page. We therefore had to go directly to the jamboree results page and add the correct awards to every 2011 team. This was done directly in the JSON file, and the RData file was updated immediately.
Bug-fix: NA-values
The award filters match against the full team list to retrieve the names of the teams to keep in the dataset. These names are then taken from the already reduced dataset, which produces empty rows for teams that match an award filter but were already removed from the data frame by an earlier filter. These empty rows are named "NA.number". To remove them, all row names containing "NA" were dropped. This was a really bad idea, because we spent a lot of time trying to find out why our tool did not like the UNAM MEXICO teams, whose names also contain "NA". The bug was fixed by adding a dot to the regular expression and separately removing the first empty row, whose name exactly matches "NA".
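The bug and its fix can be reproduced in a few lines. The original filtering was done on R data-frame row names; this Python sketch uses an equivalent list of names to show why the unanchored pattern misfires:

```python
import re

# Row names as produced by the filtering step: real teams plus the
# auto-generated empty rows "NA", "NA.1", "NA.2", ...
row_names = ["Heidelberg", "UNAM_MEXICO", "NA", "NA.1", "NA.2"]

# Buggy filter: any name *containing* "NA" is dropped,
# which wrongly removes UNAM_MEXICO as well.
buggy = [n for n in row_names if not re.search(r"NA", n)]
print(buggy)  # -> ['Heidelberg']

# Fixed filter: require a literal dot after "NA" (matching only the
# generated "NA.number" rows) and remove the exact name "NA" separately.
fixed = [n for n in row_names if not re.search(r"NA\.", n) and n != "NA"]
print(fixed)  # -> ['Heidelberg', 'UNAM_MEXICO']
```

Note that the dot must be escaped (`NA\.`): an unescaped `.` matches any character, which would reintroduce the same false positives.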