Expansion of Filters
The last missing filters (medal and abstract) were implemented, together with an additional filter for team names and one for the information content of the abstract. Additionally, the selection ranges of some filters were adjusted. See table 23.1 for the final filter setup.
Table 23.1: Overview of all final filters.

Parameter | Type | Options* | Status
Year | numeric | 2007-2012 | final
Region | string | levels | final
Track | string | levels | final
Students | numeric | 0, 5, 10, 15, 20, >20 | final
Advisors | numeric | 0, 2, 4, 6, 8, 10, 12, 14, >14 | final
Instructors | numeric | 0, 2, 5, 10, 15, >15 | final
Biobricks | numeric | 0, 5, 10, 20, 50, 100, 200, >200 | final
Championship | character vector | levels | final
Regional | character vector | levels | final
Medals | string | levels | final
Score | numeric | 0-100 (steps of 10) | final
Abstract | binary | all/only provided | final
Information content | numeric | 0, 0.4, 0.45, 0.5, 0.55, 0.6 | final
Team name | string | variable entry | final
* Levels means all possible values the parameter can have, generated either automatically or manually.
Abstract
The abstract filter only checks whether an abstract was submitted or not. See the pseudocode:
if user wants to see all teams {
    return the full data set
} else {
    iterate through all row names in the data frame {
        if the abstract of the team "row name" in the data list matches the string "-- No abstract provided yet --", save the name to a vector
    }
    delete all teams without an abstract
    return the reduced data set
}
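A minimal R sketch of this filter, using placeholder names (teamData for the data frame, teamList for the list holding the abstracts), could look like this:

    filterAbstract <- function(teamData, teamList, showAll) {
      if (showAll) {
        return(teamData)                      # user wants to see all teams
      }
      noAbstract <- character(0)
      for (team in rownames(teamData)) {      # iterate through all row names
        if (identical(teamList[[team]]$abstract, "-- No abstract provided yet --")) {
          noAbstract <- c(noAbstract, team)   # remember teams without an abstract
        }
      }
      teamData[!(rownames(teamData) %in% noAbstract), ]  # return the reduced data set
    }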
Since the information content is also related to the abstract, both are placed in one tab of the filter notebook. As it turned out that the information content always lies in a very small range around 50%, the possible selections were chosen accordingly. The filter function is similar to that of any other numeric parameter; the only difference is that all choices are numeric, so the different cases of minimum and maximum selection can be skipped.
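For illustration, such a numeric filter could be sketched in R as follows (the column name informationContent and the function name are assumptions):

    filterInformationContent <- function(teamData, minIC, maxIC) {
      # keep only teams whose information content lies within the selected range
      teamData[teamData$informationContent >= minIC &
               teamData$informationContent <= maxIC, ]
    }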
Medal
The medal filter works exactly like the filters for track or region, since the medal is also saved in the data frame.
Team Name
For filtering by team name a text entry box was introduced, together with a JavaScript function that checks the entered search term. A list of all team names was created and converted to a JavaScript array using R and its package RJSONIO. The JavaScript code then compares the entered term(s) with this array and changes the text color to either red (no exact match) or black (exact match). To allow for multiple name entries, comma-separated terms can be entered; they are automatically split by the JavaScript code for the color change and by the filter function.
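The conversion of the team name list into a JavaScript array could be done along these lines; this is only a sketch, and how the resulting script is embedded in the app is simplified:

    library(RJSONIO)
    # turn the vector of team names into a JavaScript array literal
    teamNames <- rownames(teamData)                # placeholder data frame
    jsArray   <- toJSON(teamNames)                 # e.g. [ "TeamA", "TeamB", ... ]
    jsCode    <- paste0("var teamNames = ", jsArray, ";")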
The pseudocode of the filter function looks like this:
split the input string at every comma
iterate through all strings {
find all teams having the string in their name and keep their ids
}
reduce and return dataset
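In R, this could be sketched roughly as follows (function and object names are placeholders):

    filterTeamName <- function(teamData, input) {
      terms <- trimws(strsplit(input, ",")[[1]])   # split the input string at every comma
      keep <- integer(0)
      for (term in terms) {
        # find all teams having the term in their name and keep their row indices
        keep <- union(keep, grep(term, rownames(teamData), fixed = TRUE))
      }
      teamData[keep, ]                             # reduce and return the data set
    }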
Team table output
All the apps so far only displayed data, with no option to see which teams contributed to each portion of the graphs. Thus a table output was implemented that shows the teams' names, years and wiki links. For a better overview, the user can select how many teams should be displayed (none, 5, 10, 20, 50, 100 or all) and in which order: either by score, displaying the best teams at the top of the list, by year, starting with the most recent teams, or alphabetically. The displayed table always represents the fully filtered data, so the user can narrow down the number of teams using the filters and then go directly to the corresponding wiki pages.
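The ordering and truncation of the table could, for instance, be done like this; the column names Score and Year and the encoding of the user's choices are assumptions:

    orderTeamTable <- function(teamData, ordering, nTeams) {
      # order the filtered teams by score, year or team name
      idx <- switch(ordering,
                    score = order(teamData$Score, decreasing = TRUE),
                    year  = order(teamData$Year,  decreasing = TRUE),
                    name  = order(rownames(teamData)))
      out <- teamData[idx, ]
      if (nTeams == "none") {
        out <- out[0, ]                       # display no teams at all
      } else if (nTeams != "all") {
        out <- head(out, as.numeric(nTeams))  # limit to the selected number of teams
      }
      out
    }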
Facts and Figures app
Since the type of data handled in the scatterplot and the timeline app is very similar, the two apps were merged into one app that changes its appearance depending on the chosen x-axis scale. For this a conditional panel was used to display either the selection of the scatterplot y-axis parameters or the one for the timeline summing parameters.
In order to have a consistent design and to be able to target both plot types to the same div, the scatterplot was changed from rPlot to nPlot. This means it is now generated using NVD3 instead of the standard R methods.
Additionally, a selection for the categories was added: the displayed data can now be grouped by region, track or medal awarded. Grouping by year is no longer needed, since the timeline already gives the optimal time resolution of the data.
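The switch between the two modes could be implemented with Shiny conditional panels roughly as sketched below; the input IDs and choices are placeholders, not the actual ones used in the app:

    library(shiny)

    # show the y-axis selection only in scatterplot mode and the
    # summing parameter only when the timeline (x-axis = year) is chosen
    ui <- fluidPage(
      selectInput("xaxis", "x-axis parameter",
                  choices = c("Year", "Students", "Biobricks")),
      conditionalPanel(condition = "input.xaxis != 'Year'",
                       selectInput("yaxis", "y-axis parameter",
                                   choices = c("Students", "Biobricks", "Score"))),
      conditionalPanel(condition = "input.xaxis == 'Year'",
                       selectInput("sumvar", "parameter to sum up",
                                   choices = c("Teams", "Biobricks")))
    )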
Methods app
The goal of the methods app was to quickly lead the user to wikis providing high-quality protocols for various methods. For this purpose the methods were clustered, and now the interface was implemented. It contains two main drop-down menus, one for the method cluster and one for the particular method. The options given in the second drop-down menu are determined by the first one via conditional panels. There is only one responsive filter applied to the data, which is similar to the one for matching the awards. The only differences are that exact matches instead of regular expressions are detected in the data set, and that no empty rows have to be deleted from the resulting data set. The result is displayed only through the table containing all teams matching the method, which can again be customised regarding the number and order of teams.
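A rough R sketch of such an exact-match filter; the per-team list of detected methods, methodList, is an assumed structure:

    filterMethod <- function(teamData, methodList, method) {
      keep <- character(0)
      for (team in rownames(teamData)) {
        # exact match of the selected method instead of a regular expression
        if (method %in% methodList[[team]]) {
          keep <- c(keep, team)
        }
      }
      teamData[keep, ]   # only teams mentioning the method remain
    }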
To give an overview of the most popular, or rather the most mentioned, methods, a table displaying the top 10 methods was added, which is constantly displayed in the application. Unfortunately, with the current list of methods this table only contains 9 elements, so the list has to be extended and the text analysis has to be optimised.
Topics app
In the topics app the main element is the entry box where the user can enter a term to search for; it works very similarly to the entry box of the team name filter. The text turns red when there is no exact match among the MeSH terms assigned to the teams, and black otherwise. The teams exactly matching the searched term are again displayed in a tabular format. The filter function is currently exactly the same as for the methods, but has to be altered to also allow for typing mistakes, case insensitivity and partial string matches. Here we also added a list of the top 10 topics.
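One way to add case insensitivity, partial matches and some tolerance for typing mistakes would be to combine grepl() and agrepl(); this is only a sketch of a possible extension, not the current implementation:

    matchTopic <- function(term, meshTerms) {
      # case-insensitive partial matches plus approximate matches for small typos
      grepl(term, meshTerms, ignore.case = TRUE) |
        agrepl(term, meshTerms, ignore.case = TRUE, max.distance = 0.1)
    }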
Data update and completion
There were many automatic and manual corrections applied to the data. These included the synchronisation of track and award naming, as well as fixing some duplicate or missing awards.
Name synchronisation
The naming of the tracks and awards was not conserved over the years, even though their meaning basically stayed the same. One example is the extension of the 2007 Energy track to the Food or Energy track in 2008. The same applies to other combined tracks, but there were also minor differences that needed fixing, for example medals starting with upper- or lower-case letters. Most of these were resolved using regular expressions when converting the data from JSON to R. See table 23.2 for all synchronisations made.
Championship awards

Regular expression | Replacement | Covered occurrences

Grand Prize | Grand Prize |
- Grand Prize
- Grand Prize, Winner of the BioBrick Trophy
- Grand Prize Winner

(1st)|(First) Runner Up | 1st Runner Up |
- 1st Runner Up
- First Runner Up
- 1st Runner Up, Winner of the PoPS Prize

(2nd)|(Second) Runner Up | 2nd Runner Up |
- 2nd Runner Up
- Second Runner Up
- 2nd Runner Up, Winner of the Synthetic Standard

Environment | Best Environment Project |
- Best Environment Project
- Best Environmental Project
- Environmental Sensing

Energy | Best Food & Energy Project |
- Best Food & Energy Project
- Best Food or Energy Project
- Energy

Health | Best Health & Medicine Project |
- Best Health & Medicine Project
- Best Health or Medicine Project
- Health & Medicine

Foundational | Best Foundational Advance Project |
- Best Foundational Advance Project
- Best Foundational Advance
- Best Foundational Tech.
- Foundational Research

New Application | Best New Application Project |
- Best New Application Project
- Best New Application Area

Part, Natural | Best New BioBrick Part, Natural |
- Best New BioBrick Part, Natural
- Best New BioBrick Part, Natural, Runner Up
(differentiating between great teams on this level does not serve the purpose of the tool)

Best Model | Best Model |
- Best Model
- Best Modeling / Sim.

Information Processing | Best Information Processing Project |
- Best Information Processing Project
- Information Processing

Software Tool | Best Software |
(Best Software Tools will be merged with Best Software)

Presentation | Best Presentation |
- Best Presentation
- Best Presentation, Runner Up
Regional awards

Regular expression | Replacement | Covered occurrences

Grand Prize | Grand Prize |
Regional prizes always end with the corresponding region. Apart from this, other minor differences, as for the championship awards, are removed.

Finalist | Regional Finalist
Human Practices | Best Human Practices Advance
Experimental Measurement | Best Experimental Measurement Approach
Model | Best Model
Device, Engineered | Best New BioBrick Device, Engineered
Part, Natural | Best New BioBrick Part, Natural
Standard | Best New Standard
Poster | Best Poster
Presentation | Best Presentation
Wiki | Best Wiki
Safety | Safety Commendation
Medals, Regions, Tracks

Regular expression | Replacement | Covered occurrences

[Bb]ronze | Bronze | upper- or lower-case medals
[Ss]ilver | Silver
[Gg]old | Gold

America | America | All regions on the American continents were put together.
US | America
Canada | America

Medic | Health & Medicine |
- Health & Medicine
- Health/Medicine
- Medical

Energy | Food & Energy |
- Food & Energy
- Food/Energy
- Energy

Foundational | Foundational Advance |
- Foundational Advance
- Foundational Research
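For illustration, substitutions like those listed above could be applied with gsub() and grepl() during the JSON-to-R conversion; medals and tracks here stand for the respective character vectors and are only placeholders:

    # normalise upper-/lower-case medal names
    medals <- gsub("[Bb]ronze", "Bronze", medals)
    medals <- gsub("[Ss]ilver", "Silver", medals)
    medals <- gsub("[Gg]old",   "Gold",   medals)

    # map all track names containing a keyword to the synchronised name
    tracks[grepl("Medic", tracks)]  <- "Health & Medicine"
    tracks[grepl("Energy", tracks)] <- "Food & Energy"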
Updated scoring function
After the in-depth curation of all the award names, the scoring function had to be updated, since some awards had never been included in the scoring because they are only rarely awarded, and some had been lost due to the naming issues. After updating the scores list, the data conversion to RData was rerun and the scoring was briefly checked by reviewing the list of the "best" teams.
Manual data curation
For some reason the 2011 championship awards were entirely missing from the standard results page. Thus we had to go directly to the jamboree results page and add the right awards to every team of 2011. This was done directly in the JSON file, and the RData file was updated immediately.
Bug-fix: NA-values
The award filters match the full team list to retrieve the names of the teams to keep in the data set. These names are then taken from the already reduced data set, which produces empty rows for those teams that match the award filter but were already removed from the data frame by another filter prior to the award matching. Such empty rows are named "NA.number". Thus, in order to remove these empty rows, all rows whose names contain "NA" were removed. This was a really bad idea, because we spent a lot of time trying to find out why our tool does not like the UNAM MEXICO teams, whose names also contain "NA". The bug was fixed by adding a dot to the regular expression and separately removing the first empty row, whose name exactly matches "NA".
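In R, the fix could look roughly like this (teamData is a placeholder for the reduced data frame):

    # buggy version: also removed teams such as UNAM MEXICO,
    # because their row names contain the letters "NA"
    # teamData <- teamData[!grepl("NA", rownames(teamData)), ]

    # fixed version: only remove the artificial rows "NA.1", "NA.2", ...
    # plus the single row named exactly "NA"
    teamData <- teamData[!grepl("NA\\.", rownames(teamData)), ]
    teamData <- teamData[rownames(teamData) != "NA", ]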