on Thursday, October 20th, 2011 2:44 | by Julien Colomb
Looking for a platform to share our trajectory data, I became interested in BuzzData. They had started a small contest, and since it seemed a nice way to test the website's functionality, I took part.
I downloaded the dataset of water consumption in Canada for recent years. The data are split by year and by ward; I focused on the total water consumption and ran a small analysis.
A simple ANOVA shows that total consumption depends on the year, on the ward, and on the interaction of the two (that is, the change in consumption from year to year is not the same in every ward). In the visualization, one can see that mean consumption decreased over the last three years (the black dots are the mean consumption for each year). In the traces for individual wards, some wards stand out. For instance, ward 11 had a huge increase in consumption in 2002, and ward 2, whose consumption is very high relative to the other wards, has decreased its consumption every year since 2002.
```
                      Df     Sum Sq    Mean Sq  F value    Pr(>F)
data$Year              1 6.2445e+13 6.2445e+13 110.0587 < 2.2e-16 ***
data$Ward             43 5.1300e+15 1.1930e+14 210.2682 < 2.2e-16 ***
data$Year:data$Ward   43 1.0535e+14 2.4500e+12   4.3181 2.399e-15 ***
Residuals            396 2.2468e+14 5.6738e+11
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
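To make the decomposition in the ANOVA table concrete, here is a minimal pure-Python sketch of a sequential (Type I) ANOVA with the same model shape. Judging from the degrees of freedom, Year entered as a numeric covariate (1 df) and Ward as a factor (43 df, i.e. 44 wards), plus their interaction. This is not the code behind the post, which appears to be R output; the function name `sequential_anova` and the `(year, ward, value)` row layout are assumptions for illustration. Each sum of squares is the drop in residual sum of squares when a term is added to the model that already contains the earlier terms, which is how R's `summary(aov(...))` builds its table.

```python
from collections import defaultdict


def _centered_sums(xs, ys):
    """Return (Sxx, Sxy, Syy), the sums of squares and cross-products about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


def sequential_anova(rows):
    """Type I ANOVA for value ~ year + ward + year:ward (hypothetical helper).

    rows: iterable of (year, ward, value); year is treated as numeric and
    ward as a factor. Each ward needs at least two distinct years.
    """
    rows = list(rows)
    by_ward = defaultdict(lambda: ([], []))
    for year, ward, value in rows:
        by_ward[ward][0].append(float(year))
        by_ward[ward][1].append(float(value))
    n, k = len(rows), len(by_ward)

    # Models fitted to all observations pooled: intercept only, then + year.
    sxx, sxy, syy = _centered_sums([float(r[0]) for r in rows],
                                   [float(r[2]) for r in rows])
    rss_null = syy
    rss_year = syy - sxy ** 2 / sxx

    # Within-ward sums give the shared-slope and separate-slopes fits.
    within = [_centered_sums(xs, ys) for xs, ys in by_ward.values()]
    w_sxx = sum(s[0] for s in within)
    w_sxy = sum(s[1] for s in within)
    w_syy = sum(s[2] for s in within)
    # Own intercept per ward, one slope shared by all wards.
    rss_ward = w_syy - w_sxy ** 2 / w_sxx
    # Own intercept and own slope per ward (adds the year:ward interaction).
    rss_full = sum(syy_w - sxy_w ** 2 / sxx_w for sxx_w, sxy_w, syy_w in within)

    df_res = n - 2 * k
    ms_res = rss_full / df_res
    table = {}
    # Each SS is the drop in residual SS when the term joins the earlier ones.
    for term, df, ss in (("Year", 1, rss_null - rss_year),
                         ("Ward", k - 1, rss_year - rss_ward),
                         ("Year:Ward", k - 1, rss_ward - rss_full)):
        table[term] = {"df": df, "ss": ss, "ms": ss / df, "F": (ss / df) / ms_res}
    table["Residuals"] = {"df": df_res, "ss": rss_full, "ms": ms_res}
    return table
```

With Year entered as a number, the Year:Ward term amounts to fitting a separate straight line per ward, which is why the sketch only needs per-ward sums of squares. P-values are left out to keep it dependency-free; if SciPy is available, `scipy.stats.f.sf(F, df, df_res)` gives the upper-tail probability for each F value.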
This test shows that it is quite easy to take the data and run one's own analysis. BuzzData is not (yet?) very good at updating data, since there is no way to add data directly: in this case, once the 2011 data come out, one will have to download the dataset, add the new data, and re-upload the whole dataset. Not very convenient.
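The download, add, re-upload workaround described above can at least be made safe on the local side. A minimal sketch, assuming the downloaded file is a CSV with a header row and a year column named `Year`; the helper `append_year` and the column names are hypothetical, not taken from the actual dataset. The re-upload itself still goes through the website and is not shown.

```python
import csv


def append_year(csv_path, new_rows, year_column="Year"):
    """Append rows for a new year to a local copy of the dataset (hypothetical helper).

    new_rows: list of dicts keyed by the file's own column names.
    Raises ValueError if any of those years is already in the file,
    so an accidental second run does not duplicate data.
    """
    with open(csv_path, newline="") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames
        existing = {row[year_column] for row in reader}

    clash = {str(row[year_column]) for row in new_rows} & existing
    if clash:
        raise ValueError(f"year(s) already present: {sorted(clash)}")

    # Make sure the first appended row does not run onto the last existing line.
    with open(csv_path, "rb") as fh:
        last_byte = fh.read()[-1:]
    with open(csv_path, "a", newline="") as fh:
        if last_byte not in (b"", b"\n", b"\r"):
            fh.write("\n")
        # Write using the file's own header order, not the dicts' key order.
        csv.DictWriter(fh, fieldnames=header).writerows(new_rows)
    return len(new_rows)
```

Reusing the header read from the file keeps the appended columns in the same order as the original, even if the new rows list their keys differently.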
Category: open science