Data Collaboration
Jon Udell is one of the guest speakers at the Science 2.0 talks on July 29. I asked him, “Can you bring a bucket full of links to examples of cool things people are doing with civic data that Toronto could emulate?” He responded:
This is the dilemma. There’s lots of geek navel-gazing in this space but darned few tangible outcomes—and I’ve been looking hard for them for going on 4 years, ever since the DCStat project got going in Washington.
Personally I think the focus needs to shift from cool things people are doing with civic data that cities unilaterally dump online to how cities should be collaborating with citizens to work out what questions need answering, what data is being gathered, or could be gathered, to address those questions, and how best to publish the data in order to enable the desired analysis.
He discusses this further in his post on influencing the production of public data. So let me throw it open: what do you want your city/county/province/national government to put online for you to play with, and why?
I’d love to have access to real-time municipal works data: where the buses are, where street closures are, where the garbage trucks are, where traffic is bad, municipal parks bookings (soccer fields, ice rinks…), etc.
Crime location data from city police would be useful too, I think. In my city, there’s quite a bit of stereotyping about where crime happens, and it’d be really nice to be able to make objective statements about this.
There’s probably quite a bit more, but that’s a start.
Everyblock is a great example of aggregating civic data. Their specialization could be summarized as: display of anything which has a location associated with it. The site lets you see crimes, building permits, street closures, foreclosures, restaurant inspections, news stories, even geo-tagged photos by location. It also generates custom RSS feeds so you can stay up to date on these things within a user-defined radius of your home.
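The "within a user-defined radius of your home" filter is the core idea behind this kind of location-based feed. The sketch below shows one way it could be done. It is not Everyblock's actual code, which isn't described here. It assumes each civic record carries a latitude/longitude pair and uses the haversine great-circle distance. The `CivicItem` fields and the sample coordinates are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; good enough for city-scale distances


@dataclass
class CivicItem:
    # Hypothetical record shape: any civic data point with a location attached.
    kind: str   # e.g. "crime", "building permit", "street closure"
    title: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def items_near(items, home_lat, home_lon, radius_km):
    """Return the items within radius_km of home, nearest first."""
    scored = [(haversine_km(home_lat, home_lon, it.lat, it.lon), it) for it in items]
    scored.sort(key=lambda pair: pair[0])  # sort by distance only
    return [it for dist, it in scored if dist <= radius_km]


if __name__ == "__main__":
    # Hypothetical sample data around Toronto City Hall.
    items = [
        CivicItem("street closure", "Queen St W lane closure", 43.6532, -79.3832),
        CivicItem("building permit", "Two-storey addition", 43.6532, -79.3732),
        CivicItem("restaurant inspection", "Ottawa diner", 45.4215, -75.6972),
    ]
    for item in items_near(items, 43.6532, -79.3832, radius_km=2.0):
        print(item.kind, "-", item.title)
```

A feed generator would then render the filtered list as RSS entries. Sorting by distance first keeps the nearest items at the top of the feed.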
It takes a lot of effort to get cities to open up their data in a form that Everyblock can use, though. Adrian Holovaty gave a good talk at PyCon 2009 about the nuts and bolts of the site, and the Q&A at the end also covered how they try to get cities to make this data available.
There’s also the OECD Factbook:
http://stats.oecd.org/oecdfactbook/
It's also interesting to think about how city data could be used to support smaller-scale green initiatives, like this attempt to build a green hotel: http://blog.marsdd.com/2009/07/14/the-75-5-plan-lessons-from-building-north-americas-greenest-hotel/
Or this project to crowdsource information in a crisis: http://ushahidi.com/. (Lots of possibilities for abuse/subversion, from what I can tell, but still could be very useful. Anyone have experience with it?)
You should check out Your Mapper: http://www.yourmapper.com/
There is city, state and federal data mapped out for the public, with more being added. Things like crime, sex offenders, restaurant health reviews, building permits, meth labs, pollution, etc.
Each map also comes with some great tools: RSS and KML feeds, a mobile phone version, a programmer API, and the option to embed the map on your own site.
And how could I forget chicagocrime.org (now http://chicago.everyblock.com/crime/)?
The real question raised by Jon Udell (via this post) is “should cities be collaborating with citizens to work out what questions need answering, what data is being gathered, or could be gathered, to address those questions, and how best to publish the data in order to enable the desired analysis.”
I advise the Mayor’s office in Vancouver on Open Data and agree with Udell in part. Yes, cities need to shift how they perceive themselves: from organizations that simply deliver services to citizens, to platforms that deliver services but also collect and disseminate data that others can repurpose.
So should cities be thinking about what data they should collect? Absolutely. Should they be doing that in conjunction with citizens? Absolutely.
There are dangers to this approach, however. One is that cities will only release data if they know how citizens are going to use it. Thus data that might bring real accountability (such as procurement data, as opposed to just knowing where street signs are) might not get shared (or prioritized). The real goal is not to figure out what data should be shared, but to figure out what data can’t be shared. If we limit ourselves to what we can imagine will be done, we restrict the possibilities to those who happen to be in the room when the decision is made.
The other danger is that we’ll get distracted. The fact is cities already collect A LOT of data. Beginning a process of figuring out what data should be collected and shared eats up time that could be spent putting the enormous amount of data cities already collect online.