There have been a lot of unpublished datasets appearing on the government’s open data portal over the past couple of months. This is part of the response to Stephan Shakespeare’s review of Public Sector Information.
In his review, Shakespeare recommended that the government identify what he referred to as National Core Reference Data. He defined this as the high-quality core data that the public sector already maintains, and said that he would “expect to find the connective tissue of place and location, the administrative building blocks of registered legal entities, the details of land and property ownership” in this collection.
The government’s response has been to rename National Core Reference Data as the National Information Infrastructure. Rather than deciding themselves which datasets should be part of that infrastructure, they have been releasing the details of unpublished datasets held within government on data.gov.uk.
In doing so, they are asking members of the public to comment on the datasets and say whether releasing them would create economic, social and/or efficiency benefits. Following an initial consultation period, potential candidates for the National Information Infrastructure are being discussed ahead of release during the Open Government Partnership’s Summit at the end of the month.
However, this release will be a first draft and the National Information Infrastructure will change over time. That is a good thing because, with over 4000 unpublished datasets on data.gov.uk, there haven’t been all that many people commenting on them. At the time of writing, just 95 out of 4305 unpublished datasets have any feedback against them.
If you look at the unpublished datasets that have been listed by the Ministry of Justice, 36 out of 43 have no feedback against them.
Given the work that has been put in by people participating in the Open Data Challenge Series on Crime and Justice – the competition weekend is next week and there’s still time to sign up and take part – we ought to be in a good place to increase the number of comments on unpublished datasets relating to crime and justice. I’m going to spend some time going through them over the coming weeks, especially after the weekend.
There may be some significant unpublished datasets that are not listed, and Owen Boswara has been collating these on an openly editable Google doc. One of the advantages of taking an open approach and crowdsourcing the datasets in the National Information Infrastructure is that it makes it easier to identify inaccuracies and gaps.
One thing I’ve noticed is some concern that this exercise could lead to the publication of private data. An example is the listing of the Dartford Crossing Payment System, whose fields include: “Names, addresses (inc e-mail), telephone numbers, vehicles registration, bank / credit card details, for users of DART-Tag.” I somehow doubt that there is any intention of releasing loads of drivers’ bank details to the web, but some clarification on the site would be helpful.
This is an opportunity to influence the priorities of open data releases and it would be good if we were to increase the amount of participation in the process.