The future of data portals

This article just came across my reader and is timely to our discussion:

I also found this article to be relevant even now:


I actually agree with both authors.

For the first piece, I cannot agree more with Camellia. You need to open data with a purpose. You need to start with the question you’re trying to answer, then source and publish the data you need to answer it, and take it further by actually creating a solution to the problem.

My favorite example of this is what WPRDC did in response to feedback from their first Property Data User Group meeting cc @rgradeck @steve.saylor @dwalker

As for the second piece, I agree that we need to move beyond seeing Open Data Portals as “a website of raw data” and start treating them as essential infrastructure for digital government.

This requires government to start treating Data as a Strategic Asset, opening data not just to the public for transparency purposes, but internally as well to promote a data-driven culture.

Here’s a talk I gave at the Open Data Science Conference in Boston in 2016.

My main assertion was that Nextgen Open Data needs to be 1) open source; 2) built on open standards; 3) federated; 4) treated as essential data infrastructure; and 5) Open Knowledge - open data that is useful, usable, and used.

The main analogy I used in that presentation is the History of the Web - where we started with proprietary walled gardens (CompuServe, Prodigy, Minitel, etc.), and the web only became widely available/successful when the industry Chose Open.

One other analogy I often make is that we’re at the “Yahoo Directory” stage of open data. Recall Yahoo’s manually curated catalog in the ’90s vs. the experience we have now with Google. Yahoo is actually a backronym for “Yet Another Hierarchically Organized Oracle.”

We need to get to a state where we have Open Data for All - where users (both internal and external) can readily get answers to their questions, in addition to the raw datasets.

It is also a state where government staff are incented to “open data by default” by using a platform where they can collaborate on internal data operationally on a day-to-day basis, with a simple workflow to not only derive the publicly available data from the same internal operational dataset, but also to publish data-driven solutions/stories that answer the questions of different audiences: the public, their colleagues, other agencies, and policy-makers.


+1000 for making the dmoz reference.
my site is an open directory.
one thing i absolutely hate about aws is default closed directories. in fact, i’ve yet to see a site on aws with open directories.
the worst part of all of this, for me, is explaining this to open/civic groups and no one having a clue or understanding the importance of it.
or just rejecting it outright because it’s not aws and therefore inferior.


Here’s another deployment pattern we’re starting to see more of - Integrated Data Systems.

The nice thing about CKAN’s org-based access control is that it can support this pattern.
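
To make that concrete, here is a rough sketch of the visibility rule I have in mind: a dataset is visible if it is public, or if it is private and you belong to the organization that owns it. This is a simplified model and not CKAN's actual code. The `Dataset` class, the `visible_datasets` helper, and the org and dataset names are all made up for illustration, and CKAN's real authorization has more to it (roles within an org, for example).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    # Hypothetical stand-in for a catalog record, not a CKAN class.
    name: str
    owner_org: str   # the organization that owns the dataset
    private: bool    # True = only members of owner_org can see it


def visible_datasets(datasets, user_orgs):
    """Return the names of datasets a user can see.

    A dataset is visible when it is public, or when it is private
    and the user belongs to the organization that owns it.
    """
    return [
        d.name
        for d in datasets
        if not d.private or d.owner_org in user_orgs
    ]


# Illustrative catalog: one public extract plus two internal datasets.
catalog = [
    Dataset("311-requests-public", "city-311", private=False),
    Dataset("311-requests-raw", "city-311", private=True),
    Dataset("case-records-linked", "human-services", private=True),
]

# An anonymous visitor belongs to no organizations.
print(visible_datasets(catalog, set()))

# A staff member who belongs to the human-services org.
print(visible_datasets(catalog, {"human-services"}))
```

The point is that internal and public datasets live in the same catalog, and membership decides who sees what, which is what an Integrated Data System needs when several agencies share one platform.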
