One of the important things the Open Data Census doesn’t consider is the usability of data. Datasets can meet all the hard criteria tracked in the census and still be difficult to use because of poor usability. Documentation may be hidden or non-existent, the terms used to describe the data may be cryptic to anyone without insider knowledge, or the data sources may be buried in unindexed parts of the Web. For example, we encountered this with the public procurement data from the Czech Republic, which satisfies many of the hard criteria of the census, yet fails on many levels of usability.
I’d like to suggest considering data usability in future editions of the Open Data Census.
Agreed. Evaluating the usability of data on a solid basis is going to be difficult. However, it could start simply as a free-text comment on usability. A simple Likert scale could also be used for a subjective evaluation of overall usability. Decomposing usability into more specific metrics would need input from usability experts.
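Just to make that concrete, a minimal sketch of what such an entry could look like on a submission - the field names are placeholders I made up for illustration, not part of the actual census schema:

```python
# Hypothetical sketch: a usability entry attached to a census submission.
# The field names are placeholders, not part of the actual census schema.
usability_entry = {
    "usability_score": 2,  # Likert scale: 1 (very hard to use) .. 5 (very easy to use)
    "usability_comment": (
        "Documentation exists but is hard to find; column names are "
        "abbreviations only insiders understand."
    ),
}

def is_complete(entry: dict) -> bool:
    """Check that the score is on the 1-5 scale and a comment was given."""
    return entry["usability_score"] in range(1, 6) and bool(entry["usability_comment"].strip())

assert is_complete(usability_entry)
```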
@jindrichmynarz - don’t forget that the index is crowdsourced, so submitters have different levels of knowledge of and interaction with open data. Some of them have never used the data. Therefore, we need to explain what usability is, or incorporate the relevant fields into the dataset descriptions.
I was thinking about this too, @jindrichmynarz. My proposal would be to insert the following question into our survey:
[DRAFT] “Does the government provide information on how to use the data? - This question measures whether your government provides you with information that clarifies how to read and use the data. Answer “Yes” if there is information that 1) explains the individual elements of the data in accessible language; 2) gives guidance on how to interpret them; 3) informs you about the data sources; 4) describes how the data have changed over time.”
That way we can measure a government activity (the publication of explanatory material about the data) and can also use it as a proxy measurement for usability (is information provided that explains what each data element means?) - see the sketch at the end of this post.
However, we do not capture whether users really understand the data. And I’m also wary of assuming that “accessible language” guarantees that everybody can understand the documentation.
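For what it’s worth, here is a rough sketch of how the four criteria from the draft question could be recorded and collapsed into the single answer the survey expects - the field names are made up purely for illustration:

```python
# Rough sketch only: the four criteria from the draft question above, recorded
# individually and then collapsed into the single yes/no answer the survey expects.
# Field names are made up for illustration, not an actual census field.
documentation_criteria = {
    "explains_data_elements": True,         # 1) elements explained in accessible language
    "gives_interpretation_guidance": True,  # 2) guidance on how to interpret the data
    "documents_data_sources": False,        # 3) information about the data sources
    "documents_changes_over_time": False,   # 4) how the data changed over time
}

# As drafted, the question only yields "Yes" when every criterion is met.
answer = "Yes" if all(documentation_criteria.values()) else "No"
print(answer)  # -> No
```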
While I agree with the intent, I think it would be too difficult to measure. Remember that you end up with a yes/no answer: what if only two of the four criteria were documented? Also remember the translation challenge for the people assessing the data. Any extra assessment effort is multiplied across 100 countries × the datasets in the census.
I see. Maybe we can come back to Mor’s question about our definition of usability, which I would derive from definitions of “data quality”. According to the literature on data quality, it can be defined as intrinsic (the quality of the data themselves, including aspects such as accuracy and consistency), accessible (relating to technical and cognitive accessibility), contextual (the quality of data for a specific context and need, including comprehensiveness, timeliness and disaggregation), and representational (structure and syntax, version control, the existence of metadata for data attributes, etc.).
I assume all of these elements could be seen as factors enhancing or limiting “usability”.
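To make the decomposition easier to discuss, the four dimensions could be written down as a simple structure that later survey questions might draw sub-criteria from - purely illustrative, not a proposed schema:

```python
# Purely illustrative: the four data-quality dimensions described above,
# with the aspects mentioned as sub-items. Not a proposed census schema.
quality_dimensions = {
    "intrinsic": ["accuracy", "consistency"],
    "accessible": ["technical accessibility", "cognitive accessibility"],
    "contextual": ["comprehensiveness", "timeliness", "disaggregation"],
    "representational": ["structure and syntax", "version control",
                         "metadata for data attributes"],
}
```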
Assuming these definitions are sufficient to delineate questions of usability, I would argue that the usability/quality aspects relating to data structure, accuracy, consistency or comprehensibility of the data (values) are all hard to measure - partly for the reasons @Stephen and @Mor already mentioned (the census is crowdsourced, and laypersons can fill out the survey).
Your proposal, @Stephen, seems to be an interesting proxy for usability, measured through the existence of supporting metadata. Do you have an idea of how we could define the right metadata for that? We could ask for descriptive metadata for data attributes that informs about the strengths and weaknesses of the data, for metadata documenting how the data are collected, for metadata documenting update frequency and versioning, etc.
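A checklist along those lines could look roughly like the sketch below - all field names are placeholders to illustrate the idea, not an agreed schema, and the proxy score at the end is just one way such a checklist might be summarised:

```python
# Sketch of a supporting-metadata checklist a submitter could tick off per dataset.
# All field names are placeholders for illustration, not an agreed schema.
supporting_metadata_checklist = {
    "attribute_descriptions": False,  # each data attribute/column is described
    "known_limitations": False,       # strengths and weaknesses of the data are documented
    "collection_method": True,        # how the data are collected is documented
    "update_frequency": True,         # how often the data are updated
    "versioning": False,              # changes between versions are documented
}

# One possible proxy score: the share of documented items (again, just a sketch).
proxy_score = sum(supporting_metadata_checklist.values()) / len(supporting_metadata_checklist)
print(f"Documentation coverage: {proxy_score:.0%}")  # -> Documentation coverage: 40%
```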