Google Dataset Search out of beta

Stephen · January 24, 2020, 8:23pm

Threat or opportunity?

Consequences for CKAN, Frictionless Data?

https://blog.google/products/search/discovering-millions-datasets-web/amp/

lwinfree · January 24, 2020, 8:45pm

Hey Stephen, great question! I think it’s an opportunity, and I have been wondering whether there is a way to combine Google Dataset Search (GDS) with Frictionless Data. For example, see this issue: “Can data packages be made easily findable using search engines?” (frictionlessdata/specs, issue #622 on GitHub).
Do you have any thoughts?

herrmann · January 28, 2020, 2:31pm

Great question, @Stephen! I agree with @lwinfree that it’s an opportunity, but I’m not sure what the next step should be. Writing a pattern, as @rufuspollock suggested on that GitHub issue a year ago, would probably help, but would it be enough? Is it safe to assume that every data package will have a description page accompanying it, just to make it visible to Google, or are there use cases where that would be difficult or impractical?
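One way such a pattern could work, sketched under assumptions: generate the description page's schema.org `Dataset` markup directly from `datapackage.json`, so the package and its page cannot drift apart. The field mapping below (title to `name`, resources to `distribution`, and so on) is an illustrative assumption, not an agreed Frictionless Data pattern, and the function names `datapackage_to_schema_org` and `json_ld_script_tag` are hypothetical.

```python
import json
from urllib.parse import urljoin


def datapackage_to_schema_org(package: dict, page_url: str, package_url: str) -> dict:
    """Map a subset of Data Package fields to a schema.org Dataset.

    Hypothetical mapping for illustration only; not an official pattern.
    page_url is the HTML description page, package_url the datapackage.json URL.
    """
    dataset = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        # Prefer the human-readable title; fall back to the machine name.
        "name": package.get("title") or package.get("name", ""),
        "description": package.get("description", ""),
        "url": page_url,
    }
    if package.get("keywords"):
        dataset["keywords"] = list(package["keywords"])
    if package.get("version"):
        dataset["version"] = package["version"]
    # Data Package licenses may carry a "path" (URL) and/or a "name".
    licenses = package.get("licenses") or []
    if licenses:
        dataset["license"] = licenses[0].get("path") or licenses[0].get("name")
    distributions = []
    for resource in package.get("resources", []):
        path = resource.get("path")
        if not isinstance(path, str):
            continue  # this sketch skips inline and multi-file resources
        # Relative paths are resolved against the descriptor's location.
        download = {"@type": "DataDownload", "contentUrl": urljoin(package_url, path)}
        encoding = resource.get("mediatype") or resource.get("format")
        if encoding:
            download["encodingFormat"] = encoding
        distributions.append(download)
    if distributions:
        dataset["distribution"] = distributions
    return dataset


def json_ld_script_tag(dataset: dict) -> str:
    """Wrap the JSON-LD in a <script> element for the description page."""
    body = json.dumps(dataset, indent=2)
    # "<\/" is a valid JSON escape and stops "</" from closing the script early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'
```

The resulting `<script>` tag would be placed in the HTML of the description page. Generating it from the descriptor, rather than writing it by hand, keeps the one file as the single source of metadata.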

The best-case scenario would be for Google Dataset Search to read not only DCAT, but also the CKAN API and datapackage.json. There is a critical mass of datasets available through the CKAN API, yet for some reason that has not been enough for Google to bother implementing support for it. Data Packages, on the other hand, are still somewhat niche, despite a recent increase in availability.

For CKAN, I think we will see a sharp increase in the use of the DCAT extension by data portals, just to make their datasets more visible to Google. For SEO reasons, it might even be a good idea to incorporate it into CKAN core in a future version.
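For a CKAN portal, exposing datasets to Google amounts to mapping the record returned by the Action API's `package_show` onto schema.org. The sketch below is a hand-rolled illustration, not the DCAT extension's code. It reads standard CKAN package fields (`notes`, `tags`, `resources`, `license_url`, `organization`, `metadata_modified`), assumes the default `/dataset/<name>` page route, and its helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def fetch_ckan_package(site_url: str, package_id: str) -> dict:
    """Call the CKAN Action API's package_show and return its "result" dict."""
    query = urllib.parse.urlencode({"id": package_id})
    url = f"{site_url.rstrip('/')}/api/3/action/package_show?{query}"
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    if not payload.get("success"):
        raise RuntimeError(f"package_show failed for {package_id!r}")
    return payload["result"]


def ckan_package_to_schema_org(package: dict, site_url: str) -> dict:
    """Map core CKAN package fields to a schema.org Dataset (illustrative)."""
    dataset = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": package.get("title") or package["name"],
        "description": package.get("notes") or "",
        # Assumes the default CKAN dataset page route: /dataset/<name>.
        "url": f"{site_url.rstrip('/')}/dataset/{package['name']}",
    }
    tags = [tag["name"] for tag in package.get("tags", [])]
    if tags:
        dataset["keywords"] = tags
    if package.get("license_url"):
        dataset["license"] = package["license_url"]
    if package.get("metadata_modified"):
        dataset["dateModified"] = package["metadata_modified"]
    organization = package.get("organization") or {}
    if organization.get("title"):
        dataset["publisher"] = {"@type": "Organization", "name": organization["title"]}
    distributions = []
    for resource in package.get("resources", []):
        if not resource.get("url"):
            continue  # skip resources without a downloadable URL
        download = {"@type": "DataDownload", "contentUrl": resource["url"]}
        # Prefer the MIME type; fall back to CKAN's free-text format field.
        encoding = resource.get("mimetype") or resource.get("format")
        if encoding:
            download["encodingFormat"] = encoding
        distributions.append(download)
    if distributions:
        dataset["distribution"] = distributions
    return dataset
```

Usage would be along the lines of `ckan_package_to_schema_org(fetch_ckan_package(site, "some-dataset"), site)`, with the output embedded in the dataset page as JSON-LD.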

versant2612 · February 7, 2020, 2:15am

“One thing hasn’t changed however: anybody who publishes data can make their datasets discoverable in Dataset Search by using an open standard (schema.org) to describe the properties of their dataset on their own web page.”

“If you have a dataset on your site and you describe it using schema.org, an open standard, others can find it in Dataset Search. If you know that a dataset exists, but you can’t find it in Dataset Search, ask the provider to add the schema.org descriptions and others will be able to learn about their dataset as well.”

Reading these two excerpts, I understand that Google is not interested in supporting other standards at the moment.

amirouche · February 24, 2020, 11:01am

I tried it when it launched, and it did not work as well as the main Google Search product. In particular, there was no support for synonyms: a search for “GDP” will not find a dataset described as “Gross Domestic Product” unless that dataset also mentions “GDP”.

My informed take on dataset search engines is that, at the very least, the lexicon should be built in the open. It may be built by machines over time, but some datasets use so much rare jargon that a human is needed to spot it.
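To make the synonym point concrete, here is a minimal sketch of lexicon-based query expansion. It assumes a hypothetical, openly maintained lexicon (`LEXICON`) of synonym groups and plain-text dataset descriptions. It is not how Google Dataset Search works internally.

```python
import re

# Hypothetical open lexicon: each set groups terms that mean the same thing.
LEXICON = [
    {"GDP", "Gross Domestic Product"},
    {"CPI", "Consumer Price Index"},
]


def expand_query(query: str, lexicon: list[set[str]]) -> set[str]:
    """Return the query plus every term sharing a lexicon group with it."""
    normalized = query.strip().casefold()
    variants = {normalized}
    for group in lexicon:
        lowered = {term.casefold() for term in group}
        if normalized in lowered:
            variants |= lowered
    return variants


def search(query: str, datasets: dict[str, str], lexicon: list[set[str]]) -> list[str]:
    """Return ids of datasets whose text contains any variant as a whole phrase."""
    patterns = [
        # Word boundaries stop "GDP" from matching inside a longer token.
        re.compile(r"\b" + re.escape(variant) + r"\b", re.IGNORECASE)
        for variant in expand_query(query, lexicon)
    ]
    return sorted(
        dataset_id
        for dataset_id, text in datasets.items()
        if any(pattern.search(text) for pattern in patterns)
    )
```

With this lexicon, searching for “GDP” also returns a dataset that only says “Gross Domestic Product”, and vice versa. Keeping the lexicon as plain data, separate from the search code, is what would let people outside the search provider review and extend it.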