Launching goodtables.io: tell us what you think!

Hi, goodtables.io is great!

Now I need to learn how to use it… I can’t see the problem here:
http://goodtables.io/github/datasets-br/state-codes
The error “INVALID - non-matching-header” does not make sense to me; br-state-codes.csv was updated to match datapackage.json.

@ppkrauss Thanks a lot for the feedback and for trying out goodtables.io

I see that you fixed the build, did you find what was wrong with the data file?

You are right that non-matching-header isn’t very helpful, one of our next priorities is making the error descriptions more meaningful (Improve error descriptions on job reports · Issue #215 · frictionlessdata/goodtables.io · GitHub)


Thanks @amercader, yes, now I see that I fixed it :slight_smile:
But the commit that changed the README came after the commit with the fix, and the system lost git sync after my correction. So this is perhaps a sync bug, or it shows the need for an interface option like “redo validation please”.


@ppkrauss You are right, this looks like a bug, we will investigate, thanks!

Hi @amercader, sorry, I am not sure, but this seems to be a bug: after changing a datatype at datasets-br/state-codes from integer to gyear, GoodTables says “Internal error”.

Thanks @ppkrauss, we’ve created an issue to track this: Validation task raises an exception on invalid datapackage · Issue #223 · frictionlessdata/goodtables.io · GitHub

Goodtables is reeeeally nice! Thanks for making it!

Nice work! Is Admin access really required for Repository webhooks and services?

Hi @tesera, thanks for checking out goodtables.io.

The admin:repo_hook OAuth scope is needed to delete the repository hooks we create. We use this when deactivating the integration on a particular repo, to clean up after ourselves.

From the GitHub docs:

write:repo_hook Grants read, write, and ping access to hooks in public or private repositories.
admin:repo_hook Grants read, write, ping, and delete access to hooks in public or private repositories.

Sadly it doesn’t seem that GitHub scopes are granular enough to only give permissions on particular hooks.

Hi @amercader, thanks for the reply. We are really excited to use your service. We’ve half-baked some similar tooling ourselves, but this looks awesome, so I have no doubt we will be paid subscribers when you release it.

I wonder if it would be possible to separate the grants for this from the use of OAuth for login/registration. For example, I could sign in with basic OAuth grants, but if I wanted to hook up my GitHub repositories, that would be a second grant request.

From my perspective, I would like to grant access to specific repositories or none at all (and use the API / S3 integration). Because I have owner roles on multiple GitHub organizations, the cascading effect of these grants opens up access to multiple clients and organizations, which is hard to accept.

I’d be more comfortable setting up the webhooks manually per repo or some other way. Maybe you are adding a login besides GitHub?

Thanks.
Spence

Hi @tesera, you are right that login and repo authorization should have different scopes and separate authorizations. We have had that flagged as an issue for a while but haven’t had the chance to implement it. It’s a priority for sure, though.

In general, GitHub permissions are too permissive and broad (see e.g. this issue), and I don’t think we can be granted access to individual repos (it may be possible for organizations?). We have to reach a compromise between convenience and ease of use on one side and the permissions we ask for on the other. I wouldn’t be opposed to letting users create their own webhooks; the only big issue is that we use a secret key known only to the server to make sure only the webhooks we create can ping our endpoints. I guess we could implement a system of per-user secret keys for hooks, but that would take a bit of time.
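For readers unfamiliar with the secret-key mechanism mentioned above: GitHub can sign each webhook delivery with a shared secret and send the result in the X-Hub-Signature-256 header, as an HMAC-SHA256 hex digest prefixed with "sha256=". The Python sketch below shows that check in general terms. How goodtables.io actually implements it is not shown in this thread, and the per-user table (USER_SECRETS) and the function names are hypothetical, only illustrating what a per-user secret could look like.

```python
import hashlib
import hmac

# Hypothetical per-user secrets; a real service would keep these in a database.
USER_SECRETS = {
    "alice": b"alice-hook-secret",
    "bob": b"bob-hook-secret",
}


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Build the value GitHub sends in the X-Hub-Signature-256 header."""
    digest = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def is_trusted_delivery(user: str, payload: bytes, signature_header: str) -> bool:
    """Accept a delivery only if it was signed with this user's secret."""
    secret = USER_SECRETS.get(user)
    if secret is None:
        return False
    expected = sign_payload(secret, payload)
    # compare_digest runs in constant time, which avoids timing leaks.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


# Usage: a hook created with Alice's secret verifies for Alice only.
body = b'{"ref": "refs/heads/master"}'
header = sign_payload(USER_SECRETS["alice"], body)
print(is_trusted_delivery("alice", body, header))
print(is_trusted_delivery("bob", body, header))
```

With a scheme like this, a user could create the webhook by hand in the repository settings and paste in their own secret, so the service would not need the admin:repo_hook scope for that repository.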

Hi, I’m updating the list of open licenses and its data package in GitHub - okfn/licenses: Open source and open knowledge (data and content) licenses together with API and web service.

I’m validating my changes with Goodtables.io at goodtables.io

In my pull request, I accidentally introduced an error by misspelling “superseded” as “superceded”. Goodtables.io didn’t fail the data despite my enum constraint.

I’m wondering if there’s an error in my table schema or if GoodTables has a bug?

Edit: now solved
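For anyone debugging a similar case, here is a minimal sketch of what an enum constraint is expected to catch. It is plain Python, not the goodtables implementation, and the field descriptor, sample rows, and enum_violations helper are all hypothetical. In Table Schema, the allowed values go under the field's constraints.enum key.

```python
import csv
import io

# Hypothetical Table Schema field descriptor with an enum constraint.
status_field = {
    "name": "status",
    "type": "string",
    "constraints": {"enum": ["active", "retired", "superseded"]},
}

# Hypothetical data containing the misspelled value.
sample_csv = "id,status\nA,active\nB,superceded\n"


def enum_violations(rows, field):
    """Return (line_number, value) pairs for values outside the enum."""
    allowed = field.get("constraints", {}).get("enum")
    if allowed is None:
        return []  # no enum constraint declared, so nothing to check
    name = field["name"]
    # Data starts on line 2 because line 1 is the header row.
    return [
        (line, row[name])
        for line, row in enumerate(rows, start=2)
        if row[name] not in allowed
    ]


rows = list(csv.DictReader(io.StringIO(sample_csv)))
violations = enum_violations(rows, status_field)
print(violations)
```

If a check like this flags the value but the validator does not, it is worth confirming that the constraint sits under constraints.enum on the right field and that the resource actually points at that schema.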

I’m making a repository containing many data packages. I would like to validate each package individually and award badges to each data package.

When I add the repo to goodtables.io, it correctly detects that there are two data packages but only awards one badge: goodtables.io

Is there a way to have one repo with many data packages, each with its own badge?


Hi!

I’m trying to use goodtables.io to add continuous data validation to the publicbodies project. The repository has a single datapackage.json file that points to several CSV files described using tabular data packages. Since all the files use the same schema, I’ve used a single file for the schema and referenced it in the datapackage.json file as described in the specs and discussed on this topic.

So far I’ve added the repository and get a validation error telling me it wasn’t able to resolve the reference to the table schema file, as below:

[Screenshot: Captura de tela de 2018-08-21 10-06-26]

However, the command line goodtables tool is able to parse the referenced table schema just fine.

Here’s a sample:

$ goodtables datapackage.json 
DATASET
=======
{'error-count': 483,
 'preset': 'nested',
 'table-count': 11,
 'time': 14.937,
 'valid': False}

TABLE [1]
=========
{'datapackage': 'datapackage.json',
 'error-count': 0,
 'format': 'inline',
 'headers': ['id',
             'name',
             'abbreviation',
             'other_names',
             'description',
             'classification',
             'parent_id',
             'founding_date',
             'dissolution_date',
             'image',
             'url',
             'jurisdiction_code',
             'email',
             'address',
             'contact',
             'tags',
             'source_url'],
 'row-count': 259,
 'schema': 'table-schema',
 'source': '/home/herrmann/dev/publicbodies/src/data/br.csv',
 'time': 1.724,
 'valid': True}

It then goes on to list errors that it actually finds in the other tables. The point is that the command line tool correctly interprets and uses the external reference to the table schema, but the online tool on goodtables.io does not. Shouldn’t they be consistent? Doesn’t it use the same tool on the backend?
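To make the setup concrete, here is a minimal Python sketch of the pattern described above: several resources whose schema property is a string path to one shared schema file, instead of an inline object. The resolve_schema helper, file names, and field list are hypothetical, and it only handles local paths; it is not how goodtables or the datapackage library resolve references internally.

```python
import json
import tempfile
from pathlib import Path


def resolve_schema(resource: dict, base_dir: Path):
    """Return the schema as a dict, loading it if it is a path reference."""
    schema = resource.get("schema")
    if isinstance(schema, str):
        # A string is treated as a path relative to the datapackage.json.
        return json.loads((base_dir / schema).read_text(encoding="utf-8"))
    return schema  # already an inline object (or missing)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # One shared schema file (hypothetical fields).
    shared = {"fields": [{"name": "id", "type": "string"},
                         {"name": "name", "type": "string"}]}
    (base / "schema.json").write_text(json.dumps(shared), encoding="utf-8")

    # Hypothetical descriptor: every resource points at the same file.
    package = {
        "name": "example",
        "resources": [
            {"name": "br", "path": "data/br.csv", "schema": "schema.json"},
            {"name": "de", "path": "data/de.csv", "schema": "schema.json"},
        ],
    }
    resolved = [resolve_schema(r, base) for r in package["resources"]]

print([field["name"] for field in resolved[0]["fields"]])
```

A validator that skips this dereferencing step would see a bare string where it expects a schema object, which is consistent with the "unable to resolve the reference" error reported above.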


I describe the issue here. Is adding a goodtables.yml file that tells it to read only datapackage.json and nothing else going to solve it? It seems that neither adding a new branch nor adding a pull request to the repository has triggered a new check at Goodtables.io. Any ideas, @vitorbaptista & @roll ?

@roll fixed it by updating the data package module version used by Goodtables.io. Thanks! Looking forward to using it in more projects.


Hey, thank you for the information, much appreciated.

Hey, @roll, has this been completely superseded by Frictionless Repository?

@herrmann
Currently, it’s recommended to use Frictionless Repository although goodtables.io works.
