One of the main goals of Frictionless Data is to improve the quality of published data and to make it easier to maintain that quality over time. Building on top of the excellent goodtables Python library, we are launching a free service that provides Continuous Data Validation to everybody:
GoodTables.io builds on all the work that has been done in Frictionless Data specifications and tooling to date. It is designed to integrate with different backends and run validation jobs whenever data is updated. For this first Beta version, we are focusing on data hosted on GitHub repositories and Amazon S3 buckets.
There are still a lot of rough edges to polish; you can see which issues are already on the roadmap in the issue tracker. We’d love to hear your early feedback and learn about how you use the service.
To register a new source, simply log in with GitHub and authorize the application. Once on the dashboard page, click the “Manage Sources” link in the header:
- For GitHub repos, click the Synchronize button to get a list of your repos, and then activate the relevant one.
- For Amazon S3 buckets, enter the access key ID, the secret key, and the name of the bucket (we currently only support buckets located in the Oregon (us-west-2) region).
Validation jobs should start the next time a commit is pushed to the repo or a file is updated on the bucket.
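To give a feel for what a validation job catches, here is a minimal, self-contained sketch of the kinds of structural checks involved (blank headers, duplicate headers, rows with the wrong number of cells). It uses only the standard library and made-up sample data; the real service runs the full goodtables check suite, which covers far more.

```python
import csv
import io

def basic_csv_checks(text):
    """Run a few structural checks of the kind a validation job performs.
    Returns a list of (row_number, message) errors; empty means valid."""
    errors = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return [(1, "empty file")]
    # Blank header cells
    for i, name in enumerate(header, start=1):
        if not name.strip():
            errors.append((1, f"blank header in column {i}"))
    # Duplicate header names
    seen = set()
    for name in header:
        if name in seen:
            errors.append((1, f"duplicate header '{name}'"))
        seen.add(name)
    # Rows whose length differs from the header
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            errors.append((row_number,
                           f"row has {len(row)} cells, expected {len(header)}"))
    return errors

# Hypothetical sample: the second data row is missing a cell.
sample = "id,name,age\n1,Alice,30\n2,Bob\n"
print(basic_csv_checks(sample))  # → [(3, 'row has 2 cells, expected 3')]
```

On GoodTables.io the same idea applies continuously: each push or bucket update triggers a fresh run, so structural regressions surface as soon as they land.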
Feel free to add any comments to this thread, your feedback is greatly appreciated!
Kudos!!

I’m absolutely happy to run validation/checking processes on data that report issues for manual follow-up/correction. Directly changing things without oversight, though, is a complete no-go. At least, not without some kind of (extensive?) trial period to ensure there are no edge-case bugs that incorrectly change the data.

