Thanks @henrykironde. The example I was thinking of is point data in a CSV:
| facility id | facility name                 | address                | lat   | lon |
|-------------|-------------------------------|------------------------|-------|-----|
| 1           | St. John’s Hall               | 1 jones st, london     | 51.5  | 0.1 |
| 2           | Data Wranglers Community Hall | 2 smith st, west ham   | na    | na  |
| 3           | Coordinate’s Anonymous Hall   | 3 fiona rd, shoreditch | 51.55 |     |
| 4           | Missing Values Drop-in Centre | 4 shrek lane, soho     |       |     |
| 5           | St. John’s Childcare          | 1 jones st, london     | 51.5  | 0.1 |
| 6           | Pear Hall                     | 6 smith st, west ham   | na    |     |
The data is still useful without lat/lon.
On validation:

- row 1 is OK
- row 2 is OK, assuming `"missingValues": ["na"]` is defined
- row 3 is invalid, but how is this tested? The `required` constraint can’t be used because missing values are allowed. Perhaps `"locations": [{"type": "lat-lon", "fields": {"latitude": "lat", "longitude": "lon"}}]` could check that both values are either present or missing?
- row 4 is OK; the default is `"missingValues": [""]`
- row 5 is OK, but note it is a duplicate of row 1, hence using `primaryKey` won’t work
Thanks for the clarification @Stephen, I like the example. What makes this file a spatial data file is the fact that it has latitude and longitude columns, with at least one row having both an actual latitude and longitude. There will be many cases of such data, and I think it is totally fine as long as `missingValues` is defined.
In my opinion, I would not call row 5 a duplicate of row 1. They may be locations in the same building but on different floors (re: “row 5 is OK, but note it is a duplicate of row 1, hence using `primaryKey` won’t work”).
If I understand this correctly, I am thinking of something like:
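A sketch of what such a descriptor fragment might look like. Note this is only an illustration of the idea under discussion: the `locations` property and its shape are a proposal, not part of the current Table Schema spec, and the field names are taken from the example file:

```json
{
  "schema": {
    "fields": [
      {"name": "facility id", "type": "integer"},
      {"name": "facility name", "type": "string"},
      {"name": "address", "type": "string"},
      {"name": "lat", "type": "number"},
      {"name": "lon", "type": "number"}
    ],
    "primaryKey": "facility id",
    "missingValues": ["", "na"],
    "locations": [
      {"type": "lat-lon", "fields": {"latitude": "lat", "longitude": "lon"}}
    ]
  }
}
```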
Hi @henrykironde, you’re correct — using `primaryKey` for this is an incorrect hack. The `primaryKey` is `facility id`.
My point is that software that validates the data (like goodtables.io, tableschema.js, or datapackage.js) will need to use the `locations` property to ensure that either:

- both latitude and longitude are present, or
- both latitude and longitude are missing, based on `missingValues`
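A minimal sketch of that check in Python. This is not part of goodtables, tableschema.js, or datapackage.js; the function name and the hard-coded `MISSING_VALUES` set (assuming `"missingValues": ["", "na"]`) are illustrative only:

```python
# Assumed schema setting: "missingValues": ["", "na"]
MISSING_VALUES = {"", "na"}

def lat_lon_consistent(row):
    """Return True if lat and lon are both present or both missing.

    A row with only one of the two values filled in is invalid,
    which is exactly the case the `required` constraint cannot catch
    when missing values are allowed.
    """
    lat_missing = row.get("lat", "") in MISSING_VALUES
    lon_missing = row.get("lon", "") in MISSING_VALUES
    return lat_missing == lon_missing

# Rows 1-3 from the example CSV above:
rows = [
    {"facility id": "1", "lat": "51.5", "lon": "0.1"},  # both present -> valid
    {"facility id": "2", "lat": "na", "lon": "na"},     # both missing -> valid
    {"facility id": "3", "lat": "51.55", "lon": ""},    # lon missing  -> invalid
]
results = [lat_lon_consistent(r) for r in rows]
```

The check deliberately treats “both missing” as valid, since the thread agrees the data is still useful without coordinates.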