Good initiative (!)... I think it is important to avoid intersections between dataset projects; ideally the focus is on independent data that is really country-specific. PS: about label, I vote
Hmm... on second thought... we need some intersection...
Questions in the case of "some intersections allowed"
Let's see the country-codes.csv example, with standard names in English (name) and an official_name_fr for French... And only these two languages.
In /datasets-br we would want to include an official_name_pt column (pt is the language of BR), so to reduce duplication and intersections we must create a new /datasets-br/country-codes/data/country-codes.csv file with only two columns, ISO3166-1-numeric (as primary key) and official_name_pt... Possible problems:
ISO3166-1-numeric is not mnemonic, and not as useful as ISO3166-1-Alpha-2... Use one or two more columns as candidate keys?
Producing a kind of "SQL JOIN" with two CSV files (the one from /datasets-br and the one from the main /datasets) is not so easy for all users; why not copy all the other columns?
/datasets-pt? It would use the same translations, but perhaps with some variants (pt, pt-BR and pt-PT are not always the same).
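To make the "SQL JOIN" concern concrete, here is a minimal sketch of what a user would have to do to combine the two files, using only the Python standard library. The column layout of the main file and the sample rows are assumptions for illustration, and left_join is a hypothetical helper, not part of any /datasets tooling.

```python
import csv
import io

# Illustrative stand-ins for the two files; the rows are sample data only,
# and the column layout of the main file is an assumption.
MAIN_CSV = """ISO3166-1-numeric,ISO3166-1-Alpha-2,name,official_name_fr
076,BR,Brazil,Brésil
250,FR,France,France
"""

BR_CSV = """ISO3166-1-numeric,official_name_pt
076,Brasil
250,França
"""

KEY = "ISO3166-1-numeric"


def left_join(main_text, extra_text, key=KEY):
    """Left-join two CSV texts on `key`; unmatched rows get empty values."""
    extra_reader = csv.DictReader(io.StringIO(extra_text))
    extra_cols = [c for c in extra_reader.fieldnames if c != key]
    extra_by_key = {row[key]: row for row in extra_reader}

    joined = []
    for row in csv.DictReader(io.StringIO(main_text)):
        match = extra_by_key.get(row[key], {})
        for col in extra_cols:
            row[col] = match.get(col, "")
        joined.append(row)
    return joined


rows = left_join(MAIN_CSV, BR_CSV)
for row in rows:
    print(row["ISO3166-1-Alpha-2"], row["name"], row["official_name_pt"])
```

The key is read as text, so a code like 076 keeps its leading zero. It is a small script, but it is still more than many spreadsheet users would want to do, which is the point of the question above.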
Questions in the case of "some intersections in the curation"
Suppose a big and vibrant community working in /datasets-nl, etc., all in a "no /datasets intersections allowed" mode, but each one with a big set of people looking for an official_name_X column at
I think this is the "most vibrant" aspect: a new demand, a new pressure on the curatorial organization of the central /datasets, perhaps a kind of "federated democracy".
In the case of country-codes, and supposing that "join CSV files" is not a problem for users, the federated community can help the central /datasets maintain a new country-codes-names.csv file with all the official_name_X columns: this solution seems better than intersection in