Our process of finding and evaluating key datasets

Hi @dannylammerhirt,

When looking at our results, do you see any countries to which the problems you mentioned apply?

Yes, Brazil… and many similar countries with a comparable “maturity level in e-government” and a comparable informality in digital preservation… Perhaps 90% of Latin American countries.

Any specific files that could show these problems?

First, it is important to stress the context: this is about digital preservation and the digital integrity of public content, not the usual “digital certification” of authorship. See this handbook page about checksums.

There are 3 to 4 million documents, each holding the full text of a Brazilian norm (government acts such as laws, etc.). Without checksums, auditing the integrity of millions of digital documents (through full backups and their full validation) is so costly that no one is doing it.

Take one sample: Brazil's “Civil Internet Framework”, a federal law of 2014 officially named Lei Federal nº 12.965 de 2014 (the link goes to the LexML portal, which is the Brazilian equivalent of the European N-LEX portal).
The full text of the law is in the online government gazette as the “official PDF” (see here), and on official sites transcribed to HTML, like the presidential page with the HTML full text of the law.

The problems:

  1. The HTML remains an unofficial transcription, with no value as evidence. See the link to the HTML version: at the end there is a phrase in red,
    “Este texto não substitui o publicado no DOU de 24.4.2014”
    = “This text does not replace the one published in the OGU (Official Gazette of the Union) of 2014-04-24”.

  2. The PDF has no checksum, which would let any citizen freely check its integrity… So no citizen can prove that the official webserver held different content in the past, i.e. that the document was changed.
    And there is no other resource to check integrity: the footnote is only an ID (say “código 00012014042400124”) pointing to the same PDF, which is not a proof of integrity.
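
To make the point concrete, here is a minimal sketch of the check a citizen could run if the gazette published a digest next to each PDF. It is only an illustration under assumptions: SHA-256 is my choice of algorithm, the function names `sha256_of_file` and `matches_published_digest` are hypothetical, and today the gazette publishes no such digest to compare against.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large PDFs need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """True if the local file's digest equals the digest announced by the publisher.

    `published_hex` is an assumption: a hex string the publisher would have to
    print in the gazette or on its site. Whitespace and letter case are ignored.
    """
    return sha256_of_file(path) == published_hex.strip().lower()
```

With something like this, anyone who saved the PDF on the day of publication could later show, by comparing digests, whether the file served by the official site is still the same one.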


Yes, hello @herrmann? :wink: and perhaps @wagner_faria_de_oliv
