Entry for National Laws / Australia


#1

Hi @JXhaf, in your review of National Laws you marked it down because the data is not in HTML format, but since HTML is the primary mechanism for their web delivery, that seems a little harsh?

For example, https://www.legislation.gov.au/Latest/C2016C00828 is the dedicated HTML page for the Bank Integration Act 1991 (C2016C00828), and it will always point to the latest version.

On https://www.legislation.gov.au/Content/Linking they even acknowledge the use of crawlers!


#2

Thank you, @tobybellwood, for your feedback. After also consulting Alyssa Beaton of Open North, I will not be making any changes to the review, for the following reasons:

  • The link that you provided clearly states: "Most documents on this website can be downloaded and printed using the “Download” tab at the document level. Documents are usually available as a PDF file, a formatted text (.doc, .docx. or .rtf) file, and a zip file." There is no mention of the contents being available in HTML. My review was based on the fact that the text of the laws is not actually available in HTML, even though I found the document on a webpage. The only information available in HTML is the title of the law, which is not sufficient data for the purpose of the survey.
  • As for your comment on crawlers, you are right to point out that they acknowledge the use of crawlers, but this is insufficient. While the information can be crawled, it cannot be scraped, and scraping is what matters for the purpose of the survey. Information on the difference between crawling and scraping can be found here: https://www.quora.com/What-are-the-biggest-differences-between-web-crawling-and-web-scraping

#3

Thanks @JXhaf for the explanation. I would certainly benefit from a documented clarification of what constitutes an open format. If the HTML is parseable at a defined endpoint (however tough that parsing may be), surely it is open, given its plain-text nature. In this case it is fairly simple to parse the reasonably well-structured HTML from the webpage using core Python libraries.
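As a rough illustration of the "core Python libraries" point, the sketch below uses only the standard library's `html.parser` module to pull the text of headings, paragraphs and list items out of a page. It is a minimal sketch under stated assumptions: the sample markup and the "Example Act 1991" text are hypothetical and do not reproduce the actual structure of legislation.gov.au pages, and the choice of tags to keep is itself an assumption.

```python
from html.parser import HTMLParser


class BlockTextExtractor(HTMLParser):
    """Collect the visible text of headings, paragraphs and list items.

    The tag sets below are assumptions for illustration, not the actual
    markup used by legislation.gov.au.
    """

    BLOCK_TAGS = {"h1", "h2", "h3", "h4", "p", "li"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.blocks = []        # finished text blocks, in document order
        self._current = None    # pieces of the block being read, or None
        self._skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._current = []  # start a new block (flat: nesting not tracked)

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS and self._current is not None:
            # Collapse runs of whitespace before storing the block.
            text = " ".join("".join(self._current).split())
            if text:
                self.blocks.append(text)
            self._current = None

    def handle_data(self, data):
        if self._current is not None and self._skip_depth == 0:
            self._current.append(data)


def extract_text(html):
    """Return the text blocks found in an HTML string."""
    parser = BlockTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Hypothetical markup standing in for a legislation page.
sample = """
<html><head><style>p { color: red; }</style></head>
<body>
  <h1>Example Act 1991</h1>
  <p>1  Short title</p>
  <p>This Act may be cited as the <i>Example Act 1991</i>.</p>
  <script>var x = 1;</script>
</body></html>
"""

print(extract_text(sample))
# ['Example Act 1991', '1 Short title', 'This Act may be cited as the Example Act 1991.']
```

Against a live page, the HTML could first be fetched with `urllib.request.urlopen(url).read().decode("utf-8")`, and the tag sets would then need adjusting to the real markup. The extractor is deliberately flat: text placed directly in an outer block before a nested block (for example inside an `li` before its `p`) is dropped, which is acceptable for a sketch but not for production scraping.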

It’s worth adding that AustLII (the other main free-access resource for Australian legislation) provides a version of the laws in a much cleaner HTML format (http://www.austlii.edu.au/au/legis/cth/consol_act/bia1991166/) as well as RTF/TXT downloads (http://www.austlii.edu.au/cgi-bin/download.cgi/au/legis/cth/consol_act/bia1991166), which may better fit the open definition in this case.


#4

@tobybellwood, following your useful feedback, I have updated the review to include a comment that the data is available in HTML. However, as you may have noticed, there has been no change to the survey itself, because HTML is no longer one of the options for machine-readable formats (its content can be difficult to scrape).

As for the other source you mentioned, it does not appear to be a government source, so it is not relevant for the purpose of the index.


#5

Thanks @JXhaf for the update, and all your hard work! Much appreciated.