We are making some progress on parsing this info. You can see our GitHub page here.
We're parsing a list of all corporations and also going through a list of more than 100 data sets that contain procurement information. Much of the scraping of contracting data was done in PHP.
We may well use Node.js to do the parsing, based on people's experience in the team.
@ryan thanks again for your support with R. I kept hoping there'd be a simpler process.
@rufuspollock Trouble is that there is more than one. We're building a local cache of HTML files which we will be working from. Our expectation is that it will be faster to do any work locally if the content is all there.
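For anyone curious what the local cache idea looks like in practice, here is a rough sketch. It is in Python purely for illustration (not the PHP or Node.js mentioned above), and the names `download`, `cache_path` and `fetch_cached`, as well as the hashed file naming, are assumptions made for this example rather than what the project actually does.

```python
import hashlib
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def download(url: str) -> bytes:
    """Fetch the raw bytes of a page over the network."""
    with urlopen(url) as response:
        return response.read()


def cache_path(url: str, cache_dir: Path) -> Path:
    """Map a URL to a stable file name inside the cache directory.

    Hashing the URL is an assumption of this sketch; it avoids
    characters that are not valid in file names.
    """
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.html"


def fetch_cached(
    url: str,
    cache_dir: Path,
    fetch: Callable[[str], bytes] = download,
) -> bytes:
    """Return the page from the local cache, downloading only on a miss."""
    path = cache_path(url, cache_dir)
    if path.exists():
        # Cache hit: no network request is made.
        return path.read_bytes()
    body = fetch(url)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)
    return body
```

With something like this, each page is downloaded once and every later parsing run reads the saved copy from disk, which is where the expected speed-up would come from.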