I was browsing around for info about scraping the SEC’s EDGAR database and was delighted to see that some of the first results were your work on it. I’m thinking about looking into that data casually, and I was wondering whether you might be able to help me with a few questions:
Do you have any sense of how large a full scrape of the data (the XML portion at least) might be?
Did you ever play with any of the available parsers for the actual SGML filings? It looks like this might be quite traumatic to the untrained explorer.
Similarly, did you ever try out any of the Python tooling for XBRL?