Getting structured SEC EDGAR data

rufuspollock · October 10, 2014, 9:11am

Do you have any sense how large a full scrape of the data (the XML portion at least) might be?

I think it is pretty large, though I'm not absolutely sure. I think public.resource.org may already have done quite a bit of this for the older filings (pre-2001, IIRC) - https://bulk.resource.org/edgar/

Did you ever play with any of the available parsers for the actual SGML filings? [3] looks like this might be quite traumatic to the untrained explorer.

I'm not totally clear on the SGML vs XBRL distinction - I was focused on getting more of the “data”, so I concentrated on XBRL (however, the mention of pysec in the SO comments suggests that most libraries may do both). I had a very short library review here:

I also note that the lady behind RankAndFiled.com must have done some pretty good work here (however, none of that is open source AFAICT).

Similarly, did you ever try out any of the Python tooling for XBRL?

Yes, and I actually managed to get one working. See https://github.com/datasets/edgar/tree/master/scripts
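For anyone wondering what the XBRL “data” looks like once you have a filing in hand, here is a rough sketch. It is not the code from the scripts linked above and does not use any XBRL library; it only relies on the fact that an XBRL instance document is XML in which each reported fact is an element carrying a contextRef attribute. The sample document, its values, and the function name extract_facts are made up for illustration, and real filings are far larger and messier (nested contexts, dimensions, footnotes).

```python
import xml.etree.ElementTree as ET

# Made-up, heavily simplified XBRL instance document (illustration only,
# not taken from a real filing).
SAMPLE = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
      xmlns:us-gaap="http://fasb.org/us-gaap/2013-01-31">
  <context id="FY2013">
    <entity>
      <identifier scheme="http://www.sec.gov/CIK">0000000000</identifier>
    </entity>
    <period>
      <startDate>2013-01-01</startDate>
      <endDate>2013-12-31</endDate>
    </period>
  </context>
  <unit id="USD"><measure>iso4217:USD</measure></unit>
  <us-gaap:Revenues contextRef="FY2013" unitRef="USD" decimals="0">1000</us-gaap:Revenues>
  <us-gaap:NetIncomeLoss contextRef="FY2013" unitRef="USD" decimals="0">250</us-gaap:NetIncomeLoss>
</xbrl>"""


def extract_facts(xml_text):
    """Return the top-level facts of an XBRL instance as a list of dicts.

    Hypothetical helper: treats any direct child of the root that has a
    contextRef attribute as a fact and skips everything else.
    """
    root = ET.fromstring(xml_text)
    facts = []
    for elem in root:
        context = elem.get("contextRef")
        if context is None:
            continue  # <context>, <unit>, etc. are not facts
        # ElementTree spells namespaced tags as "{namespace-uri}LocalName".
        if elem.tag.startswith("{"):
            namespace, _, name = elem.tag[1:].partition("}")
        else:
            namespace, name = "", elem.tag
        facts.append({
            "namespace": namespace,
            "name": name,
            "context": context,
            "unit": elem.get("unitRef"),
            "value": (elem.text or "").strip(),
        })
    return facts


if __name__ == "__main__":
    for fact in extract_facts(SAMPLE):
        print(fact["name"], fact["context"], fact["unit"], fact["value"])
```

Values come back as strings; turning them into numbers and resolving each context id to its reporting period are left out to keep the sketch short.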
