Recently, there was a job post asking for a bulk download of SEC filings (10-K and 8-K) for about 600 companies, saved as PDF files. This type of request comes up often enough that we already had some of the flow mapped out with API providers, but this case really didn't justify the spend, so we mapped out how to navigate the EDGAR site directly. Below we'll go over the steps you need to find the requested forms and get you familiar with EDGAR. The walkthrough itself is manual, so you can follow every step in a browser.
EDGAR, the Electronic Data Gathering, Analysis, and Retrieval system, is where companies have to file their documents with the SEC. There are a number of ways to access the data in EDGAR. Some are paid, but if you have the bandwidth, you can pull the information for free with a little effort. In this case the client only wants a PDF of each full filing, which means no parsing or data extraction from the resulting pages, so we can access EDGAR manually.
Step 1 - Understand CIK Codes
The Central Index Key (CIK) is a ten-digit number used within the SEC to identify both individuals and corporations that have filed documents. To easily navigate EDGAR, you need to start with a CIK number.
For the person looking for one code, there is a handy Company CIK Lookup Form, but if you are processing a number of companies, it may be quicker to grab the complete list, load it into a dictionary and work out a way to search it for yourself. Or, for you Python people, try one of the libraries listed in the 'Parting Thoughts' section.
To get started building your own search, you'll need the list in its entirety, so here's the link:
https://www.sec.gov/Archives/edgar/cik-lookup-data.txt
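If you do go the dictionary route, here is a minimal sketch of what that could look like. It assumes each line of the downloaded file is shaped like NAME:CIK: (name, colon, ten-digit CIK, trailing colon), which is worth confirming against your own copy. The function names and the sample lines are made up for illustration; only the Transocean CIK comes from this post.

```python
from collections import defaultdict


def parse_cik_lookup(lines):
    """Build {UPPERCASE NAME: [cik, ...]} from lines assumed to look like 'NAME:0001234567:'."""
    lookup = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Company names can contain colons, so split from the right.
        parts = line.rstrip(":").rsplit(":", 1)
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip anything that does not match the assumed layout
        name, cik = parts
        # A name can map to more than one CIK, so keep a list.
        lookup[name.upper()].append(cik)
    return dict(lookup)


def search(lookup, term):
    """Case-insensitive substring search over company names."""
    term = term.upper()
    return {name: ciks for name, ciks in lookup.items() if term in name}


# Hypothetical sample lines standing in for the downloaded file.
sample = [
    "TRANSOCEAN LTD.:0001451505:",
    "EXAMPLE: HOLDINGS INC:0000000001:",
]
lookup = parse_cik_lookup(sample)
matches = search(lookup, "transocean")
```

For the real file, pass an open file handle instead of the sample list. You may need to experiment with the encoding argument to open(), since the file is not guaranteed to be clean UTF-8.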
For our little example we are going to find some 8-K forms for Transocean. After a little research you'll find that the CIK code for Transocean Ltd is 0001451505, or 1451505 without the leading zeros.
Now, I'm gonna take a moment to point out that if you are using Excel to do anything with CIK codes, the leading zeros are gonna disappear on you if the column is treated as a number and not a string. This is not a big deal, as EDGAR is lenient.
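If the zeros have already gone missing, putting them back is a one-liner. A small sketch, with normalize_cik being a name invented for this post:

```python
def normalize_cik(cik):
    """Return the CIK as a ten-character, zero-padded string."""
    digits = str(cik).strip()
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {cik!r}")
    return digits.zfill(10)


padded = normalize_cik(1451505)        # an integer, as Excel would hand it over
same = normalize_cik("0001451505")     # already padded, returned unchanged
```

Keeping CIKs as padded strings everywhere means you never have to remember which form a given URL wants.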
Step 2 - Determine What You Want
Each company can have a sizeable number of filings in its history, and everything is listed in the company's submission index. The URL is the same for every company: just change the CIK code in the URL and you're good to go.
There is a lot of data returned on this page and it's gonna be ugly. Use a JSON beautifier if you want to see it in a format that makes a bit more sense.
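If you'd rather let Python do the beautifying, the sketch below shows one way. Two assumptions to flag loudly: the submissions index is taken to be the JSON endpoint at data.sec.gov/submissions/CIK##########.json with a ten-digit padded CIK, and the SEC is assumed to want a descriptive User-Agent header on automated requests. Check both against the SEC's current developer guidance. All function names here are invented for illustration.

```python
import json
import urllib.request

# Assumed endpoint pattern for the submissions index; verify before relying on it.
SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik}.json"


def submissions_url(cik):
    """Build the submissions URL, padding the CIK to ten digits."""
    return SUBMISSIONS_URL.format(cik=str(cik).zfill(10))


def fetch_submissions(cik, user_agent):
    """Download and decode the submissions JSON (makes a network call)."""
    request = urllib.request.Request(
        submissions_url(cik),
        headers={"User-Agent": user_agent},  # e.g. your name and contact email
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def pretty(data, limit=2000):
    """Indented JSON text, truncated so a huge index stays readable."""
    return json.dumps(data, indent=2, sort_keys=True)[:limit]


url = submissions_url(1451505)
# A tiny stand-in dictionary so the example runs without touching the network.
preview = pretty({"cik": "1451505", "filings": {"recent": {"form": ["8-K"]}}})
```

Calling print(pretty(fetch_submissions(1451505, "Your Name you@example.com"))) would show the top of the real index, with the contact string being a placeholder you replace with your own.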
Step 3 - Do Some Detective Work
The items you are interested in are the accessionNumber and the primaryDocument or Form Type fields. You'll see that each is a list in the same sequential order, so the indexes line up with each other: index position 12 in each list identifies the same filing.
One little piece of info that may help, or not: the accessionNumber has the last two digits of the year wedged into the middle, so you can filter on that piece as well if you so choose. For example, the accessionNumber 0001451505-21-000021 is from year 2021 because of the -21- in the middle, not the 21 at the end (sorry for the poor example).
To build a link, you need to combine the accessionNumber with the primaryDocument file name, using the Form Type to pick the entries you want. Because each document has its attributes broken out, you will find that document 12 was filed on 2021-03-01, reported on 2021-02-26, and has an Accession Number of 0001451505-21-000021 and a file name of rig-20210226x8k.htm.
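Because the lists line up by position, zipping them together gives you one record per filing. A rough sketch, assuming the lists sit side by side in one dictionary and that the Form Type list is keyed as form (check the key names in your own JSON). The first sample row is the Transocean filing from this post; the second is made up.

```python
def filings_by_form(recent, form_type):
    """Pair up the parallel lists by position and keep rows of one form type."""
    rows = zip(recent["accessionNumber"], recent["form"], recent["primaryDocument"])
    return [
        {"accessionNumber": acc, "form": form, "primaryDocument": doc}
        for acc, form, doc in rows
        if form == form_type
    ]


# Stand-in data shaped like the parallel lists described above.
recent = {
    "accessionNumber": ["0001451505-21-000021", "0001451505-21-000015"],
    "form": ["8-K", "10-K"],
    "primaryDocument": ["rig-20210226x8k.htm", "example-10k.htm"],  # second entry is hypothetical
}
eight_ks = filings_by_form(recent, "8-K")
```

Run it once with "8-K" and once with "10-K" and you have the two lists the client asked for.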
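Pulling the year out is just a split on the dashes. In this sketch the pivot for deciding between 19xx and 20xx is our own assumption (two-digit years of 90 and up are treated as the 1990s), and accession_year is a name invented here.

```python
def accession_year(accession_number, pivot=90):
    """Return the four-digit year encoded in the middle of an accession number."""
    _filer, yy, _sequence = accession_number.split("-")
    year = int(yy)
    # Assumption: two-digit years at or above the pivot belong to the 1900s.
    return (1900 if year >= pivot else 2000) + year


year = accession_year("0001451505-21-000021")  # 2021
```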
Step 4 - Follow Your Nose
Stick everything together and get the URL:
https://www.sec.gov/Archives/edgar/data/0001451505/000145150521000021/rig-20210226x8k.htm
But wait - what about the dashes in the accessionNumber??? When building the URL you just don't use them.
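Put together as code, the whole recipe is a string replace and an f-string. A minimal sketch using the values from this example, with filing_url being a name made up for this post:

```python
ARCHIVES = "https://www.sec.gov/Archives/edgar/data"


def filing_url(cik, accession_number, primary_document):
    """Join CIK, dash-free accession number and file name into a document URL."""
    folder = accession_number.replace("-", "")  # the folder name drops the dashes
    return f"{ARCHIVES}/{cik}/{folder}/{primary_document}"


url = filing_url("0001451505", "0001451505-21-000021", "rig-20210226x8k.htm")
```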
Step 5 - Deliver the Goods
In this case the client wanted the report saved as a PDF file, so just grab any easy HTML->PDF converter and we're all done. It's Python. Pick one.
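As one possible pick, here is a sketch using WeasyPrint, which is our choice for illustration and not something the job required. It assumes you have already downloaded the filing's HTML as a string, and the file naming scheme is invented here. The converter is imported inside the function so the naming helper works even without WeasyPrint installed.

```python
from pathlib import Path
from urllib.parse import urlparse


def pdf_name(filing_url):
    """Derive '<accession>_<document stem>.pdf' from a filing URL."""
    parts = urlparse(filing_url).path.strip("/").split("/")
    accession, document = parts[-2], parts[-1]
    return f"{accession}_{Path(document).stem}.pdf"


def save_as_pdf(html, filing_url, out_dir="."):
    """Render already-downloaded HTML to a PDF file and return its path."""
    # Lazy import: only needed when a conversion is actually requested.
    from weasyprint import HTML

    target = Path(out_dir) / pdf_name(filing_url)
    # base_url lets relative links and images in the filing resolve.
    HTML(string=html, base_url=filing_url).write_pdf(str(target))
    return target


name = pdf_name(
    "https://www.sec.gov/Archives/edgar/data/0001451505/000145150521000021/rig-20210226x8k.htm"
)
```

Prefixing the file name with the accession number keeps 600 companies' worth of PDFs from colliding when two filings share a document name.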
Parting Thoughts
Clearly, Caffeine Lab is a Python shop, and because Python has so much community support, a quick search for 'Python Edgar' will yield a few libraries to work with. The one that helped us crack the nut in the past was py-edgar, and it'll get you part way there. Paid services that will parse the data in the reports include sec-api.io and edgar-online.com, but those were not necessary as this particular job did not require data parsing, so we'll save that for another post.