Data Extraction

Extracting Web Data

Latest Update: 03.05.2021

The first choice is ALWAYS to use an API that the site itself provides. The second choice is an existing data aggregator's API. Only after that, and only if it is allowed, scrape the site content using any of a number of tools, including those listed below.
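One way to approach the "if allowable" part is to consult the site's robots.txt before scraping. The sketch below uses Python's standard `urllib.robotparser`. The helper name `is_scrape_allowed`, the user agent `example-bot`, and the sample rules are hypothetical, made up for illustration. A robots.txt check is only one signal, and a site's terms of service may add further restrictions.

```python
from urllib.robotparser import RobotFileParser


def is_scrape_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt rules let user_agent fetch url.

    Hypothetical helper: robots_lines is the robots.txt content, already
    split into lines, so the check can run without a network call.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration only.
sample_robots = [
    "User-agent: *",
    "Disallow: /private/",
]

allowed = is_scrape_allowed(
    sample_robots, "example-bot", "https://example.com/reports/"
)
blocked = is_scrape_allowed(
    sample_robots, "example-bot", "https://example.com/private/data"
)
```

Against a live site, the same parser can load the rules itself with `set_url(...)` followed by `read()` instead of `parse(...)`.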

Caffeine Lab is a Python-first shop, so there are a number of tools that we can readily adopt.

BeautifulSoup - Screen Scraping since 2004!

Mechanize - The first-line tool for stateful programmatic web browsing.

Selenium - a powerful automation tool for web browsers that is great for testing, but also for simplifying boring web-based administration tasks.

Nightmare - a high-level browser automation library for Node.js (not Python) that is primarily used for UI testing and, of course, crawling.

Projects Related to Data Extraction

Mining Edgar

An overview of how to use EDGAR to find the documents you are looking for.

Case Files

An overview of how we developed a mining strategy for a firm helping people who are behind on their bills.

Freshbooks API

A peek at how we quickly access the Freshbooks API without any additional libraries.

Citrix Podio API

An overview of how we quickly access the Podio API without any additional libraries.

Windy Maps API

Hosted at medium.com, an article about capturing animated weather forecasts.
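
Several of the projects above describe accessing an API "without any additional libraries". As a rough illustration of what that can look like in Python, the sketch below builds an authenticated JSON request with only the standard library. It is not the code from those articles. The URL, the token, and the function names `build_api_request` and `fetch_json` are hypothetical, and bearer-token authentication is an assumption that should be checked against the documentation of the API being called.

```python
import json
import urllib.request


def build_api_request(url, token):
    """Build a GET request that asks for JSON and carries a bearer token.

    Hypothetical helper; the auth scheme is an assumption, not taken
    from any specific vendor's API.
    """
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
        method="GET",
    )


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder endpoint and token; nothing is sent until fetch_json is called.
request = build_api_request("https://api.example.com/v1/items", "YOUR_TOKEN")
```

Keeping request construction separate from the network call makes the headers easy to inspect or test before anything is sent.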