Not everything has an API.
I haven’t had to do much web scraping in my life, and when I have, it’s been simple and hasn’t needed to be reproducible. But a few projects have been floating around in my head that would benefit greatly from repeatedly collecting a lot of data straight from webpages. My search for a good scraping tool led me to the usual places (Stack Overflow and Quora), where I found Scrapy.
Scrapy is a Python-based screen scraping and web crawling framework that is available to fork on GitHub. I currently work on a Windows machine, so, like most cool things, it was non-trivial to set up. Luckily, they provide a straightforward installation guide with links to all of the dependencies you need to install. They also provide a nice tutorial to help you get a feel for the framework.
So that’s where I am now: everything is up and running, and I feel comfortable with the tutorial project. Now I just need to figure out how to use it for my own (currently ill-defined) projects. Hopefully I’ll be back here soon reporting on some cool results.