Meeting Scrapy

Not everything has an API.

I haven’t had to do much web scraping in my life, and when I have, the jobs were simple and did not need to be reproducible. But there are a few projects that have been floating around in my head that would benefit greatly from repeatedly collecting a lot of data straight from webpages. My search for a good scraping tool led me to the usual places (Stack Overflow and Quora) and I found Scrapy.

Scrapy is a Python-based screen scraping and web crawling framework that is available to fork on GitHub. I currently work on a Windows machine so, like most cool things, it was non-trivial to set up. Luckily, the project provides a straightforward installation guide with links to all of the dependencies you need to install, as well as a nice tutorial to help you get a feel for the framework.

So that’s where I am now: everything is up and running, and I feel comfortable with the tutorial project. Now I just need to figure out how to use it for my own (currently ill-defined) projects. Hopefully I’ll be back here soon reporting on some cool results.

A Network Science textbook

I recently started a wonderful course in Social Network Analysis (available on Coursera). There are many, many good things to be said about this course but I will save those until I have completed it.

For now, I just want to highlight a book that I found through this class. Network Science by Barabási is an introductory text on the field. Network science is the study of network representations of phenomena and their related models. While its foundation is graph theory (a field of mathematics), network science is interdisciplinary and draws methods and concepts from a wide variety of fields, ranging from sociology to physics. Like data science, its applications have grown dramatically in recent years thanks to cheap computing power, data collection, and storage. I had actually hoped to title this post “Network Science is Data Science for people who had sex in high school”. While I’m pretty sure that would be a lie, the point is that network science is awesome and has a fascinating future in store for it.

Albert-László Barabási is one of the biggest names in the field. In 1999 he and Réka Albert published a paper on scale-free networks that has proven pivotal in launching the booming interest that network science has seen as an academic field over the last decade. Now, Barabási is working on a textbook aimed at exposing undergrads to this powerful field of study.

It is a work in progress, but the first two chapters are currently available for free. So far, the content is at too low a level to warrant what I expect the price to be (undergrad books are unconscionably expensive), but I am definitely looking forward to reading any additional chapters that are posted online. And I am glad that a high-quality, introductory book on this subject will be available soon.
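
For readers wondering what a scale-free network looks like in practice, here is a rough sketch of the growth idea usually associated with the 1999 Barabási and Albert paper: new nodes prefer to link to nodes that already have many links. It is not taken from the textbook or the course; the function names (`preferential_attachment`, `degree_counts`) and the parameters (`n_nodes`, `m_links`, `seed`) are made up for illustration, and only the Python standard library is used.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, m_links, seed=None):
    """Grow an undirected network one node at a time (illustrative sketch).

    Each new node links to m_links distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    if m_links < 1 or n_nodes <= m_links:
        raise ValueError("need n_nodes > m_links >= 1")
    rng = random.Random(seed)
    edges = []
    # Start from a small fully connected core of m_links + 1 nodes.
    for i in range(m_links + 1):
        for j in range(i):
            edges.append((j, i))
    # Every node appears in this list once per edge endpoint, so a uniform
    # draw from it picks a node with probability proportional to its degree.
    endpoints = [node for edge in edges for node in edge]
    for new_node in range(m_links + 1, n_nodes):
        targets = set()
        while len(targets) < m_links:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((target, new_node))
            endpoints.extend((target, new_node))
    return edges


def degree_counts(edges):
    """Return a Counter mapping each node to its number of links."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


if __name__ == "__main__":
    network = preferential_attachment(n_nodes=2000, m_links=2, seed=42)
    degrees = degree_counts(network)
    # A few early nodes tend to become hubs with far more links than the rest.
    print("Most connected nodes:", degrees.most_common(5))
    print("Typical (median) degree:", sorted(degrees.values())[len(degrees) // 2])
```

Running it a few times with different seeds should show a handful of heavily connected hubs next to a large majority of nodes with only a couple of links, which is the pattern "scale-free" refers to.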

Are we alone in the universe?

“A single ear of wheat in a large field is as strange as a single [habitable] world in infinite space” – Metrodorus

This week I finished a course called Intro to Astrobiology by Professor Charles Cockell (of the UK Centre for Astrobiology at The University of Edinburgh) offered on Coursera.


Astrobiology is an ancient field of thought concerned with the origin, evolution and distribution of life in the universe. It pulls from many disciplines (chemistry, biology, astrophysics, etc.) to ask questions such as:

  • How/why/where did life begin on earth?
  • What are the extreme limits of life (temperature, pressure, desiccation) on earth?
  • Are these limits universal? Can life exist in ways we haven’t conceived?
  • Is there life outside of earth? How can we go about finding it?

There are billions of galaxies, each with billions of stars, many having planets, and we are only beginning to have the technology capable of inspecting them.

The Class

Each week a handful of short lecture videos were released as well as a couple of multiple choice quizzes. Meant as an introductory/teaser course, the videos offered an overview of the many disparate parts of the subject. Professor Cockell clearly finds the field fascinating and did a good job of connecting the topics together.

Is life sustainable outside of the comfort of the earth?

This may not be answered for a long time. But eventually, the earth will be unable to sustain life, whether through our own actions or because of the expiration of our sun. So it is imperative for us to explore these issues.

Are we alone in the universe? The answer is profound either way.