Project Luther - Discovery

Project Luther is in motion.

For Project Luther, I need to move quickly towards an MVP (Minimum Viable Product). The deadline has been broken into two parts: I believe the formal MVP presentation is Monday, but by Friday I must be able to show that I’ve already begun collecting data. The project has two key aspects: web scraping and linear regression. So, to a degree, the MVP for the scraping is due tomorrow.

I’ve performed a bit of Exploration and Discovery, playing with Selenium against the websites of four counties in the Chicagoland area:

  • Cook
  • DuPage
  • Lake
  • Will

I evaluated these for ease of access as well as for the amount and type of data I could procure. For the purpose of the MVP, I have settled on Will County alone.

Next, because I need to show data tomorrow, I cannot start the way I’d hoped. I do want to begin switching to CI (continuous integration), but I dare not waste time getting that part working.

Hilariously, I already have a template project to pull from for the web scraping. I will likely create a writeup for that project in the near future.

PLACEHOLDER (I’ll come back here later and create a link to it).

But just because I cannot immediately integrate CI doesn’t mean I shouldn’t go ahead and start the repository. So here’s the short-term plan…

  1. Create the repository on GitHub
  2. Use Jupyter, Selenium and BeautifulSoup for prototyping
  3. Begin development with Scrapy
  4. Incorporate Mongo for storage
  5. Pull some data
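As a rough illustration of what the prototyping in step 2 might boil down to, here is a minimal sketch that turns an HTML results table into a list of dictionaries, one per row. It assumes, hypothetically, that the county site returns its records in a plain table whose first row holds the column headers; the column names "PIN" and "Sale Price" in the sample are invented for illustration and are not taken from the Will County site. To keep it dependency-free, it uses the standard library's html.parser as a stand-in for BeautifulSoup; in the notebook, the HTML would come from Selenium's driver.page_source instead of a literal string.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    A stdlib stand-in for BeautifulSoup, assuming well-formed tables
    where each cell and row is explicitly closed.
    """

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being read
        self._cell = None # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a cell; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)


def table_to_records(html):
    """Use the first table row as keys and turn every later row into a dict."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample page; real column names depend on the county site.
SAMPLE = """
<table>
  <tr><th>PIN</th><th>Sale Price</th></tr>
  <tr><td>01-02-03</td><td>$250,000</td></tr>
  <tr><td>04-05-06</td><td>$310,500</td></tr>
</table>
"""

records = table_to_records(SAMPLE)
for record in records:
    print(record)
```

Keeping each row as a plain dictionary is a deliberate choice: the same records map directly onto MongoDB documents for step 4 (for example via pymongo's Collection.insert_many), and the parsing logic can later move into a Scrapy spider for step 3 without changing its output shape.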

© 2017. All rights reserved.