Project Luther - Discovery

25 Jan 2018 in Project / Luther

Project Luther is in motion.

For Project Luther, I need to move quickly towards an MVP (Minimum Viable Product). This has been broken into two parts. I believe the formal MVP presentation is Monday, but on Friday I must be able to show I've already begun collecting data. The project has two key aspects: web scraping and linear regression. So, to a degree, the MVP for the scraping is due tomorrow.

I've done a bit of exploration and discovery, experimenting with Selenium against the websites of four counties in the Chicagoland area:

Cook
DuPage
Lake
Will

I evaluated these for ease of access as well as for the amount and type of data I could procure. For the purpose of the MVP, I have settled on Will County alone.

Next, because I need to show data tomorrow, I cannot start the way I'd hoped. I do want to switch to CI (continuous integration), but I dare not spend time getting that part working yet.

Hilariously, I already have a template project to pull from for the web scraping. I will likely write up that project in the near future.

PLACEHOLDER (I’ll come back here later and create a link to it).

But just because I cannot immediately integrate CI doesn’t mean I shouldn’t go ahead and start the repository. So here’s the short-term plan…

Create the repository on GitHub
Use Jupyter, Selenium and BeautifulSoup for prototyping
Begin development with Scrapy
Incorporate Mongo for storage
Pull some data
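To make the prototyping step concrete, here is a minimal sketch of turning a scraped results table into records. It is an illustration under stated assumptions, not the project's actual code: it uses Python's standard-library `html.parser` as a stand-in for BeautifulSoup so it runs without extra installs, and the table markup, column names, and the helpers `TableParser` and `table_to_records` are all hypothetical, not taken from the Will County site.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper; stands in for what BeautifulSoup would do.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments of the closed cell and store them in the row.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a cell; ignore whitespace between tags.
        if self._cell is not None:
            self._cell.append(data)


def table_to_records(html):
    """Use the first row as keys and turn each later row into a dict."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical markup standing in for a saved county results page.
SAMPLE_HTML = """
<table>
  <tr><th>PIN</th><th>Address</th><th>Sale Price</th></tr>
  <tr><td>00-00-000-001</td><td>123 Example St</td><td>250000</td></tr>
  <tr><td>00-00-000-002</td><td>456 Sample Ave</td><td>310000</td></tr>
</table>
"""

if __name__ == "__main__":
    for record in table_to_records(SAMPLE_HTML):
        print(record)
```

Each record is a plain dict, which is already the shape of a MongoDB document, so a list of them could be handed to pymongo's `insert_many` once storage is wired in. Values stay strings here; converting them to numbers would be a separate cleaning step before any regression.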


Joseph Hamilton - Solution Oriented Data Scientist
