Project Luther - Discovery
Project Luther is in motion.
For Project Luther, I need to move quickly towards an MVP (Minimum Viable Product). This has been broken into two parts. I believe the formal MVP presentation is Monday. But on Friday I must be able to show I’ve already begun to collect data. This project has two key aspects: web scraping and linear regression. So, to a degree, the MVP for the scraping is due tomorrow.
I’ve performed a bit of exploration and discovery. I played with Selenium and the websites for four of the counties in the Chicagoland area:
- Cook
- DuPage
- Lake
- Will
I evaluated these for ease of access as well as for the amount and type of data I could procure. For the purpose of the MVP I have settled on Will County alone.
Next, because I need to show data tomorrow, I cannot start the way I’d hoped. I do want to begin switching to CI (continuous integration), but I dare not spend time getting that part working right now.
Hilariously, I already have a template project to pull from for the web scraping. I will likely create a writeup for that project in the near future.
PLACEHOLDER (I’ll come back here later and create a link to it).
But just because I cannot immediately integrate CI doesn’t mean I shouldn’t go ahead and start the repository. So here’s the short-term plan…
- Create the repository on GitHub
- Use Jupyter, Selenium and BeautifulSoup for prototyping
- Begin development with Scrapy
- Incorporate Mongo for storage
- Pull some data
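To make the prototyping step a little more concrete, here is a minimal sketch of turning a scraped HTML table into dictionary records of the kind a Mongo collection could store. Everything specific in it is an assumption: the sample markup, the column names and the helper names (`TableParser`, `table_to_records`, `parse_price`) are made up for illustration, and the real county pages will certainly look different. It also uses the standard library's `html.parser` as a stand-in so it runs with no installs; in the notebook, BeautifulSoup would do the same job in fewer lines.

```python
from html.parser import HTMLParser

# Made-up placeholder markup, NOT taken from any county website.
SAMPLE_HTML = """
<table>
  <tr><th>Parcel</th><th>Sale Price</th></tr>
  <tr><td>A-001</td><td>$150,000</td></tr>
  <tr><td>A-002</td><td>$212,500</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = None    # row currently being read, if any
        self._cell = None   # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Use the first row as headers and return one dict per remaining row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    keys = [h.lower().replace(" ", "_") for h in header]
    return [dict(zip(keys, row)) for row in body]


def parse_price(text):
    """Convert a string such as '$150,000' to an integer number of dollars."""
    return int(text.replace("$", "").replace(",", ""))


records = table_to_records(SAMPLE_HTML)
for record in records:
    record["sale_price"] = parse_price(record["sale_price"])
```

Records shaped like this are plain dictionaries, so once Mongo is in place they could be handed to a collection insert without further conversion. Cleaning numeric fields at scrape time, as `parse_price` does, should also save some work when the regression step arrives.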