Project Luther Begins

Project Luther begins.

Yay! With this project I get to play with web scraping.

There are so many aspects of Data Science that I have employed at various points throughout my career and personal life. During the lectures this morning we discussed what Machine Learning is at its most basic definition. By that definition, a lot of the optimization work I did long ago would qualify as Machine Learning. Indeed, some of the stuff I’ve done with fitness functions bears quite a resemblance in spirit to Gradient Descent.
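
For the sake of that analogy, here is a minimal sketch of gradient descent on a toy one-variable quadratic; the function and step size are made up purely for illustration, not anything from a real project:

```python
# Minimal gradient descent on f(x) = (x - 3)^2, a stand-in "fitness" function.

def f(x):
    return (x - 3) ** 2

def grad_f(x):
    # Derivative of f with respect to x.
    return 2 * (x - 3)

x = 0.0              # starting guess
learning_rate = 0.1  # arbitrary step size for this toy example

for step in range(50):
    x -= learning_rate * grad_f(x)  # step downhill along the gradient

print(round(x, 4))  # converges toward the minimizer x = 3
```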

But this project is not just web scraping. It’s web scraping plus linear regression.

It actually proved a tad difficult to choose a topic to investigate. See… from here on out, we are only given a few requirements and rough guidelines. From these we define the project.

So here, we need something that:

  • Involves some sort of structured data we scrape from the web. We can use additional data of any kind from any source, but the first part of the data needs to be scraped.
  • Has enough of that structured data to support some linear regression.
  • Is interesting (at least to me).
  • Ought to be somewhat novel.

Hmm… tough call.

I played around last night with a number of ideas. I settled on something to do with property tax assessments since I’ve been curious whether there are measurable effects of increasing property tax. I don’t think I’ll be able to get enough data longitudinally (in time) to see much. But I ought to be able to measure across space or across valuations.
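
To make the shape of the project concrete, here is a rough sketch of the scrape-then-regress pipeline. The hard-coded HTML snippet stands in for a real assessment page, and the column names and numbers are invented for illustration:

```python
# Parse structured rows out of HTML and fit a simple linear regression
# of tax amount on assessed value (made-up example data).

from bs4 import BeautifulSoup
import numpy as np

html = """
<table>
  <tr><td class="assessed">200000</td><td class="tax">2400</td></tr>
  <tr><td class="assessed">350000</td><td class="tax">4100</td></tr>
  <tr><td class="assessed">500000</td><td class="tax">6050</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = [
    (float(tr.find("td", class_="assessed").text),
     float(tr.find("td", class_="tax").text))
    for tr in soup.find_all("tr")
]

X = np.array([[assessed] for assessed, _ in rows])
y = np.array([tax for _, tax in rows])

# Ordinary least squares via least-squares solve: y ≈ b0 + b1 * assessed
A = np.hstack([np.ones((len(X), 1)), X])
b0, b1 = np.linalg.lstsq(A, y, rcond=None)[0]
print(f"intercept={b0:.2f}, slope={b1:.5f}")
```

The real version would of course pull pages over HTTP (with requests or Scrapy) and use far more features than assessed value alone.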

An even tougher call is HOW to do this…

I don’t mean how to scrape (though I will have to decide whether or not to use Scrapy). I mean Jupyter notebooks vs. plain Python scripts, how much to employ Test-Driven Development, CI, and so on.
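
If I do lean toward Test-Driven Development, the scraping side is the natural place to start, since parsers are easy to pin down with tests. A sketch of what that could look like, with a hypothetical parse_assessment() helper and a pytest-style test (both names are my own invention here):

```python
# Hypothetical parsing helper plus a pytest test for it.

from bs4 import BeautifulSoup

def parse_assessment(row_html: str) -> float:
    """Pull the assessed value out of a single HTML table row."""
    cell = BeautifulSoup(row_html, "html.parser").find("td", class_="assessed")
    return float(cell.text.replace(",", "").lstrip("$"))

def test_parse_assessment_strips_formatting():
    assert parse_assessment('<tr><td class="assessed">$250,000</td></tr>') == 250000.0
```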

MVP is Monday. Again… the pressure is on.

