Project McNulty Begins

Project McNulty begins.

OK. Once bitten, twice shy.

They TOLD us last time to submit two ideas for Project Two and to be able to pivot to the second one if things didn’t work out with the first. I am not sure anyone actually did so. I cannot even remember what my second idea was. I know I gave it very little consideration.

Now, to be fair, it was hard to assess things. First of all, half of that project was scraping; we couldn’t even look at the data until we had it. Furthermore, a lot of us wouldn’t really have known how. But we WERE getting a sense of the lack of correlation at the MVP stage. I imagine several of us could have, and should have, pivoted at that point. Again, though… we didn’t know. For example, I spent time scraping more data. I had fun. It was a worthwhile experience. I proved out MongoDB. But only towards the end did I look at the learning curves, which made it really, really clear that more data wasn’t going to help me at all.
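For what it’s worth, the check itself is cheap. Here’s a minimal sketch of the kind of learning-curve plot I wish I’d made earlier; the synthetic dataset and the logistic regression model are stand-ins, not the actual Project Two data or model. If the training and cross-validation curves have already converged, piling on more rows isn’t going to move the needle.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Stand-in data and model; swap in the real feature matrix and target.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8), cv=5, scoring="accuracy",
)

# If these two curves have flattened out together, more data won't help.
plt.plot(sizes, train_scores.mean(axis=1), label="train")
plt.plot(sizes, val_scores.mean(axis=1), label="cross-val")
plt.xlabel("training set size")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```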

So… THIS time I’m treating both ideas seriously and have already pulled down the initial datasets for both. As much as is feasible, I’m going to do a quick early dive on each one to look for “signal” (a rough sketch follows the list below). Things like:

  • Are there meaningful correlations between the features and the target?
  • Is the classification going to be skewed by imbalanced categories?
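The first pass doesn’t need to be fancy. Something like the sketch below is what I have in mind; the file name and the `target` column are placeholders, not the actual datasets I pulled down.

```python
import pandas as pd

# Placeholder file: one of the two candidate datasets, with a 'target' column.
df = pd.read_csv("candidate_dataset.csv")

# 1. Correlations between numeric features and the target, strongest first.
corr = (
    df.corr(numeric_only=True)["target"]
    .drop("target")
    .sort_values(key=abs, ascending=False)
)
print(corr.head(10))

# 2. How lopsided are the target classes and the categorical features?
print(df["target"].value_counts(normalize=True))
for col in df.select_dtypes(include="object"):
    print(col, df[col].value_counts(normalize=True).head())
```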

I plan to have things like that addressed very shortly. I may be able to add features later. But I don’t want to get stuck chasing ghosts this time.

David called this “fail fast”: we need to be able to determine quickly when to “fail” and move on, rather than falling prey to the sunk cost fallacy.

Let’s see how this goes…
