Bootcamp - Blogging about Projects

It’s a challenge to keep up with the blogging!

We had an interesting and rather lively feedback session Friday. We’d been talking about doing one for a bit. So David finally hosted it.

One of the more interesting things to come out of it was a consistent call that Dask should be introduced into the curriculum immediately following Pandas. This may have been about the only point on which the cohort had no dissenting views. Most of the other feedback was mixed, in the sense that not everyone agreed.

But Dask? Not everyone had delved into it. But many had dabbled a bit. And Alex did give a good Investigation presentation on the topic. Many of us had run into issues throughout the bootcamp which this could have addressed. The main problem is what to do when your local machine runs out of memory. For most of us there have only been two choices:

  • Reboot after the crash and try again with a smaller data set.
  • Move it to the cloud somehow.

But moving to the cloud isn’t trivial. Do you try to create a single EC2 instance with a ton of RAM? Or try out something on EMR with the work spread out? Neither of these was something we could have done early on in the bootcamp. Sometimes you can restructure your work so you process a large file one line at a time. But that’s roughly what Dask does for you, working through the data a chunk at a time!
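For what it's worth, here is a minimal sketch of the "one line at a time" restructuring using only the Python standard library. The function name `running_mean` and the one-number-per-line file layout are my own assumptions for illustration, not anything from the bootcamp material.

```python
import os
import tempfile


def running_mean(path):
    """Average a file holding one number per line without loading it all."""
    total = 0.0
    count = 0
    with open(path) as f:
        # Iterating over the file object yields one line at a time,
        # so memory use stays flat no matter how big the file is.
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            total += float(line)
            count += 1
    return total / count if count else 0.0


# Tiny made-up file, just to show the call.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("1\n2\n3\n\n6\n")

demo_mean = running_mean(demo_path)
os.remove(demo_path)
```

The catch is that you have to rewrite every calculation as a running total like this, which gets tedious fast for anything beyond a sum or a mean.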

I rolled some swap space. I’ve now got lots more virtual memory. So my machine doesn’t lock up or crash any longer. But things get slow once I’m chewing through swap.

Dask, however, would have been a natural and straightforward solution many of us could have incorporated rather easily even back in Project One. And Dask would have provided us an immediate benefit on a local machine well before taking advantage of its full power on a cluster.

One of the more lively topics of discussion related to the Challenges. I dutifully completed all the required challenges. I am actually looking forward to doing some of the optional ones later… after the conclusion of the bootcamp. But it seemed pretty clear nobody did the optional ones. Indeed, many didn’t actually complete the required ones. They just turned them in incomplete. Oh well…

Another thing that became optional halfway through the bootcamp was blogging about the projects. This one seems a bit odd. I mean… that’s pretty much the entire point of the bootcamp - to develop a portfolio of projects which you display via your blog and/or GitHub. In this case, the issue is timing. You can go back and work on your blogs later. This seems to be what a lot of folk do.

Well… I’ve started to blog to keep a few notes along the way for Project Five. But I’d not yet even really started the blogging for Projects Three and Four… tsk… tsk! So in between some work on Project Five this afternoon I stopped and fluffed out the structure for the remaining projects. Soon I’ll put up the presentations for Project Four. There’s something I want to go back and finish for the final presentation of Project Three. So I may wait on that. We’ll see…


© 2017. All rights reserved.