Big data in a nutshell

You’ve probably heard the term big data in the past several years. As the name implies, it’s about analyzing a lot of data at once: an entire laptop’s worth, and often much more. We’ve all had trouble with a single file that refuses to open, is laggy, or (the ultimate sin) crashes your computer before saving. So how in the world is anyone able to process data so much larger than that? The answer is software built for this specific task that leverages affordable hardware.

As the costs of both processing power and storage have dropped, big data applications (or abuses, in some cases) have become more feasible and potentially more profitable. To take advantage of this, technologies like the open source Hadoop ecosystem and Spark allow you to connect a bunch of computers together (known as horizontal scaling) to work on a large task. A good indicator of a technology’s popularity is how many companies include it in their tech stack; for Hadoop, it’s a lot.

So instead of upgrading a single computer to something approaching a supercomputer (called vertical scaling, this time), companies can use software to connect cheap computers together. For some companies, it’s more cost-effective to rent computer time and storage from Amazon Web Services (AWS) or other cloud providers. The cloud is a separate explainer, though.
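To make horizontal scaling a bit more concrete, here is a minimal sketch of the split, process, and combine pattern that tools like Hadoop and Spark are built around. It is not how either tool is implemented: the function names and sample data are made up for illustration, and threads on one machine stand in for the separate cheap computers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, n_workers):
    """Deal the lines out round-robin so each worker gets a similar share."""
    return [lines[i::n_workers] for i in range(n_workers)]


def count_words(chunk):
    """The 'map' step: one worker counts words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(partial_counts):
    """The 'reduce' step: combine every worker's partial result."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def distributed_word_count(lines, n_workers=3):
    chunks = split_into_chunks(lines, n_workers)
    # Assumption of this sketch: threads stand in for separate machines.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_words, chunks))
    return merge_counts(partials)


# Hypothetical sample data, just to exercise the functions.
sample_lines = [
    "big data is big",
    "cheap computers work together",
    "data data everywhere",
]
word_totals = distributed_word_count(sample_lines, n_workers=3)
print(word_totals.most_common(2))
```

The point is that no single worker ever sees the whole data set. Adding workers (scaling out) shrinks each share, whereas vertical scaling would mean making one worker fast enough to handle everything alone.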

IoT's affinity for open source software

I've been a fan of open source software (OSS) for some time now. But ask anyone who's worked with older hardware systems, and they'll probably tell you the interface was written in proprietary software. My main complaint with proprietary software (and a common one, I'm sure) is that it often feels uninspired and sometimes sluggish or dated. It's almost as if more effort is spent on marketing and licensing than on the software itself. And that makes sense; if a customer has locked in a license, there's not a lot to incentivize a company to rapidly develop and release new software.

Not so with IoT, which is seeing an increasing number of OSS stacks, writes ReadWrite. Compared to proprietary software, OSS is more flexible in terms of end user control and tends to offer more rapid updates, which usually improve quality quickly. From the article:

[W]hile open source will remain a big deal to IoT developers even as the space commercializes, we’re likely to see it embraced more for its quality than for its ideology over time.

What's interesting is the analogy between IoT hardware and software: circuits and fixtures are prototyped from individual components the same way the code base is assembled from various OSS projects. In both cases, the result maximizes control and flexibility. Hopefully the OSS trend in IoT continues.

Hackathon insights: roles on teams

Recently, I had the opportunity to help with a hackathon. Over the course of three days, students connected data end-to-end from hardware APIs, through a Hadoop backend, to a Hive Server endpoint for downstream analysis and viz in Tableau. My role was designing the backend, introducing and instructing on the technology used (Linux and Hadoop), and troubleshooting with students during the process. There were six teams of five members each, with various combinations of hardware, data backend, and Tableau analyst roles.

One of the main things I noticed was how well everyone worked together. Aspects of it reminded me of teams I have enjoyed working on, both past and present. Some common threads:

  • Interested in team success. Seems like a no-brainer, but probably everyone has been part of a team where at least one person wasn't interested in the product. That sort of indifference doesn't necessarily lead to tension, but it can lead to an uninspired product or, worse, the entire team slowly running off the rails. It can be hard to come back from that.
  • Unique roles in the team. Even if your role has wide breadth (hardware wizard, dev ops guru, front end master), having a role in a team gives you an identity, ownership, and accountability. While it's possible to have more than one person within a role on a team, a larger role overlap comes with a greater risk of personality conflict.
  • Sense of individual ownership. In general, if people feel like they own a part of the project or product, they feel invested and included.

Most interesting was observing the same themes that I have experienced in the past, this time from the outside looking in.