Big data in a nutshell

You’ve probably heard the term big data in the past several years. As the name implies, it’s about analyzing a lot of data at once: an entire laptop’s worth, and often much more. We’ve all had trouble with a single file that refuses to open, is laggy, or (the ultimate sin) crashes your computer before you can save. So how in the world is anyone able to process data so much larger than that? The answer is software built for this specific task that leverages affordable hardware.

As the costs of both processing power and storage have dropped, big data applications (or abuses, in some cases) have become more feasible and potentially more profitable. To take advantage of this, technologies like the open source Hadoop ecosystem and Spark allow you to connect a bunch of computers together (known as horizontal scaling) to work on a large task. A good indicator of a technology’s popularity is how many companies include it in their tech stack; for Hadoop, it’s a lot.

So instead of upgrading a single computer to something approaching a supercomputer (known as vertical scaling), companies can use software to connect cheap computers together. For some companies, it’s more cost-effective to rent computing time and storage from Amazon Web Services (AWS) or other cloud providers. The cloud is a separate explainer, though.
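To make horizontal scaling a little more concrete, here is a minimal sketch of the split-the-work, combine-the-results pattern that tools like Hadoop popularized. It is a toy illustration, not how Hadoop or Spark are actually implemented: threads on one machine stand in for separate computers, and every function name here (split_into_chunks, count_words, merge_counts, distributed_word_count) is made up for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, n_workers):
    """Deal the lines out round-robin, one chunk per worker."""
    return [lines[i::n_workers] for i in range(n_workers)]


def count_words(chunk):
    """Map step: each worker counts words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(partial_counts):
    """Reduce step: combine the per-worker results into one total."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def distributed_word_count(lines, n_workers=3):
    chunks = split_into_chunks(lines, n_workers)
    # Assumption: threads stand in for separate machines in this toy version.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    return merge_counts(partial_counts)


lines = [
    "big data is big",
    "data about data",
    "cheap computers working together",
]
word_counts = distributed_word_count(lines)
```

The point of the design is that no single worker ever needs to see the whole dataset. Each one handles its own slice, and only the small per-worker summaries are combined at the end, which is why adding more cheap machines can help with a large task.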

Hackathon insights: roles on teams

Recently, I had the opportunity to help with a hackathon. Over the course of three days, students connected data end-to-end from hardware APIs, through a Hadoop backend, to a Hive Server endpoint for downstream analysis and visualization in Tableau. My role was designing the backend, introducing and teaching the technology used (Linux and Hadoop), and troubleshooting with students along the way. There were six teams of five members each, with various combinations of hardware, data backend, and Tableau analyst roles.

One of the main things I noticed was how well everyone worked together. Aspects of it reminded me of teams I have enjoyed working on, both past and present. Some common threads:

  • Interested in team success. Seems like a no-brainer, but probably everyone has been part of a team where at least one person wasn't interested in the product. That sort of indifference doesn't necessarily lead to tension, but it can lead to an uninspired product or, worse, the entire team slowly running off the rails. It can be hard to come back from that.
  • Unique roles in the team. Even if your role has wide breadth (hardware wizard, dev ops guru, front end master), having a role in a team gives you an identity, ownership, and accountability. While it's possible to have more than one person in the same role on a team, greater role overlap comes with a greater risk of personality conflict.
  • Sense of individual ownership. In general, if people feel like they own a part of the project or product, they feel invested and included.

Most interesting was observing the same themes that I have experienced in the past, this time from the outside looking in.