Big data in a nutshell

You’ve probably heard the term big data in the past several years. As the name implies, it’s about analyzing a lot of data at once: an entire laptop’s worth, and often much more. We’ve all had trouble with a single file that refuses to open, is laggy, or (the ultimate sin) crashes your computer before you can save. So how in the world is anyone able to process data so much larger than that? The answer is software built for this specific task that leverages affordable hardware.

As the costs of both processing power and storage have dropped, big data applications (or abuses, in some cases) have become more feasible and potentially more profitable. To take advantage of this, technologies like the open source Hadoop ecosystem and Spark allow you to connect a bunch of computers together (known as horizontal scaling) to work on one large task. A good indicator of a technology’s popularity is how many companies include it in their tech stack; for Hadoop, it’s a lot.

So instead of upgrading a single computer to something approaching a supercomputer (known as vertical scaling), companies can use software to connect cheap computers together. For some companies, it’s more cost-effective to rent computing time and storage from Amazon Web Services (AWS) or another cloud provider. The cloud is a separate explainer, though.
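To make horizontal scaling a little more concrete, here is a toy sketch in Python of the split-the-work, combine-the-results pattern that tools like Hadoop popularized. It is not Hadoop or Spark code: the function names are made up for illustration, and threads on one machine stand in for the separate computers a real cluster would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, num_workers):
    """Deal lines out round-robin so each pretend 'computer' gets a share."""
    return [lines[i::num_workers] for i in range(num_workers)]


def count_words(chunk):
    """Map step: a worker counts words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(partials):
    """Reduce step: add the per-worker counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, num_workers=3):
    """Count words across all lines by splitting the work between workers."""
    chunks = split_into_chunks(lines, num_workers)
    # Threads are an assumption made for the demo; in a real cluster
    # each chunk would be shipped to a different machine.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(count_words, chunks))
    return merge_counts(partials)


if __name__ == "__main__":
    sample = ["big data is big", "data about data", "cheap computers"]
    print(word_count(sample).most_common(2))
```

Each worker only ever sees its own slice of the data, which is why adding more cheap machines lets the same approach handle a bigger pile of it.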

Explainer: Rule 41 and its dangers

The EFF brings news of an innocuous-sounding yet Orwellian proposed change to Rule 41 of the Federal Rules of Criminal Procedure. The proposal has two main parts; from the article:

The first part of this change would grant authority to practically any judge to issue a search warrant to remotely access, seize, or copy data relevant to a crime when a computer was using privacy-protective tools to safeguard one's location.

The second part...would grant authorization to a judge to issue a search warrant for...infiltrating computers that may be part of a botnet. This means victims of malware could find themselves doubly infiltrated: their computers infected with malware and used to contribute to a botnet, and then government agents given free rein to remotely access their computers as part of the investigation.

This means that any judge in the US (perhaps one with a history of granting warrants without much consideration of evidence) can issue a search warrant for any computer in the world, regardless of jurisdiction. Combine that with the language of the second part, and this is effectively a rubber stamp to intrude on every connected device on the planet with a single warrant.

Congress has until December 1 of this year to block these changes to Rule 41. See EFF’s write-up for an in-depth look at the legal ramifications.