5 Things You Should Know About Big Data

Rob Sobers
Last updated January 17, 2023

Big data is a very hot topic, and with the Splunk IPO last week seeing a 1999-style spike, the bandwagon is overflowing. We’re poised to see many businesses pivot into the big data space or simply slap a big data sticker on their products (accurate or not) just to ride the wave.

This post offers a few byte-sized big data concepts (not just trivia) so that you can distinguish the substance from the hype.

1. Big data is distributed data

Big data is a nebulous term with many different definitions.  The key thing to remember is that in this day and age, big data is distributed data.  This means the data is so massive it cannot be stored or processed by a single node.

The days of buying a single big iron server from IBM or Sun to handle all your business intelligence needs are long gone. Google, Amazon, Facebook, and others have shown that the way to scale quickly and affordably is to use commodity hardware to distribute the storage and processing of massive data streams across many nodes, adding and removing nodes as needed.
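To make "distributing storage across nodes" a little more concrete, here is a minimal sketch of one common approach, hash partitioning. The node names and record IDs are hypothetical, and this illustrates the general idea rather than how any particular vendor places data.

```python
import hashlib
from collections import defaultdict


def node_for(key: str, nodes: list[str]) -> str:
    """Pick a node for a key using a stable hash (not Python's randomized hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Group keys by the node that would store them."""
    placement = defaultdict(list)
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return dict(placement)


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
keys = [f"record-{i}" for i in range(12)]  # hypothetical record IDs
placement = distribute(keys, nodes)

for node in nodes:
    print(node, placement.get(node, []))
```

One caveat with this sketch: simple modulo hashing moves most keys to a different node whenever the node list changes, which is why systems that add and remove nodes frequently tend to use schemes such as consistent hashing instead.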

2. You’re going to hear the words “Hadoop” and “MapReduce”

What is Hadoop? It is an open-source platform for consolidating, combining, and understanding large-scale data in order to make better business decisions. Hadoop is the technology powering many (but not all) big data analytics infrastructures.

There are two key parts to Hadoop:

  • HDFS (Hadoop Distributed File System), which lets you store data across multiple nodes.
  • MapReduce, which lets you process data in parallel across multiple nodes.

Although Hadoop is one of the most popular solutions for crunching big data, there are plenty of others. Big data can’t be shoehorned into one flavor of technology. The important characteristic is that you’re able to draw insights from large quantities of data, independent of specific technologies.

3. You can understand MapReduce without a degree from Stanford

The best plain English explanation of MapReduce I’ve encountered (paraphrasing):

We want to count all the books in the library.  You count up shelf #1.  I count up shelf #2.  That’s map. Now we get together and add our individual counts.  That’s reduce.
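The library analogy translates almost directly into code. The sketch below uses Python's built-in `map` and `functools.reduce` on a single machine; the shelf names and book titles are made up for illustration, and a real MapReduce framework would run the map step on many nodes rather than in one process.

```python
from functools import reduce

# Hypothetical library: each shelf is a list of book titles.
shelves = {
    "shelf-1": ["Dune", "Emma", "Ulysses"],
    "shelf-2": ["Walden", "Beloved"],
    "shelf-3": ["Hamlet", "Middlemarch", "Persuasion", "Dracula"],
}


def count_shelf(books: list[str]) -> int:
    """Map step: one worker counts the books on one shelf."""
    return len(books)


def add_counts(left: int, right: int) -> int:
    """Reduce step: combine two partial counts into one."""
    return left + right


# Map: each shelf is counted independently, so this step could run in parallel.
partial_counts = list(map(count_shelf, shelves.values()))

# Reduce: fold the partial counts into a single total.
total = reduce(add_counts, partial_counts, 0)

print(partial_counts)  # [3, 2, 4]
print(total)  # 9
```

The point to notice is that no shelf's count depends on any other shelf, which is what makes the map step easy to spread across machines.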

For a deeper understanding, Wikipedia is a good place to start.

4. Distributed data generation is fueling big data growth

We have data problems so big that they require large-scale distributed computing architectures because the creation of the data is also large-scale and distributed. Most of us walk around carrying devices that are constantly pulsing all sorts of data into the cloud and beyond: our locations, our photos, our tweets, our status updates, our connections, even our heartbeats.

For every human-generated piece of data there’s likely associated machine-generated data.  And then there’s the metadata.  The data is abundant and it’s extremely valuable.

5. Machine learning is…awesome!

Among the key differentiators in big data analytics are the machine learning algorithms used to answer interesting questions and derive value from the 0s and 1s we’re furiously chewing up and spitting back out.

Some pretty cool examples:

  • Nest – a beautiful thermostat that learns how hot or cold you like your house so you never have to adjust it again (not technically big data, but fun nonetheless)
  • Gmail’s Bayesian spam filter – no more tempting emails from that pesky Nigerian prince!
  • Amazon’s product recommendations – sure, I’ll take a JavaScript book, a pair of Asics, and season 1 of Game of Thrones.  How do they know me so well?!
  • Varonis’ access control recommendations – ratchet down access based on highly accurate analytics.
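To give a feel for what "Bayesian" means in the spam filter example, here is a toy naive Bayes classifier. It is a sketch under loud assumptions: the training messages are invented, the approach (word counts with add-one smoothing) is the textbook version, and it is not a description of how Gmail's filter actually works.

```python
import math
from collections import Counter

# Hypothetical, hand-made training data; real filters learn from far more mail.
spam = ["win money now", "claim your prize money", "prince needs your money"]
ham = ["meeting notes attached", "lunch tomorrow", "your report is attached"]


def word_counts(messages: list[str]) -> Counter:
    """Count how often each word appears across a set of messages."""
    return Counter(word for message in messages for word in message.split())


spam_counts, ham_counts = word_counts(spam), word_counts(ham)
vocabulary = set(spam_counts) | set(ham_counts)


def log_score(message: str, counts: Counter, prior: float) -> float:
    """Log of the class prior times per-word likelihoods, with add-one smoothing."""
    total = sum(counts.values())
    score = math.log(prior)
    for word in message.split():
        score += math.log((counts[word] + 1) / (total + len(vocabulary)))
    return score


def is_spam(message: str) -> bool:
    """Classify a message by comparing its spam score with its ham score."""
    prior_spam = len(spam) / (len(spam) + len(ham))
    return log_score(message, spam_counts, prior_spam) > log_score(
        message, ham_counts, 1 - prior_spam
    )


print(is_spam("prince has prize money"))
print(is_spam("notes from the meeting"))
```

Words that appear more often in past spam push a message toward the spam label, and the smoothing keeps a single unseen word from zeroing out the whole score.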

If you’re interested in learning more about big data, join our webinar this Wednesday on Mastering Big Data.

photo credit: http://fav.me/d4vqn4w
