0

Does 'big data' need to be better defined?

It seems like there's a lot of confusion over what 'big data' actually means. Does 'big data' need to be better defined? How should it be defined? Why has there been so much confusion?

4
Barry Devlin
Founder and Principal, 9sight Consulting
Posted on Aug. 5, 2011

Sorry Robert - disagree entirely. Big data is a largely useless name except for marketing. Any serious discussion I've ever been involved in on big data starts with a "what do you mean by...?" This indicates to me that the definition is so vague that almost any vendor can claim to have a product that supports big data.

The term covers (1) a wide variety of data types, (2) many storage formats, (3) a wide range of volumes and, most importantly, (4) many, many business uses. Any sensible decision-making about how to meet a real business need has to parse out points (1) - (3) before you can decide on a solution. Calling it big data just doesn't help.

0
Robert Keahey
Robert Keahey Replied on Aug. 5, 2011

So let's call it "cloud data" and that will make it much simpler! :)

0
Andrew Baker
Andrew Baker Replied on Aug. 11, 2011

Barry, I think this naming problem you refer to is prevalent because we seem to be more interested in coining phrases that are easy to market, than in clearly communicating concepts for the purpose of enlightenment or problem solving. And, once a name has been coined, we use it for everything that is even peripheral to the originally intended target.

2
Robert Keahey
IT, Business and Social Strategist/Commentator, SummaLogic LLC
Posted on July 29, 2011

I don't think so. But it could probably benefit from some clarification.

Big data should not be confused with "lots of data". You can have hundreds or thousands of terabytes of online and archived data, but that doesn't mean you have a big data problem. You normally only need to retrieve a small portion of that data for processing purposes at any given time.

But if you need to analyze all or most of that data in total and the traditional tools, processes and procedures you have won't support that analysis, then you need to turn to other solutions - such as using MapReduce.

What's interesting is that big data is no longer the purview of just the big online companies like Amazon, Facebook, Google, Yahoo!, or highly specialized companies such as pharmaceuticals, energy or geophysical. Even mid-market companies are starting to turn to big data analysis to determine social media/networking trends, contextual awareness, buying patterns, etc. This is driving a rich open source market, with players such as Cloudera, Jaspersoft and Revolution Analytics offering solutions that scale accordingly.

0
Dan McComas
Dan McComas Replied on Aug. 6, 2011

testing to see if replies now work. Please ignore.

1
Stephen O'Donnell
CEO and Chairman, S1NED Limited
Posted on July 30, 2011

There is one other (normal) characteristic of Big Data and that is that it has a high update rate. This makes it very difficult to use conventional B-tree type indexing systems to provide rapid lookups.

Sometimes Big Data is unstructured; that is, the records are not congruent, sometimes with missing fields, sometimes shorter or longer than the average.

A promising approach for dealing with certain Big Data use cases is to trade strict consistency for eventual consistency. As Robert says in his answer, MapReduce is one (now quite old) implementation of a Big Data solution. By making this trade-off, the data manipulation can be done in parallel (scale out) rather than in series (scale up).
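To make the scale-out idea concrete, here is a minimal sketch of the map-and-reduce pattern in plain Python. It is not Hadoop or any vendor's API: the function names (`map_chunk`, `merge_counts`, `word_count`) and the sample posts are made up for illustration, and a thread pool stands in for the cluster of cheap machines a real deployment would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one chunk, independently of every other chunk."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def word_count(lines, workers=4):
    """Split the input into chunks, map them concurrently, then reduce."""
    if not lines:
        return Counter()
    size = max(1, len(lines) // workers)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Threads are only a stand-in here; the point is that no chunk
    # needs to see any other chunk during the map step.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


# Hypothetical sample input.
posts = [
    "big data is not lots of data",
    "big data needs a better name",
    "cloud data is not big data",
]
totals = word_count(posts, workers=2)
print(totals.most_common(3))
```

Because each chunk is processed without reference to the others, adding workers (or machines) is what increases throughput, which is the scale-out rather than scale-up point.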

Low cost COTS hardware and smart software are combined to drive very low costs per answer.

A major impediment to adoption of Big Data solutions is a skills shortage.

1
Neil Raden
Vice President & Principal Analyst, Constellation Research Group
Posted on Aug. 10, 2011

I don't know what's so special about Big Data. When we built data warehouses of 200+ GB in 1994, that was really big data. Maybe someone will come up with a different name but for me, Big Data implies the kind of data that we don't encounter as part of the normal processes of the business (unless you're comScore, e.g.). The first wave to hit most companies was clickstream data from their early B2B and B2C websites. But now it's everywhere. But it isn't just the size that makes it Big Data (after all, eBay has a 39 PETABYTE single instance on Teradata), it's the perishability and rapid update. It really isn't like what we've been used to, even in large data warehouses. However, many organizations will never stick their toe in the water of Big Data by either forgoing it or by using the services of third parties that process it for them, basically, data aggregators.

0
Barry Devlin
Barry Devlin Replied on Aug. 15, 2011

Well said, Neil! It's a moving target, and one that applies only to some organizations.

0

Two things: name and definition. The name is poor. We at rainmakerfiles suggest Savage Data. http://rainmakerfiles.com/2011/05/big-data-poor-label/
The definition takes some effort to grasp as Robert and Stephen elaborate on, but the problem is that vendors like new buzzwords without explaining them properly. This thing requires homework, but then people will find that this is pretty cool innovation. Solving Savage IT anyone?

0

Robert: Cloud data is not what big data is about. That label does not solve the problem.
Barry: I agree with you, and look forward to your feedback on my Rainmaker post mentioned in this trail.

0
Barry Devlin
Barry Devlin Replied on Aug. 7, 2011

Claus, I agree that cloud data is not what big data is about.

I took a look at your post, too. I see where you're coming from, but I do believe that trying to define big data as a single concept is like trying to find the common concept behind fire, earth, air and water. Ancient philosophers recognized them as different things and dealt with them separately. Saying from a modern science point of view that they're all made of fundamental particles is useful only to a particle physicist!

As one example, the term "big data" encompasses both social networking data and machine-generated data from mechanical sensors such as monitoring devices in autos and RFID tag readers. Yes, they both contain large quantities of bits, but in terms of using them, I suggest there are far more differences than commonalities.

0
Adrian Alleyne
Director of BI Market Research, DecisionPath Consulting
Posted on Aug. 10, 2011

I think Barry's really hit on the heart of the matter: the differences. As a useful definition, I like to think of big data as the amount and type of data that will cause your current analytics infrastructure to fail. If you're a small-midsize company, that could be several terabytes of data. I think more important than the definition is -- as Barry points out -- the underlying problem (and also opportunity) of finding a way to integrate or synthesize existing structured data (transaction), new structured data (sensor, etc.) and unstructured data that exist in entirely different environments. The size of those data sets is secondary.

0
Andrew Baker
Director, Service Operations, SWN Communications Inc.
Posted on Aug. 11, 2011

I would say that, yes, the term needs to be more clearly and consistently defined.

It is broadly used by anyone and everyone looking to peddle a solution that might come close to the topic at hand. I would argue that -- like cloud computing -- some significant percentage of people using the term do not really know how to apply it, or what it fully encompasses.

This seems to be an increasing problem in today's world of sound bites, catch phrases and heavy marketing. It used to be that computer jargon was what you had to worry about, but it seems more and more like plain old English is being distorted for purposes of either technology or business, to the detriment of proper communication.

0
Robin Goodchild
Owner, Antarctic Technologies
Posted on Aug. 11, 2011

"Big data" to me suggests that someone needs to go back to school and re-study English grammar!!

I treat it the same way as "Big Society", "going forwards", etc.; with complete and utter contempt!!!

Who invents these nonsensical phrases anyway? Is someone retaliating against a nasty English teacher they had in 2nd grade?

0
Barry Devlin
Founder and Principal, 9sight Consulting
Posted on Aug. 15, 2011

IDC's "Digital Universe Study" 5th annual update was released recently. Most coverage has been around the size and growth rate of the data universe... and we're talking really big data! However, of more interest for me are the implications for IT data management, some of which the authors bring out. See my blog http://bit.ly/mPFfUk for more details.

0
Peter Johnston
Director (CEO), Intelligent Prospecting
Posted on Aug. 17, 2011

I'm stepping out of my knowledge zone here on definitions and IT but I can see a revolution going on in marketing data.

We are in the midst of the cloud revolution. Those of us brought up in a different era are worried about our songs not being a physical presence on our machine, but being in the cloud. Five years ago it was that CDs were physical and we didn't like them just being inside a machine. Mindsets move on.

We have the same with data. In the old world it was something paper - held in our rolodex. That seems ludicrous now. Then it was in our CRM. Big lists of everyone we'd ever met, mostly incomplete, often duplicated and totally out of date. A dumb tool too - you only got out what you put in.

Wouldn't it have been more useful if your CRM said - you have three top companies in the oil and gas industry - here's the rest of the top ten. If it said - you need CEO, CFO and COO to sell your product but 30% of your data only has two of those and 50% only has one - so here's the rest.
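As a rough illustration of that kind of gap report, here is a small Python sketch. Everything in it is hypothetical: the company names, the `contacts` list and the helper names are invented for the example, and a real CRM would expose its data through its own export or API.

```python
from collections import Counter

# Roles assumed necessary to make a sale (taken from the example above).
REQUIRED_ROLES = {"CEO", "CFO", "COO"}

# Hypothetical CRM extract: (company, role) pairs.
contacts = [
    ("Acme Oil", "CEO"), ("Acme Oil", "CFO"), ("Acme Oil", "COO"),
    ("Borealis Gas", "CEO"), ("Borealis Gas", "CFO"),
    ("Cobalt Energy", "CEO"),
    ("Delta Drilling", "COO"),
]


def missing_roles(contacts, required=REQUIRED_ROLES):
    """Return, per company, the required roles with no contact on file."""
    held = {}
    for company, role in contacts:
        held.setdefault(company, set()).add(role)
    return {company: required - roles for company, roles in held.items()}


def coverage_share(gaps, required=REQUIRED_ROLES):
    """Return the fraction of companies by number of required roles covered."""
    tally = Counter(len(required) - len(missing) for missing in gaps.values())
    total = len(gaps)
    return {covered: count / total for covered, count in sorted(tally.items())}


gaps = missing_roles(contacts)
share = coverage_share(gaps)
print(gaps)
print(share)
```

The `gaps` dictionary is the "here's the rest" list a smarter CRM could go and fill, and `share` is the "30% only has two" style summary.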

And wouldn't it be useful if it was just something your systems looked up as they needed to - accessing always on, up to date information. Much richer data too - not just name rank and phone number but their whole digital footprint. And with real connections - which company is owned by whom, what goes on in this location of this company and who works with who across the globe. Even past conversations on the subject, white papers the person has done which are related and connections your contact has with experts in the field.

That is 21st century data. I see it being critical for modern marketing to have much richer data than currently, but it won't all be stored in-house - it will link to online always-on resources and connect proprietary with publicly available in real time.

0
Leighton Jenkins
Principal, tJP
Posted on Aug. 17, 2011

I sat with a group of very technical data analysts recently. They distinguished between Big Data and BigData. The missing space was very important to them. In the minds of many in this arena, BigData is unstructured data (Facebook posts, for example) and built on relational databases, with its own set of issues.

BigData in this definition is not high-transaction-load OLTP systems.

Not sure we got to a definition of Big Data (with the space) :-)
