Does 'big data' need to be better defined?
It seems like there's a lot of confusion over what 'big data' actually means. Does 'big data' need to be better defined? How should it be defined? Why has there been so much confusion?
12 Answers
Sorry Robert - disagree entirely. Big data is a largely useless name except for marketing. Any serious discussion I've ever been involved in on big data starts with a "what do you mean by...?" This indicates to me that the definition is so vague that almost any vendor can claim to have a product that supports big data.
The term covers (1) a wide variety of data types, (2) many storage formats, (3) a wide range of volumes and, most importantly, (4) many, many business uses. Any sensible decision-making about how to meet a real business need has to parse out points (1) - (3) before you can decide on a solution. Calling it big data just doesn't help.
I don't think so. But it could probably benefit from some clarification.
Big data should not be confused with "lots of data". You can have hundreds or thousands of terabytes of online and archived data, but that doesn't mean you have a big data problem. You normally only need to retrieve a small portion of that data for processing purposes at any given time.
But if you need to analyze all or most of that data in total and the traditional tools, processes and procedures you have won't support that analysis, then you need to turn to other solutions - such as using MapReduce.
What's interesting is that big data is no longer the purview of just the big online companies like Amazon, Facebook, Google, Yahoo!, or highly specialized companies such as pharmaceuticals, energy or geophysical. Even mid-market companies are starting to turn to big data analysis to determine social media/networking trends, contextual awareness, buying patterns, etc. This is driving a rich open source market, with players such as Cloudera, Jaspersoft and Revolution Analytics offering solutions that scale accordingly.
There is one other (normal) characteristic of Big Data and that is that it has a high update rate. This makes it very difficult to use conventional B-tree type indexing systems to provide rapid lookups.
Sometimes Big Data is unstructured; that is, the records are not congruent, sometimes with missing fields, sometimes shorter or longer than the average.
A promising approach for dealing with certain Big Data use cases is to trade consistency for eventual consistency. As Robert says in his answer, MapReduce is one (now quite old) implementation of a Big Data solution. By making this trade-off, the data manipulation can be done in parallel (scale out) rather than in series (scale up).
Low-cost COTS hardware and smart software are combined to drive very low costs per answer.
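To make the scale-out idea a bit more concrete, here is a minimal sketch of the map/reduce pattern in Python. It is only an illustration, not how Hadoop or any particular product does it: the function names (`map_chunk`, `merge_counts`, `word_count`) and the chunk size are assumptions made up for this example, and a thread pool on one machine stands in for a cluster of cheap nodes. The point is that each chunk is processed without looking at any other chunk, so the work can be spread across workers and the partial answers merged afterwards.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one chunk, independently of all other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def word_count(lines, chunk_size=2, workers=4):
    """Split the input into chunks, map them in parallel, then reduce."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # A thread pool stands in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


totals = word_count(["big data", "big deal", "data data"])
```

Because the merge step does not care which order the partial counts arrive in, adding more workers (scale out) does not change the answer, only how fast you get it.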
A major impediment to adoption of Big Data solutions is a skills shortage.
I don't know what's so special about Big Data. When we built data warehouses of 200+ GB in 1994, that was really big data. Maybe someone will come up with a different name but for me, Big Data implies the kind of data that we don't encounter as part of the normal processes of the business (unless you're comScore, e.g.). The first wave to hit most companies was clickstream data from their early B2B and B2C websites. But now it's everywhere. But it isn't just the size that makes it Big Data (after all, eBay has a 39 PETABYTE single instance on Teradata), it's the perishability and rapid update. It really isn't like what we've been used to, even in large data warehouses. However, many organizations will never stick their toe in the water of Big Data, either forgoing it altogether or using the services of third parties that process it for them: basically, data aggregators.
Two things: name and definition. The name is poor. We at rainmakerfiles suggest Savage Data. http://rainmakerfiles.com/2011/05/big-data-poor-label/
The definition takes some effort to grasp, as Robert and Stephen elaborate, but the problem is that vendors like new buzzwords without explaining them properly. This thing requires homework, but then people will find that this is a pretty cool innovation. Solving Savage IT, anyone?
Robert: Cloud data is not what big data is about. That label does not solve the problem.
Barry: I agree with you, and look forward to your feedback on my Rainmaker post mentioned in this trail.
I think Barry's really hit on the heart of the matter: the differences. As a useful definition, I like to think of big data as the amount and type of data that will cause your current analytics infrastructure to fail. If you're a small-midsize company, that could be several terabytes of data. I think more important than the definition is -- as Barry points out -- the underlying problem (and also opportunity) of finding a way to integrate or synthesize existing structured data (transaction), new structured data (sensor, etc.) and unstructured data that exist in entirely different environments. The size of those data sets is secondary.
I would say that, yes, the term needs to be more clearly and consistently defined.
It is broadly used by anyone and everyone looking to peddle a solution that might come close to the topic at hand. I would argue that -- like cloud computing -- some significant percentage of people using the term do not really know how to apply it, or what it fully encompasses.
This seems to be an increasing problem in today's world of sound bites, catch phrases and heavy marketing. It used to be that computer jargon was what you had to worry about, but it seems more and more like plain old English is being distorted for purposes of either technology or business, to the detriment of proper communication.
"Big data" to me suggests that someone needs to go back to school and re-study English grammar!!
I treat it the same way as "Big Society", "going forwards", etc., with complete and utter contempt!!!
Who invents these nonsensical phrases anyway? Is someone retaliating against a nasty English teacher they had in 2nd grade?
IDC's "Digital Universe Study" 5th annual update was released recently. Most coverage has been around the size and growth rate of the data universe... and we're talking really big data! However, of more interest for me are the implications for IT data management, some of which the authors bring out. See my blog http://bit.ly/mPFfUk for more details.
I'm stepping out of my knowledge zone here on definitions and IT but I can see a revolution going on in marketing data.
We are in the midst of the cloud revolution. Those of us brought up in a different era are worried about our songs not being a physical presence on our machine, but being in the cloud. Five years ago it was that CDs were physical and we didn't like them just being inside a machine. Mindsets move on.
We have the same with data. In the old world it was something paper - held in our rolodex. That seems ludicrous now. Then it was in our CRM. Big lists of everyone we'd ever met, mostly incomplete, often duplicated and totally out of date. A dumb tool too - you only got out what you put in.
Wouldn't it have been more useful if your CRM said - you have three top companies in the oil and gas industry - here's the rest of the top ten. If it said - you need CEO, CFO and COO to sell your product but 30% of your data only has two of those and 50% only has one - so here's the rest.
And wouldn't it be useful if it was just something your systems looked up as they needed to - accessing always on, up to date information. Much richer data too - not just name rank and phone number but their whole digital footprint. And with real connections - which company is owned by whom, what goes on in this location of this company and who works with who across the globe. Even past conversations on the subject, white papers the person has done which are related and connections your contact has with experts in the field.
That is 21st century data. I see it being critical for modern marketing to have much richer data than currently, but it won't all be stored in-house - it will link to online always-on resources and connect proprietary with publicly available in real time.
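As a rough illustration of the "smarter CRM" idea above (telling you which of the CEO, CFO and COO contacts you are still missing at each company), here is a small Python sketch. Everything in it is hypothetical: the company names, the record layout and the `missing_roles` helper are invented for the example and do not come from any real CRM product.

```python
REQUIRED_ROLES = {"CEO", "CFO", "COO"}


def missing_roles(contacts, required=REQUIRED_ROLES):
    """Return {company: sorted list of required roles with no contact on file}."""
    roles_by_company = {}
    for contact in contacts:
        roles_by_company.setdefault(contact["company"], set()).add(contact["role"])
    return {
        company: sorted(required - roles)
        for company, roles in roles_by_company.items()
        if required - roles  # only report companies that have a gap
    }


# Made-up sample records, standing in for an exported contact list.
contacts = [
    {"company": "Acme Oil", "role": "CEO"},
    {"company": "Acme Oil", "role": "CFO"},
    {"company": "Borealis Gas", "role": "COO"},
    {"company": "Cobalt Energy", "role": "CEO"},
    {"company": "Cobalt Energy", "role": "CFO"},
    {"company": "Cobalt Energy", "role": "COO"},
]

gaps = missing_roles(contacts)
```

In the always-on version described above, the gap list would be filled from an external, up-to-date source rather than typed in by hand; this sketch only covers the "spot the gap" half.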
I sat with a group of very technical data analysts recently. They distinguished between Big Data and BigData. The missing space was very important to them. In the minds of many in this arena, BigData is unstructured data (Facebook posts, for example) and built on relational databases, with its own set of issues.
BigData in this defn is not high transaction load OLTP systems.
Not sure we got to a defn of Big Data (with space) :-)