How is data mining different from everyday statistical analysis?
2 Answers
Data Mining finds non-intuitive facts.
For example, a travel site may find that advertising a beach vacation package on a Tuesday brings in 50% more sales than offering the exact same package on a Wednesday. Or a retail site that offers a free pen with each order for a limited time may find that it sells a lot more umbrellas every time it runs the offer. Umbrellas and pens should not be connected, yet somehow the data indicates that they are.
There appears to be a little confusion of terms here.
In the typical business, analyzing data does not always mean statistical analysis. A typical report involves no statistics, only real-world summed data. Thus, an analysis that slices the data a different way, resulting in another report, is data analysis but not statistical analysis.
A data miner analyzes data, as a profession. In the real world, that means taking reports and "usual practice" statistics as the basis for further data slicing, exploratory data analysis, and statistical analysis. Thus, the data miner is someone whose analyses go beyond the typical "canned" ones.
As the amount of data to be analyzed has become enormous, a significant problem has surfaced with data mining that is not typically found with traditional data and statistical analysis. It has become exceptionally easy to run many "models" against a set of data, so that by the laws of probability some of those "models" will appear to work even though they don't. For example, if you test one hundred models and one of them is significant "at the 5% level", that probably just means you found a model that looks like it works but doesn't: on average, one out of every 20 models with no real effect will pass that test by chance alone. This is particularly notable in the financial industry, where data miners will often claim to have found arbitrage opportunities that turn out to be no such thing.
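To put rough numbers on the "one out of 20" point, here is a minimal Python sketch. It rests on two assumptions of mine that are not part of the answer above: the models are independent of each other, and a model with no real effect produces a p-value that is uniformly distributed between 0 and 1. The function names are made up for illustration.

```python
import random


def prob_at_least_one_false_positive(n_models, alpha=0.05):
    """Chance that at least one of n_models no-effect models passes at level alpha.

    Assumes the models are independent (an illustrative simplification).
    """
    return 1.0 - (1.0 - alpha) ** n_models


def simulate_false_positives(n_models, alpha=0.05, trials=10_000, seed=0):
    """Average count of no-effect models that look significant per trial.

    Each model's p-value is drawn uniformly from [0, 1), which is the
    assumed behaviour of a p-value when there is no real effect.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(1 for _ in range(n_models) if rng.random() < alpha)
    return total / trials


if __name__ == "__main__":
    print("P(at least one spurious hit among 100 models):",
          prob_at_least_one_false_positive(100))
    print("Average spurious hits per 100 models:",
          simulate_false_positives(100))
```

Under these assumptions, trying 100 models with no real effect makes at least one spurious "significant" result close to certain, with about five such hits expected per batch.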
Your net-net: if a BI product claims it supports "data mining", that's no big thing these days; everyone does it. If it claims to support sophisticated statistical analysis, that is a big thing; most BI tools don't provide some of the latest statistical techniques. However, if someone is doing data mining with one of these products, watch out for "data mining bias", which is what I described above. The best cure for "data mining bias" is (a) common sense about what the model is saying and how likely it is to be true, and (b) limiting the number of alternative models to the 5 or 6 most likely.
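A small Python sketch of why cure (b) helps, assuming the candidate models are independent and each is tested at the 5% level. The Bonferroni threshold in the second function is a standard correction that I am adding for comparison; it is not something the answer above proposes, and the function names are hypothetical.

```python
def family_wise_error(n_models, alpha=0.05):
    """Chance that at least one no-effect model passes, assuming independence."""
    return 1.0 - (1.0 - alpha) ** n_models


def bonferroni_threshold(n_models, alpha=0.05):
    """Per-model significance level that keeps the overall error near alpha.

    Bonferroni correction: an added illustration, not part of the original advice.
    """
    return alpha / n_models


if __name__ == "__main__":
    for n in (1, 6, 100):
        print(f"{n:>3} models: chance of a spurious hit = "
              f"{family_wise_error(n):.3f}, "
              f"Bonferroni per-model level = {bonferroni_threshold(n):.5f}")
```

With 6 candidate models the chance of at least one spurious hit is roughly 26%, against over 99% for 100 models. Limiting the list shrinks the problem a great deal but does not remove it, so the common-sense check in (a) still matters.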