Share what you know with millions of people

Focus is the best place to turn what you know into remarkable content
×
0

What are the top 3 pros and cons of data warehouses?

Data Warehouses are around since the mid 80s. And still millions of $$ are spend, and still we see many failures. Why do we need data warehouses? Or why don't we?

What are the top 3 pros and cons of Data Warehouses?

Attachments

1
Barry Devlin
Founder and Principal, 9sight Consulting
Posted on May 30, 2011

There are many more than 3 pros and cons of data warehouses, and Hrvoje's choice is as good as any - although they seem to focus more on the original justification for data warehouses than current practice. I'd like to approach Gertjan's question from the viewpoint of what he seems to imply in the lead-in: do we STILL need data warehouses, especially in the light of the many failures we see?

The link to project failures is unfortunate, I feel. Data warehouse projects fail because they are complex projects with multiple external dependencies - not because they are data warehouse projects per se. The fundamental purpose of a data warehouse is to consolidate and reconcile data from disparate sources to create a trustworthy source for BI analyses. This is a complex and interdependent process, no matter whether it is done via a warehouse or not. Today, there are other possible approaches to meeting this need, virtualization, for example, but all suffer from similar complexity and interdependency issues. And are also prone to failure.

This leads to the question of whether we need a data warehouse any more. My belief is that we do, but that its size will decrease, certainly in proportion to other data needs and most likely in absolute terms too. Functions like financial reporting and accounting do need a shared and consistent view of the business - "a single version of the truth" - and a data warehouse is the ideal place to create and store it. However, there are many other functions where speed of access, huge volumes of data, or simply a need to see multiple versions of reality are more important. We see this in Hadoop systems, DW appliances and spreadsheets among other things. So, the role of the traditional data warehouse becomes more limited.

The growing challenge, of course, is how to manage the quality and integrity of data over such varied and distributed data sources.

1
Dan Linstedt
President, Empowered Holdings, LLC
Posted on May 31, 2011

I would tend to say: we do need a data warehouse, and will continue to need a data warehouse. I would also tend to disagree with Mr. Devlin, in that I believe Data Warehouses size will not shrink, but in fact grow (this is what has been happening). The need is for RAW data warehouses, not data warehouses that house "business massaged data sets".

There is still a need for business massaged data sets, and that need is growing: but it is separating from the data warehouse, and is typically known as Master Data system (hopefully with Master Data Mangement in place on top of that). The MDM solution (people, technology, process, data), will be the "single-version-of-the-truth" that the business is so desperately seeking. Leaving the Raw Data Warehouse to do it's job: record an integrated snapshot of all data coming in or leaving the organization - that is necessary from an auditability standpoint.

Again, in my opinion an EDW is needed, will continue to be needed, and will grow (not shrink). In my opinion, the RAW data Warehouse is rising, and the "massaged" or business altered SVOT (single view of the truth) will be a perception / requirement placed on Master Data Sets.

This is what I see happening all over the world, at customers I visit - from Financial companies to Oil & Gas, to Insurance.

Regards,
Dan Linstedt

0
Barry Devlin
Barry Devlin Replied on June 1, 2011

Hi Dan, I suspect that the divergence here is on what we mean when we say "data warehouse". I use DW to mean the environment consisting of hard (structured) data in the EDW and marts fed from it that has been modeled, structured and reconciled. I'm imagining you use the term in a looser sense to cover all BI support systems.

As soft (unstructured) data grows in percentage terms (now over 90%) and volumes, the question I believe we must pose is - must all of that go into/through a DW as defined above? There are even qrowing questions (for me) whether all hard information needs to be (or makes sense to be) fed through the DW. Take CDRs in Telcos for example - taking them through a layered DW structure is expensive and time-consuming. Loading them into an appliance (an independent data mart) is more efficient and timely, and virtualizing access to that store, using the EDW to assure consistency, is a reasonable thing to do. As technology performance and cost improves, we may even try to have just a single copy of the CDR data for multiple uses.

As we see more and bigger data sets ("Big Data", text, video, etc), it seems to me that we cannot afford to process and copy them all into DWs (as I define DW) and we will begin to focus the DW back to its core value - reconciling and integrating data from disparate sources (a significant overlap of function with MDM, of course) where such consistency is mandated by the business.

Bottom line: I fully agree that the volumes of information we use to support decision-making is growing fast and will continue to do so. My feeling is that the architectural concept of tthe DW (as I first defined it in the mid-80s) will become more focused on consistency and reconciliation needs rather than handling all BI information and thus beome proportionately smaller.

Best regards,
Barry.

0
Hrvoje Smolic
Co-founder | CEO | Creative Director, Qualia d.o.o.
Posted on May 28, 2011
  • Recommended by:

Hi Gertjan,
from my point of view and experience here are 3 pros and cons:

Pros:

1. Reporting and analysis are made on data warehouse data, not directly on ERP, transactional data. That way we avoid unneeded load on ERP system.
2. With data warehouse you are able to consolidate data from many different sources, several databases, xml data, excel spreadsheet etc...
3. In data warehouse you can update data in the way maybe impossible otherwise, you can enforce different business rules without touching EPR systems and you can prepare data by aggregating them in order for reporting to be fast.

Cons:
1. Warehouse Data is usually not up-to date. Most of the time, implementation is made so that transactional data are transferred into data warehouse during night jobs.
2. In order to have a data warehouse, you usually need additional hardware and software expenses
3. With data warehouse the whole system become more complex.

0
Richard Hom
Health Economics/Public Policy, Richard Hom Consulting
Posted on May 30, 2011
  • Recommended by:

As a researcher and business modeler, I feel that data warehouses are necessary for regression analysis and even predictive behavior Whether they are current or not may not be so vital, although currency can improve demand forecasting and predictive behavior.

What I find most disturbing about data warehouses are there unstructured nature. They are more patterned after form-based queries than statistical analysis. Fortunately, with SQL commands, one can construct tables that are prudent or useful.

In summary, data warehouses for statistical analysis are helpful, but the information could be structured better for the purpose of being a data set There are query or report languages that can structure them for better data manipulation, but that usually adds CPU and network time to any project.

0
Dan Linstedt
President, Empowered Holdings, LLC
Posted on June 2, 2011
  • Recommended by:

Hi Barry,

I am not so sure I do agree with your definition. Perhaps it is my perception of what Data Warehouse means. Let's see if I define it the same way...

The raw DW that I see being implemented includes both structured and unstructured data (although the unstrcutured is in fact searched and the results are "structured" in order to link to the structured world), and the raw DW is being fed by real-time messaging as well as batch. I've also seen the trend to absorb more and more external data from the outside world, in regards to XML. I've continued to see a rise in growth of the data sets from all of these sources going in to "traditional Data Warehouse."

However the traditional data warehouse (definition) has changed: it is now a passive integration point (integrated by business key), stored in raw format, and accepting all data as raw - regardless of arrival mechanisms. It has become the one-place in the organization that exists now as a system-of-record (no longer a single version of the truth). I also see huge growth in this area.

However, as you point out: "Video, docs, images, audio" usually do not get stored directly in the data warehouse (raw or not). This is true. These things must be "mined" for patterns before storing just the results in the data warehouse. However, I still am not understanding how the DW as you defined it, aligns with the DW as I've defined it, and I think I am missing your point (my apologies) on why this resource is shrinking, especially when my experience in the field is the opposite. I welcome any clarifications you would care to share with me.

Consistency and reconciliation needs are definitely highly focused, and "less but better data" is needed at this level to make better decisions. However, in order to run statistics, or mine the information, more "raw data" is needed on the raw DW to make "less but better results". Also, I am referring to the fact that most of the customers I talk to, state they are headed in the direction of "virtual data marts" with a raw DW in the back-end, and with virtual data marts in place, they can begin to make self-service BI a reality.

From that regard, the virtual data marts in fact, do have less data, and are more specific and are more targeted. But, the engines producing these "more informed decisions" need more raw data to make the information "better" with better correlated results.

So, sizes of data in the data marts is shrinking, while sizes of the data in the warehouse are growing... This is what I am experiencing in the customer base. Please let me know if I've completely missed your point (which has been known to happen). By the way, I do see a rise in master data systems, and yes, they too are smaller than the raw DW (because the MD systems do not store history nor multiple copies of the data), but they are just another publisher and subscriber on the enterprise service bus.

Thank-you kindly,
Dan Linstedt

0
Dan Linstedt
President, Empowered Holdings, LLC
Posted on June 2, 2011
  • Recommended by:

To answer the original question: here are my top 3 picks:

Top 3 reasons to have a data warehouse:
* Auditability, Raw Data Set is required in the DW for this
* Compliance with internal and international regulations
* Historical snapshots of "old, retired systems", or systems which are external and outside our control.

Pros of having a Data Warehouse
* The ability to capture all source systems, and integrate by business key across lines of business
* Unloading of work from the source systems (answering ad-hoc deep time based queries for comparison analysis)
* The ability to incorporate unstructured, and structured data in the same place
* The ability to "see" and explore the gaps between source systems, business perception, business requirements, and source applications.
* Data is an asset on the books, and historical data is an asset as long as it is needed to meet compliance and auditory regulations.

and many more....

Cons of having a Data Warehouse
* Can be difficult to manage properly for an enterprise view
* Can be challenging to load properly
* Can be brittle (if the right architecture for the data model is not chosen)
* Can require a new knowledge set, new implementation skills (compared to those required to build and operate operational systems)
* Can require lots of money invested in high speed infrastructure (CPU, RAM, Disk, Network, processing power & applications).
* Can require a specific data model for the DW to counteract the original "brittleness" of the wrong designs in your data warehouse.

Do we need one? Yes, as long as the enterprise requires a "consolidated and integrated view" across all lines of business, including external data and unstructured data.

Is there anything stopping us from making it "virtual"? Yes, the amount of money required to put all that data in RAM. This is why you see "virtual marts" arriving on seen, the marts (once a true DW / Raw DW has been put in place) can be virtualized because their size and scope can be managed and paired down.

What causes the failures?
In my view, one of the main contributing factors is the lack of the proper data model architecture, and the lack of recognition that the DW must house RAW data integrated by business key. In other words, by putting "business rules" upstream of the data warehouse, and by choosing the wrong data model, the system can become brittle, and too complex to manage. When the business users require changes/additions, and the system requires re-engineering, it can become too expensive or take too long - causing the business to "shut it down", but then the business realizes they still need BI - so they try and try to re-focus, and rebuild.

However, if they rebuild based on the same principles, they will end up right back where they are now... Shutting it down and going through the same failure as before.

It's time for a change, in order to succeed two things must happen:
1) Recognition that the DW must be a RAW DW, push the business rules / interpretation down-stream (post-data warehouse) or between the DW and the data marts
2) Selection of the right Data Model for the RAW DW layers.

Anyhow, this is what I am seeing in the market place, and many of the customers I have visited who have broken the cycle, are now succeeding on an enterprise scale within their organization.

These are my two cents anyhow.

Thank-you kindly,
Dan Linstedt

Answer This Question