Share what you know with millions of people

Focus is the best place to turn what you know into remarkable content
×
0

Business Intelligence ETL Definition?

When it comes to business intelligence, what does the acronym "ETL" stand for? I had someone say it meant, "Exact, Transform, Load," but they couldn't really explain anything beyond that. Does anyone have any idea what ETL means and how it relates to a BI system??

Attachments

0
Geoff Bazira
Manager, Information Technology, SAP
Posted on July 30, 2010
  • Recommended by:

Hi Gary, there are good books with examples that will do a thorough job of explaining ETL for BI.

Having said that, here is my very simplified example:

Let's define Business Intelligence as technique of analyzing business data to aid better decision making. A BI system would have a backend (datastore/warehouse) and a frontend (dashboard).

Business Intelligence would therefore require that data is collected, consolidated and aggregated in a way that makes it easy for a decision maker to see patterns and correlations. For example, a CIO trying to determine how his/her department is performing financially may want to see a BI dashboard which charts expenses against budget.

In this example, let's say that expense information is taken from the departments "Vendor A" financial system, while the budget information is taken from the departments "Vendor B" financial system.

The process of taking/pulling data from source systems into your BI system is what we call "EXTRACT". Data analyzed in a BI system is sometimes taken from multiple non-similar source systems. Sometimes complex scripts or specialized tools are required for this.

Data extracted is sometimes not in the format that you would like to have it stored in for your BI system. It is possible that there may be fields in extracted data that you don't need included in your reports, or data that needs to be aggregated (e.g. totaled or averaged) before it can be displayed in the target BI dashboard. The series of steps taken to change the extracted data into the format required for the target BI dashboard is what we call the "TRANSFORM" stage.

The "LOAD" stage is simply takes the extracted (and possibly transformed) data into the BI system data store. Very simplistically a BI system will have a backend where data is collected and stored (call it warehouse or datastore), and frontend where the data is visualized (i.e. dashboard/reports).

Not sure I fully addressed your question.

Regards.

0
Forrest
Posted on Aug. 2, 2010
  • Recommended by:

We found it necessary to use ETL to transform legacy data into a database format that was more flexible and responsive for a web based client BI portal we are developing. The ETL relies upon data maps that are executed in BATCH to update the data warehouse on a frequency that is relevent to the data being transferred.

0
Hrvoje Smolic
Co-founder | CEO | Creative Director, Qualia d.o.o.
Posted on Aug. 6, 2010
  • Recommended by:

ETL indeed stands for Extract, Transform and Load.

Point is, if you need to build a Data Warehouse, you have to
'Extract'
your ERP-generated data, Excel-based-data and so on, then to
'Transform'
it in various ways for data cleanining purposes for example and finally to
'Load'
the data into a Data Warehouse.

Important is that your ERP system is optimised for fast writing, but your Data Warehouse system is optimised for fast reading (reporting).

0
Brian Dudley
Senior System Architect, Perficient
Posted on Aug. 6, 2010
  • Recommended by:

This is an answer which I had posted on LinkedIn and am copying here for completeness.

ETL stands for Extract, Transform, and Load and refers to moving data from one system and schema to another. It is related to BI because it is common to move data from operational systems into a Data Warehouse, analysis tool, or even to a vendor.

The Extract phase is usually pretty straightforward and often just involves getting all data from a particular table or set of tables. It is important to consider what restrictions to put on the data, such as New and Changed records, records in a particular date range, or all data. Since it is likely that multiple tables will be chosen, it is important that the data be consistenly pulled -- i.e. all the detail for master records. Code values need to be handled either by exporting code tables or translating the codes during the pull. A final consideration is what form the extracted data will take. Will it go directly into a staging database or first to flat files? It might even be communicated via a messaging system.

Transformation is the meat of the operation and often involves a high degree of data cleansing. The goal is to conform data from different systems to a common schema. This can involve matching customer names into a common customer identifier, translating code values from the operational system to a different conformed set, validating data or flagging suspect fields.

The final step is to load the new schema. This is likely a Data Warehouse in a star-schema or a cube of some sort. Issues like surrogate keys assignment, slowly changing dimensions, and audit fields somewhat straddle the line between Transform and Load.

It is common to get software specifically designed for ETL operations, but it can be done with homegrown tools and database scripts.

I mention sending data to a vendor since I once worked at an auditing firm and we would get client data and do analysis on our systems.

0
Chris Spivey
President, Spivey & Co. LLC
Posted on Aug. 6, 2010
  • Recommended by:

If you're interested, I have a presentation "Data Integration and ETL for the Auditor" that I have given for the IIA.. It has some good examples and a BI Architecture from the AICPA you might find useful. It also has a high-level discussion of the canonical model concept. Email me at Chris.Spivey@SpiveyNCo.com and I will send a copy.

Answer This Question