Information Technology

                        

                                                                    Business Objects
                                                                              
 BO Home *     
 

                           Data Warehousing

In a nutshell, a data warehouse is a "repository for historical data, used to make decision". Of course there's much more to it than that. 

Data warehousing is a concept. It is a set of hardware and software components that can be used to better analyze the massive amounts of data that companies are accumulating to make better business decisions. Data Warehousing is not just data in the data warehouse, but also the architecture and tools to collect, query, analyze and present information.

Operational data is the data you use to run your business. This data is what is typically stored, retrieved, and updated by your Online Transactional Processing (OLTP) system

Informational data is created from the wealth of operational data that exists in your business and some external data useful to analyze your business. Informational data is what makes up a data warehouse. Informational data is typically:

  • Summarized operational data
  • De-normalized and replicated data
  • Infrequently updated from the operational systems
  • Optimized for decision support applications
  • Possibly "read only" (no updates allowed)
  • Stored on separate systems to lessen impact on operational systems

Creating the informational data, that is, the data warehouse, from the operational systems is a key part of the overall data warehousing solution. Building the informational database is done with the use of transformation or propagation tools. 

These tools not only move the data from multiple operational systems, but often manipulate the data into a more appropriate format for the warehouse. This could mean:

  • The creation of new fields that are derived from existing operational data
  • Summarizing data to the most appropriate level needed for analysis
  • Denormalizing the data for performance purposes
  • Cleansing of the data to ensure that integrity is preserved.

The heart of the data warehouse is the dimensional (a.k.a. star or snowflake) schema: a central fact table surrounded by dimension and transform tables.

Think of the fact table as the business process you are measuring, such as a sale or shipment. 

The dimension tables will contain the selections and their attributes which will be used to constrain the records in the fact table, as well as provide the values used for grouping and sorting the results.

The simple example above models a retail business that operates stores in a number of different markets. Each row in the fact table (Sale Fact) contains information about the sale of an item (UnitsSold and UnitPrice). Notice that the fact table is inherently normalized. You should also notice that the facts are additive along all dimensions, i.e. no matter which dimension we choose to group the sales facts by, we can sum either UnitsSold or UnitPrice and the results make sense. 

Some of the popular data stores for data warehousing are relational databases like Oracle, DB2, Informix or specialized Data Warehouse databases like RedBrick, SAS.

Benefits of Data Warehousing

A well designed and implemented data warehouse can be used to:

  • Understand business trends and make better forecasting decisions
  • Bring better products to market in a more timely manner
  • Analyze daily sales information and make quick decisions that can significantly affect your company's performance

Data Marts

Data marts are workgroup or departmental warehouses, which are small in size, typically 10-50GB. The data mart contains informational data that is departmentalized, tailored to the needs of the specific departmental work group

Another advantage is the data mart is its portability. Data marts can be loaded onto laptop computers and used by people who requires access to the information while they travel. The laptop computer may be transformed monthly, weekly, or even daily if necessary.

EX: It may be that the first data mart is used for specific querying and reporting while the second data mart is used for OLAP, data mining, EIS or even another instance of reporting.

When designing your data warehouse model, you may have unlimited data marts, and each data mart may have repeated data, if necessary.

What is Data Warehousing used for ?

Reporting : 

General queries applied to the data warehouse with the intent of finding answers to specific questions. That is the  basic "who, what, when, where and how many" of the data.  

Ex: how many CARS did the company sell in the quarter there ? Who bought them? Are sales are up on the quarter two? If not, why not?

Online Analytical Processing (OLAP):

Relational databases store data in a two dimensional format: tables of data represented by rows and columns. Multi-dimensional analysis solutions, commonly referred to as On-Line Analytical Processing (OLAP) solutions, offer an extension to the relational model to provide a multi-dimensional view of the data.

Multi-dimensional solutions provide the ability to:

  • analyze potentially large amounts of data with very fast response times
  • "slice and dice" through the data, and drill down or roll up through various dimensions as defined by the data structure

The foundation of OLAP is the ability to "drill down" into data, as far into its minute detail as is necessary to get the answer you need.

Consider the scenario : We need to review the results of a straw poll taken before a national election, in order to evaluate the possible outcome.  Analysis of the results of this poll on a countrywide level could tell us, essentially who was likely to win the election.

We could then drill down to state level and identify those state where the result was likely to be marginal.

We could then go down to county level and find out whether it was a close call between the contesting parties all over the state, or if support for a certain party was strong in some areas and poor in others.

If data was available from the other years we could look for strong swing in voting habits and try to assess the reason for this.

The major convenience of OLAP tools is their ability to dynamically represent the data without re-querying the database.

There are three types of OLAP.

Multidimensional OLAP (MOLPA), 

Relational OLAP (ROLAP) and 

Hybrid OLAP (HOLAP).

Data Mining :

Data mining is the process of analyzing business data in the data warehouse to find unknown patterns or rules of information that you can use to tailor business operations. For instance, data mining can find patterns in your data to answer questions like:

  • what item purchased in a given transaction triggers the purchase of additional related items?
  • how do purchasing patterns change with store location?
  • what items tend to be purchased using credit cards, cash, or check?
  • how would the typical customer likely to purchase these items be described?
  • did the same customer purchase related items at another time?

What's unique about data mining is that we don't need to ask any questions. The tools assume that we don't know exactly what information we're looking for.

Executive Information :

Finding key summary information. for the purpose of making business decisions, without having to wade through reams of data.