|
Information
Technology
|
|
| Business Objects | |
|
Data Warehousing In a nutshell, a data warehouse is a "repository for historical data, used to make decision". Of course there's much more to it than that. Data warehousing is a concept. It is a set of hardware and software components that can be used to better analyze the massive amounts of data that companies are accumulating to make better business decisions. Data Warehousing is not just data in the data warehouse, but also the architecture and tools to collect, query, analyze and present information. Operational data is the data you use to run your business. This data is what is typically stored, retrieved, and updated by your Online Transactional Processing (OLTP) system Informational data is created from the wealth of operational data that exists in your business and some external data useful to analyze your business. Informational data is what makes up a data warehouse. Informational data is typically:
Creating the informational data, that is, the data warehouse, from the operational systems is a key part of the overall data warehousing solution. Building the informational database is done with the use of transformation or propagation tools. These tools not only move the data from multiple operational systems, but often manipulate the data into a more appropriate format for the warehouse. This could mean:
The heart of the data warehouse is the dimensional (a.k.a. star or snowflake) schema: a central fact table surrounded by dimension and transform tables. Think of the fact table as the business process you are measuring, such as a sale or shipment. The dimension tables will contain the selections and their attributes which will be used to constrain the records in the fact table, as well as provide the values used for grouping and sorting the results.
The simple example above models a retail business that operates stores in a number of different markets. Each row in the fact table (Sale Fact) contains information about the sale of an item (UnitsSold and UnitPrice). Notice that the fact table is inherently normalized. You should also notice that the facts are additive along all dimensions, i.e. no matter which dimension we choose to group the sales facts by, we can sum either UnitsSold or UnitPrice and the results make sense. Some of the popular data stores for data warehousing are relational databases like Oracle, DB2, Informix or specialized Data Warehouse databases like RedBrick, SAS. Benefits of Data Warehousing A well designed and implemented data warehouse can be used to:
Data Marts Data marts are workgroup or departmental warehouses, which are small in size, typically 10-50GB. The data mart contains informational data that is departmentalized, tailored to the needs of the specific departmental work group Another advantage is the data mart is its portability. Data marts can be loaded onto laptop computers and used by people who requires access to the information while they travel. The laptop computer may be transformed monthly, weekly, or even daily if necessary. EX: It may be that the first data mart is used for specific querying and reporting while the second data mart is used for OLAP, data mining, EIS or even another instance of reporting. When designing your data warehouse model, you may have unlimited data marts, and each data mart may have repeated data, if necessary. What is Data Warehousing used for ? Reporting : General queries applied to the data warehouse with the intent of finding answers to specific questions. That is the basic "who, what, when, where and how many" of the data. Ex: how many CARS did the company sell in the quarter there ? Who bought them? Are sales are up on the quarter two? If not, why not? Online Analytical Processing (OLAP): Relational databases store data in a two dimensional format: tables of data represented by rows and columns. Multi-dimensional analysis solutions, commonly referred to as On-Line Analytical Processing (OLAP) solutions, offer an extension to the relational model to provide a multi-dimensional view of the data. Multi-dimensional solutions provide the ability to:
The foundation of OLAP is the ability to "drill down" into data, as far into its minute detail as is necessary to get the answer you need. Consider the scenario : We need to review the results of a straw poll taken before a national election, in order to evaluate the possible outcome. Analysis of the results of this poll on a countrywide level could tell us, essentially who was likely to win the election. We could then drill down to state level and identify those state where the result was likely to be marginal. We could then go down to county level and find out whether it was a close call between the contesting parties all over the state, or if support for a certain party was strong in some areas and poor in others. If data was available from the other years we could look for strong swing in voting habits and try to assess the reason for this. The major convenience of OLAP tools is their ability to dynamically represent the data without re-querying the database. There are three types of OLAP. Multidimensional OLAP (MOLPA), Relational OLAP (ROLAP) and Hybrid OLAP (HOLAP). Data Mining : Data mining is the process of analyzing business data in the data warehouse to find unknown patterns or rules of information that you can use to tailor business operations. For instance, data mining can find patterns in your data to answer questions like:
What's unique about data mining is that we don't need to ask any questions. The tools assume that we don't know exactly what information we're looking for. Executive Information : Finding
key summary information. for the purpose of making business decisions,
without having to wade through reams of data. |