PENJOR NGUDUP
Dr. CHEN
ABSTRACT
Many corporations are experiencing significant business benefits from using data warehouse
technology. Users report gains in market competitiveness through increased revenue and
reduced costs through information management. Data warehousing is therefore a major issue
within most organizations, and the development of a data warehouse with a strong
base is essential. This paper aims to present the important concepts of Data Warehousing
such as Data Warehousing tools and the benefits of Data Warehousing, that a manager
company.
INTRODUCTION
It has become a strategic imperative for corporations to know more about their customers
and prospects than ever before. Corporations are competing in a world that is moving
faster, and in more directions, than at any time in history. The need for information is
growing at an increasing rate; the more we know, the more we need to know. Where
once it was the job of the information technologists to study customer data, nowadays
even the president of the company may need to sift through the databases of the
corporation to retrieve clues for better marketplace performance. (Rolleigh and Thomas,
2002)
The importance of data warehousing in the commercial segment arises from the need for
enterprises to gather all of their information into a single place for in-depth analysis, and
the desire to decouple such analysis from online transaction processing systems. (Widom,
1995)
But this isn't an easy endeavor. Before people can get ready access to data, and be able to
insane to just go out and have your information technology organization buy a data
warehouse. If any corporation does take such an action it will undoubtedly join the ranks
of many big-name corporations that have made humongous investments that have failed
If a company does indeed want to succeed with data warehousing, it has to build cross-
organizational consensus and support for a way of business that is empowered by real
customer data. The warehouse then has to be tailored to the specific requirements of
the company, which means that many careful, tedious and time-consuming steps are required.
Let's look at what goes into creating a rich data warehouse, and what we need to know
about it. This paper will first introduce the concept of the data warehouse in a simple,
straightforward manner, followed by the major components of a data warehouse and the
various structures of a data warehouse. The paper will then present the
data warehousing methodologies. Finally, it will discuss the advantages and
disadvantages of data warehousing.
Data warehousing is a concept. It is a set of hardware and software components that can
be used to better analyze the massive amounts of data that companies are accumulating to
make better business decisions. Bill Inmon, widely considered the 'father' of data
This implies that within a warehouse, the data will be organized by entity (such as
customer or product) rather than application (sales or purchase), that both current and
historical, time stamped data will be present and that once stored in the warehouse it
Figure 1
However, a simple definition of a data warehouse, as Ralph Kimball puts it, is that "(a) data
warehouse is a copy of transaction data specifically structured for query and analysis."
(Kimball, 1996)
But this definition does not encompass the entirety of data warehousing. Sometimes non-
transaction data are stored in a data warehouse, though probably 95-99% of the data
usually are transaction data. Additionally, "querying and reporting" rather than "query
and analysis" is key when talking about the functionality of data warehousing because the
main outputs from data warehouse systems are either tabular listings (queries) with
minimal formatting or highly formatted "formal" reports. Queries and reports generated
from data stored in a data warehouse may or may not be used for analysis. (Greenfield)
Nevertheless, data warehousing doesn't just make data available. Data warehousing is the
process of making your operational data available to your business managers and
access. Of course, this efficiency doesn't happen magically. Corporations must first
identify what it is that they require from the data and the decision support applications,
and then they must evaluate the current operational data to determine how to transform
that data into what adds value to the output provided by the corporations. The tools that
you choose for your warehousing solution will take data from your operational systems
(extract it), convert your operational data into business information using your defined
business rules (transform it), and create a data warehouse (load it).
(www.sas.com/rnd/warehousing)
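The extract, transform and load steps described above can be illustrated with a minimal sketch. It is not taken from any particular warehousing tool: the sample orders, the table name sales_fact, and the business rules (tidying customer names, converting cents to dollars) are all assumptions made for illustration, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical operational records; field names and values are illustrative only.
OPERATIONAL_ORDERS = [
    {"order_id": 1, "cust": " alice ", "amount_cents": 1250, "date": "2002-03-01"},
    {"order_id": 2, "cust": "BOB", "amount_cents": 400, "date": "2002-03-01"},
    {"order_id": 3, "cust": "Alice", "amount_cents": 750, "date": "2002-03-02"},
]


def extract(source):
    """Extract: read rows from the operational system (here, a plain list)."""
    return list(source)


def transform(rows):
    """Transform: apply assumed business rules (tidy names, cents to dollars)."""
    return [
        (r["order_id"], r["cust"].strip().title(), r["amount_cents"] / 100, r["date"])
        for r in rows
    ]


def load(conn, rows):
    """Load: write the transformed rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(source):
    """Run the three steps in order and return the warehouse connection."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(source)))
    return conn


warehouse = run_etl(OPERATIONAL_ORDERS)
totals = warehouse.execute(
    "SELECT customer, SUM(amount) FROM sales_fact GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Because the names were reconciled during the transform step, the final query can group by customer without any further cleanup, which is the practical point of defining the business rules before loading.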
Operational systems create data "parts" that are loaded into the warehouse. Some of those
parts are summarized into information "components" and stored in the warehouse. Data
Warehouse users make requests and are delivered information "products" that are created
Data warehousing is one of the hottest industry trends - for good reason. A well-defined
and properly implemented data warehouse can be a valuable competitive tool. (Perkins)
Figure 2
Summarized Data :- There are two kinds of summarized data, lightly summarized data
and highly summarized data. Lightly summarized data are the hallmark of a Data
summarized data for every department. Highly summarized data are primarily for the
executives. Highly summarized data can come from either the lightly summarized data
used by enterprise elements or from current detail. If executives require more detailed
information they have the capability of accessing increasing levels of detail through a
Current Detail :- The heart of a Data Warehouse is its current detail, where the bulk of
data resides. Current detail comes directly from operational systems and may be stored as
raw data or as aggregations of raw data. Every data entity in current detail is a snapshot,
at a moment in time, representing the instance when the data are accurate. Current detail
is typically two to five years old. Current detail refreshment occurs as frequently as
System of Record :- A system of record is the source of the data that feed the data
warehouse. Data in a data warehouse differ from operational systems data in that they can
only be read, not modified. Thus, it is necessary that a data warehouse be populated with
the highest quality data available, i.e., data that are most timely, complete, accurate, and
Integration and Transformation Programs :- Even the highest quality operational data
cannot usually be copied, as is, into a data warehouse. As operational data items pass
programs convert them from application-specific data into enterprise data. These
values; Supplying logic to choose between multiple data sources; Summarizing, tallying,
and merging data from multiple sources. When either operational or Data Warehouse
environments change, integration and transformation programs are modified to reflect
that change.
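A minimal sketch of what such an integration and transformation program might do is given below. The two source systems (BILLING and CRM), their field names, and the rules used (strip non-digits to form an enterprise key, prefer the most recently updated record) are hypothetical and chosen only to illustrate key reconciliation, choosing between sources, and merging with tallying.

```python
from collections import defaultdict

# Hypothetical records from two application-specific systems.
BILLING = [
    {"cust_no": "C-001", "name": "Acme Corp", "updated": "2002-01-10", "balance": 120.0},
    {"cust_no": "C-002", "name": "Globex", "updated": "2002-02-01", "balance": 80.0},
]
CRM = [
    {"customer_id": "1", "name": "ACME Corporation", "updated": "2002-03-05", "balance": 30.0},
]


def enterprise_key(raw):
    """Reconcile application-specific keys into one enterprise key (assumed rule:
    keep only the digits, so "C-001" and "1" both become 1)."""
    return int("".join(ch for ch in raw if ch.isdigit()))


def integrate(billing, crm):
    """Merge both sources into one enterprise record per customer."""
    merged = {}
    totals = defaultdict(float)
    sources = [(r["cust_no"], r) for r in billing] + [(r["customer_id"], r) for r in crm]
    for raw_key, rec in sources:
        key = enterprise_key(raw_key)
        totals[key] += rec["balance"]  # tally balances across sources
        current = merged.get(key)
        # Choose between sources: keep the most recently updated record.
        # ISO-formatted dates compare correctly as strings.
        if current is None or rec["updated"] > current["updated"]:
            merged[key] = {"name": rec["name"], "updated": rec["updated"]}
    return {k: {**v, "total_balance": totals[k]} for k, v in merged.items()}


enterprise_customers = integrate(BILLING, CRM)
print(enterprise_customers)
```

If either source system changes its key format or adds fields, only enterprise_key and integrate would need to be modified, which mirrors the maintenance point made above.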
Archives :- Data Warehouse archives contain old data (normally over two years old) of
significant, continuing interest and value to the enterprise. There is usually a massive
amount of data stored in the Data Warehouse archives, with a low incidence of access.
Archive data are most often used for forecasting and trend analysis. Archives include not
only old data (in raw or summarized form); they also include the metadata that describes
Metadata :- One of the most important parts of a Data Warehouse is its metadata - or
data about data. Also called Data Warehouse architecture, metadata is integral to all
levels of the Data Warehouse, but exists and functions in a different dimension from
other warehouse data. Metadata that is used by Data Warehouse developers to manage
and control Data Warehouse creation and maintenance resides outside the Data
Warehouse. (Perkins)
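The relationship between current detail, lightly summarized data and highly summarized data described among the components above can be sketched as two successive roll-ups. The sample rows, the choice of department-by-month and enterprise-by-year as the two summary levels, and the helper detail_for are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical current detail rows: (date, department, amount).
CURRENT_DETAIL = [
    ("2002-01-15", "sales", 100.0),
    ("2002-01-20", "sales", 50.0),
    ("2002-02-03", "sales", 25.0),
    ("2002-01-09", "support", 10.0),
]


def lightly_summarize(detail):
    """Roll current detail up to one total per (department, month)."""
    out = defaultdict(float)
    for date, dept, amount in detail:
        out[(dept, date[:7])] += amount
    return dict(out)


def highly_summarize(light):
    """Roll the department/month totals up to one enterprise total per year."""
    out = defaultdict(float)
    for (_dept, month), amount in light.items():
        out[month[:4]] += amount
    return dict(out)


def detail_for(detail, dept, month):
    """Return the detail rows behind one lightly summarized figure."""
    return [row for row in detail if row[1] == dept and row[0].startswith(month)]


light = lightly_summarize(CURRENT_DETAIL)
high = highly_summarize(light)
print(light)
print(high)
print(detail_for(CURRENT_DETAIL, "sales", "2002-01"))
```

Here the highly summarized figure is derived from the lightly summarized data; it could equally be computed directly from current detail, as the text notes.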
Along with the various components of a data warehouse there are various structures of a
There are various structures of a data warehouse that a corporation can adopt based on its
needs: the physical data warehouse, the logical data warehouse and the data mart.
Figure 3
Physical Data Warehouse :- is a physical database in which all the data for the data
warehouse are stored, along with metadata and processing logic for scrubbing,
Logical Data Warehouse :- like the physical data warehouse, also contains metadata,
including enterprise rules and processing logic for scrubbing, organizing, packaging and
processing the data, but does not contain actual data. Instead, it contains the information
necessary to access the data wherever they reside. This structure is effective only when
there is a single source for the data and they are known to be accurate and timely.
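The idea of a logical data warehouse can be sketched as a catalog that stores only metadata (where each entity lives and which rule to apply to it) and fetches the data from the source at query time. The class LogicalWarehouse, the SOURCES dictionary standing in for remote systems, and the upper-casing rule are all hypothetical.

```python
# Hypothetical source systems; in practice these would be remote databases.
SOURCES = {
    "sales_db": {"customer": [{"id": 1, "name": "Acme"}]},
    "inventory_db": {"product": [{"sku": "P-1", "stock": 12}]},
}


class LogicalWarehouse:
    """Holds metadata and processing rules only; no data is copied in."""

    def __init__(self):
        # entity name -> (source name, table name, processing rule)
        self._catalog = {}

    def register(self, entity, source, table, rule=None):
        """Record where an entity resides and how its rows should be processed."""
        self._catalog[entity] = (source, table, rule or (lambda row: row))

    def query(self, entity):
        """Read the entity from its source at request time and apply the rule."""
        source, table, rule = self._catalog[entity]
        return [rule(dict(row)) for row in SOURCES[source][table]]


lw = LogicalWarehouse()
lw.register("customer", "sales_db", "customer",
            rule=lambda row: {**row, "name": row["name"].upper()})
lw.register("product", "inventory_db", "product")

print(lw.query("customer"))

# Because nothing is stored in the warehouse, a change at the source
# is visible on the next query.
SOURCES["sales_db"]["customer"].append({"id": 2, "name": "Globex"})
print(lw.query("customer"))
```

The sketch also shows why the structure depends on a single accurate and timely source: every answer is only as good as the source system at the moment of the query.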
warehouse development process, an enterprise builds a series of physical (or logical) data
marts over time and links them via an enterprise-wide logical data warehouse or feeds
DATA WAREHOUSING METHODS
All of these fall into one of two categories: the big bang approach or the iterative
approach.
A big bang methodology tries to solve all known problems by creating a huge data
warehouse before you release it for evaluation and testing. Many people believe that this
data to be incorporated, and your intimate knowledge of your business and data, you may
be able to accomplish your warehousing project with a big bang methodology. But, there
To create a data warehouse, the corporation must plan its warehouse, evaluate and install
the necessary software and hardware, collect business requirements, and become familiar
with its corporate data. While these tasks are taking place:
• The business goals of the corporation can change due to changes in the market or
technology.
• Management supporters can lose interest in this project if you don't keep them
• The corporate data could change (they may start collecting Web log data)
• New releases of the firm’s chosen software may become available (warehousing
is still an evolving market and even the best tools continue to improve and
change).
The items listed above are just a few of the business and technical changes that could
impact your plans. If you are not plugged into the proper channels, any one of these
changes could cause your project to fail because you cannot quickly respond to the
necessary changes.
Iterative Approach
With an iterative methodology, the corporation breaks its warehousing project into small,
manageable chunks, referred to as projects. In the iterative approach, the same planning
tasks are performed that are required in the big bang approach, but evaluation of all of
your deliverables up front is not required. The corporation must design its overall
architecture, but when entering the planning phase, it needs to concentrate only on its first
project or iteration. After each project, review of the architecture, its development
The value of smaller projects within the larger warehousing process is:
solution quickly. This keeps the management supporters involved and interested
in the project.
• It can adjust to changes in the business requirements faster because the team is
testing, which provides it with user needs and defect reports. Additionally, users
are provided with better feedback when they can see the system than when they
have to envision it from a slide presentation. This feedback can improve the
corporation’s goals and processes for the next iteration.
(www.sas.com/rnd/warehousing)
Although the business may ask for everything to be delivered by the warehouse at once,
taking a "big bang" approach may not be a prudent step. Instead, breaking the project into
parts or releases, and establishing a clear set of objectives and definition of success for
each release of the project will be appropriate. Each release will need to go through the
and support. By delivering functionality and business value with each short release, a
corporation can integrate its data, create the proper roles, train its users, gain insight and
assimilate lessons learned. At the same time, by using an iterative approach a corporation
can adjust its data warehouse's content to add new sources, which were not previously
considered, as it gains insight into its business and analytics. Perhaps most important, an
iterative approach enables the project team to demonstrate a few quick wins in terms of
business value delivered. Due to its iterative nature, a data warehouse is a journey not a
As shown in Figure 4, the iteration methodology thus starts with the Initial
warehousing project. Followed by the Analysis phase; evaluation of the feasibility of the
data warehouse, gathering business requirements, and getting agreement on the goals and
purpose of the warehouse. Third is the Design phase; analyzing and designing the
actual data warehouse structure and population of the data warehouse. Fifth is the Testing
phase; data cleansing and parameter clarification (may send back to the design phase for
another iteration). This is followed by the Implementation phase; rolling out the production
environment and providing user training. And finally the Maintenance phase; monitoring,
Initial Organization
Analysis
Design
Development
Testing
Implementation
Maintenance
Figure 4
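The phase sequence of Figure 4, including the possibility that Testing sends the project back to Design, can be sketched as a simple loop. The function run_iteration and its test_results argument are hypothetical; each boolean stands for the outcome of one visit to the Testing phase.

```python
PHASES = [
    "Initial Organization",
    "Analysis",
    "Design",
    "Development",
    "Testing",
    "Implementation",
    "Maintenance",
]


def run_iteration(test_results):
    """Return the phases visited in order.

    test_results holds one boolean per visit to Testing; False sends the
    project back to Design. Once the results run out, Testing is assumed
    to pass so the walk always terminates.
    """
    results = iter(test_results)
    visited = []
    i = 0
    while i < len(PHASES):
        phase = PHASES[i]
        visited.append(phase)
        if phase == "Testing" and not next(results, True):
            i = PHASES.index("Design")  # loop back for another design pass
            continue
        i += 1
    return visited


path = run_iteration([False, True])
print(path)
```

With one failed test round, Design, Development and Testing each appear twice in the path before Implementation and Maintenance are reached.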
At this point, provisions for an easy path for user feedback should be established and
review of the five steps must be done and necessary adjustments should be made. After
the review of the process, the next iteration or project should be started from the
assessment phase. The assessment and requirements phases should require less process
As in any endeavor, selecting the correct tools is paramount for success. As the old
Chinese adage says, "To accomplish a goal, make sure the proper tools are selected."
Given the complexity of the data warehousing system and the cross-departmental
implications of the project, it is easy to see why the proper selection of tools and
personnel is very important. This section of the paper will present information on such
selections.
There are two steps that top management is concerned with when building a data
warehouse. One step is choosing a vendor. However, in doing so there are certain basic
charge an annual support fee that is 15-20% of the software product license. But
the question is, will any software issues be handled promptly by the vendor or
not?
of consulting proposal does the vendor give? Is the personnel requirements and
consulting team before signing on the dotted line. On the education front, what
type of training is available? And how much knowledge transfer is the consulting team
willing to do? Does the consulting team purposely withhold information so
that either 1) you will need to send more people to the vendor's education classes, or
2) you will need to hire additional consultants to make any changes to the system?
3. Stability :- More than anything else, this is probably the most important measure.
It may even be more important than the current functionality that the tool itself
provides, for the simple reason that it raises two questions: is the vendor going to be
around for a while, and will it be able to make enhancements to its tool?
The other step is selecting the right team to build the data warehouse. Here there are two
possibilities: one is to use external consultants and the other is to hire permanent
employees.
fact of the matter is, even today, people with extensive data warehousing
backgrounds are difficult to find. With that, when there is a need to ramp up a
1. They are less expensive. With hourly rates for experienced data warehousing
professionals running from $100/hr and up, and even more for Big-5 or vendor
2. They are less likely to leave. With consultants, whether they are on contract, via a
Big-5 firm, or one of the tool vendor firms, they are likely to leave at a moment's
notice. This makes knowledge transfer very important. Of course, the flip side is
that these consultants are much easier to get rid of, too.
However, management must understand that there are various entities that play important
1. Project Manager: This person will oversee the progress and be responsible for the
2. DBA: This role is responsible for keeping the database running smoothly. Additional
tasks for this role may be to plan and execute a backup/recovery plan, as well as
performance tuning.
3. Technical Architect: This role is responsible for developing and implementing the
4. ETL Developer: This role is responsible for planning, developing, and deploying
the extraction, transformation, and loading routine for the data warehouse.
5. Front End Developer: This person is responsible for developing the front-end,
Creeth) The role of the OLAP Developer is thus very crucial. He/She is
7. Trainer: A significant role is the trainer. After the data warehouse is implemented,
a person on the data warehouse team needs to work with the end users to get them
familiar with how the front end is set up so that the end users can get the most
8. Data Modeler: This role is responsible for taking the data structure that exists in
the enterprise and modeling it into a schema that is suitable for OLAP analysis.
ADVANTAGES AND DISADVANTAGES OF DATA WAREHOUSING
Data warehousing has been increasingly popular in many organizations around the world.
It is not with blind belief that corporations are investing millions of dollars in data
warehousing projects.
company. Data warehousing makes retrieving information so easy that when a user query
inconsistencies and differences already resolved. This makes it much easier and more
efficient to run queries over data that originally came from different sources. (WHIPS)
Along with its numerous ease of use benefits data warehousing provides other qualitative
services and their performances, ability to make quick and proper analysis that pave the
way for better decision making can be gained from a successful data warehousing project
and thus, give the company a strong competitive advantage over the competition. (Smith)
analysis.
• The significant savings from improved data quality across the enterprise. (Smith)
• The ability to run complex queries easily and efficiently since query execution
does not involve data translation and communication with remote sources.
• Convenience for end users since they can use a single data model and query
language.
• Simplicity of the system design. For example, there is no need to perform query
approaches.
• Information at the warehouse is under the control of the warehouse users; thus it
From the management's point of view, the benefits and rewards abound for a
company that builds and maintains a data warehouse correctly. The corporation will
make dramatic cost savings and its revenues will soar. Furthermore, there will be increase
and the ability to identify and keep the most profitable customers while getting a better
picture of who they are, and it's easy to see why data warehousing is spreading faster. For
example, the telecom industry uses data warehouses to target customers who may want
certain phone services rather than doing "blanket" phone and mail campaigns and
aggravating customers with unsolicited calls during dinner. Some of the soft benefits of
data warehousing come in the technology's effect on users. When built and used
correctly, a warehouse changes users' jobs, granting them faster access to more accurate
data and allowing them to give better customer service. A company must not forget,
however, that the goal for any data warehousing project is to lower operating costs and
Even though the benefits of data warehousing far outweigh the disadvantages, there are
certain disadvantages that companies must pay heed to. The most
• Expensive initial data warehouse set up. However, after the system is in place the
cost should be low and cover only the maintenance and future modifications of
the system. Also, there is high cost in getting data translated and copied to
existing databases in time for being useful for the end user.
• A data warehouse takes time to build. Sufficient time should be given to the project,
and the difficulties in getting a data warehouse developed and up and running should
not be underestimated.
is often resisted until familiarity is gained with the new approach. (OCS
Consulting)
• Cost and time are also borne to develop the required new skill-set for warehouse
hardware, software and structure requires careful consideration and how they will
management of the current environment it will mean that overall less time is
• Data held in one place highlights data integrity problems and vulnerability from
the public domain; thus advanced security to prevent unwanted users, including
competitors, from accessing the database will be of critical importance for the
CONCLUSION
Thus, we see that data warehousing does not have to be an enigma to the managers of
crucial for the companies to run a successful business. Management must believe and
even if management has to invest a massive amount of capital to build a data warehouse,
it must do so in view of the myriad benefits that will crop up. Any company that doesn't
see the importance and benefits of data warehousing and is blinded by the cost and
daunted by the size of the task will feel the devastating impact of the competing business
REFERENCES
Louis Rolleigh and Joe Thomas, Data Integration: The Warehouse Foundation, White
Papers, copyright 2002 Acxiom Corporation. Available at:
http://www.acxiom.com/displayMain/0,1494,USA~en~383~197~0~0,00.html
Spyro D. Karakizis, How to Win With Your Data Warehouse: Advice from A Data
Warehouse Expert, ©1996-2002 Accenture. Available at: http://www.accenture.com
Chuo-Han Lee (2002), Data Warehouse and Data Warehousing Copyright © 2001, 2002
Available at: http://www.1keydata.com/datawarehousing/datawarehouse.html
Tom Wailgum (2001), What is a Data Warehouse?
Available at: http://www.darwinmag.com/learn/curve/column.html?ArticleID=50