
DATA WAREHOUSING:

WHAT A MANAGER NEEDS TO KNOW

PENJOR NGUDUP

12th August 2002

MBUS 626 INFORMATION SYSTEMS THEORY AND PRACTICE

Dr. CHEN
ABSTRACT

Many corporations are experiencing significant business benefits from using data warehouse

technology. Users report gains in market competitiveness through increased revenue and

reduced costs through information management. Data warehousing is thus a major issue

within most organizations, and the development of a data warehouse with a strong

base is essential. This paper aims to present the important concepts of Data Warehousing,

such as Data Warehousing tools and the benefits of Data Warehousing, that a manager

must understand in order to execute a successful Data Warehousing project in his/her

company.

Keywords: Data Warehouse Technology, Market Competitiveness,

Data Warehousing tools, benefits of Data Warehousing.

INTRODUCTION
It has become a strategic imperative for corporations to know more about their customers

and prospects than ever before. Corporations are competing in a world that is moving

faster, and in more directions, than at any time in history. The need for information is

growing at an increasing rate; thus, the more we know, the more we need to know. Where

once it was the job of the information technologists to study customer data, nowadays

even the president of the company may need to sift through the databases of the

corporation to retrieve clues for better marketplace performance. (Rolleigh and Thomas,

2002)

The importance of data warehousing in the commercial segment arises from the need for

enterprises to gather all of their information into a single place for in-depth analysis, and

the desire to decouple such analysis from online transaction processing systems. (Widom,

1995)

But this isn't an easy endeavor. Before people can get ready access to data, and be able to

make meaningful analysis of it, a lot of behind-the-scenes groundwork must be done. It is

unwise to simply have your information technology organization go out and buy a data

warehouse. Any corporation that takes such an action will likely join the ranks

of the many big-name corporations whose huge investments have failed

to provide any return on investment (ROI).

If a company does indeed want to succeed with data warehousing, it has to build cross-

organizational consensus and support for a way of business that is empowered by real

customer data. The warehouse then has to be tailored to the specific requirements of

the company, which means that many careful, tedious and time-consuming steps are required.

(Rolleigh and Thomas, 2002)

Let's look at what goes into creating a rich data warehouse, and what we need to know

about it. This paper first introduces the concept of a data warehouse in a simple,

straightforward manner, followed by the major components of a data warehouse and the

various structures of a data warehouse. The paper then presents the data warehousing

methodologies and the selection of tools and personnel. Finally, it discusses the advantages and

disadvantages of data warehousing and ends with the conclusion.

WHAT IS A DATA WAREHOUSE

Data warehousing is a concept. It is a set of hardware and software components that can

be used to better analyze the massive amounts of data that companies are accumulating to

make better business decisions. Bill Inmon, widely considered the 'father' of data

warehousing, describes it as:

"A subject-oriented, integrated, time-variant, non-volatile collection of data in support of

management's decision-making process."

This implies that within a warehouse, the data will be organized by entity (such as

customer or product) rather than application (sales or purchase), that both current and

historical, time stamped data will be present and that once stored in the warehouse it

cannot be changed (Figure 1). (Davis et al., 1999)

Figure 1
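
As a rough illustration of what "time-variant" and "non-volatile" mean in practice, the following Python sketch models a warehouse table as an append-only, time-stamped store organized by subject (here, a customer). The class name WarehouseTable, its methods and the sample balances are hypothetical and are not taken from any of the cited sources.

```python
from datetime import date


class WarehouseTable:
    """Append-only, time-stamped store keyed by subject (a customer id)."""

    def __init__(self):
        self._rows = []  # rows are only ever appended, never updated in place

    def load(self, customer_id, balance, as_of):
        """Add a new snapshot; existing snapshots are left untouched."""
        self._rows.append((customer_id, balance, as_of))

    def history(self, customer_id):
        """All snapshots for one customer, ordered by date."""
        return sorted(
            (as_of, balance)
            for cid, balance, as_of in self._rows
            if cid == customer_id
        )

    def as_of(self, customer_id, when):
        """Latest snapshot taken on or before `when`, or None if there is none."""
        past = [(d, b) for d, b in self.history(customer_id) if d <= when]
        return past[-1][1] if past else None


table = WarehouseTable()
table.load("C1", 100, date(2002, 1, 1))
table.load("C1", 250, date(2002, 6, 1))
```

Because a changed balance arrives as a new snapshot rather than an update, both current and historical values remain available for analysis.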

However, a simpler definition of a data warehouse, as Ralph Kimball puts it, is “[a] data

warehouse is a copy of transaction data specifically structured for querying and analysis.”

(Kimball, 1996)

But this definition does not encompass the entirety of data warehousing. Sometimes non-

transaction data are stored in a data warehouse - though probably 95-99% of the data

usually are transaction data. Additionally, "querying and reporting" rather than "querying

and analysis" is key when talking about the functionality of data warehousing because the

main outputs from data warehouse systems are either tabular listings (queries) with

minimal formatting or highly formatted "formal" reports. Queries and reports generated

from data stored in a data warehouse may or may not be used for analysis. (Greenfield)

Nevertheless, data warehousing doesn't just make data available. Data warehousing is the

process of making your operational data available to your business managers and

decision support applications. Proper warehousing focuses on efficient information

access. Of course, this efficiency doesn't happen magically. Corporations must first

identify what it is that they require from the data and the decision support applications,

and then they must evaluate the current operational data to determine how to transform

that data into what adds value to the output provided by the corporations. The tools that

you choose for your warehousing solution will take data from your operational systems

(extract it), convert your operational data into business information using your defined

business rules (transform it), and create a data warehouse (load it).

(www.sas.com/rnd/warehousing)
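
The extract, transform and load steps described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of any vendor's tool: the sample records, the field names and the business rule (skip sales without an amount) are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical operational records; the field names are illustrative only.
OPERATIONAL_ROWS = [
    {"cust": "C001", "amount": "19.99", "sold_on": "2002-08-01"},
    {"cust": "C002", "amount": "5.00", "sold_on": "2002-08-01"},
    {"cust": "C001", "amount": "", "sold_on": "2002-08-02"},  # missing amount
]


def extract(rows):
    """Extract: read the records from the operational source."""
    return list(rows)


def transform(rows):
    """Transform: apply business rules (assumed here: skip rows without an
    amount and convert the amount from text to a number)."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue
        cleaned.append((row["cust"], float(row["amount"]), row["sold_on"]))
    return cleaned


def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, source_rows):
    """Run the three steps in order and return (row count, total amount)."""
    load(conn, transform(extract(source_rows)))
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone()
```

Calling run_etl(sqlite3.connect(":memory:"), OPERATIONAL_ROWS) loads the two valid sales and leaves out the incomplete one.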

Finally, a data warehouse can be explained by analogy to a physical warehouse.

Operational systems create data "parts" that are loaded into the warehouse. Some of those

parts are summarized into information "components" and stored in the warehouse. Data

Warehouse users make requests and are delivered information "products" that are created

from the components and parts stored in the warehouse.

A Data Warehouse is typically a blending of technologies, including relational and

multidimensional databases, client/server architecture, extraction/transformation

programs, graphical user interfaces, and more.

Data warehousing is one of the hottest industry trends - for good reason. A well-defined

and properly implemented data warehouse can be a valuable competitive tool. (Perkins)

THE COMPONENTS OF A DATA WAREHOUSE

The following describes the components of a data warehouse (Figure 2).

Figure 2

Summarized Data :- There are two kinds of summarized data, lightly summarized data

and highly summarized data. Lightly summarized data are the hallmark of a Data

Warehouse. Not all departments in a corporation have the same information

requirements, so effective Data Warehouse design provides for customized, lightly

summarized data for every department. Highly summarized data are primarily for the

executives. Highly summarized data can come from either the lightly summarized data
used by enterprise elements or from current detail. If executives require more detailed

information they have the capability of accessing increasing levels of detail through a

"drill down" process.

Current Detail :- The heart of a Data Warehouse is its current detail, where the bulk of

data resides. Current detail comes directly from operational systems and may be stored as

raw data or as aggregations of raw data. Every data entity in current detail is a snapshot,

at a moment in time, representing the instance when the data are accurate. Current detail

is typically two to five years old. Current detail refreshment occurs as frequently as

necessary to support enterprise requirements.
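
The relationship between current detail, lightly summarized data and highly summarized data, together with the "drill down" from a summary back to its detail, might look as follows in Python. The regions, departments and amounts are invented for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical current-detail rows: (region, department, sales amount).
CURRENT_DETAIL = [
    ("East", "Toys", 120),
    ("East", "Toys", 80),
    ("East", "Books", 50),
    ("West", "Toys", 200),
    ("West", "Books", 30),
]


def lightly_summarize(rows):
    """Department-level view: total sales per (region, department)."""
    totals = defaultdict(int)
    for region, dept, amount in rows:
        totals[(region, dept)] += amount
    return dict(totals)


def highly_summarize(light):
    """Executive-level view: total sales per region, built from the
    lightly summarized data rather than from current detail."""
    totals = defaultdict(int)
    for (region, _dept), amount in light.items():
        totals[region] += amount
    return dict(totals)


def drill_down(rows, region, dept=None):
    """Return the detail rows that lie behind a summary figure."""
    return [r for r in rows if r[0] == region and (dept is None or r[1] == dept)]


light = lightly_summarize(CURRENT_DETAIL)
high = highly_summarize(light)
```

An executive who questions the East total in `high` can call drill_down(CURRENT_DETAIL, "East") to see the individual rows behind it.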

System of Record :- A system of record is the source of the data that feed the data

warehouse. Data in a data warehouse differ from operational systems data in that they can

only be read, not modified. Thus, it is necessary that a data warehouse be populated with

the highest quality data available, i.e., data that are most timely, complete, accurate, and

have the best structural conformance to the data warehouse.

Integration and Transformation Programs :- Even the highest quality operational data

cannot usually be copied, as is, into a data warehouse. As operational data items pass

from their systems of record to a data warehouse, integration and transformation

programs convert them from application-specific data into enterprise data. These

integration and transformation programs perform functions such as: Reformatting,

recalculating, or modifying key structures; Adding time elements; Identifying default

values; Supplying logic to choose between multiple data sources; Summarizing, tallying,

and merging data from multiple sources. When either operational or Data Warehouse

environments change, integration and transformation programs are modified to reflect

that change.
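
Several of the functions listed above (modifying key structures, adding time elements, identifying default values and choosing between multiple data sources) can be shown in one small Python sketch. The two source systems, their field names, the key format and the rule of preferring the CRM value are assumptions made only for this example.

```python
from datetime import date

# Hypothetical records from two operational systems describing one customer.
BILLING = {"cust_no": "00042", "name": "ACME LTD", "country": None}
CRM = {"customer": "42", "name": "Acme Ltd.", "country": "US"}


def to_enterprise_key(raw_key):
    """Modify the key structure so both systems share one key format."""
    return f"CUST-{int(raw_key):06d}"


def integrate(billing, crm, load_date):
    """Merge two application-specific records into one enterprise record."""
    key = to_enterprise_key(billing["cust_no"])
    if key != to_enterprise_key(crm["customer"]):
        raise ValueError("records describe different customers")
    return {
        "customer_key": key,
        # Choosing between sources: prefer the CRM name (an assumed rule).
        "name": crm["name"] or billing["name"],
        # Default value when neither source supplies a country.
        "country": crm["country"] or billing["country"] or "UNKNOWN",
        # Time element: the date on which this snapshot was loaded.
        "snapshot_date": load_date.isoformat(),
    }


record = integrate(BILLING, CRM, date(2002, 8, 12))
```

When an operational system changes, for example by renaming a field, only functions like these need to be adjusted; the enterprise record format stays the same.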

Archives :- Data Warehouse archives contain old data (normally over two years old) of

significant, continuing interest and value to the enterprise. There is usually a massive

amount of data stored in the Data Warehouse archives, with a low incidence of access.

Archive data are most often used for forecasting and trend analysis. Archives include not

only old data (in raw or summarized form); they also include the metadata that describes

the old data's characteristics.

Metadata :- One of the most important parts of a Data Warehouse is its metadata - or

data about data. Also called Data Warehouse architecture, metadata is integral to all

levels of the Data Warehouse, but exists and functions in a different dimension from

other warehouse data. Metadata that is used by Data Warehouse developers to manage

and control Data Warehouse creation and maintenance resides outside the Data

Warehouse. (Perkins)

Along with its various components, a data warehouse can also take various

structures.

STRUCTURES OF A DATA WAREHOUSE

There are various structures of a data warehouse that a corporation can adopt based on its

needs: the physical data warehouse, the logical data warehouse and the data mart.

Figure 3


Physical Data Warehouse :- is a physical database in which all the data for the data

warehouse are stored, along with metadata and processing logic for scrubbing,

organizing, packaging and processing the detail data.

Logical Data Warehouse :- like a physical data warehouse, also contains metadata,

including enterprise rules and processing logic for scrubbing, organizing, packaging and

processing the data, but does not contain actual data. Instead, it contains the information

necessary to access the data wherever they reside. This structure is effective only when

there is a single source for the data and they are known to be accurate and timely.

Data Mart :- is a subset of an enterprise-wide data warehouse, which typically supports

an enterprise element (department, region, function, etc.). As part of an iterative data

warehouse development process, an enterprise builds a series of physical (or logical) data

marts over time and links them via an enterprise-wide logical data warehouse or feeds

them from a single physical warehouse. (Perkins)
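
To make the relationship between a warehouse and a data mart concrete, the sketch below defines a mart for one region on top of a single warehouse table. It uses Python's built-in sqlite3 module; the table, view and column names are hypothetical. The mart is modeled as a view, which stores no rows of its own, so it is closer to the logical variety; a physical mart would instead copy the selected rows into its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The physical warehouse: one table holding data for every region.
conn.execute(
    "CREATE TABLE warehouse_sales (region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("East", "Widget", 100.0),
        ("West", "Widget", 150.0),
        ("East", "Gadget", 40.0),
    ],
)

# A data mart for the East region, fed from the single warehouse table.
conn.execute(
    "CREATE VIEW east_mart AS "
    "SELECT product, amount FROM warehouse_sales WHERE region = 'East'"
)


def mart_total(connection):
    """Total sales visible through the East data mart."""
    return connection.execute("SELECT SUM(amount) FROM east_mart").fetchone()[0]
```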

DATA WAREHOUSING METHODS

Several warehousing methodologies are used throughout the warehousing community.

All of these fall into one of two categories: the big bang approach or the iterative

approach.

Big Bang Approach

A big bang methodology tries to solve all known problems by creating a huge data

warehouse before you release it for evaluation and testing. Many people believe that this

process is necessary to deliver on your objectives. Based on your objectives, amount of

data to be incorporated, and your intimate knowledge of your business and data, you may

be able to accomplish your warehousing project with a big bang methodology. But, there

are some considerations to take heed of:

To create a data warehouse, the corporation must plan its warehouse, evaluate and install

the necessary software and hardware, collect business requirements, and become familiar

with its corporate data. While these tasks are taking place:

• The business goals of the corporation can change due to changes in the market or

technology.

• Management supporters can lose interest in this project if you don't keep them

involved and show rapid results.

• The corporate data could change (for example, the corporation may start collecting Web log data).

• New releases of the firm’s chosen software may become available (warehousing

is still an evolving market and even the best tools continue to improve and

change).

The items listed above are just a few of the business and technical changes that could

impact your plans. If you are not plugged into the proper channels, any one of these

changes could cause your project to fail because you cannot quickly respond to the

necessary changes.

Iterative Approach

With an iterative methodology, the corporation breaks its warehousing project into small,

manageable chunks, referred to as projects. In the iterative approach, the same planning

tasks are performed that are required in the big bang approach, but evaluation of all of

your deliverables up front is not required. The corporation must design its overall

architecture, but when entering the planning phase, it needs to concentrate only on its first

project or iteration. After each project, the architecture, the development

process, and the corporation’s business requirements are reviewed.

The value of smaller projects within the larger warehousing process is:

• It shows a faster return on the company investment because it delivers one

solution quickly. This keeps the management supporters involved and interested

in the project.

• It can adjust to changes in the business requirements faster because the team is

small and the projects are manageable, with short delivery schedules.

• Early involvement by the corporation’s user community provides real-situation

testing, which reveals user needs and produces defect reports. Additionally, users

provide better feedback when they can see the system than when they

have to envision it from a slide presentation. This feedback can improve the

corporation’s goals and processes for the next iteration.

(www.sas.com/rnd/warehousing)

Although the business may ask for everything to be delivered by the warehouse at once,

taking a "big bang" approach may not be a prudent step. Instead, breaking the project into

parts or releases, and establishing a clear set of objectives and definition of success for

each release of the project will be appropriate. Each release will need to go through the

complete system development lifecycle of requirements, design, build, test, implement,

and support. By delivering functionality and business value with each short release, a

corporation can integrate its data, create the proper roles, train its users, gain insight and

assimilate lessons learned. At the same time, by using an iterative approach a corporation

can adjust its data warehouse's content to add new sources, which were not previously

considered, as it gains insight into its business and analytics. Perhaps most important, an

iterative approach enables the project team to demonstrate a few quick wins in terms of

business value delivered. Due to its iterative nature, a data warehouse is a journey, not a

destination. There is always more data which can be integrated. (Karakizis)

As shown in Figure 4, the iterative methodology starts with the Initial

Organization phase, which identifies the corporation’s readiness for undertaking a data

warehousing project. It is followed by the Analysis phase: evaluating the feasibility of the

data warehouse, gathering business requirements, and getting agreement on the goals and

purpose of the warehouse. Third is the Design phase: analyzing and designing the

warehouse system architecture. Fourth is the Development phase: creating the

actual data warehouse structure and populating the data warehouse. Fifth is the Testing

phase: data cleansing and parameter clarification (which may send the project back to the

Design phase for another iteration). Sixth is the Implementation phase: rolling out the production

environment and providing user training. Finally comes the Maintenance phase: monitoring,

updating and cleansing data. (Wierschem et al.)

Figure 4: Phases of the iterative methodology (Initial Organization, Analysis, Design, Development, Testing, Implementation, Maintenance)

At this point, an easy path for user feedback should be established, the

preceding steps should be reviewed, and any necessary adjustments should be made. After

the review of the process, the next iteration or project should be started from the

assessment phase. The assessment and requirements phases should require less process

time after successive iterations.
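
The phase sequence of Figure 4, including the possible return from Testing to Design, can be sketched as a simple loop in Python. The function name run_iteration and the tests_pass callback are hypothetical, and the sketch assumes that testing eventually succeeds; a real project plan would also cap the number of attempts.

```python
PHASES = [
    "Initial Organization",
    "Analysis",
    "Design",
    "Development",
    "Testing",
    "Implementation",
    "Maintenance",
]


def run_iteration(tests_pass):
    """Walk through the phases of one iteration.

    `tests_pass` is a callable that receives the testing attempt number
    (1, 2, ...) and returns True when testing succeeds. A failed test sends
    the project back to the Design phase. Returns the phases in the order
    they were visited.
    """
    visited = []
    attempt = 0
    i = 0
    while i < len(PHASES):
        phase = PHASES[i]
        visited.append(phase)
        if phase == "Testing":
            attempt += 1
            if not tests_pass(attempt):
                i = PHASES.index("Design")  # back to Design for another pass
                continue
        i += 1
    return visited


# Example: testing fails on the first attempt and passes on the second.
visited = run_iteration(lambda attempt: attempt >= 2)
```

In this example Design, Development and Testing are each visited twice before the project moves on to Implementation and Maintenance.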

GETTING THE RIGHT TOOLS TO BUILD A DATA WAREHOUSE

As in any endeavor, selecting the correct tools is paramount for success. As the old

Chinese adage says, "To accomplish a goal, make sure the proper tools are selected."

Given the complexity of the data warehousing system and the cross-departmental

implications of the project, it is easy to see why the proper selection of tools and

personnel is very important. This section of the paper will present information on such

selections.

There are two steps that top management is concerned with when building a data

warehouse. One step is choosing a vendor. However, in doing so there are certain basic

but nonetheless critical issues that have to be evaluated.

1. Support :- What type of support is offered? It is industry standard for vendors to

charge an annual support fee that is 15-20% of the software product license price. But

the question is whether any software issues will be handled promptly by the

vendor.

2. Professional Services :- This encompasses consulting and education. What type

of consulting proposal does the vendor give? Are the personnel requirements and

consulting rates reasonable? It might be wise to speak with members of the

consulting team before signing on the dotted line. On the education front, what

type of training is available? And how willing is the consulting team to

transfer knowledge? Does the consulting team purposely withhold information so

that either 1) you will need to send more people to the vendor's education classes, or

2) you will need to hire additional consulting to make any changes to the system?

3. Stability :- This is probably the most important measure of all.

It may be even more important than the current functionality that the tool itself

provides, for a simple reason: will the vendor be

around for a while, and will it be able to make enhancements to its tool?
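
The 15-20% support fee mentioned under the first item adds up over the life of a warehouse. The following Python sketch works through the arithmetic for a hypothetical license price of 200,000; the price, the five-year horizon and the assumption of a flat rate with no price changes are illustrative only.

```python
def annual_support_fee_range(license_cost, low_rate=0.15, high_rate=0.20):
    """Return the (low, high) annual support fee for a given license cost."""
    return (license_cost * low_rate, license_cost * high_rate)


def total_cost_over_years(license_cost, years, rate):
    """License price plus support fees over `years`, assuming a flat rate."""
    return license_cost + license_cost * rate * years


# Hypothetical license price of 200,000 (currency units are arbitrary).
low, high = annual_support_fee_range(200_000)
five_year_total = total_cost_over_years(200_000, years=5, rate=0.20)
```

Under these assumptions the support fee falls between 30,000 and 40,000 a year, and at the 20% rate the five-year support fees equal the original license price.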

The other step is selecting the right team to build the data warehouse. Here there are two

possibilities: one is to use external consultants and the other is to hire permanent

employees.

The pros of hiring external consultants are:

1. They are usually more experienced in data warehousing implementations. The

fact of the matter is, even today, people with extensive data warehousing

backgrounds are difficult to find. Therefore, when there is a need to ramp up a

team quickly, the easiest route to go is to hire external consultants.

The pros of hiring permanent employees are:

1. They are less expensive. With hourly rates for experienced data warehousing

professionals running from $100/hr and up, and even more for Big-5 or vendor

consultants, hiring permanent employees is a much more economical option.

2. They are less likely to leave. With consultants, whether they are on contract, via a

Big-5 firm, or one of the tool vendor firms, they are likely to leave at a moment's

notice. This makes knowledge transfer very important. Of course, the flip side is

that these consultants are much easier to get rid of, too.

However, management must understand that there are various entities that play important

roles in a data warehouse project. They are:

1. Project Manager: This person will oversee the progress and be responsible for the

success of the data warehousing project.

2. DBA: This role is responsible for keeping the database running smoothly. Additional

tasks for this role may be to plan and execute a backup/recovery plan, as well as

performance tuning.

3. Technical Architect: This role is responsible for developing and implementing the

overall technical architecture of the data warehouse, from the backend

hardware/software to the client desktop configurations.

4. ETL Developer: This role is responsible for planning, developing, and deploying

the extraction, transformation, and loading routine for the data warehouse.

5. Front End Developer: This person is responsible for developing the front-end,

whether it be client-server or over the web.

6. OLAP (On-Line Analytical Processing) Developer: OLAP is the foundation for a

range of essential business applications, including sales and marketing analysis,

planning, budgeting, statutory consolidation, profitability analysis, balanced

scorecard, performance measurement and data warehouse reporting. (Pendse and

Creeth) The role of the OLAP Developer is thus crucial. He/She is

responsible for the development of OLAP cubes.

7. Trainer: A significant role is the trainer. After the data warehouse is implemented,

a person on the data warehouse team needs to work with the end users to get them

familiar with how the front end is set up so that the end users can get the most

benefit out of the data warehouse system.

8. Data Modeler: This role is responsible for taking the data structure that exists in

the enterprise and modeling it into a schema that is suitable for OLAP analysis.

(Chuo-Han Lee, 2002)
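
As an illustration of the kind of schema a Data Modeler might produce and an OLAP Developer might aggregate, the sketch below builds a small star schema (one fact table surrounded by dimension tables) and sums sales along two dimensions. The star schema is one common choice for OLAP work and is named here as an assumption; the cited sources do not prescribe it. All table names, column names and figures are hypothetical, and Python's built-in sqlite3 module stands in for a warehouse database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY, name TEXT, category TEXT
    );
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware');
    INSERT INTO dim_product VALUES (2, 'Manual', 'Books');
    INSERT INTO dim_date VALUES (1, 2002, 1);
    INSERT INTO dim_date VALUES (2, 2002, 2);
    INSERT INTO fact_sales VALUES (1, 1, 100.0);
    INSERT INTO fact_sales VALUES (1, 2, 150.0);
    INSERT INTO fact_sales VALUES (2, 2, 25.0);
    """
)


def sales_by_category_and_quarter(connection):
    """Aggregate the fact table along the product and date dimensions."""
    return connection.execute(
        """
        SELECT p.category, d.quarter, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.quarter
        ORDER BY p.category, d.quarter
        """
    ).fetchall()


rows = sales_by_category_and_quarter(conn)
```

Each returned row corresponds to one cell of a small cube: a product category, a quarter and the total sales for that combination.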

ADVANTAGES AND DISADVANTAGES OF DATA WAREHOUSING

Data warehousing has become increasingly popular in many organizations around the world.

It is not with blind belief that corporations are investing millions of dollars in data

warehousing projects.

A successful data warehouse project will provide numerous lasting benefits to a

company. Data warehousing makes retrieving information so easy that when a user query

is submitted to the warehouse, the needed information is already there, with

inconsistencies and differences already resolved. This makes it much easier and more

efficient to run queries over data that originally came from different sources. (WHIPS)

Along with its numerous ease-of-use benefits, data warehousing provides other qualitative

advantages. A successful data warehousing project improves knowledge of the relationships

among products and services and their performance, and enables quick and proper analysis

that paves the way for better decision making,

thus giving the company a strong competitive advantage. (Smith)

The benefits of the development of a data warehouse would include:

• More accurate predictions of customer demand based on trend

analysis.

• Improved response to direct marketing campaigns through the use of

household demographics and current customer analysis.

• Improved vendor relations and price reductions from targeting selected

vendors with an increased level of purchasing across the enterprise.

• Significant savings from improved data quality across the enterprise. (Smith)

• The ability to run complex queries easily and efficiently since query execution

does not involve data translation and communication with remote sources.

• Convenience for end users since they can use a single data model and query

language.

• Simplicity of the system design. For example, there is no need to perform query

optimization over heterogeneous sources, a very difficult problem faced by other

approaches.

• Information at the warehouse is under the control of the warehouse users; thus it

can be stored safely and reliably for as long as necessary. (WHIPS)

From the management’s point of view, the benefits and rewards abound for a

company that builds and maintains a data warehouse correctly. The corporation can

achieve substantial cost savings and revenue growth. Add to this increased

analysis of marketing databases to cross-sell products, less storage on the mainframe

and the ability to identify and keep the most profitable customers while getting a better

picture of who they are, and it is easy to see why data warehousing is spreading fast. For

example, the telecom industry uses data warehouses to target customers who may want

certain phone services rather than doing "blanket" phone and mail campaigns and

aggravating customers with unsolicited calls during dinner. Some of the soft benefits of

data warehousing come in the technology's effect on users. When built and used

correctly, a warehouse changes users' jobs, granting them faster access to more accurate

data and allowing them to give better customer service. A company must not forget,

however, that the goal for any data warehousing project is to lower operating costs and

generate revenue—this is an investment, after all, and quantifiable ROI should be

expected over time. (Wailgum, 2001)
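
One way to make "quantifiable ROI over time" concrete is to track the cumulative return year by year. The sketch below uses the conventional definition ROI = (benefit - cost) / cost, which is assumed here rather than taken from the cited article, and every figure in it is hypothetical.

```python
def roi(total_benefit, total_cost):
    """Return on investment as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


def cumulative_roi_by_year(initial_cost, annual_cost, annual_benefit, years):
    """Cumulative ROI at the end of each year, assuming flat annual figures."""
    results = []
    for year in range(1, years + 1):
        cost = initial_cost + annual_cost * year
        benefit = annual_benefit * year
        results.append(roi(benefit, cost))
    return results


# Hypothetical project: 1,000,000 to build, 100,000 a year to maintain,
# and 500,000 a year in cost savings and added revenue.
yearly = cumulative_roi_by_year(1_000_000, 100_000, 500_000, years=4)
```

With these invented figures the cumulative ROI is negative for the first two years and turns positive in the third, which is the pattern of an investment that pays back over time.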

Even though the benefits of data warehousing far outweigh the disadvantages, there are

certain disadvantages that companies must pay heed to. The most

important disadvantages are:

• The initial data warehouse setup is expensive. However, after the system is in place the

cost should be low and cover only the maintenance and future modifications of

the system. There is also a high cost in getting data translated and copied to

existing databases in time to be useful for the end user.

• A data warehouse takes time to build. Sufficient time should be given to the project, and

the difficulties in developing a data warehouse and getting it up and running should

not be underestimated.

• The re-education of the programmers often proves to be a disadvantage as change

is often resisted until familiarity is gained with the new approach. (OCS

Consulting)

• Cost and time are also incurred in developing the required new skill set for warehouse

developers and end users.

• A data warehouse is complex to develop. It cannot simply be bought as an off-the-shelf

product; it is designed specifically for an organization's needs. The choice of

hardware, software and structure requires careful consideration, including how they will

work together in the future.

• The data warehouse will require management, although, compared with the

management of the current environment, less time is required overall under

the Data Warehousing approach.

• Data held in one place highlights data integrity problems and exposes the data to

the public domain; thus advanced security to prevent unwanted users, including

competitors, from accessing the database will be of critical importance for the

company. (Davis et al., 1999)

CONCLUSION

Thus, we see that data warehousing does not have to be an enigma to the managers of

companies. Even though it is an exhausting task, a successful data warehousing project is

crucial for companies to run a successful business. Management must believe and

understand that data warehousing is of strategic importance to a company. Therefore,

even if management has to invest a massive amount of capital to build a data warehouse,

it must do so in view of the myriad benefits that will follow. Any company that doesn’t

see the importance and benefits of data warehousing and is blinded by the cost and

daunted by the size of the task will feel the impact of competing businesses

that have undertaken successful data warehousing projects.

REFERENCES

Louis Rolleigh and Joe Thomas, Data Integration: The Warehouse Foundation, White Papers, copyright 2002 Acxiom Corporation.
Available at: http://www.acxiom.com/displayMain/0,1494,USA~en~383~197~0~0,00.html

Jennifer Widom (1995), Research Problems in Data Warehousing, Proceedings of the 4th Int'l Conference on Information and Knowledge Management (CIKM), Nov. 1995.
Available at: http://www-db.stanford.edu/pub/papers/warehouse-research.ps

M. Davis, Z. Galzie, M. Silcox (1999), Data Warehousing.
Available at: http://www.student.city.ac.uk/~dz542/cw.html

Ralph Kimball (1996), The Data Warehouse Toolkit, Wiley.

Larry Greenfield, A Definition of Data Warehousing, copyright 1995-2002.
Available at: http://www.dwinfocenter.org/defined.html

The Data Warehousing Concept.
Available at: http://www.sas.com/rnd/warehousing/concept.html

Alan Perkins, A Strategic Approach to Data Warehouse Engineering, White Papers, copyright 1997-1998 Visible Systems Corporation.
Available at: http://www.dmreview.com/portal_ros.cfm?NavID=92&WhitePaperID=108&PortalID=12

Spyro D. Karakizis, How to Win With Your Data Warehouse: Advice from A Data Warehouse Expert, ©1996-2002 Accenture.
Available at: http://www.accenture.com

David Wierschem, Ph.D., Jeremy McMillen, M.S., Randy McBroom, Ph.D., Methodology for Developing an Academic Data Warehouse, Office of Institutional Resource and Planning, Texas A&M University-Commerce.
Available at: http://www.panda.auburn.edu/sair/3-3000-PP.pdf

Nigel Pendse and Richard Creeth, The OLAP Report.
Available at: http://www.olapreport.com/

Chuo-Han Lee (2002), Data Warehouse and Data Warehousing, Copyright © 2001, 2002.
Available at: http://www.1keydata.com/datawarehousing/datawarehouse.html

WHIPS (WareHouse Information Prototype at Stanford), Data Warehousing At Stanford.
Available at: http://www-db.stanford.edu/warehousing/warehouse.html

Anna Marie Smith, Data Warehousing: A Short Overview.
Available at: http://datawarehouse.ittoolbox.com/browse.asp?c=DWPeerPublishing&r=%2Fpub%2FAS120301a%2Epdf

Tom Wailgum (2001), What is a Data Warehouse?
Available at: http://www.darwinmag.com/learn/curve/column.html?ArticleID=50

OCS Consulting, Data Warehousing Presentation To VIEWS Pharmaceutical SIG.
Available at: http://www.ocs-consulting.com/businessintelligence_papers_cdw.asp
