
Data Virtualization for Business Intelligence Systems: Revolutionizing Data Integration for Data Warehouses
Ebook, 608 pages, 14 hours


About this ebook

Data virtualization can help you accomplish your goals with more flexibility and agility. Learn what it is and how and why it should be used with Data Virtualization for Business Intelligence Systems. In this book, expert author Rick van der Lans explains how data virtualization servers work, what techniques to use to optimize access to various data sources, and how these products can be applied in different projects. You’ll learn the difference between this new form of data integration and older forms, such as ETL and replication, and gain a clear understanding of how data virtualization really works. Data Virtualization for Business Intelligence Systems outlines the advantages and disadvantages of data virtualization and illustrates how data virtualization should be applied in data warehouse environments. You’ll come away with a comprehensive understanding of how data virtualization will make data warehouse environments more flexible and how it makes developing operational BI applications easier. Van der Lans also describes the relationship between data virtualization and related topics, such as master data management, governance, and information management, so you come away with a big-picture understanding as well as all the practical know-how you need to virtualize your data.

  • First independent book on data virtualization that explains in a product-independent way how data virtualization technology works.
  • Illustrates concepts using examples developed with commercially available products.
  • Shows you how to solve common data integration challenges such as data quality, system interference, and overall performance by following practical guidelines on using data virtualization.
  • Apply data virtualization right away with three chapters full of practical implementation guidance.
  • Understand the big picture of data virtualization and its relationship with data governance and information management.
Language: English
Release date: July 25, 2012
ISBN: 9780123978172
Author

Rick van der Lans

Rick F. van der Lans is an independent consultant, author, and lecturer specializing in business intelligence, data warehousing, and database technology. He is the managing director of R20/Consultancy, which is based in The Netherlands. Rick has advised many large companies worldwide on defining their data warehouse architectures. He is the chairman of the annual European BI and Data Warehousing Conference organized in London, and he writes for B-eye-Network.com, BI-platform.nl, and Database Magazine. He is the author of several books on database technology, including Introduction to SQL (Addison-Wesley, 2006), currently in its fourth edition.


    Book preview

    Data Virtualization for Business Intelligence Systems - Rick van der Lans

    http://www.linkedin.com/pub/rick-van-der-lans/9/207/223

    Chapter 1

    Introduction to Data Virtualization

    1.1 Introduction

    This chapter explains how data virtualization can be used to develop more agile business intelligence systems. By applying data virtualization, it will become easier to change systems. New reports can be developed, and existing reports can be adapted more easily and quickly. This agility is an important aspect for users of business intelligence systems. Their world is changing faster and faster, and therefore their supporting business intelligence systems must change at the same pace.

    First, we examine the changes that are taking place in the fast-moving business intelligence industry. Next, we discuss what data virtualization is and what the impact will be of applying this technology to business intelligence systems. To get a better understanding of data virtualization, its relationship with other technologies and ideas, such as abstraction, encapsulation, data integration, and enterprise information integration, is described. In other words, this chapter presents a high-level overview of data virtualization. These topics are all discussed in more detail in subsequent chapters.

    1.2 The World of Business Intelligence Is Changing

    The primary reason why organizations have developed business intelligence systems is to support and improve their decision-making processes. This means that the main users of such systems are those who make the decisions. They decide on, for example, when to order more raw materials, how to streamline the purchasing process, which customers to offer a discount, and whether to outsource or insource transport.

    This world of decision making is changing, and the biggest change is that organizations have to react faster, which means decisions have to be made faster. There is less and less time available to make (sometimes crucial) decisions. Studies support this phenomenon. For example, a March 2011 study by the Aberdeen Group shows that 43 percent of enterprises find it harder to make timely decisions (Figure 1.1). Managers increasingly find they have less time to make decisions after certain business events occur. The consequence is that it has to be possible to change existing reports faster and to develop new reports more quickly.

    Figure 1.1 Study by the Aberdeen Group. The middle bar indicates that 43 percent of the respondents indicate that the time window for making decisions is shortening; see [2] .

    But this is not the only change. New data sources are becoming available for analysis and reporting. Especially in the first years of business intelligence, the only data available for reporting and analytics was internal data related to business processes. For example, systems had been built to store all incoming orders, all the customer data, and all invoices. Nowadays, there are many more systems that offer valuable data, such as weblogs, email servers, call center systems, and document management systems. Analyzing this data can lead to a better understanding of what customers think about a firm’s services and products, how efficient the website is, and the best ways to find good customers.

    New data sources can also be found outside the boundaries of the organization. For example, websites, social media networks, and government data all might contain data that, when combined with internal data, can lead to new insights. Organizations are interested in combining their own internal data with these new data sources and thereby enriching the analytical and reporting capabilities.

    As in every field—whether it’s medical, pharmaceutical, telecommunications, electronics, or, as in this case, business intelligence—new technologies create new opportunities and therefore change that field. For business intelligence, new technologies have become available, including analytical database servers, mobile business intelligence tools, in-database analytics, massive internal memory, highly parallelized hardware platforms, cloud, and solid state disk technology. All of these new technologies can dramatically expand an organization’s analytical and reporting capabilities: they will support forms of decision making that most organizations have not even considered yet, and they will allow organizations to analyze, in only a few minutes, data that would otherwise have taken days to process with older technologies.

    Another obvious change relates to the new groups of users interested in applying business intelligence. Currently, most users of business intelligence systems are decision makers at the strategic and tactical management levels. In most cases these users can work perfectly well with data that isn’t 100 percent up to date. Data that is a day, a week, or maybe even a month old is more than sufficient for them. The change is that decision makers working at the operational level are now attracted to business intelligence. They understand its potential value for them, and therefore they want to exploit the power of reporting and analytical tools. However, in most cases, they can’t operate with old data. They want to analyze operational data, data that is 100 percent (or at least close to 100 percent) up to date.

    All of these changes, especially faster decision making, are hard to implement in current business intelligence systems because they require a dramatic redesign. This is because most business intelligence systems that were developed over the last 20 years are based on a chain of databases (Figure 1.2). Data is transformed and copied from one database to another until it reaches an endpoint: a database being accessed by a reporting or analytical tool. Each transformation process extracts, cleanses, integrates, and transforms the data, and then loads it into the next database in the chain. This process continues until the data reaches a quality level and form suitable for the reporting and analytical tools. These transformation processes are normally referred to as ETL (Extract Transform Load).

    Figure 1.2 Many business intelligence systems are based on a chain of databases and transformation processes.
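
    To make this chain concrete, the following is a minimal, hypothetical sketch of a single ETL step: data is extracted from a source database, cleansed and transformed, and loaded into the next database in the chain. All table and column names (such as src_orders and dw_orders) are invented for illustration.

        # Hypothetical sketch of one ETL step in the chain: extract order rows
        # from a source database, cleanse and transform them, and load them
        # into the next database. All table and column names are invented.
        import sqlite3

        def etl_orders(source_path: str, target_path: str) -> None:
            source = sqlite3.connect(source_path)
            target = sqlite3.connect(target_path)

            # Extract: read the raw rows from the source system.
            rows = source.execute(
                "SELECT order_id, customer_name, amount, order_date FROM src_orders"
            ).fetchall()

            # Transform: cleanse values (trim names, drop rows with a missing amount).
            cleansed = [
                (order_id, name.strip().title(), float(amount), order_date)
                for (order_id, name, amount, order_date) in rows
                if amount is not None
            ]

            # Load: write the transformed rows into the next database in the chain.
            target.execute(
                "CREATE TABLE IF NOT EXISTS dw_orders "
                "(order_id INTEGER, customer_name TEXT, amount REAL, order_date TEXT)"
            )
            target.executemany("INSERT INTO dw_orders VALUES (?, ?, ?, ?)", cleansed)
            target.commit()
            source.close()
            target.close()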

    This chain of databases and transformation processes is long, complex, and highly interconnected. Each change made to a report or to data can lead to a myriad of changes throughout the chain. It might take days, or even weeks, before an apparently simple change has been implemented throughout the chain. The effect is that the business intelligence department can’t keep up with the speed of change required by the business. This leads to an application backlog and has a negative impact on the decision-making speed and quality of the organization. In addition, because so many transformation processes are needed and each of them takes time, it’s hard to deliver operational data at an endpoint of the chain, such as a data mart.

    What is needed is an agile architecture that is easy to change, and the best way to achieve that is to create an architecture that consists of fewer components, which means fewer databases and fewer transformation processes. With a small number of components, there are fewer things that require changes. In addition, fewer components simplify the architecture, which also increases its agility.

    This is where data virtualization comes in. In a nutshell, data virtualization is an alternative technology for transforming available data into the form needed for reporting and analytics. It requires fewer databases and fewer transformation processes. In other words, using data virtualization in a business intelligence system leads to a shorter chain. Fewer databases have to be developed and managed, and there will be fewer transformation processes. The bottom line is that applying data virtualization simplifies business intelligence architectures and therefore leads to more agile business intelligence architectures that fit the business intelligence needs of current organizations: simple is more agile.
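
    As a rough, product-independent illustration of this difference (not a description of any particular data virtualization server), the sketch below defines a virtual view that joins two source databases and is evaluated on demand whenever a report queries it, so no intermediate databases have to be loaded first. All database, table, and column names are hypothetical.

        # Rough, hypothetical sketch of the data virtualization idea: instead of
        # copying data through a chain of databases, a virtual view is defined
        # once and the underlying sources are queried on demand at report time.
        import sqlite3

        def open_virtual_layer(orders_db: str, customers_db: str) -> sqlite3.Connection:
            # One connection that attaches both source databases and exposes
            # a single integrated view to the reporting tools.
            conn = sqlite3.connect(":memory:")
            conn.execute("ATTACH DATABASE ? AS orders_src", (orders_db,))
            conn.execute("ATTACH DATABASE ? AS customers_src", (customers_db,))

            # The virtual view: no data is copied; the join runs at query time,
            # so the result always reflects the current state of the sources.
            conn.execute("""
                CREATE TEMP VIEW v_customer_orders AS
                SELECT c.customer_name, o.order_id, o.amount
                FROM orders_src.src_orders AS o
                JOIN customers_src.src_customers AS c
                  ON c.customer_id = o.customer_id
            """)
            return conn

        # A report simply queries the view; no ETL run is needed beforehand.
        # conn = open_virtual_layer("orders.db", "customers.db")
        # for row in conn.execute("SELECT * FROM v_customer_orders"):
        #     print(row)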

    1.3 Introduction to Virtualization

    The term data virtualization is based on the word virtualization. Virtualization is not a new concept in the IT industry. Probably the first application of virtualization was in the 1960s, when IBM used this concept to split mainframes into separate virtual machines, which made it possible for one machine to run multiple applications concurrently. Also in the 1960s, virtual memory was introduced using a technique called paging. Memory virtualization was used to simulate more memory than was physically available in a machine. Nowadays, almost everything can be virtualized, including processors, storage (see [3]), networks, data centers (see [4]), and operating systems. VMware and cloud computing can also be regarded as virtualization technologies.

    In general, virtualization means that applications can use a resource without any concern for where it resides, what the technical interface is, how it has been implemented, which platform it uses, and how much of it is available. A virtualization solution encapsulates the resource in such a way that all those technical details become hidden and the application can work with a simpler interface.
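
    The following sketch, with invented class and method names, illustrates this principle in miniature: an application works with one abstract resource interface, while the concrete implementations behind it (a local file in one case, an in-memory store in another) remain hidden from the application code.

        # Hypothetical sketch of encapsulating a resource behind a simple
        # interface: the application neither knows nor cares where the data
        # resides or which technology stores it. All names are invented.
        from abc import ABC, abstractmethod

        class CustomerResource(ABC):
            """The virtualized resource: one simple interface for the application."""
            @abstractmethod
            def lookup(self, customer_id: int) -> str: ...

        class LocalFileCustomers(CustomerResource):
            # Implementation detail: data lives in a local semicolon-separated file.
            def __init__(self, path: str) -> None:
                self._rows = {}
                with open(path, encoding="utf-8") as f:
                    for line in f:
                        cid, name = line.rstrip("\n").split(";", 1)
                        self._rows[int(cid)] = name
            def lookup(self, customer_id: int) -> str:
                return self._rows[customer_id]

        class InMemoryCustomers(CustomerResource):
            # Implementation detail: data lives in an in-memory dictionary.
            def __init__(self, rows: dict) -> None:
                self._rows = dict(rows)
            def lookup(self, customer_id: int) -> str:
                return self._rows[customer_id]

        def print_customer(resource: CustomerResource, customer_id: int) -> None:
            # The application code is identical, whichever implementation is used.
            print(resource.lookup(customer_id))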

    The first time I was involved in a project in which a resource was virtualized was very early in my career. An application had to be written that could work with different user interface technologies. One was called Teletext and was developed for TV sets, and another was a character-based terminal. Being the technical designer, I decided to develop an API that the application would use to get data on a screen and to get input back. This API was a layer of software that hid the user interface technology in use from the rest of the application. Without knowing it, I had designed a user interface virtualization layer. Since then, I’ve always tried to design systems in such a way that implementation details of certain technologies are hidden from other parts of an application in order to simplify
