Data Virtualization for Business Intelligence Systems: Revolutionizing Data Integration for Data Warehouses
About this ebook
Data virtualization can help you accomplish your goals with more flexibility and agility. Learn what it is and how and why it should be used with Data Virtualization for Business Intelligence Systems. In this book, expert author Rick van der Lans explains how data virtualization servers work, what techniques to use to optimize access to various data sources, and how these products can be applied in different projects. You’ll learn the difference between this new form of data integration and older forms, such as ETL and replication, and gain a clear understanding of how data virtualization really works. Data Virtualization for Business Intelligence Systems outlines the advantages and disadvantages of data virtualization and illustrates how data virtualization should be applied in data warehouse environments. You’ll come away with a comprehensive understanding of how data virtualization will make data warehouse environments more flexible and how it makes developing operational BI applications easier. Van der Lans also describes the relationship between data virtualization and related topics, such as master data management, governance, and information management, so you come away with a big-picture understanding as well as all the practical know-how you need to virtualize your data.
- First independent book on data virtualization that explains in a product-independent way how data virtualization technology works.
- Illustrates concepts using examples developed with commercially available products.
- Shows you how to solve common data integration challenges such as data quality, system interference, and overall performance by following practical guidelines on using data virtualization.
- Apply data virtualization right away with three chapters full of practical implementation guidance.
- Understand the big picture of data virtualization and its relationship with data governance and information management.
Rick van der Lans
Rick F. van der Lans is an independent consultant, author, and lecturer specializing in business intelligence, data warehousing, and database technology. He is the managing director of R20/Consultancy, which is based in the Netherlands. Rick has advised many large companies worldwide on defining their data warehouse architectures. He is the chairman of the annual European BI and Data Warehousing Conference organized in London, and he writes for B-eye-Network.com, BI-platform.nl, and Database Magazine. He is the author of several books on database technology, including Introduction to SQL (Addison-Wesley, 2006), currently in its fourth edition.
Chapter 1
Introduction to Data Virtualization
1.1 Introduction
This chapter explains how data virtualization can be used to develop more agile business intelligence systems. By applying data virtualization, it will become easier to change systems. New reports can be developed, and existing reports can be adapted more easily and quickly. This agility is an important aspect for users of business intelligence systems. Their world is changing faster and faster, and therefore their supporting business intelligence systems must change at the same pace.
First, we examine the changes that are taking place in the fast-moving business intelligence industry. Next, we discuss what data virtualization is and what the impact will be of applying this technology to business intelligence systems. To get a better understanding of data virtualization, its relationship with other technologies and ideas, such as abstraction, encapsulation, data integration, and enterprise information integration, is described. In other words, this chapter presents a high-level overview of data virtualization. These topics are all discussed in more detail in subsequent chapters.
1.2 The World of Business Intelligence Is Changing
The primary reason why organizations have developed business intelligence systems is to support and improve their decision-making processes. This means that the main users of such systems are those who make the decisions. They decide, for example, when to order more raw materials, how to streamline the purchasing process, which customers to offer a discount, and whether to outsource or insource transport.
This world of decision making is changing, and the biggest change is that organizations have to react faster, which means decisions have to be made faster. There is less and less time available to make (sometimes crucial) decisions. Studies support this phenomenon. For example, a study by the Aberdeen Group in March 2011 shows that 43 percent of enterprises are finding it harder to make timely decisions (Figure 1.1). Managers increasingly find they have less time to make decisions after certain business events occur. The consequence is that it has to be possible to change existing reports faster and to develop new reports more quickly.
Figure 1.1 Study by the Aberdeen Group. The middle bar indicates that 43 percent of the respondents indicate that the time window for making decisions is shortening; see [2].
But this is not the only change. New data sources are becoming available for analysis and reporting. Especially in the first years of business intelligence, the only data available for reporting and analytics was internal data related to business processes. For example, systems had been built to store all incoming orders, all the customer data, and all invoices. Nowadays, there are many more systems that offer valuable data, such as weblogs, email servers, call center systems, and document management systems. Analyzing this data can lead to a better understanding of what customers think about a firm’s services and products, how efficient the website is, and the best ways to find good customers.
New data sources can also be found outside the boundaries of the organization. For example, websites, social media networks, and government data all might contain data that, when combined with internal data, can lead to new insights. Organizations are interested in combining their own internal data with these new data sources and thereby enriching the analytical and reporting capabilities.
As in every field—whether it’s medical, pharmaceutical, telecommunications, electronics, or, as in this case, business intelligence—new technologies create new opportunities and therefore change that field. For business intelligence, new technologies have become available, including analytical database servers, mobile business intelligence tools, in-database analytics, massive internal memory, highly parallelized hardware platforms, cloud, and solid state disk technology. All of these new technologies can dramatically expand an organization’s analytical and reporting features: they will support forms of decision making that most organizations have not even considered yet, and they will allow organizations to analyze data in only a few minutes that would otherwise have taken days with older technologies.
Another obvious change relates to the new groups of users interested in applying business intelligence. Currently, most users of business intelligence systems are decision makers at strategic and tactical management levels. In most cases these users can work perfectly well with data that isn’t 100 percent up to date. Data that is a day, a week, or maybe even a month old is more than sufficient for them. The change is that decision makers working at the operational level are now being drawn to business intelligence. They understand its potential value for them, and therefore they want to exploit the power of reporting and analytical tools. However, in most cases, they can’t operate with old data. They want to analyze operational data, data that is 100 percent (or at least close to 100 percent) up to date.
All of these changes, especially faster decision making, are hard to implement in current business intelligence systems because they require a dramatic redesign. This is because most business intelligence systems that were developed over the last 20 years are based on a chain of databases (Figure 1.2). Data is transformed and copied from one database to another until it reaches an endpoint: a database being accessed by a reporting or analytical tool. Each transformation process extracts, cleanses, integrates, and transforms the data, and then loads it into the next database in the chain. This process continues until the data reaches a quality level and form suitable for the reporting and analytical tools. These transformation processes are normally referred to as ETL (Extract, Transform, Load).
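The extract-cleanse-load cycle described above can be sketched in a few lines of code. This is a minimal, hypothetical illustration (the table names, column names, and transformation rules are invented for the example, not taken from the book):

```python
# Minimal ETL sketch: extract raw rows from a source system, cleanse
# and transform them, then load them into the next database in the
# chain. All names and rules here are illustrative assumptions.

def extract(source_rows):
    """Pull raw order records from the source system."""
    return list(source_rows)

def transform(rows):
    """Cleanse and standardize: trim whitespace, normalize currency."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "order_id": row["order_id"],
            "customer": row["customer"].strip().title(),
            "amount_eur": round(row["amount"] * row.get("fx_rate", 1.0), 2),
        })
    return cleaned

def load(rows, target):
    """Append the transformed rows to the next database in the chain."""
    target.extend(rows)

# One hop in the chain: source system -> (ETL) -> warehouse table.
source = [{"order_id": 1, "customer": "  acme corp ",
           "amount": 100.0, "fx_rate": 1.1}]
warehouse = []
load(transform(extract(source)), warehouse)
```

Each such hop copies and stores the data again, which is exactly why a long chain of these processes is slow to change and hard to keep operational.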
Figure 1.2 Many business intelligence systems are based on a chain of databases and transformation processes.
This chain of databases and transformation processes is long, complex, and highly interconnected. Each change made to a report or to data can lead to a myriad of changes throughout the chain. It might take days, or even weeks, before an apparently simple change has been implemented throughout the chain. The effect is that the business intelligence department can’t keep up with the speed of change required by the business. This leads to an application backlog and has a negative impact on the decision-making speed and quality of the organization. In addition, because so many transformation processes are needed and each of them takes time, it’s hard to deliver operational data at an endpoint of the chain, such as a data mart.
What is needed is an agile architecture that is easy to change, and the best way to achieve that is to create an architecture that consists of fewer components, which means fewer databases and fewer transformation processes. When there are fewer components, there are fewer things that require changes. In addition, fewer components mean a simpler architecture, which also increases agility.
This is where data virtualization comes in. In a nutshell, data virtualization is an alternative technology for transforming available data into the form needed for reporting and analytics. It requires fewer databases and fewer transformation processes. In other words, using data virtualization in a business intelligence system leads to a shorter chain. Fewer databases have to be developed and managed, and there will be fewer transformation processes. The bottom line is that applying data virtualization simplifies business intelligence architectures and therefore leads to more agile business intelligence architectures that fit the business intelligence needs of current organizations: simple is more agile.
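The contrast with ETL can be illustrated with a small sketch: instead of copying data into an intermediate database, a virtual view integrates data from the sources on demand, at query time. The source structures and the view below are invented for illustration, not drawn from any specific product:

```python
# Sketch of the data virtualization idea: a "virtual view" exposes
# integrated data from two source systems at query time, without
# copying it into an intermediate database. Names are illustrative.

crm = {101: {"name": "Acme Corp", "segment": "enterprise"}}
orders = [{"customer_id": 101, "amount": 250.0},
          {"customer_id": 101, "amount": 120.0}]

def customer_revenue_view():
    """Virtual view: joins CRM and order data when it is queried."""
    totals = {}
    for order in orders:
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + order["amount"]
    return [{"name": crm[cid]["name"], "revenue": total}
            for cid, total in totals.items()]

# A report queries the view; the integration happens on the fly,
# so there is no extra database to develop, load, and manage.
report = customer_revenue_view()
```

The point of the sketch is the architectural shift: the transformation logic still exists, but it runs on demand inside one layer rather than as a chain of copy steps between stored databases.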
1.3 Introduction to Virtualization
The term data virtualization is based on the word virtualization. Virtualization is not a new concept in the IT industry. Probably the first application of virtualization was in the 1960s, when IBM used this concept to split mainframes into separate virtual machines, which made it possible for one machine to run multiple applications concurrently. Also in the 1960s, virtual memory was introduced using a technique called paging. Memory virtualization was used to simulate more memory than was physically available in a machine. Nowadays, almost everything can be virtualized, including processors, storage (see [3]), networks, data centers (see [4]), and operating systems. VMware and cloud computing can also be regarded as virtualization technologies.
In general, virtualization means that applications can use a resource without any concern for where it resides, what the technical interface is, how it has been implemented, which platform it uses, and how much of it is available. A virtualization solution encapsulates the resource in such a way that all those technical details become hidden and the application can work with a simpler interface.
The first time I was involved in a project in which a resource was virtualized was very early in my career. An application had to be written that could work with different user interface technologies. One was called Teletext and was developed for TV sets, and another was a character-based terminal. Being the technical designer, I decided to develop an API that the application would use to put data on a screen and to get input back. This API was a layer of software that would hide the user interface technology in use from the rest of the application. Without knowing it, I had designed a user interface virtualization layer. Since then, I’ve always tried to design systems in such a way that implementation details of certain technologies are hidden from other parts of an application in order to simplify