
Chapter 19

Current and Emerging Trends Transparencies

Pearson Education Limited, 2004

Chapter 19 Objectives

Requirements for advanced database applications. Why RDBMSs are currently not well suited to supporting these. Main concepts of DDBMSs. Main concepts of database replication. Main concepts of OODBMSs and ORDBMSs. Main concepts of data warehousing. Main concepts of OLAP and data mining. Approaches for integrating databases into the web environment.

Advanced Database Applications

Computer-Aided Design (CAD)
Computer-Aided Manufacturing (CAM)
Office Information Systems (OIS) and Multimedia Systems
Geographic Information Systems (GIS)
Interactive and Dynamic Web sites

Advanced Database Applications


Computer-Aided Design (CAD)

Stores data relating to mechanical and electrical design, for example, buildings, airplanes, and integrated circuit chips. Designs of this type have some common characteristics:

Data has many types, each with a small number of instances. Designs may be very large.

Advanced Database Applications


Design is not static but evolves through time. Updates are far-reaching. Involves version control and configuration management. Cooperative engineering.

Computer-Aided Manufacturing (CAM)

Stores similar data to CAD, plus data about discrete production.

Office Information Systems (OIS) and Multimedia Systems

Stores data relating to computer control of information in a business, including electronic mail, documents, invoices, etc. Modern systems now handle free-form text, photographs, diagrams, audio and video sequences. Documents may have specific structure, perhaps described using mark-up language such as SGML, HTML, or XML.

Geographic Information Systems (GIS)

GIS database stores spatial and temporal information, such as that used in land management and underwater exploration. Much of data is derived from survey and satellite photographs, and tends to be very large. Searches may involve identifying features based on shape, color, texture, using advanced pattern-recognition techniques.

Interactive and Dynamic Web Sites

Consider online catalog for selling clothes. Web site maintains preferences for previous visitors to site and allows visitor to:

obtain 3D rendering of any item based on color, size, fabric, etc.; modify rendering to account for movement, illumination, backdrop, occasion, etc.; select accessories to go with the outfit, from items presented in a sidebar;


Interactive and Dynamic Web Sites

Need to handle multimedia content and to interactively modify display based on user preferences and user selections. Also have added complexity of providing 3D rendering.


Weaknesses of RDBMSs

Poor Representation of Real World Entities

Normalization leads to relations that do not correspond to entities in real world.

Semantic Overloading

Relational model has only one construct for representing data and data relationships: the table. Relational model is semantically overloaded.

Weaknesses of RDBMSs

Poor Support for Business Rules

Limited Operations

RDBMSs only have a fixed set of operations which cannot be extended.

Difficulty Handling Recursive Queries

Extremely difficult to produce recursive queries. Extension proposed to relational algebra to handle this type of query is unary transitive (recursive) closure operation.
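
As a rough illustration of the transitive closure operation mentioned above, the following Python sketch computes every pair reachable through one or more direct pairs. The function name `transitive_closure` and the "manages" data are hypothetical, chosen only for the example.

```python
def transitive_closure(pairs):
    """Return all (a, c) such that c is reachable from a through one or more pairs."""
    closure = set(pairs)
    while True:
        # Join the closure with itself: (a, b) and (b, d) together give (a, d).
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure  # nothing new was derived, so the closure is complete
        closure |= new_pairs


# Hypothetical "manages" relation: (manager, direct report).
manages = {("ann", "bob"), ("bob", "cat"), ("cat", "dan")}
all_reports = transitive_closure(manages)
```

The loop repeats the join until no new pairs appear, which is the part a fixed set of relational operations cannot express for chains of unknown length.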

Weaknesses of RDBMSs

Impedance Mismatch

Most DMLs lack computational completeness. To overcome this, SQL can be embedded in a high-level 3GL. This produces an impedance mismatch - mixing different programming paradigms.
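
The mismatch can be seen in a small sketch. Here Python and its standard `sqlite3` module stand in for the host language and the DBMS (an assumption for illustration; the slide refers to SQL embedded in a 3GL), and the `staff` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("ann", 30000.0), ("bob", 24000.0), ("cat", 18000.0)],
)

# Declarative, set-at-a-time: the SQL string describes which rows are wanted.
cursor = conn.execute("SELECT name, salary FROM staff WHERE salary > ?", (20000,))

# Procedural, row-at-a-time: the host language walks the result one tuple
# at a time and does the computation that SQL alone was not used for.
raised = {}
for name, salary in cursor:
    raised[name] = round(salary * 1.05, 2)

conn.close()
```

The query lives in a string the host language does not understand, and each row must be converted into host-language values before any further computation.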


DDBMSs - Concepts
Distributed Database
A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.

Distributed DBMS
Software system that permits the management of the distributed database and makes the distribution transparent to users.

DDBMSs - Concepts

Collection of logically-related shared data. Data split into fragments. Fragments may be replicated. Fragments/replicas allocated to sites. Sites linked by a communications network. Data at each site is under control of a DBMS. DBMSs handle local applications autonomously. Each DBMS participates in at least one global application.
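
A minimal sketch of fragmentation, replication, and allocation, assuming horizontal fragmentation on a single attribute. The site names, the `branch` attribute, and the allocation map are hypothetical.

```python
# Hypothetical global relation: staff rows tagged with the branch they work at.
staff = [
    {"id": 1, "name": "ann", "branch": "london"},
    {"id": 2, "name": "bob", "branch": "glasgow"},
    {"id": 3, "name": "cat", "branch": "london"},
]


def fragment_by(rows, attribute):
    """Horizontal fragmentation: group rows by the value of one attribute."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[attribute], []).append(row)
    return fragments


fragments = fragment_by(staff, "branch")

# Allocation: each fragment is stored at one or more sites.
# A fragment listed at more than one site is replicated.
allocation = {"london": ["site_london", "site_hq"], "glasgow": ["site_glasgow"]}

sites = {}
for key, site_names in allocation.items():
    for site in site_names:
        sites.setdefault(site, []).extend(fragments[key])


def global_query(predicate):
    """A global application sees the union of all fragments, replicas de-duplicated."""
    seen = {}
    for rows in sites.values():
        for row in rows:
            if predicate(row):
                seen[row["id"]] = row
    return sorted(seen.values(), key=lambda r: r["id"])
```

A local application would read only `sites["site_glasgow"]`, while `global_query` plays the role of a global application spanning all sites.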

DDBMS


Distributed Processing
Centralized database that can be accessed over a computer network.


Advantages of DDBMSs

Reflects organizational structure
Improved shareability and local autonomy
Improved availability
Improved reliability
Improved performance
Economics
Modular growth

Disadvantages of DDBMSs

Complexity
Cost
Security
Integrity control more difficult
Lack of standards
Lack of experience
Database design more complex

Replication Servers
Replication

Process of generating and reproducing multiple copies of data at one or more sites.

Provides users with access to current data where and when they need it. Provides number of benefits, including improved performance when centralized resources get overloaded, increased reliability and data availability, and support for mobile computing and data warehousing.

Synch vs Asynch Replication


Synchronous - updates to replicated data are part of enclosing transaction.

If one or more sites that hold replicas are unavailable transaction cannot complete. Large number of messages required to coordinate synchronization.

Asynchronous - target database updated after source database modified.

Delay in regaining consistency may range from few seconds to several hours or even days.
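
The difference between the two modes can be sketched in a few lines of Python. The classes `Replica` and `AsynchronousReplicator` and the function `synchronous_update` are hypothetical names for illustration, not part of any replication product.

```python
class Replica:
    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}


def synchronous_update(replicas, key, value):
    """All replicas are updated as part of the transaction, or none are."""
    if not all(r.available for r in replicas):
        return False  # a replica site is unavailable, so the transaction cannot complete
    for r in replicas:
        r.data[key] = value
    return True


class AsynchronousReplicator:
    """The source is updated at once; targets catch up when propagate() runs."""

    def __init__(self, source, targets):
        self.source = source
        self.targets = targets
        self.pending = {t.name: [] for t in targets}  # one change queue per target

    def update(self, key, value):
        self.source.data[key] = value
        for queue in self.pending.values():
            queue.append((key, value))

    def propagate(self):
        for target in self.targets:
            if target.available:
                for key, value in self.pending[target.name]:
                    target.data[key] = value
                self.pending[target.name].clear()


a, b = Replica("a"), Replica("b", available=False)
sync_ok = synchronous_update([a, b], "price", 10)  # b is down, nothing is written

source, target = Replica("source"), Replica("target", available=False)
replicator = AsynchronousReplicator(source, [target])
replicator.update("price", 10)       # succeeds even though the target is down
stale = "price" not in target.data   # target is temporarily inconsistent
target.available = True
replicator.propagate()               # consistency is regained later
```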

Replication - Functionality

At basic level, has to be able to copy data from one database to another (synch. or asynch.). Other functions include:

Scalability. Mapping and Transformation. Object Replication. Specification of Replication Schema. Subscription mechanism. Initialization mechanism.

Replication - Data Ownership

Ownership relates to which site has privilege to update the data. Main types of ownership are:

Master/slave (or asymmetric replication), Workflow, Update-anywhere (or peer-to-peer or symmetric replication).

Replication - Master/Slave Ownership

Asynchronously replicated data is owned by one (master) site, and can be updated by only that site. Using publish-and-subscribe metaphor, master site makes data available. Other sites subscribe to data owned by master site, receiving read-only copies. Potentially, each site can be master site for non-overlapping data sets, but update conflicts cannot occur.

Replication - Workflow Ownership

Avoids update conflicts, while providing more dynamic ownership model. Allows right to update replicated data to move from site to site. However, at any one moment, there is only ever one site that may update that particular data. Example is order processing system, which follows steps, such as order entry, credit approval, invoicing, shipping, and so on.
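
Workflow ownership can be sketched as a record whose right to update moves from step to step. The `Order` class below is hypothetical, and treating each processing step as the name of the owning site is a simplifying assumption.

```python
# Order-processing steps; in this sketch each step doubles as the owning site's name.
STEPS = ["order entry", "credit approval", "invoicing", "shipping"]


class Order:
    def __init__(self, order_id):
        self.order_id = order_id
        self.step_index = 0
        self.history = []

    @property
    def owner(self):
        """The single site currently allowed to update this order."""
        return STEPS[self.step_index]

    def update(self, site, note):
        if site != self.owner:
            raise PermissionError(f"{site} does not currently own order {self.order_id}")
        self.history.append((site, note))

    def hand_over(self, site):
        """The current owner passes the right to update on to the next step."""
        if site != self.owner:
            raise PermissionError(f"{site} cannot hand over order {self.order_id}")
        if self.step_index < len(STEPS) - 1:
            self.step_index += 1


order = Order(1)
order.update("order entry", "created")
order.hand_over("order entry")

# The previous owner can no longer update the order.
try:
    order.update("order entry", "late change")
    rejected = False
except PermissionError:
    rejected = True
```

Because only one site holds the right to update at any moment, two sites can never make competing changes to the same order.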

Replication - Update-Anywhere Ownership

Creates peer-to-peer environment where multiple sites have equal rights to update replicated data. Allows local sites to function autonomously, even when other sites are not available. Shared ownership can lead to conflict scenarios, so conflicts have to be detected and resolved.
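
One way conflict detection and resolution might look is sketched below. The slide does not prescribe a method: comparing base versions and letting the later update win are both assumptions made for this example, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Update:
    site: str
    key: str
    value: str
    base_version: int  # version of the row the site read before updating
    timestamp: int     # logical clock value (hypothetical)


def detect_conflict(first, second):
    """Two updates conflict if they change the same row starting from the same version."""
    return first.key == second.key and first.base_version == second.base_version


def resolve_latest_wins(first, second):
    """One possible resolution rule (an assumption here): keep the later update."""
    return first if first.timestamp >= second.timestamp else second


u1 = Update("site_a", "cust42.phone", "555-0100", base_version=3, timestamp=10)
u2 = Update("site_b", "cust42.phone", "555-0199", base_version=3, timestamp=12)

conflict = detect_conflict(u1, u2)
winner = resolve_latest_wins(u1, u2) if conflict else None
```

Other resolution rules, such as site priority or manual review, would slot into the same place as `resolve_latest_wins`.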

OODBMSs

No one agreed object data model. One definition:

Object-Oriented Data Model (OODM)
Data model that captures semantics of objects supported in object-oriented programming.

Object-Oriented Database (OODB)
Persistent and sharable collection of objects defined by an OODM.

Object-Oriented DBMS (OODBMS)
Manager of an OODB.

Origins of the OODM


Advantages of OODBMSs

Enriched Modeling Capabilities. Extensibility. Removal of Impedance Mismatch. More Expressive Query Language. Support for Schema Evolution. Support for Long Duration Transactions. Applicability to Advanced Database Applications. Improved Performance.

Disadvantages of OODBMSs

Lack of Experience. Lack of Standards. Competition from RDBMSs. Complexity. Lack of Support for Views. Lack of Support for Security.


ORDBMSs

Vendors of RDBMSs conscious of threat and promise of OODBMS. Agree that RDBMSs not currently suited to advanced database applications, and added functionality is required. Reject claim that ORDBMSs will not provide sufficient functionality or will be too slow to cope adequately with new complexity. Can remedy shortcomings of relational model by extending model with OO features.

ORDBMSs - Features

OO features being added include:


user-extensible types, encapsulation, inheritance, polymorphism, dynamic binding of methods, complex objects including non-1NF objects, object identity.

ORDBMSs - Features

However, no single extended relational model. All models:


share basic relational tables and query language, all have some concept of object, some can store methods (or procedures or triggers).


Advantages of ORDBMSs

Resolves many of known weaknesses of RDBMS. Reuse and sharing:

reuse comes from ability to extend server to perform standard functionality centrally; gives rise to increased productivity both for developer and end-user.

Preserves significant body of knowledge and experience gone into developing relational applications.

Disadvantages of ORDBMSs

Complexity. Increased costs. Proponents of relational approach believe simplicity and purity of relational model are lost. Some believe RDBMS is being extended for what will be a minority of applications. OO purists not attracted by extensions either.

Evolution of Data Warehousing

Since 1970s, organizations gained competitive advantage through systems that automate business processes to offer more efficient and cost-effective services to customer. This resulted in accumulation of growing amounts of data in operational databases. Now focus on ways to use operational data to support decision-making, as a means of gaining competitive advantage.

Evolution of Data Warehousing

Operational systems were never designed to support such business activities, so using such systems may not be easy solution. Businesses typically have numerous operational systems with overlapping and sometimes contradictory definitions (such as data types). Challenge is to turn archives of data into a source of knowledge, so that a single integrated/consolidated view of organization's data is presented to user.

The Evolution of Data Warehousing

Data warehouse was deemed the solution to meet the requirements of a system capable of supporting decision-making, receiving data from multiple operational data sources.


Data Warehousing Concepts


Consolidated/integrated view of corporate data drawn from disparate operational data sources and a range of end-user access tools capable of supporting simple to highly complex queries to support decision-making. Data described as being subject-oriented, integrated, time-variant, and non-volatile (Inmon, 1993).

Subject-Oriented Data

Warehouse is organized around major subjects of the enterprise (e.g. customers, products, sales) rather than major application areas (e.g. customer invoicing, stock control, product sales).

This is reflected in the need to store decision-support data rather than application-oriented data.

Integrated Data

Data warehouse integrates corporate application-oriented data from different source systems, which often includes data that is inconsistent. Integrated data source must be made consistent to present a unified view of the data to the users.

Time-Variant Data

Data in the warehouse is only accurate and valid at some point in time or over some time interval. Time-variance is also shown in the extended time that the data is held, implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.

Non-Volatile Data

Data in the warehouse is not updated in real-time but is refreshed from operational systems on a regular basis. New data is always added as a supplement to the database, rather than a replacement.
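
A small sketch of a non-volatile refresh, using Python's standard `sqlite3` module as a stand-in warehouse. The `stock_snapshot` table, the `refresh` function, and the sample figures are hypothetical.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE stock_snapshot (snapshot_date TEXT, product TEXT, quantity INTEGER)"
)


def refresh(conn, snapshot_date, operational_rows):
    """Append the current operational state as a new dated snapshot.

    Existing rows are never updated or deleted, so earlier snapshots survive.
    """
    conn.executemany(
        "INSERT INTO stock_snapshot VALUES (?, ?, ?)",
        [(snapshot_date, product, qty) for product, qty in operational_rows],
    )


refresh(warehouse, "2004-01-31", [("widget", 120), ("gadget", 40)])
refresh(warehouse, "2004-02-29", [("widget", 95), ("gadget", 60)])

# Both snapshots remain available, giving a history for each product.
history = warehouse.execute(
    "SELECT snapshot_date, quantity FROM stock_snapshot "
    "WHERE product = 'widget' ORDER BY snapshot_date"
).fetchall()
```

Each refresh supplements what is already stored, which is also what gives the warehouse its time-variant series of snapshots.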


Typical Architecture of a DW


Typical Architecture of a DW

Operational data:

Supplied from mainframes, proprietary file systems, private workstations and servers, and external systems such as the Internet.

Operational data store (ODS):

Repository of current and integrated operational data used for analysis. Often structured and supplied with data in the same way as the data warehouse. May act simply as a staging area for data to be moved into the warehouse.


Typical Architecture of a DW

Load Manager:

Performs all operations associated with extraction and loading of data into warehouse.

Warehouse Manager:

Performs all operations associated with management of data in the warehouse, such as merging data sources.

Query Manager:

Performs all operations associated with management of user queries.

Typical Architecture of a DW

Detailed data:

Not stored online but made available by summarizing data to the next level of detail. However, detailed data regularly added to warehouse to supplement summarized data.

Lightly and highly summarized data:

Predefined and generated by warehouse manager and stored in warehouse. Purpose is to speed up performance of queries. Updated continuously as new data is loaded into the warehouse.
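
The relationship between detailed data and the two levels of summary can be sketched with Python's standard `sqlite3` module. The table names and sample sales figures are hypothetical, and a real warehouse manager would build such aggregates with its own tooling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_detail (sale_date TEXT, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales_detail VALUES (?, ?, ?)",
    [
        ("2004-01-05", "north", 100),
        ("2004-01-20", "north", 50),
        ("2004-01-11", "south", 70),
        ("2004-02-02", "north", 30),
    ],
)

# Lightly summarized: totals per month and region, built from the detailed data.
conn.execute(
    "CREATE TABLE sales_by_month_region AS "
    "SELECT substr(sale_date, 1, 7) AS month, region, SUM(amount) AS total "
    "FROM sales_detail GROUP BY substr(sale_date, 1, 7), region"
)

# Highly summarized: totals per month only, built from the lighter summary.
conn.execute(
    "CREATE TABLE sales_by_month AS "
    "SELECT month, SUM(total) AS total FROM sales_by_month_region GROUP BY month"
)

light = conn.execute(
    "SELECT month, region, total FROM sales_by_month_region ORDER BY month, region"
).fetchall()
high = conn.execute("SELECT month, total FROM sales_by_month ORDER BY month").fetchall()
```

Queries that only need monthly totals can read the small `sales_by_month` table instead of scanning every detailed row, which is where the speed-up comes from.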

Typical Architecture of a DW

Meta-data (data about data):

Used by all processes in the warehouse.

End-user access tools:

Principal purpose of data warehousing is to provide information to business users for strategic decision-making. Users interact with warehouse using end-user access tools. Warehouse must efficiently support ad hoc and routine analysis. Includes EIS, OLAP and data mining tools.

Data Mart
Subset of data warehouse that supports requirements of particular department or business function.

Characteristics include:

Holds subset of data in warehouse in summary form. Focuses on requirements of one department or business function. Can be stand-alone or linked to warehouse. Popular because less complex than warehouse.

Architecture of a Data Mart

Can be two-tier or three-tier database applications: Data warehouse is the optional first tier. Data mart is the second tier. End-user workstation is the third tier. Data is distributed among tiers.


Reasons for Creating a Data Mart

Give users access to data they need to analyze most often. Provide data in a form that matches the collective view of the data by group of users in department or business area. Improve end-user response time due to reduction in volume of data to be accessed. Provide appropriately structured data as dictated by requirements of end-user access tools.

Reasons for Creating a Data Mart

Simpler to build compared with establishing a corporate data warehouse. Cost of implementation is normally less than that required to establish a data warehouse. Potential users are more clearly defined and can be more easily targeted to obtain support for a data mart project rather than a corporate data warehouse project.

Introducing OLAP
Dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data.

Describes a technology that uses a multidimensional view of aggregate data to provide quick access to strategic information for purposes of advanced analysis.

Introducing OLAP

Enables users to gain deeper understanding and knowledge about various aspects of their corporate data through fast, consistent, interactive access to wide variety of possible views of data. Allows users to view corporate data in such a way that it is a better model of the true dimensionality of the enterprise.

Introducing OLAP

Can easily answer who? and what? questions; however, ability to answer what if? and why? type questions distinguishes OLAP from general-purpose query tools. Types of analysis range from basic navigation and browsing (slicing and dicing), to calculations, to more complex analyses such as time series and complex modeling.
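
Slicing and dicing over multi-dimensional data can be sketched in plain Python. The fact rows, the three dimensions, and the function names are hypothetical; an OLAP server would precompute and index such aggregates rather than scan a list.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, revenue).
facts = [
    (2003, "north", "shoes", 10),
    (2003, "south", "shoes", 7),
    (2003, "north", "hats", 4),
    (2004, "north", "shoes", 12),
    (2004, "south", "hats", 5),
]
DIMENSIONS = ("year", "region", "product")


def aggregate(rows, by):
    """Total revenue grouped by the named dimensions."""
    positions = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in positions)] += row[3]
    return dict(totals)


def slice_(rows, dimension, value):
    """Slice: fix one dimension to a single value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in rows if row[i] == value]


def dice(rows, **allowed):
    """Dice: keep rows whose dimension values fall within the allowed sets."""
    return [
        row
        for row in rows
        if all(row[DIMENSIONS.index(d)] in values for d, values in allowed.items())
    ]


by_year_region = aggregate(facts, by=("year", "region"))
north_by_product = aggregate(slice_(facts, "region", "north"), by=("product",))
diced = aggregate(dice(facts, year={2003}, product={"shoes", "hats"}), by=("region",))
```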

Examples of OLAP Applications


OLAP Applications

Essential requirement of all OLAP applications is ability to provide users with just-in-time (JIT) information, to make effective decisions about an organization's strategic directions. JIT information is computed data that usually reflects complex relationships and is often calculated on the fly. Practical only if response times are consistently short and data model flexible.

OLAP Applications

Although OLAP applications are found in widely divergent functional areas, all have following key features:

multi-dimensional views of data; support for complex calculations; time intelligence.

Time intelligence is key feature of almost any analytical application as performance is almost always judged over time.

OLAP Benefits

Increased productivity of end-users. Reduced backlog of applications development for IT staff. Retention of organizational control over the integrity of corporate data. Reduced query drag and network traffic on OLTP systems or on the data warehouse. Improved potential revenue and profitability.

Data Mining
Process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions.

Involves analysis of data and use of software techniques for finding hidden and unexpected patterns and relationships in sets of data.

Data Mining

Focus is to reveal information that is hidden and unexpected.


Patterns and relationships are identified by examining the underlying rules and features in the data. Tends to work from the data up and most accurate results normally require large volumes of data to deliver reliable conclusions.

Data Mining

Starts by developing an optimal representation of structure of sample data, during which time knowledge is acquired and extended to larger sets of data. Data mining can provide huge paybacks for companies who have made a significant investment in data warehousing. Relatively new technology, however already used in a number of industries.

Some Applications of Data Mining

Retail / Marketing

Identifying buying patterns of customers. Finding associations among customer demographic characteristics. Predicting response to mailing campaigns. Market basket analysis.
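
As one illustration of market basket analysis, the sketch below counts how often pairs of items are bought together and derives the usual support and confidence measures. The baskets are invented data, and the slides do not specify any particular algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase transactions, one set of items per basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always counted under the same key.
    pair_counts.update(combinations(sorted(basket), 2))


def support(pair):
    """Fraction of all baskets that contain both items."""
    return pair_counts[tuple(sorted(pair))] / len(baskets)


def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction also containing the consequent."""
    return pair_counts[tuple(sorted((antecedent, consequent)))] / item_counts[antecedent]
```

For example, `support(("bread", "milk"))` is the share of baskets containing both items, and `confidence("bread", "milk")` is the share of bread baskets that also contain milk.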


Some Applications of Data Mining

Banking

Detecting patterns of fraudulent credit card use. Identifying loyal customers. Predicting customers likely to change their credit card affiliation. Determining credit card spending by customer groups.


Some Applications of Data Mining

Insurance

Claims analysis. Predicting which customers will buy new policies.

Medicine

Characterizing patient behavior to predict surgery visits. Identifying successful medical therapies for different illnesses.

Data Mining Operations

Four main operations include:


Predictive modeling. Database segmentation. Link analysis. Deviation detection.

Recognized associations between the applications and corresponding operations.

e.g. Direct marketing strategies use database segmentation.



Data Mining Techniques

Techniques are specific implementations of the data mining operations.


Each operation has its own strengths and weaknesses. Data mining tools sometimes offer a choice of operations to implement a technique.

Data Mining Techniques

Criteria for selection of tool include:


Suitability for certain input data types. Transparency of the mining output. Tolerance of missing variable values. Level of accuracy possible. Ability to handle large volumes of data.


Web-database integration

Just over a decade after its conception in 1989, Web is arguably most popular and powerful networked information system to date. Growth has been near exponential and it has started an information revolution that will continue through the next decade. Now combination of the Web and databases brings many new opportunities for creating advanced database applications.

Web-database integration

Compelling platform for delivery and dissemination of data-centric, interactive applications. Organizations now rapidly building new database applications or reengineering existing ones to take advantage of Web as strategic platform for implementing innovative business solutions, in effect becoming Web-centric organizations.

Static and Dynamic Web Pages

HTML/XML document stored in file is static Web page. Content of dynamic Web page is generated each time it is accessed. Thus, dynamic Web page can:

respond to user input from browser; be customized by and for each user.

Requires hypertext to be generated by servers.



Static and Dynamic Web Pages

Need scripts that perform conversions from different data formats into HTML/XML on-the-fly. As a database is dynamic, changing as users create, insert, update, and delete data, then generating dynamic Web pages is a much more appropriate approach than creating static ones.
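
A minimal sketch of generating a page from database content at request time, using Python's standard `sqlite3` and `html` modules. The `product` table and the `render_catalog` function are hypothetical, and no real Web server is involved.

```python
import sqlite3
from html import escape

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("Shirt", 19.99), ("Jeans & Belt", 45.0)],
)


def render_catalog(conn, max_price):
    """Build the HTML from the database each time, so it reflects the current data."""
    rows = conn.execute(
        "SELECT name, price FROM product WHERE price <= ? ORDER BY name",
        (max_price,),
    ).fetchall()
    # escape() converts characters such as & into safe HTML entities.
    items = "".join(f"<li>{escape(name)}: {price:.2f}</li>" for name, price in rows)
    return f"<html><body><ul>{items}</ul></body></html>"


page_before = render_catalog(conn, max_price=50)
conn.execute("INSERT INTO product VALUES ('Socks', 4.5)")
page_after = render_catalog(conn, max_price=50)  # the new row appears without editing any file
```

A static page would have to be rewritten by hand after the insert, whereas the generated page picks up the change on the next request.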

Requirements for Web-DBMS Integration

Ability to access valuable corporate data in a secure manner. Data- and vendor-independent connectivity to allow freedom of choice in DBMS selection. Ability to interface to database independent of any proprietary browser or Web server. Connectivity solution that takes advantage of all the features of an organization's DBMS.

Requirements for Web-DBMS Integration

Open architecture to allow interoperability with a variety of systems and technologies. Cost-effective solution that allows for scalability, growth, and changes in strategic directions, and helps reduce applications development costs. Support for transactions that span multiple HTTP requests.

Requirements for Web-DBMS Integration

Support for session- and application-based authentication. Acceptable performance. Minimal administration overhead. Set of high-level productivity tools to allow applications to be developed, maintained, and deployed with relative ease and speed.


Approaches to Integrating Web and DBMSs

Scripting Languages. Common Gateway Interface (CGI). HTTP Cookies. Extending the Web Server. Java, JDBC, SQLJ, Servlets, and JSP. Vendor-specific solutions such as:

Microsoft Web Solution Platform: ASP and ADO. Oracle Internet Platform.

XML (eXtensible Markup Language)

Most documents on Web currently stored and transmitted in HTML. One strength of HTML is its simplicity. However, its simplicity is also one of its weaknesses, with growing need from users who want tags to simplify some tasks and make HTML documents more attractive and dynamic.

XML

To satisfy this demand, vendors introduced some browser-specific HTML tags, which made it difficult to develop sophisticated, widely viewable Web documents. W3C has produced a new standard called XML, which could preserve the general application independence that makes HTML portable and powerful.

XML
Meta-language (language for describing other languages) that enables designers to create their own customized tags to provide functionality not available with HTML.

Restricted version of SGML (Standard Generalized Markup Language), designed especially for Web documents.
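
To show what customized tags look like in practice, the sketch below parses a small document with Python's standard `xml.etree.ElementTree` module. The `stafflist`, `staff`, `name`, and `position` tags and the sample values are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names are chosen by the designer,
# not fixed in advance as they are in HTML.
document = """
<stafflist>
  <staff branch="B005">
    <name>John White</name>
    <position>Manager</position>
  </staff>
  <staff branch="B003">
    <name>Ann Beech</name>
    <position>Assistant</position>
  </staff>
</stafflist>
""".strip()

root = ET.fromstring(document)

# Because the tags describe the data, a program can pick out values by meaning.
managers = [
    staff.findtext("name")
    for staff in root.findall("staff")
    if staff.findtext("position") == "Manager"
]
branches = [staff.get("branch") for staff in root.iter("staff")]
```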

XML

Set to impact every aspect of programming including graphical interfaces, embedded systems, distributed systems, and database management. Becoming de facto standard for data communication within software industry, and quickly replacing EDI as primary medium for data interchange among businesses. Some analysts believe it will become language in which most documents are created and stored, both on and off Internet.

XML and Databases

As amount of data in XML expands, there will be increasing demand to store, retrieve, and query this data. Two main models anticipated:

data-centric, document-centric.


XML Data-centric model

Fact that data is stored/transferred as XML is incidental. In this case, data could be stored in RDBMS, ORDBMS, or OODBMS. Oracle has completely integrated XML into its Oracle 9i system.

XML can be stored as entire documents using data types XMLType or CLOB/BLOB or can be decomposed into its constituent elements and stored that way. Oracle query language has been extended to permit searching of XML-based content.
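
The two storage options can be sketched with Python's standard `sqlite3` and `xml.etree.ElementTree` modules. This is a generic illustration, not Oracle's XMLType: a plain TEXT column stands in for a large-object column, and the `orders` document and table names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

xml_text = (
    "<orders>"
    "<order id='1'><item>pen</item><qty>3</qty></order>"
    "<order id='2'><item>ink</item><qty>1</qty></order>"
    "</orders>"
)

conn = sqlite3.connect(":memory:")

# Option 1: keep the entire document in one large text column.
conn.execute("CREATE TABLE order_docs (doc_id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO order_docs (body) VALUES (?)", (xml_text,))

# Option 2: decompose the document into its elements and store them as rows.
conn.execute("CREATE TABLE order_items (order_id INTEGER, item TEXT, qty INTEGER)")
for order in ET.fromstring(xml_text).findall("order"):
    conn.execute(
        "INSERT INTO order_items VALUES (?, ?, ?)",
        (int(order.get("id")), order.findtext("item"), int(order.findtext("qty"))),
    )

whole = conn.execute("SELECT body FROM order_docs").fetchone()[0]
total_qty = conn.execute("SELECT SUM(qty) FROM order_items").fetchone()[0]
```

The first option returns the document exactly as stored, while the second makes individual values available to ordinary SQL such as the `SUM` above.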

XML Document-centric model

Documents designed for human consumption (e.g. books, newspapers, email). Data may be irregular/incomplete, and structure may change rapidly or unpredictably. Unfortunately, RDBMSs, ORDBMSs, and OODBMSs do not handle data of this nature particularly well. Content management systems are important tools for handling these types of documents. Underlying such a system, may now find a native XML database.

Native XML Database

Defines (logical) data model for an XML document (as opposed to the data in that document) and stores and retrieves documents according to that model. At a minimum, model must include elements, attributes, PCDATA, and document order. XML document must be the unit of (logical) storage although it is not restricted by any underlying physical storage model (so traditional DBMSs are not ruled out).

XML Query Languages

DBMS vendors have extended SQL to handle query of XML-based content. Standardization of XML extensions to SQL is known as SQL/XML and initial work has been submitted to ISO and ANSI. In addition, W3C formed an XML Query Working Group to produce:

data model for XML documents, set of query operators on this model, query language based on these query operators (called XQuery).

XML XQuery

Queries operate on single documents or fixed collections of documents. Can select entire documents or subtrees of documents that match conditions based on document content and structure. Queries can also construct new documents based on what has been selected. Ultimately, collections of XML documents will be accessed like databases. Web Technology is highly dynamic so expect significant developments over the next years.
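
The select-then-construct pattern described above can be imitated in Python with the standard `xml.etree.ElementTree` module. This is not XQuery itself, only a sketch of the same idea, and the `library` document is invented.

```python
import xml.etree.ElementTree as ET

library = ET.fromstring(
    "<library>"
    "<book year='1999'><title>Alpha</title><price>30</price></book>"
    "<book year='2003'><title>Beta</title><price>55</price></book>"
    "<book year='2004'><title>Gamma</title><price>20</price></book>"
    "</library>"
)

# Select subtrees that match a condition on document content.
cheap_books = [b for b in library.findall("book") if int(b.findtext("price")) < 40]

# Construct a new document based on what has been selected.
result = ET.Element("cheap-titles")
for book in cheap_books:
    entry = ET.SubElement(result, "title", year=book.get("year"))
    entry.text = book.findtext("title")

result_xml = ET.tostring(result, encoding="unicode")
```

In XQuery the selection and the construction would be written as a single declarative expression rather than as a loop in a host language.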
