Chapter 19 Objectives
Requirements for advanced database applications. Why RDBMSs are currently not well suited to supporting these. Main concepts of DDBMSs. Main concepts of database replication. Main concepts of OODBMSs and ORDBMSs. Main concepts of data warehousing. Main concepts of OLAP and data mining. Approaches for integrating databases into the Web environment.
Pearson Education Limited, 2004
Advanced database applications include Computer-Aided Design (CAD), Computer-Aided Manufacturing (CAM), Office Information Systems (OIS) and Multimedia Systems, Geographic Information Systems (GIS), and Interactive and Dynamic Web sites.
Computer-Aided Design (CAD)
Data has many types, each with a small number of instances. Designs may be very large. A design is not static but evolves through time. Updates are far-reaching. Involves version control and configuration management. Cooperative engineering.
Computer-Aided Manufacturing (CAM)
Stores similar data to CAD, plus data about discrete production.
Office Information Systems (OIS) and Multimedia Systems
Stores data relating to computer control of information in a business, including electronic mail, documents, invoices, etc. Modern systems now handle free-form text, photographs, diagrams, and audio and video sequences. Documents may have a specific structure, perhaps described using a mark-up language such as SGML, HTML, or XML.
Geographic Information Systems (GIS)
A GIS database stores spatial and temporal information, such as that used in land management and underwater exploration. Much of the data is derived from survey and satellite photographs and tends to be very large. Searches may involve identifying features based on shape, color, or texture, using advanced pattern-recognition techniques.
Interactive and Dynamic Web Sites
Consider an online catalog for selling clothes. The Web site maintains preferences for previous visitors to the site and allows a visitor to:
obtain 3D rendering of any item based on color, size, fabric, etc.; modify rendering to account for movement, illumination, backdrop, occasion, etc.; select accessories to go with the outfit, from items presented in a sidebar;
The site needs to handle multimedia content and to modify the display interactively based on user preferences and user selections. It also has the added complexity of providing 3D rendering.
Weaknesses of RDBMSs
Poor Representation of "Real World" Entities
Normalization leads to relations that do not correspond to entities in the real world.
Semantic Overloading
The relational model has only one construct for representing data and data relationships: the table. The relational model is therefore semantically overloaded.
Weaknesses of RDBMSs
Difficulty Handling Recursive Queries
It is extremely difficult to produce recursive queries, that is, queries about relationships that a relation has with itself, directly or indirectly. The extension proposed to relational algebra to handle this type of query is the unary transitive (recursive) closure operation.
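To illustrate why a recursive query is hard to express with a fixed number of relational operations, here is a minimal sketch of the transitive closure computed by repeated self-joins until no new pairs appear. The "supervises" relation and staff identifiers are hypothetical.

```python
# Transitive (recursive) closure of a binary relation by naive fixed-point
# iteration. The relation below is a hypothetical (supervisor, supervisee) table.
def transitive_closure(pairs):
    """Return every (a, c) reachable by following one or more (a, b) links."""
    closure = set(pairs)
    while True:
        # Join the closure with itself: (a, b) and (b, d) give (a, d).
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:  # fixed point: nothing new can be derived
            return closure
        closure |= new_pairs

supervises = {("S1", "S2"), ("S2", "S3"), ("S3", "S4")}
print(sorted(transitive_closure(supervises)))
```

The number of iterations depends on the length of the longest chain in the data, which is why no single relational algebra expression with a fixed number of joins can compute it. SQL:1999 later added recursive queries (`WITH RECURSIVE`) to address this.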
Weaknesses of RDBMSs
Impedance Mismatch
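Most DMLs lack computational completeness. To overcome this, SQL can be embedded in a high-level third-generation language (3GL). This produces an impedance mismatch: two different programming paradigms are mixed, since SQL is declarative and handles rows as sets, whereas the 3GL is procedural and handles one row at a time. Data types must also be converted between the two languages.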
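A minimal sketch of the impedance mismatch, using Python's built-in `sqlite3` module as the host language interface. The `staff` table, its columns, and the 5% raise rule are hypothetical.

```python
import sqlite3

# Hypothetical staff table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staff_no TEXT PRIMARY KEY, salary REAL)")
conn.executemany("INSERT INTO staff VALUES (?, ?)",
                 [("SG37", 12000.0), ("SG14", 18000.0), ("SL21", 30000.0)])

# The SQL statement is declarative and describes a whole set of rows ...
cursor = conn.execute("SELECT staff_no, salary FROM staff ORDER BY staff_no")

# ... but the host language consumes the result one row at a time, and each
# SQL value arrives converted to a Python type (str and float here).
raise_total = 0.0
for staff_no, salary in cursor:
    if salary < 20000:              # procedural logic kept outside SQL
        raise_total += salary * 0.05
print(raise_total)
```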
DDBMSs - Concepts
Distributed Database
A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.
Distributed DBMS
Software system that permits the management of the distributed database and makes the distribution transparent to users.
DDBMSs - Concepts
Collection of logically related shared data. Data split into fragments. Fragments may be replicated. Fragments/replicas allocated to sites. Sites linked by a communications network. Data at each site is under control of a DBMS. DBMSs handle local applications autonomously. Each DBMS participates in at least one global application.
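The fragmentation, replication, and allocation steps above can be sketched as follows. This is a minimal illustration assuming horizontal fragmentation by branch; the relation, fragment predicates, and site names are all hypothetical.

```python
# Global relation (hypothetical).
branch_staff = [
    {"staff_no": "SG37", "branch": "B003"},
    {"staff_no": "SG14", "branch": "B003"},
    {"staff_no": "SL21", "branch": "B005"},
]

# Horizontal fragments: each is defined by a selection predicate on the relation.
fragment_predicates = {
    "S1": lambda row: row["branch"] == "B003",
    "S2": lambda row: row["branch"] == "B005",
}

# Allocation schema: the sites that hold a copy of each fragment (S1 is replicated).
allocation = {"S1": ["site_london", "site_glasgow"], "S2": ["site_glasgow"]}

def fragment(relation, predicates):
    return {name: [r for r in relation if pred(r)] for name, pred in predicates.items()}

def allocate(fragments, allocation):
    """Return {site: {fragment_name: rows}} according to the allocation schema."""
    sites = {}
    for name, rows in fragments.items():
        for site in allocation[name]:
            sites.setdefault(site, {})[name] = rows
    return sites

fragments = fragment(branch_staff, fragment_predicates)
sites = allocate(fragments, allocation)

# Reconstruction: the union of the fragments gives back the global relation.
reconstructed = [r for name in fragment_predicates for r in fragments[name]]
print(sorted(sites))
```

For reconstruction to be lossless the predicates should be complete (every row satisfies one) and, for pure horizontal fragmentation, disjoint.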
DDBMS
Distributed Processing
Centralized database that can be accessed over a computer network.
Advantages of DDBMSs
Reflects organizational structure. Improved shareability and local autonomy. Improved availability. Improved reliability. Improved performance. Economics. Modular growth.
Disadvantages of DDBMSs
Complexity. Cost. Security. Integrity control more difficult. Lack of standards. Lack of experience. Database design more complex.
Replication Servers
Replication
Process of generating and reproducing multiple copies of data at one or more sites.
Provides users with access to current data where and when they need it. Provides a number of benefits, including improved performance when centralized resources get overloaded, increased reliability and data availability, and support for mobile computing and data warehousing.
Synchronous replication: if one or more sites that hold replicas are unavailable, the transaction cannot complete, and a large number of messages is required to coordinate synchronization.
Asynchronous replication: the delay in regaining consistency may range from a few seconds to several hours or even days.
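A minimal sketch contrasting the two modes, assuming in-memory stand-ins for sites (the `Replica` and `AsyncReplicator` classes are hypothetical, and no real network or commit protocol is modeled). The synchronous update fails when any replica is down; the asynchronous one commits locally and lets replicas catch up later.

```python
from collections import deque

class Replica:
    def __init__(self, name, available=True):
        self.name, self.available, self.data = name, available, {}

def synchronous_update(replicas, key, value):
    """All-or-nothing: the update commits only if every replica is reachable."""
    if not all(r.available for r in replicas):
        return False                      # the transaction cannot complete
    for r in replicas:
        r.data[key] = value
    return True

class AsyncReplicator:
    """The primary commits at once; each replica has a queue of pending changes."""
    def __init__(self, primary, others):
        self.primary, self.others = primary, others
        self.pending = {r.name: deque() for r in others}

    def update(self, key, value):
        self.primary.data[key] = value    # commits immediately at the primary
        for q in self.pending.values():
            q.append((key, value))        # replicas are stale until propagate()

    def propagate(self):
        for r in self.others:
            q = self.pending[r.name]
            while r.available and q:
                key, value = q.popleft()
                r.data[key] = value

a, b = Replica("A"), Replica("B", available=False)
print(synchronous_update([a, b], "x", 1))   # False: site B is down
rep = AsyncReplicator(a, [b])
rep.update("x", 1)
rep.propagate()                             # B still down: the change stays queued
b.available = True
rep.propagate()                             # B catches up
print(b.data)                               # {'x': 1}
```

The window between `update` and a successful `propagate` is the consistency delay described above.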
Replication - Functionality
At a basic level, it has to be able to copy data from one database to another, either synchronously or asynchronously. Other functions include:
Scalability. Mapping and Transformation. Object Replication. Specification of Replication Schema. Subscription mechanism. Initialization mechanism.
Ownership relates to which site has the privilege to update the data. The main types of ownership are:
Master/slave (or asymmetric replication), Workflow, Update-anywhere (or peer-to-peer or symmetric replication).
Replication Ownership
Master/Slave
Asynchronously replicated data is owned by one (master) site and can be updated only by that site. Using a publish-and-subscribe metaphor, the master site makes data available. Other sites subscribe to data owned by the master site, receiving read-only copies. Potentially, each site can be the master site for non-overlapping data sets; because each data set has only one master, update conflicts cannot occur.
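A minimal sketch of master/slave ownership with publish-and-subscribe. The class and method names are hypothetical; propagation is shown inline for brevity, whereas a real system would deliver changes asynchronously.

```python
class MasterSite:
    def __init__(self):
        self.data, self.subscribers = {}, []

    def subscribe(self, site):
        self.subscribers.append(site)
        site.copy = dict(self.data)           # initial read-only snapshot

    def update(self, key, value):
        self.data[key] = value                # only the master may update
        for site in self.subscribers:         # publish the change to subscribers
            site.receive(key, value)

class SubscriberSite:
    def __init__(self):
        self.copy = {}

    def receive(self, key, value):
        self.copy[key] = value

    def update(self, key, value):
        raise PermissionError("read-only copy: updates must go to the master site")

master, branch = MasterSite(), SubscriberSite()
master.update("P1", 10)
master.subscribe(branch)
master.update("P1", 12)
print(branch.copy)
```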
Replication Ownership
Workflow
Workflow ownership avoids update conflicts while providing a more dynamic ownership model. It allows the right to update replicated data to move from site to site. However, at any one moment there is only ever one site that may update that particular data. An example is an order processing system, which follows a series of steps such as order entry, credit approval, invoicing, and shipping.
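The order processing example can be sketched as an ownership token that moves along the workflow. This is a minimal illustration; the step-to-site mapping and site names are hypothetical.

```python
# Hypothetical workflow: (step, site that owns the order during that step).
WORKFLOW = [("order entry", "sales_site"), ("credit approval", "finance_site"),
            ("invoicing", "billing_site"), ("shipping", "warehouse_site")]

class Order:
    def __init__(self, order_no):
        self.order_no, self.step, self.history = order_no, 0, []

    def owner(self):
        return WORKFLOW[self.step][1]

    def complete_step(self, site):
        if site != self.owner():              # only one site may update at any moment
            raise PermissionError(f"{site} does not own order {self.order_no}")
        self.history.append(WORKFLOW[self.step][0])
        if self.step < len(WORKFLOW) - 1:
            self.step += 1                    # pass ownership to the next site

order = Order(1001)
order.complete_step("sales_site")
print(order.owner())   # finance_site
```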
Replication Ownership
Update-Anywhere
Creates a peer-to-peer environment where multiple sites have equal rights to update replicated data. Allows local sites to function autonomously, even when other sites are not available. Shared ownership can lead to conflict scenarios, so the system has to detect conflicts and resolve them.
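A minimal sketch of conflict detection and resolution for update-anywhere replication between two copies. Each copy remembers the version agreed at its last synchronization; if both have moved past it, the updates conflict. The "latest timestamp wins" rule is only one illustrative resolution policy, and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Copy:
    value: str
    version: int = 0          # incremented on every local update
    synced_version: int = 0   # version agreed at the last synchronization
    timestamp: float = 0.0

    def update(self, value, timestamp):
        self.value, self.timestamp = value, timestamp
        self.version += 1

def synchronize(a: Copy, b: Copy) -> bool:
    """Reconcile two copies; return True if a conflict was detected."""
    a_changed = a.version > a.synced_version
    b_changed = b.version > b.synced_version
    conflict = a_changed and b_changed
    if conflict:
        winner = a if a.timestamp >= b.timestamp else b   # resolution policy
    elif a_changed:
        winner = a
    else:
        winner = b
    new_version = max(a.version, b.version)
    for c in (a, b):
        c.value, c.timestamp = winner.value, winner.timestamp
        c.version = c.synced_version = new_version
    return conflict

london, glasgow = Copy("Rent 400"), Copy("Rent 400")
london.update("Rent 450", timestamp=10.0)
glasgow.update("Rent 425", timestamp=12.0)
print(synchronize(london, glasgow), london.value)   # True Rent 425
```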
OODBMSs
Object-Oriented Data Model (OODM)
Data model that captures the semantics of objects supported in object-oriented programming.
Object-Oriented Database (OODB)
Persistent and sharable collection of objects defined by an OODM.
Object-Oriented DBMS (OODBMS)
Manager of an OODB.
Advantages of OODBMSs
Enriched Modeling Capabilities. Extensibility. Removal of Impedance Mismatch. More Expressive Query Language. Support for Schema Evolution. Support for Long Duration Transactions. Applicability to Advanced Database Applications. Improved Performance.
Disadvantages of OODBMSs
Lack of Experience. Lack of Standards. Competition from RDBMSs. Complexity. Lack of Support for Views. Lack of Support for Security.
ORDBMSs
Vendors of RDBMSs are conscious of the threat and promise of OODBMSs. They agree that RDBMSs are not currently suited to advanced database applications and that added functionality is required. They reject the claim that ORDBMSs will not provide sufficient functionality or will be too slow to cope adequately with the new complexity. They believe they can remedy the shortcomings of the relational model by extending it with OO features.
ORDBMSs - Features
OO features added to the relational model include user-extensible types, encapsulation, inheritance, polymorphism, dynamic binding of methods, complex objects including non-1NF objects, and object identity.
ORDBMSs - Features
Although products differ, ORDBMSs all share basic relational tables and a query language, all have some concept of object, and some can store methods (or procedures or triggers).
Advantages of ORDBMSs
Reuse and sharing: reuse comes from the ability to extend the server to perform standard functionality centrally, rather than coding it in each application; this gives rise to increased productivity for both the developer and the end-user.
Preserves the significant body of knowledge and experience that has gone into developing relational applications.
Disadvantages of ORDBMSs
Complexity. Increased costs. Proponents of relational approach believe simplicity and purity of relational model are lost. Some believe RDBMS is being extended for what will be a minority of applications. OO purists not attracted by extensions either.
Since the 1970s, organizations have gained competitive advantage through systems that automate business processes to offer more efficient and cost-effective services to the customer. This has resulted in the accumulation of growing amounts of data in operational databases. The focus is now on ways to use operational data to support decision-making, as a means of gaining competitive advantage.
Operational systems were never designed to support such business activities, so using them for decision-making may not be an easy solution. Businesses typically have numerous operational systems with overlapping and sometimes contradictory definitions (such as data types). The challenge is to turn archives of data into a source of knowledge, so that a single integrated/consolidated view of the organization's data is presented to the user.
Data warehouse was deemed the solution to meet the requirements of a system capable of supporting decision-making, receiving data from multiple operational data sources.
Subject-Oriented Data
Warehouse is organized around major subjects of the enterprise (e.g. customers, products, sales) rather than major application areas (e.g. customer invoicing, stock control, product sales).
This is reflected in the need to store decision-support data rather than application-oriented data.
Integrated Data
Data warehouse integrates corporate application-oriented data from different source systems, which often includes data that is inconsistent. Integrated data source must be made consistent to present a unified view of the data to the users.
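A minimal sketch of making inconsistent source data consistent before it enters the warehouse. The two source systems, their field names, and the mapping rules (for example, that code `1` means male in one system) are hypothetical assumptions for illustration.

```python
from datetime import datetime

# Two hypothetical source systems encode the same facts differently.
invoicing_rows = [{"cust_id": "C1", "sex": "M", "joined": "03/15/2001"}]
crm_rows = [{"customer": "C2", "gender": 0, "joined": "2002-07-01"}]

# Assumed mapping of each source's gender codes onto one warehouse convention.
GENDER_CODES = {"M": "male", "F": "female", 1: "male", 0: "female"}

def from_invoicing(row):
    return {"customer_id": row["cust_id"],
            "gender": GENDER_CODES[row["sex"]],
            "joined": datetime.strptime(row["joined"], "%m/%d/%Y").date()}

def from_crm(row):
    return {"customer_id": row["customer"],
            "gender": GENDER_CODES[row["gender"]],
            "joined": datetime.strptime(row["joined"], "%Y-%m-%d").date()}

# One unified view: same field names, codes, and date type for every customer.
customers = [from_invoicing(r) for r in invoicing_rows] + [from_crm(r) for r in crm_rows]
print(customers)
```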
Time-Variant Data
Data in the warehouse is only accurate and valid at some point in time or over some time interval. Time-variance is also shown in the extended time that the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.
Non-Volatile Data
Data in the warehouse is not updated in real-time but is refreshed from operational systems on a regular basis. New data is always added as a supplement to the database, rather than a replacement.
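Time-variant and non-volatile data can be sketched together: each refresh appends a dated snapshot rather than overwriting earlier values. This is a minimal illustration with a hypothetical in-memory table of (snapshot date, product, stock level) rows.

```python
from datetime import date

warehouse = []   # rows of (snapshot_date, product, stock_level)

def refresh(snapshot_date, operational_rows):
    """Append current operational values as a new snapshot; never update in place."""
    for product, stock in operational_rows.items():
        warehouse.append((snapshot_date, product, stock))

def as_of(snapshot_date):
    """Return the values that were valid at the given snapshot."""
    return {p: s for d, p, s in warehouse if d == snapshot_date}

refresh(date(2004, 1, 31), {"P1": 120, "P2": 40})
refresh(date(2004, 2, 29), {"P1": 95, "P2": 60})   # January rows remain untouched
print(as_of(date(2004, 1, 31)))
```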
Typical Architecture of a DW
Operational data:
Supplied from mainframes, proprietary file systems, private workstations and servers, and external systems such as the Internet.
Operational data store (ODS):
Repository of current and integrated operational data used for analysis. Often structured and supplied with data in the same way as the data warehouse. May act simply as a staging area for data to be moved into the warehouse.
Typical Architecture of a DW
Load manager:
Performs all operations associated with the extraction and loading of data into the warehouse.
Warehouse manager:
Performs all operations associated with the management of data in the warehouse, such as merging data sources.
Query manager:
Performs all operations associated with the management of user queries.
Typical Architecture of a DW
Detailed data:
Not stored online but made available by summarizing data to the next level of detail. However, detailed data is regularly added to the warehouse to supplement the summarized data.
Lightly and highly summarized data:
Predefined and generated by the warehouse manager and stored in the warehouse. The purpose is to speed up the performance of queries. Updated continuously as new data is loaded into the warehouse.
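A minimal sketch of how summarized data might be pre-computed from detailed rows so that queries read a small summary instead of scanning every sale. The sales rows, branch numbers, and the choice of monthly grouping are hypothetical.

```python
from collections import defaultdict

detailed_sales = [  # (sale_date, branch, amount)
    ("2004-01-05", "B003", 250.0),
    ("2004-01-20", "B003", 100.0),
    ("2004-01-11", "B005", 300.0),
    ("2004-02-02", "B003", 150.0),
]

def summarize(rows, key):
    """Total the amount column, grouped by whatever key(row) returns."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)

# Lightly summarized: monthly totals per branch.
monthly_by_branch = summarize(detailed_sales, key=lambda r: (r[0][:7], r[1]))
# Highly summarized: monthly totals across all branches.
monthly_total = summarize(detailed_sales, key=lambda r: r[0][:7])
print(monthly_total)
```

When new detailed rows are loaded, the warehouse manager would recompute (or incrementally update) these summaries.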
Typical Architecture of a DW
Meta-data:
Used by all processes in the warehouse.
End-user access tools:
The principal purpose of data warehousing is to provide information to business users for strategic decision-making. Users interact with the warehouse using end-user access tools. The warehouse must efficiently support ad hoc and routine analysis. Tools include EIS, OLAP, and data mining tools.
Data Mart
Subset of data warehouse that supports requirements of particular department or business function.
Characteristics include:
Holds subset of data in warehouse in summary form. Focuses on requirements of one department or business function. Can be stand-alone or linked to warehouse. Popular because less complex than warehouse.
Data marts can be two-tier or three-tier database applications: the data warehouse is the optional first tier, the data mart is the second tier, and the end-user workstation is the third tier. Data is distributed among the tiers.
Reasons for creating a data mart: to give users access to the data they need to analyze most often; to provide data in a form that matches the collective view of the data held by a group of users in a department or business area; to improve end-user response time due to the reduction in the volume of data to be accessed; and to provide appropriately structured data as dictated by the requirements of end-user access tools.
A data mart is simpler to build than a corporate data warehouse. The cost of implementation is normally less than that required to establish a data warehouse. Potential users are more clearly defined and can be more easily targeted to obtain support for a data mart project than for a corporate data warehouse project.
Introducing OLAP
Dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data.
Describes a technology that uses a multidimensional view of aggregate data to provide quick access to strategic information for purposes of advanced analysis.
Introducing OLAP
Enables users to gain deeper understanding and knowledge about various aspects of their corporate data through fast, consistent, interactive access to wide variety of possible views of data. Allows users to view corporate data in such a way that it is a better model of the true dimensionality of the enterprise.
Introducing OLAP
OLAP can easily answer "who?" and "what?" questions; however, the ability to answer "what if?" and "why?" questions distinguishes OLAP from general-purpose query tools. The types of analysis range from basic navigation and browsing (slicing and dicing), to calculations, to more complex analyses such as time series and complex modeling.
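Slicing, dicing, and rolling up can be sketched on a tiny data cube held as a dictionary keyed by (region, product, quarter). This is a minimal illustration, not how an OLAP server stores cubes; the data and function names are hypothetical.

```python
from collections import defaultdict

cube = {
    ("North", "Shirts", "Q1"): 100, ("North", "Shirts", "Q2"): 120,
    ("North", "Shoes", "Q1"): 80,   ("South", "Shirts", "Q1"): 90,
    ("South", "Shoes", "Q2"): 70,
}
DIMS = ("region", "product", "quarter")

def slice_(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def dice(cube, **allowed):
    """Dice: keep cells whose coordinates lie in a sub-cube over several dimensions."""
    return {k: v for k, v in cube.items()
            if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())}

def roll_up(cube, keep):
    """Roll up: aggregate away every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for k, v in cube.items():
        totals[tuple(k[i] for i in idx)] += v
    return dict(totals)

print(roll_up(cube, ["region"]))   # {('North',): 300, ('South',): 160}
```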
OLAP Applications
Essential requirement of all OLAP applications is ability to provide users with just-in-time (JIT) information, to make effective decisions about an organization's strategic directions. JIT information is computed data that usually reflects complex relationships and is often calculated on the fly. Practical only if response times are consistently short and data model flexible.
OLAP Applications
Although OLAP applications are found in widely divergent functional areas, all have the following key features: multi-dimensional views of data, support for complex calculations, and time intelligence.
Time intelligence is a key feature of almost any analytical application, as performance is almost always judged over time.
OLAP Benefits
Increased productivity of end-users. Reduced backlog of applications development for IT staff. Retention of organizational control over the integrity of corporate data. Reduced query drag and network traffic on OLTP systems or on the data warehouse. Improved potential revenue and profitability.
Data Mining
Process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions.
Involves the analysis of data and the use of software techniques for finding hidden and unexpected patterns and relationships in sets of data.
Data Mining
Data mining starts by developing an optimal representation of the structure of sample data, during which time knowledge is acquired and extended to larger sets of data. Data mining can provide huge paybacks for companies that have made a significant investment in data warehousing. It is a relatively new technology, but it is already used in a number of industries.
Retail / Marketing
Identifying buying patterns of customers. Finding associations among customer demographic characteristics. Predicting response to mailing campaigns. Market basket analysis.
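Market basket analysis can be sketched by computing support and confidence for simple "X -> Y" association rules. This minimal version only enumerates item pairs over a handful of hypothetical transactions; production tools use algorithms such as Apriori to find larger itemsets efficiently. The thresholds are arbitrary choices.

```python
from itertools import combinations

baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]

def support(itemset):
    """Fraction of baskets containing every item in itemset."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def rules(min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for pair rules above both thresholds."""
    items = sorted(set().union(*baskets))
    found = []
    for x, y in combinations(items, 2):
        pair_support = support({x, y})
        if pair_support < min_support:
            continue
        for lhs, rhs in ((x, y), (y, x)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                found.append((lhs, rhs, pair_support, confidence))
    return found

for lhs, rhs, s, c in rules():
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```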
Banking
Detecting patterns of fraudulent credit card use. Identifying loyal customers. Predicting customers likely to change their credit card affiliation. Determining credit card spending by customer groups.
Insurance
Claims analysis. Predicting which customers will buy new policies.
Medicine
Characterizing patient behavior to predict surgery visits. Identifying successful medical therapies for different illnesses.
Suitability for certain input data types. Transparency of the mining output. Tolerance of missing variable values. Level of accuracy possible. Ability to handle large volumes of data.
Web-database integration
Just over a decade after its conception in 1989, the Web is arguably the most popular and powerful networked information system to date. Its growth has been near exponential, and it has started an information revolution that will continue through the next decade. The combination of the Web and databases now brings many new opportunities for creating advanced database applications.
Web-database integration
The Web is a compelling platform for the delivery and dissemination of data-centric, interactive applications. Organizations are now rapidly building new database applications, or reengineering existing ones, to take advantage of the Web as a strategic platform for implementing innovative business solutions, in effect becoming Web-centric organizations.
An HTML/XML document stored in a file is a static Web page. The content of a dynamic Web page is generated each time it is accessed. Thus, a dynamic Web page can:
respond to user input from the browser; be customized by and for each user.
This requires scripts that convert data from different formats into HTML/XML on the fly. As a database is dynamic, changing as users create, insert, update, and delete data, generating dynamic Web pages is a much more appropriate approach than creating static ones.
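A minimal sketch of generating a dynamic Web page: each call queries the database and builds HTML on the fly, so the page always reflects current data and the user's choice. It covers only the page-generation step (no HTTP server, CGI, or servlet layer); the table, columns, and layout are hypothetical.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE property (property_no TEXT, city TEXT, rent INTEGER)")
conn.executemany("INSERT INTO property VALUES (?, ?, ?)",
                 [("PG4", "Glasgow", 350), ("PL94", "London", 400), ("PG36", "Glasgow", 375)])

def render_properties(city):
    """Build a page for the city chosen by the user (e.g. from a form field)."""
    # Parameter binding keeps user input out of the SQL text.
    rows = conn.execute(
        "SELECT property_no, rent FROM property WHERE city = ? ORDER BY rent", (city,))
    # html.escape stops user-supplied or stored text from being read as markup.
    items = "".join(f"<li>{html.escape(no)}: {rent}</li>" for no, rent in rows)
    return (f"<html><body><h1>Properties in {html.escape(city)}</h1>"
            f"<ul>{items}</ul></body></html>")

print(render_properties("Glasgow"))
```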
Requirements for Web-DBMS integration include:
The ability to access valuable corporate data in a secure manner. Data- and vendor-independent connectivity to allow freedom of choice in DBMS selection. The ability to interface to the database independently of any proprietary browser or Web server. A connectivity solution that takes advantage of all the features of an organization's DBMS.
An open architecture to allow interoperability with a variety of systems and technologies. A cost-effective solution that allows for scalability, growth, and changes in strategic directions, and helps reduce applications development costs. Support for transactions that span multiple HTTP requests.
Support for session- and application-based authentication. Acceptable performance. Minimal administration overhead. A set of high-level productivity tools to allow applications to be developed, maintained, and deployed with relative ease and speed.
Approaches to integrating databases into the Web environment include: Scripting Languages. Common Gateway Interface (CGI). HTTP Cookies. Extending the Web Server. Java, JDBC, SQLJ, Servlets, and JSP. Vendor-specific solutions such as:
Microsoft Web Solution Platform: ASP and ADO. Oracle Internet Platform.
Most documents on the Web are currently stored and transmitted in HTML. One strength of HTML is its simplicity. However, this simplicity is also one of its weaknesses, as there is a growing need from users who want tags to simplify some tasks and make HTML documents more attractive and dynamic.
XML
To satisfy this demand, vendors introduced some browser-specific HTML tags, which made it difficult to develop sophisticated, widely viewable Web documents. W3C has produced a new standard called XML, which could preserve the general application independence that makes HTML portable and powerful.
XML
Meta-language (language for describing other languages) that enables designers to create their own customized tags to provide functionality not available with HTML.
Restricted version of SGML (Standard Generalized Markup Language), designed especially for Web documents.
XML
XML is set to impact every aspect of programming, including graphical interfaces, embedded systems, distributed systems, and database management. It is becoming the de facto standard for data communication within the software industry, and is quickly replacing EDI as the primary medium for data interchange among businesses. Some analysts believe it will become the language in which most documents are created and stored, both on and off the Internet.
As the amount of data in XML expands, there will be increasing demand to store, retrieve, and query this data. Two main models are anticipated:
data-centric and document-centric.
Data-Centric XML
The fact that data is stored/transferred as XML is incidental. In this case, data could be stored in an RDBMS, ORDBMS, or OODBMS. Oracle has completely integrated XML into its Oracle 9i system.
XML can be stored as entire documents using the data types XMLType or CLOB/BLOB, or it can be decomposed into its constituent elements and stored that way. The Oracle query language has been extended to permit searching of XML-based content.
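The two storage options can be sketched with Python's standard library: keep the whole document as a single text value (loosely analogous to a CLOB column) and also decompose ("shred") it into relational rows. This uses SQLite and ElementTree, not Oracle's XMLType API; the element names, tables, and data are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

doc = """<staffList>
  <staff branch="B005"><staffNo>SL21</staffNo><name>John White</name><salary>30000</salary></staff>
  <staff branch="B003"><staffNo>SG37</staffNo><name>Ann Beech</name><salary>12000</salary></staff>
</staffList>"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xml_docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE staff (staff_no TEXT, name TEXT, salary INTEGER, branch TEXT)")

# Option 1: store the entire document as a single value.
conn.execute("INSERT INTO xml_docs (body) VALUES (?)", (doc,))

# Option 2: decompose the document into its constituent elements and attributes.
root = ET.fromstring(doc)
for s in root.findall("staff"):
    conn.execute("INSERT INTO staff VALUES (?, ?, ?, ?)",
                 (s.findtext("staffNo"), s.findtext("name"),
                  int(s.findtext("salary")), s.get("branch")))

# The shredded data can now be queried with ordinary SQL.
high_paid = conn.execute("SELECT name FROM staff WHERE salary > 20000").fetchall()
print(high_paid)
```

Shredding makes SQL queries easy but loses document order and any content not mapped to a column, whereas whole-document storage preserves the document exactly but needs XML-aware functions to search inside it.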
Document-Centric XML
Documents designed for human consumption (e.g., books, newspapers, email). Data may be irregular or incomplete, and its structure may change rapidly or unpredictably. Unfortunately, RDBMSs, ORDBMSs, and OODBMSs do not handle data of this nature particularly well. Content management systems are important tools for handling these types of documents. Underlying such a system, one may now find a native XML database.
Native XML Database
Defines a (logical) data model for an XML document (as opposed to the data in that document) and stores and retrieves documents according to that model. At a minimum, the model must include elements, attributes, PCDATA, and document order. The XML document must be the unit of (logical) storage, although it is not restricted by any underlying physical storage model (so traditional DBMSs are not ruled out).
DBMS vendors have extended SQL to handle query of XML-based content. Standardization of XML extensions to SQL is known as SQL/XML and initial work has been submitted to ISO and ANSI. In addition, W3C formed an XML Query Working Group to produce:
a data model for XML documents; a set of query operators on this model; a query language based on these query operators (called XQuery).
XML XQuery
Queries operate on single documents or fixed collections of documents. They can select entire documents or subtrees of documents that match conditions based on document content and structure. Queries can also construct new documents based on what has been selected. Ultimately, collections of XML documents will be accessed like databases. Web technology is highly dynamic, so expect significant developments over the next few years.
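The select-then-construct pattern described above can be approximated with the limited XPath support in Python's `xml.etree.ElementTree`. This is a stand-in, not an XQuery implementation; the catalog document is hypothetical, and a roughly equivalent XQuery FLWOR expression is given in the comments for comparison.

```python
import xml.etree.ElementTree as ET

# Roughly equivalent XQuery (catalog.xml is a hypothetical file name):
#   <cheapRecentBooks>{
#     for $b in doc("catalog.xml")/catalog/book
#     where $b/@year = "2004" and $b/price < 50
#     return $b/title
#   }</cheapRecentBooks>

catalog = ET.fromstring("""<catalog>
  <book year="2004"><title>Database Systems</title><price>55</price></book>
  <book year="1999"><title>XML Basics</title><price>20</price></book>
  <book year="2004"><title>Data Mining</title><price>40</price></book>
</catalog>""")

# Select subtrees by structure and attribute content (XPath predicate).
recent = catalog.findall("./book[@year='2004']")

# Filter further on element content, then construct a new result document.
result = ET.Element("cheapRecentBooks")
for book in recent:
    if int(book.findtext("price")) < 50:
        result.append(book.find("title"))

print(ET.tostring(result, encoding="unicode"))
# <cheapRecentBooks><title>Data Mining</title></cheapRecentBooks>
```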