You are on page 1of 55

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/200043078

Federated Database Systems for Managing Distributed,


Heterogeneous, and Autonomous Databases

Article  in  ACM Computing Surveys · September 1990


DOI: 10.1145/96602.96604 · Source: doi.acm.org

CITATIONS READS
2,123 828

2 authors, including:

Amit Sheth
Wright State University
984 PUBLICATIONS   27,568 CITATIONS   

SEE PROFILE

Some of the authors of this publication are also working on these related projects:

PREDOSE PROJECT: A STUDY OF SOCIAL WEB DATA ON BUPRENORPHINE ABUSE USING SEMANTIC WEB TECHNOLOGY
(R21DA030571/Daniulaityte; Sheth, PIs) View project

Hazards SEES: Social and Physical Sensing Enabled Decision Support for Disaster Management and Response View project

All content following this page was uploaded by Amit Sheth on 22 May 2014.

The user has requested enhancement of the downloaded file.


Federated Database Systems for Managing Distributed,
Heterogeneous, and Autonomous Databases’
AMIT P. SHETH
Bellcore, lJ-210, 444 Hoes Lane, Piscataway, New Jersey 08854

JAMES A. LARSON
Intel Corp., HF3-02, 5200 NE Elam Young Pkwy., Hillsboro, Oregon 97124

A federated database system (FDBS) is a collection of cooperating database systems that


are autonomous and possibly heterogeneous. In this paper, we define a reference
architecture for distributed database management systems from system and schema
viewpoints and show how various FDBS architectures can be developed. We then define a
methodology for developing one of the popular architectures of an FDBS. Finally, we
discuss critical issues related to developing and operating an FDBS.

Categories and Subject Descriptors: D.2.1 [Software Engineering]: Requirements/


Specifications-methodologies; D.2.10 [Software Engineering]: Design; H.0
[Information Systems]: General; H.2.0 [Database Management]: General; H.2.1
[Database Management]: Logical Design--data models, schema and subs&ma; H.2.4
[Database Management]: Systems; H.2.5 [Database Management]: Heterogeneous
Databases; H.2.7 [Database Management]: Database Administration
General Terms: Design, Management
Additional Key Words and Phrases: Access control, database administrator, database
design and integration, distributed DBMS, federated database system, heterogeneous
DBMS, multidatabase language, negotiation, operation transformation, query processing
and optimization, reference architecture, schema integration, schema translation, system
evolution methodology, system/schema/processor architecture, transaction management

INTRODUCTION tern (DBMS), and one or more databases


that it manages. A federated database sys-
Federated Database System
tem (FDBS) is a collection of cooperating
A database system (DBS) consists of soft- but autonomous component database sys-
ware, called a database management sys- tems (DBSs). The component DBSs are

’ The views and conclusions in this paper are those of the authors and should not be interpreted as necessarily
representing the official policies, either expressed or implied, of Bellcore, Intel Corp., or the authors’ past or
present affiliations. It is the policy of Bellcore to avoid any statements of comparative analysis or evaluation
of vendors’ products. Any mention of products or vendors in this document is done where necessary for the
sake of scientific accuracy and precision, or for background information to a point of technology analysis, or to
provide an example of a technology for illustrative purposes and should not be construed as either positive or
negative commentary on that product or that vendor. Neither the inclusion of a product or a vendor in this
paper nor the omission of a product or a vendor should be interpreted as indicating a position or opinion of
that product or vendor on the part of the author(s) or of Bellcore.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its
date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To
copy otherwise, or to republish, requires a fee and/or specific permission.
0 1990 ACM 0360-0300/90/0900-0183 $01.50

ACM Computing Surveys, Vol. 22, No. 3, September 1990


184 l Amit Sheth and James Larson

CONTENTS or component DBMS, can be a centralized


or distributed DBMS or another FDBMS.
The component DBMSs can differ in such
aspects as data models, query languages,
INTRODUCTION and transaction management capabilities.
Federated Database System One of the significant aspects of an
Characteristics of Database Systems FDBS is that a component DBS can con-
Taxonomy of Multi-DBMS and Federated tinue its local operations and at the same
Database Systems
Scope and Organization of this Paper time participate in a federation. The inte-
1. REFERENCE ARCHITECTURE gration of component DBSs may be man-
1.1 System Components of a Reference aged either by the users of the federation
Architecture or by the administrator of the FDBS
1.2 Processor Types in the Reference
Architecture
together with the administrators of the
1.3 Schema Types in the Reference Architecture component DBSs. The amount of integra-
2. SPECIFIC FEDERATED DATABASE tion depends on the needs of federation
SYSTEM ARCHITECTURES users and desires of the administrators
2.1 Loosely Coupled and Tightly Coupled FDBSs
of the component DBSs to participate in
2.2 Alternative FDBS Architectures
2.3 Allocating Processors and Schemas the federation and share their databases.
to Computers The term federated database system was
2.4 Case Studies coined by Hammer and McLeod [ 19791 and
3. FEDERATED DATABASE SYSTEM Heimbigner and McLeod [1985]. Since its
EVOLUTION PROCESS
3.1 Methodology for Developing a Federated
introduction, the term has been used for
Database System several different but related DBS archi-
4. FEDERATED DATABASE SYSTEM tectures. As explained in this Introduc-
DEVELOPMENT TASKS tion, we use the term in its broader con-
4.1 Schema Translation text and include additional architectural
4.2 Access Control
4.3 Negotiation
alternatives as examples of the federated
4.4 Schema Integration architecture.
5. FEDERATED DATABASE SYSTEM The concept of federation exists in many
OPERATION contexts. Consider two examples from the
5.1 Query Formulation
5.2 Command Transformation
political domain-the United Nations
5.3 Query Processing and Optimization (UN) and the Soviet Union. Both entities
5.4 Global Transaction Management exhibit varying levels of autonomy and
6. FUTURE RESEARCH AND UNSOLVED heterogeneity among the components (sov-
PROBLEMS
ereign nations and the republics, respec-
ACKNOWLEDGMENTS
REFERENCES tively). The autonomy and heterogeneity is
BIBLIOGRAPHY greater in the UN than in the Soviet Union.
GLOSSARY The power of the federation body (the Gen-
APPENDIX: Features of Some eral Assembly of the UN and the central
FDBS/Multi-DBMS Efforts
government of the Soviet Union, respec-
tively) with respect to its components in
the two cases is also different. Just as peo-
ple do not agree on an ideal model or the
integrated to various degrees. The software utility of a federation for the political
that provides controlled and coordinated bodies and the governments, the database
manipulation of the component DBSs is context has no single or ideal model of
called a federated database management federation. A key characteristic of a feder-
system (FDBMS) (see Figure 1). ation, however, is the cooperation among
Both databases and DBMSs play impor- independent systems. In terms of an FDBS,
tant roles in defining the architecture of an it is reflected by controlled and sometimes
FDBS. Component database refers to a da- limited integration of autonomous DBSs.
tabase of a component DBS. A component The goal of this survey is to discuss the
DBS can participate in more than one fed- application of the federation concept for
eration. The DBMS of a component DBS, managing existing heterogeneous and au-
ACM Computing Surveys, Vol. 22, No. 3, September 1990
Federated Database Systems l 185
FDBS
FDBMS

...

Figure 1. An FDBS and its components.

tonomous DBSs. We describe various ar- tion management strategies) has been pro-
chitectural alternatives and components of posed by Elmagarmid [1987]. Such a
a federated database system and explore characterization is particularly relevant to
the issues related to developing and oper- the study and development of transaction
ating such a system. The survey assumes management in FDBMS, an aspect of
an understanding of the concepts in basic FDBS that is beyond the scope of this
database management textbooks [ Ceri and paper.
Pelagatti 1984; Date 1986; Elmasri and
Navathe 1989; Tsichritzis and Lochovsky Distribution
19821 such as data models, the ANSI/
SPARC schema architecture, database de- Data may be distributed among multiple
sign, query processing and optimization, databases. These databases may be stored
transaction management, and distributed on a single computer system or on multiple
database management. computer systems, co-located or geograph-
ically distributed but interconnected by a
Characteristics of Database Systems
communication system. Data may be dis-
tributed among multiple databases in dif-
Systems consisting of multiple DBSs, of ferent ways. These include, in relational
which FDBSs are a specific type, may be terms, vertical and horizontal database par-
characterized along three orthogonal di- titions. Multiple copies of some or all of the
mensions: distribution, heterogeneity, and data may be maintained. These copies need
autonomy. These dimensions are discussed not be identically structured.
below with an intent to classify and define Benefits of data distribution, such as in-
such systems. Another characterization creased availability and reliability as well
based on the dimensions of the networking as improved access times, are well known
environment [single DBS, many DBSs in a [Ceri and Pelagatti 19841. In a distributed
local area network (LAN), many DBSs in DBMS, distribution of data may be in-
a wide area network (WAN), many net- duced; that is, the data may be deliberately
works], update related functions of partic- distributed to take advantage of these ben-
ipating DBSs (e.g., no update, nonatomic efits. In the case of FDBS, much of the
updates, atomic updates), and the types of data distribution is due to the existence of
heterogeneity (e.g., data models, transac- multiple DBSs before an FDBS is built.
ACM Computing Surveys, Vol. 22, No. 3, September 1990
186 l Amit Sheth and James Larson
derlying data model used to define data
DatabaseSystems structures and constraints. Both represen-
Differencesin DBMS tation (structure and constraints) and lan-
-data models guage aspects can lead to heterogeneity.
(structures,constraints,querylanguages)
-system levelsupport l Differences in structure: Different
(concurrencycontrol,commit,recovery) data models provide different structural
SemanticHeterogeneity primitives [e.g., the information modeled
C using a relation (table) in the relational
OperatingSystem 0 model may be modeled as a record type
-file systems m in the CODASYL model]. If the two rep-
-naming, file types,operations m
U resentations have the same information
-transaction support n content, it is easier to deal with the dif-
-interprocess communication I
C ferences in the structures. For example,
Hardware/System a address can be represented as an entity
-instruction set t in one schema and as a composite attri-
-data formats8 representation I
0 bute in another schema. If the informa-
-configuration n tion content is not the same, it may be
very difficult to deal with the difference.
Figure 2. Types of heterogeneities. As another example, some data models
(notably semantic and object-oriented
models) support generalization (and
property inheritance) whereas others do
not.
Many types of heterogeneity are due to l Differences in constraints: Two data
technological differences, for example, dif- models may support different con-
ferences in hardware, system software straints. For example, the set type in a
(such as operating systems), and commu- CODASYL schema may be partially
nication systems. Researchers and devel- modeled as a referential integrity con-
opers have been working on resolving such straint in a relational schema. CODA-
heterogeneities for many years. Several SYL, however, supports insertion and
commercial distributed DBMSs are avail- retention constraints that are not cap-
able that run in heterogeneous hardware tured by the referential integrity con-
and system software environments. straint alone. Triggers (or some other
The types of heterogeneities in the da- mechanism) must be used in relational
tabase systems can be divided into those systems to capture such semantics.
due to the differences in DBMSs and those l Differences in query languages:
due to the differences in the semantics of Different languages are used to manipu-
data (see Figure 2). late data represented in different data
models. Even when two DBMSs support
the same data model, differences in their
Heterogeneities due to Differences in DBMSs
query languages (e.g., QUEL and SQL)
An enterprise may have multiple DBMSs. or different versions of SQL supported
Different organizations within the enter- by two relational DBMSs could contrib-
prise may have different requirements and ute to heterogeneity.
may select different DBMSs. DBMSs Differences in the system aspects of the
purchased over a period of time may be DBMSs also lead to heterogeneity. Exam-
different due to changes in technology. Het- ples of system level heterogeneity include
erogeneities due to differences in DBMSs differences in transaction management
result from differences in data models and primitives and techniques (including
differences at the system level. These are concurrency control, commit protocols,
described below. Each DBMS has an un- and recovery), hardware and system

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 187
software requirements, and communication ams between 61 and 75, it may not be
capabilities. possible to correlate it to a score in
DB2.CLASS.SCORE because both 73 and
77 would have been represented by a score
Semantic Heterogeneity
of 7.5.
Semantic heterogeneity occurs when there Detecting semantic heterogeneity is a
is a disagreement about the meaning, inter- difficult problem. Typically, DBMS sche-
pretation, or intended use of the same or mas do not provide enough semantics to
related data. A recent panel on semantic interpret data consistently. Heterogeneity
heterogeneity [Cercone et al. 19901 showed due to differences in data models also con-
that this problem is poorly understood and tributes to the difficulty in identifica-
that there is not even an agreement regard- tion and resolution of semantic hetero-
ing a clear definition of the problem. Two geneity. It is also difficult to decouple
examples to illustrate the semantic heter- the heterogeneity due to differences in
ogeneity problem follow. DBMSs from those resulting from semantic
Consider an attribute MEAL-COST of heterogeneity.
relation RESTAURANT in database DBl
that describes the average cost of a meal
per person in a restaurant without service Autonomy
charge and tax. Consider an attribute by
the same name (MEAL-COST) of relation The organizational entities that manage
BOARDING in database DB2 that de- different DBSs are often autonomous. In
scribes the average cost of a meal per per- other words, DBSs are often under separate
son including service charge and tax. Let and independent control. Those who con-
both attributes have the same syntactic trol a database are often willing to let others
properties. Attempting to compare at- share the data only if they retain control.
tributes DBl.RESTAURANTS.MEAL- Thus, it is important to understand the
COST and DBS.BOARDING.MEAL- aspects of component autonomy and how
COST is misleading because they are they can be addressed when a component
semantically heterogeneous. Here the DBS participates in an FDBS.
heterogeneity is due to differences in A component DBS participating in an
the definition (i.e., in the meaning) of FDBS may exhibit several types of auton-
related attributes [Litwin and Abdellatif omy. A classification discussed by Veijalai-
19861. nen and Popescu-Zeletin [ 19881 includes
As a second example, consider an attri- three types of autonomy: design, commu-
nication, and execution. These and an ad-
bute GRADE of relation COURSE in
database DBl. Let COURSE.GRADE de- ditional type of component autonomy
scribe the grade of a student from the set called association autonomy are discussed
of values {A, B, C, D, FJ. Consider another below.
attribute SCORE of relation CLASS in da- Design autonomy refers to the ability of
tabase DB2. Let SCORE denote a normal- a component DBS to choose its own design
ized score on the scale of 0 to 10 derived by with respect to any matter, including
first dividing the weighted score of all ex-
ams on the scale of 0 to 100 in the course (a) The data being managed (i.e., the Uni-
and then rounding the result to the nearest verse of Discourse),
half-point. DBl.COURSE.GRADE and (b) The representation (data model, query
DBB.CLASS.SCORE are semantically het- language) and the naming of the data
erogeneous. Here the heterogeneity is due elements,
to different precision of the data values (c) The conceptualization or semantic
taken by the related attributes. For exam- interpretation of the data (which
ple, if grade C in DBl.COURSE.GRADE greatly contributes to the problem of
corresponds to a weighted score of all ex- semantic heterogeneity),

ACM Computing Surveys, Vol. 22, No. 3, September 1990


188 l Amit Sheth and James Larson

(d) Constraints (e.g., semantic integrity and resources (i.e., the data it manages)
constraints and the serializability cri- with others. This includes the ability to
teria) used to manage the data, associate or disassociate itself from the fed-
(e) The functionality of the system (i.e., eration and the ability of a component DBS
the operations supported by system), to participate in one or more federations.
(f) The association and sharing with other Association autonomy may be treated as
systems (see association autonomy be- a part of the design autonomy or as an
low), and autonomy in its own right. Alonso and
Barbara [1989] discuss the issues that are
k) The implementation (e.g., record and
relevant to this type of autonomy.
file structures, concurrency control
A subset of the above types of autonomy
algorithms).
were also identified by Heimbigner and
Heterogeneity in an FDBS is primarily McLeod [1985]. Du et al. [1990] use the
caused by design autonomy among compo- term local autonomy for the autonomy of a
nent DBSs. component DBS. They define two types of
The next two types of autonomy involve local autonomy requirements: operation
the DBMS of a component DBS. Commu- autonomy requirements and service auton-
nication autonomy refers to the ability of omy requirements. Operation autonomy re-
a component DBMS to decide whether quirements relate to the ability of a
to communicate with other component component DBS to exercise control over its
DBMSs. A component DBMS with com- database. These include the requirements
munication autonomy is able to decide related to design and execution autonomy.
when and how it responds to a request from Service autonomy requirements relate to the
another component DBMS. right of each component DBS to make de-
Execution autonomy refers to the ability cisions regarding the services it provides to
of a component DBMS to execute local other component DBSs. These include the
operations (commands or transactions sub- requirements related to association and
mitted directly by a local user of the com- communication autonomy. Garcia-Molina
ponent DBMS) without interference from and Kogan [1988] provide a different clas-
external operations (operations submitted sification of the types of autonomy. Their
by other component DBMSs or FDBMSs) classification is particularly relevant to the
and to decide the order in which to execute operating system and transaction manage-
external operations. Thus, an external sys- ment issues.
tem (e.g., FDBMS) cannot enforce an order The need to maintain the autonomy of
of execution of the commands on a com- component DBSs and the need to share
ponent DBMS with execution autonomy. data often present conflicting require-
Execution autonomy implies that a com- ments. In many practical environments, it
ponent DBMS can abort any operation that may not be desirable to support the auton-
does not meet its local constraints and that omy of component DBSs fully. Two exam-
its local operations are logically unaffected ples of relaxing the component autonomy
by its participation in an FDBS. Further- follow:
more, the component DBMS does not need
to inform an external system of the order l Association autonomy requires that each
in which external operations are executed component DBS be free to associate or
and the order of an external operation with disassociate itself from the federation.
respect to local operations. Operationally, This would require that the FDBS be
a component DBMS exercises its execution designed so that its existence and opera-
autonomy by treating external operations tion are not dependent on any single
in the same way as local operations. component DBS. Although this may be a
Association autonomy implies that a com- desirable design goal, the FDBS may
ponent DBS has the ability to decide moderate it by requiring that the entry
whether and how much to share its func- or departure of a component DBS must
tionality (i.e., the operations it supports) be based on an agreement between the

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 189

federation (i.e., its representative entity Different architectures and types of


such as the administrator of the FDBS) FDBSs are created by different levels of
and the component DBS (i.e., the admin- integration of the component DBSs and by
istrator of a component DBS) and cannot different levels of global (federation) serv-
be a unilateral decision of the component ices. We will use the taxonomy shown in
DBS. Figure 3 to compare the architectures of
l Execution autonomy allows a component various research and development efforts.
DBS to decide the order in which exter- This taxonomy focuses on the autonomy
nal and local operations are performed. dimension. Other taxonomies are possible
Futhermore, the component DBS need by focusing on the distribution and heter-
not inform the external system (e.g., ogeneity dimensions. Some recent publica-
FDBS) of this order. This latter aspect tions discussing various architectures or
of autonomy may, however, be relaxed by different taxonomies include Eliassen and
informing the FDBS of the order of Veijalainen [ 19881, Litwin and Zeroual
transaction execution (or transaction [ 19881, Ozsu and Valduriez [ 19901, and
wait-for graph) to allow simpler and Ram and Chastain [ 19891.
more efficient management of global MDBSs can be classified into two types
transactions. based on the autonomy of the component
DBSs: nonfederated database systems and
federated database systems. A nonfederated
Taxonomy of Multi-DBMS and Federated
database system is an integration of com-
Database Systems
ponent DBMSs that are not autonomous.
A DBS may be either centralized or distrib- It has only one level of management,2 and
uted. A centralized DBS system consists of all operations are performed uniformly. In
a single centralized DBMS managing a sin- contrast to a federated database system, a
gle database on the same computer system. nonfederated database system does not dis-
A distributed DBS consists of a single dis- tinguish local and nonlocal users. A partic-
tributed DBMS managing multiple data- ular type of nonfederated database system
bases. The databases may reside on a single in which all databases are fully integrated
computer system or on multiple computer to provide a single global (sometimes called
systems that may differ in hardware, sys- enterprise or corporate) schema can be
tem software, and communication support. called a unified MDBS. It logically appears
A multidatabase system (MDBS) supports to its users like a distributed DBS.
operations on multiple component DBSs. A federated database system consists of
Each component DBS is managed by (per- component DBSs that are autonomous yet
haps a different) component DBMS. A participate in a federation to allow partial
component DBS in an MDBS may be cen- and controlled sharing of their data. Asso-
tralized or distributed and may reside on ciation autonomy implies that the compo-
the same computer or on multiple com- nent DBSs have control over the data they
puters connected by a communication sub- manage. They cooperate to allow different
system. An MDBS is called a homogeneous degrees of integration. There is no central-
MDBS if the DBMSs of all component ized control in a federated architecture be-
DBSs are the same; otherwise it is called a cause the component DBSs (and their
heterogeneous MDBS. A system that only database administrators) control access to
allows periodic, nontransaction-based ex- their data.
change of data among multiple DBMSs FDBS represents a compromise between
(e.g., EXTRACT [Hammer and Timmer- no integration (in which users must explic-
man 19891) or one that only provides access itly interface with multiple autonomous da-
to multiple DBMSs one at a time (e.g., no tabases) and total integration (in which
joins across two databases) is not called an
MDBS. The former is a data exchange sys- * This definition may be diluted to include two levels
tem; the latter is a remote DBMS interface of management, where the global level has the author-
[Sheth 1987a]. ity for controlling data sharing.

ACM Computing Surveys, Vol. 22, No. 3, September 1990


190 l Amit Sheth and James Larson
Multidatabase will consist of heterogeneous component
Systems DBSs. In the rest of this paper, we will use
the term FDBS to describe a heterogeneous
distributed DBS with autonomy of compo-
nent DBSs.
Nonfederated Federated FDBSs can be categorized as loosely
DatabaseSystems DatabaseSystems coupled or tightly coupled based on who
e.g., UNIBASE /\ manages the federation and how the com-
[Brzezinskiet 784 \ ponents are integrated. An FDBS is loosely
coupled if it is the user’s responsibility to
create and maintain the federation and
Loosely Coupled Tightly Coupled there is no control enforced by the feder-
e.g., MRDSM ated system and its administrators. Other
[Litwin 19851 terms used for loosely coupled FDBSs are
interoperable database system [Litwin and
/\ Abdellatif 19861 and multidatabase system
Single Multiple [Litwin et al. 1982].3 A federation is tightly
Federation Fedsrations coupled if the federation and its adminis-
e.g., DDTS e.g., Mermaid
trator(s) have the responsibility for creat-
[Dwyerand Larson19871 [Templetonet al. 1987a]
ing and maintaining the federation and
Figure 3. Taxonomy of multidatabase systems. actively control the access to component
DBSs. Association autonomy dictates that,
in both cases, sharing of any part of a
autonomy of each component DBS is sac- component database or invoking a capabil-
rificed so that users can access data through ity (i.e., an operation) of a component DBS
a single global interface but cannot directly is controlled by the administrator of the
access a DBMS as a local user). The fed- component DBS.
erated architecture is well suited for mi- A federation is built by a selective and
grating a set of autonomous and stand- controlled integration of its components.
alone DBSs (i.e., DBSs that are not sharing The activity of developing an FDBS results
data) to a system that allows partial and in creating a federated schema upon which
controlled sharing of data without affecting operations (i.e., query and/or updates) are
existing applications (and hence preserving performed. A loosely coupled FDBS always
significant investment in existing applica- supports multiple federated schemas. A
tion software). tightly coupled FDBS may have one or
To allow controlled sharing while pre- more federated schemas. A tightly coupled
serving the autonomy of component DBSs FDBS is said to have single federation if it
and continued execution of existing appli- allows the creation and management of
cations, an FDBS supports two types of only one federated schema.* Having a single
operations: local and global (or federation).
This dichotomy of local and global opera- 3 The term multidatabase has been used by different
people to mean different things. For example, Litwin
tions is an essential feature of an FDBS. [1985] and Rusinkiewicz et al. [1989] use the term
Global operations involve data access using multidatabase to mean loosely coupled FDBS (or in-
the FDBMS and may involve data managed teroperable system) in our taxonomy; Ellinghaus et al.
by multiple component DBSs. Component [1988] and Veijalainen and Popescu-Zeletin [1988] use
it to mean client-server type of FDBS in our taxon-
DBSs must grant permission to access the omy; and Dayal and Hwang [1984], Belcastro et al.
data they manage. Local operations are [1988], and Breitbart and Silberschatz [1988] use it to
submitted to a component DBS directly. mean tightly coupled FDBS in our taxonomy.
They involve only data in that component 4 Note that a tightly coupled FDBS with a single
DBS. A component DBS, however, does not federated schema is not the same as a unified MDBS
but is a special case of the latter. It espouses the
need to distinguish between local and global federation concepts such as autonomy of component
operations. In moSt environment% the DBMS~, dichotomy of operations, and controlled
FDBS will also be heterogeneous, that is, sharing that a unified MDBS does not.

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 191
federated schema helps in maintaining uni- A type of FDBS architecture called the
formity in semantic interpretation of the client-server architecture has been dis-
integrated data. A tightly coupled FDBS is cussed by Ge et al. [ 19871 and Eliassen and
said to have multiple federations if it allows Veijalainen [1988]. In such a system, there
the creation and management of multiple is an explicit contract between a client and
federated schemas. Having multiple feder- one or more servers for exchanging infor-
ated schemas readily allows multiple inte- mation through predefined transactions. A
grations of component DBSs. Constraints client-server system typically does not al-
involving multiple component DBS, how- low ad hoc transactions because the server
ever, may be difficult to enforce. An orga- is designed to respond to a set of predefined
nization wanting to exercise tight control requests. The schema architecture of a
over the data (treated as a corporate re- client-server system is usually quite simple.
source) and the enforcement of constraints The schema of each server is directly
(including the so-called business rules) may mapped to the schema of the client. Thus
choose to allow only one federated schema. the client-server architecture can be con-
The terms federated database system and sidered to be a tightly coupled one for
federated database architecture were intro- FDBS with multiple federations.
duced by Heimbigner and McLeod [1985]
to mean “collection of components to unite
Scope and Organization of this Paper
loosely coupled federation in order to share
and exchange information” and “an orga- Issues involved in managing an FDBS deal
nization model based on equal, autonomous with distribution, heterogeneity, and au-
databases, with sharing controlled by ex- tonomy. Issues related to distribution have
plicit interfaces.” The multidatabase archi- been addressed in past research and devel-
tecture of Litwin et al. [1982] shares many opment efforts on distributed DBMSs. We
features of the above architecture. These will concentrate on the issues of autonomy
definitions include what we have defined as and heterogeneity. Recent surveys on the
loosely coupled FDBSs. The key FDBS related topics include Barker and Ozsu
concepts, however, are autonomy of com- [1988]; Litwin and Zeroual [1988]; Ram
ponents, and partial and controlled sharing and Chastain [ 19891, and Siegel [1987].
of data. These can also be supported when The remainder of this paper is organized
the components are tightly coupled. Hence as follows. In Section 1 we discuss a refer-
we include both loosely and tightly coupled ence architecture for DBSs. Two types of
FDBSs in our definition of FDBSs. system components-processors and sche-
MRDSM [Litwin 19851, OMNIBASE mas-are particularly applicable to FDBSs.
[Rusinkiewicz et al. 19891, and CALIDA In Section 2 we use the processors and
[Jacobson et al. 19881 are examples of schemas to define various FDBS architec-
loosely coupled FDBSs. In CALIDA, fed- tures. In Section 3 we discuss the phases in
erated schemas are generated by a database an FDBS evolution process. We also dis-
administrator rather than users as’in other cuss a methodology for developing a tightly
loosely coupled FDBSs. Users must be rel- coupled FDBS with multiple federations.
atively sophisticated in other loosely cou- In Section 4 we discuss four important
pled FDBSs to be able to define schemas/ tasks in developing an FDBS: schema
views over multiple component DBSs. translation, access control, negotiation, and
SIRIUS-DELTA [Litwin et al. 19821 and schema integration. In Section 5 we discuss
DDTS [Dwyer and Larson 19871 can be four tasks relevant to operating an FDBS:
categorized as tightly coupled FDBSs with query formulation, command transforma-
single federation. Mermaide [Templeton tion, query processing and optimization,
et al. 1987131and Multibase [Landers and and transaction management. Section 6
Rosenberg 19821 are examples of tightly summarizes and discusses issues that need
coupled FDBSs with multiple federations. further research and development. The
paper ends with references, a comprehen-
@Mermaid is a trademark of Unisys Corporation. sive bibliography, a glossary of the terms

ACM Computing Surveys, Vol. 22, No. 3, September 1990


192 l Amit Sheth and James Larson
used throughout this paper, and an appen- structure descriptions) (e.g., table defi-
dix comparing some features of relevant nitions in a relational model), and entity
prototype efforts. types and relationship types in the
entity-relationship model.
1. REFERENCE ARCHITECTURE l Mappings: Mappings are functions that
correlate the schema objects in one
A reference architecture is necessary to schema to the schema objects in another
clarify the various issues and choices within schema.
a DBS. Each component of the reference
architecture deals with one of the impor- These basic components can be com-
tant issues of a database system, federated bined in different ways to produce different
or otherwise, and allows us to ignore details data management architectures. Figure 4
irrelevant to that issue. We can concentrate illustrates the iconic symbols used for each
on a small number of issues at a time by of these basic components. The reasons for
analyzing a single component. A reference choosing these components are as follows:
architecture provides the framework in
l Most centralized, distributed, and feder-
which to understand, categorize, and com-
ated database systems can be expressed
pare different architectural options for de-
using these basic components.
veloping federated database systems.
Section 1.1 discusses the basic system com- l These components hide many of the
ponents of a reference architecture. Section implementation details that are not
1.2 discusses various types of processors relevant to understanding the im-
and the operations they perform on com- portant differences among alternate
mands and data. Section 1.3 discusses a architectures.
schema architecture of a reference archi- Two basic components, processors and
tecture. Other reference architectures de- schemas, play especially important roles
scribed in the literature include Blakey in defining various architectures. The pro-
[ 19871, Gligor and Luckenbaugh [ 19841, cessors are application-independent soft-
and Larson [ 19891. ware modules of a DBMS. Schemas are
application-specific components that de-
1.1 System Components of a Reference fine database contents and structure. They
Architecture are developed by the organizations to which
the users belong. Users of a DBS include
A reference architecture consists of various
both persons performing ad hoc operations
system components. Basic types of system
and application programs.
components in our reference architecture
are as follows:
1.2 Processor Types in the Reference
Data: Data are the basic facts and in- Architecture
formation managed by a DBS.
Database: A database is a repository of Data management architectures differ in
data structured according to a data the types of processors present and the
model. relationships among those processors.
There are four types of processors, each
Commands: Commands are requests
performing different functions on data ma-
for specific actions that are either entered
nipulation commands and accessed data:
by a user or generated by a processor.
transforming processors, filtering proces-
Processors: Processors are software sors, constructing processors, and accessing
modules that manipulate commands and processors. Each of the processor types is
data. discussed below.
Schemas: Schemas are descriptions of
data managed by one or more DBMSs. A 1.2.1 Transforming Processor
schema consists of schema objects and
their interrelationships. Schema objects Transforming processors translate com-
are typically class definitions (or data mands from one language, called source

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 193

Component Icon (with [Onuegbe et al. 1983; Zaniolo 19791,


Example) allowing a CODASYL DBS to be proc-
Type
essed using SQL commands.
l A program generator that translates SQL
commands into equivalent COBOL pro-
Processor grams allowing a file system to be proc-
essed using SQL commands.
Command For some command-transforming pro-
cessors, there may exist companion data-
transforming processors that convert data
Data
produced by the transformed commands
<--ii-> into data compatible with the commands
in the source format. For example, a data-
transforming processor that is the com-
panion to the above SQL-to-CODASYL
Schema
command-transforming processor is a table
builder that accepts individual database
records produced by the CODASYL DBMS
and builds complete tables for display to
Information
the SQL user.
Mapping Figure 5(a) illustrates a pair of compan-
ion transforming processors. Using infor-
mation from schema A, schema B, and the
mappings between them, the command-
transforming processor converts com-
Database mands expressed using schema A’s descrip-
tion into commands expressed using
schema B’s description. Using the
same information, the companion data-
transforming processor transforms data
Figure 4. Basic system components of the data man-
agement reference architecture.
described using schema B’s description
into data described using schema A’s
description.
To perform these transformations, a
language, to another language, called target
transforming processor needs mappings be-
language, or transform data from one
tween the objects of each schema. The task
format (source format) to another format
of schema translation involves transform-
(target format). Transforming processors
ing a schema (schema A) describing data in
provide a type of data independence called
one data model into an equivalent schema
data model transparency in which the data
(schema B) describing the same data in a
structures and commands used by one pro-
different data model. This task also gener-
cessor are hidden from other processors.
ates the mappings that correlate the
Data model transparency hides the dif-
schema objects in one schema (schema B)
ferences in query languages and data for-
to the schema objects in another schema
mats. For example, the data structures
(schema A). The task of command transfor-
used by one processor can be modified to
mation entails using these mappings to
improve overall efficiency without requiring
translate commands involving the schema
changes to other processors. Examples of
objects of one schema (schema B) into com-
command-transforming processors include
mands involving the schema objects of the
the following:
other schema (schema A). The schema
l A command transformer that trans- translation problem and the command
lates SQL commands into CODASYL transformation problem are further dis-
data manipulation language commands cussed in Sections 4.1 and 5.2, respectively.

ACM Computing Surveys, Vol. 22, No. 3, September 1990


194 . Amit Sheth and James Larson

CA Schema B

(b)

Figure5. Transforming processors. (a) A pair of companion transforming processors.


(b) An abstract transforming processor.

Mappings are associated with a trans- command-transforming processor and data


forming processor in one of two ways. In converter pair.
the first case, the mappings are encoded
into the transforming processor’s logic,
making the transforming processor specific 1.2.2 Filtering Processor
to the schemas. Alternatively, the map- Filtering processors constrain the com-
pings are stored in a separate data structure mands and associated data that can be
and accessed by the transforming processor passed to another processor. Associated
when converting commands and data. This with each filtering processor are mappings
is a more general approach. It may also be that describe the constraints on commands
possible to generate a transforming proces- and data. These constraints may either be
sor for transforming specific commands embedded into the code of the filtering
or data automatically. For example, an processor or be specified in a separate data
SQL-to-COBOL program generator might structure. Examples of filtering processors
generate a specific data-transforming pro- include the following:
cessor, the generated COBOL program,
that converts data to the required form. Syntactic constraint checker, which
For the remainder of this paper we will checks commands to verify that they are
illustrate a command-transforming proces- syntactically correct.
sor and data converter pair as a single Semantic integrity constraint checker,
transforming processor as illustrated in which performs one or more of the follow-
Figure 4(b). This higher-level abstraction ing functions: (a) checks commands to
enables us to hide the differences between verify that they will not violate semantic
a single data-transforming processor, a sin- integrity constraints, (b) modifies com-
gle command-transforming processor, or a mands in such a manner that when the

ACM Computing Surveys, Vol. 22, No. 3, September 1990


I CommandFiltering
Processor

the Data Structures

(4

(b)

Figure 6. Filtering processors. (a) A pair of companion filtering processors. (b) An abstract filtering processor.

commands are interpreted, semantic in- than one way to translate an update com-
tegrity constraints will automatically be mand. We do not discuss the view update
enforced, or (c) verifies that data pro- task in more detail because we feel that a
duced by another processor does not vi- loosely coupled FDBS is not well suited to
olate any semantic integrity constraint. support updates, and solving this problem
l Access controller, which verifies that the in a tightly coupled FDBS is very similar
user is permitted to perform the com- to solving it in a centralized or distributed
mand on the indicated data or verifies DBMS [Sheth et al. 1988a].
that the user is permitted to use data
produced by another processor. 1.2.3 Constructing Processor
Figure 6(a) illustrates two filtering pro- Constructing processors partition and/or
cessors, one that controls commands and replicate an operation submitted by a single
one that controls data. Again, we will ab- processor into operations that are accepted
stract command- and data-filtering proces- by two or more other processors. Construct-
sors into a single filtering processor as ing processors also merge data produced by
illustrated in Figure 6(b). several processors into a single data set for
An important task that may be solved by consumption by another single processor.
a filtering processor is that of view update. They can support location, distribution,
This task occurs when the differences in and replication transparencies because a
data structures between the view and the processor submitting a command does not
schema is such that there may be more need to know the location, distribution, and

ACM Computing Surveys, Vol. 22, No. 3, September 1990


196 . Amit Sheth and James Larson

<a>

iYGzA /Data Exoressed\

(2 Schema A

(b)
Figure 7. Constructing processors. (a) A pair of constructing processors. (b) An abstract constructing
processor.

number of processors participating in pro- of companion constructing processors. Us-


cessing that command. ing information from schema A, schema B,
Tasks that can be handled by construct- schema C, and the mappings from schema
ing processors include the following: A to schemas B and C, the command de-
composer uses the commands expressed us-
Schema integration: Integrating mul- ing the schema A objects to generate the
tiple schemes into a single schema commands using the objects in schemas B
Negotiation: Determining what proto- and C. Schema A is an integrated schema
col should be used among the owners of that contains a description of all or parts
various schemas to be integrated in de- of the data described by schemas B and C.
termining the contents of an integrated Using the same information, the data
schema merger generates data in the format of
Query (command) decomposition schema A objects from data in the formats
and optimization: Decomposing and of the objects in schemas B and C.
optimizing a query (command) expressed Again we will abstract the command par-
on an integrated schema titioner and data merger pair into a single
constructing processor as illustrated in
Global transaction management:
Performing the concurrency and atomic- Figure 7(b).
ity control
1.2.4 Accessing Processor
These issues are further discussed in Sec- An accessing processor accepts commands
tions 4 and 5. Figure 7(a) illustrates a pair and produces data by executing the
ACM Computing Surveys, Vol. 22, No. 3, September 1990
Federated Database Systems l

commands against a database. It may ac-


cept commands from several processors
and interleave the processing of those com-
mands. Examples of accessing processors
include the following:
l A file management system that executes
access procedures against stored file
l A special application program that ac-
cepts commands and generates data to be
returned to the processor generating the Figure 8. Accessing processor.
commands
l A data manager of a DBMS containing
data access methods those structures. It is an attempt to de-
l A dictionary manager that manages ac- scribe all data of interest to an enterprise.
cess to dictionary data In the context of the ANSI/X3/SPARC
architecture, it is a database schema as
Figure 8 illustrates an accessing processor expressed in the data definition language
that accepts data manipulation commands of a centralized DBMS. The internal
and uses access methods to retrieve data schema describes physical characteristics of
from the database. the logical data structures in the conceptual
Issues that are addressed by accessing schema. These characteristics include in-
processors include local concurrency con- formation about the placement of records
trol, commitment, backup, and recovery. on physical storage devices, the placement
These problems and their solutions are ex- and type of indexes and physical represen-
tensively discussed in the literature for cen- tation of relationships between logical rec-
tralized and distributed DBMSs. Some of ords. Much of the description in the
the issues of adapting these problems to internal schema can be changed without
deal with heterogeneity and autonomy in having to change the conceptual schema.
the FDBSs are discussed in Section 5.4. By making changes to the description in
the internal schema and making the cor-
1.3 Schema Types in the Reference responding changes to the data in the da-
Architecture tabase, it is possible to change the physical
In this section, we first review the standard representation without changing any appli-
three-level schema architecture for central- cation program source code. Thus it is
ized DBMSs. We then extend it to a five- possible to fine tune the physical represen-
level architecture that addresses the tation of data and optimize the perfor-
requirements of dealing with distribution, mance of the DBMS in providing database
autonomy, and heterogeneity in an FDBS. access for selected applications.
Most users do not require access to all of
the data in a database. Thus they do not
1.3.1 ANSIISPARC Three-Level Schema
require access to all of the schema objects
Architecture
in the conceptual schema. Each user or
The ANSI/X3/SPARC Study Group on class of users may require access to only a
Database Systems outlined a three-level portion of the database. The subset of the
data description architecture [Tsichritzis database that may be accessed by a user or
and Klug 19781. The three levels of data a class of users is described by an external
description are the conceptual schema, the schema. Because different users may need
internal schema, and the external schema. access to different portions of the database,
A conceptual schema describes the con- each user or a class of users may require a
ceptual or logical data structures (i.e., the separate external schema.
schema consists of objects that provide a In terms of the above constructs, filtering
conceptual- or logical-level description of processors use the information in the ex-
the database) and the relationships among ternal schemas to control what data can be
ACM Computing Surveys, Vol. 22, No. 3, September 1990
198 . Amit Sheth and James Larson

Filtering
Processor n

Transforming
Processor

m Internal

Accessing
Processor

Figure 9. System architecture of a centralized DBMS.

accessed by which users. A transforming chitecture for federated systems shown in


processor translates commands expressed Figure 10. A system architecture consisting
using the conceptual schema objects into of both processors and schemas of an FDBS
commands using the internal schema ob- is shown in Figure 11.
jects. An accessing processor executes the The five-level schema architecture of an
commands to retrieve data from the phys- FDBS includes the following:
ical media. A system architecture consist-
ing of both processors and schemas of a Local Schema: A local schema is the con-
centralized DBS is shown in Figure 9. ceptual schema of a component DBS. A
local schema is expressed in the native data
model of the component DBMS, and hence
1.3.2 A Five-Level Schema Architecture for different local schemas may be expressed
Federated Databases in different data models.
The three-level schema architecture is ad- Component Schema: A component
equate for describing the architecture of a schema is derived by translating local sche-
centralized DBMS. It, however, is inade- mas into a data model called the canonical
quate for describing the architecture of an or common data model (CDM) of the FDBS.
FDBS. The three-level schema must be ex- Two reasons for defining component sche-
tended to support the three dimensions of mas in a CDM are (1) they describe the
a federated database system-distribution, divergent local schemas using a single rep-
heterogeneity, and autonomy. Examples of resentation and (2) semantics that are
extended schema architectures include a missing in a local schema can be added to
four-level schema architecture in Mermaid its component schema. Thus they facilitate
[Templeton et al. 1987131,five-level schema negotiation and integration tasks per-
architectures in DDTS [Devor et al. 1982b] formed when developing a tightly coupled
and SIRIUS-DELTA [Litwin et al. 19821, FDBS. Similarly, they facilitate negotia-
and others [Blakey 1987; Ram and tion and specification of views and multi-
Chastain 19891. We have adapted these database queries in a loosely coupled
architectures for our five-level schema ar- FDBS.

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems . 199

I
Local
bb b
Schema

Figure 10. Five-level schema architecture of an FDBS.

onstructinq Processor onstructing Processor onstructina Processor

Filtering Processor Filtering Processor Filtering Processor

(F) (F) (Campon:nt)


Transforming Processor Transforming Processor

Figure 11. System architecture for an FDBS.


I)’
The process of schema translation from schema objects. Transforming processors
a local schema to a component schema use these mappings to transform com-
generates the mappings between the com- mands on a component schema into com-
ponent schema objects and the local mands on the corresponding local schema.

ACM Computing Surveys, Vol. 22, No. 3, September 1990


200 l Amit Sheth and James Larson

Such transforming processors and the com- similar to that of federated schema is rep-
ponent schemas support the heterogeneity resented by the terms import schema
feature of an FDBS. [Heimbigner and McLeod 19851, global
schema [Landers and Rosenberg 1982J,
Export Schema: Not all data of a com- global conceptual schema [Litwin et al.
ponent DBS may be available to the fed- 19821, unified schema, and enterprise
eration and its users. An export schema schema, although the terms other than im-
represents a subset of a component schema port schemas are usually used when there
that is available to the FDBS. It may in- is only one such schema in the system.
clude access control information regarding
its use by specific federation users. The External Schema: An external schema
purpose of defining export schemas is to defines a schema for a user and/or appli-
facilitate control and management of asso- cation or a class of users/applications. Rea-
ciation autonomy. A filtering processor can sons for the use of external schemas are as
be used to provide the access control as follows:
specified in an export schema by limiting
the set of allowable operations that can be l Customization: A federated schema
submitted on the corresponding component can be quite large, complex, and difficult
schema. Such filtering processors and the to change. An external schema can be
export schemas support the autonomy fea- used to specify a subset of information in
ture of an FDBS. a federated schema that is relevant to the
Alternatively, the data available to the users of the external schema. They can
FDBS can be defined as the transactions be changed more readily to meet chang-
that can be executed by a component DBS ing users’ needs. The data model for an
(e.g., [Ge et al. 1987; Heimbigner and external schema may be different than
McLeod 1985; Veijalainen and Popescu- that of the federated schema.
Zeletin 19881). In this paper, however, we Additional integrity constraints:
will not consider that case of exporting Additional integrity constraints can also
transactions. be specified in the external schema.
Access control: Export schemas pro-
Federated Schema: A federated schema vide access control with respect to the
is an integration of multiple export sche- data managed by the component data-
mas. A federated schema also includes the bases. Similarly, external schemas pro-
information on data distribution that is vide access control with respect to the
generated when integrating export sche- data managed by the FDBS.
mas. Some systems use a separate schema
called a distribution schema or an allocation A filtering process analyzes the com-
schema to contain this information. A con- mands on an external schema to ensure
structing processor transforms commands their conformance with access control and
on the federated schema into the com- integrity constraints of the federated
mands on one or more export schemas. schema. If an external schema is in a dif-
Constructing processors and the federated ferent data model from that of the federated
schemas support the distribution feature of schema, a transforming processor is also
an FDBS. needed to transform commands on the ex-
There may be multiple federated sche- ternal schema into commands on the fed-
mas in an FDBS, one for each class of erated schema.
federation users. A class of federation users Most existing prototype FDBSs support
is a group of users and/or applications per- only one data model for all the external
forming a related set of activities. For ex- schemas and one query language interface.
ample, in a corporate environment, all Exceptions are a version of Mermaid that
managers may be one class of federation supported two query language interfaces,
users, and all employees and applications SQL and ARIEL, and a version of DDTS
in the accounting department may be an- that supported SQL and GORDAS (a
other class of federation users. A concept query language for an extended ER model).

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems 201

Future systems are likely to provide ing local schema. The additional semantics
more support for multimode1 external are supplied by the FDBS developer during
schemas and multiquery language interfaces the schema design, integration, and trans-
[Cardenas 1987; Kim 19891. lation processes.
The five-level schema architecture
Besides adding to the levels in the presented above has several possible
schema architecture, heterogeneity and au- redundancies.
tonomy requirements may also dictate
changes in the content of a schema. For Redundancy between external and
example, if an FDBS has multiple hetero- federated schemas: External schemas
geneous DBMSs providing different data can be considered redundant with feder-
management capabilities, a component ated schemas since a federated schema
schema should contain information on the could be generated for every different
operations supported by a component federation user. This is the case in the
DBMS. schema architecture of Heimbigner and
An FDBS may be required to support McLeod [ 19851 (they use the term import
local and external schemas expressed in schema rather than federated schema). In
different data models. To facilitate their loosely coupled FDBSs, a user defines the
design, integration, and maintenance, how- federated schema by integrating export
ever, all component, export, and federated schemas. Thus there is usually no need
schemas should be in the same data model. for an additional level. In tightly coupled
This data model is called canonical or com- FDBSs, however, it may be desirable to
mon data model (CDM). A language asso- generate a few federated schemas for
ciated with the CDM is called an internal widely different classes of users and to
command language. All commands on fed- customize these further by defining ex-
erated, export, and component schemas are ternal schemas. Such external schemas
expressed using this internal command can also provide additional access
language. control.
Database design and integration is a Redundancy between an external
complex process involving not only the schema of a component DBS and an
structure of the data stored in the databases export schema: If a component DBMS
but also the semantics (i.e., the meaning supports proper access control security
and use) of the data. Thus it is desirable to features for its external schemas and if
use a high-level, semantic data model [Hull translating a local schema into a compo-
and King 1987; Peckham and Maryanski nent schema is not required (e.g., the data
19881 for the CDM. Using concepts from model of the component DBMS is the
object-oriented programming along with a same as CDM of the FDBS), then the
semantic data model may also be appropri- external schemas of a component DBS
ate for use as a CDM [Kaul et al. 19901. may be used as an export schema in the
Although many existing FDBS prototypes five-level schema architecture (external
use some form of the relational model as schemas of component DBSs are not
the CDM (Appendix), we believe that fu- shown in the five-level schema architec-
ture systems are more likely to use a se- ture of Figure 10).
mantic data model or a combination of an Redundancy between component
object-oriented model and a semantic data schemas and local schemas: When
model. Most of the semantic data models component DBSs uses CDM of the
will adequately meet requirements of a FDBS and have the same functionality,
CDM, and the choice of a particular one is it is unnecessary to define component
likely to be subjective. Because a CDM schemas.
using a semantic data model may provide
richer semantic constructs than the data Figure 12 shows an example in which
models used to express the local schemas, some of the schema levels are not used. No
the component schema may contain more external schemas are defined over Feder-
semantic information than the correspond- ated Schema 2 (all of it is presented to all

ACM Computing Surveys, Vol. 22, No. 3, September 1990


202 l Amit Sheth and James Larson

Figure 12. Example FDBS schemas with missing schemas at some levels.

federation users using it). Component in one type of schema may also vary from
Schema 2 is the same as the Local Schema those in another type of schema. For ex-
2 (the data model of the Component DBMS ample, a federated schema may have
2 is the same as the CDM). No export schema objects describing the capabilities
schema is defined over Component Schema of the various component DBMSs in the
3 (all of it is exported to the FDBS). system, whereas no such objects exist in
An important type of information asso- the local schemas.
ciated with all FDBS schemas is the map- Two important features of the schema
pings. These correlate schema objects at architecture are how autonomy is preserved
one level with the schema objects at the and how access control is managed. These
next lower level of the architecture. Thus, involve exercising control over schemas at
there are mappings from each external different levels. Two types of administra-
schema to the federated schema over which tive individuals are involved in developing,
it is defined. Similarly, there are mappings controlling, and managing an FDBS:
from each federated schema to all of the
export schemas that define it. The map- l A component DBS administrator (com-
pings may either be stored as a part of the ponent DBA) manages a component
schema information or as distinct objects DBS. There is one component DBA5 for
within the FDBS data dictionary (which each component DBS. The local, com-
also stores schemas). The amount of dic- ponent, and export schemas are con-
tionary information needed to describe a trolled by the component DBAs of the
schema object in one type of schema may respective component DBSs. A key man-
be different from that needed for another agement function of a component DBA
type of schema. For example, the descrip-
tion of an entity type in a federated schema
may include the names of the users that ’ Here a database administrator is a logical entity. In
reality, multiple authorized individuals may play the
can access it, whereas such information is role of a single (logical) DBA, or the same individual
not stored for an entity type in a compo- may play the role of the component DBA for multiple
nent schema. The types of schema objects component DBSs.

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 203

is to define the export schemas that spec- mas” in Heimbigner and McLeod [1985]),
ify the access rights of federation users defining a view using a set of operators
to access different data in the component (e.g., defining “superviews” in Motro
databases. and Buneman [1981]), or defining a view
. A federation DBA defines and manages a using a query in a multidatabase lan-
federated schema and the external sche- guage ([Czejdo et al. 1987; Litwin and
mas related to the federated schema. Abdellatif 19861; see Section 5.1). In a
There can be one federation DBA for tightly coupled FDBS, it takes the form of
each federated schema or one federation schema integration ([Batini et al. 19861; see
DBA for the entire FDBS. Each federa- Section 4.4).
tion DBA in a tightly coupled FDBS is a A typical process of developing federated
specially authorized system administra- schemas in a loosely coupled FDBS is as
tor and is not a federation user. In a follows. Each federation user is the admin-
loosely coupled FDBS, federated schemas istrator of his or her own federated schema.
are defined and maintained by the users, First, a federation user looks at the avail-
not by the system-assigned federation able set of export schemas to determine
DBA. This is further discussed in Sec- which ones describe data he or she would
tion 2.1. like to access. Next, the federation user
defines a federated schema by importing
2. SPECIFIC FEDERATED DATABASE the export schema objects by using a user
SYSTEM ARCHITECTURES interface or an application program or by
defining a multidatabase language query
The architecture of an FDBS is primarily that references export schema objects. The
determined by which schemas are present, user is responsible for understanding the
how they are arranged, and how they are semantics of the objects in the export sche-
constructed. In this section, we begin by mas and resolving the DBMS and semantic
discussing the loosely coupled and tightly heterogeneity. In some cases, component
coupled architectures of our taxonomy in DBMS dictionaries and/or the federated
additional detail. Then we discuss how sev- DBMS dictionary may be consulted for ad-
eral alternate architectures can be derived ditional information. Finally, the federated
from the five-level schema architecture by schema is named and stored under account
inserting additional basic components, re- of the federation user who is its owner. It
moving all basic components of a specific can be referenced or deleted at any time by
type, and arranging the components of the that federation user.
five-level schema architecture in different A typical scenario for the administration
ways. We then discuss assignment of com- of a tightly coupled FDBS is as follows. For
ponents to computers. Finally, we briefly simplicity, we assume single (logical) fed-
discuss four case studies. eration DBA for the entire tightly coupled
FDBS. Export schemas are created by ne-
2.1 Loosely Coupled and Tightly Coupled gotiation between a component DBA and
FDBSs the federation DBA; the component DBA
has authority or control over what is in-
With the background of Section 1, we dis-
cluded in the export schemas. The federa-
cuss distinctions between the loosely cou-
pled and tightly coupled FDBSs in more tion DBA is usually allowed to read the
detail. component schemas to help determine
what data are available and where they are
located and then negotiate for their access.
2.1.1 Creation and Administration of Federated
The federation DBA creates and controls
Schemas
the federated schemas. External schemas
The process of creating a federated schema are created by negotiation between a fed-
takes different forms. In a loosely coupled eration user (or a class of federation users)
FDBS, it typically takes the form of schema and the federation DBA who has the
importation (e.g., defining “import sche- authority over what is included in each

ACM Computing Surveys, Vol. 22, No. 3, September 1990


204 l Amit Sheth and James Larson

external schema. It may be possible to in- eration DBA [Litwin and Abdellatif
stitute detailed and well-defined negotia- 19861.
tion protocols as well as business rules (or
some types of constraints) for creating, An example of multiple semantics is as
modifying, and maintaining the federated follows. Suppose that there are two export
schemas. schemas, each containing the entity SHOE.
Based on how often the federated sche- The colors of SHOE in one component
mas are created and maintained as well as schema, schemal, are brown, tan, cream,
on their stability, an FDBS may be termed white, and black. The colors of SHOE in
dynamic or static. Properties of a dynamic the other component schema, schema2, are
FDBS are as follows: (a) A federated brown, tan, white, and black. Users defin-
schema can be promptly created and ing different federated schemas may define
dropped; (b) there is no predetermined pro- different mappings that are relevant to
cess for controlling the creation of a feder- their applications. For example,
ated schema. As described above, defining
a federated schema in a loosely coupled l User1 maps cream in his federated sche-
FDBS is like creating a view over the sche- mas to cream in schema1 and tan in
mas of the component DBSs. Since such a schema2,
federated schema may be managed on the l User2 maps cream in her federated
fly (created, changed, dropped easily) by a schema to tan or cream in schema1 and
user, loosely coupled FDBSs are dynamic. tan or white in schema2.
A tightly coupled federation is almost al-
ways static because creating a federated Proponents of the loosely coupled archi-
schema is like database schema integration. tecture argue that a federated schema cre-
A federated schema in a tightly coupled ated and maintained by a single federation
FDBS evolves gradually and in a more con- DBA is utopian and totalitarian in nature
trolled fashion. [Litwin 1987; Rusinkiewicz 19871. We feel
that a loosely coupled approach may be
better suited for integrating a large number
2.1.2 Case for Loosely Coupled FDBS of very autonomous read only databases
A loosely coupled FDBS provides an inter- accessible over communication networks
face to deal with multiple component (e.g., public databases of the types dis-
DBMSs directly. A typical way to formulate cussed by Litwin and Abdellatif [ 19861).
queries is to use a multidatabase language User management of federated schemas
(see Section 5.1). This architecture has the means that the FDBMS can do little to
- following advantages: optimize queries. In most cases, however,
the users are free to use their own under-
l A user can precisely specify relationships standing of the component DBSs to design
and mappings among objects in the ex- a federated schema and to specify queries
port schema. This is desirable when the to achieve good performance.
federation DBA is unable to specify the
mappings in order to integrate data in
multiple databases in a manner meaning- 2.1.3 Case for Tightly Coupled FDBS
ful to the user’s precise needs [Litwin The loosely coupled approach may be ill
and Abdellatif 19861. suited for more traditional business or cor-
l It is also possible to support multiple porate databases, where system control (via
semantics since different users can im- DBAs that represent local and federation
port or integrate export schemas differ- level authories) is desirable, where the users
ently and maintain different mappings are naive and would find it difficult to
from their federated schemas to export perform negotiation and integration them-
schemas. This can be a significant advan- selves, or where location, distribution, and
tage when the needs of the federation replication transparencies are desirable.
users cannot be anticipated by the fed- Furthermore, in our opinion, a loosely

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 205

coupled FDBS is not suitable for update grated to develop a single federated schema.
operations. Updating in a loosely coupled Sometimes an organization will insist on
FDBS may degrade data integrity. When a having a single federated schema (also
user of a loosely coupled FDBSs creates called enterprise schema or global concep-
a federated schema using a view definition tual schema) to have a single point of con-
process, view update transformations are trol for all data sharing in the organization
often not determined. The users may not across the component DBS boundaries. Us-
have complete information on the compo- ing a single federated schema helps in de-
nent DBSs and different users may use fining uniform semantics of the data in the
different semantic interpretations of the FDBS. With a single federated schema, it
data managed by the component DBSs (i.e., is also easier to enforce constraints that
loosely coupled FDBSs support multiple cross export schemas (and hence multiple
semantic interpretations). Thus different databases) then when multiple federated
users can define different federated sche- schemas are allowed.
mas over the same component DBSs, and Because one federated schema is created
different transformations may be chosen by integrating all export schemas and be-
for the same updates submitted on different cause this federated schema supports data
federated schemas. Similar problems can requirements of all federation users, it may
occur in a tightly coupled FDBS with mul- become too large and hence difficult to
tiple federations but can be resolved at the create and maintain. In this case, it may
time of federated schema creation through become necessary to support external sche-
schema integration. A federation DBA cre- mas for different federation users.
ating a federated schema using a schema A tightly coupled FDBS with multiple
integration process can be expected to have federations allows the tailoring of the use
more complete knowledge of the compo- of the FDBS with respect to multiple
nent DBSs and other federated schemas. classes of federation users with different
In addition to the update transformation data access requirements. Integrations of
issue, transaction management issues need the same set of schemas can lead to differ-
to be addressed (see Section 5.4). ent integrated schemas if different seman-
A tightly coupled FDBS provides loca- tics are used. Thus this architecture can
tion, replication, and distribution transpar- support multiple semantics, but the seman-
ency. This is accomplished by developing a tics are decided upon by the federation
federated schema that integrates multiple DBAs when defining the federated schemas
export schemas. The transparencies are and their mappings to the export schemas.
managed by the mappings between the fed- A federation user can select from among
erated schema and the export schemas, and multiple alternative mappings by selecting
a federation user can query using a classical from among multiple federated schemas.
query language against the federated When an FDBS allows updates, multiple
schema with an illusion that he or she is semantics could lead to inconsistencies. For
accessing a single system. A loosely coupled this reason, federation DBAs have to be
system usually provides none of these very careful in developing the federated
transparencies. Hence a user of a loosely schemas and their mappings to the export
coupled FDBS has to be sophisticated to schemas. Updates are easier to support in
find appropriate export schemas that can tightly coupled FDBSs where DBAs care-
provide required data and to define map- fully define mappings than in a loosely
pings between his or her federated schema coupled FDBS where the users define the
and export schemas. Lack of adequate se- mappings.
mantics in the component schemas make
this task particularly difficult. Let us now 2.2 Alternative FDBS Architectures
discuss two alternatives for tightly coupled
FDBSs in more detail. In this section, we discuss how processors
In a tightly coupled FDBS with a single and schemas are combined to create various
federation, all export schemas are inte- FDBS architectures.

ACM Computing Surveys, Vol. 22, No. 3, September 1990


206 . Amit Sheth and James Larson
2.2.1 A Complete Architecture of a Tightly component schemas. Mermaid [Temple-
Coupled FDBS ton et al. 1987b] falls into this category.‘j
An architecture of a tightly coupled FDBS, No filtering processors or export
shown in Figure 11, consists of multiple schemas: All of the component schemas
basic components as described below. are integrated into a single federated
schema resulting in a tightly coupled sys-
Multiple
l export schemas and filter- tem in which component DBAs do not
ing processors: Any number of exter- control what users can access. This ar-
nal schemas can be defined, each with its chitecture fails to support component
own filtering processor. Each external DBS autonomy fully. UNIBASE [Brze-
schema supports the data requirements zinski et al. 19841 is in this category, and
of a single federation user or a class of hence it is classified as a nonfederated
federation users. system.
l Multiple federated schemas and con- No constructing processor: The user
structing processors: Any number of or programmer performs the constructing
federated schemas can be defined, each process via a query or application pro-
with its own constructing processor. Each gram containing references to multiple
federated schema may integrate different export schemas. The programmer must
export schemas (and the same export be aware of what data are available in
schema may be integrated differently in each export schema and whether data are
different federated schemas). replicated at multiple sites. This archi-
l Multiple export schemas and filter- tecture, classified as a loosely coupled
ing processors: Multiple export sche- FDBS, fails to support location, distri-
mas represent different parts of a bution, and replication transparencies. If
database to be integrated into different data are copied or moved between com-
federated schemas. A filtering processor ponent databases, any query or applica-
associated with an export schema sup- tion using them must be modified.
ports access control for the related com- In practice, two processors may be com-
ponent schema. bined into a single module, or two schemas
. Multiple component schemas and may be combined into a single implemen-
transforming processors: Each com- tation schema. For example, a component
ponent schema represents a different schema and its export schemas are fre-
component database expressed in the quently combined into a single schema with
CDM. Each transforming processor a single processor that performs both trans-
transforms a command expressed on the formation and filtering.
associated component schema into one or
more commands on the corresponding 2.2.3 Architectures with Additional Basic
local schema. Components
There are several types of architectures
2.2.2 Architectures with Missing Basic with additional components that are exten-
Components sions or variations of the basic components
There are several architectures in which all of the reference architecture. Such compo-
of the processors of one type and all sche- nents enhance the capabilities of an FDBS.
mas of one type are missing. Several ex- Examples of such components include the
amples follow. following:
l No transforming processors or com- l Auxiliary schema: Some FDBSs have
ponent schemas: All of the local sche- an additional schema called an auxiliary
mas are described in a single data model.
In other words, the FDBS does not sup- ‘Its design, however, has provisions to store model
port component DBSs that use different transformation information and attach a transforming
data models. Hence there is no need for processor.

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 207
schema that stores the following types of
information: “‘“““‘;” Schema)
Data needed by federation users but
not available in any of the (preexisting)
component DBSs.
Information needed to resolve incom-
patibilities (e.g., unit translation tables,
format conversion information).
. Statistical information helpful in per-
forming query processing and optimi-
Figure 13. Using an auxiliary schema to store trans-
zation. lation information needed by a constructing processor.
Multibase [Landers and Rosenberg
19821 describes the first two types of
information in its auxiliary schema, This, however, can limit or conflict with
whereas DQS [Belcastro et al. 19881 de- the autonomy of the component DBSs.
scribes the last two types of information
in its auxiliary schema. Mermaid [Tem-
pleton et al. 1987133 describes the third 2.2.4 Extended Federated Architectures
type of information in its federated To allow a federation user to access data
schema. As illustrated in Figure 13, the from systems other than the component
auxiliary schema and the federated DBSs, the five-level schema architecture
schema are used by constructing proces- can be extended in additional ways.
sors. It is also possible to consider the
auxiliary schema to be a part (or sub- l Atypical component DBMS: Instead
schema) of a federated schema. of a typical centralized DBMS, a com-
. Enforcing constraints among com- ponent DBMS may be a different type of
ponent schemas: As illustrated in Fig- data management system such as a file
ure 14, an FDBS can have a filtering server, a database machine, a distributed
processor in addition to a constructing DBMS, or an FDBMS. OMNIBASE uses
processor between a federated schema a distributed DBMS as one of its com-
and the component schemas. The filter- ponent DBMSs [Rusinkiewicz et al.
ing processor enforces constraints that 19891. Figure 15 illustrates how one
span multiple component schemas. The FDBS can act as a backend for another
constructing processor, as discussed be- FDBS. By making local schema A2 of
fore, transforms a query into subqueries FDBS A the same as external schema B2
against the component schemas of the of FDBS B, the component DBS A2 of
component DBSs. Integrity constraints FDBS A is replaced by FDBS B.
may be stored in an external schema or l Replacing a component database by
a federated schema. The constraints may a collection of application pro-
involve data represented in multiple ex- grams: It is conceptually possible to re-
port schemas. The filtering processor place some database tables by application
checks and modifies each update request programs. For example, a table contain-
so when data in multiple component da- ing pairs of equivalent Fahrenheit and
tabases are modified, the intercomponent Celsius values can be replaced by a pro-
constraints are not violated. This capa- cedure that calculates values on one scale
bility is appropriate in a tightly coupled given values on the other. A collection of
system in which constraints among mul- conversion procedures can be modeled
tiple component databases must be en- by the federated system as a special-
forced. An early description of DDTS component database. A special-access
[Devor et al. 1982aJ suggested enforce- processor can be developed that accepts
ment of semantic integrity constraints requests for conversion information and
spanning components in this manner. invokes the appropriate procedure rather

ACM Computing Surveys, Vol. 22, No. 3, September 1990


208 l Amit Sheth and James Larson

Integrity Constraints in
External/Federated Schema 1

1 Filtering processor ]

I
Constructing
Processor

Figure 14. Using a filtering processor to enforce constraints across


export schemas.

than access a stored database. Navathe larger granularity and allocate them as
et al. [1989] discuss a federated architec- desired. For example, DDTS [Dwyer and
ture being developed to provide access Larson 19871 defines two types of modules:
to databases as well as application Application Processor and Data Processor
programs. (Figure 17). An Application Processor in-
cludes a federated schema with the associ-
2.3 Allocating Processors and Schemas to
ated constructing processor and all the
Computers
external schemas defined over the feder-
ated schema with the associated filtering
It is possible to allocate all processors and processors and transforming processors (if
schemas to a single computer, perhaps to present). A Data Processor includes a local
allow federation users to access data man- schema, a component schema, and the as-
aged by multiple component DBSs on that sociated transforming processor and all ex-
computer. Usually, however, different com- port schemas defined over the component
ponent DBSs reside on different computers schema with associated filtering processors.
connected by a communication system. Dif- An Application Processor performs the
ferent allocations of the FDBS components user interface and distributed transaction
result in different FDBS configurations. management and coordination functions
Figure 16 illustrates the configuration of and is located at every site at which there
a typical FDBS. A general-purpose com- are federation users. A Data Processor per-
puter at site 1 supports a single component forms the data management functions re-
DBS and two federation schemas for two lated to the data managed by a single
different classes of federation users. Site 2 component DBS and is located at every site
is a workstation that supports two export at which a component DBS is located. A
schemas, each containing different data for site can have either or both of the two
use by different federation users. Site 3 is modules. Mermaid [Templeton et al.
a small workstation that supports a single 1987b] divides the processors and the sche-
federation user and no component DBS. mas into four types of modules of smaller
Site 4 is a database computer that has one granularity.
component DBS but supports no federation Special communication processors can
users. also be placed on each computer to enable
It may be desirable to group a related set processors on two different sites to com-
of processors and schemas into modules of municate with each other. Communication

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 209

FDBS A

I
Component Component
DBS Al DBS An

0)
External
Schema B 1

FDBS B

Figure 15. FDBS B acting as a back end to FDBS A.

processors are not shown in our reference federated schema called the Global Repre-
architecture. They are placed between any sentation Schema, which is expressed in
pair of adjacent processors that are allo- the relational data model. It has an external
cated to different computers. schema called the Conceptual Schema rep-
resented in the Entity-Category-Relation-
2.4 Case Studies ship (ECR) model [Elmasri et al. 19851.
Users formulate requests directly against
In this section we relate the terms and
the Conceptual Schema in the GORDAS
concepts of the reference architecture to
query language [Elmasri 19811. The ECR
those used in four example FDBSs. Our
data model is rich in semantics (e.g., it
purpose is not to survey these systems
shows cardinality and operation con-
[Thomas et al. 19901 but to show how the
straints on an entity’s participation in re-
reference architecture can be used to rep-
lationships). The transforming part of the
resent the architectures of various FDBSs
Translation and Integrity Control proces-
uniformly. This uniform representation
sor is responsible for translating requests
can greatly simplify the task of studying
written in GORDAS on the ECR data
and comparing these systems.
model into the internal form of a relational
2.4.1 DOTS
query language against the Global Repre-
sentational Schema. The filtering part
Figure 17 illustrates the original architec- of the Translation and Integrity Control
ture of DDTS [Devor et al. 1982a] using processor is responsible for modifying
the terminology of the reference architec- each query, so when it is processed, the
ture (to the left of each colon) and the constraints specified in the Conceptual
terminology used by DDTS (to the right of Schema will be enforced. For example, a
each colon and in italics). It has a single GORDAS query that deletes a record will

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Site 1 Site 2 Site 3

External Schema External Schema (External Schema 1


External Schema
I I L /
Filtering Processor Filtering Processor I
1 Filtering, Processor 1
I I
Federated Schema Federated Schema I

f Federated Schema 3
I I I

onstructing Processor Constructing Processor Constructing Processor


Constructing Processor

Site 4
\

(Export Schema
I

1 Filtering Processor 1 Filtering Processor


I
(Component Schema 3 (Component Schema

1 TransforTing Proce: 1
I

(Local (Local Schema

1 Component DBS 1 1 Component DBS 1 1 Component DBS ]

Figure 16. Typical FDBS system configuration.


Federated Database Systems 211

Application Processor

F(T)

Federated: Global
Representational

Constructing:
-1

-1
I .
a Processor sor

Component I: Local Component n: Local


Representational Representational
Schema I
I
Transforming 1: Transforming n:
Local Operations Local Operations
Module I Module n
I I

I
Component DBMS 1: Component DBMS n:
IDS.2 Wet works RAM (Relational)
I I

Figure 17. DDTS architecture.

be modified so that it verifies that deleting There are two separate processors in
the record will not violate any semantic DDTS’s constructing processor, reflecting
integrity constraints before the record is the decision to separate distributed query
actually deleted. DDTS has since been ex- optimization from distributed query exe-
tended to support external schemas [Dwyer cution. The Materialization and Access
and Larson 19871 expressed in the rela- Planning component generates a distrib-
tional data model defined over the Global uted execution strategy consisting of sets
Representation Model. SQL is used to of commands, each expressed in terms
query such external schemas. of one of the Local Representational

ACM Computing Surveys, Vol. 22, No. 3, September 1990


212 l Amit Sheth and James Larson

ederatedl: DAPLEX Global vleW I


I
ransforming: Transformer

eratedl : DAPLEX Global View 1


ressedusing local terms
I
Transforming: GlobalQuery Optimizer
Constructing: Decomposer
Transforming: Filter
Constructinal Monitor

omponent 1: Component n:
DAPLEXLocal
I DAPLEXLocal
Schema n I
I

Transforming:
LocalOptimizer LocalOptimizer
Transforming:
Translator
I

l*rrn
I
Component
DBMS n:
Relational
I

Figure 18. Multibase architecture.

(component) schemas. The distributed ex- 2.4.2 Multibase


ecution strategy can be saved for future Figure 18 illustrates the architecture of
execution or can be passed immediately to Multibase, again using the terminology of
the Distributed Execution Monitor. The the reference architecture and the termi-
Distributed Execution Monitor is respon- nology used by Landers and Rosenberg
sible for executing the distributed execu- [1982]. The federated schema is expressed
tion strategy, coordinating the execution of in a functional data model called DAPLEX.
the sets of commands, and returning the The Transformer modifies global queries
results to the user. by inserting references to Local and Aux-
The Local Operations (transforming) iliary schemas. The modified global query
Module accepts a set of commands and is then processed by a series of processors
transforms it into a form that can be that generates sequences of DAPLEX
executed by the component DBMS. Orig- single-site queries. These processors
inally there were two CODASYL com- include:
ponent DBMSs. Later a third component
DBMS using the relational data model was l A Global Query Optimizer that produces
added. a global plan,

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 213

xternal 1: Relational I*** r ternal p: Semantic


Subschema 1 Subschema p
I I
Transforming: Relational Transforming: AR/EL
to D/L to D/L
Filtering: Access File/ Filtering: Access File/
Access List Access List

I
Constructing: Optimizer
and Controller

Local Schema 1

Figure 19. Mermaid architecture.

A Decomposer that decomposes the l A Translator converts the DAPLEX


global plan into single-site DAPLEX query to a form the component DBMS
queries, can process.
A Filter that reduces the decomposed
An Auxiliary schema holds additional
queries by removing operations from
data not stored in any component DBMS
them that are not supported by the cor-
responding component DBMSs, and and information needed to resolve incon-
sistencies. Component DBMSs supported
A Monitor that controls the distributed by Multibase include both CODASYL and
execution of the subqueries. relational.
The Optimizer, Decomposer, and Filter
may be cyclically invoked for nested global 2.4.3 Mermaid
queries. Two types of transformations are
performed on each DAPLEX single-site Figure 19 illustrates the architecture of
query: Mermaid, again using the terminology of
the reference architecture and the termi-
l A Local Optimizer determines the opti- nology used by Templeton et al. [1987b].
mal query-processing strategy for the Users may formulate requests using either
single-site DAPLEX queries. SQL on a relational schema or ARIEL, a

ACM Computing Surveys, Vol. 22, No. 3, September 1990


214 l Amit Sheth and James Larson

ternal 11 Federated 1:
DSL Queries / External 1
l.**P
I
Constructing: MRDSM

mponent l/ Local 1:
MRDS Schema I MRDS Schema n
I
Component DBMS 1: freon Component DBMS n:
MRDS I MRDS n
I I

database 1

Figure 20. MRDSM architecture.

user-friendly query language developed at Abdellatif 19871 to facilitate formulating


Systems Development Corporation (now such queries. Figure 20 illustrates the archi-
Unisys), on what is called a Semantic Sub- tecture of MRDSM using the terminology
schema. The federated schema is called a of the reference architecture and the ter-
Global Schema and is expressed in the re- minology of Litwin [ 19851.
lational model. User requests are trans-
formed to the internal command language 3. FEDERATED DATABASE SYSTEM
called the Distributed Intermediate Lan- EVOLUTION PROCESS
guage (DIL). DIL requests are processed by
a constructing processor that produces and There are two approaches to managing dis-
executes a query plan. A transforming pro- tributed data: installing a distributed
cessor (one for each component DBS) DBMS or adding a layer of software above
translates DIL requests into the query lan- existing DBMSs to create an FDBS system.
guage of the component DBMS, interacts In the first approach, installing a distrib-
with the component DBMS, and sends data uted DBMS requires (1) changes and dis-
to other transforming processors (e.g., for ruption of the existing applications because
joins and unions). Component DBMSs in- they do not have the dichotomy of local
clude commercial relational DBMSs on versus global operations, (2) a complete
Sun@ workstations and a database machine change of the organizational structure for
with a minicomputer host [Thomas et al. information management because this ap-
19901. proach does not respect the autonomy of
existing DBSs, and (3) replacement of the
2.4.4 MRDSM
existing centralized DBMSs by a distrib-
uted DBMS.
MRDSM is a loosely coupled FDBS (called The federation approach offers a pre-
a multidatabase system by its originators) ferred evolutionary path. It allows contin-
in which a programmer/user formulates a ued operation of existing applications to
request involving data from component remain unchanged, preserves most of the
DBSs. The system provides a multidata- organizational structure, supports con-
base language called MDSL [Litwin and trolled integration of existing databases,
and facilitates incorporation of new ap-
@Sun is a trademark of Sun Microsystems, Inc. plications and new databases. Although

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 215

existing applications need not be changed Both migration and extension of file sys-
in an FDBS, as the old applications are tems are expensive. Many developers of the
modified, the component databases may be FDBSs elect to use migration because they
standardized, and redundant data (unless do not have access to the source code of the
required for improving availability or ac- file system and because they want their
cess time) may be removed. New applica- data maintained by a commercially avail-
tions may be coded using an external able DBMS. Since existing application pro-
schema defined on a federated schema. grams are affected, however, this activity is
An FDBS evolves through gradual inte- very difficult and often organizationally
gration of a set of interrelated DBSs. It very sensitive.
evolves as the new component databases The second phase of the FDBS evolution,
are added and the existing ones are modi- developing a federated database system, in-
fied. This evolution process can be divided volves creating the component, export, fed-
into three phases: preintegration, develop- erated, and external schemas, defining the
ing a federated database system, and fed- mappings between various schemas, and
erated database system operation. These implementing the associated processors.
phases (or the activities within the phases) This is a critical task for the success of an
need not follow serially from one phase to FDBS. A methodology that can be used to
the next; each phase may be performed manage this phase is discussed in Section
several times, and previous phases may be 3.1. Tasks performed in this phase are dis-
revisited and their results revised. cussed in Section 4.
The preintegration phase deals with the The third phase of the FDBS evolution,
situation in which data reside in files and federated database system operations, in-
are not managed by any DBMS yet need to volves managing and manipulating multi-
be accessed by federation users. Two gen- ple integrated databases using an FDBMS.
eral approaches are possible: The processors support various run-time
FDBS operations (e.g., query processing
l Migrate the files to a DBMS: Files
and transaction management). The proces-
can be migrated to a DBMS by perform-
sors and the federated schemas developed
ing three activities: (1) develop a com-
or generated in the second phase allow the
ponent schema that describes the data
selective, shared, and consistent access to
in the files, (2) load the DBMS with
data stored in the component DBSs. An
data from the files, and (3) modify exist-
FDBMS provides an interface to the users
ing application programs to access the
and applications and may allow execution
DBMS rather than access the files of ad hoc queries. Tasks performed in this
directly.
phase are discussed in Section 5.
l Extend the file system to support
DBMS-like features: By extending 3.1 Methodology for Developing a Federated
the file system to support DBMS-like Database System
features, a file system can be treated as
a component DBMS. In this approach, The methodology discussed here extends
the following activities are performed: the methodology of Sheth [1988b] for de-
(1) create or generate a component veloping schemas for an FDBS to include
schema that describes data in the files, processors. Developing a new FDBS pri-
(2) create backup and recovery facilities marily consists of integrating existing com-
in the file system if the federation users ponent databases. A bottom-up FDBS
will perform updates to data in the file development process can be followed for
system, and (3) create appropriate filter- this purpose. This process can also be used
ing and transforming processors that will for adding a new component database to an
convert commands expressed in the FDBS.
FDBS’s internal language to the com- When new applications are developed us-
mands that can be processed by the file ing an existing FDBS, it is necessary to
system. determine whether the data requirements

ACM Computing Surveys, Vol. 22, No. 3, September 1990


216 l Amit Sheth and James Larson

of the application are supported by a fed- respective component DBSs to author-


erated schema. If they are not, it is neces- ize part of their databases to be in-
sary either to extend a federated schema or cluded in the FDBS based on the
create a new federated schema and either negotiations with the federation DBA.
to extend an existing component database Develop (or identify if one already ex-
or create a new component database. This ists) the appropriate filtering proces-
process is called a top-down FDBS devel- sor.
opment process. It is an extension of the (3) Integrate schemas: Select a related
traditional distributed database design set of export schemas to be integrated
process. In practice, elements of both the and integrate them. Integration of each
bottom-up and top-down processes are set of export schemas will produce one
used to develop an FDBS. federated schema. Develop (or identify
A data dictionary/directory (DD/D) if one already exists) a constructing
often plays an important role in coordinat- processor that would transform the
ing various activities by storing essential commands expressed on federated
information. In addition to storing all sche- schemas into commands expressed on
mas representing information about the the corresponding export schemas.
data managed by the FDBS, a DD/D also This includes generating mappings
stores mappings among schemas, informa- with appropriate distribution informa-
tion about schemas and databases (such as tion. This step is repeated once for each
statistics relevant to query optimization), related set of export schemas and the
schema-independent information (such as corresponding federated schema.
tables and functions for unit/format con- (4) Define external schemas: If neces-
versions or heuristics for query optimiza- sary, define external schemas for each
tion) , and various types of system federation user or class of federation
informations (such as capabilities of each users. Build or identify the necessary
component DBMS, network addresses of filtering and transforming processors.
each system hosting a component DBMS, The transforming processor performs
and communication facility to be used to schema translation if the data model of
communicate with a given system). the external schema is different than
the CDM.
3.1.1 Bottom-Up Development Process
3.1.2 Top-Down Development Process
A bottom-up FDBS development process is
used to integrate several existing databases A top-down FDBS development process is
to develop an FDBS. Figure 21 illustrates used when an FDBS already exists and
the bottom-up process outlined below: additional user requirements (e.g., to sup-
port a new application) are placed on it.
(1) Translate schemas: Translate the Figure 22 illustrates the top-down process
local schema of a component database outlined below:
into a component schema expressed in
the CDM. Generate the mappings be- (1) Define or modify external sche-
tween the objects in the two schemas. mas: Collect federation user require-
Develop (or identify if one already ex- ments and analyze them to define new
ists) the transforming processor that external schemas or extensions to the
can transform commands expressed on existing external schemas.
the component schema into the com- (2) Analyze schemas: Compare relevant
mands expressed on the corresponding federated schemas with the external
local schema. schemas to identify parts of the exter-
(2) Define export schemas: Define ex- nal schemas that are already in the
port schemas from a component federated schema and hence supported
schema. This step is performed by the by the FDBS. If some part of an exter-
administrators (component DBAs) of nal schema is not already supported by

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 217

I Data I

’ I
-Schema - -\

’ &J$&z-;/i
lntegratlon

Schema

----- Schema ---


Translation- -
Local
Schema

Figure 21. Bottom-up FDBS developing process.

the FDBS, a federated schema will need (b) The required data are not imple-
to be extended or developed to include mented in any component data-
that part. We refer to this unsupported base, and a component DBA is
part as a temporary schema (which is willing to place the required data
discarded at the end of the integration in his or her component database.
process). One or more of the compo- In this case, local, component, and
nent databases will have to support the export schemas of the relevant da-
temporary schema. This can be accom- tabases are modified.
plished in one of three ways: (c) The required data are not imple-
mented in any component data-
(a) The required data exist in one or base, and no component DBA is
more component database(s). In willing to place the required data
this case, identify the component in his or her component databases.
schemas containing the descrip- In this case, the temporary schema
tion of required data and negotiate is implemented as a separate da-
with their administrators to have a tabase of an existing component
description of this data placed in DBMS. Alternatively, a new com-
an export schema with appropriate ponent DBMS may be used that
access rights. may require a new transforming

ACM Computing Surveys, Vol. 22, No. 3, September 1990


218 . Amit Sheth and James Larson

Requirements
CollectIon & - -

External Schema External Schema

Physical
-Database
Design

Figure 22. Top-down FDBS developing process.

processor. The temporary schema control, negotiation, and schema integra-


becomes a new component schema. tion. As identified in Section 3.1, these
(3) Integrate schemas: Integrate the tasks are relevant to the development of
temporary schema with the relevant the schemas and are affected by the data
federated schema and discard the tem- requirements of the federation users.
porary schema.
4.1 Schema Translation

4. FEDERATED DATABASE SYSTEM Schema translation is performed when a


DEVELOPMENT TASKS schema represented in one data model (the
source schema) is mapped to an equivalent
Many tasks involved in developing a cen- schema represented in a different data
tralized or a distributed DBS [Ozsu and model (the target schema). Schema trans-
Valduriez 1990, Chapter 5; Teorey 19901 lation is needed in two situations:
are also relevant to developing an FDBS
and can be adapted with minor changes. In l Translating a local schema into a com-
this section, we discuss four tasks that do ponent schema when the DBMS’s native
not typically arise in that context but have data model is different from the CDM.
particular significance for developing an l Translating a part of the federated
FDBS. They are schema translation, access schema into an external schema when

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 219
the external schema is expressed in a data the source schema. These rules specify how
model different than the CDM. each object in the target schema is derived
from objects in the source schema. If the
For example, assume that if the CDM is mapping rules do not have inverses, the
an extended Entity Relationship (EER) view update problem must be solved.
model, then one of the component DBMSs The following is a simplified example of
to be integrated is a relational DBMS and a set of mapping rules that can be used to
another is a CODASYL DBMS, and a class generate a relational schema from a
of federated users wants a relational view. CODASYL schema:
The following are required: (1) translation
of the local schema of the relational DBMS (1) Each record type is mapped to a table
into its equivalent component schema ex- with the same name.
pressed in the EER model, (2) translation (2) Each field in a record type A is mapped
of the local schema of the CODASYL to a column of table A.
DBMS into an equivalent component (3) Each record identifier in record type A
schema expressed in the EER model, and is mapped to a key in table A.
(3) translation of a part of the federated
(4) Each set (one-to-many relationships
schema in the EER model into an exter- between record type A and record type
nal schema expressed in the relational B ) is mapped to a column in table B
model. Two general approaches for per- that contains values from the key of
forming schema translation are discussed table A. This column is called a foreign
below. key.
The first approach develops explicit
mappings between each source and the tar- Consider the CODASYL schema in Fig-
get schema. This approach is appropriate ure 23(a) (from Larson [1983a]). The
when a (class of) federation user(s) requires COMPANY record type has two fields,
specific data structures in the target COMPANY-NAME and CITY. The
schema. The DBA must then specify map- PRODUCT record type has two fields,
pings that transform the source schema to PRODUCT-NAME and COST. COM-
the target schema. There are three cases: PANY-NAME is the unique record identi-
fier for the COMPANY record type, and
All schema objects in the target schema PRODUCT-NAME is the unique record
can be derived from the schema objects identifier for the PRODUCT record type.
in the source schema, and there exist There is a PRODUCES set consisting of
inverses for all of these mappings. The the COMPANY record type as the owner
DBA specifies the mappings and their and the PRODUCT record type as the
inverses. member. The PRODUCES set maintains
All schema objects in the target schema one-to-many relationship between
can be derived from the schema objects EOMPANY and PRODUCT.
in the source schema, but inverses may The relational schema of Figure 23(b)
not exist for the mappings. If the update results when the above mapping rules are
were to be allowed, the DBA must con- applied. The COMPANY table has two
struct a filtering processor to solve the fields, COMPANY -NAME and CITY. The
associated view update task. PRODUCT table has three fields, PROD-
If the target schema contains objects that UCT-NAME, COST, and COMPANY-
cannot be derived from the objects in the NAME. COMPANY-NAME is the key for
source schema, the DBA must modify the the COMPANY table, and PRODUCT-
target schema or construct a constructing NAME is the key for the PRODUCT table.
processor that integrates schema objects The COMPANY-NAME column of
from other schemas to support all of the PRODUCT table is a foreign key; it con-
objects in the target schema. tains only values from the COMPANY-
NAME key field of the COMPANY table.
The second approach develops mapping Zaniolo [1979] developed a tool that au-
rules to generate the target schema from tomatically generates relational schemas

ACM Computing Surveys, Vol. 22, No. 3, September 1990


220 l Amit Sheth and James Larson

PRODUCT
PRODUCT-NAME COST COMPANY-NAME*

(a) (b)

Figure23. Equivalent CODASYL and relational schemas. (a) CODASYL


schema; (b) relational schema. * -Foreign key.

from CODASYL schemas. Elmasri et al. local schema to a component schema. This
[1985] and Teorey et al. [1986] discuss is because a component DBMS is autono-
transformations between extensions of the mous, and the database managed by it
Entity Relation data model [Chen 19761 should not be changed as a result of the
and the relational data model. Lien [1981] translation process. In other words, the
describes mappings from the hierarchical translation should be reversible in the sense
to the relational data model. Tsichritzis that (a) the component schema in the CDM
and Lochovsky [1982] provide a summary should represent the same database repre-
of these types of mappings. sented by the local schema and (b) it should
In practice, the translation task may re- be possible to translate a command on a
quire more than just data model translation component schema into command(s) on the
because the source and the target schemas corresponding local schema. A notion
may not be able to represent exactly the that may be useful in this context is
same semantics, Hence schema translation that of content-preserving transformations
poses two contradictory requirements: [Rosenthal and Reiner 19871.
(1) Capture additional semantics during the
schema translation that can later help in
4.2 Access Control
the tasks of schema integration and view
update, and (2) maintain only the existing An FDBS should be designed to control
semantics because the local schema is not access to component databases by federa-
able to support the additional semantics. tion users. The system architecture of an
These requirements are discussed next. FDBS (shown in Figure 11) has filtering
Using a semantic data model as the CDM processors at two levels. Each can be used
can facilitate representation of additional to provide access control. The filtering pro-
semantics that may be difficult or impos- cessors relating the export and the compo-
sible to specify in a traditional model such nent schemas control access to component
as the relational model. As an example, a DBSs. The filtering processor relating the
generalization relationship with inherit- external and federated schemas controls
ance between two entity types can be ex- access to the federated schemas.7 Negotia-
plicitly represented in a component schema tion between the component and federation
that uses a semantic data model. As an- DBAs may be necessary to reach an agree-
other example, a foreign key between rela- ment on how to control the data a compo-
tions Rl and R2 in a local schema expressed nent DBA wants to keep secure from some
in the relational model may be explicitly of the federation users while allowing ac-
represented as a relationship between the cess to other federation users. Alternately,
two entities representing Rl and R2 in the a new federated schema can be established
corresponding component schema in an
EER model. One, however, should be care- ‘This is similar to using the view mechanism for
ful about the additional semantic informa- access control security in centralized and distributed
tion provided during the translation from a DBMSs [Bertino and Haas 19881.

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 221

for use by a selected subgroup of the origi- maid maintains an access file on each com-
nal federation. For example, suppose com- puter for this purpose. Access files control
ponent databases each contain information access to Mermaid’s data dictionary and
about commercial shipping and military directory. Third, Mermaid maintains an
shipping. The following approaches are access list that identifies the external and
possible: federated schemas each user is authorized
to use. Fourth, the user must be registered
Each component DBA includes schema with each computer having a component
objects describing both commercial and DBS that is relevant to the user’s federated
military shipping in their respective ex- schema. In addition, Mermaid provides ex-
port schemas. This information is inte- tensive run-time support for access control,
grated into a single federated schema. uses data encryption (for passing access
The federation DBA creates two external control information over the network and
schemas, one for accessing commercial for storing it in access files and access lists),
shipping information and one for access- and provides an audit trail and a journal
ing military shipping. In this scenario,
trail for security purposes.
the local DBAs trust the federation DBA One of the heterogeneities that may be
to provide appropriate access controls on encountered in an FDBS is the existence
each external schema.
of different and incompatible mechanisms
Each component DBA generates two ex- for expressing and enforcing access control
port schemas, one containing the schema policies. An access control mechanism used
objects describing commercial shipping by an FDBMS, such as the view mecha-
and the other containing schema objects nism, may also conflict with the autonomy
describing military shipping. Two feder- of the component DBMSs. A solution that
ated schemas are created, one dealing uses preprocessing is discussed by Wang
with commercial shipping and the other and Spooner [1987].
dealing with military shipping. In this
scenario, the component DBAs control 4.3 Negotiation
appropriate access on each export
schema and hence is preferred to the A federation DBA manages federated sche-
previous one. mas. A component DBA manages the ex-
port schemas defined over the component
Other access control and security issues DBS he or she manages. A federation DBA
include the following: and component DBAs must reach an agree-
How users are identified, grouped into ment about the contents of the export sche-
classes, and named for access control se- mas and operations allowed on the export
curity purposes. Templeton et al. [1987a] schemas such that federated schemas can
discuss the issue of installing a new user. be defined over them to support federation
How data are identified, grouped into users. The dialogue between the two admin-
classes, and named for access control se- istrators to reach this agreement is
curity purposes [Abbott and McCarthy called negotiation. To facilitate negotia-
1988; Templeton et al. 1987b]. tion, the administrators follow protocols
governing the messages exchanged during
What operations are used for controlling
a negotiation.
security privileges [Fagin 1978; Larson
In terms of the reference schema archi-
1983b; Selinger and Wade 19761.
tecture, there are two ways to perform ne-
A case study of the access control fea- gotiation. First, a component DBA may
tures of Mermaid [Templeton et al. 1987a] allow the federation DBA to read the com-
is interesting. It uses access control at four ponent schemas he or she controls but does
levels. First, a user must have an account not give any specific rights for data access.
on the computer where the Mermaid user When a federation DBA determines data
interface can be run. Second, the user access requirements, he or she sends a re-
should be registered with Mermaid. Mer- quest to the component DBA to define an

ACM Computing Surveys, Vol. 22, No. 3, September 1990


222 l Amit Sheth and James Larson

export schema with appropriate access ing export schemas in a bottom-up FDBS
rights. This request typically includes in- development process). The tasks are quite
formation on the data to be accessed, the similar and are treated uniformly as
types of access (retrieval or update), and schema integration in this paper.
the federation users who will access the Many approaches and techniques for
data. In some cases, it may also include schema integration have been reported in
requirements on the component DBS such the literature. The survey paper of Batini
as constraints to be satisfied (e.g., how cur- et al. [1986] discusses and compares 12
rent or consistent the data must be and methodologies for schema integration. It
maximum response time), and whether the divides schema integration activities into
federation DBA or user may grant others five steps: preintegration, comparison, con-
some or all of the access rights for this formation, merging, and restructuring. In
export schema [Larson 1983131. The com- the context of FDBS development, prein-
ponent DBA may accept or refuse any part tegration activities involve translation of
of the request for any reason in defining schemas into a CDM so they can be com-
the export schemas. In the second alterna- pared and specification of global con-
tive, all component DBAs may define the straints and naming conventions. The
export schemas a priori and require latter involves activities that may be useful
the access from the FDBMS to be limited in the comparison step (e.g., specifying a
to the contents and access rights contained thesaurus that may be used for identifying
in those export schemas. naming conflicts, homonyms, or syno-
Two of the many aspects of FDBS oper- nyms). Although schema translation has
ations for which negotiation protocols need been studied independently from schema
to be defined are the following: integration, the two tasks are highly inter-
related. Schema translation may be more
l A component DBA decides to withdraw
constrained when integrating existing
access to a schema object or change a
DBSs than during view integration because
schema object in a local schema.
the constraints of the existing data struc-
l A federation user decides that access to a tures cannot be changed.
schema object is no longer needed. The comparison step (also called schema
Heimbigner and McLeod [1985] present analysis) involves two activities: (a) analyz-
negotiation protocols for a multisited dis- ing and comparing the objects of the
tributed dialogue among the DBAs in an schemas (and possibly databases) to be in-
FDBS. Alonso and Barbara [1989] consider tegrated, including identification of naming
the case in which the federated schema is conflicts (e.g., homonym and synonym de-
a materialized view (they call it quasi- tection), domain (i.e., value type) conflicts,
copy). In this context, they explore ways to structural differences, constraint differ-
express the needs of a federation DBA pre- ences, and missing data and (b) specifying
cisely, to determine the degree of sharing the interrelationships among the schema
the component DBA is willing to offer, and objects. The conforming step is closely tied
to estimate the cost of a specific agreement to the comparing step since it is difficult to
for both the component DBS and the compare unless the related information is
FDBS. represented in a similar form in different
schemas.
4.4 Schema integration
Unless the schemas are represented in
the same model, analyzing and comparing
View integration refers to integrating mul- their schema objects is extremely difficult.
tiple user views into a single schema (e.g., It is important to note that comparison of
federated schema development in a top- the schema objects is primarily guided by
down FDBS development process). Schema their semantics. not bv their svntax. Hence
integration refers to integrating (usually ex- the choice of the CDM is critical. The CDM
isting) schemas into a single schema (e.g., should be semantically rich; that is, it
federated schema development by integrat- should provide abstraction and constraints

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 223

so that the semantics relevant to schema E2, and El is-disjoint-and-nonintegra-


integration can be presented. Thus, seman- bleewith E2. Similar relationships exist
tic (or conceptual) data models are much among relationship types. Elmasri et al.
preferred over traditional data models such [1986] and Sheth et al. [1988b] use the
as relational, network, or hierarchical. relationships among attributes in a heuris-
Dayal and Hwang [1984] suggest that the tic algorithm to identify pairs of entity
concept of generalization is important for types and relationship types that may be
schema integration. One reason is that sim- related by the first three types of relation-
ilar or related concepts are represented at ships. The actual relationships are specified
different levels of abstractions in different by the person (the DBA in a tightly coupled
schemas. For example, one schema may FDBS, a user in a loosely coupled FDBS)
contain attributes Office-Phone-Number integrating the schemas. Using classifica-
and Home-Phone-Number, both of which tion in the CANDIDE semantic data model
can be shown as specializations of the at- [Beck et al. 19891 and the relationships
tribute Telephone-Number in another among attributes, Sheth and Gala [1989]
schema. suggest that many of the relationships
Analyzing and comparing schema objects among entity types and relationship types
is followed by specifying the interrelation- can be automatically discovered without
ships among the schema objects. Consider additional human input.
integrating two EER schemas. In this case, In [Sheth et al. 1988b], specification of
there are three types of schema objects: relationships among schema objects is used
entity types, relationship types, and attri- as an input to a lattice generation or merg-
butes. In the methodology discussed by ing activity to generate a single integrated
Elmasri et al. [1986] as well as in several schema. Other efforts, particularly those
other schema integration methodologies, that do not rely on identifying relationships
relationships among attributes in the two among schema objects, as discussed above,
schemas are specified first. Relationships provide a set of operators for the DBA
among other object types (e.g., the entity to correlate or integrate the objects to gen-
types and the relationship types) follow. erate an integrated schema [Motro and
Two attributes, al and a2, may be related Buneman 19811.
in one of the three ways [Larson et al. 1989; One reason why a completely automatic
Sheth and Gala 19891: al is-equivalent-to schema integration process (particularly
a2, al includes (or is-included-in) a2, or for discovering attribute relationships) is
al is-disjoint-with a2. Determining such not possible is because it would require that
relationships can be time consuming and all of the semantics of the schema be com-
tedious. If each schema has 100 entity pletely specified. This is not possible be-
types, and an average of five attributes per cause, among other reasons, (1) the current
entity type, then 250,000 pairs of attributes semantic (or other) data models are unable
must be considered (for each attribute in to capture a real-world state completely,
one schema, a potential relationship with (2) it will be necessary to capture much
each attribute in other schemas should be more information than is typically cap-
considered). Sheth and Gala [1989] argue tured in a schema, and (3) there can be
that this task cannot be automated, and multiple views and interpretations of a
hence we may need to depend on heuristics real-world state; and the interpretations
to identify a small number of attribute pairs change with time. Convent [1986] formally
that may be potentially related by a rela- argues that integrating relational schemas
tionship other than is-disjoint-with. is undecidable.
Two entity types, El and E2, may be Three tools developed to perform schema
related in one of five ways [Elmasri et al. integration are reported in Hayes and Ram
1986; Navathe et al. 19861: El equals E2, [ 19901, Sheth et al. [1988b], and Souza
El includes (or is-included-in) E2, El [ 19861. Sheth et al. [ 1988131,for example,
overlaps (or may-be-integratable-with) describe a forms-based interactive tool to
E2, El is-disjoint-but-integrable-with integrate EER schemas. It accepts the

ACM Computing Surveys, Vol. 22, No. 3, September 1990


224 l Amit Sheth and James Larson

definitions of the schemas to be integrated, type/scale differences. Some systems using


guides the integrator through the process multidatabase languages can define multi-
of defining attribute equivalencies, uses at- ple semantics over the same data (e.g., dy-
tribute equivalencies to rank entity type namic attributes in Litwin and Abdellatif
and relationship type pairs that may be [ 19861). Examples of multidatabase lan-
related heuristically, accepts assertions guages include MDSL [Litwin and Abdel-
about the relationships among the entity latif 19871, MSQL [Litwin et al. 19871,
types and relationship type pairs, checks GSQL [Jacobs 19851, a relational language
for their consistency, and performs the with extended abstract data types [Czejdo
merging task automatically. et al. 19871, and a graphical multidatabase
After schemas have been integrated, it query language [Rusinkiewicz et al. 19891.
may be necessary to decide how to allocate CALIDA [Jacobsen et al. 19881) provides a
the data among multiple component DBSs. menu-based interface to formulate queries
This distribution design task has been stud- against multiple component DBMSs.
ied extensively [Ceri et al. 19871.
5.2 Command Transformation
5. FEDERATED DATABASE SYSTEM
A command transformation processor
OPERATION
translates commands in one language,
The previous section discussed some of the called the source language (or operations in
important tasks for developing an FDBS. one data model, called the source data
Many tasks relevant to the operation (i.e., model) into commands in another lan-
run-time system) of distributed DBMSs are guage, called the target language (or oper-
also relevant to the operation of multida- ations in another data model, called the
tabase systems and FDBSs. In this section, target data model). Converting between
we discuss four tasks that are either specific procedural and nonprocedural languages is
to an FDBS (or a multi-DBMS system) or of particular interest.
are significantly different from similar
tasks in a distributed DBMS. These tasks 5.2.1 Converting a Nonprocedural Language
are application independent. into a Procedural Language
A common example of this type of conver-
5.1 Query Formulation
sion is transforming relational algebraic op-
The same query languages used in cen- erations into a sequence of CODASYL
tralized and distributed DBMSs can programming language statements. Con-
be used for formulating queries in a sider the relational operation join that
tightly coupled FDBS. This is because a causes two tables to be combined. Two
lightly coupled FDBS provides location, rows, one from each table, are joined if their
distribution, and replication transpar- values satisfy a specified Boolean (the join
encies. Most loosely coupled FDBSs pro- condition) condition. Using the example of
vide a multidatabase language to allow a Section 4.1, joining the COMPANY and
federation user to access data from multiple PRODUCT tables where the COMPANY-
component DBSs. A multidatabase lan- NAME of a COMPANY equals the COM-
guage provides functions that are not pres- PANY-NAME of a PRODUCT results in
ent in data manipulation languages used in a single table consisting of columns of both
centralized and distributed DBMSs [Litwin the COMPANY and PRODUCT tables.
et al. 19871. It provides a capability to Using code generation techniques, it is pos-
define federated schemas as views over sible to automatically generate code con-
multiple export schemas (or parts of sisting of two nested loops. The outer loop
component schemas) and to formulate reads successive COMPANY records. The
queries against such a view. In addition the inner loop reads successive PRODUCT rec-
language deals with problems identified in ords in the PRODUCES set (owned by
the discussion on schema integration, such COMPANY) and constructs rows of the
as naming conflicts and data structure/ joined table.

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 225

This idea can be generalized as follows. thesizing relational algebraic operations


For each relational operation, R, develop from a sequence of CODASYL program-
an algorithm, A, which generates equiva- ming language statements. This problem is
lent CODASYL code. For each nested op- quite difficult. Demo [ 19831 suggests a data
eration, Rl and R2, develop an algorithm flow analysis approach that may be used
for using the corresponding generation al- for some cases. Solving the general case is
gorithms Al and A2 to generate an equiv- an open research problem. Fortunately, the
alent CODASYL code. For example, if a need for this type of transformation seldom
query involves joining the DISTRIBUTOR arises because a procedural language is sel-
table with the result of joining the COM- dom chosen for the federation level.
PANY and PRODUCT tables, then gen-
erate two nested loops that perform the 5.3 Query Processing and Optimization
join of COMPANY and PRODUCT and
place those two loops inside another loop In a loosely coupled FDBS, the FDBMS
that generates the join of DISTRIBUTOR can support little or no query optimization
table with the result of joining COMPANY (see Section 2.1). In a tightly coupled
and PRODUCT. FDBS, the FDBMS can perform extensive
Another approach involves generating all query optimization. Query processing in-
possible strategies for performing nested volves converting a query against a feder-
relational operations, then selecting the ated schema into several queries against
best strategy. Heuristics can be used to the export schemas (and the corresponding
generate only promising strategies, de- component DBSs) and executing these
creasing the time to perform optimization queries. Query processing in an FDBMS is
at the cost of producing a near-optimal similar to that in a distributed DBMS (see
rather than optimal strategy. Rules for con- e.g., [Yu and Chang 19841 for a survey of
verting one strategy into a better strategy query-processing techniques for distributed
are described by Chu and Hurley [1982] DBMSs). In an FDBMS, however, the
and Ceri and Pelagatti [ 19841. following additional complexities may
An approach taken by some researchers be introduced due to heterogeneity and
is to use an intermediate language that can autonomy:
serve as an umbrella for multiple target The cost of performing an operation may
languages and be able to express most, if be different in different component
not all, of the operations expressed in the DBSs. Due to autonomy of a component
target languages. Piatetsky-Shapiro and DBS, the cost of performing the opera-
Jakobson [1987] define a language called tion in a component DBMS may not be
DELPHI that combines the power of rela- known or may only be approximately
tional algebra with many additional known. In addition, this cost may vary
database operations, including grouping, from time to time as the local system load
sorting, aggregates, and nested queries. changes.
They also describe rule-based transforma-
The component DBMSs may differ in
tions to convert DELPHI to different tar-
their abilities to perform local query
get query languages such as SQL and
optimizations.
fourth-generation languages. In CALIDA
[Jacobson et al. 19881, which is a loosely The system and database operations pro-
coupled FDBS, a user’s interactions with a vided by each of the component DBMSs
menu-based interface results in a query in and the FDBMSs may be different. Ex-
DELPHI, which is then converted into amples of system operations include the
queries in the desired target language. ability to create temporary relations and
the ability to receive and reference data
5.2.2 Converting a Procedural Language into a
from other sites. Examples of database
Nonprocedural Language
operations include the different rela-
tional algebraic operations (join, selec-
An example of converting a procedural lan- tion, etc.) and aggregations. Not all
guage into a nonprocedural language is syn- component DBMSs may support the

ACM Computing Surveys, Vol. 22, No. 3, September 1990


226 . Amit Sheth and James Larson

same operations and aggregations (or Similar trade-offs exist in a distributed


their semantics may vary, e.g., one com- DBMS, but the heterogeneity and auton-
ponent DBMS may remove duplicates omy add considerably to the complexity.
but another may not). Three FDBMS query optimization design
approaches are as follows:
Landers and Rosenberg [1982] discuss
optimization problems and solutions Simple LDIs and GDM: The GDM
adopted for some of the above issues in transforms the global transaction into
Multibase. Mermaid has implemented and the smallest possible local transactions.
tested comprehensive algorithms for query An LDI (and hence a component DBS)
optimization [Chen et al. 19891 that involve receives multiple local transactions for
a dynamically changing network environ- each global transaction. The LDI sends
ment and different processing costs at dif- the result of each local transaction to the
ferent component DBMSs. GDM. The GDM merges all the results.
A query evaluation plan can be formu- Multibase takes this approach. It results
lated as a program in an intermediate task in heavy workload for the GDM and sig-
specification language and then executed. nificant communication between the
DOL (Distributed Operation Language) GDM and the LDIs.
[Rusinkiewicz et al. 19901 is an example of Medium complexity of the GDM and
such a language. Its main functions include LDIs: The GDM transforms the global
invocation of local and remote tasks (proc- transaction into a set of the largest pos-
esses), synchronization of their execution, sible local transactions (one for each rel-
data exchange between tasks (including re- evant component DBS). This reduces the
formatting), and exception handling. The GDM workload and communication be-
commands for the local systems can be tween the GDM and the LDIs. Fewer
embedded in the DOL programs that are local transactions are sent to LDIs, and
forwarded to the local systems and executed the data returned to the GDM is a result
under local control. of processing a more complete local
Query processing in an FDBMS can be transaction).
divided into global processing performed by Complex GDM and LDIs: In this case,
the Global Data Manager (GDM) (a con- GDM generates efficient programs that
structing processor) of an FDBMS and Zocal involve participation of LDIs in the
processing performed by the Local Data- global optimization. Partial results can
base Interface (LDI) (a transforming pro- be sent to the GDM or other LDIs. LDIs
cessor) associate with a component DBS. have to support additional functionalities
Global query processing and optimization like sorting, removing duplicates, and
relate to processing a query or transaction handling and merging temporary files.
submitted by a federation user, called a DDTS and Mermaid take this approach.
global transaction, and dividing it into mul- It results in a further reduction in com-
tiple subtransactions, called local transac- munication and GDM workload. Rusin-
tions, for the LDIs of the appropriate kiewicz and Czejdo [1987] discuss an
component DBSs. Local query processing algorithm that attempts to maximize lo-
and optimization relate to processing a lo- cal processing and minimize data transfer
cal transaction at a single component DBS. between component DBSs.
Each is discussed below.
Global optimization involves evaluating Local optimization involves optimizing
the following trade-offs: execution of local transaction received from
the GDM. An FDBMS provides additional
l The amount of work done by the GDM requirements as well as opportunities for
and the complexity of the LDI, and local optimization as follows:
l The amount of communication and pro- l Query languages for the component
cessing done by different component DBMSs may be different. Thus, opera-
DBSs. tion transformation for different pairs of

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 227

the query language used by the FDBS FDBMS support different concurrency
and the query language provided by a control mechanisms [Gligor and Popescu-
component DBMS may involve different Zeletin 19861. The problem of global dead-
techniques. lock detection must also be addressed
l Physical database organization of each of [Logar and Sheth 19861.
the component DBSs may be different. Several researchers are currently study-
Similarly, criteria for access path selec- ing these problems [Alonso et al. 1987;
tion for different component DBMSs Breitbart and Silberschatz 1988; Elmagar-
may be different. Due to the autonomy mid and Helal 1988; Pu 19871. As argued
of the component DBMS, however, the by Barker and Ozsu [1988] and Du et al.
GDM or an LDI may not have enough [ 19891, however, the published solutions
information on these matters. often make unrealistic and pessimistic as-
sumptions, support a low level of concur-
Onuegbe et al. [1983] and Dayal and rency, and/or sacrifice autonomy in order
Goodman [1982] develop strategies for to obtain higher concurrency.
local query optimization in DDTS and It is unlikely that a theoretically elegant
Multibase, respectively. solution that provides conflict serializabil-
ity without sacrificing performance (i.e.,
5.4 Global Transaction Management
concurrency and/or response time) and
availability exists. One often accepted
The Global Transaction Manager (GTM) trade-off is to limit the functionality of
is responsible for maintaining database concurrency control in favor of preserving
consistency while allowing concurrent up- site autonomy. Examples of this would be
dates across multiple databases. Support- to allow only unsynchronized retrievals,
ing global transaction management in an preclude multisite updates, or perform local
environment with multiple heterogeneous updates off-line. Another approach is to
and autonomous component DBSs is very devise mechanisms that specifically suit the
difficult. This is underlined by the fact that limitations of a given environment and
none of the prototype FDBMSs support provide the required level of consistency.
global transaction management. Nowhere Eliassen and Veijalainen [1987] propose a
does the autonomy of component DBMSs concept of S-Transactions (for semantic
present more problems than in supporting transactions) suited for a banking environ-
updates. ment consisting of a network of highly au-
There are two types of transactions to be tonomous systems. It may be desirable to
managed: global transactions submitted to devise solutions that do not meet the con-
the FDBMS by federation users and local flict serializability criteria but that are
transactions directly submitted to a com- practical and meet a desired level of consis-
ponent DBMS by local users. The basic tency. Du and Elmagarmid [1989] propose
problem in supporting global concurrency a weaker consistency criterion called Quasi-
control is that the FDBMS does not know Serializability8 that works provided there
about local transactions since a component are no value dependencies (e.g., referential
DBMS is autonomous. That is, local wait- integrity constraints) across databases.
for relationships are known only to the Garcia-Molina and Salem [1987] and
transaction manager of the component Alonso et al. [1987] propose a concept of
DBMS. Without knowledge about local as Sagas that provides semantic atomicity but
well as global transactions, it is highly un- does not serialize execution of global trans-
likely that efficient global concurrency con- actions. More work on weaker consistency
trol can be provided. Due to the existence
of local transactions, it is very difficult to
recognize when the execution order differs ‘A Quasi-Serializable schedule is one in which the
local transaction schedules are serializable and the
from the serialization order at any site [Du global transactions are executed serially. We need to
et al. 19891. Additional complications occur understand the practical significance of this and other
when different component DBMSs and the proposed weaker consistency criteria better.

ACM Computing Surveys, Vol. 22, No. 3, September 1990


228 l Amit Sheth and James Larson

criteria and a better understanding of their structured data, often called business data.
significance in practical terms is needed. Recently, however, there has been a great
Techniques to specify and execute the deal of activity for constructing DBMSs
transactions that selectively provide ato- that manage data for the so-called nontra-
micity, isolation, and durability properties ditional applications. These applications
need to be further researched. Little atten- require less structured data in the forms
tion has been given to the problems of fault such as text, image, graphics, and voice.
tolerance, ensuring the integrity of redun- Current database research activities, par-
dant data, and supporting integrity con- ticularly related to object-oriented database
straints across component databases. systems, address centralized management
of these types of data. We now need to
investigate issues in integrating such sys-
6. FUTURE RESEARCH AND UNSOLVED
PROBLEMS
tems [Sheth 1987b, 1988131.
The second significant extension is to
We discussed the concept of federation in create information systems that not only
the context of database systems. A feder- include database systems but also applica-
ated database system is a collection of co- tion programs and expert systems. Such a
operating but autonomous and possibly system may be called a federated knowledge
heterogeneous database systems. A refer- base system. One of the main problems is
ence architecture was used to study various to represent information contents, process-
FDBS architectural alternatives and their ing capabilities, and semantics (including
implications. A methodology for developing behavioral aspects) of programs and expert
FDBSs, particularly the tightly coupled systems adequately. We need to define a
FDBSs, was discussed. Finally, we dis- description that plays a role with respect to
cussed important tasks that need to be an application program that is equivalent
performed in order to develop and operate to the role a schema plays for a database.
FDBSs. One such effort is that of a “capability
Several problems need further research schema” for an application program
and development. They include the [Ryan and Larson 19861. Models for fed-
following: eration also need to be developed for such
environments.
Identifying and representing all seman-
tics useful in performing various FDBS ACKNOWLEDGMENTS
tasks such as schema translation and
schema integration and determining con- We thank the referees, S. March, M. Rusinkiewicz,
tents of schemas at various levels. G. Thomas, and M. Templeton, who have helped us
toward the current version; their time and care is
Lack of software tools to aid in perform- gratefully acknowledged.
ing various FDBS tasks with a high de-
gree of automation and an integrated
REFERENCES
toolset for developing, maintaining, and
managing FDBSs. ABBOTT, K., AND MCCARTHY, D. 1988. Admi-
Lack of adequate transaction manage- nistration and autonomy in a replication-
transuarent distributed DBMS. In Proceedinm of
ment algorithms that provide a specified the 14th International Conference on Very L&g;
level of consistency (i.e., are correct with Data Bases (Aug.), pp. 195-205.
respect to a given consistency criteria) ALONSO, R., AND BARBARA, D. 1989. Negotiating
and fault tolerance at acceptable perfor- data access in federated database systems. In
mance within the heterogeneity and au- Proceedings of the 5th International Conference
on Data Engineering (Feb.), pp. 56-65.
tonomy constraints of an FDBS.
ALONSO, R., GARCIA-M• LINA, H., AND SALEM, K.
How to address management and effi- 1987. Concurrency control and recovery for
ciency issues related to autonomy of com- global procedures in federated database systems.
ponent DBSs. In 9. Bull. IEEE-CS TC Data Em.- 10. 3
(Sept.), 5-11.
The focus of the past activities in FDBSs BARKER, K., AND Ozsu, T. 1988. A survey of issues
has been on databases that store more in distributed heterogeneous database systems.

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 229

Tech. Rep. TR 88-9, Univ. of Alberta Edmonton, 243, Goos, G. and Hartmanis, J. Eds. Springer-
Canada. Verlag, New York, pp. 141-156.
BATINI, C., LENZERINI, M., AND NAVATHE, S. CZEJDO, B., RUSINKIEWICZ, M., AND EMBLEY, D.
1986. A comparative analysis of methodologies 1987. An approach to schema integration and
for database schema integration. ACM Comput. query formulation in federated database systems.
Suru. 18, 4 (Dec.), 323-364. In Proceedings of the 3rd International Conference
BECK, H., GALA, S., AND NAVATHE, S. 1989. on Data Engineering (Feb.), pp. 477-484.
Classification as a query processing technique in DATE, C. 1986. An Introduction to Database Systems,
CANDIDE semantic data model. In Proceedings Vol. 1, 4th ed. Addison-Wesley, Reading, Mass.
of the 5th International Conference on Data En- DAYAL, U., AND GOODMAN, N. 1982. Query optimi-
gineering (Feb.), pp. 572-581. zation for CODASYL database systems. In Pro-
BELCASTRO, V., ET AL. 1988. An overview of the ceedings of the ACM SZGMOD Conference,
distributed query system D&S. In Proceedings of pp. 13&156.
the International Conference on Extending Data DAYAL, U., AND HWANG, H. 1984. View definition
Base Technolozy (Venice, Italv. Mar.). In Com- and generalization for database integration in a
puter Science.--Vol. 303,’ Sp&ger-V&lag, New multidatabase system. IEEE Trans. Soft. Eng.
York, pp. 170-189. SE-IO, 6 (Nov.), 628-644.
BERTINO, E., AND HAAS, L. 1988. Views and security DE 1987. Special issue on federated database systems
in distributed database management systems. In (mainly transaction management aspects).
Proceedings of the International Conference on Q. Bull. IEEE-CS TC Data Eng. 10, 3 (Sept.).
Extending Database Technology (Venice, Italy, DEMO, B. 1983. Program analysis for conversion
Mar.). In Computer Science. Vol. 303, Springer- from a navigational to a specification data base
Verlag, New York, pp. 155-169. interface. In Proceedings of the 9th International
BLAKEY, M. 1987. Basis of a partially informed dis- Conference on Very Large Data Bases (Florence,
tributed database. In Proceedings of the 13th In- Italy, Oct.), pp. 387-398.
ternational Conference on Very Large Data Bases DEVOR, C., ELMASRI, R., LARSON, J., RAHIMI, S., AND
(Brighton, UK, Sept.), pp. 381-388. RICHARDSON, J. 1982b. Five-schema architec-
BREITBART, Y ., AND SILBERSCHATZ, A. 1988. ture extends DBMS to distributed applications.
Multidatabase update issues. In Proceedings of Electron. Des. (Mar. 18), 27-32.
the ACM SZGMOD Conference (June), 135-142. DEVOR, C., ELMASRI, R., AND RAHIMI, S. 1982a. The
BRZEZINSKI, Z., GETTA, J., RYBNIK, J., AND STEP- design of DDTS: A testbed for reliable distributed
NIEWSKI, W. 1984. UNIBASE-An integrated database management. Tech. Rep. HP-82-273:
access to databases. In Proceedings of the 10th 17-38, Honeywell Computer Sciences Center,
International Conference on Very Large Data Camden, Minn.
Bases (Singapore, Aug.), pp. 388-395. DP 1988. Special issue on heterogeneous distributed
CARDENAS, A. 1987. Heterogeneous distributed da- database systems. L. Lilien, Ed. D&rib. Process.
tabase management: The HD-DBMS. In Proc. Tech. Comm. News. (Q.) 10, 2 (Nov.).
ZEEE 75,5 (May), 588-600. Du, W., AND ELMAGARMID, A. 1989. Quasi serializ-
CERCONE, N., MORGESTERN, M., SHETH, A., AND ability: A correctness criterion for global concur-
LITWIN, W. 1990. Resolving semantic hetero- rency control in interbase. In Proceedings of the
geneity. Panel at the International Conference on 15th International Conference on Very Large Data
Data Engineering (Feb.). Bases (Amsterdam, Aug.), pp. 347-355.
CERI, S., AND PELAGATTI, G. 1984. Distributed Du, W., ELMAGARMID, A., AND KIM, W. 1990.
Databases-Principles and Systems. McGraw- Effects of local autonomy on heterogeneous dis-
Hill, New York. tributed database systems. MCC Tech. Rep.
CERI, S., PERNICI, B., AND WIEDERHOLD, G. 1987. ACT-OODS-EI-059-90, Microelectronics and
Distributed database design methodologies. In Computer Technology Corp., Austin Tex.
Proc. IEEE 75, 5 (May), 533-546. Du, W., ELMAGARMID, A., LEU, Y., AND OSTERMANN,
CHEN, P. 1976. The entity-relationship model: S. 1989. Effects of local autonomy on global
Toward a unified view of data. ACM Trans. Da- concurrency control in heterogeneous database
tabase Syst. 1, 1 (Mar.), 9-36. systems. In Proceedings of the 2nd International
CHEN, A., BRILL, D., TEMPLETON, M., AND Yu, C. Conference on Data and Knowledge Systems for
1989. Distributed query processing in Mermaid: Manufacturing and Engineering (Oct.).
A frontend system for multiple databases. IEEE DWYER, P., AND LARSON, J. 1987. Some experiences
J. Selected Areas Commun. 7, 3 (Apr.), 390-398. with a distributed database testbed system. In
CHU, W., AND HURLEY, P. 1982. Optimal query pro- Proc. IEEE 75, 5 (May), 633-647.
cessing for distributed databases systems. IEEE ELIASSEN, F. AND VEIJALAINEN, J. 1987. An S-
Trans. Comput. C-31 (Sept.), 835-850. transaction definition language and execution
CONVENT, B. 1986. Unsolvable problems related to mechanism. Tech. Rep. No. 275, GMD, Harden-
the view integration approach. In Proceedings of bergplatz, D-1000 Berlin 12, FRG.
the Znternational Conference on Database Theory ELIASSEN, F., AND VEIJALAINEN, J. 1988. A func-
(Rome, Italy, Sept.). In Computer Science, Vol. tional approach to information system interoper-

ACM Computing Surveys, Vol. 22, No. 3, September 1990


230 l Amit Sheth and James Larson
ability. In Research into Networks and Distributed MIT/LCS/TM-141, Massachusetts Institute of
Applications (Proceedings of the EUTECO ‘88), Technology, Cambridge, Mass.
Speth, R. Ed., Elsevier Science Publishers B.V., HAMMER, K., AND TIMMERMAN, T. 1989.
North-Holland, pp. 1121-1135. Automating data conversion between heteroge-
ELLINGHAUS, D., HALLMANN, M., HOLTKAMP, B., neous databases. Tech. Rep. ACA-ST-046-89,
AND KREPLIN, K. 1988. A multidatabase system Microelectronics and Computer Technology
for transaction autonomy. In Proceedings of the Corp.
International Conference on Extending Database HAYES, S., AND RAM, S. 1990. Multi-user view in-
Technology (Venice, Italy, Mar.). In Computer tegration system (MUVIS): An expert system for
Science, Vol. 303, Springer-Verlag, New York, view integration. In Proceedings of the 6th Znter-
pp. 600-605. national Conference on Data Engineering (Feb.).
ELMAGARMID, A. 1987. When will we have true het-
HEIMBIGNER, D., AND MCLEOD, D. 1985. A feder-
erogeneous databases? (A position statement on
ated architecture for information management.
Transaction Processing). In Proceedings of the
ACM Trans. Off. Znf. Syst. 3, 3 (July), 253-278.
Fall Joint Computer Conference (Dallas, Tex.,
Oct.), p. 746. HULL, R., AND KING, R. 1987. Semantic database
ELMAGARMID, A., AND HELAL, A. 1988. Supporting modeling: Survey, applications, and research is-
updates in heterogeneous distributed database sues. ACM Comput. Suru. 19,3 (Sept.), 201-260.
systems. In Proceedings on the International Con- IEEE 1987. Special issue on distributed database
ference on Data Engineering (Feb.). systems. Proc. IEEE 75, 5 (May).
ELMASRI, R. 1981. GORDAS: A data definition, IISS 1986. The integrated information support sys-
query and update language for the entity-cate- tem. Gateway 2, 2. Industrial Technology Insti-
gory-relationship model of data. Tech. Rep. HR- tute (Mar.-Apr.).
81-250, Honeywell Inc., Camden, Minn. JACOBS, B. 1985. Applied Database Logic II: Hetero-
ELMASRI, R., AND NAVATHE, S. 1989. Fundamentals geneous Distributed Query Processing. Prentice-
of Database Systems. Benjamin/Cummings, Red- Hall, Englewood Cliffs, N.J.
wood City, Calif.
JACOBSON, G., PIATETSKY-SHAPIRO, G., LAFOND, C.,
ELMASRI, R., LARSON, J., AND NAVATHE, S. 1986. RAJINIKANTH, M., HERNANDEZ, J. 1988.
Schema integration algorithms for federated da- CALIDA: A knowledge-based system for inte-
tabases and logical database design. Tech. Rep., grating multiple heterogeneous databases. In Pro-
Honeywell Corporate Systems Development Di- ceedings of the 3rd International Conference on
vision, Camden, Minn. Data and Knowledge Bases (Jerusalem, Israel,
ELMASRI, R., WEELDREYER, J., AND HEVNER, A. June), pp. 3-18.
1985. The category concept: An extension to KAUL, M., DROSTERN, K., AND NEUHOLD, E.
entity-relationship model. Data and Knowledge 1990. ViewSystem: Integrating heterogeneous
Engineering 1 (June). North-Holland, The Neth- information bases by object-oriented views. In
erlands, pp. 75-116. Proceedings of the 6th International Conference
FAGIN, R. 1978. On an authorization mechanism. on Data Engineering (Los Angeles, Calif., Feb.),
ACM Trans. Database Syst. 3, 3, 310-331. pp. 2-10.
GARCIA-M• LINA, H., AND KOGAN, B. 1988. Node KIM, W. 1989. Research directions for integrating
autonomy in distributed systems. In Proceedings heterogeneous databases. In 1989 Workshop on
of the International Symposium on Databases in Heterogeneous Databases (Chicago, Ill., Dec.).
Parallel and Distributed Systems (Austin, Tex.,
LANDERS, T., AND ROSENBERG, R. 1982. An over-
Dec.), pp. 158-166.
view of Multibase. In Distributed Databases,
GARCIA-M• LINA, H., AND SALEM, K. 1987. Sagas. H.-J. Schneider, Ed., North-Holland, The Neth-
In Proceedings of the ACM SZGMOD Conference erlands, pp. 153-184.
(May), pp. 249-259.
LARSON, J. 1983a. Bridging the gap between network
GE, L., JOHANNSEN, W., LAMERSDORF, W., REIN- and relational database management systems.
HARDT, R., AND SCHMIDT, J. 1987. Import and Comput. (Sept.), 82-92.
export of database objects in a distributed envi-
ronment. In Proceedings of the ZFZP WG 10.3 LARSON, J. 1983b. Granting and revoking discre-
Conference on Distributed Processing (Amster- tionary authority. Znf. Syst. 8, 4, 251-261.
dam, Oct.). LARSON, J. 1989. Four reference architectures for
GLIGOR, V., AND LUCKENBAUGH, G. 1984. Inter- distributed database management systems.
connecting heterogeneous database management Computer, Standards and Interfaces, Vol. 8,
systems. Comput. 17, 1 (Jan.), 33-43. pp. 209-221.
GLIGOR, V., AND POPESCU-ZELETIN, R. 1986. LARSON, J., NAVATHE, S., AND ELMASRI, R. 1989. A
Transaction management in distributed hetero- theory of attribute equivalence in databases with
geneous database management systems. Znf. Syst. applications to schema integration. IEEE Trans.
I I, 4,287-297. Softw. Eng. 15, 4 (Apr.), 449-463.
HAMMER, M., AND MCLEOD, D. 1979. On database LIEN, Y. 1981. Hierarchical schemata for relational
management system architecture. Tech. Rep. databases. ACM Trans. Database Syst. 6,48-69.

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems 231
LITWIN, W. 1985. An overview of the multidatabase RAM, S., AND CHASTAIN, C. 1989. Architecture of
system MRDSM. In Proceedings of the ACM Na- distributed data base systems. J. Syst. Softw. 10,
tional Conference (Denver, Oct.), pp. 495-504. 2, 77-95.
LITWIN, W., 1987. The future of heterogeneous da- ROSENTHAL, A., AND REINER, D. 1987. Theo-
tabases. In Proceedings of the Fall Joint Computer retically sound transformations for practical da-
Conference (Dallas, Tex., Oct.), pp. 751-752. tabase design. In Proceedings of the 6th Znterna-
LITWIN, W., AND ABDELLATIF, A. 1986. Multi- tional Conference on Entity-Relationship
database interoperability. IEEE Comput. 19, 12 Approach (New York, Nov.), pp. 97-113.
(Dec.), 10-18. RUSINKIEWICZ, M. 1987. Heterogeneous databases:
LITWIN, W., AND ABDELLATIF, A. 1987. An overview Towards a federation of autonomous systems. In
of multidatabase manipulation language MDSL. Proceedings of the Fall Joint Computer Conference
In IEEE Proc. 75, 5 (May), 621-632. (Dallas, Tex., Oct.), pp. 751-752.
LITWIN, W., AND ZEROUAL, A. 1988. Advances in RUSINKIEWCIZ, M., AND CZEJDO, B. 1987. An ap-
multidatabase systems. In Research into Net- proach to query processing in federated database
works and Distributed Applications (Proceedings systems. In Proceedings of the 20th International
of the EUTECO ‘88). Sneth. R.. Ed. Elsevier Conference on System Sciences (Hawaii, Jan.),
Science Publishers’ B.V., fiorth-Holland, pp. 430-440.
pp. 1137-1151. RUSINKIEWICZ, M., OSTERMANN, S., ELMAGARMID,
LITWIN, W., ABDELLATIF, A., NICOL,AS, B., VIGIER, A., AND LOA, K. 1990. The distributed opera-
P., AND ZEROUAL, A. 1987. MSQL: A multida- tion language for specifying multisystem appli-
tabase language. Tech. Rep. 695, INRIA, BP 105, cations. In Proceedings of the 1st International
78153 Le-Chesnay, France. Conference on Systems Integration (Morristown,
LITWIN, W., BOUDENANT, J., ESCULIER, C., FERRIER, N.J., Apr.).
A., GLORIEUX, A., LA CHIMIA, J., KABBAJ, K., RUSINKIEWICZ, M., ELMASRI, R., CZEJDO, B.,
MOULINOUX, C., ROLIN, P., AND STANGRET, C. GEORAKOPOULOUS, D., KARABATIS, G., JA-
1982. SIRIUS Systems for Distributed Data MOUSSI, A., LOA, L., AND Ll, Y. 1989.
Management. In Distributed Data Bases, H.-J. OMNIBASE: Design and implementation of a
Schneider, Ed. North-Holland, The Netherlands, multidatabase system. In Proceedings of the 1st
pp. 311-366. Annual Symposium in Parallel and Distributed
LOGAR, T., AND SHETH, A. 1986. Concurrency con- Processing (Dallas, Tex., May), pp. 162-169.
trol issues in heterogeneous distributed database RYAN, K., AND LARSON, J. 1986. The use of E-R
management systems. Tech. Memo, Honeywell models in capability schemas. In Proceedings of
Computer Sciences Center, Camden, Minn. the 5th International Conference on the Entity-
MOTRO, A., AND BUNEMAN, P. 1981. Constructing Relationship Approach (Dijoin, France, Nov.).
superviews. In Proceeding of the ACM SZGMOD SELINGER, P., AND WADE, B. 1976. An authorization
Conference (May), pp. 54-64. mechanism for a relational database system.
NAVATHE, S., ELMASRI, R., AND LARSON, J. ACM Trans. Database Syst. 1, 3, 242-255.
1986. Integrating user views in database design. SIEGEL, M. 1987. A survey on heterogeneous data-
IEEE Comput. 19, 1 (Jan.), 50-62. base systems. Tech. Note 87-174.1, GTE Labo-
NAVATHE, S., GALA, S., KAMATH, A., KRISHNAMUR- ratories, Waltham, Mass.
THY, A., SAVASERE, A., AND WANG, W. 1989. A SHETH, A. 1987a. Heterogeneous distributed data-
federated architecture for heterogeneous infor- base systems: Issues in integration. The 3rd Zn-
mation systems. In the Workshop on Heteroge- ternational Conference on Data Engineering.
neous Database Systems (Chicago, Ill., Dec.). IEEE Press, Washington, D.C.
ONUEGBE, E., RAHIMI, S., AND HEVNER, A. SHETH, A. 1987b. When will we have true heteroge-
1983. Local query translation and optimization neous database systems? In Proceedings of the
in a distributed system. In AFZPS Conference Fall Joint Computer Conference (Dallas, Tex.,
Proceedings, Vol. 52, National Computer Confer- Oct.), pp. 747-748.
ence, AFIPS Press, pp. 229-239. SHETH, A. 1988a. Managing and integrating un-
Ozsu, M., AND VALDURIEZ, P. 1990. Principles of structured and structured data: Problems of rep-
Distributed Database Systems. Prentice-Hall, resentation, features, and abstraction. In
Englewood Cliffs, N.J. Proceedings of the 4th International Conference
PECKHAM, J., AND MARYANSKI, J. 1988. Semantic on Data Engineering. pp. 598-599.
data models. ACM Comput. Suru. 20, 3 (Sept.), SHETH, A. 1988b. Building Federated Database Sys-
153-190. tems. In D&rib. Process. Tech. Comm. Newsl. 10,
PIATETSKY-SHAPIRO, G., AND JAKOBSON, G. 1987. 2 (Nov.), 50-58.
An intermediate database language and its rule- SHETH, A., AND GALA, S. 1989. Attribute relation-
based transformation to different database lan- ships: An impediment in automating schema in-
guages. Data and Knowledge Engineering 2,1-29. tegration. In the Workshop on Heterogeneous
Pu, C. 1987. Superdatabases: Transactions across Database Systems (Chicago, Ill., Dec.).
database boundaries. In Q. Bull. IEEE-CS TC SHETH, A., LARSON, J., AND WATKINS, E.
Data Eng. 10, 3 (Sept.), 19-25. 1988a. TAILOR: A tool for updating views. In

ACM Computing Surveys, Vol. 22, No. 3, September 1990


232 l Amit Sheth and James Larson
Proceedings of the International Conference on et al. 1986, Brzezinski et al. 1984, Cardenas 1987,
Extending Database Technology (Venice, Italy, Cardenas and Pirahesh 1980, Chung 1990, Deen
Mar.). In Computer Science, Vol. 303, Springer- et al. 1985, Devor et al. 198213,Dwyer et al. 1986,
Verlag, New York, pp. 190-213. Dwyer and Larson 1987, Ferrier and Stangret
SHETH, A., LARSON, J., CORNELLIO, A., AND 1983, Furlani et al. 1983, Ge et al. 1987, Hammer
NAVATHE, S. 1988b. A tool for integrating con- and McLeod 1979, Heimbigner and McLeod 1985,
ceptual schemas and user views. In Proceedings IISS 1986, Jacobson et al. 1988, Landers and
of 4th International Conference on Data Engi- Rosenberg 1982, Litwin 1985, Litwin and Vifier
neering. pp. 176-183. 1987, Litwin et al. 1982, Rajinikanth et al. 1990,
Rusinkiewicz et al. 1989, Smith et al. 1981, Spac-
SOUZA, J. 1986. SIS: A schema integration system. capietra et al. 1982, Stephenson and Main 1986,
In Proceedines of the BNCOD5 Conference. Stocker et al. 1984, Templeton et al. 1983, Tem-
pp. 167-185. - ’ pleton et al. 1987b, Tsubaki and Hotaka 1980.
TEMPLETON, M., LUND, E., AND WARD, P. 1987a.
Pragmatics of access control in Mermaid. In Schema Translation: Elmasri et al. 1985, Jajodia
Q. Bull. IEEE-CS TC Data Eng. 10, 3 (Sept.), and Ng 1983, Kalinichenko 1978, Katz 1980, Klug
1981, Larson 1983a, Lien 1981, Morgestern 1981,
33-38.
Navathe 1980, Navathe and Cheng 1983,
TEMPLETON, M., BRILL, D., CHEN, A., DAO, S., Pelagatti et al. 1978, Senko 1976, Sibley and
LUND, E., MCGREGOR, R., AND WARD, P. Hardgrave 1977, Shoval and Even-Chaime
198713. Mermaid: A front-end to distributed het- 1987, Vassiliou and Lachovsky 1980, Zaniolo
erogeneous databases. In Proc. IEEE 75,5 (May), 1979 (references), Zaniolo 1979, (bibliography).
695-708.
Schema Integration: Batini and Lenzerini 1984,
TEOREY, T. 1990. Database Modeling and Design: Batini et al. 1986, Czejdo et al. 1987, Deen et al.
The Entity-Relationship Approach, Chaps. 8-9. 1987, Effelsherg and Mannino 1984, Elmasri
Morgan Kaufmann, San Mateo, Calif. 1980, Elmasri et al. 1986, Hayes and Ram 1990,
TEOREY, T., YANG, D., AND FRY, J. 1986. A logical Larson et al. 1989. Mannino and Effelsberg 1984,
design methodology for relational databases using Navathe et al. 1986, Sheth et al. 1988b,-Sheth
the extended entity-relationship model. ACM and Gala 1989, Souza 1986.
Comput. Surv. 18, 2 (June), 197-222. Multidatabase Languages: Deen et al. 1987, Ja-
THOMAS, G., et al. 1990. Heterogeneous distributed cobs 1985, Lamersdorf et al. 1987, Litwin and
database systems for production Use. Comput. Abdellatif 1987, Litwin et al. 1988, Piatetsky-
Surv. 22, 3 (Sept.), 237-266. Shapiro and Jakobson 1987, Rusinkiewicz et al.
TSICHRITZIS, D., AND KLUG, A. Eds. 1978. The 1989.
ANSI/XB/SPARC DBMS framework. Inf. Syst. Operation Translation and Optimization: Czejdo
3, 4. et al. 1987, Dayal 1983, Deen et al. 1987, Katz
TSICHRITZIS, D., AND LOCHOVSKY, F. 1982. Data 1980, Onuegbe et al. 1983, Vassiliou and Lachov-
Models, Chap. 14. Prentice-Hall, Englewood sky 1980, Zaniolo 1979 (bibliography).
Cliffs, N.J. Transaction Management: Alonso et al. 1987,
VEIJALAINEN, J., AND POPESCU-ZELETIN, R. 1988. Bernstein and Goodman 1981, Breibart et al.
Multidatabase systems in ISO/OSI environment. 1987, Breitbart and Silberschatz 1988, DE 1987,
In Standards in Information Technology and In- Du and Elmagarmid 1989, Du et al. 1989, Eliassen
dustrial Control, Malagardis, N., and-Williams, and Veiialainen 1987. Elmagarmid and Du 1990.
T., Eds. North-Holland. The Netherlands. , DD.
__ Elmagaimid and Helal 1988, Elmagarmid and
83-97. Leu 1987, Gligor and Popescu-Zeletin 1985, Logar
WANG, C., AND SPOONER, D. 1987. Access control and Sheth 1986, Pu 1987, Thomson 1987,
in a heterogeneous distributed database manage- Veijalainen and Popescu-Zeletin 1986.
ment system. In Proceedings of the 6th Sympo- ADIBA, M., AND DELOBEL, C. 1977. The problem of
sium on Reliability in Distributed Software and the cooperation between different D.B.M.S. In
Database Systems (Mar.). Architecture and Models in Data Base Manage-
Yu, C., AND CHANG, C. 1984. Distributed query pro- ment Systems, Nijssen, G., Ed. North-Holland,
cessing. ACM Comput. Surv. 16, 4, (Dec.), The Netherlands.
399-433. ADIBA, A., AND PROTAL, D. 1978. A cooperative sys-
ZANIOLO, C. 1979. Design of relational views over tem for heterogeneous data base management
network schemas. In Proceedings of the ACM systems. Znf. Syst. 3, 209-215.
SIGMOD Conference, pp. 179-190. BATINI, C., AND LENZERINI, M. 1984. A methodol-
ogy for data schema integration in entity-
relationship model. IEEE Trans. Softw. Eng.
BIBLIOGRAPHY SE-IO, 6 (Nov.).
BELL, D., et al. 1987. MULTI-STAR: A multidata-
Multi-DBMS and Federated Database Systems: base system for health information systems. In
Adiba and Delobel 1977, Adiba and Portal 1978, Proceedings of the 7th International Conference
Belcastro et al. 1988, Bell et al. 1987, Breitbart on Medical Informatic (Rome, Italy, Sept.).

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 233
BREITBART, Y., OLSON, P., AND THOMPSON, G. 1986. FURLANI, C., et al. 1983. The automated manufac-
Database integration in a distributed heteroge- turing research facility of the national bureau of
neous database system. In Proceedings 2nd Inter- standards. In Proceedings of the Summer Com-
national Conference on Data Engineering. pp. puter Simulation Conference (Vancouver, Can-
301-310. ada, July).
BREITBART, Y., SILBERSCHATZ, A., AND THOMPSON, HSIAO, D., AND KAMEL, M. 1989. Heterogeneous
G. 1987. An update mechanism for multidata- databases: Proliferation, issues, and solutions.
base systems. In Q. Bull. IEEE-CS TC Data Eng. IEEE Trans. Knowledge Data Eng. 1, 1,45-62.
10, 3 (Sept.), 12-18. HONG, S., AND MARYANSKI, F. 1988. Database de-
CARDENAS, A., AND PIRAHESH, H. 1980. Data base sign tool generation via software reusability. In
communication in a heterogeneous data base Proceedings of the COMPSAC.
management system network. Inf. Syst. 5,55-79.
JAJODIA, S., AND NG, P. 1983. On representation of
CD 1988. Common Data model*plus. Product liter- relational structures by entity-relationship dia-
ature, Control Data Corp. grams. In Entity Relationship Approach to Soft-
CHUNG, C. 1990. DATAPLEX: An access to heter- ware Engineering, Davis et al., Eds. Elsevier
ogeneous distributed databases. Commun. ACM Science Publishers, New York.
33, 1 (Jan.), 70-80. KALINICHENKO, L. 1978. Data model transforma-
CODASYL, 1971. Database task group of CODA- tion method based on axiomatic data model ex-
SYL programming language committee. Report. tension. In Proceedings of the 4th Very Large Data
(Apr.). Base Conference.
DAYAL, U. 1983. Processing queries over generaliza- KATZ, R. 1980. Database design and translation for
tion hierarchies in a multidatabase system. In multiple data models. Ph.D. dissertation, Memo
Proceedings of the 9th International Conference No. UCB/ERL M80/24, College of Eng., Univ. of
on Very Large Data Bases. California, Berkeley, Calif.
DEEN, S., AMIN, R., AND TAYLOR, M. 1987. Data KLUG, A. 1981. Multiple view, multiple data model
integration in distributed databases. IEEE Trans. support in the CHEOPS database management
Softw. Eng. SE-13, 7. system. Tech. Rep. 418, Computer Science Dept.,
DEEN, S., AMIN, R., OFORI-DWUMFUO, G., AND Univ. of Wisconsin-Madison, Madison, Wise.
TAYLOR, M. 1985. The architecture of a gener-
LAMERSDORF, W., ECKHARDT, H., EFFELSBERG, W.,
alized distributed data base system-PERCI*.
JOHANNSEN, W., REINHARDT, K., AND SCHMIDT,
Comput. J. 28, 3, 209-215.
J. 1987. Database programming for distributed
DWYER, P., KASRAVI, K., AND PHAM, M. 1986. A office systems. In Proceedings IEEE Comp. SOC.
heterogeneous distributed database management Symp. on Office Automation (Gaithersburg, Md.).
system (DDTS/RAM). Tech. Rep. CSC-86-
7:8216, Honeywell Computer Sciences Center, LARSON, J., NAVATHE, S., AND ELMASRI, R. 1989. A
Camden, Minn. theory of attribute equivalence in databases with
applications to schema integration. IEEE Trans.
EFFELSBERG, W., AND MANNINO, M. 1984.
Softw. Eng. 15, 4 (Apr.), 449-463.
Attribute equivalence in global schema design for
heterogeneous distributed databases. Inf. Syst. 9, LITWIN, W., AND VIGIER, P. 1987. New capabilities
3/4. of the multidatabase system MRDSM. In
ELMAGARMID, A., AND Du, W. 1990. A paradigm for HLSUA Forum XLV Proceedings (New Orleans,
concurrency control in heterogeneous distributed Oct.).
database systems. In Proceedings of the 6th Inter- MANNINO, M., AND EFFELSBERG, W. 1984.
national Conference on Data Engineering. (Feb.). Matching techniques in global schema design. In
ELMAGARMID, A., AND LEU, Y. 1987. An optimistic Proceedings of First International Conference on
concurrency control algorithm for heterogeneous Data Engineering (Apr.), pp. 418-425.
distributed database systems. In Q. Bull. IEEE- MORGESTERN, M. 1989. A unifying approach for
CS TC Data Eng. 10, 3 (Sept.), 26-32. conceptual schema to support multiple data
ELMASRI, R. 1980. On the design, use, and integra- models. In Proceedings of the 2nd International
tion of data models. Ph.D. dissertation, Rep. Conference on Entity-Relationship Approach
STAN-CS-80-801, Computer Science Dept., (Washington, D.C., Oct.), pp. 281-299.
Stanford Univ., Stanford. NAVATHE, S. 1980. An intuitive approach to nor-
FERRIER, A., AND STANGRET, C. 1983. Hetero- malize network structured data. In Proceedings of
geneity in the distributed database management the 6th International Conference on Very Large
system SIRIUS-DELTA. In Proceedings of the Data Bases.
8th Very Large Data Base Conference (Mexico NAVATHE, S., AND CHENG, A. 1983. A methodology
City). for database schema mapping from extended en-
FONG, E., AND GOLDFINE, A. 1986. Data base direc- tity relationship models into the hierarchical
tions: Information resource managementi Mak- model. In Entity Relationship Approach to Soft-
ing it work, Executive Summary. SIGMOD ware Engineering, Davis, et al. Eds. Elsevier Sci-
RECORD 15, 3 (Sept.). ence Publishers, New York.

ACM Computing Surveys, Vol. 22, No. 3, September 1990


234 l Amit Sheth and James Larson

PELAGATTI, G., PAOLINI, P., AND BRACCHI, G. Meitner Institut, Berlin GmnH, D-1000 Berlin
1978. Mapping external views to common data 39, FRG.
model. Znf Syst. 3. ZANIOLO, C. 1979. Multimode1 external schemas for
RAJINIKANTH, M., JACOBSON, G., LAFOND, C., PAPP, CODASYL data base management systems. In
w., AND PIATETSKY-SHAPIRO, G. 1990. Data Base Architecture, Bracchi, G., and Nijssen,
Multiple database integration in CALIDA: De- G., Eds. North-Holland, The Netherlands.
sign and integration. In Proceedings of the 1st
International Conference on Systems Integration
(Apr.).
REINER, D., BROWN, G., FRIEDELL, M., LEHMAN, J., GLOSSARY
MCKEE, R., RHEINGANS, P., AND ROSENTHAL,
A. 1987. A Database designer’s workbench. In
Entity-Relationship Approach, Spaccapietra, S.,
Accessing Processor: Software that ac-
Ed. Elsevier Science Publishers. New York. cepts commands and produces data by
pp. 347-360. executing commands against a database.
SENKO, M., 1976. DIAM as a detailed example of Class of Users: A set of users performing
the ANSI SPARC architecture. In Modelline in closely related tasks.
Data Base Management Systems, Nijssen, G.,-Ed. Common Data Model (CDM): A data
North-Holland, The Netherlands.
model to which schemas of different com-
SHIPMAN, D. 1981. The functional data model data
language DAPLEX. ACM Trans. Database Svst. ponent DBMSs are translated for the
6, l-(Mar.), 140-173. purpose of representation in a com-
SMITH, J., et al. 1981. Multibase: Integrating heter- mon format and facilitation of schema
ogeneous distributed database systems. In Pro- integration.
ceedings of the National Computer Conference. Component Database Administrator
SPACCAPIETRA, S., DEMO, B., DILEVA, A., PARENT, (component DBA): The administrator
C., CELLIS, C., AND BELFAR, K. 1982. An ap- of a component DBS (also called local
proach to effective heterogeneous database co-
operation. In Distributed Data Sharing Systems, DBA) who, among other things, decides
van de Riet. R., and Litwin. W.. Eds. North- data access rights of local users as well
Holland, The Netherlands, pp: 209-218. as (individuals and/or classes) of federa-
STEPHENSON, G., AND MAIN, R. 1986. ARCHEDDA tion uses. It is the component DBA’s
Prototype. Final Report. CRI, Great Britain. responsibility to manage the local
STOCKER, P., et al. 1984. Proteus: A heterogeneous schema, translate it to create the com-
distributed data-base project. In Databases: Role
and Structure, Gray, P., and Atkinson, M., Eds. ponent schemas, and define export
Cambridge University Press, New York. schemas.
TEMPLETON, M., BRILL, D., CHEN, A., DAO, S., AND Component DBMS: A DBMS participat-
LUND, E. 1986. Mermaid: Experiences with net- ing in a multidatabase system. A com-
work operation. In Proceedings of the 2nd Znter- ponent DBMS participating in an FDBS
national Conference on Data Engineering.
is autonomous and allows local opera-
TEMPLETON, M., BRILL, D., HWANG, A., KAMENY, I.,
AND LUND, E. 1983. An overview of the mer-
tions as well as global (federation) oper-
maid system. A frontend to heterogeneous data- ations that meet its constraints.
bases. In Proceedings of EASCON 83. Component Schema: A translation of a
THOMSON, G. 1987. Multidatabase concurrency local schema into an equivalent schema
control. Ph.D. dissertation, Oklahoma State Uni- in the common data model.
versity. Constructing Processor: Software that
TSUBAKI, M., AND HOTAKA, R. 1980. Distributed partitions and/or replicates operations
multidatabase environment with a supervisory
data dictionary database. In Entity-Relationship produced by a single processor for exe-
Approach to System Analysis and Design, Chen cution by two or more processors. Also
P., Ed. North-Holland, The Netherlands. software that merges data produced by
VASSILIOU, Y., AND LOCHOVSKY, F. 1980. DBMS two or more processors for consumption
transaction translation. In Proceedings COMP- by another processor.
SAC 80, IEEE Computer Software and Applica-
tion Conference.
Database Management System
VEIJALAINEN, J., AND POPESCU-ZELETIN, R. 1986.
(DBMS): Software that manages a col-
On multi-database transactions in a cooperative, lection of structured data. Management
autonomous environment. Tech. Rep., Hahn- includes providing data management

ACM Computing Surveys, Vol. 22, No. 3, September 1990


Federated Database Systems l 235

services including data access, constraint or data that can be passed to another
enforcement, and consistency manage- processor.
ment. Global (or External) Operation: An op-
Database System (DBS): A database eration that is not submitted by a local
system (DBS) consists of a DBMS that user (e.g., an operation submitted to the
manages one or more databases. component DBS by the FDBMS or an-
Distributed DBMS: A system that man- other component DBS).
ages multiple databases. Local Schema: A schema of a component
Export Schema: A subschema of a com- DBS in a native data model of the com-
ponent schema specifically defined to ex- ponent DBMS.
press the access constraints on a class of Local User: A user of a component DBS.
federation users to access the data rep- Multidatabase System (MDBS): A sys-
resented in the component schema. tem that allows operations on multiple
External Schema: A subschema or a databases.
view defined over a federated schema Processor: Software that performs oper-
primarily for a pragmatic reason of not ations on data and commands.
having to define too many federated Schema Object: A description of a data
schemas or to tailor a federated schema element in a schema. For example, a
for smaller groups of federation users schema expressed in the Entity Relation-
than that of a federated schema. ship model has three types of schema
Federated Database System (FDBS): objects: entity types, relationship types,
A system that is created to provide op- and attribute definitions. An example of
erations on databases managed by auton- an entity type schema object is an “Em-
omous, and possibly heterogeneous, ployee” entity type.
DBSs. The software that manages an Schema, Subschema, and View: A rep-
FDBS is called a federated DBMS resentation of the structure (syntax), se-
(FDBMS). mantics, and constraints on the use of a
Federated Schema: An integration of database (or its portion) in a particular
several export schemas created to repre- data model. A schema is a collection of
sent data access requirements and rights schema objects. A subschema is a collec-
of a class or group of federation users. tion of subsets of that schema’s objects.
Federation Administrator (federation A view is any connected portion of a
DBA): The administrator of a federated schema. In other words, a schema is a
schema or the federated database sys- collection of views.
tem who, among other things, creates Transforming Processor: Software that
and maintains federated and external translates commands from one language
schemas. or format to another language or format
Federation User: A user of an FDBS or or translates data from one format to
an application running over an FDBS. another.
Filtering Processor: Software that con- User: An individual or an application us-
strains operations that can be applied ing a database system.

Appendix: Features of Some FDBS/Multi-DBMS Efforts


Table A.1 summarizes the choice of a common data model and the types of component
DBMSs integrated into some of the experimental prototype systems. Table A.2 gives an
empirical and partial list of significant or interesting features of some of these efforts.
No effort is made to present all systems that exhibit FDBS features or to capture all
important details of those mentioned.

ACM Computing Surveys, Vol. 22, No. 3, September 1990


236 l Amit Sheth and James Larson
Table A.l. Data Models of Various Multi-DBMS/FDBS Efforts
Types of Component
Effort CDM DBMSs Reference
ADDS Extended Relational Hierarchical, Relational, Files [Breitbart et al. 19861
CALIDA Relational like Relational, Relational like, [Jacobson et al. 19881
Files
DDTS Relational Network, Relational [Dwyer and Larson 19871
DQS Relational Hierarchical, Network, [Belcastro et al. 19881
Relational
IISS E-R Variant (IDEFlx) Network, Relational [IISS 19861
Mermaid Relational (DIL) Relational (partial integration [Templeton et al. 1987b]
of text)
MRDSM Relational Relational [Litwin 19851
Multibase Functional (DAPLEX) Network, Relational [Landers and Rosenberg 1982 ]
OMNIBASE Extended Relational Relational [Rusinkiewicz et al. 19891
SIRUS-DELTA Relational Network, Relational [Litwin et al. 19821
SCOOP E-R Hierarchical, Network, [Spaccapietra et al. 19821
Relational
’ Extended E-R is used for integrity constraint enforcement at the federated schema level and at the external
schema level. The primary CDM can be said to be relational since the internal command language is based on
the relational algebra.

Table A.2. Significant Features


Effort
ADDS Tightly coupled, limited updates, in-house use of the prototype system
CALIDA More like loosely coupled, intermediate language, interactive, menu-
driven user interface, data dictionary editor/browser, in-house use of the
prototype system
DDTS Tightly coupled architecture, integrity constraint checking, local query
optimization, use of local and long haul communication
DQS Tightly coupled, completeness of implementation, auxiliary schema, IBM
environment, attention to autonomy
IISS More like tightly coupled, queries must be compiled, forms-based user
interface
Mermaid Tightly coupled, completeness of implementation, data dictionary, access
control, global query optimization, extensions to include texts, experience
with communication systems, workstation environment
MRDSM Loosely coupled architecture, multidatabase language, dealing with mul-
tiple semantic interpretations
Multibase Tightly coupled, completeness of implementation, schema integration,
global query optimization, auxiliary schema, architecture, number of
component DBMSs integrated in various prototypes
OMNIBASE Loosely coupled, DEC environment, query processing, distributed DBMS
as a comnonent DBMS

Received November 1988; final revision accepted June 1990.

ACM Computing Surveys, Vol. 22, No. 3, September 1990

View publication stats

You might also like