Professional Documents
Culture Documents
net/publication/200043078
CITATIONS READS
2,123 828
2 authors, including:
Amit Sheth
Wright State University
984 PUBLICATIONS 27,568 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
PREDOSE PROJECT: A STUDY OF SOCIAL WEB DATA ON BUPRENORPHINE ABUSE USING SEMANTIC WEB TECHNOLOGY
(R21DA030571/Daniulaityte; Sheth, PIs) View project
Hazards SEES: Social and Physical Sensing Enabled Decision Support for Disaster Management and Response View project
All content following this page was uploaded by Amit Sheth on 22 May 2014.
JAMES A. LARSON
Intel Corp., HF3-02, 5200 NE Elam Young Pkwy., Hillsboro, Oregon 97124
’ The views and conclusions in this paper are those of the authors and should not be interpreted as necessarily
representing the official policies, either expressed or implied, of Bellcore, Intel Corp., or the authors’ past or
present affiliations. It is the policy of Bellcore to avoid any statements of comparative analysis or evaluation
of vendors’ products. Any mention of products or vendors in this document is done where necessary for the
sake of scientific accuracy and precision, or for background information to a point of technology analysis, or to
provide an example of a technology for illustrative purposes and should not be construed as either positive or
negative commentary on that product or that vendor. Neither the inclusion of a product or a vendor in this
paper nor the omission of a product or a vendor should be interpreted as indicating a position or opinion of
that product or vendor on the part of the author(s) or of Bellcore.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its
date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To
copy otherwise, or to republish, requires a fee and/or specific permission.
0 1990 ACM 0360-0300/90/0900-0183 $01.50
...
tonomous DBSs. We describe various ar- tion management strategies) has been pro-
chitectural alternatives and components of posed by Elmagarmid [1987]. Such a
a federated database system and explore characterization is particularly relevant to
the issues related to developing and oper- the study and development of transaction
ating such a system. The survey assumes management in FDBMS, an aspect of
an understanding of the concepts in basic FDBS that is beyond the scope of this
database management textbooks [ Ceri and paper.
Pelagatti 1984; Date 1986; Elmasri and
Navathe 1989; Tsichritzis and Lochovsky Distribution
19821 such as data models, the ANSI/
SPARC schema architecture, database de- Data may be distributed among multiple
sign, query processing and optimization, databases. These databases may be stored
transaction management, and distributed on a single computer system or on multiple
database management. computer systems, co-located or geograph-
ically distributed but interconnected by a
Characteristics of Database Systems
communication system. Data may be dis-
tributed among multiple databases in dif-
Systems consisting of multiple DBSs, of ferent ways. These include, in relational
which FDBSs are a specific type, may be terms, vertical and horizontal database par-
characterized along three orthogonal di- titions. Multiple copies of some or all of the
mensions: distribution, heterogeneity, and data may be maintained. These copies need
autonomy. These dimensions are discussed not be identically structured.
below with an intent to classify and define Benefits of data distribution, such as in-
such systems. Another characterization creased availability and reliability as well
based on the dimensions of the networking as improved access times, are well known
environment [single DBS, many DBSs in a [Ceri and Pelagatti 19841. In a distributed
local area network (LAN), many DBSs in DBMS, distribution of data may be in-
a wide area network (WAN), many net- duced; that is, the data may be deliberately
works], update related functions of partic- distributed to take advantage of these ben-
ipating DBSs (e.g., no update, nonatomic efits. In the case of FDBS, much of the
updates, atomic updates), and the types of data distribution is due to the existence of
heterogeneity (e.g., data models, transac- multiple DBSs before an FDBS is built.
ACM Computing Surveys, Vol. 22, No. 3, September 1990
186 l Amit Sheth and James Larson
derlying data model used to define data
DatabaseSystems structures and constraints. Both represen-
Differencesin DBMS tation (structure and constraints) and lan-
-data models guage aspects can lead to heterogeneity.
(structures,constraints,querylanguages)
-system levelsupport l Differences in structure: Different
(concurrencycontrol,commit,recovery) data models provide different structural
SemanticHeterogeneity primitives [e.g., the information modeled
C using a relation (table) in the relational
OperatingSystem 0 model may be modeled as a record type
-file systems m in the CODASYL model]. If the two rep-
-naming, file types,operations m
U resentations have the same information
-transaction support n content, it is easier to deal with the dif-
-interprocess communication I
C ferences in the structures. For example,
Hardware/System a address can be represented as an entity
-instruction set t in one schema and as a composite attri-
-data formats8 representation I
0 bute in another schema. If the informa-
-configuration n tion content is not the same, it may be
very difficult to deal with the difference.
Figure 2. Types of heterogeneities. As another example, some data models
(notably semantic and object-oriented
models) support generalization (and
property inheritance) whereas others do
not.
Many types of heterogeneity are due to l Differences in constraints: Two data
technological differences, for example, dif- models may support different con-
ferences in hardware, system software straints. For example, the set type in a
(such as operating systems), and commu- CODASYL schema may be partially
nication systems. Researchers and devel- modeled as a referential integrity con-
opers have been working on resolving such straint in a relational schema. CODA-
heterogeneities for many years. Several SYL, however, supports insertion and
commercial distributed DBMSs are avail- retention constraints that are not cap-
able that run in heterogeneous hardware tured by the referential integrity con-
and system software environments. straint alone. Triggers (or some other
The types of heterogeneities in the da- mechanism) must be used in relational
tabase systems can be divided into those systems to capture such semantics.
due to the differences in DBMSs and those l Differences in query languages:
due to the differences in the semantics of Different languages are used to manipu-
data (see Figure 2). late data represented in different data
models. Even when two DBMSs support
the same data model, differences in their
Heterogeneities due to Differences in DBMSs
query languages (e.g., QUEL and SQL)
An enterprise may have multiple DBMSs. or different versions of SQL supported
Different organizations within the enter- by two relational DBMSs could contrib-
prise may have different requirements and ute to heterogeneity.
may select different DBMSs. DBMSs Differences in the system aspects of the
purchased over a period of time may be DBMSs also lead to heterogeneity. Exam-
different due to changes in technology. Het- ples of system level heterogeneity include
erogeneities due to differences in DBMSs differences in transaction management
result from differences in data models and primitives and techniques (including
differences at the system level. These are concurrency control, commit protocols,
described below. Each DBMS has an un- and recovery), hardware and system
(d) Constraints (e.g., semantic integrity and resources (i.e., the data it manages)
constraints and the serializability cri- with others. This includes the ability to
teria) used to manage the data, associate or disassociate itself from the fed-
(e) The functionality of the system (i.e., eration and the ability of a component DBS
the operations supported by system), to participate in one or more federations.
(f) The association and sharing with other Association autonomy may be treated as
systems (see association autonomy be- a part of the design autonomy or as an
low), and autonomy in its own right. Alonso and
Barbara [1989] discuss the issues that are
k) The implementation (e.g., record and
relevant to this type of autonomy.
file structures, concurrency control
A subset of the above types of autonomy
algorithms).
were also identified by Heimbigner and
Heterogeneity in an FDBS is primarily McLeod [1985]. Du et al. [1990] use the
caused by design autonomy among compo- term local autonomy for the autonomy of a
nent DBSs. component DBS. They define two types of
The next two types of autonomy involve local autonomy requirements: operation
the DBMS of a component DBS. Commu- autonomy requirements and service auton-
nication autonomy refers to the ability of omy requirements. Operation autonomy re-
a component DBMS to decide whether quirements relate to the ability of a
to communicate with other component component DBS to exercise control over its
DBMSs. A component DBMS with com- database. These include the requirements
munication autonomy is able to decide related to design and execution autonomy.
when and how it responds to a request from Service autonomy requirements relate to the
another component DBMS. right of each component DBS to make de-
Execution autonomy refers to the ability cisions regarding the services it provides to
of a component DBMS to execute local other component DBSs. These include the
operations (commands or transactions sub- requirements related to association and
mitted directly by a local user of the com- communication autonomy. Garcia-Molina
ponent DBMS) without interference from and Kogan [1988] provide a different clas-
external operations (operations submitted sification of the types of autonomy. Their
by other component DBMSs or FDBMSs) classification is particularly relevant to the
and to decide the order in which to execute operating system and transaction manage-
external operations. Thus, an external sys- ment issues.
tem (e.g., FDBMS) cannot enforce an order The need to maintain the autonomy of
of execution of the commands on a com- component DBSs and the need to share
ponent DBMS with execution autonomy. data often present conflicting require-
Execution autonomy implies that a com- ments. In many practical environments, it
ponent DBMS can abort any operation that may not be desirable to support the auton-
does not meet its local constraints and that omy of component DBSs fully. Two exam-
its local operations are logically unaffected ples of relaxing the component autonomy
by its participation in an FDBS. Further- follow:
more, the component DBMS does not need
to inform an external system of the order l Association autonomy requires that each
in which external operations are executed component DBS be free to associate or
and the order of an external operation with disassociate itself from the federation.
respect to local operations. Operationally, This would require that the FDBS be
a component DBMS exercises its execution designed so that its existence and opera-
autonomy by treating external operations tion are not dependent on any single
in the same way as local operations. component DBS. Although this may be a
Association autonomy implies that a com- desirable design goal, the FDBS may
ponent DBS has the ability to decide moderate it by requiring that the entry
whether and how much to share its func- or departure of a component DBS must
tionality (i.e., the operations it supports) be based on an agreement between the
CA Schema B
(b)
(4
(b)
Figure 6. Filtering processors. (a) A pair of companion filtering processors. (b) An abstract filtering processor.
commands are interpreted, semantic in- than one way to translate an update com-
tegrity constraints will automatically be mand. We do not discuss the view update
enforced, or (c) verifies that data pro- task in more detail because we feel that a
duced by another processor does not vi- loosely coupled FDBS is not well suited to
olate any semantic integrity constraint. support updates, and solving this problem
l Access controller, which verifies that the in a tightly coupled FDBS is very similar
user is permitted to perform the com- to solving it in a centralized or distributed
mand on the indicated data or verifies DBMS [Sheth et al. 1988a].
that the user is permitted to use data
produced by another processor. 1.2.3 Constructing Processor
Figure 6(a) illustrates two filtering pro- Constructing processors partition and/or
cessors, one that controls commands and replicate an operation submitted by a single
one that controls data. Again, we will ab- processor into operations that are accepted
stract command- and data-filtering proces- by two or more other processors. Construct-
sors into a single filtering processor as ing processors also merge data produced by
illustrated in Figure 6(b). several processors into a single data set for
An important task that may be solved by consumption by another single processor.
a filtering processor is that of view update. They can support location, distribution,
This task occurs when the differences in and replication transparencies because a
data structures between the view and the processor submitting a command does not
schema is such that there may be more need to know the location, distribution, and
<a>
(2 Schema A
(b)
Figure 7. Constructing processors. (a) A pair of constructing processors. (b) An abstract constructing
processor.
Filtering
Processor n
Transforming
Processor
m Internal
Accessing
Processor
I
Local
bb b
Schema
Such transforming processors and the com- similar to that of federated schema is rep-
ponent schemas support the heterogeneity resented by the terms import schema
feature of an FDBS. [Heimbigner and McLeod 19851, global
schema [Landers and Rosenberg 1982J,
Export Schema: Not all data of a com- global conceptual schema [Litwin et al.
ponent DBS may be available to the fed- 19821, unified schema, and enterprise
eration and its users. An export schema schema, although the terms other than im-
represents a subset of a component schema port schemas are usually used when there
that is available to the FDBS. It may in- is only one such schema in the system.
clude access control information regarding
its use by specific federation users. The External Schema: An external schema
purpose of defining export schemas is to defines a schema for a user and/or appli-
facilitate control and management of asso- cation or a class of users/applications. Rea-
ciation autonomy. A filtering processor can sons for the use of external schemas are as
be used to provide the access control as follows:
specified in an export schema by limiting
the set of allowable operations that can be l Customization: A federated schema
submitted on the corresponding component can be quite large, complex, and difficult
schema. Such filtering processors and the to change. An external schema can be
export schemas support the autonomy fea- used to specify a subset of information in
ture of an FDBS. a federated schema that is relevant to the
Alternatively, the data available to the users of the external schema. They can
FDBS can be defined as the transactions be changed more readily to meet chang-
that can be executed by a component DBS ing users’ needs. The data model for an
(e.g., [Ge et al. 1987; Heimbigner and external schema may be different than
McLeod 1985; Veijalainen and Popescu- that of the federated schema.
Zeletin 19881). In this paper, however, we Additional integrity constraints:
will not consider that case of exporting Additional integrity constraints can also
transactions. be specified in the external schema.
Access control: Export schemas pro-
Federated Schema: A federated schema vide access control with respect to the
is an integration of multiple export sche- data managed by the component data-
mas. A federated schema also includes the bases. Similarly, external schemas pro-
information on data distribution that is vide access control with respect to the
generated when integrating export sche- data managed by the FDBS.
mas. Some systems use a separate schema
called a distribution schema or an allocation A filtering process analyzes the com-
schema to contain this information. A con- mands on an external schema to ensure
structing processor transforms commands their conformance with access control and
on the federated schema into the com- integrity constraints of the federated
mands on one or more export schemas. schema. If an external schema is in a dif-
Constructing processors and the federated ferent data model from that of the federated
schemas support the distribution feature of schema, a transforming processor is also
an FDBS. needed to transform commands on the ex-
There may be multiple federated sche- ternal schema into commands on the fed-
mas in an FDBS, one for each class of erated schema.
federation users. A class of federation users Most existing prototype FDBSs support
is a group of users and/or applications per- only one data model for all the external
forming a related set of activities. For ex- schemas and one query language interface.
ample, in a corporate environment, all Exceptions are a version of Mermaid that
managers may be one class of federation supported two query language interfaces,
users, and all employees and applications SQL and ARIEL, and a version of DDTS
in the accounting department may be an- that supported SQL and GORDAS (a
other class of federation users. A concept query language for an extended ER model).
Future systems are likely to provide ing local schema. The additional semantics
more support for multimode1 external are supplied by the FDBS developer during
schemas and multiquery language interfaces the schema design, integration, and trans-
[Cardenas 1987; Kim 19891. lation processes.
The five-level schema architecture
Besides adding to the levels in the presented above has several possible
schema architecture, heterogeneity and au- redundancies.
tonomy requirements may also dictate
changes in the content of a schema. For Redundancy between external and
example, if an FDBS has multiple hetero- federated schemas: External schemas
geneous DBMSs providing different data can be considered redundant with feder-
management capabilities, a component ated schemas since a federated schema
schema should contain information on the could be generated for every different
operations supported by a component federation user. This is the case in the
DBMS. schema architecture of Heimbigner and
An FDBS may be required to support McLeod [ 19851 (they use the term import
local and external schemas expressed in schema rather than federated schema). In
different data models. To facilitate their loosely coupled FDBSs, a user defines the
design, integration, and maintenance, how- federated schema by integrating export
ever, all component, export, and federated schemas. Thus there is usually no need
schemas should be in the same data model. for an additional level. In tightly coupled
This data model is called canonical or com- FDBSs, however, it may be desirable to
mon data model (CDM). A language asso- generate a few federated schemas for
ciated with the CDM is called an internal widely different classes of users and to
command language. All commands on fed- customize these further by defining ex-
erated, export, and component schemas are ternal schemas. Such external schemas
expressed using this internal command can also provide additional access
language. control.
Database design and integration is a Redundancy between an external
complex process involving not only the schema of a component DBS and an
structure of the data stored in the databases export schema: If a component DBMS
but also the semantics (i.e., the meaning supports proper access control security
and use) of the data. Thus it is desirable to features for its external schemas and if
use a high-level, semantic data model [Hull translating a local schema into a compo-
and King 1987; Peckham and Maryanski nent schema is not required (e.g., the data
19881 for the CDM. Using concepts from model of the component DBMS is the
object-oriented programming along with a same as CDM of the FDBS), then the
semantic data model may also be appropri- external schemas of a component DBS
ate for use as a CDM [Kaul et al. 19901. may be used as an export schema in the
Although many existing FDBS prototypes five-level schema architecture (external
use some form of the relational model as schemas of component DBSs are not
the CDM (Appendix), we believe that fu- shown in the five-level schema architec-
ture systems are more likely to use a se- ture of Figure 10).
mantic data model or a combination of an Redundancy between component
object-oriented model and a semantic data schemas and local schemas: When
model. Most of the semantic data models component DBSs uses CDM of the
will adequately meet requirements of a FDBS and have the same functionality,
CDM, and the choice of a particular one is it is unnecessary to define component
likely to be subjective. Because a CDM schemas.
using a semantic data model may provide
richer semantic constructs than the data Figure 12 shows an example in which
models used to express the local schemas, some of the schema levels are not used. No
the component schema may contain more external schemas are defined over Feder-
semantic information than the correspond- ated Schema 2 (all of it is presented to all
Figure 12. Example FDBS schemas with missing schemas at some levels.
federation users using it). Component in one type of schema may also vary from
Schema 2 is the same as the Local Schema those in another type of schema. For ex-
2 (the data model of the Component DBMS ample, a federated schema may have
2 is the same as the CDM). No export schema objects describing the capabilities
schema is defined over Component Schema of the various component DBMSs in the
3 (all of it is exported to the FDBS). system, whereas no such objects exist in
An important type of information asso- the local schemas.
ciated with all FDBS schemas is the map- Two important features of the schema
pings. These correlate schema objects at architecture are how autonomy is preserved
one level with the schema objects at the and how access control is managed. These
next lower level of the architecture. Thus, involve exercising control over schemas at
there are mappings from each external different levels. Two types of administra-
schema to the federated schema over which tive individuals are involved in developing,
it is defined. Similarly, there are mappings controlling, and managing an FDBS:
from each federated schema to all of the
export schemas that define it. The map- l A component DBS administrator (com-
pings may either be stored as a part of the ponent DBA) manages a component
schema information or as distinct objects DBS. There is one component DBA5 for
within the FDBS data dictionary (which each component DBS. The local, com-
also stores schemas). The amount of dic- ponent, and export schemas are con-
tionary information needed to describe a trolled by the component DBAs of the
schema object in one type of schema may respective component DBSs. A key man-
be different from that needed for another agement function of a component DBA
type of schema. For example, the descrip-
tion of an entity type in a federated schema
may include the names of the users that ’ Here a database administrator is a logical entity. In
reality, multiple authorized individuals may play the
can access it, whereas such information is role of a single (logical) DBA, or the same individual
not stored for an entity type in a compo- may play the role of the component DBA for multiple
nent schema. The types of schema objects component DBSs.
is to define the export schemas that spec- mas” in Heimbigner and McLeod [1985]),
ify the access rights of federation users defining a view using a set of operators
to access different data in the component (e.g., defining “superviews” in Motro
databases. and Buneman [1981]), or defining a view
. A federation DBA defines and manages a using a query in a multidatabase lan-
federated schema and the external sche- guage ([Czejdo et al. 1987; Litwin and
mas related to the federated schema. Abdellatif 19861; see Section 5.1). In a
There can be one federation DBA for tightly coupled FDBS, it takes the form of
each federated schema or one federation schema integration ([Batini et al. 19861; see
DBA for the entire FDBS. Each federa- Section 4.4).
tion DBA in a tightly coupled FDBS is a A typical process of developing federated
specially authorized system administra- schemas in a loosely coupled FDBS is as
tor and is not a federation user. In a follows. Each federation user is the admin-
loosely coupled FDBS, federated schemas istrator of his or her own federated schema.
are defined and maintained by the users, First, a federation user looks at the avail-
not by the system-assigned federation able set of export schemas to determine
DBA. This is further discussed in Sec- which ones describe data he or she would
tion 2.1. like to access. Next, the federation user
defines a federated schema by importing
2. SPECIFIC FEDERATED DATABASE the export schema objects by using a user
SYSTEM ARCHITECTURES interface or an application program or by
defining a multidatabase language query
The architecture of an FDBS is primarily that references export schema objects. The
determined by which schemas are present, user is responsible for understanding the
how they are arranged, and how they are semantics of the objects in the export sche-
constructed. In this section, we begin by mas and resolving the DBMS and semantic
discussing the loosely coupled and tightly heterogeneity. In some cases, component
coupled architectures of our taxonomy in DBMS dictionaries and/or the federated
additional detail. Then we discuss how sev- DBMS dictionary may be consulted for ad-
eral alternate architectures can be derived ditional information. Finally, the federated
from the five-level schema architecture by schema is named and stored under account
inserting additional basic components, re- of the federation user who is its owner. It
moving all basic components of a specific can be referenced or deleted at any time by
type, and arranging the components of the that federation user.
five-level schema architecture in different A typical scenario for the administration
ways. We then discuss assignment of com- of a tightly coupled FDBS is as follows. For
ponents to computers. Finally, we briefly simplicity, we assume single (logical) fed-
discuss four case studies. eration DBA for the entire tightly coupled
FDBS. Export schemas are created by ne-
2.1 Loosely Coupled and Tightly Coupled gotiation between a component DBA and
FDBSs the federation DBA; the component DBA
has authority or control over what is in-
With the background of Section 1, we dis-
cluded in the export schemas. The federa-
cuss distinctions between the loosely cou-
pled and tightly coupled FDBSs in more tion DBA is usually allowed to read the
detail. component schemas to help determine
what data are available and where they are
located and then negotiate for their access.
2.1.1 Creation and Administration of Federated
The federation DBA creates and controls
Schemas
the federated schemas. External schemas
The process of creating a federated schema are created by negotiation between a fed-
takes different forms. In a loosely coupled eration user (or a class of federation users)
FDBS, it typically takes the form of schema and the federation DBA who has the
importation (e.g., defining “import sche- authority over what is included in each
external schema. It may be possible to in- eration DBA [Litwin and Abdellatif
stitute detailed and well-defined negotia- 19861.
tion protocols as well as business rules (or
some types of constraints) for creating, An example of multiple semantics is as
modifying, and maintaining the federated follows. Suppose that there are two export
schemas. schemas, each containing the entity SHOE.
Based on how often the federated sche- The colors of SHOE in one component
mas are created and maintained as well as schema, schemal, are brown, tan, cream,
on their stability, an FDBS may be termed white, and black. The colors of SHOE in
dynamic or static. Properties of a dynamic the other component schema, schema2, are
FDBS are as follows: (a) A federated brown, tan, white, and black. Users defin-
schema can be promptly created and ing different federated schemas may define
dropped; (b) there is no predetermined pro- different mappings that are relevant to
cess for controlling the creation of a feder- their applications. For example,
ated schema. As described above, defining
a federated schema in a loosely coupled l User1 maps cream in his federated sche-
FDBS is like creating a view over the sche- mas to cream in schema1 and tan in
mas of the component DBSs. Since such a schema2,
federated schema may be managed on the l User2 maps cream in her federated
fly (created, changed, dropped easily) by a schema to tan or cream in schema1 and
user, loosely coupled FDBSs are dynamic. tan or white in schema2.
A tightly coupled federation is almost al-
ways static because creating a federated Proponents of the loosely coupled archi-
schema is like database schema integration. tecture argue that a federated schema cre-
A federated schema in a tightly coupled ated and maintained by a single federation
FDBS evolves gradually and in a more con- DBA is utopian and totalitarian in nature
trolled fashion. [Litwin 1987; Rusinkiewicz 19871. We feel
that a loosely coupled approach may be
better suited for integrating a large number
2.1.2 Case for Loosely Coupled FDBS of very autonomous read only databases
A loosely coupled FDBS provides an inter- accessible over communication networks
face to deal with multiple component (e.g., public databases of the types dis-
DBMSs directly. A typical way to formulate cussed by Litwin and Abdellatif [ 19861).
queries is to use a multidatabase language User management of federated schemas
(see Section 5.1). This architecture has the means that the FDBMS can do little to
- following advantages: optimize queries. In most cases, however,
the users are free to use their own under-
l A user can precisely specify relationships standing of the component DBSs to design
and mappings among objects in the ex- a federated schema and to specify queries
port schema. This is desirable when the to achieve good performance.
federation DBA is unable to specify the
mappings in order to integrate data in
multiple databases in a manner meaning- 2.1.3 Case for Tightly Coupled FDBS
ful to the user’s precise needs [Litwin The loosely coupled approach may be ill
and Abdellatif 19861. suited for more traditional business or cor-
l It is also possible to support multiple porate databases, where system control (via
semantics since different users can im- DBAs that represent local and federation
port or integrate export schemas differ- level authories) is desirable, where the users
ently and maintain different mappings are naive and would find it difficult to
from their federated schemas to export perform negotiation and integration them-
schemas. This can be a significant advan- selves, or where location, distribution, and
tage when the needs of the federation replication transparencies are desirable.
users cannot be anticipated by the fed- Furthermore, in our opinion, a loosely
coupled FDBS is not suitable for update grated to develop a single federated schema.
operations. Updating in a loosely coupled Sometimes an organization will insist on
FDBS may degrade data integrity. When a having a single federated schema (also
user of a loosely coupled FDBSs creates called enterprise schema or global concep-
a federated schema using a view definition tual schema) to have a single point of con-
process, view update transformations are trol for all data sharing in the organization
often not determined. The users may not across the component DBS boundaries. Us-
have complete information on the compo- ing a single federated schema helps in de-
nent DBSs and different users may use fining uniform semantics of the data in the
different semantic interpretations of the FDBS. With a single federated schema, it
data managed by the component DBSs (i.e., is also easier to enforce constraints that
loosely coupled FDBSs support multiple cross export schemas (and hence multiple
semantic interpretations). Thus different databases) then when multiple federated
users can define different federated sche- schemas are allowed.
mas over the same component DBSs, and Because one federated schema is created
different transformations may be chosen by integrating all export schemas and be-
for the same updates submitted on different cause this federated schema supports data
federated schemas. Similar problems can requirements of all federation users, it may
occur in a tightly coupled FDBS with mul- become too large and hence difficult to
tiple federations but can be resolved at the create and maintain. In this case, it may
time of federated schema creation through become necessary to support external sche-
schema integration. A federation DBA cre- mas for different federation users.
ating a federated schema using a schema A tightly coupled FDBS with multiple
integration process can be expected to have federations allows the tailoring of the use
more complete knowledge of the compo- of the FDBS with respect to multiple
nent DBSs and other federated schemas. classes of federation users with different
In addition to the update transformation data access requirements. Integrations of
issue, transaction management issues need the same set of schemas can lead to differ-
to be addressed (see Section 5.4). ent integrated schemas if different seman-
A tightly coupled FDBS provides loca- tics are used. Thus this architecture can
tion, replication, and distribution transpar- support multiple semantics, but the seman-
ency. This is accomplished by developing a tics are decided upon by the federation
federated schema that integrates multiple DBAs when defining the federated schemas
export schemas. The transparencies are and their mappings to the export schemas.
managed by the mappings between the fed- A federation user can select from among
erated schema and the export schemas, and multiple alternative mappings by selecting
a federation user can query using a classical from among multiple federated schemas.
query language against the federated When an FDBS allows updates, multiple
schema with an illusion that he or she is semantics could lead to inconsistencies. For
accessing a single system. A loosely coupled this reason, federation DBAs have to be
system usually provides none of these very careful in developing the federated
transparencies. Hence a user of a loosely schemas and their mappings to the export
coupled FDBS has to be sophisticated to schemas. Updates are easier to support in
find appropriate export schemas that can tightly coupled FDBSs where DBAs care-
provide required data and to define map- fully define mappings than in a loosely
pings between his or her federated schema coupled FDBS where the users define the
and export schemas. Lack of adequate se- mappings.
mantics in the component schemas make
this task particularly difficult. Let us now 2.2 Alternative FDBS Architectures
discuss two alternatives for tightly coupled
FDBSs in more detail. In this section, we discuss how processors
In a tightly coupled FDBS with a single and schemas are combined to create various
federation, all export schemas are inte- FDBS architectures.
Integrity Constraints in
External/Federated Schema 1
1 Filtering processor ]
I
Constructing
Processor
than access a stored database. Navathe larger granularity and allocate them as
et al. [1989] discuss a federated architec- desired. For example, DDTS [Dwyer and
ture being developed to provide access Larson 19871 defines two types of modules:
to databases as well as application Application Processor and Data Processor
programs. (Figure 17). An Application Processor in-
cludes a federated schema with the associ-
2.3 Allocating Processors and Schemas to
ated constructing processor and all the
Computers
external schemas defined over the feder-
ated schema with the associated filtering
It is possible to allocate all processors and processors and transforming processors (if
schemas to a single computer, perhaps to present). A Data Processor includes a local
allow federation users to access data man- schema, a component schema, and the as-
aged by multiple component DBSs on that sociated transforming processor and all ex-
computer. Usually, however, different com- port schemas defined over the component
ponent DBSs reside on different computers schema with associated filtering processors.
connected by a communication system. Dif- An Application Processor performs the
ferent allocations of the FDBS components user interface and distributed transaction
result in different FDBS configurations. management and coordination functions
Figure 16 illustrates the configuration of and is located at every site at which there
a typical FDBS. A general-purpose com- are federation users. A Data Processor per-
puter at site 1 supports a single component forms the data management functions re-
DBS and two federation schemas for two lated to the data managed by a single
different classes of federation users. Site 2 component DBS and is located at every site
is a workstation that supports two export at which a component DBS is located. A
schemas, each containing different data for site can have either or both of the two
use by different federation users. Site 3 is modules. Mermaid [Templeton et al.
a small workstation that supports a single 1987b] divides the processors and the sche-
federation user and no component DBS. mas into four types of modules of smaller
Site 4 is a database computer that has one granularity.
component DBS but supports no federation Special communication processors can
users. also be placed on each computer to enable
It may be desirable to group a related set processors on two different sites to com-
of processors and schemas into modules of municate with each other. Communication
FDBS A
I
Component Component
DBS Al DBS An
0)
External
Schema B 1
FDBS B
processors are not shown in our reference federated schema called the Global Repre-
architecture. They are placed between any sentation Schema, which is expressed in
pair of adjacent processors that are allo- the relational data model. It has an external
cated to different computers. schema called the Conceptual Schema rep-
resented in the Entity-Category-Relation-
2.4 Case Studies ship (ECR) model [Elmasri et al. 19851.
Users formulate requests directly against
In this section we relate the terms and
the Conceptual Schema in the GORDAS
concepts of the reference architecture to
query language [Elmasri 19811. The ECR
those used in four example FDBSs. Our
data model is rich in semantics (e.g., it
purpose is not to survey these systems
shows cardinality and operation con-
[Thomas et al. 19901 but to show how the
straints on an entity’s participation in re-
reference architecture can be used to rep-
lationships). The transforming part of the
resent the architectures of various FDBSs
Translation and Integrity Control proces-
uniformly. This uniform representation
sor is responsible for translating requests
can greatly simplify the task of studying
written in GORDAS on the ECR data
and comparing these systems.
model into the internal form of a relational
2.4.1 DOTS
query language against the Global Repre-
sentational Schema. The filtering part
Figure 17 illustrates the original architec- of the Translation and Integrity Control
ture of DDTS [Devor et al. 1982a] using processor is responsible for modifying
the terminology of the reference architec- each query, so when it is processed, the
ture (to the left of each colon) and the constraints specified in the Conceptual
terminology used by DDTS (to the right of Schema will be enforced. For example, a
each colon and in italics). It has a single GORDAS query that deletes a record will
f Federated Schema 3
I I I
Site 4
\
(Export Schema
I
1 TransforTing Proce: 1
I
Application Processor
F(T)
Federated: Global
Representational
Constructing:
-1
-1
I .
a Processor sor
I
Component DBMS 1: Component DBMS n:
IDS.2 Wet works RAM (Relational)
I I
be modified so that it verifies that deleting There are two separate processors in
the record will not violate any semantic DDTS’s constructing processor, reflecting
integrity constraints before the record is the decision to separate distributed query
actually deleted. DDTS has since been ex- optimization from distributed query exe-
tended to support external schemas [Dwyer cution. The Materialization and Access
and Larson 19871 expressed in the rela- Planning component generates a distrib-
tional data model defined over the Global uted execution strategy consisting of sets
Representation Model. SQL is used to of commands, each expressed in terms
query such external schemas. of one of the Local Representational
omponent 1: Component n:
DAPLEXLocal
I DAPLEXLocal
Schema n I
I
Transforming:
LocalOptimizer LocalOptimizer
Transforming:
Translator
I
l*rrn
I
Component
DBMS n:
Relational
I
I
Constructing: Optimizer
and Controller
Local Schema 1
ternal 11 Federated 1:
DSL Queries / External 1
l.**P
I
Constructing: MRDSM
mponent l/ Local 1:
MRDS Schema I MRDS Schema n
I
Component DBMS 1: freon Component DBMS n:
MRDS I MRDS n
I I
database 1
existing applications need not be changed Both migration and extension of file sys-
in an FDBS, as the old applications are tems are expensive. Many developers of the
modified, the component databases may be FDBSs elect to use migration because they
standardized, and redundant data (unless do not have access to the source code of the
required for improving availability or ac- file system and because they want their
cess time) may be removed. New applica- data maintained by a commercially avail-
tions may be coded using an external able DBMS. Since existing application pro-
schema defined on a federated schema. grams are affected, however, this activity is
An FDBS evolves through gradual inte- very difficult and often organizationally
gration of a set of interrelated DBSs. It very sensitive.
evolves as the new component databases The second phase of the FDBS evolution,
are added and the existing ones are modi- developing a federated database system, in-
fied. This evolution process can be divided volves creating the component, export, fed-
into three phases: preintegration, develop- erated, and external schemas, defining the
ing a federated database system, and fed- mappings between various schemas, and
erated database system operation. These implementing the associated processors.
phases (or the activities within the phases) This is a critical task for the success of an
need not follow serially from one phase to FDBS. A methodology that can be used to
the next; each phase may be performed manage this phase is discussed in Section
several times, and previous phases may be 3.1. Tasks performed in this phase are dis-
revisited and their results revised. cussed in Section 4.
The preintegration phase deals with the The third phase of the FDBS evolution,
situation in which data reside in files and federated database system operations, in-
are not managed by any DBMS yet need to volves managing and manipulating multi-
be accessed by federation users. Two gen- ple integrated databases using an FDBMS.
eral approaches are possible: The processors support various run-time
FDBS operations (e.g., query processing
l Migrate the files to a DBMS: Files
and transaction management). The proces-
can be migrated to a DBMS by perform-
sors and the federated schemas developed
ing three activities: (1) develop a com-
or generated in the second phase allow the
ponent schema that describes the data
selective, shared, and consistent access to
in the files, (2) load the DBMS with
data stored in the component DBSs. An
data from the files, and (3) modify exist-
FDBMS provides an interface to the users
ing application programs to access the
and applications and may allow execution
DBMS rather than access the files of ad hoc queries. Tasks performed in this
directly.
phase are discussed in Section 5.
l Extend the file system to support
DBMS-like features: By extending 3.1 Methodology for Developing a Federated
the file system to support DBMS-like Database System
features, a file system can be treated as
a component DBMS. In this approach, The methodology discussed here extends
the following activities are performed: the methodology of Sheth [1988b] for de-
(1) create or generate a component veloping schemas for an FDBS to include
schema that describes data in the files, processors. Developing a new FDBS pri-
(2) create backup and recovery facilities marily consists of integrating existing com-
in the file system if the federation users ponent databases. A bottom-up FDBS
will perform updates to data in the file development process can be followed for
system, and (3) create appropriate filter- this purpose. This process can also be used
ing and transforming processors that will for adding a new component database to an
convert commands expressed in the FDBS.
FDBS’s internal language to the com- When new applications are developed us-
mands that can be processed by the file ing an existing FDBS, it is necessary to
system. determine whether the data requirements
I Data I
’ I
-Schema - -\
’ &J$&z-;/i
lntegratlon
Schema
the FDBS, a federated schema will need (b) The required data are not imple-
to be extended or developed to include mented in any component data-
that part. We refer to this unsupported base, and a component DBA is
part as a temporary schema (which is willing to place the required data
discarded at the end of the integration in his or her component database.
process). One or more of the compo- In this case, local, component, and
nent databases will have to support the export schemas of the relevant da-
temporary schema. This can be accom- tabases are modified.
plished in one of three ways: (c) The required data are not imple-
mented in any component data-
(a) The required data exist in one or base, and no component DBA is
more component database(s). In willing to place the required data
this case, identify the component in his or her component databases.
schemas containing the descrip- In this case, the temporary schema
tion of required data and negotiate is implemented as a separate da-
with their administrators to have a tabase of an existing component
description of this data placed in DBMS. Alternatively, a new com-
an export schema with appropriate ponent DBMS may be used that
access rights. may require a new transforming
Requirements
CollectIon & - -
Physical
-Database
Design
PRODUCT
PRODUCT-NAME COST COMPANY-NAME*
(a) (b)
from CODASYL schemas. Elmasri et al. local schema to a component schema. This
[1985] and Teorey et al. [1986] discuss is because a component DBMS is autono-
transformations between extensions of the mous, and the database managed by it
Entity Relation data model [Chen 19761 should not be changed as a result of the
and the relational data model. Lien [1981] translation process. In other words, the
describes mappings from the hierarchical translation should be reversible in the sense
to the relational data model. Tsichritzis that (a) the component schema in the CDM
and Lochovsky [1982] provide a summary should represent the same database repre-
of these types of mappings. sented by the local schema and (b) it should
In practice, the translation task may re- be possible to translate a command on a
quire more than just data model translation component schema into command(s) on the
because the source and the target schemas corresponding local schema. A notion
may not be able to represent exactly the that may be useful in this context is
same semantics, Hence schema translation that of content-preserving transformations
poses two contradictory requirements: [Rosenthal and Reiner 19871.
(1) Capture additional semantics during the
schema translation that can later help in
4.2 Access Control
the tasks of schema integration and view
update, and (2) maintain only the existing An FDBS should be designed to control
semantics because the local schema is not access to component databases by federa-
able to support the additional semantics. tion users. The system architecture of an
These requirements are discussed next. FDBS (shown in Figure 11) has filtering
Using a semantic data model as the CDM processors at two levels. Each can be used
can facilitate representation of additional to provide access control. The filtering pro-
semantics that may be difficult or impos- cessors relating the export and the compo-
sible to specify in a traditional model such nent schemas control access to component
as the relational model. As an example, a DBSs. The filtering processor relating the
generalization relationship with inherit- external and federated schemas controls
ance between two entity types can be ex- access to the federated schemas.7 Negotia-
plicitly represented in a component schema tion between the component and federation
that uses a semantic data model. As an- DBAs may be necessary to reach an agree-
other example, a foreign key between rela- ment on how to control the data a compo-
tions Rl and R2 in a local schema expressed nent DBA wants to keep secure from some
in the relational model may be explicitly of the federation users while allowing ac-
represented as a relationship between the cess to other federation users. Alternately,
two entities representing Rl and R2 in the a new federated schema can be established
corresponding component schema in an
EER model. One, however, should be care- ‘This is similar to using the view mechanism for
ful about the additional semantic informa- access control security in centralized and distributed
tion provided during the translation from a DBMSs [Bertino and Haas 19881.
for use by a selected subgroup of the origi- maid maintains an access file on each com-
nal federation. For example, suppose com- puter for this purpose. Access files control
ponent databases each contain information access to Mermaid’s data dictionary and
about commercial shipping and military directory. Third, Mermaid maintains an
shipping. The following approaches are access list that identifies the external and
possible: federated schemas each user is authorized
to use. Fourth, the user must be registered
Each component DBA includes schema with each computer having a component
objects describing both commercial and DBS that is relevant to the user’s federated
military shipping in their respective ex- schema. In addition, Mermaid provides ex-
port schemas. This information is inte- tensive run-time support for access control,
grated into a single federated schema. uses data encryption (for passing access
The federation DBA creates two external control information over the network and
schemas, one for accessing commercial for storing it in access files and access lists),
shipping information and one for access- and provides an audit trail and a journal
ing military shipping. In this scenario,
trail for security purposes.
the local DBAs trust the federation DBA One of the heterogeneities that may be
to provide appropriate access controls on encountered in an FDBS is the existence
each external schema.
of different and incompatible mechanisms
Each component DBA generates two ex- for expressing and enforcing access control
port schemas, one containing the schema policies. An access control mechanism used
objects describing commercial shipping by an FDBMS, such as the view mecha-
and the other containing schema objects nism, may also conflict with the autonomy
describing military shipping. Two feder- of the component DBMSs. A solution that
ated schemas are created, one dealing uses preprocessing is discussed by Wang
with commercial shipping and the other and Spooner [1987].
dealing with military shipping. In this
scenario, the component DBAs control 4.3 Negotiation
appropriate access on each export
schema and hence is preferred to the A federation DBA manages federated sche-
previous one. mas. A component DBA manages the ex-
port schemas defined over the component
Other access control and security issues DBS he or she manages. A federation DBA
include the following: and component DBAs must reach an agree-
How users are identified, grouped into ment about the contents of the export sche-
classes, and named for access control se- mas and operations allowed on the export
curity purposes. Templeton et al. [1987a] schemas such that federated schemas can
discuss the issue of installing a new user. be defined over them to support federation
How data are identified, grouped into users. The dialogue between the two admin-
classes, and named for access control se- istrators to reach this agreement is
curity purposes [Abbott and McCarthy called negotiation. To facilitate negotia-
1988; Templeton et al. 1987b]. tion, the administrators follow protocols
governing the messages exchanged during
What operations are used for controlling
a negotiation.
security privileges [Fagin 1978; Larson
In terms of the reference schema archi-
1983b; Selinger and Wade 19761.
tecture, there are two ways to perform ne-
A case study of the access control fea- gotiation. First, a component DBA may
tures of Mermaid [Templeton et al. 1987a] allow the federation DBA to read the com-
is interesting. It uses access control at four ponent schemas he or she controls but does
levels. First, a user must have an account not give any specific rights for data access.
on the computer where the Mermaid user When a federation DBA determines data
interface can be run. Second, the user access requirements, he or she sends a re-
should be registered with Mermaid. Mer- quest to the component DBA to define an
export schema with appropriate access ing export schemas in a bottom-up FDBS
rights. This request typically includes in- development process). The tasks are quite
formation on the data to be accessed, the similar and are treated uniformly as
types of access (retrieval or update), and schema integration in this paper.
the federation users who will access the Many approaches and techniques for
data. In some cases, it may also include schema integration have been reported in
requirements on the component DBS such the literature. The survey paper of Batini
as constraints to be satisfied (e.g., how cur- et al. [1986] discusses and compares 12
rent or consistent the data must be and methodologies for schema integration. It
maximum response time), and whether the divides schema integration activities into
federation DBA or user may grant others five steps: preintegration, comparison, con-
some or all of the access rights for this formation, merging, and restructuring. In
export schema [Larson 1983131. The com- the context of FDBS development, prein-
ponent DBA may accept or refuse any part tegration activities involve translation of
of the request for any reason in defining schemas into a CDM so they can be com-
the export schemas. In the second alterna- pared and specification of global con-
tive, all component DBAs may define the straints and naming conventions. The
export schemas a priori and require latter involves activities that may be useful
the access from the FDBMS to be limited in the comparison step (e.g., specifying a
to the contents and access rights contained thesaurus that may be used for identifying
in those export schemas. naming conflicts, homonyms, or syno-
Two of the many aspects of FDBS oper- nyms). Although schema translation has
ations for which negotiation protocols need been studied independently from schema
to be defined are the following: integration, the two tasks are highly inter-
related. Schema translation may be more
l A component DBA decides to withdraw
constrained when integrating existing
access to a schema object or change a
DBSs than during view integration because
schema object in a local schema.
the constraints of the existing data struc-
l A federation user decides that access to a tures cannot be changed.
schema object is no longer needed. The comparison step (also called schema
Heimbigner and McLeod [1985] present analysis) involves two activities: (a) analyz-
negotiation protocols for a multisited dis- ing and comparing the objects of the
tributed dialogue among the DBAs in an schemas (and possibly databases) to be in-
FDBS. Alonso and Barbara [1989] consider tegrated, including identification of naming
the case in which the federated schema is conflicts (e.g., homonym and synonym de-
a materialized view (they call it quasi- tection), domain (i.e., value type) conflicts,
copy). In this context, they explore ways to structural differences, constraint differ-
express the needs of a federation DBA pre- ences, and missing data and (b) specifying
cisely, to determine the degree of sharing the interrelationships among the schema
the component DBA is willing to offer, and objects. The conforming step is closely tied
to estimate the cost of a specific agreement to the comparing step since it is difficult to
for both the component DBS and the compare unless the related information is
FDBS. represented in a similar form in different
schemas.
4.4 Schema integration
Unless the schemas are represented in
the same model, analyzing and comparing
View integration refers to integrating mul- their schema objects is extremely difficult.
tiple user views into a single schema (e.g., It is important to note that comparison of
federated schema development in a top- the schema objects is primarily guided by
down FDBS development process). Schema their semantics. not bv their svntax. Hence
integration refers to integrating (usually ex- the choice of the CDM is critical. The CDM
isting) schemas into a single schema (e.g., should be semantically rich; that is, it
federated schema development by integrat- should provide abstraction and constraints
the query language used by the FDBS FDBMS support different concurrency
and the query language provided by a control mechanisms [Gligor and Popescu-
component DBMS may involve different Zeletin 19861. The problem of global dead-
techniques. lock detection must also be addressed
l Physical database organization of each of [Logar and Sheth 19861.
the component DBSs may be different. Several researchers are currently study-
Similarly, criteria for access path selec- ing these problems [Alonso et al. 1987;
tion for different component DBMSs Breitbart and Silberschatz 1988; Elmagar-
may be different. Due to the autonomy mid and Helal 1988; Pu 19871. As argued
of the component DBMS, however, the by Barker and Ozsu [1988] and Du et al.
GDM or an LDI may not have enough [ 19891, however, the published solutions
information on these matters. often make unrealistic and pessimistic as-
sumptions, support a low level of concur-
Onuegbe et al. [1983] and Dayal and rency, and/or sacrifice autonomy in order
Goodman [1982] develop strategies for to obtain higher concurrency.
local query optimization in DDTS and It is unlikely that a theoretically elegant
Multibase, respectively. solution that provides conflict serializabil-
ity without sacrificing performance (i.e.,
5.4 Global Transaction Management
concurrency and/or response time) and
availability exists. One often accepted
The Global Transaction Manager (GTM) trade-off is to limit the functionality of
is responsible for maintaining database concurrency control in favor of preserving
consistency while allowing concurrent up- site autonomy. Examples of this would be
dates across multiple databases. Support- to allow only unsynchronized retrievals,
ing global transaction management in an preclude multisite updates, or perform local
environment with multiple heterogeneous updates off-line. Another approach is to
and autonomous component DBSs is very devise mechanisms that specifically suit the
difficult. This is underlined by the fact that limitations of a given environment and
none of the prototype FDBMSs support provide the required level of consistency.
global transaction management. Nowhere Eliassen and Veijalainen [1987] propose a
does the autonomy of component DBMSs concept of S-Transactions (for semantic
present more problems than in supporting transactions) suited for a banking environ-
updates. ment consisting of a network of highly au-
There are two types of transactions to be tonomous systems. It may be desirable to
managed: global transactions submitted to devise solutions that do not meet the con-
the FDBMS by federation users and local flict serializability criteria but that are
transactions directly submitted to a com- practical and meet a desired level of consis-
ponent DBMS by local users. The basic tency. Du and Elmagarmid [1989] propose
problem in supporting global concurrency a weaker consistency criterion called Quasi-
control is that the FDBMS does not know Serializability8 that works provided there
about local transactions since a component are no value dependencies (e.g., referential
DBMS is autonomous. That is, local wait- integrity constraints) across databases.
for relationships are known only to the Garcia-Molina and Salem [1987] and
transaction manager of the component Alonso et al. [1987] propose a concept of
DBMS. Without knowledge about local as Sagas that provides semantic atomicity but
well as global transactions, it is highly un- does not serialize execution of global trans-
likely that efficient global concurrency con- actions. More work on weaker consistency
trol can be provided. Due to the existence
of local transactions, it is very difficult to
recognize when the execution order differs ‘A Quasi-Serializable schedule is one in which the
local transaction schedules are serializable and the
from the serialization order at any site [Du global transactions are executed serially. We need to
et al. 19891. Additional complications occur understand the practical significance of this and other
when different component DBMSs and the proposed weaker consistency criteria better.
criteria and a better understanding of their structured data, often called business data.
significance in practical terms is needed. Recently, however, there has been a great
Techniques to specify and execute the deal of activity for constructing DBMSs
transactions that selectively provide ato- that manage data for the so-called nontra-
micity, isolation, and durability properties ditional applications. These applications
need to be further researched. Little atten- require less structured data in the forms
tion has been given to the problems of fault such as text, image, graphics, and voice.
tolerance, ensuring the integrity of redun- Current database research activities, par-
dant data, and supporting integrity con- ticularly related to object-oriented database
straints across component databases. systems, address centralized management
of these types of data. We now need to
investigate issues in integrating such sys-
6. FUTURE RESEARCH AND UNSOLVED
PROBLEMS
tems [Sheth 1987b, 1988131.
The second significant extension is to
We discussed the concept of federation in create information systems that not only
the context of database systems. A feder- include database systems but also applica-
ated database system is a collection of co- tion programs and expert systems. Such a
operating but autonomous and possibly system may be called a federated knowledge
heterogeneous database systems. A refer- base system. One of the main problems is
ence architecture was used to study various to represent information contents, process-
FDBS architectural alternatives and their ing capabilities, and semantics (including
implications. A methodology for developing behavioral aspects) of programs and expert
FDBSs, particularly the tightly coupled systems adequately. We need to define a
FDBSs, was discussed. Finally, we dis- description that plays a role with respect to
cussed important tasks that need to be an application program that is equivalent
performed in order to develop and operate to the role a schema plays for a database.
FDBSs. One such effort is that of a “capability
Several problems need further research schema” for an application program
and development. They include the [Ryan and Larson 19861. Models for fed-
following: eration also need to be developed for such
environments.
Identifying and representing all seman-
tics useful in performing various FDBS ACKNOWLEDGMENTS
tasks such as schema translation and
schema integration and determining con- We thank the referees, S. March, M. Rusinkiewicz,
tents of schemas at various levels. G. Thomas, and M. Templeton, who have helped us
toward the current version; their time and care is
Lack of software tools to aid in perform- gratefully acknowledged.
ing various FDBS tasks with a high de-
gree of automation and an integrated
REFERENCES
toolset for developing, maintaining, and
managing FDBSs. ABBOTT, K., AND MCCARTHY, D. 1988. Admi-
Lack of adequate transaction manage- nistration and autonomy in a replication-
transuarent distributed DBMS. In Proceedinm of
ment algorithms that provide a specified the 14th International Conference on Very L&g;
level of consistency (i.e., are correct with Data Bases (Aug.), pp. 195-205.
respect to a given consistency criteria) ALONSO, R., AND BARBARA, D. 1989. Negotiating
and fault tolerance at acceptable perfor- data access in federated database systems. In
mance within the heterogeneity and au- Proceedings of the 5th International Conference
on Data Engineering (Feb.), pp. 56-65.
tonomy constraints of an FDBS.
ALONSO, R., GARCIA-M• LINA, H., AND SALEM, K.
How to address management and effi- 1987. Concurrency control and recovery for
ciency issues related to autonomy of com- global procedures in federated database systems.
ponent DBSs. In 9. Bull. IEEE-CS TC Data Em.- 10. 3
(Sept.), 5-11.
The focus of the past activities in FDBSs BARKER, K., AND Ozsu, T. 1988. A survey of issues
has been on databases that store more in distributed heterogeneous database systems.
Tech. Rep. TR 88-9, Univ. of Alberta Edmonton, 243, Goos, G. and Hartmanis, J. Eds. Springer-
Canada. Verlag, New York, pp. 141-156.
BATINI, C., LENZERINI, M., AND NAVATHE, S. CZEJDO, B., RUSINKIEWICZ, M., AND EMBLEY, D.
1986. A comparative analysis of methodologies 1987. An approach to schema integration and
for database schema integration. ACM Comput. query formulation in federated database systems.
Suru. 18, 4 (Dec.), 323-364. In Proceedings of the 3rd International Conference
BECK, H., GALA, S., AND NAVATHE, S. 1989. on Data Engineering (Feb.), pp. 477-484.
Classification as a query processing technique in DATE, C. 1986. An Introduction to Database Systems,
CANDIDE semantic data model. In Proceedings Vol. 1, 4th ed. Addison-Wesley, Reading, Mass.
of the 5th International Conference on Data En- DAYAL, U., AND GOODMAN, N. 1982. Query optimi-
gineering (Feb.), pp. 572-581. zation for CODASYL database systems. In Pro-
BELCASTRO, V., ET AL. 1988. An overview of the ceedings of the ACM SZGMOD Conference,
distributed query system D&S. In Proceedings of pp. 13&156.
the International Conference on Extending Data DAYAL, U., AND HWANG, H. 1984. View definition
Base Technolozy (Venice, Italv. Mar.). In Com- and generalization for database integration in a
puter Science.--Vol. 303,’ Sp&ger-V&lag, New multidatabase system. IEEE Trans. Soft. Eng.
York, pp. 170-189. SE-IO, 6 (Nov.), 628-644.
BERTINO, E., AND HAAS, L. 1988. Views and security DE 1987. Special issue on federated database systems
in distributed database management systems. In (mainly transaction management aspects).
Proceedings of the International Conference on Q. Bull. IEEE-CS TC Data Eng. 10, 3 (Sept.).
Extending Database Technology (Venice, Italy, DEMO, B. 1983. Program analysis for conversion
Mar.). In Computer Science. Vol. 303, Springer- from a navigational to a specification data base
Verlag, New York, pp. 155-169. interface. In Proceedings of the 9th International
BLAKEY, M. 1987. Basis of a partially informed dis- Conference on Very Large Data Bases (Florence,
tributed database. In Proceedings of the 13th In- Italy, Oct.), pp. 387-398.
ternational Conference on Very Large Data Bases DEVOR, C., ELMASRI, R., LARSON, J., RAHIMI, S., AND
(Brighton, UK, Sept.), pp. 381-388. RICHARDSON, J. 1982b. Five-schema architec-
BREITBART, Y ., AND SILBERSCHATZ, A. 1988. ture extends DBMS to distributed applications.
Multidatabase update issues. In Proceedings of Electron. Des. (Mar. 18), 27-32.
the ACM SZGMOD Conference (June), 135-142. DEVOR, C., ELMASRI, R., AND RAHIMI, S. 1982a. The
BRZEZINSKI, Z., GETTA, J., RYBNIK, J., AND STEP- design of DDTS: A testbed for reliable distributed
NIEWSKI, W. 1984. UNIBASE-An integrated database management. Tech. Rep. HP-82-273:
access to databases. In Proceedings of the 10th 17-38, Honeywell Computer Sciences Center,
International Conference on Very Large Data Camden, Minn.
Bases (Singapore, Aug.), pp. 388-395. DP 1988. Special issue on heterogeneous distributed
CARDENAS, A. 1987. Heterogeneous distributed da- database systems. L. Lilien, Ed. D&rib. Process.
tabase management: The HD-DBMS. In Proc. Tech. Comm. News. (Q.) 10, 2 (Nov.).
ZEEE 75,5 (May), 588-600. Du, W., AND ELMAGARMID, A. 1989. Quasi serializ-
CERCONE, N., MORGESTERN, M., SHETH, A., AND ability: A correctness criterion for global concur-
LITWIN, W. 1990. Resolving semantic hetero- rency control in interbase. In Proceedings of the
geneity. Panel at the International Conference on 15th International Conference on Very Large Data
Data Engineering (Feb.). Bases (Amsterdam, Aug.), pp. 347-355.
CERI, S., AND PELAGATTI, G. 1984. Distributed Du, W., ELMAGARMID, A., AND KIM, W. 1990.
Databases-Principles and Systems. McGraw- Effects of local autonomy on heterogeneous dis-
Hill, New York. tributed database systems. MCC Tech. Rep.
CERI, S., PERNICI, B., AND WIEDERHOLD, G. 1987. ACT-OODS-EI-059-90, Microelectronics and
Distributed database design methodologies. In Computer Technology Corp., Austin Tex.
Proc. IEEE 75, 5 (May), 533-546. Du, W., ELMAGARMID, A., LEU, Y., AND OSTERMANN,
CHEN, P. 1976. The entity-relationship model: S. 1989. Effects of local autonomy on global
Toward a unified view of data. ACM Trans. Da- concurrency control in heterogeneous database
tabase Syst. 1, 1 (Mar.), 9-36. systems. In Proceedings of the 2nd International
CHEN, A., BRILL, D., TEMPLETON, M., AND Yu, C. Conference on Data and Knowledge Systems for
1989. Distributed query processing in Mermaid: Manufacturing and Engineering (Oct.).
A frontend system for multiple databases. IEEE DWYER, P., AND LARSON, J. 1987. Some experiences
J. Selected Areas Commun. 7, 3 (Apr.), 390-398. with a distributed database testbed system. In
CHU, W., AND HURLEY, P. 1982. Optimal query pro- Proc. IEEE 75, 5 (May), 633-647.
cessing for distributed databases systems. IEEE ELIASSEN, F. AND VEIJALAINEN, J. 1987. An S-
Trans. Comput. C-31 (Sept.), 835-850. transaction definition language and execution
CONVENT, B. 1986. Unsolvable problems related to mechanism. Tech. Rep. No. 275, GMD, Harden-
the view integration approach. In Proceedings of bergplatz, D-1000 Berlin 12, FRG.
the Znternational Conference on Database Theory ELIASSEN, F., AND VEIJALAINEN, J. 1988. A func-
(Rome, Italy, Sept.). In Computer Science, Vol. tional approach to information system interoper-
PELAGATTI, G., PAOLINI, P., AND BRACCHI, G. Meitner Institut, Berlin GmnH, D-1000 Berlin
1978. Mapping external views to common data 39, FRG.
model. Znf Syst. 3. ZANIOLO, C. 1979. Multimode1 external schemas for
RAJINIKANTH, M., JACOBSON, G., LAFOND, C., PAPP, CODASYL data base management systems. In
w., AND PIATETSKY-SHAPIRO, G. 1990. Data Base Architecture, Bracchi, G., and Nijssen,
Multiple database integration in CALIDA: De- G., Eds. North-Holland, The Netherlands.
sign and integration. In Proceedings of the 1st
International Conference on Systems Integration
(Apr.).
REINER, D., BROWN, G., FRIEDELL, M., LEHMAN, J., GLOSSARY
MCKEE, R., RHEINGANS, P., AND ROSENTHAL,
A. 1987. A Database designer’s workbench. In
Entity-Relationship Approach, Spaccapietra, S.,
Accessing Processor: Software that ac-
Ed. Elsevier Science Publishers. New York. cepts commands and produces data by
pp. 347-360. executing commands against a database.
SENKO, M., 1976. DIAM as a detailed example of Class of Users: A set of users performing
the ANSI SPARC architecture. In Modelline in closely related tasks.
Data Base Management Systems, Nijssen, G.,-Ed. Common Data Model (CDM): A data
North-Holland, The Netherlands.
model to which schemas of different com-
SHIPMAN, D. 1981. The functional data model data
language DAPLEX. ACM Trans. Database Svst. ponent DBMSs are translated for the
6, l-(Mar.), 140-173. purpose of representation in a com-
SMITH, J., et al. 1981. Multibase: Integrating heter- mon format and facilitation of schema
ogeneous distributed database systems. In Pro- integration.
ceedings of the National Computer Conference. Component Database Administrator
SPACCAPIETRA, S., DEMO, B., DILEVA, A., PARENT, (component DBA): The administrator
C., CELLIS, C., AND BELFAR, K. 1982. An ap- of a component DBS (also called local
proach to effective heterogeneous database co-
operation. In Distributed Data Sharing Systems, DBA) who, among other things, decides
van de Riet. R., and Litwin. W.. Eds. North- data access rights of local users as well
Holland, The Netherlands, pp: 209-218. as (individuals and/or classes) of federa-
STEPHENSON, G., AND MAIN, R. 1986. ARCHEDDA tion uses. It is the component DBA’s
Prototype. Final Report. CRI, Great Britain. responsibility to manage the local
STOCKER, P., et al. 1984. Proteus: A heterogeneous schema, translate it to create the com-
distributed data-base project. In Databases: Role
and Structure, Gray, P., and Atkinson, M., Eds. ponent schemas, and define export
Cambridge University Press, New York. schemas.
TEMPLETON, M., BRILL, D., CHEN, A., DAO, S., AND Component DBMS: A DBMS participat-
LUND, E. 1986. Mermaid: Experiences with net- ing in a multidatabase system. A com-
work operation. In Proceedings of the 2nd Znter- ponent DBMS participating in an FDBS
national Conference on Data Engineering.
is autonomous and allows local opera-
TEMPLETON, M., BRILL, D., HWANG, A., KAMENY, I.,
AND LUND, E. 1983. An overview of the mer-
tions as well as global (federation) oper-
maid system. A frontend to heterogeneous data- ations that meet its constraints.
bases. In Proceedings of EASCON 83. Component Schema: A translation of a
THOMSON, G. 1987. Multidatabase concurrency local schema into an equivalent schema
control. Ph.D. dissertation, Oklahoma State Uni- in the common data model.
versity. Constructing Processor: Software that
TSUBAKI, M., AND HOTAKA, R. 1980. Distributed partitions and/or replicates operations
multidatabase environment with a supervisory
data dictionary database. In Entity-Relationship produced by a single processor for exe-
Approach to System Analysis and Design, Chen cution by two or more processors. Also
P., Ed. North-Holland, The Netherlands. software that merges data produced by
VASSILIOU, Y., AND LOCHOVSKY, F. 1980. DBMS two or more processors for consumption
transaction translation. In Proceedings COMP- by another processor.
SAC 80, IEEE Computer Software and Applica-
tion Conference.
Database Management System
VEIJALAINEN, J., AND POPESCU-ZELETIN, R. 1986.
(DBMS): Software that manages a col-
On multi-database transactions in a cooperative, lection of structured data. Management
autonomous environment. Tech. Rep., Hahn- includes providing data management
services including data access, constraint or data that can be passed to another
enforcement, and consistency manage- processor.
ment. Global (or External) Operation: An op-
Database System (DBS): A database eration that is not submitted by a local
system (DBS) consists of a DBMS that user (e.g., an operation submitted to the
manages one or more databases. component DBS by the FDBMS or an-
Distributed DBMS: A system that man- other component DBS).
ages multiple databases. Local Schema: A schema of a component
Export Schema: A subschema of a com- DBS in a native data model of the com-
ponent schema specifically defined to ex- ponent DBMS.
press the access constraints on a class of Local User: A user of a component DBS.
federation users to access the data rep- Multidatabase System (MDBS): A sys-
resented in the component schema. tem that allows operations on multiple
External Schema: A subschema or a databases.
view defined over a federated schema Processor: Software that performs oper-
primarily for a pragmatic reason of not ations on data and commands.
having to define too many federated Schema Object: A description of a data
schemas or to tailor a federated schema element in a schema. For example, a
for smaller groups of federation users schema expressed in the Entity Relation-
than that of a federated schema. ship model has three types of schema
Federated Database System (FDBS): objects: entity types, relationship types,
A system that is created to provide op- and attribute definitions. An example of
erations on databases managed by auton- an entity type schema object is an “Em-
omous, and possibly heterogeneous, ployee” entity type.
DBSs. The software that manages an Schema, Subschema, and View: A rep-
FDBS is called a federated DBMS resentation of the structure (syntax), se-
(FDBMS). mantics, and constraints on the use of a
Federated Schema: An integration of database (or its portion) in a particular
several export schemas created to repre- data model. A schema is a collection of
sent data access requirements and rights schema objects. A subschema is a collec-
of a class or group of federation users. tion of subsets of that schema’s objects.
Federation Administrator (federation A view is any connected portion of a
DBA): The administrator of a federated schema. In other words, a schema is a
schema or the federated database sys- collection of views.
tem who, among other things, creates Transforming Processor: Software that
and maintains federated and external translates commands from one language
schemas. or format to another language or format
Federation User: A user of an FDBS or or translates data from one format to
an application running over an FDBS. another.
Filtering Processor: Software that con- User: An individual or an application us-
strains operations that can be applied ing a database system.