The Entity-Relationship Approach II: Methods
8.1 Introduction
There is no single method for ER modeling, but there are many heuristics proposed by
many different researchers. In this chapter, we gather a number of these heuristics and
group them into two classes: heuristics for finding an ER model (section 8.2) and heuristics
for evaluating an ER model (section 8.3). In section 8.4, we show how an ER model of a
database system can be transformed into a relational database schema. ER models can be
made of a computer-based system or of its UoD. Because the first can only be found by
way of the second, we restrict ourselves to finding an ER model of the UoD of a system.
Section 8.5 contains a methodological analysis of the heuristics discussed in this chapter.
discuss the first of the two methods above. Pointers to the literature on the other methods
are given in section 8.8.
(2) A 40-year old person works on a project with project number 2175 for 20% of his time
leads to the ER diagram in figure 8.4. We used some background knowledge that we share
with whoever produced sentence (2): we know that persons have a birth date and that
the age of a person is the number of years that have passed since the birth year. Similarly, some
shared background knowledge about percentages and people's ages was used to define the
codomains of age and percentage_of_time. This shared background knowledge is part of
an implicit conceptual model of the UoD shared with the people who produced sentence
(2).
Using our background knowledge, we also assumed that all existing projects are identified
by their project number. This is represented in the diagram by the cardinality constraint
{0,1} of the project_number attribute. We have not attached a cardinality constraint to an
attribute before, but the meaning of this is clear: each natural number is a project number
of at most one existing PROJECT instance. We must verify with the domain specialists
whether this is true, and whether project numbers of past projects can be reused.
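The {0,1} constraint on project_number can be checked mechanically against a sample population before it is discussed with the domain specialists. The following is a minimal sketch, assuming that PROJECT instances are represented as Python dictionaries; the function name, the identifiers p1 to p3 and the sample data are hypothetical illustrations, not part of the model in figure 8.4.

```python
from collections import Counter

def numbers_used_more_than_once(projects):
    """Return the project numbers carried by more than one existing PROJECT instance."""
    counts = Counter(p["project_number"] for p in projects)
    return sorted(number for number, c in counts.items() if c > 1)

# Hypothetical population of existing projects.
existing_projects = [
    {"id": "p1", "project_number": 2175},
    {"id": "p2", "project_number": 3001},
    {"id": "p3", "project_number": 2175},  # reuses 2175, so {0,1} is violated
]

violations = numbers_used_more_than_once(existing_projects)
```

An empty result means that the sample is consistent with the constraint; it does not show that the constraint holds in the UoD, which remains a question for the domain specialists.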
An advantage of natural language analysis is that it is simple. A disadvantage is that
it may lead to many irrelevant ER entity types and relationships. For example, sentence
(3) may very well occur in a statement of information needs or in some other document
describing the UoD:
(3) The database system shall register members of the library.
Applying natural language analysis, we find the following elementary sentences:
The database system registers members.
The library has members.
This gives the diagram shown in figure 8.5. Obviously, this represents irrelevant structures
for almost any database system that is to be used by the library. The library has no
database system that contains data about which members it registered in which database
system, or about which library a person is a member of. Only if we wanted to make, say, a
national database system that registers data about all libraries in the country, the database
systems they use, and the library members registered in those database systems, would this
diagram be relevant.
As a value, an address does not have a state. It could be modeled as a value of type
STRING, containing a zip code, street name etc. as substrings.
The difference between the two representations is that as an ER entity, one address could
have had other values for its attributes zip_code, street_name, etc. If a government were to
assign new zip codes to addresses, then this can be modeled without modeling at the same
time that everybody is changing address. It is merely the states of addresses that would
change, not the addresses themselves. This would not be possible if address were modeled
as a value. Changing the zip code of an address is then strictly not possible: we can only
replace one STRING representing an address by another that differs from the first one
only in its zip code.
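The difference between the two representations can be sketched in code. The following is an illustration only, assuming that a Python string stands for the STRING value and that an object of a hypothetical class Address stands for the ER entity; the address data are invented.

```python
from dataclasses import dataclass

# Address as a value: "changing" the zip code produces a different string.
# The old value is untouched and nothing persists across the replacement.
old_value = "1234 AB, Main Street 5"
new_value = old_value.replace("1234 AB", "5678 CD")

@dataclass
class Address:
    """Address as an ER entity (hypothetical class): the object is the
    identity, the attribute values are its state."""
    zip_code: str
    street_name: str

home = Address("1234 AB", "Main Street")
address_of_resident = home   # a resident refers to the address entity itself
home.zip_code = "5678 CD"    # the state changes, the identity does not
```

After the assignment, address_of_resident still refers to the same address and sees the new zip code, so no resident has moved. In the value representation, every reference to old_value would have to be replaced by new_value.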
Another way to say the same thing is that ER entities have an identity. Saying that
something has a state that could have been different, is tantamount to saying that there is
something that remains identical to itself under changes of state. That is, ER entities have
an identity that remains invariant under state change.
Taking yet another view on the same difference, ER entities can initiate or suffer events
(which may change the state of the entity), but values cannot. This means that ER entities
can interact with other ER entities. An address modeled as an ER entity will be able to
participate in an interaction like observe_zip_code with an observer who reads the zip code,
and it can engage in interactions like change_zip_code and change_street_name, in which
the zip code or street name is changed. Since interaction is regarded in this book as the
same thing as observation, we can also say that ER entities can be observed and values
cannot, which is their definition to begin with.
But when ER entities have a state, they potentially have a history. An address modeled
as an ER entity can suffer the event change_zip_code but an address modeled as a STRING
cannot. If we model an address as an ER entity, then we can model its history as the trace
of events which occurred in its life. This difference is illustrated in figure 8.15, using a
salary attribute as example. In one representation, salary is an attribute that can have
different values at different times. The history of these values is not represented. In the
second representation, salary is an entity type, because it has a history. We can represent
the salary the employee had in different periods of time. Note, incidentally, that MONEY
is a value type in both representations.
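The two salary representations can be sketched as follows. This is a hedged illustration rather than a rendering of figure 8.15: the class names, the change_salary method and the use of integers for MONEY values are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class EmployeeWithAttribute:
    """Salary as an attribute: only the current MONEY value is kept."""
    name: str
    salary: int

@dataclass
class Salary:
    """Salary as an entity: each instance records one period."""
    amount: int                 # MONEY remains a value in both representations
    start: date
    end: Optional[date] = None  # None while the salary is still current

@dataclass
class EmployeeWithHistory:
    name: str
    salaries: List[Salary] = field(default_factory=list)

    def change_salary(self, amount: int, on: date) -> None:
        """Close the current salary period, if any, and open a new one."""
        if self.salaries:
            self.salaries[-1].end = on
        self.salaries.append(Salary(amount, on))

first = EmployeeWithAttribute("hypothetical employee", 3000)
first.salary = 3500             # the earlier value is overwritten

second = EmployeeWithHistory("hypothetical employee")
second.change_salary(3000, date(2020, 1, 1))
second.change_salary(3500, date(2021, 1, 1))
```

In the first representation only the value 3500 can be retrieved afterwards; in the second, both periods remain available.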
8.3. METHODS TO EVALUATE AN ER MODEL
contains both copies of this instance at the same time, for it is a set. That is, the
set {⟨borrower: m, document: d⟩, ⟨borrower: m, document: d⟩} is equal to the set
{⟨borrower: m, document: d⟩}, even if both links existed at different times (and therefore
have a different borrowing_date attribute). We cannot therefore represent the history of
the link in an ER diagram.
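The collapse of repeated links can be reproduced with an ordinary set. The sketch below assumes that a link is represented as a tuple of labeled components and that attribute values are stored in a dictionary keyed by the link; the names and dates are hypothetical.

```python
from datetime import date

m, d = "member_m", "document_d"

# A link is identified by its components only.
first_loan = (("borrower", m), ("document", d))
second_loan = (("borrower", m), ("document", d))

# The existence set is a set, so the two occurrences coincide.
existence_set = {first_loan, second_loan}

# Attributes are attached to the link, so a later value replaces the earlier one.
borrowing_date = {}
borrowing_date[first_loan] = date(2020, 1, 1)
borrowing_date[second_loan] = date(2020, 6, 1)
```

The existence set contains one element and only the later borrowing date survives, which is why the history of the link is lost.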
One way to be able to represent the history of an instance is to model it as an entity
type, as shown in the lower diagram. In this case, a LOAN instance has an identity that
is independent from its state and different from all other LOAN instance identities. The
elements of an existence set of LOAN are identifiers l1, l2, ..., some of which may have
the same borrower and document attribute values. Those with the same borrower and
document attribute values are historical occurrences of the same link. This representation
has, however, the disadvantage that we can now have two different instances of LOAN (with
different identifiers) with the same borrowing_date. This situation is correctly excluded by
the relationship representation. In addition, we cannot count the number of existing loans
by counting the members of the existence set of LOAN, because some members of this
existence set may represent loan links that existed in the past. The identity information of
different occurrences of one borrowing link is therefore not properly represented.
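Both disadvantages of the entity representation can be made visible in a small sketch. It assumes that LOAN identifiers are generated by a counter and that LOAN instances are immutable records; the class, the helper function and the data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from itertools import count

_next_identifier = count(1)  # hypothetical source of identifiers l1, l2, ...

@dataclass(frozen=True)
class Loan:
    identifier: int
    borrower: str
    document: str
    borrowing_date: date

def new_loan(borrower: str, document: str, borrowing_date: date) -> Loan:
    """Create a LOAN instance with a fresh identifier."""
    return Loan(next(_next_identifier), borrower, document, borrowing_date)

existence_set = {
    new_loan("m", "d", date(2020, 1, 1)),  # a past occurrence of the link
    new_loan("m", "d", date(2020, 6, 1)),  # a later occurrence of the same link
    new_loan("m", "d", date(2020, 6, 1)),  # unwanted duplicate, not excluded
}

distinct_links = {(l.borrower, l.document) for l in existence_set}
distinct_occurrences = {
    (l.borrower, l.document, l.borrowing_date) for l in existence_set
}
```

The existence set has three members, although there is one link between m and d and only two distinct occurrences of it. Counting members of the existence set therefore does not count existing loans.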
Another possible way to represent a historical link is to represent LOAN as a ternary
relationship between MEMBER, DOCUMENT and DATE. The borrowing_date of the
LOAN instance is now a component rather than an attribute. This means that the identifier
of LOAN instances has the form
Weak entities
In many texts on the Entity-Relationship approach the concept of a weak entity is used.
Roughly speaking, this is an entity that must be identified by a key of another entity concatenated with some of its own attributes. The archetypical example is shown in figure 8.19.
Each employee has key emp# and each employee has zero or more children, who are identified, per employee, by an attribute dependent#. If we want to identify a child in the
existence set of CHILD, we must therefore concatenate the value of emp# with the value
of dependent#.
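The concatenated identification can be sketched with a dictionary keyed by pairs. The sketch assumes that emp# and dependent# are integers; the employee numbers and names are invented for the example.

```python
# Existence set of CHILD, keyed by (emp#, dependent#).
children = {
    (101, 1): "Anna",
    (101, 2): "Ben",
    (102, 1): "Chris",
}

# dependent# on its own is only unique per employee.
local_keys = [dependent_no for (_emp_no, dependent_no) in children]

# Identifying one child requires both parts of the key.
child = children[(102, 1)]
```

The local key 1 occurs under two different employees, so it identifies a child only in combination with emp#.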
Each weak entity type always has a parent type that must supply part of its identifying
information. The relationship from parent type to weak entity type is always a master-detail
relationship. In the days of sequential storage media (magnetic tapes), weak entities were
implemented as repeating groups within their parent record; this explains the use of a
local key that is unique only within the parent record. In a hierarchical data model,
weak entities are descendants of their parent entity.
Obviously, the concept of a weak entity has a meaning in the implementation of a
software system but not in a model of the UoD. Children have their own identity that can
be represented in a UoD model by their own identifier that does not contain an identifier
of any other ER entity. In this book, we will therefore not use weak entity types.
These are not decisive arguments for dropping derivable relationships from an ER diagram. In special cases, a diagram may become more understandable if we include a derivable
relationship. The domain specialists, users or sponsor or even the law may prescribe inclusion of a redundant relationship in an ER model.
8.3.9 Cross-checking
One way to check the mutual consistency of different parts of the model is to perform cross-checking. Cross-checking consists in traveling in different directions through the map of
induction methods shown in figure 8.1, and checking if we get mutually consistent results.
For example, we can perform natural language analysis starting from a piece of text and,
independently, do an entity analysis of transactions starting from a function decomposition
tree. The resulting models are likely not to be identical, and we can improve the quality of
both by merging them in the way described in the section on view integration below. The
result should be consistent with the natural language analysis as well as with the transaction
analysis.
Similarly, we can independently perform an entity analysis of transactions starting from
a natural language analysis of transactions, and check whether the results are the same.
For example, in a natural language analysis of transactions of the student administration
system, we should have found the following elementary sentence for the transaction List
Course Participants:
List students who are enrolled in a course.
This elementary sentence is consistent with the description of the transaction in figure 8.7.
Yet another useful way to cross-check is to travel the map against the direction of
the arrows. Starting from an ER diagram, we can describe its contents in elementary
sentences and check whether these describe the transactions in the function decomposition
tree. Alternatively, starting from an ER diagram, we can ask for each entity and relationship
what happens to it. When is an instance created, read, updated or deleted? This gives
us a list of relevant transactions that should be identical to the list of transactions in the
function decomposition tree we started with.
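The comparison of the two transaction lists amounts to a set difference in both directions. The following sketch assumes that transactions are identified by name; apart from List Course Participants, the transaction names are hypothetical.

```python
# Transactions found by asking, for each entity type and relationship in the
# ER diagram, when an instance is created, read, updated or deleted.
derived_from_diagram = {
    "Enroll Student",            # hypothetical: creates an enrollment
    "List Course Participants",  # reads enrollments
    "Cancel Enrollment",         # hypothetical: deletes an enrollment
}

# Transactions listed in the function decomposition tree we started with.
in_decomposition_tree = {
    "Enroll Student",
    "List Course Participants",
}

missing_from_tree = derived_from_diagram - in_decomposition_tree
missing_from_diagram = in_decomposition_tree - derived_from_diagram
```

A non-empty difference in either direction points at something to discuss: here the diagram suggests a deletion transaction that the tree does not contain.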
8.6 Summary
Two methods for discovering ER models from a set of observation data were treated: natural
language analysis and entity analysis of transactions. Natural language analysis starts from
elementary sentences describing the UoD and translates these sentences into ER diagrams.
This may yield many irrelevant entity types and relationships. To focus on relevant entity
types and relationships, we can start with the required system transactions, including the
queries that the system must be able to answer, and describe these in elementary sentences.
Because the required transactions manipulate data that has a meaning in the UoD, these
sentences are elementary descriptions of the UoD. Natural language analysis of transactions
results in an ER model of the UoD. Query analysis is a special case of this.
A different method is entity analysis of transactions, in which we, for each required
transaction, ask what data are created, read, updated or deleted. These data have a meaning
in the UoD, and if we make an ER model of them, we have a model of the UoD.
Several groups of methods were discussed to evaluate ER models. Entities have an identity and a state, values have no state (entity/value check). Entities have a simple identity,
links have a compound identity (entity/link check). Each specialization instance is identical to a
generalization instance (is_a check). Absent creation or deletion transactions for
an entity type or relationship may indicate an incomplete model (creation/deletion check).
Another way to check the completeness of the model is by means of cross-checking and view
integration. The model is simplified if we take care that all relationships are minimal (minimal arity check) and if we remove derivable relationships (derivable relationship check). By
means of the elementary sentence check and population check, we can validate the truth
value of the model with domain specialists. This is a simple way to do empirical tests of
the model. The navigation check can tell us whether the system will be able to perform the
required functions.
Transformation of an ER model into a relational database schema brings us to the level
of software subsystems. This represents a decomposition of entity types and relationships
into attributes. Performance considerations can cause us to factor out groups of attributes
into separate relation schemas.
ER models can be used to model the UoD of a business, of a computer-based system
or of a software system. All these systems may have the same UoD, because the UoD is
not part of the aggregation hierarchy of computer-based systems. Each UoD has its own
aggregation hierarchy, about which we make no assumptions in this book. ER models can
be used to represent the state space of the UoD or of a system under development. As a
UoD model, it has a descriptive orientation and the empirical cycle of discovery must be
followed. As a system specification, it has a prescriptive orientation and the engineering
cycle of product development must be followed.
8.7 Exercises
1. One of the following constraints can be expressed as a cardinality constraint in figure 8.24, the other cannot. Add the representable one to the diagram.
Each client buys at least one part at a shop.
Each client buys at most two parts in one shop.