Introduction, Features
Databases Are
Collections of information
Created with logical structures
With logical ties within the information
With built-in integrity constraints
Databases Collections of
Information
Databases have many tables
Consider Solomon Enterprises that provides
concrete to home and commercial builders.
Tables or files include:
Order
Customer
Concrete Type
Employee
Truck
DATABASE MANAGEMENT
SYSTEM TOOLS
Database management system (DBMS):
helps you specify the logical organization for a
database and access and use the information
within a database
Word processing software = document
Spreadsheet software = workbook
DBMS software = database
5 software components:
1. DBMS engine
2. Data definition subsystem
3. Data manipulation subsystem
4. Application generation subsystem
5. Data administration subsystem
DBMS Engine
DBMS engine: accepts logical requests from
the various other DBMS subsystems, converts
them into their physical equivalent, and
actually accesses the database and data
dictionary as they exist on a storage device
DBMS engine separates the logical from the
physical
Physical view: how information is physically
arranged, stored, and accessed on some type
of storage device
Logical view: how you as a knowledge
worker need to arrange and access
information
With a database, you only concern yourself
with your logical view
Views
View: allows you to see the contents of a
database file
Make whatever changes you want
Perform simple sorting
Query to find the location of information
Looks similar to a workbook with no row numbers
Report Generators
Report generator: helps you quickly define
formats of reports and what information you
want to see in a report
You can save report formats and generate
reports at any time with up-to-date
information
QBE Tools
Query-by-example (QBE) tool: helps you
graphically design the answer to a question
What driver most often delivers concrete to
Triple A Homes?
SQL
Structured query language (SQL):
standardized fourth-generation language
found in most DBMSs
Performs the same task as a QBE tool
But uses a sentence structure instead of a point-and-click interface
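The Triple A Homes question from the QBE slide can also be phrased as an SQL sentence. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in DBMS. The table names, column names, and sample rows are hypothetical, since the slides do not give the Solomon Enterprises schema.

```python
import sqlite3

def top_driver(conn, customer_name):
    """Return (driver name, delivery count) for the most frequent driver."""
    return conn.execute(
        """
        SELECT e.name, COUNT(*) AS deliveries
        FROM orders AS o
        JOIN employee AS e ON e.employee_id = o.driver_id
        WHERE o.customer_name = ?
        GROUP BY e.employee_id, e.name
        ORDER BY deliveries DESC
        LIMIT 1
        """,
        (customer_name,),
    ).fetchone()

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows, for illustration only.
conn.executescript("""
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    driver_id INTEGER NOT NULL REFERENCES employee(employee_id)
);
INSERT INTO employee VALUES (1, 'Driver A');
INSERT INTO employee VALUES (2, 'Driver B');
INSERT INTO orders VALUES (100, 'Triple A Homes', 1);
INSERT INTO orders VALUES (101, 'Triple A Homes', 2);
INSERT INTO orders VALUES (102, 'Triple A Homes', 2);
INSERT INTO orders VALUES (103, 'Other Builder', 1);
""")
result = top_driver(conn, "Triple A Homes")
print(result)
```

A QBE tool would build an equivalent request from a point-and-click grid; the SQL version states the same join, filter, and grouping in text.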
Application Generation
Subsystem
Application generation subsystem: contains
facilities to help you develop transaction-intensive applications
Data entry screens (called forms)
Programming languages
Security management
Who has access to what information
Who can perform certain tasks (e.g., add, change,
or delete) on information
Concurrency control
What happens if two people make changes to the
same information at the same time?
Multidimensional
Rows and columns
Also layers
Many times called hypercubes
What are the dimensions in Figure 3.8 on page
142?
Query-And-Reporting Tools
Query-and-reporting tools: similar to QBE
tools, SQL, and report generators in the typical
database environment
Intelligent Agents
Use various artificial intelligence tools such as
neural networks and fuzzy logic to form the
basis for information discovery and building
business intelligence
Help you find hidden patterns in information
Chapter 4 focuses more on these
Multidimensional Analysis
Tools
Multidimensional analysis (MDA) tools:
slice-and-dice techniques that allow you to
view multidimensional information from
different perspectives
Bring new layers to the front
Reorganize rows and columns
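Slice-and-dice can be sketched in a few lines of plain Python. The cube below is hypothetical (three dimensions named product, region, and quarter, with made-up sales figures); real MDA tools work on data warehouse hypercubes rather than a dictionary.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, region, quarter) -> sales amount.
DIMENSIONS = ("product", "region", "quarter")
cube = {
    ("mix A", "north", "Q1"): 10,
    ("mix A", "south", "Q1"): 20,
    ("mix B", "north", "Q1"): 5,
    ("mix A", "north", "Q2"): 7,
    ("mix B", "south", "Q2"): 3,
}

def slice_cube(cells, dimension, value):
    """Bring one layer to the front: keep cells where `dimension` equals `value`."""
    axis = DIMENSIONS.index(dimension)
    return {key: amount for key, amount in cells.items() if key[axis] == value}

def pivot(cells, row_dim, col_dim):
    """Reorganize rows and columns: total the cells into a row_dim x col_dim grid."""
    r, c = DIMENSIONS.index(row_dim), DIMENSIONS.index(col_dim)
    grid = defaultdict(lambda: defaultdict(int))
    for key, amount in cells.items():
        grid[key[r]][key[c]] += amount
    return {row: dict(cols) for row, cols in grid.items()}

q1 = slice_cube(cube, "quarter", "Q1")
by_region = pivot(q1, "region", "product")
by_product = pivot(cube, "product", "quarter")
print(by_region)
print(by_product)
```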
Statistical Tools
Help you apply various mathematical models
to the information stored in a data warehouse
to discover new information
Regression
Analysis of variance
And so on
Data management
Database
Metadata
Database management system (DBMS)
Computer
Programs
Data files
Operating systems
Simultaneous playback
Analysis, prediction, decision-making queries
Transaction granularity
Historical data, decay
Security and privacy
Centralized vs. distributed
General Concepts
Database definition
Organized collection of logically related data
Data
Known facts
Types: text, graphics, images, sound, videos
Database
What is a database?
A collection of files storing related data
Database Examples
Class roster
Hospital patients
Literature (published articles in a certain field)
Genomic information
Protein structure
Taxonomy
Single nucleotide polymorphism
Type | Typical number of users | Typical architecture | Typical size
Personal | | Desktop/Laptop/PDA | MB
Workgroup | 5-25 | |
Department | 25-100 | Client/server: 3 tier | GB
Enterprise | >100 | Client/server: distributed | GB-TB
Internet | >1000 | | MB-GB
Flat Files
Characteristics:
Data is stored as records in regular files
Records usually have a simple structure and fixed
number of fields
For fast access may support indexing of fields in the
records
No mechanisms for relating data between files
One needs special programs in order to access and
manipulate the data
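The flat-file characteristics above can be illustrated with a minimal sketch. The record layout here (a 10-byte name followed by two 4-byte integers) is a hypothetical example; the point is that the reading program must hard-code the layout, which is the "special program" a flat file requires.

```python
import struct

# Hypothetical fixed layout: 10-byte name, 4-byte unsigned size, 4-byte unsigned year.
RECORD = struct.Struct("<10sII")

def pack_record(name, size, year):
    """Encode one record; short names are padded with zero bytes."""
    return RECORD.pack(name.encode("ascii"), size, year)

def read_records(data):
    """Decode every fixed-length record in a byte string."""
    records = []
    for offset in range(0, len(data), RECORD.size):
        raw_name, size, year = RECORD.unpack_from(data, offset)
        records.append((raw_name.rstrip(b"\x00").decode("ascii"), size, year))
    return records

flat_file = pack_record("alpha", 100, 1997) + pack_record("beta", 200, 2001)
records = read_records(flat_file)
print(records)
```

Nothing in the bytes themselves says where a field starts or what it means, and nothing relates this file to any other file.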
Relational Database
Characteristics:
Data is organized into tables: rows & columns
Each row represents an instance of an entity
Each column represents an attribute of an entity
Metadata describes each table column
Relationships between entities are represented by
values stored in the columns of the corresponding
tables (keys)
Accessible through Structured Query Language (SQL)
Organism (1) to Gene (m): a one-to-many relationship
Relational
OO
Hybrid (Object-Relational)
Temporal
Deductive
Others
Spatial,
OO
Complex data reps
multivalued, composite
Temporal
Relational model: add valid start, end dates to each table
(versions of info and when valid)
Includes time, events, durations
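The "valid start, end dates" idea for temporal data in a relational model can be sketched as follows, using sqlite3 as a stand-in DBMS. The price table, its columns, and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: each row is one version of a fact plus the period it was valid.
conn.executescript("""
CREATE TABLE price (
    item TEXT NOT NULL,
    amount INTEGER NOT NULL,
    valid_start TEXT NOT NULL,   -- ISO dates compare correctly as text
    valid_end TEXT               -- NULL means the version is still valid
);
INSERT INTO price VALUES ('concrete', 90, '2001-01-01', '2002-01-01');
INSERT INTO price VALUES ('concrete', 95, '2002-01-01', NULL);
""")

def price_on(conn, item, day):
    """Return the amount that was valid on `day`, or None if no version applies."""
    row = conn.execute(
        """
        SELECT amount FROM price
        WHERE item = ?
          AND valid_start <= ?
          AND (valid_end IS NULL OR ? < valid_end)
        """,
        (item, day, day),
    ).fetchone()
    return row[0] if row else None

print(price_on(conn, "concrete", "2001-06-15"))
print(price_on(conn, "concrete", "2002-01-01"))
```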
Operations
DDL/DML (data def/manip languages)
SQL
OQL
Update operations
Built-in insert, delete, update
Stored procedures for triggers, active (ECA) rules
Active DB
Event-Condition-Action rules
Allow for decisions to be made in the database instead
of a separate application
Relational
Implemented as triggers
Challenges
Rule consistency
(2+ rules do not contradict)
Guaranteed termination
Trigger loops (T1 <->T2)
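An Event-Condition-Action rule implemented as a trigger might look like the sketch below. It uses sqlite3 as a stand-in DBMS; the stock and reorder_log tables and the threshold of 10 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (item TEXT PRIMARY KEY, quantity INTEGER NOT NULL);
CREATE TABLE reorder_log (item TEXT NOT NULL, quantity INTEGER NOT NULL);

-- Event: an UPDATE of stock.quantity.
-- Condition: the new quantity is below 10 (hypothetical threshold).
-- Action: record a reorder request.
CREATE TRIGGER low_stock
AFTER UPDATE OF quantity ON stock
WHEN NEW.quantity < 10
BEGIN
    INSERT INTO reorder_log VALUES (NEW.item, NEW.quantity);
END;

INSERT INTO stock VALUES ('cement', 50);
UPDATE stock SET quantity = 30 WHERE item = 'cement';  -- condition false, no action
UPDATE stock SET quantity = 5 WHERE item = 'cement';   -- condition true, action fires
""")
log = conn.execute("SELECT item, quantity FROM reorder_log").fetchall()
print(log)
```

Because the action writes to a different table than the one the event watches, this rule cannot re-trigger itself. A rule whose action updates stock again would need the termination analysis listed under Challenges.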
Metadata
Data that describes the properties or
characteristics of other data
Does not include sample data
Allows database designers and users to
understand the meaning of the data
Column | Type | Max Length | Description
Name | Alphanumeric | 100 | Organism name
Size | Integer | 10 |
Gc | Float | | Percent GC
Accession | Alphanumeric | 10 | Accession number
Release | Date | | Release date
Center | Alphanumeric | 100 |
Sequence | Alphanumeric | Variable | Sequence
Name | Size | Gc | Accession | Release | Center | Sequence
 | 4,640,000 | 50 | NC_000913 | 09/05/1997 | Univ. Wisconsin | AGCTTTTCATT
Streptococcus pneumoniae R6 | 2,040,000 | 40 | NC_003098 | 09/07/2001 | | TTGAAAGAAAA
Column | Type | Max Length | Description
Name | Alphanumeric | 100 | Gene name
Accession | Alphanumeric | 10 |
OAccession | Alphanumeric | 10 |
Start | Integer | 10 | Gene start
End | Integer | 10 | Gene end
Strand | Character | | Gene strand
Product | Alphanumeric | 1000 | Gene annotation
Sequence | Alphanumeric | Variable | Gene sequence
Name | Accession | OAccession | Start | End | Strand | Product | Sequence
thrL | 16127995 | NC_000913 | 190 | 255 | | | MKRI
thrA | 16127996 | NC_000913 | 337 | 2799 | | homoserine dehydrogenase I | MRVL
transposase_A | 15902058 | NC_003098 | 20207 | 20554 | | transposase | MWYN
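The organism and gene tables are related through the gene's OAccession column, which holds the accession of the organism it belongs to. A minimal sketch of that relationship, using sqlite3 as a stand-in DBMS, is shown here. The column names start_pos, end_pos, and release_date are renamed assumptions (chosen to avoid SQL keywords), the sequence and strand columns are omitted for brevity, and values missing from the slides are stored as NULL rather than guessed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organism (
    name TEXT,
    size INTEGER,
    gc REAL,
    accession TEXT PRIMARY KEY,
    release_date TEXT,
    center TEXT
);
CREATE TABLE gene (
    name TEXT,
    accession TEXT PRIMARY KEY,
    oaccession TEXT REFERENCES organism(accession),  -- foreign key to organism
    start_pos INTEGER,
    end_pos INTEGER,
    product TEXT
);
""")
conn.executemany("INSERT INTO organism VALUES (?, ?, ?, ?, ?, ?)", [
    (None, 4640000, 50.0, "NC_000913", "09/05/1997", "Univ. Wisconsin"),
    ("Streptococcus pneumoniae R6", 2040000, 40.0, "NC_003098", "09/07/2001", None),
])
conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?, ?, ?)", [
    ("thrL", "16127995", "NC_000913", 190, 255, None),
    ("thrA", "16127996", "NC_000913", 337, 2799, "homoserine dehydrogenase I"),
    ("transposase_A", "15902058", "NC_003098", 20207, 20554, "transposase"),
])

def genes_for(conn, organism_accession):
    """List gene names for one organism, following the foreign key."""
    rows = conn.execute(
        "SELECT g.name FROM gene AS g "
        "JOIN organism AS o ON o.accession = g.oaccession "
        "WHERE o.accession = ? ORDER BY g.start_pos",
        (organism_accession,),
    ).fetchall()
    return [name for (name,) in rows]

print(genes_for(conn, "NC_000913"))
```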
Relationships
Database Systems
Types of Database Systems
Number of Users
Single-user
Desktop database
Multiuser
Workgroup database
Enterprise database
Scope
Desktop
Workgroup
Enterprise
Location
Centralized
Distributed
Use
Transactional (Production)
Decision support
Data warehouse
Database
A Database is a collection of stored
operational data used by the application
systems of some particular enterprise (C.J.
Date)
Paper Databases
Still contain a large portion of the world's knowledge
Why DBMS?
History
1950s and 60s: all applications were custom built for
particular needs
File based
Many similar/duplicative applications dealing with
collections of business data
Early DBMS were extensions of programming
languages
1970 - E.F. Codd and the Relational Model
1979 - Ashton-Tate and first Microcomputer DBMS
File
Toys
Addresses
Naughty
Nice Toys
DBMS Benefits
Database Environment
CASE tools
Repository
User interface
DBMS
Application programs
Database
Database Components
DBMS design tools: table creation, form creation, query creation, report creation, procedural language compiler (4GL)
DBMS run time: form processor, query processor, report writer, language run time
Database contains: user data, metadata, indexes, application metadata
Application programs
User interface
Applications
PC databases
Centralized database
Client/server databases
Distributed databases
Database models
PC Databases
E.g.:
Access
FoxPro
dBase
Etc.
Centralized Databases
Central computer
Client, network, database server, client
Distributed Databases
Homogeneous databases: computers at Location A, Location B, and Location C
Heterogeneous or federated databases: clients, database server, comm server, local network, remote computers
Figure 1.2
Historical Roots
Why Study File Systems?
It provides historical perspective.
It teaches lessons to avoid pitfalls of data management.
Its simple characteristics facilitate understanding of the design
complexity of a database.
It provides useful knowledge for converting a file system to a
database system.
Data: raw facts that have little meaning unless they have been
organized in some logical manner. The smallest piece of data
that can be recognized by the computer is a single
character, such as the letter A, the number 5, or a
symbol such as ? > * +. A single character requires one
byte of computer storage.
Field
Record
File
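The hierarchy above (character, field, record, file) can be sketched in Python. The customer data is hypothetical, and the one-byte-per-character count assumes a single-byte encoding such as ASCII.

```python
def storage_bytes(text):
    """Bytes needed to store text in ASCII: one byte per character."""
    return len(text.encode("ascii"))

# Hypothetical data, for illustration only.
field = "Smith"                                   # field: a group of characters
record = {"last_name": field, "city": "Reno"}     # record: a set of related fields
customer_file = [record, {"last_name": "Lee", "city": "Elko"}]  # file: related records

total = sum(storage_bytes(value) for rec in customer_file for value in rec.values())
print(storage_bytes("A"), total)
```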
DBMS
Software package for defining and managing a
database.
Examples:
Proprietary: MS Access, MS SQL Server, DB2,
Oracle, Sybase
Open source: MySQL, PostgreSQL
DBMS Advantages
Program-data independence
Minimal data redundancy
Improved data consistency & quality
Access control
Transaction control
Web Databases
Data is accessible through Internet
Have different underlying database models
Example: biological databases
Molecular data: NCBI, Swiss-Prot, PDB, GO
Protein interaction: DIP, BIND
Organism specific: Mouse, Worm, Yeast
Literature: PubMed
Disease
Possible Organizations
Files
Spreadsheets
DBMS
Files: Yes, but
Spreadsheets: Not really
DBMS: Yes
2. Search/Query/Update
Files:
Spreadsheets: Simple queries; simple updates
DBMS: All
Updates: generally OK
Very hard
Spreadsheets: Yes
DBMS: Yes
4. Concurrent Access
Multiple users access/update the data
concurrently
Lost updates; inconsistent reads,
locks
Y = Read(Account, #7199);
Y.amount = Y.amount + 100;
Write(Account, #7199, Y);
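The read-modify-write sequence above loses an update when two users interleave. The sketch below simulates that interleaving by hand on a single sqlite3 connection (a simplification: real concurrent users would hold separate connections), then shows one common remedy, letting the DBMS perform the increment as a single statement. The account table layout and the starting balance of 500 are assumptions.

```python
import sqlite3

def make_account(balance):
    """Create a fresh in-memory database holding account #7199."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)")
    conn.execute("INSERT INTO account VALUES (7199, ?)", (balance,))
    conn.commit()
    return conn

def read(conn):
    return conn.execute("SELECT amount FROM account WHERE id = 7199").fetchone()[0]

def write(conn, amount):
    conn.execute("UPDATE account SET amount = ? WHERE id = 7199", (amount,))
    conn.commit()

# Interleaved schedule: both users read before either writes.
conn = make_account(500)
y1 = read(conn)          # user 1 reads 500
y2 = read(conn)          # user 2 reads 500
write(conn, y1 + 100)    # user 1 writes 600
write(conn, y2 + 100)    # user 2 also writes 600; user 1's deposit is lost
lost_update_balance = read(conn)

# Remedy: do the read-modify-write inside the DBMS as one statement.
conn = make_account(500)
for _ in range(2):
    conn.execute("UPDATE account SET amount = amount + 100 WHERE id = 7199")
    conn.commit()
atomic_balance = read(conn)
print(lost_update_balance, atomic_balance)
```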
6. Security
Files: file-level access control
Spreadsheets: same [?]
DBMS: table/attribute-level access control
Enter a DBMS
Two-tier system or client-server
Connection (ODBC, JDBC)
Database server (someone else's C program)
Data files
Applications
Data Independence
Logical view
Directors:
id | fName | lName
15901 | Francis Ford | Coppola
...
Movie_Directors:
id | mid
15901 | 130128
...
Movies:
mid | Title | Year
130128 | The Godfather | 1972
...
Physical view
Directors_file
Movies_title_index_file
Directors_fname_index_file
Movies_file
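Data independence means a query written against the logical view keeps working when the physical view changes. A minimal sketch with sqlite3 as a stand-in DBMS: the same query is run before and after an index is added, using the Directors, Movies, and Movie_Directors rows shown on the slide. The index name Movies_title_index is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Directors (id INTEGER PRIMARY KEY, fName TEXT, lName TEXT);
CREATE TABLE Movies (mid INTEGER PRIMARY KEY, Title TEXT, Year INTEGER);
CREATE TABLE Movie_Directors (id INTEGER, mid INTEGER);
INSERT INTO Directors VALUES (15901, 'Francis Ford', 'Coppola');
INSERT INTO Movies VALUES (130128, 'The Godfather', 1972);
INSERT INTO Movie_Directors VALUES (15901, 130128);
""")

# The query mentions only logical names: tables and columns, never files.
QUERY = """
SELECT d.fName, d.lName
FROM Movies AS m
JOIN Movie_Directors AS md ON md.mid = m.mid
JOIN Directors AS d ON d.id = md.id
WHERE m.Title = ?
"""

before = conn.execute(QUERY, ("The Godfather",)).fetchall()
# Change only the physical layout: add an index on Movies.Title.
conn.execute("CREATE INDEX Movies_title_index ON Movies (Title)")
after = conn.execute(QUERY, ("The Godfather",)).fetchall()
print(before, after)
```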
SQL DML
SQL DDL
Transactions
ACID
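Atomicity, the "A" in ACID, can be sketched with a transfer that either fully commits or fully rolls back. This uses sqlite3 as a stand-in DBMS; the account table, the CHECK rule against negative balances, and the amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "amount INTEGER NOT NULL CHECK (amount >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically; return True on commit, False on rollback."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET amount = amount + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE account SET amount = amount - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return conn.execute("SELECT id, amount FROM account ORDER BY id").fetchall()

failed = transfer(conn, 1, 2, 150)   # debit would go negative: whole transfer undone
after_failed = balances(conn)
ok = transfer(conn, 1, 2, 60)
after_ok = balances(conn)
print(failed, after_failed, ok, after_ok)
```

In the failed transfer the credit to account 2 had already been applied when the debit violated the constraint; the rollback removes it, so no partial result is visible.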
Database Systems
Commercial
DB2
Empress
Informix
Oracle
MS Access
MS SQL
Sybase
Free
Berkeley DB
PostgreSQL
MySQL
Relational Database
Definition:
Data stored in tables that are associated by shared
attributes (keys).
Any data element (or entity) can be found in the
database through the name of the table, the
attribute name, and the value of the primary key.
Database Tables
Tables represent entities
Tables are always named in the singular, such
as: Vehicle, Order, Grade, etc.
In database jargon, tables are flat files,
dBase- or spreadsheet-like.
Attributes
Characteristics of an entity
Examples:
Vehicle (VIN, color, make, model, mileage)
Student (SSN, Fname, Lname, Address)
Fishing License (Type, Start_date, End_date)
Database Views
A View is an individual's picture of a database.
It can be composed of many tables,
unbeknownst to the user.
It's a simplification of a complex data model
It provides a measure of database security
Views are useful primarily for READ-only users
and are not always safe for CREATE, UPDATE, and
DELETE.
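A view that combines two tables and hides a sensitive column might be sketched as follows, with sqlite3 as a stand-in DBMS. The student and grade tables and their rows are hypothetical. SQLite treats plain views as read-only, which matches the caution above; other DBMSs allow updates through some views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT, ssn TEXT);
CREATE TABLE grade (student_id INTEGER REFERENCES student(student_id), course TEXT, mark TEXT);
INSERT INTO student VALUES (1, 'Ann', 'Lee', '000-00-0000');
INSERT INTO grade VALUES (1, 'DB101', 'A');

-- The view joins two tables and leaves out the sensitive ssn column.
CREATE VIEW roster AS
SELECT s.fname, s.lname, g.course, g.mark
FROM student AS s JOIN grade AS g ON g.student_id = s.student_id;
""")

cur = conn.execute("SELECT * FROM roster")
rows = cur.fetchall()
columns = [d[0] for d in cur.description]

# Writing through the view is rejected by SQLite.
try:
    conn.execute("INSERT INTO roster VALUES ('Bob', 'Ray', 'DB101', 'B')")
    view_is_writable = True
except sqlite3.OperationalError:
    view_is_writable = False
print(rows, columns, view_is_writable)
```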
Table Indexing
An Index is a means of expediting the retrieval
of data.
Indexes are built on a column(s).
Indexes occupy disk space; occasionally a lot.
Indexes aren't technically necessary for
operation and must be maintained by the
database administrator.
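The effect of an index can be observed by asking the DBMS how it plans to run a query. The sketch below uses sqlite3 and its EXPLAIN QUERY PLAN statement; the vehicle table contents and the index name vehicle_make_idx are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicle (vin TEXT PRIMARY KEY, make TEXT, model TEXT, mileage INTEGER)")
conn.executemany(
    "INSERT INTO vehicle VALUES (?, ?, ?, ?)",
    [(f"VIN{i:05d}", "make" + str(i % 7), "model", i) for i in range(1000)],
)

def plan(conn, sql, params=()):
    """Return SQLite's query plan as one string (last column holds the detail text)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

QUERY = "SELECT vin, mileage FROM vehicle WHERE make = ?"
rows_before = conn.execute(QUERY, ("make3",)).fetchall()
plan_before = plan(conn, QUERY, ("make3",))          # full table scan

conn.execute("CREATE INDEX vehicle_make_idx ON vehicle (make)")
rows_after = conn.execute(QUERY, ("make3",)).fetchall()
plan_after = plan(conn, QUERY, ("make3",))           # search using the new index
print(plan_before)
print(plan_after)
```

The query returns the same rows either way; only the access path changes, which is why an index is optional for correctness but costs disk space and maintenance.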
Types of Keys
PRIMARY KEY
Serves as the row level addressing mechanism in the relational
database model.
It can be formed through the combination of several items.
FOREIGN KEY
A column or set of columns within a table that are required to
match those of a primary key of a second table.
Table A
 | Address | Parcel #
John Smith | 18 Lawyers Dr. | 756554
T. Brown | 14 Summers Tr. | 887419
Table B
Parcel # | Assessed Value
887419 | 152,000
446397 | 100,000
Database Keys
Primary Key - Indicates uniqueness within
records or rows in a table.
Foreign Key - the primary key from another
table. This is the only way join relationships
can be established.
There may also be alternate or secondary keys
within a table.
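A join through a shared key can be sketched with the Table A and Table B rows from the parcel example, using sqlite3 as a stand-in DBMS. The column names owner, address, parcel_no, and assessed_value are assumptions, since the slide does not name the first column of Table A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_b (parcel_no INTEGER PRIMARY KEY, assessed_value INTEGER);
CREATE TABLE table_a (owner TEXT, address TEXT, parcel_no INTEGER);
INSERT INTO table_b VALUES (887419, 152000);
INSERT INTO table_b VALUES (446397, 100000);
INSERT INTO table_a VALUES ('John Smith', '18 Lawyers Dr.', 756554);
INSERT INTO table_a VALUES ('T. Brown', '14 Summers Tr.', 887419);
""")

# Inner join: only rows whose parcel number appears in both tables.
matched = conn.execute(
    "SELECT a.owner, b.assessed_value "
    "FROM table_a AS a JOIN table_b AS b ON b.parcel_no = a.parcel_no"
).fetchall()

# Left join: every row of Table A, with NULL where Table B has no match.
all_owners = conn.execute(
    "SELECT a.owner, b.assessed_value "
    "FROM table_a AS a LEFT JOIN table_b AS b ON b.parcel_no = a.parcel_no "
    "ORDER BY a.owner"
).fetchall()
print(matched, all_owners)
```

Parcel 756554 has no row in Table B, so a strict foreign key constraint is deliberately not declared here; that unmatched value is exactly what a referential integrity rule would flag.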
Entity Integrity
Entity integrity deals with within-entity rules.
These rules deal with ranges and the permission of null
values in attributes or possibly between records.
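Range rules and null permissions can be declared so the DBMS checks them on every insert. A minimal sketch with sqlite3 as a stand-in DBMS; the vehicle columns and the rule that mileage must be non-negative are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE vehicle (
    vin TEXT PRIMARY KEY NOT NULL,
    color TEXT,                                    -- nulls permitted
    mileage INTEGER NOT NULL CHECK (mileage >= 0)  -- range rule, nulls not permitted
)
""")

def try_insert(conn, row):
    """Return True if the row satisfies every integrity rule, else False."""
    try:
        with conn:
            conn.execute("INSERT INTO vehicle VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(conn, ("V1", None, 100)),    # accepted: color may be null
    try_insert(conn, ("V2", "red", -5)),    # rejected: out of range
    try_insert(conn, ("V3", "red", None)),  # rejected: null not permitted
    try_insert(conn, ("V1", "blue", 10)),   # rejected: duplicate primary key
]
print(results)
```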
Enforcing Integrity
Not a trivial task!
Not all database management systems or GIS
software enable users to enforce data
integrity during attribute entry or edit
sessions.
Therefore, the programmer or the Database
Administrator must enforce and/or check for
Integrity.
Referential Integrity
Referential integrity concerns two or more
tables that are related.
Example: IF table A contains a foreign key that
matches the primary key of table B
THEN
values of this foreign key either match the
value of the primary key for a row in table B or
must be null.
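The rule above (a foreign key value either matches a primary key in table B or is null) can be enforced by the DBMS itself. A minimal sketch with sqlite3, where enforcement has to be switched on with PRAGMA foreign_keys; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE table_b (id INTEGER PRIMARY KEY);
CREATE TABLE table_a (
    id INTEGER PRIMARY KEY,
    b_id INTEGER REFERENCES table_b(id)   -- foreign key, may be NULL
);
INSERT INTO table_b VALUES (1);
""")

def try_insert(conn, row):
    """Return True if the row respects referential integrity, else False."""
    try:
        with conn:
            conn.execute("INSERT INTO table_a VALUES (?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(conn, (10, 1)),     # accepted: matches a primary key in table_b
    try_insert(conn, (11, None)),  # accepted: null foreign key
    try_insert(conn, (12, 99)),    # rejected: no such row in table_b
]
print(results)
```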
Functions of a Database
Management System
CRUD
Four basic functions. With few exceptions, your
system should perform all of them for any
given entity:
CREATE
READ
UPDATE
DELETE
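The four CRUD functions for one entity can be sketched as follows, using sqlite3 as a stand-in DBMS. The student table and the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT)")

def create(conn, fname, lname):
    """CREATE: insert a row and return its generated id."""
    cur = conn.execute("INSERT INTO student (fname, lname) VALUES (?, ?)", (fname, lname))
    conn.commit()
    return cur.lastrowid

def read(conn, student_id):
    """READ: fetch one row, or None if it does not exist."""
    return conn.execute(
        "SELECT fname, lname FROM student WHERE id = ?", (student_id,)
    ).fetchone()

def update(conn, student_id, lname):
    """UPDATE: change the last name of one row."""
    conn.execute("UPDATE student SET lname = ? WHERE id = ?", (lname, student_id))
    conn.commit()

def delete(conn, student_id):
    """DELETE: remove one row."""
    conn.execute("DELETE FROM student WHERE id = ?", (student_id,))
    conn.commit()

sid = create(conn, "Ann", "Lee")
created = read(conn, sid)
update(conn, sid, "Ray")
updated = read(conn, sid)
delete(conn, sid)
deleted = read(conn, sid)
print(created, updated, deleted)
```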
Normalization
Normalization: a process for analyzing the
design of a relational database
Database Design - Arrangement of attributes into
entities
Some Disadvantages of
Database Processing
Greater Complexity
Possibly a greater impact of a failure
Recovery is more difficult
These points are all debated. The latest
database products often reduce the
opportunities for complete failure, but that
reliability comes at a higher investment
cost.