
Getting Started With Oracle

• Overview
• Logging In to Oracle
• Changing Your Password
• Creating a Table
• Creating a Table With a Primary Key
• Inserting Tuples
• Getting the Value of a Relation
• Getting Rid of Your Tables
• Getting Information About Your Database
• Quitting sqlplus
• Executing SQL From a File
• Editing Commands in the Buffer
• Recording Your Session
• Help Facilities

Overview
You will be using the Oracle database system to implement your PDA (Personal Database
Application) this quarter. Important: As soon as your Oracle account is set up, you should log
in to change the initial password.

Logging In to Oracle
You should be logged onto one of the Leland Systems Sun Solaris machines. These machines
include elaine, saga, myth, fable, and tree.
Before using Oracle, execute the following line in your shell to set up the correct environment
variables:
source /afs/ir/class/cs145/all.env
You may wish to put this line in your shell initialization file instead (for example, .cshrc).
Now, you can log in to Oracle by typing:
sqlplus <yourName>
Here, sqlplus is Oracle's generic SQL interface. <yourName> refers to your leland login.
You will be prompted for your password. This password is initially changemesoon and must be
changed as soon as possible. For security reasons, we suggest that you not use your regular
leland password, because as we shall see there are opportunities for this password to become
visible under certain circumstances. After you enter the correct password, you should receive the
prompt
SQL>

Changing Your Password


In response to the SQL> prompt, type
ALTER USER <yourName> IDENTIFIED BY <newPassword>;
where <yourName> is again your leland login, and <newPassword> is the password you would
like to use in the future. This command, like all other SQL commands, should be terminated with
a semicolon.
Note that SQL is completely case-insensitive. Once you are in sqlplus, you can write keywords
like ALTER in either upper or lower case; even your password is case-insensitive. We tend to
capitalize keywords and nothing else.

Creating a Table
In sqlplus we can execute any SQL command. One simple type of command creates a table
(relation). The form is
CREATE TABLE <tableName> (
<list of attributes and their types>
);
You may enter text on one line or on several lines. If your command runs over several lines, you
will be prompted with line numbers until you type the semicolon that ends any command.
(Warning: An empty line terminates the command but does not execute it; see Editing
Commands in the Buffer.) An example table-creation command is:
CREATE TABLE test (
i int,
s char(10)
);
This command creates a table named test with two attributes. The first, named i, is an integer,
and the second, named s, is a character string of length (up to) 10.

Creating a Table With a Primary Key


To create a table that declares attribute a to be a primary key:
CREATE TABLE <tableName> (..., a <type> PRIMARY KEY, b, ...);
To create a table that declares the set of attributes (a,b,c) to be a primary key:
CREATE TABLE <tableName> (<attrs and their types>, PRIMARY KEY (a,b,c));
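For instance, the following sketch (the table names test2 and enrolled are made up, in the style of the test table above) declares a single-attribute key and a multi-attribute key:
CREATE TABLE test2 (
i int PRIMARY KEY,
s char(10)
);
CREATE TABLE enrolled (
student char(10),
course char(10),
PRIMARY KEY (student, course)
);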

Inserting Tuples
Having created a table, we can insert tuples into it. The simplest way to insert is with the INSERT
command:
INSERT INTO <tableName>
VALUES( <list of values for attributes, in order> );
For instance, we can insert the tuple (10, 'foobar') into relation test by
INSERT INTO test VALUES(10, 'foobar');

Getting the Value of a Relation


We can see the tuples in a relation with the command:
SELECT *
FROM <tableName>;
For instance, after the above create and insert statements, the command
SELECT * FROM test;
produces the result
I S
---------- ----------
10 foobar

Getting Rid of Your Tables


To remove a table from your database, execute
DROP TABLE <tableName>;
We suggest you execute
DROP TABLE test;
after trying out this sequence of commands, to avoid leaving garbage around that will still be
there the next time you use the Oracle system.

Getting Information About Your Database


The system keeps information about your own database in certain system tables. The most
important for now is USER_TABLES. You can recall the names of your tables by issuing the query:
SELECT TABLE_NAME
FROM USER_TABLES;
More information about your tables is available from USER_TABLES. To see all the attributes of
USER_TABLES, try:
SELECT *
FROM USER_TABLES;
It is also possible to recall the attributes of a table once you know its name. Issue the command:
DESCRIBE <tableName>;
to learn about the attributes of relation <tableName>.
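For instance, DESCRIBE test; for the table created earlier produces output resembling the following sketch (Oracle reports INT columns as NUMBER(38); the exact spacing may differ):
Name                          Null?    Type
----------------------------- -------- ------------
I                                      NUMBER(38)
S                                      CHAR(10)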

Quitting sqlplus
To leave sqlplus, type
quit;
in response to the SQL> prompt.

Executing SQL From a File


Instead of executing SQL commands typed at a terminal, it is often more convenient to type the
SQL command(s) into a file and cause the file to be executed.
To run the file foo.sql, type:
@foo
sqlplus assumes by default the file extension ".sql" if there is no extension. So you could have
entered @foo.sql at the SQL> prompt, but if you wanted to execute the file bar.txt, you would
have to enter @bar.txt.
You can also run a file at connection by using a special form on the Unix command line. The
form of the command is:
sqlplus <yourName>/<yourPassword> @<fileName>
For instance, if user sally, whose password is etaoinshrdlu, wishes to execute the file
foo.sql, then she would say:
sqlplus sally/etaoinshrdlu @foo
Notice that this mode presents a risk that sally's password will be discovered, so it should be
used carefully.
NOTE: If you are getting an error of the form "Input truncated to 2 characters" when you try to
run your file, try putting an empty line at the bottom of your .sql file. This seems to make the
error go away.

Editing Commands in the Buffer


If you end a command without a semicolon, but with an empty new line, the command goes into
a buffer. You may execute the command in the buffer by either the command RUN or a single
slash (/).
You may also edit the command in the buffer before you execute it. Here are some useful editing
commands. They are shown in upper case but may be either upper or lower.
LIST              lists the command buffer, and makes the last line in the buffer the "current" line
LIST n            prints line n of the command buffer, and makes line n the current line
LIST m n          prints lines m through n, and makes line n the current line
INPUT             enters a mode that allows you to input text following the current line; you must terminate the sequence of new lines with a pair of "returns"
CHANGE /old/new   replaces the text "old" by "new" in the current line
APPEND text       appends "text" to the end of the current line
DEL               deletes the current line
All of these commands may be executed by entering the first letter or any other prefix of the
command except for the DEL command.
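For example, suppose you typed SELECT * FROM tset (misspelling test) and then an empty line, so the statement sits in the buffer unexecuted. The following rough transcript (prompts and echoes are approximate) repairs and runs it:
SQL> LIST
  1* SELECT * FROM tset
SQL> CHANGE /tset/test
  1* SELECT * FROM test
SQL> RUN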
An alternative is to edit the file where your SQL is kept directly from sqlplus. If you say
edit foo.sql
the file foo.sql will be passed to an editor of your choice. The default is vi. However, you may
say
DEFINE _EDITOR = "emacs"
if you prefer to use the emacs editor; other editor choices may be called for in the analogous
way. In fact, if you would like to make emacs your default editor, there is a login file that you
may create in the directory from which you call sqlplus. Put in the file called login.sql the
above editor-defining command, or any other commands you would like executed every time
you call sqlplus.

Recording Your Session


There are several methods for creating a typescript to turn in for your programming assignments.
The most primitive way is to cut and paste your terminal output and save it in a file (if you have
windowing capabilities). Another method is to use the Unix command script to record the
terminal interaction. The script command records everything printed on your screen. The
syntax for the command is
script [ -a ] [ filename ]
The record is written to filename. If no file name is given, the record is saved in the file
typescript. The -a option allows you to append the session record to filename, rather than
overwrite it. To end the recording, type
exit
For more information on how to run the script command, check out its man page.
sqlplus provides the command spool to save query results to a file. At the SQL> prompt, you
say:
spool foo;
and a file called foo.lst
will appear in your current directory and will record all user input and
system output, until you exit sqlplus or type:
spool off;
Note that if the file foo.lst existed previously, it will be overwritten, not appended.
Finally, if you use Emacs, you can simply run sqlplus in a shell buffer and save the buffer to a
file. To prevent your Oracle password from being echoed in the Emacs buffer, add the following
lines to your .emacs file:
(setq-default
comint-output-filter-functions
'(comint-watch-for-password-prompt))
(setq
comint-password-prompt-regexp
"\\(\\([Oo]ld \\|[Nn]ew \\|^\\)[Pp]assword\\|Enter password\\):\\s *\\'")

Help Facilities
SQL*Plus provides internal help facilities for SQL*Plus commands. No help is provided for
standard SQL keywords. To see a list of commands for which help is available, type help
topics or help index in response to the SQL> prompt. To then look up help for a particular
keyword (listed in the index), type help followed by the keyword. For example, typing help
accept will print out the syntax for the accept command.
The output from help, and in general, the results of many SQL commands, can be too long to
display on a screen. You can use
set pause on;
to activate the paging feature. When this feature is activated, output will pause at the end of each
screen until you hit the "return" key. To turn this feature off, use
set pause off;

This document was written originally for Prof. Jeff Ullman's CS145 class in Autumn, 1997; revised by Jun Yang for Prof. Jennifer Widom's CS145 class in
Spring, 1998; further revisions by Jeff Ullman, Autumn, 1998; further revisions by Jennifer Widom, Spring 2000; further revisions by Nathan Folkert,
Spring 2001; further revisions by Jim Zhuang, Summer 2005.

Using the Oracle Bulk Loader


• Overview
• Creating the Control File
• Creating the Data File
• Loading Your Data
• Loading Without a Separate Data File
• Loading DATE Data
• Loading Long Strings
• Entering NULL Values

Overview
To use the Oracle bulk loader, you need a control file, which specifies how data
should be loaded into the database; and a data file, which specifies what data
should be loaded. You will learn how to create these files in turn.

Creating the Control File


A simple control file has the following form:

LOAD DATA
INFILE <dataFile>
APPEND INTO TABLE <tableName>
FIELDS TERMINATED BY '<separator>'
(<list of all attribute names to load>)
• <dataFile> is the name of the data file. If you did not give a file name
extension for <dataFile>, Oracle will assume the default extension ".dat".
Therefore, it is a good idea to name every data file with an extension, and
specify the complete file name with the extension.
• <tableName> is the name of the table to which data will be loaded. Of course,
it should have been created already before the bulk load operation.
• The optional keyword APPEND says that data will be appended to <tableName>.
If APPEND is omitted, the table must be empty before the bulk load operation
or else an error will occur.
• <separator> specifies the field separator for your data file. This can be any
string. It is a good idea to use a string that you know will never appear in the
data, so the separator will not be confused with data fields.
• Finally, list the names of attributes of <tableName> that are set by your data
file, separated by commas and enclosed in parentheses. This list need not be
the complete list of attributes in the actual schema of the table, nor must it
be arranged in the same order as the attributes when the table was created
-- sqlldr will match attributes by their names in the table schema. Any
attributes not mentioned in the list will be set to NULL.
As a concrete example, here are the contents of a control file test.ctl:
LOAD DATA
INFILE test.dat
INTO TABLE test
FIELDS TERMINATED BY '|'
(i, s)

Creating the Data File


Each line in the data file specifies one tuple to be loaded into <tableName>. It lists, in
order, values for the attributes in the list specified in the control file, separated by
<separator>. As a concrete example, test.dat might look like:

1|foo
2|bar
3| baz
Recall that the attribute list of test specified in test.ctl is (i, s), where i has the
type int, and s has the type char(10). As the result of loading test.dat, the
following tuples are inserted into test:

(1, 'foo')
(2, 'bar')
(3, ' baz')

Some Notes of Warning


• Note that the third line of test.dat has a blank after "|". This blank is not
ignored by the loader. The value to be loaded for attribute s is ' baz', a four-
character string with a leading blank. It is a common mistake to assume that
'baz', a three-character string with no leading blank, will be loaded instead.
This can lead to some very frustrating problems that you will not notice until
you try to query your loaded data, because ' baz' and 'baz' are different
strings.
• Oracle literally considers every single line to be one tuple, even an empty
line! When it tries to load data from an empty line, however, an error will
occur and the tuple will be rejected. Some text editors love to add multiple
newlines to the end of a file; if you see any strange errors in your .log file
about tuples with all NULL columns, this may be the cause. It shouldn't affect
other tuples loaded.
• If you are using a Microsoft text editor, such as MSWord, you will find that Bill
Gates believes in ending lines with the sequence <CR> (carriage return)
<LF> (line feed). The UNIX world uses only <LF>, so each <CR> becomes
^M, the last character of strings in your load file. That makes it impossible for
you ever to match a stored string in an SQL query. Here's how you remove
^M symbols from your file. Let's say the file with ^M symbols is
bad_myRel.dat. Then the following command will create myRel.dat without
^M symbols:
cat bad_myRel.dat | tr -d '\015' > myRel.dat
If you're an emacs fan, type in the following sequence to modify your current buffer:
ESC-x replace-string CTRL-q CTRL-m ENTER ENTER
Loading Your Data
The Oracle bulk loader is called sqlldr. It is a UNIX-level command, i.e., it
should be issued directly from your UNIX shell, rather than within sqlplus. A
bulk load command has the following form:

sqlldr <yourName> control=<ctlFile> log=<logFile> bad=<badFile>


Everything but sqlldr is optional -- you will be prompted for your username,
password, and control file. <ctlFile> is the name of the control file. If no file
name extension is provided, sqlldr will assume the default extension ".ctl".
The name of the data file is not needed on the command line because it is
specified within the control file. You may designate <logFile> as the log file.
If no file name extension is provided, ".log" will be assumed. sqlldr will fill
the log file with relevant information about the bulk load operation, such as
the number of tuples loaded, and a description of errors that may have
occurred. Finally, you may designate <badFile> as the file where bad tuples
(any tuples for which an error occurs on an attempt to load them) are
recorded (if they occur). Again, if no file extension is specified, Oracle uses
".bad". If no log file or bad file are specified, sqlldr will use the name of the
control file with the .log and .bad extensions, respectively.

As a concrete example, if sally wishes to run the control file test.ctl and have the log
output stored in test.log, then she should type
sqlldr sally control=test.ctl log=test.log
Reminder: Before you run any Oracle commands such as sqlldr and
sqlplus, make sure you have already set up the correct environment by
sourcing /afs/ir/class/cs145/all.env (see Getting Started With Oracle).

Loading Without a Separate Data File


It is possible to use just the control file to load data, instead of using a
separate data file. Here is an example:

LOAD DATA
INFILE *
INTO TABLE test
FIELDS TERMINATED BY '|'
(i, s)
BEGINDATA
1|foo
2|bar
3| baz
The trick is to specify "*" as the name of the data file, and use BEGINDATA to
start the data section in the control file.

Loading DATE Data


The DATE datatype can have its data loaded in a format you specify with
considerable flexibility. First, suppose that you have created a relation with
an attribute of type DATE:

CREATE TABLE foo (
i int,
d date
);
In the control file, when you describe the attributes of foo being loaded, you
follow the attribute d by its type DATE and a date mask. A date mask specifies
the format your date data will use. It is a quoted string with the following
conventions:

○ Sequences of d, m, or y denote fields in your data that will be
interpreted as days, months, and years, respectively. As with almost all of
SQL, capitals are equally acceptable, e.g., MM is a month field.
○ The lengths of these fields specify the maximum lengths for the
corresponding values in your data. However, the data can be shorter.
○ Other characters, such as dash, are treated literally, and must appear
in your data if you put them in the mask.
Here is an example control file:
LOAD DATA
INFILE *
INTO TABLE foo
FIELDS TERMINATED BY '|'
(i, d DATE 'dd-mm-yyyy')
BEGINDATA
1|01-01-1990
2|4-1-1998
Notice that, as illustrated by the second tuple above, a field can be shorter
than the corresponding field in the date mask. The punctuation "-" tells the
loader that the day and month fields of the second tuple terminate early.

Loading Long Strings


String fields that may be longer than 255 characters, such as for data types
CHAR(2000) or VARCHAR(4000), require a special CHAR(n) declaration in the
control file. For example, if table foo was created as

CREATE TABLE foo (x VARCHAR(4000));


Then a sample control file should look like:
LOAD DATA
INFILE <dataFile>
INTO TABLE foo
FIELDS TERMINATED BY '|'
(x CHAR(4000))
Note that the declaration takes the form CHAR(n) regardless of whether the
field type was declared as CHAR or VARCHAR.

Entering NULL Values


You may specify NULL values simply by entering fields in your data file without
content. For example, if we were entering integer values into a table with
schema (a, b, c) specified in the .ctl file, the following lines in the data file:

3||5
|2|4
1||6
||7
would result in inserting the following tuples in the relation:

(3, NULL, 5)
(NULL, 2, 4)
(1, NULL, 6)
(NULL, NULL, 7)
Keep in mind that any primary keys or other constraints requiring that values
be non-NULL will reject tuples for which those attributes are unspecified.

Note: If the final field in a given row of your data file will be unspecified (NULL), you
have to include the line TRAILING NULLCOLS after the FIELDS TERMINATED BY line in
your control file; otherwise sqlldr will reject that tuple. sqlldr will also reject a tuple
whose columns are all set to NULL in the data file.
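For instance, a control file for the (a, b, c) example above might look like the following sketch (the table name abc and data file abc.dat are made up):
LOAD DATA
INFILE abc.dat
INTO TABLE abc
FIELDS TERMINATED BY '|'
TRAILING NULLCOLS
(a, b, c)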
If you do not wish to enter values for any row of a given column, you can, as mentioned
above, leave that column out of the attribute list altogether.

This document was written originally for Prof. Jeff Ullman's CS145 class in Autumn, 1997; revised by Jun Yang for
Prof. Jennifer Widom's CS145 class in Spring, 1998; further revisions by Jeff Ullman, Autumn, 1998; further
revisions by Srinivas Vemuri for Prof. Jeff Ullman's CS145 class in Autumn, 1999; further revisions by Nathan
Folkert for Prof. Jennifer Widom's CS145 class in Spring, 2001. Further revisions by Wang Lam for Prof. Widom's
CS145 class in Spring, 2003.

Resources

• Database Systems: The Complete Book by Hector Garcia, Jeff Ullman, and
Jennifer Widom.
• A First Course in Database Systems by Jeff Ullman and Jennifer Widom.
• Gradiance SQL Tutorial.

Oracle 9i SQL versus Standard SQL


This document highlights some of the differences between the SQL standard and
the SQL dialect of Oracle 9i. Please share with us any additional differences that you
may find.

• Basic SQL Features


• Comments
• Data Types
• Indexes
• Views
• Constraints
• Triggers
• Transactions
• Timing SQL Commands
• PL/SQL Vs. SQL/PSM
• Object-Relational Features

Basic SQL Features

Oracle does not support AS in FROM clauses, but you can still specify tuple variables
without AS:

from Relation1 u, Relation2 v


On the other hand, Oracle does support AS in SELECT clauses, although the use of AS
is completely optional.

The set-difference operator in Oracle is called MINUS rather than EXCEPT. There is no
bag-difference operator corresponding to EXCEPT ALL. The bag-intersection operator
INTERSECT ALL is not implemented either. However, the bag-union operator UNION
ALL is supported.

In Oracle, you must always prefix an attribute reference with the table name
whenever this attribute name appears in more than one table in the FROM clause. For
example, suppose that we have tables R(A,B) and S(B,C). The following query does
not work in Oracle, even though B is unambiguous because R.B is equated to S.B in
the WHERE clause:

select B from R, S where R.B = S.B; /* ILLEGAL! */


Instead, you should use:
select R.B from R, S where R.B = S.B;

In Oracle, the negation logical operator (NOT) should go in front of the boolean
expression, not in front of the comparison operator. For example, "NOT A = ANY
(<subquery>)" is a valid WHERE condition, but "A NOT = ANY (<subquery>)" is not.
(Note that "A <> ANY (<subquery>)" is also a valid condition, but means something
different.) There is one exception to this rule: You may use either "NOT A IN
(<subquery>)" or "A NOT IN (<subquery>)".

In Oracle, an aliased relation is invisible to a subquery's FROM clause. For example,

SELECT * FROM R S WHERE EXISTS (SELECT * FROM S)


is rejected because Oracle does not find S in the subquery, but

SELECT * FROM R S WHERE EXISTS (SELECT * FROM R WHERE R.a = S.a)


is accepted.

In Oracle, a query that includes

1. a subquery in the FROM clause, using GROUP BY; and


2. a subquery in the WHERE clause, using GROUP BY
can cause the database connection to break with an error (ORA-03113: end-of-file
on communication channel), even if the two GROUP BY clauses are unrelated.

Comments
In Oracle, comments may be introduced in two ways:

1. With /*...*/, as in C.
2. With a line that begins with two dashes --.
Thus:
-- This is a comment
SELECT * /* and so is this */
FROM R;

Data Types

BIT type is not supported. There is a BOOLEAN type in PL/SQL (see Using Oracle
PL/SQL for details), but it cannot be used for a database column.
Domains (i.e., type aliases) are not supported.

Dates and times are supported differently in Oracle. For details, please refer to
Oracle Dates and Times, available from the class web page.

CHAR(n) can be of length up to 2000. VARCHAR(n) can be of length up to 4000.


However, special treatment is required when bulk-loading strings longer than 255
characters. See The Oracle Bulk Loader for details.

Indexes

To create an index in Oracle, use the syntax:

create [unique] index <index_name> on <table_name>(<attr_list>);


In general, <attr_list> could contain more than one attribute. Such an index allows
efficient retrieval of tuples with given values for <attr_list>. The optional keyword
UNIQUE, if specified, declares <attr_list> to be duplicate-free, which in effect makes
<attr_list> a key of <table_name>.

To get rid of an index, use:


drop index <index_name>;

Oracle automatically creates an index for each UNIQUE or PRIMARY KEY declaration.
For example, if you create a table foo as follows:

create table foo (a int primary key,
b varchar(20) unique);
Oracle will automatically create one index on foo.a and another on foo.b. Note that
you cannot drop indexes for UNIQUE and PRIMARY KEY attributes. These indexes are
dropped automatically when you drop the table or the key constraints (see the
section on Constraints).

To find out what indexes you have, use

select index_name from user_indexes;


USER_INDEXES is another system table just like USER_TABLES. This can become
especially helpful if you forget the names of your indexes and therefore cannot drop
them. You might also see weird names of the indexes created by Oracle for UNIQUE
and PRIMARY KEY attributes, but you will not be able to drop these indexes.
On the Stanford Oracle installation, there are two "tablespaces", one for data, the
other for indexes. Every time you create an index (either explicitly with CREATE
INDEX or implicitly with a UNIQUE or PRIMARY KEY declaration), you should (on the
Stanford Oracle) follow the declaration by TABLESPACE INDX. In addition, if you are
implicitly creating the index, you need the phrase USING INDEX before TABLESPACE
INDX. For example:

create index RAindex on R(A) tablespace indx;


create table foo (a int primary key using index tablespace indx,
b varchar(20) unique using index tablespace indx);

Views

Oracle supports views as specified in SQL. To find out what views you have created,
use:

select view_name from user_views;
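For example, here is a sketch using the R(A,B) table mentioned under Basic SQL Features (assuming A is numeric; the view name bigA is made up):
create view bigA as
select A, B from R where A > 10;
select * from bigA;
drop view bigA;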

Constraints

To find out what constraints are defined in your database, use:

select constraint_name from user_constraints;

Oracle supports key constraints as specified in SQL. For each table, there can be
only one PRIMARY KEY declaration, but many UNIQUE declarations. Each PRIMARY KEY
(or UNIQUE) declaration can have multiple attributes, which means that these
attributes together form a primary key (or a key, respectively) of the table.

Oracle supports referential integrity (foreign key) constraints, and allows an optional
ON DELETE CASCADE or ON DELETE SET NULL after a REFERENCES clause in a table
declaration. However, it does not allow ON UPDATE options.

Note that when declaring a foreign key constraint at the end of a table declaration it is always
necessary to put the list of referencing attributes in parentheses:
create table foo (...,
foreign key (<attr_list>) references <otherTable>(<attr_list>));
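As a concrete sketch (the dept and emp tables are made up, and the Stanford-specific TABLESPACE INDX clauses discussed under Indexes are omitted for brevity):
create table dept (did int primary key, dname varchar(20));
create table emp (eid int primary key,
did int,
foreign key (did) references dept(did) on delete cascade);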
Oracle supports attribute- and tuple-based constraints, but does not allow CHECK
conditions to use subqueries. Thus, there is no way for an attribute- or tuple-based
constraint to reference anything else besides the attribute or tuple that is being
inserted or updated.

Domain constraints are not supported since domains are not supported.

As for general constraints, ASSERTION is not supported. However, Oracle provides a TRIGGER
facility close to the triggers of the SQL standard. See Constraints and Triggers for details.

In the ALTER TABLE statement, Oracle supports ADDing columns and table constraints,
MODIFYing column properties and column constraints, and DROPping constraints.
However, you cannot MODIFY an attribute-based CHECK constraint. Here are some
examples:

create table bar (x int, y int, constraint XYcheck check (x > y));
alter table bar add (z int, w int);
alter table bar add primary key (x);
alter table bar add constraint YZunique unique (y, z);
alter table bar modify (w varchar(2) default 'AM'
constraint Wnotnull not null);
alter table bar add check (w in ('AM', 'PM'));
alter table bar drop constraint YZunique;
alter table bar drop constraint XYcheck;
alter table bar drop constraint Wnotnull;
alter table bar drop primary key cascade;
Dropping constraints generally requires knowing their names (only in the special
case of primary or unique key constraints can you drop them without specifying
their names). Thus, it is always a good idea to name all your constraints.

Triggers
Triggers in Oracle differ in several ways from the SQL standard. Details are in a
separate section Constraints and Triggers.

Transactions

Oracle supports transactions as defined by the SQL standard. A transaction is a
sequence of SQL statements that Oracle treats as a single unit of work. As soon as
you connect to the database with sqlplus, a transaction begins. Once the
transaction begins, every SQL DML (Data Manipulation Language) statement you
issue subsequently becomes a part of this transaction. A transaction ends when you
disconnect from the database, or when you issue a COMMIT or ROLLBACK command.

COMMIT makes permanent any database changes you made during the current transaction. Until
you commit your changes, other users cannot see them. ROLLBACK ends the current transaction
and undoes any changes made since the transaction began.
After the current transaction has ended with a COMMIT or ROLLBACK, the first executable SQL
statement that you subsequently issue will automatically begin another transaction.
For example, the following SQL commands have the final effect of inserting into table R the
tuple (3, 4), but not (1, 2):
insert into R values (1, 2);
rollback;
insert into R values (3, 4);
commit;
During interactive usage with sqlplus, Oracle also supports an AUTOCOMMIT option.
With this option set to ON, each individual SQL statement is treated as a transaction
and is automatically committed right after it is executed. A user can change the
AUTOCOMMIT option by typing

SET AUTOCOMMIT ON
or

SET AUTOCOMMIT OFF


and by typing

SHOW ALL

a user can see the current setting of this option (along with all other settings).

The same rules for designating the end of a transaction (COMMIT/ROLLBACK) and its
beginning (which is implicit, starting just after the last COMMIT/ROLLBACK) apply to
programmers interacting with Oracle through Pro*C or JDBC. Note, though, that Pro*C does not
support the AUTOCOMMIT option, whereas JDBC does, with AUTOCOMMIT set to ON by
default. Thus a programmer needs to issue COMMIT/ROLLBACK statements explicitly in
Pro*C, whereas in JDBC a user can rely on AUTOCOMMIT and never specify explicitly where
a transaction starts or ends. For more details, see the respective sections: Pro*C, JDBC.

Oracle also supports the SAVEPOINT command. The command SAVEPOINT <sp_name>
establishes a savepoint named <sp_name> which marks the current point in the
processing of a transaction. This savepoint can be used in conjunction with the
command ROLLBACK TO <sp_name> to undo parts of a transaction.

For example, the following commands have the final effect of inserting into table R tuples (5, 6)
and (11, 12), but not (7, 8) or (9, 10):
insert into R values (5, 6);
savepoint my_sp_1;
insert into R values (7, 8);
savepoint my_sp_2;
insert into R values (9, 10);
rollback to my_sp_1;
insert into R values (11, 12);
commit;

Oracle automatically issues an implicit COMMIT before and after any SQL DDL (Data
Definition Language) statement (even if the DDL statement fails).
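For example, in the following sketch (the junk table is made up), the ROLLBACK does not remove (13, 14) from R, because the CREATE TABLE statement has already committed the INSERT implicitly:
insert into R values (13, 14);
create table junk (x int);
rollback;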

Timing SQL Commands

Oracle provides a TIMING command for measuring the running time of SQL
commands. To activate this feature, type

set timing on;


Then, Oracle will automatically display the elapsed wall-clock time for each SQL
command you run subsequently. Note that timing data may be affected by external
factors such as system load, etc. To turn off timing, type

set timing off;


You can also create and control multiple timers; type HELP TIMING in sqlplus for
details.

PL/SQL Vs. SQL/PSM


Here are a few of the most common distinctions between Oracle's PL/SQL and the
SQL standard PSM (persistent, stored modules):

In nested if-statements, PL/SQL uses ELSIF, while PSM calls for ELSEIF. Both are
used where we would find ELSE IF in C, for example.

To leave a loop, PL/SQL uses EXIT, or EXIT WHEN(...) to exit conditionally. PSM uses
LEAVE, and puts the leave-statement in an if-statement to exit conditionally.

Assignments in PL/SQL are with the := operator, as A := B. The corresponding PSM
syntax is SET A = B.
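Here is a minimal anonymous-block sketch showing the three PL/SQL-side constructs together; the comments note the corresponding PSM spellings:
DECLARE
i NUMBER := 0;        -- PL/SQL assignment; PSM would write SET i = 0
BEGIN
LOOP
i := i + 1;
EXIT WHEN i >= 3;     -- PSM would use LEAVE inside an IF
END LOOP;
IF i > 3 THEN
NULL;
ELSIF i = 3 THEN      -- PSM spells this ELSEIF
NULL;
END IF;
END;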

Object-Relational Features
There is a great deal of difference between the Oracle and SQL-standard
approaches to user-defined types. You should look at the on-line guide Object
Relational Features of Oracle for details and examples of the Oracle approach.
However, here are a few small places where the approaches almost coincide but
differ in small ways:

When defining a user-defined type, Oracle uses CREATE TYPE ... AS OBJECT, while
the word ``OBJECT'' is not used in the standard.

When accessing an attribute a of a relation R that is defined to have a user-defined
type, the ``dot'' notation works in Oracle, as R.a. In the standard, a must be thought
of as a method of the same name, and the syntax is R.a().

To define (not declare) a method, Oracle has you write the code for the method in a
CREATE TYPE BODY statement for the type to which the method belongs. The
standard uses a CREATE METHOD statement similar to the way functions are defined in
PL/SQL or SQL/PSM.
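As a minimal sketch of the Oracle side (PointType and Points are made-up names; in sqlplus the CREATE TYPE statement is terminated by a slash on a line by itself):
CREATE TYPE PointType AS OBJECT (
x NUMBER,
y NUMBER
);
/
CREATE TABLE Points OF PointType;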

This document was written originally for Prof. Jeff Ullman's CS145 class in Autumn, 1997; revised by Jun Yang for Prof.
Jennifer Widom's CS145 class in Spring, 1998; further revisions by Jun Yang, Spring 1999; further revisions by Jennifer
Widom, Spring 2000; minor revisions by Nathan Folkert, Spring 2001; Henry Hsieh, Autumn 2001; and Antonios Hondroulis,
Spring 2002; further revisions by Wang Lam for Prof. Jennifer Widom's CS145 class in Spring 2003.

Resources
• Database Systems: The Complete Book by Hector Garcia, Jeff Ullman, and
Jennifer Widom.
• A First Course in Database Systems by Jeff Ullman and Jennifer Widom.
• Gradiance SQL Tutorial.

mySQL versus Standard SQL


This document highlights some of the differences between the SQL standard and
the SQL dialect of mySQL. Please share with us any additional differences that you
may find.

• HAVING Clauses
• Views
• Intersection and Set-Difference
• ANY and ALL

HAVING Clauses
mySQL has a very limited form of HAVING clause. Instead of evaluating the HAVING
condition within each group, mySQL treats HAVING as a selection on the output
tuples. Thus, in the HAVING clause you can only refer to attributes that appear in
the SELECT clause. Recent versions of mySQL do allow you to refer to an aggregate in
the SELECT clause by its formula [e.g., AVG(salary)] rather than only by an alias
established in the SELECT clause [e.g., AVG(salary) AS avgSalary].

Views
mySQL does not support views. However, unlike some other SQL implementations,
mySQL does support fully nested subqueries in the FROM clause. These subqueries
can serve as views in many situations, although they do not provide the ability of a
view to serve as a macro, with its definition reused in many queries.

Intersection and Set-Difference


The INTERSECT and EXCEPT operators of SQL are not supported in mySQL. We
suggest that instead, you use a subquery that equates all corresponding attributes in
place of an intersection. For instance, to get the intersection of R(a,b) and S(a,b),
write:

SELECT DISTINCT *
FROM R
WHERE EXISTS (SELECT * FROM S WHERE R.a = S.a AND R.b = S.b);
To get the set difference, here is a similar approach using a subquery:
SELECT DISTINCT *
FROM R
WHERE NOT EXISTS (SELECT * FROM S WHERE R.a = S.a AND R.b = S.b);
Note that both these expressions eliminate duplicates, but that is in accordance with the SQL
standard.

ANY and ALL


There are some discrepancies between the standard and the way mySQL handles the ANY
and ALL operators. Only "=" seems to be handled completely correctly. Here
is a concrete example and the responses mySQL gives. The query, about a
Sells(bar, beer, price) relation, is:

SELECT * FROM Sells


WHERE price Op Quant(SELECT price FROM Sells);
Here, Op is one of the comparisons, and Quant is either ANY or ALL.
Op    ANY        ALL
>=    (1)        Correct
<=    Correct    (1)
=     Correct    Correct
<>    Correct    (1)
<     (2)        (2)
>     (2)        (2)

(1) mySQL gives an incorrect result, which in each of these cases is the same as what the other
of ANY and ALL gives.
(2) mySQL gives an incorrect result for both ANY and ALL. For each operator, the result is the
same independent of whether ANY or ALL is used. For <, the result is several tuples with low,
but different prices, and for > it is the other tuples in the relation Sells, i.e., some of the tuples
with high, but different prices.

This document was written originally by Jeff Ullman in the Winter of 2004.

Resources
• Database Systems: The Complete Book by Hector Garcia, Jeff Ullman, and Jennifer
Widom.
• A First Course in Database Systems by Jeff Ullman and Jennifer Widom.
• Gradiance SQL Tutorial.

Oracle Dates and Times


• Overview
• DATE Format
• The Current Time
• Operations on DATE
• Further Information

Overview
Oracle supports both date and time, albeit differently from the SQL2 standard. Rather than using
two separate entities, date and time, Oracle only uses one, DATE. The DATE type is stored in a
special internal format that includes not just the month, day, and year, but also
the hour, minute, and second.
The DATE type is used in the same way as other built-in types such as INT. For example, the
following SQL statement creates a relation with an attribute of type DATE:
create table x(a int, b date);

DATE Format
When a DATE value is displayed, Oracle must first convert that value from the special internal
format to a printable string. The conversion is done by a function TO_CHAR, according to a DATE
format. Oracle's default format for DATE is "DD-MON-YY". Therefore, when you issue the query
select b from x;
you will see something like:
B
---------
01-APR-98
Whenever a DATE value is displayed, Oracle will call TO_CHAR automatically with the default
DATE format. However, you may override the default behavior by calling TO_CHAR explicitly with
your own DATE format. For example,
SELECT TO_CHAR(b, 'YYYY/MM/DD') AS b
FROM x;
returns the result:
B
---------------------------------------------------------------------------
1998/04/01
The general usage of TO_CHAR is:
TO_CHAR(<date>, '<format>')
where the <format> string can be formed from over 40 options. Some of the more popular ones
include:

MM          Numeric month (e.g., 07)
MON         Abbreviated month name (e.g., JUL)
MONTH       Full month name (e.g., JULY)
DD          Day of month (e.g., 24)
DY          Abbreviated name of day (e.g., FRI)
YYYY        4-digit year (e.g., 1998)
YY          Last 2 digits of the year (e.g., 98)
RR          Like YY, but the two digits are ``rounded'' to a year in the range 1950 to 2049.
            Thus, 06 is considered 2006 instead of 1906, for example.
AM (or PM)  Meridian indicator
HH          Hour of day (1-12)
HH24        Hour of day (0-23)
MI          Minute (0-59)
SS          Second (0-59)
You have just learned how to output a DATE value using TO_CHAR. Now what about inputting a
DATE value? This is done through a function called TO_DATE, which converts a string to a DATE
value, again according to the DATE format. Normally, you do not have to call TO_DATE explicitly:
Whenever Oracle expects a DATE value, it will automatically convert your input string using
TO_DATE according to the default DATE format "DD-MON-YY". For example, to insert a tuple with a
DATE attribute, you can simply type:
insert into x values(99, '31-may-98');
Alternatively, you may use TO_DATE explicitly:
insert into x
values(99, to_date('1998/05/31:12:00:00AM', 'yyyy/mm/dd:hh:mi:ssam'));
The general usage of TO_DATE is:
TO_DATE(<string>, '<format>')
where the <format> string has the same options as in TO_CHAR.
Finally, you can change the default DATE format of Oracle from "DD-MON-YY" to something you
like by issuing the following command in sqlplus:
alter session set NLS_DATE_FORMAT='<my_format>';
The change is only valid for the current sqlplus session.

The Current Time


The built-in function SYSDATE returns a DATE value containing the current date and time on your
system. For example,
select to_char(sysdate, 'Dy DD-Mon-YYYY HH24:MI:SS') as "Current Time"
from dual;
returns
Current Time
---------------------------------------------------------------------------
Tue 21-Apr-1998 21:18:27
which is the time when I was preparing this document :-) Two interesting things to note here:
• You can use double quotes to make names case sensitive (by default, SQL is case
insensitive), or to force spaces into names. Oracle will treat everything inside the double
quotes literally as a single name. In this example, if "Current Time" is not quoted, it
would have been interpreted as two case insensitive names CURRENT and TIME, which
would actually cause a syntax error.
• DUAL is a built-in relation in Oracle that serves as a dummy relation to put in the FROM
clause when nothing else is appropriate. For example, try "select 1+2 from dual;".
Another name for the built-in function SYSDATE is CURRENT_DATE. Be aware of these special
names to avoid name conflicts.

Operations on DATE
You can compare DATE values using the standard comparison operators such as =, !=, >, etc.
You can subtract two DATE values, and the result is a FLOAT which is the number of days between
the two DATE values. In general, the result may contain a fraction because DATE also has a time
component. For obvious reasons, adding, multiplying, and dividing two DATE values are not
allowed.
You can add and subtract constants to and from a DATE value, and these numbers will be
interpreted as numbers of days. For example, SYSDATE+1 will be tomorrow. You cannot multiply
or divide DATE values.
With the help of TO_CHAR, string operations can be used on DATE values as well. For example,
to_char(<date>, 'DD-MON-YY') like '%JUN%' evaluates to true if <date> is in June.
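Putting some of these operations together, the following sketch (run against the dummy relation DUAL, described above) shows date arithmetic; the second column should come out as 7:
select sysdate + 7 as next_week,
to_date('31-may-98') - to_date('24-may-98') as days_apart
from dual;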

This document was written originally by Kristian Widjaja for Prof. Jeff Ullman's CS145 class in Autumn, 1997; revised by Jun Yang for Prof. Jennifer
Widom's CS145 class in Spring, 1998; further revisions by Prof. Ullman in Autumn, 1998.

Resources
• Database Systems: The Complete Book by Hector Garcia, Jeff Ullman, and Jennifer
Widom.
• A First Course in Database Systems by Jeff Ullman and Jennifer Widom.
• Gradiance SQL Tutorial.

Using Oracle PL/SQL


• Basic Structure of PL/SQL
• Variables and Types
• Simple PL/SQL Programs
• Control Flow in PL/SQL
• Cursors
• Procedures
• Discovering Errors
• Printing Variables
Note: The material on triggers that was formerly in this document has been moved to A New
Document on Constraints and Triggers.

Basic Structure of PL/SQL


PL/SQL stands for Procedural Language/SQL. PL/SQL extends SQL by adding constructs found
in procedural languages, resulting in a structured language that is more powerful than SQL. The
basic unit in PL/SQL is a block. All PL/SQL programs are made up of blocks, which can be
nested within each other. Typically, each block performs a logical action in the program. A block
has the following structure:
DECLARE
/* Declarative section: variables, types, and local subprograms. */

BEGIN

/* Executable section: procedural and SQL statements go here. */

/* This is the only section of the block that is required. */

EXCEPTION

/* Exception handling section: error handling statements go here. */

END;
Only the executable section is required. The other sections are optional. The only SQL
statements allowed in a PL/SQL program are SELECT, INSERT, UPDATE, DELETE and several other
data manipulation statements plus some transaction control. However, the SELECT statement has
a special form in which a single tuple is placed in variables; more on this later. Data definition
statements like CREATE, DROP, or ALTER are not allowed. The executable section also contains
constructs such as assignments, branches, loops, procedure calls, and triggers, which are all
described below (except triggers). PL/SQL is not case sensitive. C style comments (/* ... */)
may be used.
To execute a PL/SQL program, we must follow the program text itself by
• A line with a single dot ("."), and then
• A line with run;
As with Oracle SQL programs, we can invoke a PL/SQL program either by typing it in sqlplus
or by putting the code in a file and invoking the file in the various ways we learned in Getting
Started With Oracle.

Variables and Types


Information is transmitted between a PL/SQL program and the database through variables.
Every variable has a specific type associated with it. That type can be
• One of the types used by SQL for database columns
• A generic type used in PL/SQL such as NUMBER
• Declared to be the same as the type of some database column
The most commonly used generic type is NUMBER. Variables of type NUMBER can hold either an
integer or a real number. The most commonly used character string type is VARCHAR(n), where n
is the maximum length of the string in bytes. This length is required, and there is no default. For
example, we might declare:
DECLARE

price NUMBER;

myBeer VARCHAR(20);
Note that PL/SQL allows BOOLEAN variables, even though Oracle does not support BOOLEAN as a
type for database columns.
Types in PL/SQL can be tricky. In many cases, a PL/SQL variable will be used to manipulate
data stored in an existing relation. In this case, it is essential that the variable have the same type
as the relation column. If there is any type mismatch, variable assignments and comparisons may
not work the way you expect. To be safe, instead of hard coding the type of a variable, you
should use the %TYPE operator. For example:
DECLARE

myBeer Beers.name%TYPE;
gives PL/SQL variable myBeer whatever type was declared for the name column in relation
Beers.
A variable may also have a type that is a record with several fields. The simplest way to declare
such a variable is to use %ROWTYPE on a relation name. The result is a record type in which the
fields have the same names and types as the attributes of the relation. For instance:
DECLARE

beerTuple Beers%ROWTYPE;
makes variable beerTuple be a record with fields name and manufacture, assuming that the
relation has the schema Beers(name, manufacture).
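Here is a sketch of using such a record (assuming the Beers relation actually exists and contains a tuple for 'Bud'); individual fields are accessed with dot notation:
DECLARE

beerTuple Beers%ROWTYPE;

BEGIN

SELECT * INTO beerTuple FROM Beers WHERE name = 'Bud';

INSERT INTO Beers VALUES('Bud Lite', beerTuple.manufacture);

END;

run;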
The initial value of any variable, regardless of its type, is NULL. We can assign values to
variables, using the ":=" operator. The assignment can occur either immediately after the type of
the variable is declared, or anywhere in the executable portion of the program. An example:
DECLARE

a NUMBER := 3;

BEGIN

a := a + 1;

END;

run;
This program has no effect when run, because there are no changes to the database.

Simple Programs in PL/SQL


The simplest form of program has some declarations followed by an executable section
consisting of one or more of the SQL statements with which we are familiar. The major nuance
is that the form of the SELECT statement is different from its SQL form. After the SELECT clause,
we must have an INTO clause listing variables, one for each attribute in the SELECT clause, into
which the components of the retrieved tuple must be placed.
Notice we said "tuple" rather than "tuples", since the SELECT statement in PL/SQL only works if
the result of the query contains a single tuple. The situation is essentially the same as that of the
"single-row select" discussed in Section 7.1.5 of the text, in connection with embedded SQL. If
the query returns more than one tuple, you need to use a cursor, as described in the next section.
Here is an example:
CREATE TABLE T1(

e INTEGER,

f INTEGER

);

DELETE FROM T1;

INSERT INTO T1 VALUES(1, 3);

INSERT INTO T1 VALUES(2, 4);

/* Above is plain SQL; below is the PL/SQL program. */

DECLARE

a NUMBER;

b NUMBER;

BEGIN

SELECT e,f INTO a,b FROM T1 WHERE e>1;

INSERT INTO T1 VALUES(b,a);

END;

run;
Fortuitously, there is only one tuple of T1 that has first component greater than 1, namely (2,4).
The INSERT statement thus inserts (4,2) into T1.
Control Flow in PL/SQL
PL/SQL allows you to branch and create loops in a fairly familiar way.
An IF statement looks like:
IF <condition> THEN <statement_list> ELSE <statement_list> END IF;
The ELSE part is optional. If you want a multiway branch, use:
IF <condition_1> THEN ...

ELSIF <condition_2> THEN ...

... ...

ELSIF <condition_n> THEN ...

ELSE ...

END IF;
The following is an example, slightly modified from the previous one, where now we only do the
insertion if the second component is 1. If not, we first add 10 to each component and then insert:
DECLARE

a NUMBER;

b NUMBER;

BEGIN

SELECT e,f INTO a,b FROM T1 WHERE e>1;

IF b=1 THEN

INSERT INTO T1 VALUES(b,a);

ELSE

INSERT INTO T1 VALUES(b+10,a+10);

END IF;

END;

run;
Loops are created with the following:
LOOP

<loop_body> /* A list of statements. */

END LOOP;
At least one of the statements in <loop_body> should be an EXIT statement of the form
EXIT WHEN <condition>;
The loop breaks if <condition> is true. For example, here is a way to insert each of the pairs (1,
1) through (100, 100) into T1 of the above two examples:
DECLARE

i NUMBER := 1;

BEGIN

LOOP

INSERT INTO T1 VALUES(i,i);

i := i+1;

EXIT WHEN i>100;

END LOOP;

END;

run;
Some other useful loop-forming statements are:
• EXIT by itself is an unconditional loop break. Use it inside a conditional if you like.
• A WHILE loop can be formed with
• WHILE <condition> LOOP

• <loop_body>

END LOOP;
• A simple FOR loop can be formed with:
• FOR <var> IN <start>..<finish> LOOP

• <loop_body>

END LOOP;
Here, <var> can be any variable; it is local to the for-loop and need not be declared. Also,
<start> and <finish> are constants.
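For instance, here is a sketch that redoes the 100 insertions of the previous example with a FOR loop; note that i needs no declaration:
BEGIN

FOR i IN 1..100 LOOP

INSERT INTO T1 VALUES(i,i);

END LOOP;

END;

run;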

Cursors
A cursor is a variable that runs through the tuples of some relation. This relation can be a stored
table, or it can be the answer to some query. By fetching into the cursor each tuple of the
relation, we can write a program to read and process the value of each such tuple. If the relation
is stored, we can also update or delete the tuple at the current cursor position.
The example below illustrates a cursor loop. It uses our example relation T1(e,f) whose tuples
are pairs of integers. The program will delete every tuple whose first component is less than the
second, and insert the reverse tuple into T1.
1) DECLARE

/* Output variables to hold the result of the query: */


2) a T1.e%TYPE;

3) b T1.f%TYPE;

/* Cursor declaration: */

4) CURSOR T1Cursor IS

5) SELECT e, f

6) FROM T1

7) WHERE e < f

8) FOR UPDATE;

9) BEGIN

10) OPEN T1Cursor;

11) LOOP

/* Retrieve each row of the result of the above query

into PL/SQL variables: */

12) FETCH T1Cursor INTO a, b;

/* If there are no more rows to fetch, exit the loop: */

13) EXIT WHEN T1Cursor%NOTFOUND;

/* Delete the current tuple: */

14) DELETE FROM T1 WHERE CURRENT OF T1Cursor;

/* Insert the reverse tuple: */

15) INSERT INTO T1 VALUES(b, a);

16) END LOOP;

/* Free cursor used by the query. */

17) CLOSE T1Cursor;

18) END;

19) .

20) run;
Here are explanations for the various lines of this program:
• Line (1) introduces the declaration section.
• Lines (2) and (3) declare variables a and b to have types equal to the types of attributes e
and f of the relation T1. Although we know these types are INTEGER, we wisely make
sure that whatever types they may have are copied to the PL/SQL variables (compare
with the previous example, where we were less careful and declared the corresponding
variables to be of type NUMBER).
• Lines (4) through (8) define the cursor T1Cursor. It ranges over a relation defined by the
SELECT-FROM-WHERE query. That query selects those tuples of T1 whose first component
is less than the second component. Line (8) declares the cursor FOR UPDATE since we will
modify T1 using this cursor later on Line (14). In general, FOR UPDATE is unnecessary if
the cursor will not be used for modification.
• Line (9) begins the executable section of the program.
• Line (10) opens the cursor, an essential step.
• Lines (11) through (16) are a PL/SQL loop. Notice that such a loop is bracketed by LOOP
and END LOOP. Within the loop we find:
○ On Line (12), a fetch through the cursor into the local variables. In general, the
FETCH statement must provide variables for each component of the tuple retrieved.
Since the query of Lines (5) through (7) produces pairs, we have correctly
provided two variables, and we know they are of the correct type.
○ On Line (13), a test for the loop-breaking condition. Its meaning should be clear:
%NOTFOUND after the name of a cursor is true exactly when a fetch through that
cursor has failed to find any more tuples.
○ On Line (14), a SQL DELETE statement that deletes the current tuple using the
special WHERE condition CURRENT OF T1Cursor.
○ On Line (15), a SQL INSERT statement that inserts the reverse tuple into T1.
• Line (17) closes the cursor.
• Line (18) ends the PL/SQL program.
• Lines (19) and (20) cause the program to execute.

Procedures
PL/SQL procedures behave very much like procedures in other programming languages. Here is
an example of a PL/SQL procedure addtuple1 that, given an integer i, inserts the tuple (i,
'xxx') into the following example relation:
CREATE TABLE T2 (

a INTEGER,

b CHAR(10)

);

CREATE PROCEDURE addtuple1(i IN NUMBER) AS


BEGIN

INSERT INTO T2 VALUES(i, 'xxx');

END addtuple1;

run;
A procedure is introduced by the keywords CREATE PROCEDURE followed by the procedure name
and its parameters. An option is to follow CREATE by OR REPLACE. The advantage of doing so is
that should you have already made the definition, you will not get an error. On the other hand,
should the previous definition be a different procedure of the same name, you will not be
warned, and the old procedure will be lost.
There can be any number of parameters, each followed by a mode and a type. The possible
modes are IN (read-only), OUT (write-only), and INOUT (read and write). Note: Unlike the type
specifier in a PL/SQL variable declaration, the type specifier in a parameter declaration must be
unconstrained. For example, CHAR(10) and VARCHAR(20) are illegal; CHAR or VARCHAR should be
used instead. The actual length of a parameter depends on the corresponding argument that is
passed in when the procedure is invoked.
Following the arguments is the keyword AS (IS is a synonym). Then comes the body, which is
essentially a PL/SQL block. We have repeated the name of the procedure after the END, but this
is optional. However, the DECLARE section should not start with the keyword DECLARE. Rather,
following AS we have:
... AS

<local_var_declarations>

BEGIN

<procedure_body>

END;

run;
The run at the end runs the statement that creates the procedure; it does not execute the
procedure. To execute the procedure, use another PL/SQL statement, in which the procedure is
invoked as an executable statement. For example:
BEGIN addtuple1(99); END;

run;
The following procedure also inserts a tuple into T2, but it takes both components as arguments:
CREATE PROCEDURE addtuple2(

x T2.a%TYPE,

y T2.b%TYPE)
AS

BEGIN

INSERT INTO T2(a, b)

VALUES(x, y);

END addtuple2;

run;
Now, to add a tuple (10, 'abc') to T2:
BEGIN

addtuple2(10, 'abc');

END;

run;
The following illustrates the use of an OUT parameter:
CREATE TABLE T3 (

a INTEGER,

b INTEGER

);

CREATE PROCEDURE addtuple3(a NUMBER, b OUT NUMBER)

AS

BEGIN

b := 4;

INSERT INTO T3 VALUES(a, b);

END;

run;

DECLARE

v NUMBER;
BEGIN

addtuple3(10, v);

END;

run;
Note that assigning values to parameters declared as OUT or INOUT causes the corresponding
input arguments to be written. Because of this, the input argument for an OUT or INOUT parameter
should be something with an "lvalue", such as a variable like v in the example above. A constant
or a literal argument should not be passed in for an OUT/INOUT parameter.
We can also write functions instead of procedures. In a function declaration, we follow the
parameter list by RETURN and the type of the return value:
CREATE FUNCTION <func_name>(<param_list>) RETURN <return_type> AS ...
In the body of the function definition, "RETURN <expression>;" exits from the function and
returns the value of <expression>.
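For instance, here is a sketch of a function (the name countT2 is made up) that counts the tuples of T2, followed by one way to call it from SQL:
CREATE FUNCTION countT2 RETURN NUMBER AS

n NUMBER;

BEGIN

SELECT COUNT(*) INTO n FROM T2;

RETURN n;

END countT2;

run;

SELECT countT2() FROM dual;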
To find out what procedures and functions you have created, use the following SQL query:
select object_type, object_name

from user_objects

where object_type = 'PROCEDURE'

or object_type = 'FUNCTION';
To drop a stored procedure/function:
drop procedure <procedure_name>;

drop function <function_name>;

Discovering Errors
PL/SQL does not always tell you about compilation errors. Instead, it gives you a cryptic
message such as "procedure created with compilation errors". If you don't see what is wrong
immediately, try issuing the command
show errors procedure <procedure_name>;
Alternatively, you can type, SHO ERR (short for SHOW ERRORS) to see the most recent compilation
error.
Note that the location of the error given as part of the error message is not always accurate!

Printing Variables
Sometimes we might want to print the value of a PL/SQL local variable. A ``quick-and-dirty''
way is to store it as the sole tuple of some relation and after the PL/SQL statement print the
relation with a SELECT statement. A more couth way is to define a bind variable, which is the
only kind that may be printed with a print command. Bind variables are the kind that must be
prefixed with a colon in PL/SQL statements, such as :new discussed in the section on triggers.
The steps are as follows:
1. We declare a bind variable as follows:
VARIABLE <name> <type>
where the type can be only one of three things: NUMBER, CHAR, or CHAR(n).
2. We may then assign to the variable in a following PL/SQL statement, but we must prefix
it with a colon.
3. Finally, we can execute a statement
PRINT :<name>;
outside the PL/SQL statement.
Here is a trivial example, which prints the value 1.
VARIABLE x NUMBER

BEGIN
    :x := 1;
END;

run;

PRINT :x;

This document was written originally by Yu-May Chang and Jeff Ullman for CS145, Autumn 1997; revised by Jun Yang for Prof. Jennifer Widom's
CS145 class in Spring, 1998; additional material by Jeff Ullman, Autumn 1998; further revisions by Jun Yang, Spring 1999; minor revisions by Jennifer
Widom, Spring 2000.

Resources
• Database Systems: The Complete Book by Hector Garcia, Jeff Ullman, and
Jennifer Widom.
• A First Course in Database Systems by Jeff Ullman and Jennifer Widom.
• Gradiance SQL Tutorial.

Constraints and Triggers


Constraints are declarations of conditions about the database that must remain
true. These include attribute-based, tuple-based, key, and referential-integrity
constraints. The system checks for violations of the constraints on actions that
may cause a violation, and aborts the offending action accordingly. Information on SQL
constraints can be found in the textbook. The Oracle implementation of constraints
differs from the SQL standard, as documented in Oracle 9i SQL versus Standard
SQL.

Triggers are a special PL/SQL construct similar to procedures. However, a procedure is executed
explicitly from another block via a procedure call, while a trigger is executed implicitly
whenever the triggering event happens. The triggering event is either an INSERT, DELETE, or
UPDATE command. The timing can be either BEFORE or AFTER. The trigger can be either
row-level or statement-level, where the former fires once for each row affected by the triggering
statement and the latter fires once for the whole statement.

• Constraints:
○ Deferring Constraint Checking
○ Constraint Violations
• Triggers:
○ Basic Trigger Syntax
○ Trigger Example
○ Displaying Trigger Definition Errors
○ Viewing Defined Triggers
○ Dropping Triggers
○ Disabling Triggers
○ Aborting Triggers with Error
○ Mutating Table Errors

Deferring Constraint Checking


Sometimes it is necessary to defer the checking of certain constraints, most
commonly in the "chicken-and-egg" problem. Suppose we want to say:

CREATE TABLE chicken (cID INT PRIMARY KEY,
                      eID INT REFERENCES egg(eID));
CREATE TABLE egg (eID INT PRIMARY KEY,
                  cID INT REFERENCES chicken(cID));
But if we simply type the above statements into Oracle, we'll get an error. The
reason is that the CREATE TABLE statement for chicken refers to table egg, which
hasn't been created yet! Creating egg won't help either, because egg refers to
chicken.

To work around this problem, we need SQL schema modification commands. First, create
chicken and egg without foreign key declarations:
CREATE TABLE chicken(cID INT PRIMARY KEY,
eID INT);
CREATE TABLE egg(eID INT PRIMARY KEY,
cID INT);
Then, we add foreign key constraints:

ALTER TABLE chicken ADD CONSTRAINT chickenREFegg
    FOREIGN KEY (eID) REFERENCES egg(eID)
    INITIALLY DEFERRED DEFERRABLE;
ALTER TABLE egg ADD CONSTRAINT eggREFchicken
    FOREIGN KEY (cID) REFERENCES chicken(cID)
    INITIALLY DEFERRED DEFERRABLE;
INITIALLY DEFERRED DEFERRABLE tells Oracle to do deferred constraint checking. For
example, to insert (1, 2) into chicken and (2, 1) into egg, we use:

INSERT INTO chicken VALUES(1, 2);
INSERT INTO egg VALUES(2, 1);
COMMIT;
Because we've declared the foreign key constraints as "deferred", they are only
checked at the commit point. (Without deferred constraint checking, we cannot
insert anything into chicken and egg, because the first INSERT would always be a
constraint violation.)

Finally, to get rid of the tables, we have to drop the constraints first, because Oracle won't allow
us to drop a table that's referenced by another table.
ALTER TABLE egg DROP CONSTRAINT eggREFchicken;
ALTER TABLE chicken DROP CONSTRAINT chickenREFegg;
DROP TABLE egg;
DROP TABLE chicken;

Constraint Violations
In general, Oracle returns an error message when a constraint is violated.
Specifically for users of JDBC, this means an SQLException gets thrown, whereas for
Pro*C users the SQLCA struct gets updated to reflect the error. Programmers must
use the WHENEVER statement and/or check the SQLCA contents (Pro*C users) or
catch the exception SQLException (JDBC users) in order to get the error code
returned by Oracle.

Some vendor-specific error codes are: 1 for primary-key constraint violations, 2291 for
foreign-key violations, and 2290 for attribute- and tuple-based CHECK constraint violations. Oracle
also provides simple error-message strings in a format similar to the following:
ORA-02290: check constraint (YFUNG.GR_GR) violated
or
ORA-02291: integrity constraint (HONDROUL.SYS_C0067174) violated - parent
key not found
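For example, a JDBC program (JDBC is covered later in this document) could react to specific
violations by checking the vendor error code. Here is a rough sketch; it assumes an open JDBC
Statement stmt, and the table and values in the INSERT are hypothetical:
try {
    stmt.executeUpdate("INSERT INTO enrolled VALUES(123, 'CS145')");
} catch (SQLException e) {
    if (e.getErrorCode() == 1) {            // primary key constraint violated
        System.err.println("Duplicate key: " + e.getMessage());
    } else if (e.getErrorCode() == 2291) {  // foreign key: parent key not found
        System.err.println("Foreign key violation: " + e.getMessage());
    } else {
        System.err.println("Oracle error " + e.getErrorCode() + ": " + e.getMessage());
    }
}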
For more details on how to do error handling, please take a look at Pro*C Error handling or at
the Retrieving Exceptions section of JDBC Error handling.

Basic Trigger Syntax


Below is the syntax for creating a trigger in Oracle (which differs slightly from
standard SQL syntax):

CREATE [OR REPLACE] TRIGGER <trigger_name>
    {BEFORE|AFTER} {INSERT|DELETE|UPDATE} ON <table_name>
    [REFERENCING [NEW AS <new_row_name>] [OLD AS <old_row_name>]]
    [FOR EACH ROW [WHEN (<trigger_condition>)]]
    <trigger_body>
Some important points to note:

• You can create only BEFORE and AFTER triggers for tables. (INSTEAD OF triggers
are only available for views; typically they are used to implement view
updates.)
• You may specify up to three triggering events using the keyword OR.
Furthermore, UPDATE can be optionally followed by the keyword OF and a list
of attribute(s) in <table_name>. If present, the OF clause restricts the event to
updates of the attribute(s) listed after OF. Here are some examples:
    ... INSERT ON R ...
    ... INSERT OR DELETE OR UPDATE ON R ...
    ... UPDATE OF A, B OR INSERT ON R ...
• If the FOR EACH ROW option is specified, the trigger is row-level; otherwise, the
trigger is statement-level.
• Only for row-level triggers:
○ The special variables NEW and OLD are available to refer to new and old
tuples respectively. Note: In the trigger body, NEW and OLD must be
preceded by a colon (":"), but in the WHEN clause, they do not have a
preceding colon! See example below.
○ The REFERENCING clause can be used to assign aliases to the variables
NEW and OLD.
○ A trigger restriction can be specified in the WHEN clause, enclosed by
parentheses. The trigger restriction is a SQL condition that must be
satisfied in order for Oracle to fire the trigger. This condition cannot
contain subqueries. Without the WHEN clause, the trigger is fired for
each row.
• <trigger_body> is a PL/SQL block, rather than a sequence of SQL statements.
Oracle has placed certain restrictions on what you can do in <trigger_body>,
in order to avoid situations where one trigger performs an action that triggers
a second trigger, which then triggers a third, and so on, which could
potentially create an infinite loop. The restrictions on <trigger_body> include:
○ You cannot modify the same relation whose modification is the event
triggering the trigger.
○ You cannot modify a relation connected to the triggering relation by
another constraint such as a foreign-key constraint.

Trigger Example
We illustrate Oracle's syntax for creating a trigger through an example based on the
following two tables:

CREATE TABLE T4 (a INTEGER, b CHAR(10));
CREATE TABLE T5 (c CHAR(10), d INTEGER);


We create a trigger that may insert a tuple into T5 when a tuple is inserted into T4.
Specifically, the trigger checks whether the new tuple has a first component 10 or
less, and if so inserts the reverse tuple into T5:

CREATE TRIGGER trig1
    AFTER INSERT ON T4
    REFERENCING NEW AS newRow
    FOR EACH ROW
    WHEN (newRow.a <= 10)
BEGIN
    INSERT INTO T5 VALUES(:newRow.b, :newRow.a);
END trig1;
.
run;
Notice that we end the CREATE TRIGGER statement with a dot and run, as for all
PL/SQL statements in general. Running the CREATE TRIGGER statement only creates
the trigger; it does not execute the trigger. Only a triggering event, such as an
insertion into T4 in this example, causes the trigger to execute.

Displaying Trigger Definition Errors


As for PL/SQL procedures, if you get a message

Warning: Trigger created with compilation errors.


you can see the error messages by typing

show errors trigger <trigger_name>;


Alternatively, you can type, SHO ERR (short for SHOW ERRORS) to see the most recent
compilation error. Note that the reported line numbers where the errors occur are
not accurate.

Viewing Defined Triggers


To view a list of all defined triggers, use:

select trigger_name from user_triggers;


For more details on a particular trigger:
select trigger_type, triggering_event, table_name, referencing_names,
trigger_body
from user_triggers
where trigger_name = '<trigger_name>';

Dropping Triggers
To drop a trigger:

drop trigger <trigger_name>;

Disabling Triggers
To disable or enable a trigger:

alter trigger <trigger_name> {disable|enable};

Aborting Triggers with Error


Triggers can often be used to enforce constraints. The WHEN clause or body of the
trigger can check for the violation of certain conditions and signal an error
accordingly using the Oracle built-in function RAISE_APPLICATION_ERROR. The
action that activated the trigger (insert, update, or delete) would be aborted. For
example, the following trigger enforces the constraint Person.age >= 0:

create table Person (age int);

CREATE TRIGGER PersonCheckAge
    AFTER INSERT OR UPDATE OF age ON Person
    FOR EACH ROW
BEGIN
    IF (:new.age < 0) THEN
        RAISE_APPLICATION_ERROR(-20000, 'no negative age allowed');
    END IF;
END;
.
RUN;
If we attempted to execute the insertion:
insert into Person values (-3);
we would get the error message:
ERROR at line 1:
ORA-20000: no negative age allowed
ORA-06512: at "MYNAME.PERSONCHECKAGE", line 3
ORA-04088: error during execution of trigger 'MYNAME.PERSONCHECKAGE'
and nothing would be inserted. In general, the effects of both the trigger and the triggering
statement are rolled back.
Mutating Table Errors
Sometimes you may find that Oracle reports a "mutating table error" when your
trigger executes. This happens when the trigger is querying or modifying a
"mutating table", which is either the table whose modification activated the trigger,
or a table that might need to be updated because of a foreign key constraint with a
CASCADE policy. To avoid mutating table errors:

• A row-level trigger must not query or modify a mutating table. (Of course,
NEW and OLD still can be accessed by the trigger.)
• A statement-level trigger must not query or modify a mutating table if the
trigger is fired as the result of a CASCADE delete.
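For example, the following row-level trigger (a sketch of our own, reusing tables T4 and T5 from
the trigger example above) would compile, but any insertion into T4 would then fail with a
mutating table error, because the trigger body queries T4 while T4 is being modified:
CREATE TRIGGER badTrig
    AFTER INSERT ON T4
    FOR EACH ROW
DECLARE
    n NUMBER;
BEGIN
    SELECT COUNT(*) INTO n FROM T4;   -- T4 is the mutating table here
    INSERT INTO T5 VALUES('count', n);
END badTrig;
.
run;
A statement-level version of the same trigger (without FOR EACH ROW) would be one way to avoid
the error.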

This document was written originally by Yu-May Chang and Jeff Ullman for CS145 in Autumn, 1997; revised by Jun Yang for
Prof. Jennifer Widom's CS145 class in Spring, 1998; further revisions by Jun Yang, Spring 1999; further revisions by Jennifer
Widom, Spring 2000; minor revisions by Nathan Folkert, Spring 2001, Henry Hsieh, Autumn 2001, Antonios Hondroulis,
Spring 2002, and Glen Jeh, Spring 2002.

Resources
• Database Systems: The Complete Book by Hector Garcia, Jeff Ullman, and
Jennifer Widom.
• A First Course in Database Systems by Jeff Ullman and Jennifer Widom.
• Gradiance SQL Tutorial.

Introduction to Pro*C
Embedded SQL
• Overview
• Pro*C Syntax
○ SQL
○ Preprocessor Directives
○ Statement Labels
• Host Variables
○ Basics
○ Pointers
○ Structures
○ Arrays
○ Indicator Variables
○ Datatype Equivalencing
• Dynamic SQL
• Transactions
• Error Handling
○ SQLCA
○ WHENEVER Statement
• Demo Programs
• C++ Users
• List of Embedded SQL Statements Supported by Pro*C

Overview
Embedded SQL is a method of combining the computing power of a high-level
language like C/C++ and the database manipulation capabilities of SQL. It allows
you to execute any SQL statement from an application program. Oracle's embedded
SQL environment is called Pro*C.

A Pro*C program is compiled in two steps. First, the Pro*C precompiler recognizes the SQL
statements embedded in the program, and replaces them with appropriate calls to the functions in
the SQL runtime library. The output is pure C/C++ code with all the pure C/C++ portions intact.
Then, a regular C/C++ compiler is used to compile the code and produces the executable. For
details, see the section on Demo Programs.

Pro*C Syntax

SQL
All SQL statements need to start with EXEC SQL and end with a semicolon ";". You
can place the SQL statements anywhere within a C/C++ block, with the restriction
that the declarative statements do not come after the executable statements. As an
example:

{
int a;
/* ... */
EXEC SQL SELECT salary INTO :a
FROM Employee
WHERE SSN=876543210;
/* ... */
printf("The salary is %d\n", a);
/* ... */
}
Preprocessor Directives
The C/C++ preprocessor directives that work with Pro*C are #include and #if.
Pro*C does not recognize #define. For example, the following code is invalid:

#define THE_SSN 876543210


/* ... */
EXEC SQL SELECT salary INTO :a
FROM Employee
WHERE SSN = THE_SSN; /* INVALID */

Statement Labels
You can connect C/C++ labels with SQL as in:

EXEC SQL WHENEVER SQLERROR GOTO error_in_SQL;


/* ... */
error_in_SQL:
/* do error handling */
We will come to what WHENEVER means later in the section on Error Handling.

Host Variables

Basics
Host variables are the key to the communication between the host program and the
database. A host variable expression must resolve to an lvalue (i.e., it can be
assigned). You can declare host variables according to C syntax, as you declare
regular C variables. The host variable declarations can be placed wherever C
variable declarations can be placed. (C++ users need to use a declare section; see
the section on C++ Users.) The C datatypes that can be used with Oracle include:

• char
• char[n]
• int
• short
• long
• float
• double
• VARCHAR[n] - This is a pseudo-type recognized by the Pro*C precompiler. It is
used to represent blank-padded, variable-length strings. The Pro*C precompiler
converts it into a structure with a 2-byte length field and an n-byte
character array (a usage sketch follows this list).
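For example, here is a rough sketch of how a VARCHAR host variable might be used (the emp table
and its ename and empno columns are the ones used in later examples; the generated structure has
a len and an arr member):
VARCHAR ename[20];   /* becomes a struct with a 2-byte len field and a 20-byte arr field */
int x;
/* ... */
EXEC SQL SELECT ename INTO :ename FROM emp WHERE empno = :x;
ename.arr[ename.len] = '\0';   /* add a terminator if a C string is needed */
printf("name = %s\n", (char *) ename.arr);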
You cannot use the register storage-class specifier for host variables.
A host variable reference must be prefixed with a colon ":" in SQL statements, but should not be
prefixed with a colon in C statements. When specifying a string literal via a host variable, the
single quotes must be omitted; Pro*C understands that you are specifying a string based on the
declared type of the host variable. C function calls and most of the pointer arithmetic expressions
cannot be used as host variable references even though they may indeed resolve to lvalues. The
following code illustrates both legal and illegal host variable references:
int deptnos[3] = { 000, 111, 222 };
int get_deptno() { return deptnos[2]; }
int *get_deptnoptr() { return &(deptnos[2]); }
int main() {
int x; char *y; int z;
/* ... */
EXEC SQL INSERT INTO emp(empno, ename, deptno)
VALUES(:x, :y, :z); /* LEGAL */
EXEC SQL INSERT INTO emp(empno, ename, deptno)
VALUES(:x + 1, /* LEGAL: the reference is to x */
'Big Shot', /* LEGAL: but not really a host var */
:deptnos[2]); /* LEGAL: array element is fine */
EXEC SQL INSERT INTO emp(empno, ename, deptno)
VALUES(:x, :y,
:(*(deptnos+2))); /* ILLEGAL: although it has an
lvalue */
EXEC SQL INSERT INTO emp(empno, ename, deptno)
VALUES(:x, :y,
:get_deptno()); /* ILLEGAL: no function calls */
EXEC SQL INSERT INTO emp(empno, ename, deptno)
VALUES(:x, :y,
:(*get_deptnoptr())); /* ILLEGAL: although it has an lvalue */
/* ... */
}

Pointers
You can define pointers using the regular C syntax, and use them in embedded SQL
statements. As usual, prefix them with a colon:

int *x;
/* ... */
EXEC SQL SELECT xyz INTO :x FROM ...;
The result of this SELECT statement will be written into *x, not x.

Structures
Structures can be used as host variables, as illustrated in the following example:

typedef struct {
char name[21]; /* one greater than column length; for '\0' */
int SSN;
} Emp;
/* ... */
Emp bigshot;
/* ... */
EXEC SQL INSERT INTO emp (ename, eSSN)
VALUES (:bigshot);

Arrays
Host arrays can be used in the following way:

int emp_number[50];
char name[50][11];
/* ... */
EXEC SQL INSERT INTO emp(emp_number, name)
    VALUES (:emp_number, :name);
which will insert all 50 tuples in one go.

Arrays can only be single dimensional. The example char name[50][11] would seem to
contradict that rule. However, Pro*C actually considers name a one-dimensional array of strings
rather than a two-dimensional array of characters. You can also have arrays of structures.
When using arrays to store the results of a query, if the size of the host array (say n) is smaller
than the actual number of tuples returned by the query, then only the first n result tuples will be
entered into the host array.
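For instance, a single query can fill host arrays directly (a sketch, using the emp table that appears
elsewhere in this document):
int  emp_number[50];
char name[50][11];
/* ... */
EXEC SQL SELECT empno, ename
         INTO :emp_number, :name
         FROM emp;
/* at most 50 rows are fetched; sqlca.sqlerrd[2] (see the SQLCA section) reports how many */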

Indicator Variables
Indicator variables are essentially "NULL flags" attached to host variables. You can
associate every host variable with an optional indicator variable. An indicator
variable must be defined as a 2-byte integer (using the type short) and, in SQL
statements, must be prefixed by a colon and immediately follow its host variable.
Or, you may use the keyword INDICATOR in between the host variable and indicator
variable. Here is an example:

short indicator_var;
EXEC SQL SELECT xyz INTO :host_var:indicator_var
FROM ...;
/* ... */
EXEC SQL INSERT INTO R
VALUES(:host_var INDICATOR :indicator_var, ...);
You can use indicator variables in the INTO clause of a SELECT statement to detect
NULL's or truncated values in the output host variables. The values Oracle can assign
to an indicator variable have the following meanings:

-1   The column value is NULL, so the value of the host variable is indeterminate.

 0   Oracle assigned an intact column value to the host variable.

>0   Oracle assigned a truncated column value to the host variable. The integer
     returned by the indicator variable is the original length of the column value.

-2   Oracle assigned a truncated column value to the host variable, but the
     original column value could not be determined.

You can also use indicator variables in the VALUES and SET clauses of an INSERT or UPDATE
statement to assign NULLs through input host variables. The values your program can assign to an
indicator variable have the following meanings:

-1 Oracle will assign a NULL to the column, ignoring the value of the host variable.

>=0 Oracle will assign the value of the host variable to the column.
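For example, to store a NULL in a column, set the indicator variable to -1 before executing the
statement. A sketch, using the emp table and its comm column from the demo programs:
float comm = 0;
short comm_ind = -1;   /* -1 makes Oracle assign NULL and ignore comm */
int   x;
/* ... */
EXEC SQL UPDATE emp SET comm = :comm INDICATOR :comm_ind
         WHERE empno = :x;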

Datatype Equivalencing
Oracle recognizes two kinds of datatypes: internal and external. Internal datatypes
specify how Oracle stores column values in database tables. External datatypes
specify the formats used to store values in input and output host variables. At
precompile time, a default Oracle external datatype is assigned to each host
variable. Datatype equivalencing allows you to override this default equivalencing
and lets you control the way Oracle interprets the input data and formats the output
data.

The equivalencing can be done on a variable-by-variable basis using the VAR statement. The
syntax is:
EXEC SQL VAR <host_var> IS <type_name> [ (<length>) ];
For example, suppose you want to select employee names from the emp table, and
then pass them to a routine that expects C-style '\0'-terminated strings. You need
not explicitly '\0'-terminate the names yourself. Simply equivalence a host variable
to the STRING external datatype, as follows:

char emp_name[21];
EXEC SQL VAR emp_name IS STRING(21);
The length of the ename column in the emp table is 20 characters, so you allot
emp_name 21 characters to accommodate the '\0'-terminator. STRING is an Oracle
external datatype specifically designed to interface with C-style strings. When you
select a value from the ename column into emp_name, Oracle will automatically '\0'-
terminate the value for you.

You can also equivalence user-defined datatypes to Oracle external datatypes using the TYPE
statement. The syntax is:
EXEC SQL TYPE <user_type> IS <type_name> [ (<length>) ] [REFERENCE];
You can declare a user-defined type to be a pointer, either explicitly, as a pointer to
a scalar or structure, or implicitly as an array, and then use this type in a TYPE
statement. In these cases, you need to use the REFERENCE clause at the end of the
statement, as shown below:

typedef unsigned char *my_raw;


EXEC SQL TYPE my_raw IS VARRAW(4000) REFERENCE;
my_raw buffer;
/* ... */
buffer = malloc(4004);
Here we allocated more memory than the type length (4000) because the
precompiler also returns the length, and may add padding after the length in order
to meet the alignment requirement on your system.

Dynamic SQL
While embedded SQL is fine for fixed applications, sometimes it is important for a
program to dynamically create entire SQL statements. With dynamic SQL, a
statement stored in a string variable can be issued. PREPARE turns a character string
into a SQL statement, and EXECUTE executes that statement. Consider the following
example.

char *s = "INSERT INTO emp VALUES(1234, 'jon', 3)";


EXEC SQL PREPARE q FROM :s;
EXEC SQL EXECUTE q;
Alternatively, PREPARE and EXECUTE may be combined into one statement:

char *s = "INSERT INTO emp VALUES(1234, 'jon', 3)";


EXEC SQL EXECUTE IMMEDIATE :s;

Transactions
Oracle Pro*C supports transactions as defined by the SQL standard.


A transaction is a sequence of SQL statements that Oracle treats as a single unit of
work. A transaction begins at your first SQL statement. A transaction ends when you
issue "EXEC SQL COMMIT" (to make permanent any database changes during the
current transaction) or "EXEC SQL ROLLBACK" (to undo any changes since the current
transaction began). After the current transaction ends with your COMMIT or ROLLBACK
statement, the next executable SQL statement will automatically begin a new
transaction.

If your program exits without calling EXEC SQL COMMIT, all database changes will be discarded.
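As a small sketch (using the emp table from the demo programs), two updates can be grouped into
one unit of work and made permanent with a single COMMIT, or undone with a ROLLBACK:
EXEC SQL UPDATE emp SET sal  = sal * 1.1  WHERE deptno = 10;
EXEC SQL UPDATE emp SET comm = comm * 1.1 WHERE deptno = 10;
EXEC SQL COMMIT;      /* make both updates permanent as one transaction */
/* on detecting an error we could instead issue: EXEC SQL ROLLBACK;     */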

Error Handling
After each executable SQL statement, your program can find the
status of execution either by explicit checking of SQLCA, or by implicit checking
using the WHENEVER statement. These two ways are covered in detail below.
SQLCA
SQLCA (SQL Communications Area) is used to detect errors and status changes in
your program. This structure contains components that are filled in by Oracle at
runtime after every executable SQL statement.

To use SQLCA you need to include the header file sqlca.h using the #include directive. In
case you need to include sqlca.h at many places, you need to first undefine the macro SQLCA
with #undef SQLCA. The relevant chunk of sqlca.h follows:
#ifndef SQLCA
#define SQLCA 1

struct sqlca {
/* ub1 */ char sqlcaid[8];
/* b4 */ long sqlabc;
/* b4 */ long sqlcode;
struct {
/* ub2 */ unsigned short sqlerrml;
/* ub1 */ char sqlerrmc[70];
} sqlerrm;
/* ub1 */ char sqlerrp[8];
/* b4 */ long sqlerrd[6];
/* ub1 */ char sqlwarn[8];
/* ub1 */ char sqlext[8];
};
/* ... */
The fields in sqlca have the following meaning:

sqlcaid   This string component is initialized to "SQLCA" to identify the SQL
          Communications Area.

sqlcabc   This integer component holds the length, in bytes, of the SQLCA structure.

sqlcode   This integer component holds the status code of the most recently
          executed SQL statement:

          0    No error.

          >0   Statement executed but an exception was detected. This occurs when
               Oracle cannot find a row that meets your WHERE condition or when a
               SELECT INTO or FETCH returns no rows.

          <0   Oracle did not execute the statement because of an error. When such
               errors occur, the current transaction should, in most cases, be
               rolled back.
sqlerrm This embedded structure contains the following two components:

• sqlerrml - Length of the message text stored in sqlerrmc.


• sqlerrmc - Up to 70 characters of the message text corresponding to
the error code stored in sqlcode.

sqlerrp Reserved for future use.

sqlerrd This array of binary integers has six elements:

• sqlerrd[0] - Future use.


• sqlerrd[1] - Future use.
• sqlerrd[2] - Number of rows processed by the most recent SQL
statement.
• sqlerrd[3] - Future use.
• sqlerrd[4] - Offset that specifies the character position at which a
parse error begins in the most recent SQL statement.
• sqlerrd[5] - Future use.

sqlwarn   This array of single characters has eight elements used as warning flags.
          Oracle sets a flag by assigning to it the character 'W'.

          sqlwarn[0]   Set if any other flag is set.

          sqlwarn[1]   Set if a truncated column value was assigned to an
                       output host variable.

          sqlwarn[2]   Set if a NULL column value is not used in computing a
                       SQL aggregate such as AVG or SUM.

          sqlwarn[3]   Set if the number of columns in SELECT does not equal
                       the number of host variables specified in INTO.

          sqlwarn[4]   Set if every row in a table was processed by an UPDATE
                       or DELETE statement without a WHERE clause.

          sqlwarn[5]   Set if a procedure/function/package/package body
                       creation command fails because of a PL/SQL
                       compilation error.

          sqlwarn[6]   No longer in use.

          sqlwarn[7]   No longer in use.


sqlext Reserved for future use.

SQLCA can only accommodate error messages up to 70 characters long in its sqlerrm
component. To get the full text of longer (or nested) error messages, you need the sqlglm()
function:
void sqlglm(char *msg_buf, size_t *buf_size, size_t *msg_length);
where msg_buf is the character buffer in which you want Oracle to store the error
message; buf_size specifies the size of msg_buf in bytes; Oracle stores the actual
length of the error message in *msg_length. The maximum length of an Oracle error
message is 512 bytes.
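A minimal sketch of its use (the variable names are ours):
char   msg[512];
size_t buf_len = sizeof(msg), msg_len;

if (sqlca.sqlcode < 0) {                 /* the last statement failed */
    sqlglm(msg, &buf_len, &msg_len);     /* fetch the full message text */
    printf("%.*s\n", (int) msg_len, msg);
}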

WHENEVER Statement
This statement allows you to do automatic error checking and handling. The syntax
is:

EXEC SQL WHENEVER <condition> <action>;


Oracle automatically checks SQLCA for <condition>, and if such condition is
detected, your program will automatically perform <action>.

<condition> can be any of the following:


• SQLWARNING - sqlwarn[0] is set because Oracle returned a warning
• SQLERROR - sqlcode is negative because Oracle returned an error
• NOT FOUND - sqlcode is positive because Oracle could not find a row that meets
your WHERE condition, or a SELECT INTO or FETCH returned no rows
<action> can be any of the following:

• CONTINUE - Program will try to continue to run with the next statement if
possible
• DO - Program transfers control to an error handling function
• GOTO <label> - Program branches to a labeled statement
• STOP - Program exits with an exit() call, and uncommitted work is rolled back
Some examples of the WHENEVER statement:
EXEC SQL WHENEVER SQLWARNING DO print_warning_msg();
EXEC SQL WHENEVER NOT FOUND GOTO handle_empty;
Here is a more concrete example:

/* code to find student name given id */


/* ... */
for (;;) {
printf("Give student id number : ");
scanf("%d", &id);
EXEC SQL WHENEVER NOT FOUND GOTO notfound;
EXEC SQL SELECT studentname INTO :st_name
FROM student
WHERE studentid = :id;
printf("Name of student is %s.\n", st_name);
continue;
notfound:
printf("No record exists for id %d!\n", id);
}
/* ... */
Note that the WHENEVER statement does not follow regular C scoping rules.
Scoping is valid for the entire program. For example, if you have the following
statement somewhere in your program (such as before a loop):

EXEC SQL WHENEVER NOT FOUND DO break;


All SQL statements that occur after this line in the file would be affected. Make sure you use the
following line to cancel the effect of WHENEVER when it is no longer needed (such as after
your loop):
EXEC SQL WHENEVER NOT FOUND CONTINUE;

Demo Programs
Note: The demo programs will create and use four tables named DEPT, EMP, PAY1,
and PAY2. Be careful if any table in your database happens to have the same name!

Several demo programs are available in /afs/ir/class/cs145/code/proc on the leland


system. They are named sample*.pc (for C users) and cppdemo*.pc (for C++ users). ".pc" is
the extension for Pro*C code. Do not copy these files manually, since there are a couple of
customizations to do. To download and customize the demo programs, follow the instructions
below:
1. Make sure that you have run source /afs/ir/class/cs145/all.env
2. In your home directory, run load_samples <db_username> <db_passwd>
<sample_dir>, where <sample_dir> is the name of the directory where you
wish to put demo programs (e.g., load_samples sally etaoinshrdlu
cs145_samples)
3. cd <sample_dir>
4. Run make samples (or make cppsamples for C++) to compile all demo programs
Step (2) will set up the sample database, create a new directory as specified in
<sample_dir>, and copy the demo files into that directory. It will also change the
user name and password in the sample programs to be yours, so that you do not
have to type in your username and password every time when running a sample
program. However, sample1 and cppdemo1 do provide an interface for the user to
input the username and password, in case you would like to learn how to do it.
If you happen to make any mistake when entering username or password in Step (2), just run
clean_samples <db_username> <db_passwd> <sample_dir> in your home directory, and
then repeat Steps (2) to (4).
For Step (4), you can also compile each sample program separately. For example, make sample1
compiles sample1.pc alone. The compilation process actually has two phases:
1. proc iname=sample1.pc
converts the embedded SQL code to corresponding library calls and outputs
sample1.c
2. cc <a_number_of_flags_here> sample1.c
generates the executable sample1
To compile your own code, say, foo.pc, just change a few variables in Makefile: Add
the program name foo to variable SAMPLES and the source file name foo.pc to
variable SAMPLE_SRC. Then, do make foo after foo.pc is ready. foo.pc will be
precompiled to foo.c and then compiled to foo, the executable. C++ users will need
to add their program name to CPPSAMPLES instead of SAMPLES, and source file name
to CPPSAMPLE_SRC instead of SAMPLE_SRC.
The demo programs operate on the following tables:
CREATE TABLE DEPT
    (DEPTNO NUMBER(2) NOT NULL,
     DNAME  VARCHAR2(14),
     LOC    VARCHAR2(13));

CREATE TABLE EMP
    (EMPNO    NUMBER(4) NOT NULL,
     ENAME    VARCHAR2(10),
     JOB      VARCHAR2(9),
     MGR      NUMBER(4),
     HIREDATE DATE,
     SAL      NUMBER(7, 2),
     COMM     NUMBER(7, 2),
     DEPTNO   NUMBER(2));

CREATE TABLE PAY1
    (ENAME VARCHAR2(10),
     SAL   NUMBER(7, 2));

CREATE TABLE PAY2
    (ENAME VARCHAR2(10),
     SAL   NUMBER(7, 2));
These tables are created automatically when you run load_samples in Step (2). A
few tuples are also inserted. You may like to browse the tables before running the
samples on them. You can also play with them as you like (e.g., inserting, deleting,
or updating tuples). These tables will be dropped automatically when you run
clean_samples. Note: clean_samples also wipes out the entire <sample_dir>; make
sure you move your own files to some other place before running this command!

You should take a look at the sample source code before running it. The comments at the top
describe what the program does. For example, sample1 takes an employee's EMPNO and retrieves
the name, salary, and commission for that employee from the table EMP.
You are supposed to study the sample source code and learn the following:
• How to connect to Oracle from the host program
• How to embed SQL in C/C++
• How to use cursors
• How to use host variables to communicate with the database
• How to use WHENEVER to take different actions on error messages.
• How to use indicator variables to detect NULL's in the output
Now, you can use these techniques to code your own database application program.
And have fun!

C++ Users
To get the precompiler to generate appropriate C++ code, you need to be aware of
the following issues:

• Code emission by precompiler. To get C++ code, you need to set the option
CODE=CPP while executing proc. C users need not worry about this option; the
default caters to their needs.
• Parsing capability. The PARSE option of proc may take the following values:
○ PARSE=NONE. C preprocessor directives are understood only inside a
declare section, and all host variables need to be declared inside a
declare section.
○ PARSE=PARTIAL. C preprocessor directives are understood; however, all
host variables need to be declared inside a declare section.
○ PARSE=FULL. C preprocessor directives are understood and host
variables can be declared anywhere. This is the default when CODE is
anything other than CPP; it is an error to specify PARSE=FULL with
CODE=CPP.
So, C++ users must specify PARSE=NONE or PARSE=PARTIAL. They therefore lose
the freedom to declare host variables anywhere in the code. Rather, the host
variables must be encapsulated in declare sections as follows:
EXEC SQL BEGIN DECLARE SECTION;
// declarations...
EXEC SQL END DECLARE SECTION;
You need to follow this routine for declaring the host and indicator variables
at all the places you do so.

• File extension. You need to specify the option CPP_SUFFIX=cc or CPP_SUFFIX=C.


• Location of header files. By default, proc searches for header files like stdio.h
in standard locations. However, C++ has its own header files, such as
iostream.h, located elsewhere. So you need to use the SYS_INCLUDE option to
specify the paths that proc should search for header files.
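Putting these options together, a C++ precompilation might look roughly like the following (the
file name and include path are hypothetical and depend on your installation):
proc CODE=CPP PARSE=PARTIAL CPP_SUFFIX=cc SYS_INCLUDE=/usr/include iname=myprog.pc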

List of Embedded SQL Statements Supported by Pro*C


Declarative Statements

EXEC SQL ARRAYLEN                 To use host arrays with PL/SQL
EXEC SQL BEGIN DECLARE SECTION
EXEC SQL END DECLARE SECTION      To declare host variables
EXEC SQL DECLARE                  To name Oracle objects
EXEC SQL INCLUDE                  To copy in files
EXEC SQL TYPE                     To equivalence datatypes
EXEC SQL VAR                      To equivalence variables
EXEC SQL WHENEVER                 To handle runtime errors

Executable Statements

To define and control Oracle data:
EXEC SQL ALLOCATE
EXEC SQL ALTER
EXEC SQL ANALYZE
EXEC SQL AUDIT
EXEC SQL COMMENT
EXEC SQL CONNECT
EXEC SQL CREATE
EXEC SQL DROP
EXEC SQL GRANT
EXEC SQL NOAUDIT
EXEC SQL RENAME
EXEC SQL REVOKE
EXEC SQL TRUNCATE

To query and manipulate Oracle data:
EXEC SQL CLOSE
EXEC SQL DELETE
EXEC SQL EXPLAIN PLAN
EXEC SQL FETCH
EXEC SQL INSERT
EXEC SQL LOCK TABLE
EXEC SQL OPEN
EXEC SQL SELECT
EXEC SQL UPDATE

To process transactions:
EXEC SQL COMMIT
EXEC SQL ROLLBACK
EXEC SQL SAVEPOINT
EXEC SQL SET TRANSACTION

To use dynamic SQL:
EXEC SQL DESCRIBE
EXEC SQL EXECUTE
EXEC SQL PREPARE

To control sessions:
EXEC SQL ALTER SESSION
EXEC SQL SET ROLE

To embed PL/SQL blocks:
EXEC SQL EXECUTE
END-EXEC

This document was written originally by Ankur Jain and Jeff Ullman for CS145, Autumn 1997; revised by Jun Yang for Prof.
Jennifer Widom's CS145 class in Spring, 1998; further revisions by Roy Goldman for Prof. Jeff Ullman's CS145 class in
Autumn, 1999; further revisions by Calvin Yang for Prof. Jennifer Widom's CS145 class in Spring, 2002.

Resources
• Database Systems: The Complete Book by Hector Garcia, Jeff Ullman, and
Jennifer Widom.
• A First Course in Database Systems by Jeff Ullman and Jennifer Widom.
• Gradiance SQL Tutorial.

Introduction to JDBC
This document illustrates the basics of the JDBC (Java Database Connectivity) API
(Application Program Interface). Here, you will learn to use the basic JDBC API to
create tables, insert values, query tables, retrieve results, update tables, create
prepared statements, perform transactions and catch exceptions and errors.
This document draws from the official Sun tutorial on JDBC Basics.
• Overview
• Establishing a Connection
• Creating a JDBC Statement
• Creating a JDBC PreparedStatement
• Executing CREATE/INSERT/UPDATE Statements
• Executing SELECT Statements
• Notes on Accessing ResultSet
• Transactions
• Handling Errors with Exceptions
• Sample Code and Compilation Instructions

Overview
Call-level interfaces such as JDBC are programming interfaces allowing external
access to SQL database manipulation and update commands. They allow the
integration of SQL calls into a general programming environment by providing
library routines which interface with the database. In particular, Java based JDBC
has a rich collection of routines which make such an interface extremely simple and
intuitive.

Here is an easy way of visualizing what happens in a call level interface: You are writing a
normal Java program. Somewhere in the program, you need to interact with a database. Using
standard library routines, you open a connection to the database. You then use JDBC to send
your SQL code to the database, and process the results that are returned. When you are done, you
close the connection.
Such an approach has to be contrasted with the precompilation route taken with Embedded SQL.
The latter has a precompilation step, where the embedded SQL code is converted to the host
language code(C/C++). Call-level interfaces do not require precompilation and thus avoid some
of the problems of Embedded SQL. The result is increased portability and a cleaner client-server
relationship.

Establishing A Connection
The first thing to do, of course, is to install Java, JDBC and the DBMS on your
working machines. Since we want to interface with an Oracle database, we would
need a driver for this specific database as well. Fortunately, we have a responsible
administrator who has already done all this for us on the Leland machines.

As we said earlier, before a database can be accessed, a connection must be opened between our
program (client) and the database (server). This involves two steps:
• Load the vendor specific driver
Why would we need this step? To ensure portability and code reuse, the API was
designed to be as independent of the version or the vendor of a database as possible.
Since different DBMS's have different behavior, we need to tell the driver manager
which DBMS we wish to use, so that it can invoke the correct driver.
An Oracle driver is loaded using the following code snippet:
Class.forName("oracle.jdbc.driver.OracleDriver")
• Make the connection
Once the driver is loaded and ready for a connection to be made, you may create an
instance of a Connection object using:
Connection con = DriverManager.getConnection(
"jdbc:oracle:thin:@dbaprod1:1544:SHR1_PRD", username, passwd);
Okay, let's see what this jargon means. The first string is the URL for the database, including
the protocol (jdbc), the vendor (oracle), the driver (thin), the server (dbaprod1), the port
number (1544), and a server instance (SHR1_PRD). The username and passwd are your
username and password, the same as you would enter into SQLPLUS to access your
account.
That's it! The connection returned in the last step is an open connection which we will use to
pass SQL statements to the database. In this code snippet, con is an open connection, and we will
use it below. Note: The values mentioned above are valid for our (Leland) environment. They
would have different values in other environments.

Creating JDBC Statements


A JDBC Statement object is used to send your SQL statements to the DBMS, and should not
be confused with an SQL statement. A JDBC Statement object is associated with an open
connection, and not any single SQL Statement. You can think of a JDBC Statement object as a
channel sitting on a connection, and passing one or more of your SQL statements (which you ask
it to execute) to the DBMS.
An active connection is needed to create a Statement object. The following code snippet, using
our Connection object con, does it for you:
Statement stmt = con.createStatement() ;
At this point, a Statement object exists, but it does not have an SQL statement to pass on to the
DBMS. We learn how to do that in a following section.

Creating JDBC PreparedStatement


Sometimes, it is more convenient or more efficient to use a PreparedStatement object for
sending SQL statements to the DBMS. The main feature which distinguishes it from its
superclass Statement, is that unlike Statement, it is given an SQL statement right when it is
created. This SQL statement is then sent to the DBMS right away, where it is compiled. Thus, in
effect, a PreparedStatement is associated as a channel with a connection and a compiled SQL
statement.
The advantage offered is that if you need to use the same, or similar query with different
parameters multiple times, the statement can be compiled and optimized by the DBMS just once.
Contrast this with a use of a normal Statement where each use of the same SQL statement
requires a compilation all over again.
PreparedStatements are also created with a Connection method. The following snippet shows
how to create a parameterized SQL statement with three input parameters:
PreparedStatement prepareUpdatePrice = con.prepareStatement(
"UPDATE Sells SET price = ? WHERE bar = ? AND beer = ?");
Before we can execute a PreparedStatement, we need to supply values for the parameters. This
can be done by calling one of the setXXX methods defined in the class PreparedStatement.
Most often used methods are setInt, setFloat, setDouble, setString etc. You can set
these values before each execution of the prepared statement.
Continuing the above example, we would write:
prepareUpdatePrice.setInt(1, 3);
prepareUpdatePrice.setString(2, "Bar Of Foo");
prepareUpdatePrice.setString(3, "BudLite");

Executing CREATE/INSERT/UPDATE Statements


Executing SQL statements in JDBC varies depending on the ``intention'' of the SQL statement.
DDL (data definition language) statements such as table creation and table alteration statements,
as well as statements to update the table contents, are all executed using the method
executeUpdate. Notice that these commands change the state of the database, hence the name of
the method contains ``Update''.
The following snippet has examples of executeUpdate statements.
Statement stmt = con.createStatement();

stmt.executeUpdate("CREATE TABLE Sells " +


"(bar VARCHAR2(40), beer VARCHAR2(40), price REAL)" );
stmt.executeUpdate("INSERT INTO Sells " +
"VALUES ('Bar Of Foo', 'BudLite', 2.00)" );

String sqlString = "CREATE TABLE Bars " +


"(name VARCHAR2(40), address VARCHAR2(80), license INT)" ;
stmt.executeUpdate(sqlString);
Since the SQL statement will not quite fit on one line on the page, we have split it into two
strings concatenated by a plus sign(+) so that it will compile. Pay special attention to the space
following "INSERT INTO Sells" to separate it in the resulting string from "VALUES". Note also
that we are reusing the same Statement object rather than having to create a new one.
When executeUpdate is used to call DDL statements, the return value is always zero, while data
modification statement executions will return a value greater than or equal to zero, which is the
number of tuples affected in the relation.
While working with a PreparedStatement, we would execute such a statement by first plugging
in the values of the parameters (as seen above), and then invoking the executeUpdate on it.
int n = prepareUpdatePrice.executeUpdate() ;

Executing SELECT Statements


As opposed to the previous section statements, a query is expected to return a set of tuples as the
result, and not change the state of the database. Not surprisingly, there is a corresponding method
called executeQuery, which returns its results as a ResultSet object:
String bar, beer ;
float price ;

ResultSet rs = stmt.executeQuery("SELECT * FROM Sells");


while ( rs.next() ) {
bar = rs.getString("bar");
beer = rs.getString("beer");
price = rs.getFloat("price");
System.out.println(bar + " sells " + beer + " for " + price + "
Dollars.");
}
The bag of tuples resulting from the query is contained in the variable rs, which is an instance
of ResultSet. A set is not of much use to us unless we can access each row and the attributes in
each row. The ResultSet provides a cursor to us, which can be used to access each row in turn.
The cursor is initially set just before the first row. Each invocation of the method next causes it
to move to the next row, if one exists, and return true, or to return false if there is no remaining
row.
We can use the getXXX method of the appropriate type to retrieve the attributes of a row. In the
previous example, we used getString and getFloat methods to access the column values.
Notice that we provided the name of the column whose value is desired as a parameter to the
method. Also note that the VARCHAR2 type bar, beer have been converted to Java String, and
the REAL to Java float.
Equivalently, we could have specified the column number instead of the column name, with the
same result. Thus the relevant statements would be:
bar = rs.getString(1);
price = rs.getFloat(3);
beer = rs.getString(2);
While working with a PreparedStatement, we would execute a query by first plugging in the
values of the parameters, and then invoking the executeQuery on it.
ResultSet rs = prepareUpdatePrice.executeQuery() ;

Notes on Accessing ResultSet


JDBC also offers you a number of methods to find out where you are in the result set using
getRow, isFirst, isBeforeFirst, isLast, isAfterLast.
There are also means to make scroll-able cursors that allow free access to any row in the result set.
By default, cursors scroll forward only and are read-only. When creating a Statement for a
Connection, you can change the type of ResultSet to a more flexible scrolling or updatable
model:
Statement stmt = con.createStatement(
    ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
ResultSet rs = stmt.executeQuery("SELECT * FROM Sells");
The different options for types are TYPE_FORWARD_ONLY, TYPE_SCROLL_INSENSITIVE, and
TYPE_SCROLL_SENSITIVE. You can choose whether the cursor is read-only or updatable using
the options CONCUR_READ_ONLY, and CONCUR_UPDATABLE. With the default cursor, you can scroll
forward using rs.next(). With scroll-able cursors you have more options:
rs.absolute(3); // moves to the third tuple
rs.previous(); // moves back one tuple (tuple 2)
rs.relative(2); // moves forward two tuples (tuple 4)
rs.relative(-3); // moves back three tuples (tuple 1)
There are a great many more details to the scroll-able cursor feature. Scroll-able cursors, though
useful for certain applications, are extremely high-overhead, and should be used with restraint
and caution. More information can be found at the New Features in the JDBC 2.0 API, where
you can find a more detailed tutorial on the cursor manipulation techniques.

Transactions
JDBC allows SQL statements to be grouped together into a single transaction. Thus, we can
ensure the ACID (Atomicity, Consistency, Isolation, Durability) properties using JDBC
transactional features.
Transaction control is performed by the Connection object. When a connection is created, by
default it is in auto-commit mode. This means that each individual SQL statement is treated
as a transaction by itself, and will be committed as soon as its execution finishes. (This is not
exactly precise, but we can gloss over this subtlety for most purposes.)
We can turn off auto-commit mode for an active connection with :
con.setAutoCommit(false) ;
and turn it on again with :

con.setAutoCommit(true) ;
Once auto-commit is off, no SQL statements will be committed (that is, the database will not be
permanently updated) until you have explicitly told it to commit by invoking the commit()
method:
con.commit() ;
At any point before commit, we may invoke rollback() to roll back the transaction
and restore values to the last commit point (before the attempted updates).

Here is an example which ties these ideas together:


con.setAutoCommit(false);
Statement stmt = con.createStatement();
stmt.executeUpdate("INSERT INTO Sells VALUES('Bar Of Foo', 'BudLite',
1.00)" );
con.rollback();
stmt.executeUpdate("INSERT INTO Sells VALUES('Bar Of Joe', 'Miller',
2.00)" );
con.commit();
con.setAutoCommit(true);
Let's walk through the example to understand the effects of the various methods. We first set auto-
commit off, indicating that the following statements need to be considered as a unit. We attempt
to insert into the Sells table the ('Bar Of Foo', 'BudLite', 1.00) tuple. However, this
change has not been made final (committed) yet. When we invoke rollback, we cancel our
insert and in effect we remove any intention of inserting the above tuple. Note that Sells now is
still as it was before we attempted the insert. We then attempt another insert, and this time, we
commit the transaction. It is only now that Sells is permanently affected and has the new
tuple in it. Finally, we reset the connection to auto-commit again.
We can also set transaction isolation levels as desired. For example, we can set the transaction
isolation level to TRANSACTION_READ_COMMITTED, which will not allow a value to be accessed
until after it has been committed, and forbid dirty reads. There are five such values for isolation
levels provided in the Connection interface. By default, the isolation level is serializable. JDBC
allows us to find out the transaction isolation level the database is set to (using the Connection
method getTransactionIsolation) and to set the appropriate level (using the Connection
method setTransactionIsolation).
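For example (a sketch, assuming the open connection con from above):
con.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
int level = con.getTransactionIsolation(); // Connection.TRANSACTION_READ_COMMITTED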
Usually rollback will be used in combination with Java's exception handling ability to recover
from (un)predictable errors. Such a combination provides an excellent and easy mechanism for
handling data integrity. We study error handling using JDBC in the next section.

Handling Errors with Exceptions


The truth is errors always occur in software programs. Often, database programs are critical
applications, and it is imperative that errors be caught and handled gracefully. Programs should
recover and leave the database in a consistent state. Rollback-s used in conjunction with Java
exception handlers are a clean way of achieving such a requirement.
The client (program) accessing a server (database) needs to be aware of any errors returned from
the server. JDBC give access to such information by providing two levels of error conditions:
SQLException and SQLWarning. SQLExceptions are Java exceptions which, if not handled, will
terminate the application. SQLWarnings are subclasses of SQLException, but they represent
nonfatal errors or unexpected conditions, and as such, can be ignored.
In Java, statements which are expected to ``throw'' an exception or a warning are enclosed in a
try block. If a statement in the try block throws an exception or a warning, it can be ``caught''
in one of the corresponding catch statements. Each catch statement specifies which exceptions
it is ready to ``catch''.
Here is an example of catching an SQLException, and using the error condition to rollback the
transaction:
try {
con.setAutoCommit(false) ;
stmt.executeUpdate("CREATE TABLE Sells (bar VARCHAR2(40), " +
"beer VARHAR2(40), price REAL)") ;
stmt.executeUpdate("INSERT INTO Sells VALUES " +
"('Bar Of Foo', 'BudLite', 2.00)") ;
con.commit() ;
con.setAutoCommit(true) ;

}catch(SQLException ex) {
System.err.println("SQLException: " + ex.getMessage()) ;
con.rollback() ;
con.setAutoCommit(true) ;
}
In this case, an exception is thrown because beer is defined as VARHAR2, a misspelling of VARCHAR2.
Since there is no such datatype in our DBMS, an SQLException is thrown. The output in this
case would be:
Message: ORA-00902: invalid datatype
Alternatively, if your datatypes were correct, an exception might be thrown in case
your database size goes over space quota and is unable to construct a new table.
SQLWarnings can be retrieved from Connection objects, Statement objects, and
ResultSet objects. Each only stores the most recent SQLWarning. So if you execute
another statement through your Statement object, for instance, any earlier warnings
will be discarded. Here is a code snippet which illustrates the use of SQLWarnings:

ResultSet rs = stmt.executeQuery("SELECT bar FROM Sells") ;


SQLWarning warn = stmt.getWarnings() ;
if (warn != null)
System.out.println("Message: " + warn.getMessage()) ;
SQLWarning warning = rs.getWarnings() ;
if (warning != null)
warning = warning.getNextWarning() ;
if (warning != null)
System.out.println("Message: " + warn.getMessage()) ;
SQLWarnings (as opposed to SQLExceptions) are actually rather rare -- the most
common is a DataTruncation warning. The latter indicates that there was a problem
while reading or writing data from the database.

Sample Code and Compilation Instructions


Hopefully, by now you are familiar enough with JDBC to write serious code. Here is a
simple program which ties all the ideas in the tutorial together.

We have a few more pieces of sample code written by Craig Jurney at ITSS for educational
purposes. Feel free to use sample code as a guideline or even a skeleton for code that you write
in the future, but make a note that you were basing your solution on provided code.
SQLBuilder.java - Creation of a Relation
SQLLoader.java - Insertion of Tuples
SQLRunner.java - Processes Queries
SQLUpdater.java - Updating Tuples
SQLBatchUpdater.java - Batch Updating
SQLUtil.java - JDBC Utility Functions
Don't forget to use source /usr/class/cs145/all.env, which will correctly set your
classpath. By adding this to your global classpath you simplify commands. For example, you can
say:
elaine19:~$ javac SQLBuilder.java
elaine19:~$ java SQLBuilder
instead of:
elaine19:~$ javac SQLBuilder.java
elaine19:~$ java -classpath
/usr/pubsw/apps/oracle/8.1.5/jdbc/lib/classes111.zip:. SQLBuilder
There are static final values in each of the .java files for USERNAME and PASSWORD.
These must be changed to your own username and your own password so that you
can access the database.

This document was written originally by Nathan Folkert for Prof. Jennifer Widom's CS145 class, Spring 2000.
Subsequently, it was hacked by Mayank Bawa for Prof. Jeff Ullman's CS145 class, Fall 2000. Jim Zhuang made
a minor update for Summer 2005. Thanks to Matt Laue for typo correction.

Resources
• Database Systems: The Complete Book by Hector Garcia, Jeff Ullman, and Jennifer
Widom.
• A First Course in Database Systems by Jeff Ullman and Jennifer Widom.
• Gradiance SQL Tutorial.

Object-Relational Features of Oracle


• Defining Types
• Dropping Types
• Constructing Objects
• Methods
• Queries Involving Types
• Declaring Types For Relations
• References
• Nested Tables
• Nested Tables of References
• Converting Relations to Object-Relations

Defining Types
Oracle allows us to define types similar to the types of SQL. The syntax is
CREATE TYPE t AS OBJECT (
list of attributes and methods
);
/
• Note the slash at the end, needed to get Oracle to process the type definition.
For example here is a definition of a point type consisting of two numbers:
CREATE TYPE PointType AS OBJECT (
x NUMBER,
y NUMBER
);
/
An object type can be used like any other type in further declarations of object-types or table-
types. For instance, we might define a line type by:
CREATE TYPE LineType AS OBJECT (
end1 PointType,
end2 PointType
);
/
Then, we could create a relation that is a set of lines with ``line ID's'' as:
CREATE TABLE Lines (
lineID INT,
line LineType
);

Dropping Types
To get rid of a type such as LineType, we say:
DROP TYPE LineType;
However, before dropping a type, we must first drop all tables and other types that use this type.
Thus, the above would fail because table Lines still exists and uses LineType.
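For example, one order that works for the tables and types defined so far is:
DROP TABLE Lines;
DROP TYPE LineType;
DROP TYPE PointType;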

Constructing Object Values


Like C++, Oracle provides built-in constructors for values of a declared type, and these
constructors bear the name of the type. Thus, a value of type PointType is formed by the word
PointType and a parenthesized list of appropriate values. For example, here is how we would
insert into Lines a line with ID 27 that ran from the origin to the point (3,4):
INSERT INTO Lines
VALUES(27, LineType(
PointType(0.0, 0.0),
PointType(3.0, 4.0)
)
);
That is, we construct two values of type PointType, these values are used to construct a value of
type LineType, and that value is used with the integer 27 to construct a tuple for Lines.

Declaring and Defining Methods


A type declaration can also include methods that are defined on values of that type. The method
is declared by MEMBER FUNCTION or MEMBER PROCEDURE in the CREATE TYPE statement, and the
code for the function itself (the definition of the method) is in a separate CREATE TYPE BODY
statement.
Methods have available a special tuple variable SELF, which refers to the ``current'' tuple. If SELF
is used in the definition of the method, then the context must be such that a particular tuple is
referred to. There are some examples of applying methods correctly in The Section on Queries
and The Section on Row Types.
For example, we might want to add a length function to LineType. This function will apply to
the ``current'' line object, but when it produces the length, it also multiplies by a ``scale factor.''
We revise the declaration of LineType to be:
CREATE TYPE LineType AS OBJECT (
end1 PointType,
end2 PointType,
MEMBER FUNCTION length(scale IN NUMBER) RETURN NUMBER,
PRAGMA RESTRICT_REFERENCES(length, WNDS)
);
/
• Like ODL methods, you need to specify the mode of each argument --- either IN, OUT, or
INOUT.
• It is legal, and quite common, for a method to take zero arguments. If so, omit the
parentheses after the function name.
• Note the ``pragma'' that says the length method will not modify the database (WNDS =
writes no database state). This clause is necessary if we are to use length in queries.
All methods for a type are then defined in a single CREATE TYPE BODY statement, for example:
CREATE TYPE BODY LineType AS
MEMBER FUNCTION length(scale NUMBER) RETURN NUMBER IS
BEGIN
RETURN scale *
SQRT((SELF.end1.x - SELF.end2.x)*(SELF.end1.x - SELF.end2.x) +
(SELF.end1.y - SELF.end2.y)*(SELF.end1.y - SELF.end2.y));
END;
END;
/
• Notice that the mode of the argument is not given here.
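As an illustration of the zero-argument case mentioned above, here is a sketch (LineType2 and slope are hypothetical names, not part of the running example); note that the empty parentheses are omitted in both the declaration and the body:
CREATE TYPE LineType2 AS OBJECT (
end1 PointType,
end2 PointType,
MEMBER FUNCTION slope RETURN NUMBER,
PRAGMA RESTRICT_REFERENCES(slope, WNDS)
);
/
CREATE TYPE BODY LineType2 AS
MEMBER FUNCTION slope RETURN NUMBER IS
BEGIN
-- rise over run; a vertical line would raise a divide-by-zero error
RETURN (SELF.end2.y - SELF.end1.y) / (SELF.end2.x - SELF.end1.x);
END;
END;
/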

Queries to Relations That Involve User-Defined Types


Values of components of an object are accessed with the dot notation. We actually saw an
example of this notation above, as we found the x-component of point end1 by referring to
end1.x, and so on. In general, if N refers to some object O of type T, and one of the components
(attribute or method) of type T is A, then N.A refers to this component of object O.
For example, the following query finds the lengths of all the lines in relation Lines, using scale
factor 2 (i.e., it actually produces twice these lengths).
SELECT lineID, ll.line.length(2.0)
FROM Lines ll;
• Note that in order to access fields of an object, we have to start with an alias of a relation
name. While lineID, being a top-level attribute of relation Lines, can be referred to
normally, in order to get into the attribute line, we need to give relation Lines an alias
(we chose ll) and use it to start all paths to the desired subobjects.
• Dropping the ``ll.'' or replacing it by ``Lines.'' doesn't work.
• Notice also the use of a method in a query. Since line is an attribute of type LineType,
one can apply to it the methods of that type, using the dot notation shown.
Here are some other queries about the relation Lines.
SELECT ll.line.end1.x, ll.line.end1.y
FROM Lines ll;
prints the x and y coordinates of the first end of each line.
SELECT ll.line.end2
FROM Lines ll;
prints the second end of each line, but as a value of type PointType, not as a pair of numbers.
For instance, one line of output would be PointType(3,4). Notice that type constructors are
used for output as well as for input.

Types Can Also Be Relation Schemas


The use of types so far has been as ``column types,'' that is, types of attributes. In a CREATE
TABLE statement we can replace the parenthesized list of schema elements by the keyword OF
and the name of a type. This type is then said to be used as a ``row type.'' For example, to create
a relation each of whose tuples is a pair of points, we could say:
CREATE TABLE Lines1 OF LineType;
It is as if we had defined Lines1 by:
CREATE TABLE Lines1 (
end1 PointType,
end2 PointType
);
but the method length is also available whenever we refer to a tuple of Lines1. For instance, we
could compute the average length of a line by:
SELECT AVG(ll.length(1.0))
FROM Lines1 ll;
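Since Lines1 behaves as if it had the two attributes end1 and end2, a tuple can be inserted by supplying two PointType values (a minimal sketch; the coordinates are arbitrary):
INSERT INTO Lines1
VALUES(PointType(0.0, 0.0), PointType(3.0, 4.0));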

References as a Type
For every type t, REF t is the type of references (object ID's if you will) to values of type t. This
type can be used in places where a type is called for. For instance, we could create a relation
Lines2 whose tuples were pairs of references to points:
CREATE TABLE Lines2 (
end1 REF PointType,
end2 REF PointType
);
We can use REF to create references from actual values. For example, suppose we have a relation
Points whose tuples are objects of type PointType. That is, Points is declared by:
CREATE TABLE Points OF PointType;
We could make Lines2 be the set of all lines between pairs of these points that go from left to
right (i.e., the x-value of the first is less than the x-value of the second) by:
INSERT INTO Lines2
SELECT REF(pp), REF(qq)
FROM Points pp, Points qq
WHERE pp.x < qq.x;
There are several important prohibitions, where you might imagine you could arrange for a
reference to an object, but you cannot.
• The points referred to must be tuples of a relation of type PointType, such as Points
above. They cannot be objects appearing in some column of another relation.
• It is not permissible to invent an object outside of any relation and try to make a reference
to it. For instance, we could not insert into Lines2 a tuple with contrived references such
as VALUES(REF(PointType(1,2)), REF(PointType(3,4))), even though the types of
things are right. The problem is that the points such as PointType(1,2) don't ``live'' in
any relation.
To follow a reference, we use the dot notation, as if the attribute of reference type were really the
same as the value referred to. For instance, this query gets the x-coordinates of the ends of all the
lines in Lines2.
SELECT ll.end1.x, ll.end2.x
FROM Lines2 ll;

Nested Tables
A more powerful use of object types in Oracle is the fact that the type of a column can be a table-
type. That is, the value of an attribute in one tuple can be an entire relation, as suggested by the
picture below, where a relation with schema (a,b) has b-values that are relations with schema
(x,y,z).
[Diagram omitted: a relation with schema (a,b) in which each b-value is itself a small relation with schema (x,y,z).]
In order to have a relation as a type of some attribute, we first have to define a type using the AS
TABLE OF clause. For instance:
CREATE TYPE PolygonType AS TABLE OF PointType;
/
says that the type PolygonType is a relation whose tuples are of type PointType; i.e., they have
two components, x and y, which are real numbers.
Now, we can declare a relation one of whose columns has values that represent polygons; i.e.,
they are sets of points. A possible declaration, in which polygons are represented by a name and
a set of points is:
CREATE TABLE Polygons (
name VARCHAR2(20),
points PolygonType)
NESTED TABLE points STORE AS PointsTable;
The ``tiny'' relations that represent individual polygons are not stored directly as values of the
points attribute. Rather, they are stored in a single table, whose name must be declared
(although we cannot refer to it in any way). We see this declaration following the parenthesized
list of attributes for the table; the name PointsTable was chosen to store the relations of type
PolygonType.
• Be careful to get the punctuation right. There is one semicolon ending the CREATE TABLE
statement, and it goes after both the parenthesized list of attributes and the NESTED TABLE
clause.
When we insert into a relation like Polygons that has one or more columns that are of nested-
relation type, we use the type constructor for the nested-relation type (PolygonType in our
example) to surround the value of one of these nested relations. The value of the nested relation
is represented by a list of values of the appropriate type; in our example that type is PointType
and is represented by the type constructor of the same name.
Here is a statement inserting a polygon named ``square'' that consists of four points, the corners
of the unit square.
INSERT INTO Polygons VALUES(
'square', PolygonType(PointType(0.0, 0.0), PointType(0.0, 1.0),
PointType(1.0, 0.0), PointType(1.0, 1.0)
)
);
We can obtain the points of this square by a query such as:
SELECT points
FROM Polygons
WHERE name = 'square';
It is also possible to get a particular nested relation into the FROM clause by applying the
keyword THE to a subquery whose result is a single nested relation; the query above is such a
subquery, since it returns a whole nested relation. For instance, the following query finds those points of the
polygon named square that are on the main diagonal (i.e., x=y).
SELECT ss.x
FROM THE(SELECT points
FROM Polygons
WHERE name = 'square'
) ss
WHERE ss.x = ss.y;
In this query, the nested relation is given an alias ss, which is used in the SELECT and WHERE
clauses as if it were any ordinary relation.

Combining Nested Relations and References


Things get tricky when we do the natural thing (to keep data normalized) and make a nested
table whose tuples are actually references to tuples in some other table. The problem is that the
nested table's attribute has no name. Oracle provides the name COLUMN_VALUE to use in this
circumstance. Here's an example that modifies the above discussion of polygons to have a nested
table of references. First, we create a new type that is a nested table of references to points:
CREATE TYPE PolygonRefType AS TABLE OF REF PointType;
/
Next, we need a new relation, similar to Polygons, but with the points of a polygon stored as a
nested table of references:
CREATE TABLE PolygonsRef (
name VARCHAR2(20),
pointsRef PolygonRefType)
NESTED TABLE pointsRef STORE AS PointsRefTable;
Remember that the points themselves must be stored in some relation of type PointType; we
omit this part of the process of creating and loading data. To query the points in a nested table, as
we did for the query above that asked for the points on the main diagonal, we write essentially
the same query, except that we must use COLUMN_VALUE to refer to the column of the nested
table. The query becomes:
SELECT ss.COLUMN_VALUE.x
FROM THE(SELECT pointsRef
FROM PolygonsRef
WHERE name = 'square'
) ss
WHERE ss.COLUMN_VALUE.x = ss.COLUMN_VALUE.y;

Converting Ordinary Relations to Object-Relations


If we have data in an ordinary relation (i.e., one whose attributes are all built-in types of SQL),
and we want to create an equivalent relation whose type is a user-defined object type or a
relation one or more of whose attributes are object types, we can use the form of an INSERT
statement that defines the inserted tuples by a query. The query can use the type constructors as
appropriate.
For example, suppose we have a relation LinesFlat declared by:
CREATE TABLE LinesFlat(
id INT,
x1 NUMBER,
y1 NUMBER,
x2 NUMBER,
y2 NUMBER
);
and this relation contains lines represented in the ``old'' style, that is, an ID and four components
representing the x- and y-coordinates of two points. We can copy this data into Lines and give it
the right structure by:
INSERT INTO Lines
SELECT id, LineType(PointType(x1,y1), PointType(x2,y2))
FROM LinesFlat;
Insertion with a SELECT clause into a table with nested relations is tricky. If we simply want to
insert into an existing nested relation, we can use THE with specified values. For instance, if we
want to insert the point (2.0, 3.0) into the nested relation for the polygon named ``triangle,'' we
can write:
INSERT INTO THE(SELECT points
FROM Polygons
WHERE name = 'triangle'
)
VALUES(PointType(2.0, 3.0));
Now, suppose we already have a ``flat'' relation representing points of polygons:
CREATE TABLE PolyFlat (
name VARCHAR2(20),
x NUMBER,
y NUMBER
);
If the points of a square are represented in PolyFlat, then we can copy them into Polygons by:
1. Querying PolyFlat for the points of a square.
2. Turning the collection of answers to our query into a relation by applying the keyword
MULTISET.
3. Turning the relation into a value of type PolygonType with the expression CAST ... AS
PolygonType.
4. Using 'square' and the value constructed in (3) as the arguments of the VALUES clause.
Here is the command:
INSERT INTO Polygons VALUES('square',
CAST(
MULTISET(SELECT PointType(x, y)
FROM PolyFlat
WHERE name = 'square'
)
AS PolygonType
)
);
Even more complex is the way we can copy data from the flat PolyFlat to put all the polygons
and their sets of points into Polygons. The following almost works:
INSERT INTO Polygons
SELECT pp.name,
CAST(
MULTISET(SELECT PointType(x, y)
FROM PolyFlat qq
WHERE qq.name = pp.name
)
AS PolygonType
)
FROM PolyFlat pp;
The problem is that if there are four points, then there are four tuples with name 'square'
inserted. Adding DISTINCT after the first SELECT doesn't work. We have to find a way to perform
the insertion for each polygon name only once, and a reasonable way is to add a WHERE clause
that insists the x and y components of the PolyFlat tuple be lexicographically first. Here is a
working insertion command:
INSERT INTO Polygons
SELECT pp.name,
CAST(
MULTISET(SELECT PointType(x, y)
FROM PolyFlat qq
WHERE qq.name = pp.name
)
AS PolygonType
)
FROM PolyFlat pp
WHERE NOT EXISTS(
SELECT *
FROM PolyFlat rr
WHERE rr.name = pp.name AND
(rr.x < pp.x OR
(rr.x = pp.x AND rr.y < pp.y))
);

This document was written originally by Jeff Ullman for CS145 in the Autumn of 1998. Special thanks to Ian Mizrahi for the detective work on the
COLUMN_VALUE feature.
Web-Database Programming: CGI and Java
Servlets
NOTE: This document assumes a basic knowledge of HTML. We will not be providing
documentation for HTML coding apart from the creation of forms. There are dozens
of tutorials available online. You might check out the NCSA Beginner's Guide to
HTML.

• Overview
• Retrieving Input from the User
○ Forms
○ Server-Side Input Handling - CGI
○ Server-Side Input Handling - Java
• Returning Output to the User
○ CGI Output
○ Java Output
• Sample Code and Coding Tips
○ CGI Sample Code
○ CGI Setup
○ CGI Debugging
○ Java Sample Code
○ Java Compilation in Unix
○ Servlet Setup
○ Handling Special Characters

Overview
CGI, or Common Gateway Interface, is a means of providing server-side services over the web
by dynamically producing HTML documents or other kinds of documents, or by performing other
computations in response to communication from the user. In this assignment, students who want
to interface with the Oracle database using Oracle's Pro*C precompiler will be using CGI.
Java Servlets are the Java solution for providing web-based services. They provide a very similar
interface for interacting with client queries and providing server responses. As such, discussion
of much of the input and output in terms of HTML will overlap. Students who plan to interface
with Oracle using JDBC will be working with Java Servlets.
Both CGI and Java Servlets interact with the user through HTML forms. CGI programs reside in
a special directory, or in our case, a special computer on the network (cgi-courses.stanford.edu),
and provide service through a regular web server. Java Servlets are a separate kind of network
object altogether, and you'll have to run a special Servlet server program on a specific port on a Unix machine.

Retrieving Input from the User


Input to CGI and Servlet programs is passed to the program using web forms. Forms include text
fields, radio buttons, check boxes, popup boxes, scroll tables, and the like.
Thus retrieving input is a two-step process: you must create an HTML document that provides
forms to allow users to pass information to the server, and your CGI or Servlet program must
have a means for parsing the input data and determining the action to take. This mechanism is
provided for you in Java Servlets. For CGI, you can either code it yourself, find libraries on the
internet that handle CGI input, or use the following example code that we put together for you:
cgiparse.c.

Forms
Forms are designated within an HTML document by the fill-out form tag:
<FORM METHOD = "POST" ACTION = "http://form.url.com/cgi-bin/cgiprogram">
... Contents of the form ...
</FORM>
The URL given after ACTION is the URL of the CGI program (your program). The METHOD is the
means of transferring data from the form to the CGI program. In this example, we have used the
"POST" method, which is the recommended method. There is another method called "GET", but
there are common problems associated with this method. Both will be discussed in the next
section.
Within the form you may have anything except another form. The tags used to create user
interface objects are INPUT, SELECT, and TEXTAREA.
The INPUT tag specifies a simple input interface:
<INPUT TYPE="text" NAME="thisinput" VALUE="default" SIZE=10 MAXLENGTH=20>

<INPUT TYPE="checkbox" NAME="thisbox" VALUE="on" CHECKED>

<INPUT TYPE="radio" NAME="radio1" VALUE="1">

<INPUT TYPE="submit" VALUE="done">

<INPUT TYPE="radio" NAME="radio1" VALUE="2" CHECKED>

<INPUT TYPE="hidden" NAME="notvisible" VALUE="5">


Which would produce the following form:
[In a browser, this renders as a text field containing ``default,'' a checkbox, two radio buttons, and a ``done'' submit button; the hidden field is not displayed.]

The different attributes are mostly self-explanatory. The TYPE is the variety of input object that
you are presenting. Valid types include "text", "password", "checkbox", "radio", "submit",
"reset", and "hidden". Every input but "submit" and "reset" has a NAME which will be associated
with the value returned in the input to the CGI program. This will not be visible to the user
(unless they read the HTML source). The other fields will be explained with the types:
• "text" - refers to a simple text entry field. The VALUE refers to the default text
within the text field, the SIZE represents the visual length of the field, and the
MAXLENGTH indicates the maximum number of characters the textfield will
allow. There are defaults to all of these (nothing, 20, unlimited).
• "password" - the same as a normal text entry field, but characters entered
are obscured.
• "checkbox" - refers to a toggle button that is independently either on or off.
The VALUE refers to the string sent to the CGI server when the button is
checked (unchecked boxes are disregarded). The default value is "on".
• "radio" - refers to a toggle button that may be grouped with other toggle
buttons such that only one in the group can be on. It's essentially the same
as the checkbox, but any radio button with the same NAME attribute will be
grouped with this one.
• "submit" and "reset" - these are the pushbuttons on the bottom of most
forms you'll see that submit the form or clear it. These are not required to
have a NAME, and the VALUE refers to the label on the button. The default
names are "Submit Query" and "Reset" respectively.
• "hidden" - this input is invisible as far as the user interface is concerned
(though don't be fooled into thinking this is some kind of security feature --
it's easy to find "hidden" fields by perusing a document source or examining
the URL for a GET method). It simply creates an attribute/value binding
without need for user action that gets passed transparently along when the
form is submitted.
The second type of interface is the SELECT interface, which includes popup menus and scrolling
tables. Here are examples of both:
<SELECT NAME="menu">
<OPTION>option 1
<OPTION>option 2
<OPTION>option 3
<OPTION SELECTED>option 4
<OPTION>option 5
<OPTION>option 6
<OPTION>option 7
</SELECT>

<SELECT NAME="scroller" MULTIPLE SIZE=7>


<OPTION SELECTED>option 1
<OPTION SELECTED>option 2
<OPTION>option 3
<OPTION>option 4
<OPTION>option 5
<OPTION>option 6
<OPTION>option 7
</SELECT>
Which will give us:
[In a browser, this renders as a popup menu with ``option 4'' selected and a seven-row scrolled list with options 1 and 2 selected.]

The SIZE attribute determines whether it is a menu or a scrolled list. If it is 1 or it is absent, the
default is a popup menu. If it is greater than 1, then you will see a scrolled list with SIZE
elements. The MULTIPLE option, which forces the select to be a scrolled list, signifies that a more
than one value may be selected (by default only one value can be selected in a scrolled list).
OPTION is more or less self-explanatory -- it gives the names and values of each field in the menu
or scrolled table, and you can specify which are SELECTED by default.
The final type of interface is the TEXTAREA interface:
<TEXTAREA NAME="area" ROWS=5 COLS=30>
Mary had a little lamb.
A little lamb?
A little lamb!
Mary had a little lamb.
It's fleece was white as snow.
</TEXTAREA>
[In a browser, this renders as a 5-row by 30-column text area containing the default text above.]

As usual, the NAME is the symbolic reference to which the input will be bound when submitted to
the CGI program. The ROWS and COLS values are the visible size of the field. Any number of
characters can be entered into a text area.
The default text of the text area is entered between the tags. Whitespace is supposedly respected
(as between <PRE> HTML tags), including the newline after the first tag and before the last tag.
Server-Side Input Handling -- CGI
The form contents will be assembled into an encoded query string. Using the GET
method, this string is available in the environment variable QUERY_STRING. It is
actually passed to the program through the URL -- examine the URL for the first of
the forms above:

http://asdf.asdf.asdf/asdf?thisinput=default&thisbox=on&radio1=2
Everything after the '?' is the query string. You'll see that a number of expressions
appear concatenated by & symbols -- each expression assigns a string value to
each form object. In this case, the text field named "thisinput" has the value
"default", which is what was typed into the field, the checkbox "thisbox" has the
value "on", and the radio button group "radio1" has the value "2" (the second
button is checked -- note that this is the value I gave it, not a default value. The
default is "on").

Let's look at another example from the second form:


http://zxcv.zxcv.zxcv/zxcv?menu=option+4&scroller=option+1&scroller=option+2
The menu has option 4 selected, and the scroller has option 1 and option 2 selected. Note that
spaces are converted to '+' symbols in the URL string. The character '+' is converted to its hex
value %2B. Other characters similarly converted are & (to %26), % (to %25), and $ (to %24).
This conversion is automatic.
Using GET is not recommended, however. Some systems will truncate the URL before passing it
to the CGI program, and thus the QUERY_STRING environment variable will contain only a prefix
of the actual query string. Instead, you should use the POST method.
The POST query string is encoded in precisely the same form as the GET query string, but instead
of being passed in the URL and read into the QUERY_STRING variable, it is given to the CGI
program as standard input, which you can thus read using ANSI functions or regular character
reading functions. The only quirk is that the server will not send EOF at the end of the data.
Instead, the size of the string is passed in the environment variable CONTENT_LENGTH, which can
be accessed using the normal stdlib.h function:
char *value;
int length;

value = getenv("CONTENT_LENGTH");
sscanf(value, "%d", &length);
Decoding the data is thus just a question of walking through the input and picking out the values.
These values can then be used to determine what the user wants to see.
We have written a very simple, linear-search-based mechanism for parsing the input string.
It is located, as mentioned above, at cgiparse.c. You might want to cut and paste it into
your own code or use the .h file provided. You can use it in your CGI programs by calling
Initialize() at the beginning of your code, and then calling GetFirstValue(key) and
GetNextValue(key) to retrieve the bindings for each of the FORM parameters. See the
comments in the file for more details.
Server-Side Input Handling -- Java
Java handles GET and POST slightly differently. The parsing of the input is done for you by Java,
so you are separated from the actual format of the input data completely. Your program will be
an object subclassed off of HttpServlet, the generalized Java Servlet class for handling web
services.
Servlet programs must override the doGet() or doPost() methods, which are executed in
response to the client request. There are two arguments to these methods,
HttpServletRequest request and HttpServletResponse response. Let's take a look at a
very simple servlet program, the traditional HelloWorld (this time with a doGet method):
import java.io.*;
import java.text.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class Hello extends HttpServlet {


public void doGet(HttpServletRequest request,
HttpServletResponse response)
throws IOException, ServletException {
response.setContentType("text/html");
PrintWriter out = response.getWriter();
out.println("<html>");
out.println("<head>");
String title = "Hello World";
out.println("<title>" + title + "</title>");
out.println("</head>");
out.println("<body bgcolor=white>");
out.println("<h1>" + title + "</h1>");
String param = request.getParameter("param");

if (param != null)
out.println("Thanks for the lovely param='" + param + "' binding.");

out.println("");
out.println("");
}
}
We'll discuss points in this code again in the section on Java Output, but for now, we will focus
on the input side. The argument HttpServletRequest request represents the client request,
and the values of the parameters passed from the HTML FORM can be retrieved by calling the
HttpServletRequest getParameter method. This method takes as its argument the name of
the parameter (the name of the HTML INPUT object), and returns as a Java String the value
assigned to the parameter. In cases where the parameter may have multiple bindings, the method
getParameterValues can be used to retrieve the values in an array of Java Strings -- note that
getParameter will return the first value of this array. It is through these mechanisms that you
can retrieve any of the values entered or implicit in the form.
As might be inferred from the example above, Java returns null if the parameter whose name
you request does not have a value. Recall that unchecked buttons' bindings are not passed in a
POST message -- you can check for null to determine when buttons are off.

Returning Output to the User


In your project, you are going to be concerned with returning HTML documents to
the user. The documents will be dynamically created based on the output of the
query. You can format them however you like, using ordinary HTML formatting routines.

CGI Output
The only work you have to do apart from constructing an HTML document on the fly with the
output from the query is to add a short header at the top of the file. Your header will represent
the MIME type for HTML, and consists of a single line of text followed by a blank line:
content-type: text/html

<HTML> ... file ... </HTML>


There are, of course, many other types that you can return, but this is all you'll need to return
your database queries.
CGI returns the HTML document to the user through standard output from the program, so you
can just use a regular printf function in your C programs. The format for setting the content
type is just:
printf("content-type: text/html\n\n");

Java Output
Let's look back at our Java code example. You'll see a number of differences between the Servlet
code and the CGI approach. Output is all handled by the HttpServletResponse object, which
allows you to set the content type through the setContentType method. Instead of printing the
HTTP header yourself, you tell the HttpServletResponse object that you want the content type
to be "text/html" explicitly.
All HTML is returned to the user through a PrintWriter object, which is retrieved from the
response object using the getWriter method. HTML code is then returned line by line using
the println method.
We assume that you all have a basic background in Java, so we won't provide a detailed treatment
of exceptions here; do note, however, that IOException and ServletException both must either be
handled or thrown.

Sample Code and Coding Tips


I recommend that everyone attempt to play around a little bit with both of the
methods, Java Servlet and CGI, if you have the time and inclination (though you
only have to implement your database interface in one of them, of course).

CGI Sample Code


Here is a demonstration of a Pro*C CGI program.
You can also check out the source code.
The HTML page demonstrates a few input features, though the only ones that do anything are
the username and password fields. These are used to log onto your Oracle account when the CGI
program is executed, create a table, do some insertions, demonstrate the production of HTML
formatting through queries on the data (including a demonstration of constructing a new form,
which may provide some of you with ideas of how to make a really cool interface), and then
drop the table from your database. You may freely cannibalize whatever portions you find
useful.

CGI Setup
Your CGI script will be run from cgi-courses.stanford.edu. The URL for your CGI executable
will be: http://cgi-courses.stanford.edu/~username/cgi-bin/scriptname
You will need to perform the following actions before a CGI program will run:
• Get an account on cgi-courses.stanford.edu.
• Create a directory in your home folder to hold your cgi binary executables:
mkdir cgi-bin
• Set access levels on your cgi-bin directory for your cgi-courses.stanford.edu
account ("username" should be replaced with your username):
fs setacl cgi-bin username.cgi write
• Make sure that your executables correctly set all environment variables
(normally this is done by /usr/class/cs145/all.env, but this is not available on
the cgi machine, so you have to do it explicitly). Here is an example function
that you should run before attempting to connect to the database (this
function is in C, but you can pretty much just lift the settings and paste them
into Perl or PHP, which also need them to connect to the database):
void SetEnvs(void) {
putenv("ORACLE_SID=SHR1_PRD");
putenv("ORACLE_HOME=/usr/pubsw/apps/oracle/8.1.7");
putenv("ORACLE_TERM=xsun5");
putenv("TNS_ADMIN=/usr/class/cs145/sqlnet");
putenv("TWO_TASK=SHR1_PRD");
}
• Move the executable into your cgi-bin folder.
• Change the permissions on your cgi executable:
chmod 701 scriptname
• Use HTML forms to access your new program at http://cgi-
courses.stanford.edu/~username/cgi-bin/scriptname.
Here is the homepage of the leland CGI service, which has a FAQ and gives some information
about the capabilities of the system. Please check here first if your CGI programs are giving you
errors.

CGI Debugging
Due to popular demand, a new cgi debugging feature was just added to the cgi service. It's not in
the leland CGI docs yet. If you access your script like so:
http://cgi-courses/cgi-bin/sboxd/~username/scriptname
The script will execute with extra debug info:
• All STDERR goes to the browser
• A header is included, so lack of any output or lack of Content Type will not
cause Internal Server Error.
If you are still receiving an Internal Server Error, consult the cgi FAQ or look in the server log:
http://cgi-courses/logs/error_log.
Note that the log shows only the several most recent entries, due to system issues.
An alternative method is to run your cgi program from command-line, without using the web
browser. Put your CGI input into the environment variable QUERY_STRING and run your
program. For example (assuming your program is called cgiprog and expects two parameters
name1 and name2):
cd ~/cgi-bin
setenv QUERY_STRING 'name1=abc&name2=def'
cgiprog
Note: If you want to use debugging tools such as dbx or gdb, you need to modify Makefile to
add the flag -g after cc or g++.

Java Sample Code


You can provide your HTML FORMs on permanent webpages in your personal WWW directory
-- though this isn't recommended because you then have to hard code the Servlet addresses -- or
in the webpages subdirectory where you run your Servlet (see below in the Servlet setup
section). Alternatively (or additionally) you can integrate FORMs into Servlets by creating a
FORM on the fly in your Servlet program, which will be invoked when doPost() or doGet() are
invoked by the client. An example of a program that creates a FORM on the fly can be found at
RequestParamExample.java.
An example that uses JDBC to implement an interface for querying information about a certain
US state (based on the JDBC example programs provided in PDA assignment 5) can be found at
StateQuerier.java.
The following two examples also implement the state query, but they separate the query form from the
answer form, providing these services with two different Servlets: StateQueryForm.java and
StateQueryAns.java.
You can find the very simple example given above in the text at Hello.java.
One last example, demonstrating the concept of a Session (which we do not cover in this handout,
but which you can use to liven up your interface), can be found at HelloSession.java.

Java Compilation in Unix


Compiling Servlets in UNIX requires a few changes to your PATH and CLASSPATH
environment variables. These changes have been made for you in the source file
/afs/ir/class/cs145/all.env. They include the following additions:
setenv PATH /afs/ir/class/cs145/jsdk2.1:/usr/pubsw/apps/jdk1.2/bin:${PATH}
setenv CLASSPATH /afs/ir/class/cs145/jsdk2.1/servlet.jar:$CLASSPATH
If there are any difficulties, let us know. These have been tested on the elaine
machines and are assumed to be operational on the leland Sparc machines (elaine,
myth, epic, saga).

You also have to set up a specific directory structure to provide Servlets. The directory structure
required by Servlets is essentially:
[anydir]
  [servletdir]
    webpages
      WEB-INF
        servlets
A shell script to build this hierarchy is provided at
/afs/ir/class/cs145/code/bin/buildServletDirectory (after you run source
/afs/ir/class/cs145/all.env (which you probably should just add to your .cshrc file), you
can run buildServletDirectory by just typing the command).
You can store .html documents in your webpages directory, and they will be accessible at your
Servlet address (see below), while all Servlets you write have to be located in the servlets
directory to be recognized.
Further information on the Java Servlet API can be found at Servlet Package Documentation
page.

Servlet Setup
The directory structure for your servlets and HTML documents was outlined in the previous
section. Static HTML documents may be placed in the webpages directory and are accessible
from the web at the address http://machinexx:portnum/page.html, where machinexx refers
to the machine from which you're running the webserver (e.g. elaine12, saga22, myth7, etc.),
portnum is a specific port (see below), and page.html is the name of the HTML page that you
are serving. You may find it useful to create a static HTML document or a hierarchy of static
documents to serve as the jumping off point for your Servlets, where your HTML FORMs that
start the interaction with the database are found.
Servlets will be found in the directory servletdir/webpages/WEB-INF/servlets, and will just be
the .class files that you compile from your .java files using javac. These may be reached on the
web using the URL http://machinexx:portnum/servlet/servletname. Note that the servlet
directory is singular in the URL but plural in Unix, while the Servlet itself loses its .class in the
URL. HTML and other documents contained in the servlets directory cannot be accessed over
the web.
Once you have your directory set up and your Servlets compiled, you have to run the Java JSDK
2.1 webserver manually on a specific leland machine in order to provide these documents over
the web. The steps involved in starting the server are as follows:
• Choose a port number in the range 5000-65000. This will bind your server
application to that port for the machine on which you're running your server.
Try to choose a random number and remember it -- you will be the only
person on that machine who can use that port, and you will need it to have
access over the web.
• From the root of your servlet directory (if you run our buildServletDirectory
script, then it will be called servletdir), start the server by calling startserver
-port portnum from the Unix command line, where portnum is the port
number you chose above. The server will begin in the background, and you
can see it using the ps command. If you do not enter a port number, the
default port number, 8080, will be chosen for you (you can actually set the
default yourself -- after you've run the server once, it will create a
configuration file called "default.cfg" for you -- it finds the default port
number here).
• From your browser, enter the URL of a webpage or servlet contained in your
servletdir hierarchy using the address structure mentioned above. Now you
can play with your interface.
• If you would like to stop the server, issue the command stopserver.
• If you want to recompile your servlets, you have to stop the server and
restart it again. Static HTML pages that you are hosting from the webpages
directory, however, can be changed at will.

Handling Special Characters


The special characters &, <, and >, need to be escaped as &amp;, &lt;, and &gt;,
respectively in HTML text (see NCSA Beginner's Guide to HTML). Moreover, special
characters appearing in URL's need to be escaped, differently than when they
appear in HTML text. For example, if you link on text with special characters and
want to embed them into extended URLs as parameter values, you need to escape
them: convert space to + or %20, convert & to %26, convert = to %3D, convert %
to %25, etc. (In general, any special character can be escaped by a percent sign
followed by the character's hexadecimal ASCII value.) Important: Do NOT escape
the & that actually separates parameters! For example, if you want two parameters
p1 and p2 to have the values 3 and M&M, you should write something like:

http://cgi-courses.stanford.edu/~username/cgi-bin/cgiprog?p1=3&p2=M%26M
Be careful not to confuse the escape strings for HTML text with those for URL's.

This document was written by Nathan Folkert (with help from Vincent Chu) for Prof. Jennifer Widom's CS145 class in Spring
2000; revised by Calvin Yang for Prof. Widom's CS145 class in Spring 2002.

Oracle: Frequently Asked Questions


• What built-in functions/operators are available for manipulating strings?
• Can I print inside a PL/SQL program?
• Is it possible to write a PL/SQL procedure that takes a table name as input
and does something with that table?
• What is the correct syntax for ordering query results by row-type objects?
• How do I kill long-running queries in sqlplus, Pro*C, and JDBC?
• In Pro*C, why do I get a strange "break outside loop or switch" error
message?

What built-in functions/operators are available for manipulating strings?


The most useful ones are LENGTH, SUBSTR, INSTR, and ||:

• LENGTH(str) returns the length of str in characters.


• SUBSTR(str,m,n) returns a portion of str, beginning at character m, n
characters long. If n is omitted, all characters to the end of str will be
returned.
• INSTR(str1,str2,n,m) searches str1 beginning with its n-th character for the
m-th occurrence of str2 and returns the position of the character in str1 that
is the first character of this occurrence.
• str1 || str2 returns the concatenation of str1 and str2.
The example below shows how to convert a string name of the format 'last, first'
into the format 'first last':
SUBSTR(name, INSTR(name,',',1,1)+2)
|| ' '
|| SUBSTR(name, 1, INSTR(name,',',1,1)-1)
For case-insensitive comparisons, first convert both strings to all upper case using
Oracle's built-in function upper() (or all lower case using lower()).
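For instance, assuming a hypothetical table People with a single attribute name holding values in the 'last, first' format, the expression above can be used directly in a query, and upper() handles a case-insensitive match:
SELECT SUBSTR(name, INSTR(name,',',1,1)+2)
|| ' '
|| SUBSTR(name, 1, INSTR(name,',',1,1)-1)
FROM People
WHERE UPPER(name) = UPPER('Smith, John');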
Can I print inside a PL/SQL program?
Strictly speaking, PL/SQL doesn't currently support I/O. However, there is a standard
package, DBMS_OUTPUT, that lets you do the trick. Here is an example:

-- create the procedure


CREATE PROCEDURE nothing AS
BEGIN
DBMS_OUTPUT.PUT_LINE('I did nothing');
-- use TO_CHAR to convert variables/columns
-- to printable strings
END;
.
RUN;
-- set output on; otherwise you won't see anything
SET SERVEROUTPUT ON;
-- invoke the procedure
BEGIN
nothing;
END;
.
RUN;
Then you should see "I did nothing" printed on your screen.

DBMS_OUTPUT is very useful for debugging PL/SQL programs. However, if you print too much,
the output buffer will overflow (the default buffer size is 2KB). In that case, you can set the
buffer size to a larger value, e.g.:
BEGIN
DBMS_OUTPUT.ENABLE(10000);
nothing;
END;
.
RUN;

Is it possible to write a PL/SQL procedure that takes a table name as input and does something with that table?
For pure PL/SQL, the answer is no, because Oracle has to know the schema of the
table in order to compile the PL/SQL procedure. However, Oracle provides a
package called DBMS_SQL, which allows PL/SQL to execute SQL DML as well as DDL
dynamically at run time. For example, when called, the following stored procedure
drops a specified database table:

CREATE PROCEDURE drop_table (table_name IN VARCHAR2) AS


cid INTEGER;
BEGIN
-- open new cursor and return cursor ID
cid := DBMS_SQL.OPEN_CURSOR;
-- parse and immediately execute dynamic SQL statement
-- built by concatenating table name to DROP TABLE command
DBMS_SQL.PARSE(cid, 'DROP TABLE ' || table_name, dbms_sql.v7);
-- close cursor
DBMS_SQL.CLOSE_CURSOR(cid);
EXCEPTION
-- if an exception is raised, close cursor before exiting
WHEN OTHERS THEN
DBMS_SQL.CLOSE_CURSOR(cid);
-- reraise the exception
RAISE;
END drop_table;
.
RUN;

What is the correct syntax for ordering query results by row-type objects?
As a concrete example, suppose we have defined an object type PersonType with an
ORDER MEMBER FUNCTION, and we have created a table Person of PersonType objects.
Suppose you want to list all PersonType objects in Person in order. You'd probably
expect the following to work:

SELECT * FROM Person p ORDER BY p;


But it doesn't. Somehow, Oracle cannot figure out that you are ordering PersonType
objects. Here is a hack that works:

SELECT * FROM Person p ORDER BY DEREF(REF(p));

How do I kill long-running queries in sqlplus, Pro*C, and JDBC?


Sometimes it is necessary to stop long-running queries, either because they take
longer to run than you'd like, or because you realize you've made a mistake. It is
important to kill off such queries properly so that they don't take up extra
computational resources and prevent others from using the system, especially near
project deadlines when resources are most strained.

As a general precautionary measure, please be sure to test your queries at the sqlplus prompt
before running them through CGI or JDBC. It is much easier to kill a query in sqlplus than in
CGI or JDBC. If your test query takes a long time to run under sqlplus, you can simply hit Ctrl-
C to terminate it.
Never close an ssh or telnet or xterm window without properly logging out. Always quit your
programs (including sqlplus), stop Java servlets, and type "exit" or "logout" to quit. If you force-
close your ssh/telnet/xterm window, there may still be processes running in the background, and
you may be taking up system resources without knowing it.
If, for some reason, you cannot logout normally (for example, the system is not responding), you
should open another window, log in to the same machine where you have the problem, and kill
the processes that are causing trouble:
Type "ps -aef | grep [username]" to find the Process IDs of your processes (replace
[username] with your leland user name), and kill the processes you want to
terminate using "kill [processID]". Always use the "kill" command without the -9
flag first. Use -9 flag only if you cannot kill it otherwise.

If you closed the window by mistake and do not remember which sweet hall
machine you were logged into, open another window immediately and log into any
sweet hall machine, then type "sweetfinger [username]" (replace [username] with
your actual leland user name). It will give you the machine names you were on a
few minutes ago. Then, log in to the appropriate machine and kill your processes
there.

If you issued a query through JDBC that is taking a long time to execute and you want to kill it,
you should stop your Java servlet. In most cases this will kill the query. You can also use the
setQueryTimeout([time in seconds]) method on a statement object to stop queries that run too
long.
If you issued a query through CGI that is taking a long time to execute, normally the CGI service
will kill it for you within 10 seconds. However, the above occasionally fails to work, and we do
not know of any better way of killing runaway queries issued by JDBC or CGI (other than asking
the administrator to kill them for you). That's why we ask you to always test your queries under
sqlplus first. It is much easier to kill queries there.

In Pro*C, why do I get a strange "break outside loop or switch" error message?
If you get an error message

"break" outside loop or switch


when compiling your Pro*C program, chances are that you have the following statement somewhere
before a loop:
EXEC SQL WHENEVER NOT FOUND DO break;
After the loop, you should insert the following statement:
EXEC SQL WHENEVER NOT FOUND CONTINUE;
This would cancel the previous WHENEVER statement. If you do not do this, you may get the
error message at subsequent SQL calls.

This document was written originally by Jun Yang for CS145 in Spring, 1999. Additions by Antonios Hondroulis and Calvin
Yang in Spring, 2002

The Behavior of NULL's in SQL


• NULL Basics
• MAX()
• a >= ALL()
• EXCEPT
• NOT IN
• EXCEPT NULL
• Database Implementation Compliance
• Conclusion

Rarely is the full behavior of the NULL value in SQL taught or described in detail, and with
good reason: Some of the SQL rules surrounding NULL can be surprising or unintuitive.
Unfortunately, if you have to deal with NULL in real databases, the results can be downright
frustrating. The SQLite project, for example, uses trial and error to determine how a database
behaves in the presence of NULL values.
Fortunately, Date and Darwen's A Guide to the SQL Standard (fourth edition) [1] describes
SQL's rules concerning NULL in good detail.

NULL Basics
Intuitively, NULL approximately represents an unknown value.
• An arithmetic operation involving a NULL returns NULL. For example, NULL
minus NULL yields NULL, not zero. [2]
• A boolean comparison between two values involving a NULL returns neither
true nor false, but unknown in SQL's three-valued logic. [3] For example,
neither NULL equals NULL nor NULL not-equals NULL is true. Testing whether
a value is NULL requires an expression such as IS NULL or IS NOT NULL.
• An SQL query selects only values whose WHERE expression evaluates to true,
and groups whose HAVING clause evaluates to true.
• The aggregate COUNT(*) counts all NULL and non-NULL tuples;
COUNT(attribute) counts all tuples whose attribute value is not NULL. Other
SQL aggregate functions ignore NULL values in their computation. [4]
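These rules can be seen in a few one-line queries against a hypothetical table T(a) containing the values 1, 2, and NULL:
-- Arithmetic with NULL yields NULL for every row
SELECT a + NULL FROM T;
-- Returns no rows: the comparison evaluates to unknown, never to true
SELECT * FROM T WHERE a = NULL;
-- Returns the single row holding NULL
SELECT * FROM T WHERE a IS NULL;
-- Returns 3 and 2: COUNT(*) counts every row, COUNT(a) ignores the NULL
SELECT COUNT(*), COUNT(a) FROM T;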

A Simple Case
Here is what the SQL standard mandates for some operations involving sets and multisets.
For a simple relation R
CREATE TABLE R (a INTEGER);
the following queries attempt to reliably determine the maximum known value of
the attribute a in the table R.

For brevity below, let


• empty refer to an empty result of no rows, and
• NULL (in the context of a table or result) refer to a table with one row holding
a NULL value.
MAX()
This is the obvious query:
SELECT MAX(a) FROM R
• If R is empty, the query must return NULL, not empty. [5]
• If R is a one-row table holding NULL, Date and Darwen don't specifically
declare what MAX() should return if its argument consists of exclusively NULL
values.
• If R is a table holding NULL and non-NULL integers, NULLs are ignored, and
MAX() returns the maximum integer.

a >= ALL()
This expression of the maximum seems consistent with mathematical logic, but fails completely
in SQL:
SELECT DISTINCT a
FROM R
WHERE a >= ALL (SELECT * FROM R)
• If R is empty, the query returns empty. The >= ALL test is vacuously true with
an empty subquery [6], but there is no value of a to exploit the test.
• If R holds a NULL value, the query returns empty, because the test a >=
ALL(...) returns unknown (not false!) for any NULL or maximum non-NULL
integer value of a if the subquery includes a NULL value. [7]
EXCEPT
This expression is one derivation of maximum as computed in relational algebra: subtract all the
non-maximum values from the table R, leaving the maximal ones:
(SELECT DISTINCT * FROM R)
EXCEPT
(SELECT R.a
FROM R, R AS S
WHERE R.a < S.a)
• If R is empty, the query returns empty.
• If R holds a NULL value, the query returns NULL, in addition to whatever
maximal integer is present (if any). The lower subquery never includes NULL,
so NULL is never subtracted from R.
NOT IN
This expression is another writing of maximum as computed in relational algebra: find values not
in the non-maximum values of R:
SELECT DISTINCT *
FROM R
WHERE a NOT IN (SELECT R.a
FROM R, R AS S
WHERE R.a < S.a)
This writing turns out to be subtly different from the last one.
• If R is empty, the query returns empty.
• If R holds at most one integer, and at least one NULL value, the query returns NULL, in
addition to whatever maximal integer is present (if any). In this case, the
subquery is always empty; the available integer (if any) is compared only to
NULL values, so it does not participate in the subquery's result. NULL NOT IN
(empty) is vacuously true as it is in mathematics [8], so NULL is selected as
part of the result.
• If R holds more than one integer, the query returns the maximal integer. In
this case, the subquery includes at least one value. Now that the subquery is
not empty, NULL NOT IN (nonempty result) evaluates to unknown (not false!),
and is no longer selected as part of the result. As an aside, NULL NOT IN
(nonempty result) returns unknown even if the nonempty result includes
NULL. [9]
EXCEPT NULL
Because it is somewhat awkward to have an expression for MAX return two rows whose values do
not equal, the following expression adjusts the EXCEPT expression to exclude NULL from the
answer:
(SELECT DISTINCT * FROM R)
EXCEPT
(SELECT R.a
FROM R, R AS S
WHERE R.a < S.a OR R.a IS NULL)
• If R is empty, the query returns empty.
• If R holds NULL, the query returns the maximal integer, or empty if R has no
integers. EXCEPT will remove NULL from the result if NULL appears in the
bottom subquery, even though NULL is not equal to NULL. [10] Similarly,
DISTINCT, UNION, and INTERSECT always return at most one NULL.

Database Implementation Compliance


PostgreSQL 7.2.2
Where SQL mandates a behavior for a query above, PostgreSQL complies.
• MAX() of NULLs only: returns NULL (consistent with ignoring NULLs, then
computing the MAX() of an empty remainder).
Oracle 9i
Where SQL mandates a behavior for a query above, Oracle complies.
• MAX() of NULLs only: returns NULL (consistent with ignoring NULLs, then
computing the MAX() of an empty remainder).

Conclusion
No pair of the queries from the list above is equivalent when faced with NULLs in relational
data, despite their conceptual similarity.
The two implementations tested, PostgreSQL and Oracle, seem to comply with NULL behavior
for the set and multiset operations tested here, even when such behavior is sometimes subtle or
unintuitive.
Consider the above a good reason to define away NULLs from relational schemas whenever
possible.

Footnotes for this page

[1] C. J. Date and Hugh Darwen, A Guide to the SQL Standard. Fourth edition,
Addison-Wesley, Reading, Massachusetts, 1997. (ISBN 0-201-96426-0)

[2] page 236.

[3] page 239.

[4] page 236-237.

[5] page 237.

[6] page 176.

[7] page 244-245.

[8] page 176.

[9] page 244.

[10] page 249.


This document was written originally by Wang Lam for Prof. Jennifer Widom's CS145 class, Spring 2003.

Database Normalization Basics

If you've been working with databases for a while, chances are you've heard the term normalization. Perhaps someone's asked you
"Is that database normalized?" or "Is that in BCNF?" All too often, the reply is "Uh, yeah." Normalization is often brushed aside as a
luxury that only academics have time for. However, knowing the principles of normalization and applying them to your daily
database design tasks really isn't all that complicated and it could drastically improve the performance of your DBMS.

In this article, we'll introduce the concept of normalization and take a brief look at the most common normal forms. Future articles
will provide in-depth explorations of the normalization process.

What is Normalization?

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process:
eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make
sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database
consumes and ensure that data is logically stored.

The Normal Forms

The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as
normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five
(fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal
form is very rarely seen and won't be discussed in this article.
Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only.
Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when variations take
place, it's extremely important to evaluate any possible ramifications they could have on your system and account for possible
inconsistencies. That said, let's explore the normal forms.

First Normal Form (1NF)

First normal form (1NF) sets the very basic rules for an organized database:

• Eliminate duplicative columns from the same table.

• Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary
key).
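For example, here is a hypothetical sketch (the table and column names are invented for illustration). A customer table with repeating phone columns violates the first rule; moving the phone numbers to their own table, identified by a primary key, satisfies 1NF:

-- Not in 1NF: duplicative phone columns
CREATE TABLE CustomersWide (
customer_id NUMBER PRIMARY KEY,
name VARCHAR2(50),
phone1 VARCHAR2(20),
phone2 VARCHAR2(20),
phone3 VARCHAR2(20)
);

-- 1NF: one phone number per row, each row identified by a primary key
CREATE TABLE Customers (
customer_id NUMBER PRIMARY KEY,
name VARCHAR2(50)
);
CREATE TABLE CustomerPhones (
customer_id NUMBER,
phone VARCHAR2(20),
PRIMARY KEY (customer_id, phone)
);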

Second Normal Form (2NF)

Second normal form (2NF) further addresses the concept of removing duplicative data:

• Meet all the requirements of the first normal form.


• Remove subsets of data that apply to multiple rows of a table and place them in separate tables.

• Create relationships between these new tables and their predecessors through the use of foreign keys.
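As a hypothetical sketch (invented names), suppose each order line item repeats the customer's name. The name depends only on order_id, a subset of the composite key (order_id, product_id), so 2NF moves it into a separate table linked back by a foreign key:

-- Violates 2NF: customer_name depends on only part of the key
CREATE TABLE OrderItemsWide (
order_id NUMBER,
product_id NUMBER,
quantity NUMBER,
customer_name VARCHAR2(50),
PRIMARY KEY (order_id, product_id)
);

-- 2NF: the partially dependent column moves to its own table
CREATE TABLE Orders (
order_id NUMBER PRIMARY KEY,
customer_name VARCHAR2(50)
);
CREATE TABLE OrderItems (
order_id NUMBER REFERENCES Orders,
product_id NUMBER,
quantity NUMBER,
PRIMARY KEY (order_id, product_id)
);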

Third Normal Form (3NF)

Third normal form (3NF) goes one large step further:

• Meet all the requirements of the second normal form.


• Remove columns that are not dependent upon the primary key.
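Another hypothetical sketch (invented names): a publisher's city depends on the publisher, not on the book's key, so 3NF splits it out:

-- Violates 3NF: publisher_city depends on publisher, not on the key isbn
CREATE TABLE BooksWide (
isbn VARCHAR2(13) PRIMARY KEY,
title VARCHAR2(100),
publisher VARCHAR2(50),
publisher_city VARCHAR2(50)
);

-- 3NF: every remaining column depends only on the primary key
CREATE TABLE Publishers (
publisher VARCHAR2(50) PRIMARY KEY,
publisher_city VARCHAR2(50)
);
CREATE TABLE Books (
isbn VARCHAR2(13) PRIMARY KEY,
title VARCHAR2(100),
publisher VARCHAR2(50) REFERENCES Publishers
);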

Fourth Normal Form (4NF)

Finally, fourth normal form (4NF) has one additional requirement:

• Meet all the requirements of the third normal form.


• A relation is in 4NF if it has no multi-valued dependencies.
Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF
database.

If you'd like to ensure your database is normalized, explore our other articles in this series

COLUMN: Database tables are composed of individual columns corresponding to the attributes of the object.

ROW: In a relational database, a row consists of one set of attributes (or one tuple) corresponding to one instance of
the entity that a table schema describes. Also known as a record.

PRIMARY KEY: The primary key of a relational table uniquely identifies each record in the table. It can either be a
normal attribute that is guaranteed to be unique (such as Social Security Number in a table with no more than one record per
person) or it can be generated by the DBMS (such as a globally unique identifier, or GUID, in Microsoft SQL Server). Primary keys
may consist of a single attribute or multiple attributes in combination.

Examples:
Imagine we have a STUDENTS table that contains a record for each student at a university. The student's unique student ID
number would be a good choice for a primary key in the STUDENTS table. The student's first and last name would not be a good
choice, as there is always the chance that more than one student might have the same name.

For more information on keys, read the article Database Keys. For more on selecting appropriate primary keys for a table, read
Choosing a Primary Key.

DATABASE KEYS:

As you may already know, databases use tables to organize information. (If you don’t have a basic familiarity with database
concepts, read What is a Database?) Each table consists of a number of rows, each of which corresponds to a single database
record. So, how do databases keep all of these records straight? It’s through the use of keys.

Primary Keys

The first type of key we’ll discuss is the primary key. Every database table should have one or more columns designated as the
primary key. The value this key holds should be unique for each record in the database. For example, assume we have a table
called Employees that contains personnel information for every employee in our firm. We’d need to select an appropriate primary
key that would uniquely identify each employee. Your first thought might be to use the employee’s name.

This wouldn’t work out very well because it’s conceivable that you’d hire two employees with the same name. A better choice might
be to use a unique employee ID number that you assign to each employee when they’re hired. Some organizations choose to use
Social Security Numbers (or similar government identifiers) for this task because each employee already has one and they’re
guaranteed to be unique. However, the use of Social Security Numbers for this purpose is highly controversial due to privacy
concerns. (If you work for a government organization, the use of a Social Security Number may even be illegal under the Privacy
Act of 1974.) For this reason, most organizations have shifted to the use of unique identifiers (employee ID, student ID, etc.) that
don’t share these privacy concerns.

Once you decide upon a primary key and set it up in the database, the database management system will enforce the uniqueness
of the key. If you try to insert a record into a table with a primary key that duplicates an existing record, the insert will fail.
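
For example, assuming an Employees table whose employee_id column is the primary key (a hypothetical sketch, not an existing
schema), the second INSERT below would be rejected with a constraint violation:
CREATE TABLE Employees (
employee_id int PRIMARY KEY,
emp_name char(40)
);
INSERT INTO Employees VALUES(100, 'Jane Doe');
-- the next insert reuses employee_id 100, so the primary key constraint rejects it
INSERT INTO Employees VALUES(100, 'John Roe');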

Most databases are also capable of generating their own primary keys. Microsoft Access, for example, may be configured to use the
AutoNumber data type to assign a unique ID to each record in the table. While effective, this is a bad design practice because it
leaves you with a meaningless value in each record in the table. Why not use that space to store something useful?

Foreign Keys

The other type of key that we’ll discuss in this course is the foreign key. These keys are used to create relationships between
tables. Natural relationships exist between tables in most database structures. Returning to our employees database, let’s imagine
that we wanted to add a table containing departmental information to the database. This new table might be called Departments
and would contain a large amount of information about the department as a whole. We’d also want to include information about the
employees in the department, but it would be redundant to have the same information in two tables (Employees and Departments).
Instead, we can create a relationship between the two tables.

Let’s assume that the Departments table uses the Department Name column as the primary key. To create a relationship between
the two tables, we add a new column to the Employees table called Department. We then fill in the name of the department to
which each employee belongs. We also inform the database management system that the Department column in the Employees
table is a foreign key that references the Departments table. The database will then enforce referential integrity by ensuring that all
of the values in the Departments column of the Employees table have corresponding entries in the Departments table.
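
A sketch of this arrangement in SQL (redefining the hypothetical Employees table from above with its new department column; the
other column names are assumptions made for illustration) could look like this:
CREATE TABLE Departments (
department_name char(30) PRIMARY KEY,
office_location char(30)
);
CREATE TABLE Employees (
employee_id int PRIMARY KEY,
emp_name char(40),
department char(30),
FOREIGN KEY (department) REFERENCES Departments(department_name)
);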

Note that there is no uniqueness constraint for a foreign key. We may (and most likely do!) have more than one employee
belonging to a single department. Similarly, there’s no requirement that an entry in the Departments table have any corresponding
entry in the Employees table. It is possible that we'd have a department with no employees.

Candidate Key

Definition: A candidate key is a combination of attributes that can be used to uniquely identify a database record without including
any extraneous attributes. ...
CHOOSING A PRIMARY KEY: Databases depend upon keys to store, sort, and compare records. If you've been around databases
for a while, you’ve probably heard about many different types of keys – primary keys, candidate keys, and foreign keys. When you
create a new database table, you’re asked to select one primary key that will uniquely identify records stored in that table.

The selection of a primary key is one of the most critical decisions you’ll make in the design of a new database. The most important
constraint is that you must ensure that the selected key is unique. If it’s possible that two records (past, present, or future) may
share the same value for an attribute, it’s a poor choice for a primary key. When evaluating this constraint, you should think
creatively. Let’s consider a few examples that caused issues for real-world databases:

• ZIP Codes do not make good primary keys for a table of towns. If you're making a simple lookup table of cities, ZIP code
seems to be a logical primary key. However, upon further investigation, you may realize that more than one town may share a
ZIP code. For example, four cities in New Jersey (Neptune, Neptune City, Tinton Falls and Wall Township) all share the ZIP code
07753.
• Social Security Numbers do not make good primary keys for a table of people for many reasons. First, most people consider their
SSN private and don’t want it used in databases in the first place. Second, some people don’t have SSNs – especially those who
have never set foot in the United States! Third, SSNs may be reused after an individual’s death. Finally, an individual may have
more than one SSN over a lifetime – the Social Security Administration will issue a new number in cases of fraud or identity
theft.
So, what makes a good primary key? If you’re unable to find an obvious answer, turn to your database system for support. A best
practice in database design is to use an internally generated primary key. The database management system can normally
generate a unique identifier that has no meaning outside of the database system. For example, you might use the Microsoft Access
AutoNumber datatype to create a field called RecordID. The AutoNumber datatype automatically increments the field each time you
create a new record. While the number itself is meaningless, it provides a great way to reference an individual record in queries.
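
AutoNumber is specific to Access; the same idea can be sketched in other SQL databases with a sequence (the object names below
are invented, and the exact syntax varies by product):
CREATE SEQUENCE record_seq;
CREATE TABLE log_entries (
record_id int PRIMARY KEY,
description char(60)
);
-- each insert draws the next value from the sequence to use as its surrogate key
INSERT INTO log_entries VALUES(record_seq.NEXTVAL, 'first entry');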

Those are the basics on primary keys. Remember to choose carefully, as it’s difficult to change the primary key in a production
table. For a more in-depth look at all the types of database keys, read Database Keys.

DATABASE RELATIONSHIP: Definition: A relationship exists between two database tables when one table has a foreign key that
references the primary key of another table.
