Joe Celko's SQL for Smarties: Advanced SQL Programming

Ebook, 1,290 pages

About this ebook

SQL for Smarties was hailed as the first book devoted explicitly to the advanced techniques needed to transform an experienced SQL programmer into an expert. Now, 10 years later and in its third edition, this classic still reigns supreme as the book written by an SQL master that teaches future SQL masters. These are not just tips and techniques; Joe also offers the best solutions to old and new challenges and conveys the way you need to think in order to get the most out of SQL programming efforts for both correctness and performance. In the third edition, Joe features new examples and updates to SQL-99, expanded sections on query techniques, and a new section on schema design, with the same war-story teaching style that made the first and second editions of this book classics.
  • Expert advice from a noted SQL authority and award-winning columnist, who has given ten years of service to the ANSI SQL standards committee and many more years of dependable help to readers of online forums.
  • Teaches scores of advanced techniques that can be used with any product, in any SQL environment, whether it is an SQL-92 or SQL-99 environment.
  • Offers tips for working around system deficiencies.
  • Continues to use war stories--updated!--that give insights into real-world SQL programming challenges.
Language: English
Release date: July 26, 2010
ISBN: 9780080460048
Author

Joe Celko

Joe Celko served 10 years on the ANSI/ISO SQL Standards Committee and contributed to the SQL-89 and SQL-92 Standards. Mr. Celko is the author of a series of books on SQL and RDBMS for Elsevier/MKP. He is an independent consultant based in Austin, Texas. He has written over 1200 columns in the computer trade and academic press, mostly dealing with data and databases.


Book preview

Database Design

This chapter discusses the DDL (Data Definition Language), which is used to create a database schema. It is related to the next chapter on the theory of database normalization. Most bad queries start with a bad schema. To get data out of the bad schema, you have to write convoluted code, and you are never sure if it did what it was meant to do.

One of the major advantages of databases, relational and otherwise, was that the data could be shared among programs so that an enterprise could use one trusted source for information. Once the data was separated from the programs, we could build tools to maintain, back up, and validate the data in one place, without worrying about hundreds or even thousands of application programs possibly working against each other.

SQL has spawned a whole branch of data modeling tools devoted to designing its schemas and tables. Most of these tools use a graphic or text description of the rules and the constraints on the data to produce a schema declaration statement that can be used directly in a particular SQL product. It is often assumed that a CASE tool will automatically prevent you from creating a bad design. This is simply not true.

Bad schema design leads to weird queries that are trying to work around the flaws. These flaws can include picking the wrong data types, denormalization, and missing or incorrect constraints. As Elbert Hubbard (American author, 1856–1915) put it: "Genius may have its limitations, but stupidity is not thus handicapped."

1.1 Schema and Table Creation

The major problem in learning SQL is that programmers are used to thinking in terms of files rather than tables.

Programming languages are usually based on some underlying model; if you understand the model, the language makes much more sense. For example, FORTRAN is based on algebra. This does not mean that FORTRAN is exactly like algebra. But if you know algebra, FORTRAN does not look all that strange to you. You can write an expression in an assignment statement or make a good guess as to the names of library functions you have never seen before.

Programmers are used to working with files in almost every other programming language. The design of files was derived from paper forms; they are very physical and very dependent on the host programming language. A COBOL file could not easily be read by a FORTRAN program, and vice versa. In fact, it was hard to share files even among programs written in the same programming language!

The most primitive form of a file is a sequence of records, ordered within the file and referenced by physical position. You open a file, then read a first record, followed by a series of next records until you come to the last record to raise the end-of-file condition. You navigate among these records and perform actions one record at a time. The actions you take on one file have no effect on other files that are not in the same program. Only programs can change files.

The model for SQL is data kept in sets, not in physical files. The unit of work in SQL is the whole schema, not individual tables.

Sets are those mathematical abstractions you studied in school. Sets are not ordered, and the members of a set are all of the same type. When you perform an operation on a set, the action happens all at once to the entire membership of the set. That is, if I ask for the subset of odd numbers from the set of positive integers, I get all of them back as a single set. I do not build the set of odd numbers by sequentially inspecting one element at a time. I define odd numbers with a rule ("If the remainder is 1 when you divide the number by 2, it is odd") that could test any integer and classify it. Parallel processing is one of many, many advantages of having a set-oriented model.

SQL is not a perfect set language any more than FORTRAN is a perfect algebraic language, as we will see. But if you are in doubt about something in SQL, ask yourself how you would specify it in terms of sets, and you will probably get the right answer.

1.1.1 CREATE SCHEMA Statement

A CREATE SCHEMA statement, defined in the SQL Standard, brings an entire schema into existence all at once. In practice, each product has very different utility programs to allocate physical storage and define a schema. Much of the proprietary syntax is concerned with physical storage allocations.

A schema must have a name and a default character set, usually ASCII or a simple Latin alphabet as defined in the ISO Standards. There is an optional AUTHORIZATION clause that holds an <authorization identifier> for security. After that, the schema is a list of schema elements:
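
The skeleton of the statement, in the Standard's own BNF style, looks something like this (exact details vary by edition of the Standard and by product):

```sql
CREATE SCHEMA <schema name>
    [AUTHORIZATION <authorization identifier>]
    [DEFAULT CHARACTER SET <character set name>]
    [<schema element>...];
```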

A schema is the skeleton of an SQL database; it defines the structures of the schema objects and the rules under which they operate. The data is the meat on that skeleton.

The only data structure in SQL is the table. Tables can be persistent (base tables), used for working storage (temporary tables), or virtual (VIEWs, common table expressions, and derived tables). The differences among these types are in implementation, not performance. One advantage of having only one data structure is that the results of all operations are also tables; you never have to convert structures, write special operators, or deal with any irregularity in the language.

The <grant statement> has to do with limiting user access to certain schema elements. The <assertion definition> is not widely implemented yet, but it works as a constraint that applies to the schema as a whole. Finally, the <character set definition>, <collation definition>, and <translation definition> deal with the display of data. We are not really concerned with any of these schema objects; they are usually set in place by the DBA (database administrator) for the users, and we mere programmers do not get to change them.

Conceptually, a table is a set of zero or more rows, and a row is a set of one or more columns. Each column has a specific data type and constraints that make up an implementation of an abstract domain. The way a table is physically implemented does not matter, because you only access it with SQL. The database engine handles all the details for you and you never worry about the internals, as you would with a physical file.

In fact, almost no two SQL products use the same internal structures. SQL Server uses physically contiguous storage accessed by two kinds of indexes; Teradata uses hashing; Nucleus (SAND Technology) uses compressed bit vectors; Informix and CA-Ingres use more than a dozen different kinds of indexes.

There are two common conceptual errors made by programmers who are accustomed to file systems or PCs. The first is thinking that a table is a file; the second is thinking that a table is a spreadsheet. Tables do not behave like either, and you will get surprises if you do not understand the basic concepts.

It is easy to imagine that a table is a file, a row is a record, and a column is a field. This concept is familiar, and when data moves from SQL to the host language, it must be converted into host language data types and data structures to be displayed and used.

The big differences between working with a file system and working with SQL are in the way SQL fits into a host program. If you are using a file system, your programs must open and close files individually. In SQL, the whole schema is connected to or disconnected from the program as a single unit. The host program might not be authorized to see or manipulate all of the tables and other schema objects, but that is established as part of the connection.

The program defines fields within a file, whereas SQL defines its columns in the schema. FORTRAN uses the FORMAT and READ statements to get data from a file. Likewise, a COBOL program uses a Data Division to define the fields and a READ to fetch it. And so it goes for every 3GL’s programming; the concept is the same, though the syntax and options vary.

A file system lets you reference the same data by a different name in each program. If a file’s layout changes, you must rewrite all the programs that use that file. When a file is empty, it looks exactly like all other empty files. When you try to read an empty file, the EOF (end of file) flag pops up and the program takes some action. Column names and data types in a table are defined within the database schema. Within reasonable limits, the tables can be changed without the knowledge of the host program.

The host program only worries about transferring the values to its own variables from the database. Remember the empty set from your high school math class? It is still a valid set. When a table is empty, it still has columns, but has zero rows. There is no EOF flag to signal an exception, because there is no final record.

Another major difference is that tables and columns can have constraints attached to them. A constraint is a rule that defines what must be true about the database after each transaction. In this sense, a database is more like a collection of objects than a traditional passive file system.

A table is not a spreadsheet, even though they look very similar when you view them on a screen or in a printout. In a spreadsheet you can access a row, a column, a cell, or a collection of cells by navigating with a cursor. A table has no concept of navigation. Cells in a spreadsheet can store instructions, not just data. There is no real difference between a row and column in a spreadsheet; you could flip them around completely and still get valid results. This is not true for an SQL table.

1.1.2 Manipulating Tables

The three basic table statements in the SQL DDL are CREATE TABLE, DROP TABLE, and ALTER TABLE. They pretty much do what you would think they do from their names: they bring a table into existence, remove a table, and change the structure of an existing table in the schema, respectively. We will explain them in detail shortly. Here is a simple list of rules for creating and naming a table.

1. The table name must be unique in the schema, and the column names must be unique within a table. SQL can handle a table and a column with the same name, but it is a good practice to name tables differently from their columns. (See items 4 and 6 in this list.)

2. The names in SQL can consist of letters, underscores, and digits. Vendors commonly allow other printing characters, but it is a good idea to avoid using anything except letters, underscores, and digits. Special characters are not portable and will not sort the same way in different products.

3. Standard SQL allows you to use spaces, reserved words, and special characters in a name if you enclose them in double quotation marks, but this should be avoided as much as possible.

4. The use of collective, class, or plural names for tables helps you think of them as sets. For example, do not name a table Employee unless there really is only one employee; use something like Employees or (better) Personnel for the table name.

5. Use the same name for the same attribute everywhere in the schema. That is, do not name a column in one table sex and a column in another table gender when they refer to the same property. You should have a data dictionary that enforces this on your developers.

6. Use singular attribute names for columns and other scalar schema objects. (I have a separate book on SQL programming style that goes into more detail about this, so I will not mention it again.)

7. A table must have at least one column. Though it is not required, it is also a good idea to place related columns in their conventional order in the table. By default, the columns will print out in the order in which they appear in the table. That means you should put name, address, city, state, and ZIP code in that order, so that you can read them easily in a display.

The conventions in this book are that keywords are in UPPERCASE, table names are Capitalized, and column names are in lowercase. I also use capital letter(s) followed by digit(s) for correlation names (e.g., the table Personnel would have correlation names P0, P1,…, Pn), where the digit shows the occurrence.

DROP TABLE

The DROP TABLE statement removes a table from the database. This is not the same as making the table an empty table. When a schema object is dropped, it is gone forever. The syntax of the statement is:
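
In the Standard, the general form is (many products make the <drop behavior> optional or omit it entirely):

```sql
DROP TABLE <table name> {RESTRICT | CASCADE};
```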

The <drop behavior> clause has two options. If RESTRICT is specified, the table cannot be referenced in the query expression of any view or in the search condition of any constraint. This clause is supposed to prevent the unpleasant surprise of having other things fail because they depended on this particular table for their own definitions. If CASCADE is specified, then such referencing objects will also be dropped along with the table.

If the product does not support the clause, either the particular SQL product would post an error message, and in effect do a RESTRICT, or you would find out about any dependencies by having your database blow up when it ran into constructs that needed the missing table.

The DROP keyword and <drop behavior> clause are also used in other statements that remove schema objects, such as DROP VIEW, DROP SCHEMA, DROP CONSTRAINT, and so forth.

This is usually a DBA-only statement that, for obvious reasons, programmers are not typically allowed to use.

ALTER TABLE

The ALTER TABLE statement adds, removes, or changes columns and constraints within a table. This statement is in Standard SQL; it existed in most SQL products before it was standardized. It is still implemented in many different ways, so you should check your product's documentation for details. Again, your DBA will not want you to use this statement without permission. The Standard SQL syntax looks like this:
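
A condensed sketch of the Standard's alternatives (one action per statement):

```sql
ALTER TABLE <table name>
    ADD [COLUMN] <column definition>
  | ALTER [COLUMN] <column name> <alter column action>
  | DROP [COLUMN] <column name> {RESTRICT | CASCADE}
  | ADD <table constraint definition>
  | DROP CONSTRAINT <constraint name> {RESTRICT | CASCADE};
```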

The DROP COLUMN clause removes the column from the table. Standard SQL gives you the option of setting the drop behavior, which most current products do not. The two options are RESTRICT and CASCADE. RESTRICT will not allow the column to disappear if it is referenced in another schema object. CASCADE will also delete any schema object that references the dropped column.

When this statement is available in your SQL product, I strongly advise that you first use the RESTRICT option to see if there are references before you use the CASCADE option.

As you might expect, the ADD COLUMN clause extends the existing table by putting another column on it. The new column must have a name that is unique within the table and that follows the other rules for a valid column declaration. The location of the new column is usually at the end of the list of the existing columns in the table.

The ALTER COLUMN clause can change a column and its definition. Exactly what is allowed will vary from product to product, but usually the data type can be changed to a compatible data type [e.g., you can make a CHAR (n) column longer, but not shorter; change an INTEGER to a REAL; and so forth].

The ADD clause lets you put a constraint on a table. Be careful, though, and find out whether your SQL product will check the existing data to be sure that it can pass the new constraint. It is possible in some older SQL products to leave bad data in the tables, and then you will have to clean them out with special routines to get to the actual physical storage.

The DROP CONSTRAINT clause requires that the constraint be given a name, so naming constraints is a good habit to get into. If the constraint to be dropped was given no name, you will have to find what name the SQL engine assigned to it in the schema information tables and use that name. The Standard does not say how such names are to be constructed, only that they must be unique within a schema. Actual products usually pick a long random string of digits and preface it with some letters to make a valid name that is so absurd no human being would think of it. A constraint name will also appear in warnings and error messages, making debugging much easier. The <drop behavior> option behaves as it did for the DROP COLUMN clause.

CREATE TABLE

The CREATE TABLE statement does all the hard work. The basic syntax looks like the following, but there are actually more options we will discuss later.
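
Stripped to its essentials, the statement is a comma-separated list of column definitions, optionally followed by table constraint definitions:

```sql
CREATE TABLE <table name>
(<column definition> [, <column definition>]...
 [, <table constraint definition>]...);
```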

The table definition includes data in the column definitions and rules for handling that data in the table constraint definitions. As a result, a table acts more like an object (with its data and methods) than like a simple, passive file.

Column Definitions

Beginning SQL programmers often fail to take full advantage of the options available to them, and they pay for it with errors or extra work in their applications. A column is not like a simple passive field in a file system. It has more than just a data type associated with it.

The first important thing to notice here is that each column must have a data type, which it keeps unless you ALTER the table. The SQL Standard offers many data types, because SQL must work with many different host languages. The data types fall into three major categories: numeric, character, and temporal data types. We will discuss the data types and their rules of operation in other sections; they are fairly obvious, so not knowing the details will not stop you from reading the examples that follow.

DEFAULT Clause

The default clause is an underused feature, whose syntax is:
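
In SQL-92 terms, a sketch of the clause and its options:

```sql
DEFAULT <default option>

<default option> ::= <literal>
                   | <datetime value function>   -- e.g., CURRENT_DATE, CURRENT_TIMESTAMP
                   | USER | CURRENT_USER | SESSION_USER | SYSTEM_USER
                   | NULL
```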

The SQL:2003 Standard also added CURRENT_PATH and other session values to this list.

Whenever the SQL engine does not have an explicit value to put into this column during an insertion, it will look for a DEFAULT clause and insert that value. The default option can be a literal value of the relevant data type, the current timestamp, the current date, the current user identifier, or so forth. If you do not provide a DEFAULT clause and the column is NULL-able, the system will provide a NULL as the default. If all that fails, you will get an error message about missing data.

This approach is a good way to make the database do a lot of work that you would otherwise have to code into all the application programs. The most common tricks are to use a zero in numeric columns; a string to encode a missing value ('unknown') or a true default ('same address') in character columns; and the system timestamp to mark transactions.

1.1.3 Column Constraints

Column constraints are rules attached to a table. All the rows in the table are validated against them. File systems have nothing like this, since validation is done in the application programs. Column constraints are also one of the most underused features of SQL, so you will look like a real wizard if you can master them.

Constraints can be given a name and some attributes. The SQL engine will use the constraint name to alter the column and to display error messages.

A deferrable constraint can be "turned off" during a transaction. The initial state tells you whether to enforce it at the start of the transaction or to wait until the end of the transaction, before the COMMIT. Only certain combinations of these attributes make sense.

If INITIALLY DEFERRED is specified, then the constraint has to be DEFERRABLE.

If INITIALLY IMMEDIATE is specified or implicit and neither DEFERRABLE nor NOT DEFERRABLE is specified, then NOT DEFERRABLE is implicit.

The transaction statement can then use the following statement to set the constraints as needed.
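
The statement names the constraints (or all of them) and flips their enforcement mode:

```sql
SET CONSTRAINTS {ALL | <constraint name list>} {DEFERRED | IMMEDIATE};

-- Typical use at the start of a multi-statement transaction:
SET CONSTRAINTS ALL DEFERRED;
```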

This feature was new with full SQL-92, and it is not widely implemented in the smaller SQL products. In effect, they use ‘NOT DEFERRABLE INITIALLY IMMEDIATE’ on all the constraints.

NOT NULL Constraint

The most important column constraint is the NOT NULL, which forbids the use of NULLs in a column. Use this constraint routinely, and remove it only when you have good reason. It will help you avoid the complications of NULL values when you make queries against the data. The other side of the coin is that you should provide a DEFAULT value to replace the NULL that would have been created.

The NULL is a special marker in SQL that belongs to all data types. SQL is the only language that has such a creature; if you can understand how it works, you will have a good grasp of SQL. It is not a value; it is a marker that holds a place where a value might go. But it must be cast to a data type for physical storage.

A NULL means that we have a missing, unknown, miscellaneous, or inapplicable value in the data. It can mean many other things, but just consider those four for now. The problem is which of these four possibilities the NULL indicates depends on how it is used. To clarify this, imagine that I am looking at a carton of Easter eggs and I want to know their colors. If I see an empty hole, I have a missing egg, which I hope will be provided later. If I see a foil-wrapped egg, I have an unknown color value in my set. If I see a multicolored egg, I have a miscellaneous value in my set. If I see a cue ball, I have an inapplicable value in my set. The way you handle each situation is a little different.

When you use NULLs in math calculations, they propagate in the results so that the answer is another NULL. When you use them in logical expressions or comparisons, they return a logical value of UNKNOWN and give SQL its strange three-valued logic. They sort either always high or always low in the collation sequence. They group together for some operations, but not for others. In short, NULLs cause a lot of irregular features in SQL, which we will discuss later. Your best bet is just to memorize the situations and the rules for NULLs when you cannot avoid them.

CHECK() Constraint

The CHECK () constraint tests the rows of the table against a logical expression, which SQL calls a search condition, and rejects rows whose search condition returns FALSE. However, the constraint accepts rows when the search condition returns TRUE or UNKNOWN. This is not the same rule as the WHERE clause, which rejects rows that test UNKNOWN. The reason for this benefit-of-the-doubt feature is so that it will be easy to write constraints on NULL-able columns.

The usual technique is to do simple range checking, such as CHECK (rating BETWEEN 1 AND 10), or to verify that a column's value is in an enumerated set, such as CHECK (sex IN (0, 1, 2, 9)), with this constraint. Remember that the sex column could also be set to NULL, unless a NOT NULL constraint is also added to the column's declaration. Although it is optional, it is a really good idea to use a constraint name. Without it, most SQL products will create a huge, ugly, unreadable random string for the name, since they need to have one in the schema tables. If you provide your own, you can drop the constraint more easily and understand the error messages when the constraint is violated.

For example, you can enforce the rule that a firm must not hire anyone younger than 21 years of age for a job that requires a liquor-serving license by using a single CHECK() clause to compare the applicant's birth date and hire date. However, you cannot put the current system date into the CHECK() clause logic, for the obvious reason that it is always changing.
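
A sketch of such a constraint, with hypothetical table and column names (interval arithmetic syntax varies by product):

```sql
CREATE TABLE Applicants
(applicant_name CHAR(30) NOT NULL PRIMARY KEY,
 birth_date DATE NOT NULL,
 hire_date DATE NOT NULL,
 -- Must be at least 21 years old on the day of hiring.
 CONSTRAINT over_21_at_hiring
   CHECK (hire_date >= birth_date + INTERVAL '21' YEAR));
```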

The real power of the CHECK () clause comes from writing complex expressions that verify relationships with other rows, with other tables, or with constants. Before SQL-92, the CHECK () constraint could only reference columns in the table in which it was declared. In Standard SQL, the CHECK () constraint can reference any schema object. As an example of how complex things can get, consider a database of movies.

First, let’s enforce the rule that no country can export more than ten titles.
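
The original table declarations are not shown here; assume a hypothetical Exports table keyed on titles and countries. Full SQL-92 allows a subquery inside a CHECK(), so the rule can be sketched as:

```sql
CREATE TABLE Exports
(movie_title CHAR(30) NOT NULL,
 origin_country CHAR(2) NOT NULL,
 destination_country CHAR(2) NOT NULL,
 PRIMARY KEY (movie_title, destination_country),
 -- No country of origin may export more than ten titles.
 CONSTRAINT export_limit
   CHECK (NOT EXISTS
          (SELECT origin_country
             FROM Exports
            GROUP BY origin_country
           HAVING COUNT(movie_title) > 10)));
```

Most current products do not support subqueries in CHECK() constraints; a TRIGGER is the usual workaround.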

When doing a self-join, you must use the base table name and not all correlation names. Let’s make sure no movies from different countries have the same title.
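
Again assuming the hypothetical Exports table, a self-join sketch of that rule (note the base table name Exports in the FROM clause, alongside the correlation names):

```sql
ALTER TABLE Exports
  ADD CONSTRAINT unique_titles
  -- Reject two rows with the same title but different origins.
  CHECK (NOT EXISTS
         (SELECT *
            FROM Exports AS E1, Exports AS E2
           WHERE E1.movie_title = E2.movie_title
             AND E1.origin_country <> E2.origin_country));
```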

Here is a way to enforce the rule that you cannot export a movie to its own country of origin.
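
This rule needs only a row-level test on the assumed columns, so no subquery is required:

```sql
ALTER TABLE Exports
  ADD CONSTRAINT no_self_export
  CHECK (origin_country <> destination_country);
```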

These table-level constraints often use a NOT EXISTS () predicate. Despite the fact that you can often do a lot of work in a single constraint, it is better to write a lot of small constraints, so that you know exactly what went wrong when one of them is violated.

Another important point to remember is that all constraints are TRUE if the table is empty. If you need a rule that holds even when the table is empty, it is handled by the CREATE ASSERTION statement, which we will discuss shortly.

UNIQUE and PRIMARY KEY Constraints

The UNIQUE constraint says that no duplicate values are allowed in the column or columns involved.

File system programmers understand the concept of a PRIMARY KEY, but for the wrong reasons. They are used to a file, which can have only one key because that key is used to determine the physical order of the records within the file. Tables have no order; the term PRIMARY KEY in SQL has to do with defaults in referential actions, which we will discuss later.

There are some subtle differences between UNIQUE and PRIMARY KEY. A table can have only one PRIMARY KEY but many UNIQUE constraints. A PRIMARY KEY is automatically declared to have a NOT NULL constraint on it, but a UNIQUE column can have NULLs in a row unless you explicitly add a NOT NULL constraint. Adding the NOT NULL whenever possible is a good idea, as it makes the column into a proper relational key.

There is also a multiple-column form of the UNIQUE and PRIMARY KEY constraints, which is usually written at the end of the column declarations. It is a list of columns in parentheses after the proper keyword, and it means that the combination of those columns is unique. For example, I might declare PRIMARY KEY (city, department) so I can be sure that though I have offices in many cities and many identical departments in those offices, there is only one personnel department in Chicago.

REFERENCES Clause

The REFERENCES clause is the simplest version of a referential constraint definition, which can be quite tricky. For now, let us just consider the simplest case:
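
In its single-column form, the clause is attached to a column declaration:

```sql
<column definition> REFERENCES <referenced table> [(<column name>)]
```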

This relates two tables together, so it is different from the other options we have discussed so far. What this says is that the value in this column of the referencing table must appear somewhere in the referenced table’s column named in the constraint. Furthermore, the referenced column must be in a UNIQUE constraint. For example, you can set up a rule that the Orders table can have orders only for goods that appear in the Inventory table.
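
A sketch of that rule, with hypothetical Inventory and Orders tables:

```sql
CREATE TABLE Inventory
(part_nbr INTEGER NOT NULL PRIMARY KEY,
 part_name CHAR(20) NOT NULL);

CREATE TABLE Orders
(order_nbr INTEGER NOT NULL PRIMARY KEY,
 -- Every ordered part must exist in Inventory.
 part_nbr INTEGER NOT NULL
     REFERENCES Inventory (part_nbr),
 order_qty INTEGER NOT NULL);
```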

If no <column name> is given, then the PRIMARY KEY column of the referenced table is assumed to be the target. This is one of those situations where the PRIMARY KEY is important, but you can always play it safe and explicitly name a column. There is no rule to prevent several columns from referencing the same target column. For example, we might have a table of flight crews that has pilot and copilot columns that both reference a table of certified pilots.

A circular reference is a relationship in which one table references a second table, which in turn references the first table. The old gag about "you cannot get a job until you have experience, and you cannot get experience until you have a job!" is the classic version of this.

Notice that the columns in a multicolumn FOREIGN KEY must match to a multicolumn PRIMARY KEY or UNIQUE constraint. The syntax is:
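
The table-level form takes parenthesized column lists on both sides:

```sql
FOREIGN KEY (<column list>)
    REFERENCES <referenced table> [(<column list>)]
```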

Referential Actions

The REFERENCES clause can have two subclauses that take actions when a database event changes the referenced table. This feature came with Standard SQL and took a while to be implemented in most SQL products. The two database events are updates and deletes, and the subclauses look like this:
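
Each subclause names the event and the action to take:

```sql
[ON UPDATE {CASCADE | SET NULL | SET DEFAULT | NO ACTION}]
[ON DELETE {CASCADE | SET NULL | SET DEFAULT | NO ACTION}]
```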

When the referenced table is changed, one of the referential actions is set in motion by the SQL engine.

The CASCADE option will change the values in the referencing table to the new value in the referenced table. This is a very common method of DDL programming that allows you to set up a single table as the trusted source for an identifier. This way the system can propagate changes automatically.

This removes one of the arguments for nonrelational system-generated surrogate keys. In early SQL products that were based on a file system for their physical implementation, the values were repeated for both the referenced and referencing tables. Why? The tables were regarded as separate units, like files.

Later SQL products regarded the schema as a whole. The referenced values appeared once in the referenced table, and the referencing tables obtained them by following pointer chains to that one occurrence in the schema. The results are much faster update cascades, a physically smaller database, faster joins, and faster aggregations.

The SET NULL option will change the values in the referencing table to a NULL. Obviously, the referencing column needs to be NULL-able.

The SET DEFAULT option will change the values in the referencing table to the default value of that column. Obviously, the referencing column needs to have some DEFAULT declared for it, but each referencing column can have its own default in its own table.

The NO ACTION option explains itself. Nothing is changed in the referencing table, and it is possible that some error message about reference violation will be raised. If a referential constraint does not specify any ON UPDATE or ON DELETE rule in the update rule, then NO ACTION is implicit. You will also see the reserved word RESTRICT in some products instead of NO ACTION.

Standard SQL has more options about how matching is done between the referenced and referencing tables. Most SQL products have not implemented them, so I will not mention them anymore.

Standard SQL has deferrable constraints. This option lets the programmer turn a constraint off during a session, so that the table can be put into a state that would otherwise be illegal. However, at the end of a session, all the constraints are enforced. Many SQL products have implemented these options, and they can be quite handy, but I will not mention them until we get to the section on transaction control.

1.1.4 UNIQUE Constraints versus UNIQUE Indexes

UNIQUE constraints are not the same thing as UNIQUE indexes. Technically speaking, indexes do not even exist in Standard SQL. They were considered too physical to be part of a logical model of a language. In practice, however, virtually all products have some form of access enhancement for the DBA to use, and most often, it is an index.

The column referenced by a FOREIGN KEY has to be either a PRIMARY KEY or a column with a UNIQUE constraint; a unique index on the same set of columns cannot be referenced, since the index is on one table and not a relationship between two tables.

Although there is no order to a constraint, an index is ordered, so the unique index might be an aid for sorting. Some products construct special index structures for the declarative referential integrity (DRI) constraints, which in effect pre-JOIN the referenced and referencing tables.

All the constraints can be defined as equivalent to some CHECK constraint. For example:
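
A sketch, using a hypothetical table Foobar with key column foo_key; full SQL-92 allows the subquery form, though most products do not:

```sql
-- NOT NULL as a CHECK() constraint:
CHECK (foo_key IS NOT NULL)

-- UNIQUE (foo_key) as a CHECK() constraint:
CHECK (NOT EXISTS
       (SELECT foo_key
          FROM Foobar
         GROUP BY foo_key
        HAVING COUNT(*) > 1))
```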

These predicates can be reworded in terms of other predicates and subquery expressions, and then passed on to the optimizer.

1.1.5 Nested UNIQUE Constraints

One of the basic tricks in SQL is representing a one-to-one or many-to-many relationship with a table that references the two (or more) entity tables involved by their primary keys. This third table has several popular names, such as junction table or join table, but we know that it is a relationship. This type of table needs constraints to ensure that the relationships work properly.

For example, here are two tables:
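
A sketch of two such entity tables, with names chosen to match the columns used below (the data type sizes are illustrative):

```sql
CREATE TABLE Boys
(boy_name CHAR(30) NOT NULL PRIMARY KEY);

CREATE TABLE Girls
(girl_name CHAR(30) NOT NULL PRIMARY KEY);
```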

Yes, I know using names for a key is a bad practice, but it will make my examples easier to read. There are a lot of different relationships that we can make between these two tables. If you don’t believe me, just watch the Jerry Springer Show sometime. The simplest relationship table looks like this:
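
A sketch of that first attempt; notice that it has no key of its own:

```sql
CREATE TABLE Couples
(boy_name CHAR(30) NOT NULL
     REFERENCES Boys (boy_name),
 girl_name CHAR(30) NOT NULL
     REFERENCES Girls (girl_name));
```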

The Couples table allows us to insert rows like this:
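
For example (the rows are illustrative):

```sql
INSERT INTO Couples VALUES ('Joe Celko', 'Hilary Duff');
INSERT INTO Couples VALUES ('Joe Celko', 'Hilary Duff');  -- duplicate!
INSERT INTO Couples VALUES ('Alec Baldwin', 'Hilary Duff');
```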

Oops! I am shown twice with Hilary Duff, because the Couples table does not have its own key. This mistake is easy to make, but the way to fix it is not obvious.

The Orgy table gets rid of the duplicated rows and makes this a proper table. The primary key for the table is made up of two or more columns and is called a compound key because of that fact. These are valid rows now.
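
A sketch of such a table, with illustrative rows:

```sql
CREATE TABLE Orgy
(boy_name CHAR(30) NOT NULL
     REFERENCES Boys (boy_name),
 girl_name CHAR(30) NOT NULL
     REFERENCES Girls (girl_name),
 PRIMARY KEY (boy_name, girl_name));  -- compound key

INSERT INTO Orgy VALUES ('Joe Celko', 'Hilary Duff');
INSERT INTO Orgy VALUES ('Joe Celko', 'Lindsay Lohan');
INSERT INTO Orgy VALUES ('Alec Baldwin', 'Hilary Duff');
```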

But the only restriction on the couples is that they appear only once. Every boy can be paired with every girl, much to the dismay of the Moral Majority. I think I want to make a rule that guys can have as many gals as they want, but the gals have to stick to one guy.

The way I do this is to use a NOT NULL UNIQUE constraint on the girl_name column, which makes it a key. It is a simple key, since it is only one column, but it is also a nested key, because it appears as a subset of the compound PRIMARY KEY.
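
A sketch of the table with the nested key:

```sql
CREATE TABLE Playboys
(boy_name CHAR(30) NOT NULL
     REFERENCES Boys (boy_name),
 -- The nested key: each girl may appear only once in the whole table.
 girl_name CHAR(30) NOT NULL UNIQUE
     REFERENCES Girls (girl_name),
 PRIMARY KEY (boy_name, girl_name));
```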

The Playboys is a proper table, without duplicated results, but it also enforces the condition that I get to play around with one or more
