
Relational Algebra

Lecture 7

Outline
Relational Algebra (Section 6.1)

Relational Algebra
Formalism for creating new relations from existing ones
Its place in the big picture:
Declarative query language (SQL, relational calculus)
Algebra (relational algebra)
Implementation

Relational Algebra
Five operators:
Union: ∪
Difference: −
Selection: σ
Projection: Π
Cartesian product: ×

Derived or auxiliary operators:
Intersection, complement
Joins (natural, equi-join, theta join, semi-join)
Renaming: ρ

1. Union and 2. Difference


R1 ∪ R2
Example:
ActiveEmployees ∪ RetiredEmployees

R1 − R2
Example:
AllEmployees − RetiredEmployees

What about Intersection?


It is a derived operator
R1 ∩ R2 = R1 − (R1 − R2)
Also expressed as a join (will see later)
Example:
UnionizedEmployees ∩ RetiredEmployees
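
The identity above can be checked on a small instance. The following is a minimal sketch that models relations as Python sets of tuples; the employee names are made-up sample data, not part of the lecture.

```python
# Relations modeled as sets of tuples (set semantics, so no duplicates).
# The names below are made-up sample data.
unionized = {("Alice",), ("Bob",), ("Carol",)}
retired = {("Bob",), ("Carol",), ("Dave",)}

union = unionized | retired        # R1 ∪ R2
difference = unionized - retired   # R1 − R2

# Intersection expressed with difference only: R1 − (R1 − R2).
derived_intersection = unionized - (unionized - retired)

print(sorted(derived_intersection))
```

Python's built-in `&` operator computes the same intersection directly, which is why intersection counts as a derived operator rather than a basic one.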

3. Selection
Returns all tuples which satisfy a condition
Notation: σ c (R) (other symbols are sometimes used)
Examples:
σ Salary > 40000 (Employee)
σ name = "Smith" (Employee)

The condition c can use the comparisons =, <, ≤, >, ≥, <>


[in SQL: SELECT * FROM Employee WHERE Salary > 40000]

Selection Example

Employee
SSN        Name   DepartmentID  Salary
999999999  John   1             30,000
777777777  Tony   1             32,000
888888888  Alice  2             45,000

Find all employees with salary more than $40,000.
σ Salary > 40000 (Employee)

SSN        Name   DepartmentID  Salary
888888888  Alice  2             45,000
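
The selection example can be sketched in a few lines of Python. This is only an illustration: rows are modeled as dictionaries, and the helper name `select` is an assumption made for this sketch.

```python
# Employee rows from the example table; each row is a dict keyed by attribute.
employee = [
    {"SSN": "999999999", "Name": "John", "DepartmentID": 1, "Salary": 30000},
    {"SSN": "777777777", "Name": "Tony", "DepartmentID": 1, "Salary": 32000},
    {"SSN": "888888888", "Name": "Alice", "DepartmentID": 2, "Salary": 45000},
]

def select(relation, condition):
    """Selection: keep exactly the rows for which condition(row) is true."""
    return [row for row in relation if condition(row)]

# σ Salary > 40000 (Employee)
high_paid = select(employee, lambda row: row["Salary"] > 40000)
print(high_paid)
```

Selection never changes the schema; it only filters rows.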

4. Projection
Eliminates columns, then removes duplicates
Notation: Π A1,…,An (R) (other symbols are sometimes used)
A1,…,An are attributes of R
Example: project to social-security number and names:
Π SSN, Name (Employee)
Output schema: Answer(SSN, Name)
[In SQL: SELECT DISTINCT SSN, Name FROM Employee]

Projection Example

Employee
SSN        Name   DepartmentID  Salary
999999999  John   1             30,000
777777777  Tony   1             32,000
888888888  Alice  2             45,000

Π SSN, Name (Employee)

SSN        Name
999999999  John
777777777  Tony
888888888  Alice
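
Projecting onto SSN and Name does not show the duplicate-removal step, because SSN is unique. The sketch below, with a hypothetical helper named `project`, also projects onto DepartmentID, where two of the three rows collapse into one.

```python
# Employee rows from the example table.
employee = [
    {"SSN": "999999999", "Name": "John", "DepartmentID": 1, "Salary": 30000},
    {"SSN": "777777777", "Name": "Tony", "DepartmentID": 1, "Salary": 32000},
    {"SSN": "888888888", "Name": "Alice", "DepartmentID": 2, "Salary": 45000},
]

def project(relation, attributes):
    """Projection: keep only the given attributes, then drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # set semantics: each output row appears once
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

ssn_name = project(employee, ["SSN", "Name"])    # Π SSN, Name (Employee)
departments = project(employee, ["DepartmentID"])  # Π DepartmentID (Employee)
print(departments)
```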

5. Cartesian Product
Combine each tuple in R1 with each tuple in R2
Notation: R1 × R2
Example:
Employee × Dependents

Very rare in practice; mainly used to express joins


[In SQL: SELECT * FROM R1, R2]

Cartesian Product Example

Employee
Name  SSN
John  999999999
Tony  777777777

Dependents
EmployeeSSN  Dname
999999999    Emily
777777777    Joe

Employee × Dependents
Name  SSN        EmployeeSSN  Dname
John  999999999  999999999    Emily
John  999999999  777777777    Joe
Tony  777777777  999999999    Emily
Tony  777777777  777777777    Joe

Note the output

Cartesian Product and SQL


select * from Employee, Dependents;
+------+-----------+-------------+-------+
| Name | SSN       | EmployeeSSN | Dname |
+------+-----------+-------------+-------+
| John | 999999999 | 999999999   | Emily |
| Tony | 777777777 | 999999999   | Emily |
| John | 999999999 | 777777777   | Joe   |
| Tony | 777777777 | 777777777   | Joe   |
+------+-----------+-------------+-------+

select * from Employee, Dependents where SSN=EmployeeSSN;
+------+-----------+-------------+-------+
| Name | SSN       | EmployeeSSN | Dname |
+------+-----------+-------------+-------+
| John | 999999999 | 999999999   | Emily |
| Tony | 777777777 | 777777777   | Joe   |
+------+-----------+-------------+-------+
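
The two queries can be mirrored in Python as a rough sketch: `itertools.product` builds the Cartesian product, and a filter on top of it plays the role of the WHERE clause. Tuples here are positional, which is a simplification chosen for this sketch.

```python
from itertools import product

# Rows from the example tables, as positional tuples.
employee = [("John", "999999999"), ("Tony", "777777777")]     # (Name, SSN)
dependents = [("999999999", "Emily"), ("777777777", "Joe")]   # (EmployeeSSN, Dname)

# Employee × Dependents: every employee tuple paired with every dependent tuple.
cross = [e + d for e, d in product(employee, dependents)]

# Selection SSN = EmployeeSSN applied to the product, i.e. a join.
joined = [row for row in cross if row[1] == row[2]]

for row in joined:
    print(row)
```

The product has 2 × 2 = 4 rows and the filter keeps 2 of them. The row order differs from the SQL output above, which is fine because relations are unordered.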

Relational Algebra
Five operators:
Union: ∪
Difference: −
Selection: σ
Projection: Π
Cartesian product: ×

Derived or auxiliary operators:
Intersection, complement
Joins (natural, equi-join, theta join, semi-join)
Renaming: ρ

Renaming
Changes the schema, not the instance
Schema: R(A1, …, An)
Notation: ρ B1,…,Bn (R)
Example:
ρ LastName, SocSocNo (Employee)
Output schema: Answer(LastName, SocSocNo)
[in SQL: SELECT Name AS LastName, SSN AS SocSocNo FROM Employee]

Renaming Example
Employee
Name  SSN
John  999999999
Tony  777777777

ρ LastName, SocSocNo (Employee)

LastName  SocSocNo
John      999999999
Tony      777777777

Natural Join
Notation: R1 ⋈ R2
Meaning: R1 ⋈ R2 = Π A (σ C (R1 × R2))
Equality on common attribute names

Where:
The selection σ C checks equality of all common attributes
The projection Π A eliminates the duplicate common attributes
[in SQL: SELECT DISTINCT R1.A, R1.B, R2.C FROM R1, R2 WHERE R1.B = R2.B
Schema: R1(A,B), R2(B,C)]

Natural Join Example

Employee
Name  SSN
John  999999999
Tony  777777777

Dependents
SSN        Dname
999999999  Emily
777777777  Joe

Employee ⋈ Dependents =
Π Name, SSN, Dname (σ SSN=SSN2 (Employee × ρ SSN2, Dname (Dependents)))

Name  SSN        Dname
John  999999999  Emily
Tony  777777777  Joe

Natural Join
R =
A  B
X  Y
X  Z
Y  Z
Z  V

S =
B  C
Z  U
V  W
Z  V

R ⋈ S =
A  B  C
X  Z  U
X  Z  V
Y  Z  U
Y  Z  V
Z  V  W
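
The R and S example can be reproduced with a small natural-join sketch. The function name `natural_join` and the convention of passing attribute names alongside the tuples are assumptions made for this illustration; the nested loop is the simplest possible strategy, not how a real engine would evaluate the join.

```python
def natural_join(r, r_attrs, s, s_attrs):
    """Natural join: match tuples on all common attributes, keep each once."""
    common = [a for a in r_attrs if a in s_attrs]
    s_only = [a for a in s_attrs if a not in common]
    out_attrs = list(r_attrs) + s_only
    result = set()
    for rt in r:
        r_row = dict(zip(r_attrs, rt))
        for st in s:
            s_row = dict(zip(s_attrs, st))
            # The selection step: equality on every common attribute.
            if all(r_row[a] == s_row[a] for a in common):
                # The projection step: common attributes are emitted once.
                result.add(rt + tuple(s_row[a] for a in s_only))
    return out_attrs, result

R = {("X", "Y"), ("X", "Z"), ("Y", "Z"), ("Z", "V")}   # schema (A, B)
S = {("Z", "U"), ("V", "W"), ("Z", "V")}               # schema (B, C)

attrs, joined = natural_join(R, ("A", "B"), S, ("B", "C"))
print(attrs)
for row in sorted(joined):
    print(row)
```

The tuple (X, Y) of R has no partner in S with B = Y, so it does not appear in the result.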

Natural Join
Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?
Given R(A, B, C), S(D, E), what is R ⋈ S?
Given R(A, B), S(A, B), what is R ⋈ S?
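
One way to check answers to these questions is to compute the output schema mechanically. The helper below is a hypothetical sketch that treats a schema as a sequence of single-letter attribute names.

```python
def natural_join_schema(r_attrs, s_attrs):
    """Attributes of R ⋈ S: all of R's, then those of S not already in R."""
    return list(r_attrs) + [a for a in s_attrs if a not in r_attrs]

# Common attributes A and C appear only once in the output.
print(natural_join_schema("ABCD", "ACE"))
# No common attributes: the join degenerates to the Cartesian product.
print(natural_join_schema("ABC", "DE"))
# All attributes in common: the join degenerates to the intersection.
print(natural_join_schema("AB", "AB"))
```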

Theta Join
A join that involves a predicate
R1 ⋈θ R2 = σ θ (R1 × R2)
Here θ can be any condition

Equi-join
A theta join where θ is an equality
R1 ⋈A=B R2 = σ A=B (R1 × R2)
Equality on two different attributes
Example:
Employee ⋈SSN=SSN Dependents

Most useful join in practice (how does it differ from the natural join?)

Semijoin
R ⋉ S = Π A1,…,An (R ⋈ S)
Where A1, …, An are the attributes in R
Example:
Employee ⋉ Dependents

Semijoin - Example
Employee
Name     EmpId  DeptName
Harry    3415   Finance
Sally    2241   Sales
George   3401   Finance
Harriet  2202   Sales

Dept
DeptName    Manager
Sales       Harriet
Production  Charles

Employee ⋉ Dept
Name     EmpId  DeptName
Sally    2241   Sales
Harriet  2202   Sales

Only attributes of Employee are selected
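
The semijoin example can be sketched as a membership test: an Employee tuple is kept when some Dept tuple agrees with it on the common attribute DeptName. Positional tuples are a simplification chosen for this sketch.

```python
# Rows from the example tables.
employee = [
    ("Harry", 3415, "Finance"),
    ("Sally", 2241, "Sales"),
    ("George", 3401, "Finance"),
    ("Harriet", 2202, "Sales"),
]                                                        # (Name, EmpId, DeptName)
dept = [("Sales", "Harriet"), ("Production", "Charles")]  # (DeptName, Manager)

# Values of the common attribute that occur in Dept.
dept_names = {d[0] for d in dept}

# Employee ⋉ Dept: employees with a matching department, Employee columns only.
semijoin = [e for e in employee if e[2] in dept_names]

for row in semijoin:
    print(row)
```

No Manager column shows up in the result, and each employee appears at most once no matter how many Dept tuples match.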

Semijoins in Distributed Databases


Semijoins are used in distributed databases

Employee(SSN, Name, ...) and Dependents(SSN, Dname, Age, ...) are stored at different sites, connected by a network.

Query: Employee ⋈ssn=ssn (σ age>71 (Dependents))

T = Π SSN (σ age>71 (Dependents))
R = Employee ⋉ T
Answer = R ⋈ssn=ssn (σ age>71 (Dependents))
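
The semijoin strategy can be simulated in one process. In the sketch below the two lists stand in for the two sites, all rows are made-up sample data, and the final join is taken against the filtered dependents so that the result matches the query exactly.

```python
# Made-up sample data; imagine each list living at a different site.
employee = [("111", "Ann"), ("222", "Ben"), ("333", "Cy")]   # (SSN, Name)
dependents = [
    ("111", "Gran", 80),
    ("111", "Kid", 9),
    ("222", "Tot", 3),
    ("333", "Pa", 75),
]                                                            # (SSN, Dname, Age)

# At the Dependents site: filter, then ship only the SSN column.
old = [d for d in dependents if d[2] > 71]
T = {d[0] for d in old}

# At the Employee site: R = Employee ⋉ T, then ship R back.
R = [e for e in employee if e[0] in T]

# At the Dependents site: join the reduced relation with the filtered dependents.
answer = [e + d[1:] for e in R for d in old if e[0] == d[0]]

# Reference result: evaluate the query directly with all data in one place.
direct = [e + d[1:] for e in employee for d in dependents
          if e[0] == d[0] and d[2] > 71]

print(answer)
```

Only T and R cross the network, and R contains just the employees that can contribute to the answer.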

Complex RA Expressions
[Figure: expression tree with root Π name above the joins ⋈ buyer-ssn=ssn, ⋈ pid=pid and ⋈ seller-ssn=ssn; its inputs are Π ssn (σ name=fred (Person)), Purchase, Person, and Π pid (σ name=gizmo (Product)).]

Application: Query Rewriting for Optimization


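Original plan:
Π sname (σ bid=100 ∧ rating > 5 (Reserves ⋈ sid=sid Sailors))

Rewritten plan:
Π sname ((σ bid=100 (Reserves)) ⋈ sid=sid (σ rating > 5 (Sailors)))
Each selection is a scan whose result is written to a temporary relation: T1 for σ bid=100 (Reserves), T2 for σ rating > 5 (Sailors).

The earlier we process selections, the fewer tuples we need to manipulate higher up in the tree (predicate pushdown).
Disadvantages?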
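
The effect of pushing selections down can be observed on a toy instance. The Sailors and Reserves rows below are made-up sample data, and the nested-loop joins are only a stand-in for whatever join algorithm an engine would actually use.

```python
# Made-up sample data.
sailors = [(1, "Dustin", 7), (2, "Lubber", 8), (3, "Rusty", 3), (4, "Horatio", 9)]  # (sid, sname, rating)
reserves = [(1, 100), (2, 101), (3, 100), (4, 100), (4, 102)]                        # (sid, bid)

# Plan 1: join everything first, then apply both selections.
joined = [(s, r) for s in sailors for r in reserves if s[0] == r[0]]
plan1 = sorted(s[1] for s, r in joined if r[1] == 100 and s[2] > 5)

# Plan 2: apply each selection first (temporaries T1 and T2), then join.
t1 = [r for r in reserves if r[1] == 100]
t2 = [s for s in sailors if s[2] > 5]
pushed = [(s, r) for s in t2 for r in t1 if s[0] == r[0]]
plan2 = sorted(s[1] for s, r in pushed)

print(plan1, len(joined))
print(plan2, len(pushed))
```

Both plans return the same names, but the join in the second plan produces fewer intermediate tuples. The price is the extra work of materializing T1 and T2.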

Algebraic Laws (Examples)


Commutative and Associative Laws
R ∪ S = S ∪ R, R ∪ (S ∪ T) = (R ∪ S) ∪ T
R ⋈ S = S ⋈ R, R ⋈ (S ⋈ T) = (R ⋈ S) ⋈ T

Laws involving selection
σ C AND C' (R) = σ C (σ C' (R)) = σ C (R) ∩ σ C' (R)
σ C (R ⋈ S) = σ C (R) ⋈ S
When C involves only attributes of R

Laws involving projections
Π M (Π N (R)) = Π M (R), when M ⊆ N
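
The law for a conjunctive selection can be sanity-checked on a concrete relation. This is a sketch under set semantics with an arbitrary made-up relation and two arbitrary conditions; a check on one instance illustrates the law but does not prove it.

```python
# A made-up binary relation: all pairs (x, y) with 0 <= x, y < 5.
R = {(x, y) for x in range(5) for y in range(5)}

def select(relation, condition):
    """Selection under set semantics."""
    return {t for t in relation if condition(t)}

def c1(t):
    return t[0] > 1   # condition C

def c2(t):
    return t[1] < 3   # condition C'

both = select(R, lambda t: c1(t) and c2(t))   # σ C AND C' (R)
cascade = select(select(R, c2), c1)           # σ C (σ C' (R))
intersect = select(R, c1) & select(R, c2)     # σ C (R) ∩ σ C' (R)

print(both == cascade == intersect, len(both))
```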

Operations on Bags
A bag = a set with repeated elements
All operations need to be defined carefully on bags
{a,b,b,c} ∪ {a,b,b,b,e,f,f} = {a,a,b,b,b,b,b,c,e,f,f}
{a,b,b,b,c,c} − {b,c,c,c,d} = {a,b,b}
σ C (R): preserve the number of occurrences
Π A (R): no duplicate elimination
Cartesian product, join: no duplicate elimination

Important: Relational Engines work on bags, not sets!
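
Python's `collections.Counter` behaves like a bag, so the two examples can be replayed with it. In this sketch `+` adds multiplicities (bag union) and `-` subtracts them, dropping any element whose count would fall to zero or below (bag difference).

```python
from collections import Counter

# Bag union: multiplicities add up.
bag_union = Counter("abbc") + Counter("abbbeff")

# Bag difference: multiplicities are subtracted, never going below zero.
bag_difference = Counter("abbbcc") - Counter("bcccd")

print(sorted(bag_union.elements()))
print(sorted(bag_difference.elements()))
```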

Finally: RA has Limitations!

Cannot compute transitive closure

Name1  Name2  Relationship
Fred   Mary   Father
Mary   Joe    Cousin
Mary   Bill   Spouse
Nancy  Lou    Sister

Find all direct and indirect relatives of Fred
Cannot express in RA! Need to write a C program
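
A fixed relational algebra expression contains a fixed number of joins, so it can only follow chains of a bounded length; a program with a loop has no such bound. The sketch below uses Python rather than C, and it assumes that relationships are followed only in the direction Name1 to Name2.

```python
# (Name1, Name2) pairs from the example table; the Relationship column is ignored.
relatives = [("Fred", "Mary"), ("Mary", "Joe"), ("Mary", "Bill"), ("Nancy", "Lou")]

def reachable(edges, start):
    """Everyone reachable from start via one or more Name1 -> Name2 steps."""
    found = set()
    frontier = {start}
    while frontier:
        # One more join step: people directly related to the current frontier.
        step = {b for a, b in edges if a in frontier} - found
        found |= step
        frontier = step   # the loop stops once a step adds nobody new
    return found

print(sorted(reachable(relatives, "Fred")))
```

Each loop iteration corresponds to one additional join with the relation, and the number of iterations depends on the data rather than on the query text.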

Conclusion
Relational algebra is important for SQL
We mainly need projections, selections, joins
Joins are typically time consuming for large databases
