Professional Documents
Culture Documents
Lecture 7
Outline
Relational Algebra (Section 6.1)
Relational Algebra
Formalism for creating new relations from existing ones Its place in the big picture:
Declarative query language SQL, relational calculus
Implementation
Relational Algebra
Five operators:
Union: ! Difference: Selection: s Projection: P Cartesian Product: "
R1 R2 Example:
AllEmployees ! RetiredEmployees
5
3. Selection
Returns all tuples which satisfy a condition Notation: s c(R) (sometimes also !) Examples
s Salary > 40000 (Employee) s name = Smith (Employee) condition
Selection Example Employee SSN 999999999 777777777 888888888 Name John Tony Alice DepartmentID 1 1 2 Salary 30,000 32,000 45,000
Find all employees with salary more than $40,000. s Salary > 40000 (Employee)
DepartmentID 2
Salary 45,000
8
4. Projection
Eliminates columns, then removes duplicates Notation: P A1,,An (R) (sometimes !) attributes Example:
project to social-security number and names: P SSN, Name (Employee) Output schema: Answer(SSN, Name)
[In SQL: SELECT DISTINCT SSN, Name FROM Employee]
Projection Example Employee SSN 999999999 777777777 888888888 SSN 999999999 777777777 888888888 Name John Tony Alice Name John Tony Alice
10
DepartmentID 1 1 2
5. Cartesian Product
Combine each tuple in R1 with each tuple in R2 Notation: R1 " R2 Example:
Employee " Dependents
Cartesian Product Example Employee Name John Tony Dependents EmployeeSSN 999999999 777777777 SSN 999999999 777777777 Dname Emily Joe
Employee x Dependents Name SSN EmployeeSSN John 999999999 999999999 John 999999999 777777777 Tony 777777777 999999999 Tony 777777777 777777777
12
Relational Algebra
Five operators:
Union: ! Difference: Selection: s Projection: P Cartesian Product: "
14
Renaming
Changes the schema, not the instance Schema: R(A1, , An ) Notation: r B1,,Bn (R) Example:
r LastName, SocSocNo (Employee) Output schema: Answer(LastName, SocSocNo)
[in SQL: SELECT Name AS LastName, SSN AS SocSocNo FROM Employee]
15
Renaming Example
Employee Name John Tony SSN 999999999 777777777
Natural Join
Notation: R1 ! R2 Meaning: R1 ! R2 = P A(s C(R1 " R2))
Equality on common attributes names
Where:
The selection s C checks equality of all common attributes The projection eliminates the duplicate common attributes
[in SQL: SELECT DISTINCT R1.A, R1. B, R2.C FROM R1, R2 WHERE R1.B = R2.B Schema: R1(A,B), R2(B,C)]
17
Natural Join Example Employee Name John Tony Dependents SSN 999999999 777777777 SSN 999999999 777777777 Dname Emily Joe
Employee Dependents = P Name, SSN, Dname(s SSN=SSN2(Employee x r SSN2, Dname(Dependents)) Name John Tony SSN 999999999 777777777 Dname Emily Joe
18
Natural Join
R=
A X X Y Z B Y Z Z V
S=
B Z V Z
C U W V
R ! S=
A X X Y Y Z
B Z Z Z Z V
C U V U V W
19
Natural Join
Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ! S ? Given R(A, B, C), S(D, E), what is R ! S ? Given R(A, B), S(A, B), what is R ! S ?
20
Theta Join
A join that involves a predicate R1 ! q R2 = s q (R1 " R2) Here q can be any condition
21
Eq-join
A theta join where q is an equality R1 !A=B R2 = s A=B (R1 " R2) Equality on two different attributes Example:
Employee !SSN=SSN Dependents
Semijoin
R " S = P A1,,An (R ! S) Where A1, , An are the attributes in R Example:
Employee " Dependents
23
Semijoin - Example
Employee Name EmpId Harry 3415 Sally 2241 George 3401 Harriet 2202
DeptName Finance Sales Finance Sales
Employee ! Dept Name EmpId DeptName Sally 2241 Sales Harriet 2202 Sales
24
network
Employee !ssn=ssn (s
R = Employee "!T
age>71 (Dependents))
T = P SSN s
age>71 (Dependents)
25
Answer = R ! Dependents
Complex RA Expressions
P name
buyer-ssn=ssn pid=pid
seller-ssn=ssn
P ssn s name=fred
Person Purchase
P pid s name=gizmo
Person Product 26
sid=sid sid=sid
rating > 5
Reserves
Sailors
The earlier we process selections, the fewer tuples we need to manipulate higher up in the tree (predicate pushdown) 27 Disadvantages?
C"(R)
Operations on Bags
A bag = a set with repeated elements All operations need to be dened carefully on bags {a,b,b,c}!{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f} {a,b,b,b,c,c} {b,c,c,c,d} = {a,b,b} s C(R): preserve the number of occurrences P A(R): no duplicate elimination Cartesian product, join: no duplicate elimination
Find all direct and indirect relatives of Fred Cannot express in RA !!! Need to write C program
30
Conclusion
Relational algebra is important for SQL We mainly need projects, selections, joins Joins are typically time consuming for large databases
31