Outline
§ Vertical fragmentation
• Grouping, Splitting
§ Information Requirements
§ Clustering algorithm
• Bond Energy Algorithm (BEA)
§ Partitioning algorithm
Vertical Fragmentation
§ Vertical fragmentation of a relation R produces
fragments R1, R2,…, Rr, each of which contains a
subset of R’s attributes as well as the primary key of R.
§ Objective
• To partition a relation into a set of smaller relations so that
many of the user applications will run on only one
fragment.
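A minimal Python sketch of the definition, assuming a hypothetical relation R(K, A1, A2, A3) held in memory as a list of dicts with primary key K. The names `fragment` and `reconstruct` are illustrative helpers, not part of any DBMS API. Each fragment keeps K, so the tuples can be merged back by key, which is equivalent to a join on K when every key value appears in every fragment.

```python
# Hypothetical relation R(K, A1, A2, A3); "K" is the primary key.
R = [
    {"K": 1, "A1": "x", "A2": 10, "A3": True},
    {"K": 2, "A1": "y", "A2": 20, "A3": False},
]


def fragment(rows, key, attrs):
    """Vertical fragment: project every tuple onto the key plus the chosen attributes."""
    return [{key: r[key], **{a: r[a] for a in attrs}} for r in rows]


def reconstruct(key, *fragments):
    """Merge fragments tuple-by-tuple on the key (a join on K for lossless fragments)."""
    merged = {}
    for frag in fragments:
        for r in frag:
            merged.setdefault(r[key], {}).update(r)
    return [merged[k] for k in sorted(merged)]


R1 = fragment(R, "K", ["A1"])
R2 = fragment(R, "K", ["A2", "A3"])
print(R1)
print(reconstruct("K", R1, R2) == R)
```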
Vertical Fragmentation
§ Has been studied within the centralized context:
• Design methodology
– Allows user queries to deal with smaller relations, thus causing fewer page accesses
• Physical clustering
– The most ‘active’ subrelations are identified and placed in a faster memory subsystem
Vertical Fragmentation
§ Grouping
• Results in overlapping fragments
§ Splitting
• Generates fragments that do not overlap in their non-primary-key attributes
• Fits the top-down design methodology
VF – Information requirements
§ Application Information
• Attribute affinities
– A measure of how closely related the attributes are
– Obtained from more primitive usage data
• Attribute usage values and access frequencies
– Given a set of queries Q = {q1, q2, …, qq} that will run on the relation R[A1, A2, …, An], use(qi, Aj) = 1 if query qi references attribute Aj, and 0 otherwise
VF – Definition of use(qi,Aj)
Consider 4 queries q1–q4 on relation PROJ with the following use(qi, Aj) values:
      A1  A2  A3  A4
q1     1   0   1   0
q2     0   1   1   0
q3     0   1   0   1
q4     0   0   1   1
VF – Access Characteristics
§ Attribute usage values are not sufficient
• They do not reflect how frequently the applications run
§ Frequency measure
• aff(Ai, Aj)
• Measures the bond between two attributes according to
how they are accessed by applications
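The measure is commonly written as below. This follows the usual textbook formulation; ref_l(qk), the number of accesses to (Ai, Aj) per execution of qk at site Sl, and acc_l(qk), the access frequency of qk at Sl, are notation not introduced elsewhere in these slides. With ref_l(qk) = 1, aff(Ai, Aj) is simply the sum, over every query that uses both attributes, of that query's frequencies at all sites.

```latex
\mathrm{aff}(A_i, A_j) = \sum_{k \,:\, use(q_k, A_i) = 1 \,\wedge\, use(q_k, A_j) = 1} \; \sum_{l} \mathrm{ref}_l(q_k)\,\mathrm{acc}_l(q_k)
```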
Attribute Affinity Matrix
§ An n × n matrix with the attributes of the relation on both axes; entry (i, j) holds aff(Ai, Aj)
Query Characteristics
§ Suppose three sites access the “PROJ” relation with the following relative frequencies (for example, q1, which uses attribute BUDGET, is issued from S3 10 times a day):
      S1   S2   S3
q1    15   20   10
q2     5    0    0
q3    25   25   25
q4     3    0    0
Matrix M describes how frequently each query is issued from each site
VF – Calculation of aff(Ai, Aj)
Q Matrix:
      A1  A2  A3  A4
q1     1   0   1   0
q2     0   1   1   0
q3     0   1   0   1
q4     0   0   1   1
Access Frequencies:
      S1  S2  S3
q1    15  20  10
q2     5   0   0
q3    25  25  25
q4     3   0   0
Assume each query in the previous example accesses the attributes once during each execution. Then
aff(A1, A3) = 15*1 + 20*1 + 10*1 = 45
and the attribute affinity matrix AA is:
      A1  A2  A3  A4
A1    45   0  45   0
A2     0  80   5  75
A3    45   5  53   3
A4     0  75   3  78
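The calculation above can be sketched in Python. `USE` (matrix Q) and `ACC` (the access frequencies) are copied from the slide; the function name `affinity_matrix` and its `ref` parameter are illustrative assumptions, with `ref=1` encoding the assumption that each execution accesses each attribute once.

```python
# Matrix Q: USE[k][j] == 1 if query q(k+1) references attribute A(j+1).
USE = [
    [1, 0, 1, 0],  # q1
    [0, 1, 1, 0],  # q2
    [0, 1, 0, 1],  # q3
    [0, 0, 1, 1],  # q4
]
# Access frequencies: ACC[k][l] = how often q(k+1) is issued from site S(l+1).
ACC = [
    [15, 20, 10],  # q1
    [5, 0, 0],     # q2
    [25, 25, 25],  # q3
    [3, 0, 0],     # q4
]


def affinity_matrix(use, acc, ref=1):
    """Return AA, where AA[i][j] sums ref * (accesses over all sites)
    for every query that uses both A(i+1) and A(j+1)."""
    n = len(use[0])
    aa = [[0] * n for _ in range(n)]
    for q_use, q_acc in zip(use, acc):
        total = ref * sum(q_acc)  # accesses of this query across all sites
        for i in range(n):
            for j in range(n):
                if q_use[i] and q_use[j]:
                    aa[i][j] += total
    return aa


AA = affinity_matrix(USE, ACC)
for row in AA:
    print(row)
```

Each diagonal entry aff(Ai, Ai) is just the total traffic of the queries that touch Ai, which is why A4 (used by q3 and q4) gets 75 + 3 = 78.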
Distributed DBMS © 1998 M. Tamer Özsu & Patrick Valduriez Page 5. 13
Class Exercise:
§ Given the following, construct the affinity matrix:
Q Matrix:
      A1  A2  A3  A4  A5  A6
q1     0   1   1   0   1   1
q2     1   1   0   1   0   0
q3     1   0   0   1   1   0
q4     0   0   1   0   0   1
Access Frequencies:
      S1  S2  S3
q1    60   0  45
q2     0   5   0
q3     5   7   2
q4    35  38  13
Answer
A1 A2 A3 A4 A5 A6
A1 19 5 0 19 14 0
A4 19 5 0 19 14 0
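The same rule can be checked against the exercise. This self-contained sketch (the function name is illustrative) computes every row of the matrix; rows A1 and A4 should match the answer above, and the remaining rows follow from the same computation.

```python
# Matrix Q for the exercise: one row per query, one column per attribute A1..A6.
USE = [
    [0, 1, 1, 0, 1, 1],  # q1
    [1, 1, 0, 1, 0, 0],  # q2
    [1, 0, 0, 1, 1, 0],  # q3
    [0, 0, 1, 0, 0, 1],  # q4
]
# Access frequencies: one row per query, one column per site S1..S3.
ACC = [
    [60, 0, 45],   # q1
    [0, 5, 0],     # q2
    [5, 7, 2],     # q3
    [35, 38, 13],  # q4
]


def affinity_matrix(use, acc):
    """aff(Ai, Aj): total accesses (all sites) of the queries that use both Ai and Aj."""
    n = len(use[0])
    totals = [sum(row) for row in acc]
    return [
        [sum(t for u, t in zip(use, totals) if u[i] and u[j]) for j in range(n)]
        for i in range(n)
    ]


AA = affinity_matrix(USE, ACC)
for i, row in enumerate(AA, start=1):
    print(f"A{i}", row)
```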
VF – Clustering Algorithm
§ The next step is to group together the attributes that have high affinity for each other; this results in a clustered affinity (CA) matrix
§ The relation is then split accordingly
§ The Bond Energy Algorithm (BEA) has been used to cluster entities. BEA finds an ordering of the entities (in our case, attributes) such that the global affinity measure is maximized.
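To make the objective concrete, the sketch below evaluates the global affinity measure as it is usually defined for BEA (an assumption, since the formula is not given on this slide): AM = Σi Σj aff(Ai, Aj)[aff(Ai, Aj−1) + aff(Ai, Aj+1) + aff(Ai−1, Aj) + aff(Ai+1, Aj)], where entries outside the matrix count as 0. It compares the original attribute order of the PROJ AA matrix with the clustered order A1, A3, A2, A4 reached in the BEA example; `global_affinity` is a hypothetical helper name.

```python
# AA matrix of the PROJ example (rows/columns A1..A4).
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def global_affinity(aa, order):
    """AM of the matrix obtained by permuting both rows and columns of aa into `order`."""
    m = [[aa[r][c] for c in order] for r in order]
    n = len(m)

    def at(i, j):
        # Neighbours outside the matrix contribute 0.
        return m[i][j] if 0 <= i < n and 0 <= j < n else 0

    return sum(
        m[i][j] * (at(i, j - 1) + at(i, j + 1) + at(i - 1, j) + at(i + 1, j))
        for i in range(n)
        for j in range(n)
    )


print(global_affinity(AA, [0, 1, 2, 3]))  # original order A1 A2 A3 A4
print(global_affinity(AA, [0, 2, 1, 3]))  # clustered order A1 A3 A2 A4
```

The clustered order scores much higher because it places the large affinity values next to each other, which is exactly what BEA tries to achieve.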
Bond Energy Algorithm
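“Best” placement? Define the contribution of placing attribute Ak between Ai and Aj:
$$\mathrm{cont}(A_i, A_k, A_j) = 2\,\mathrm{bond}(A_i, A_k) + 2\,\mathrm{bond}(A_k, A_j) - 2\,\mathrm{bond}(A_i, A_j)$$
where
$$\mathrm{bond}(A_x, A_y) = \sum_{z=1}^{n} \mathrm{aff}(A_z, A_x)\,\mathrm{aff}(A_z, A_y)$$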
BEA – Example
Consider the following AA matrix and the corresponding CA matrix, in which A1 and A2 have already been placed. Place A3:
AA:
      A1  A2  A3  A4
A1    45   0  45   0
A2     0  80   5  75
A3    45   5  53   3
A4     0  75   3  78
CA:
      A1  A2
A1    45   0
A2     0  80
A3    45   5
A4     0  75
Ordering (0-3-1):
cont(A0, A3, A1) = 2bond(A0, A3) + 2bond(A3, A1) – 2bond(A0, A1)
= 2*0 + 2*4410 – 2*0 = 8820
Ordering (1-3-2):
cont(A1, A3, A2) = 2bond(A1, A3) + 2bond(A3, A2) – 2bond(A1, A2)
= 2*4410 + 2*890 – 2*225 = 10150 (highest contribution, so A3 is placed between A1 and A2)
Ordering (2-3-5):
cont(A2, A3, A5) = 2bond(A2, A3) + 2bond(A3, A5) – 2bond(A2, A5)
= 2*890 + 0 – 0 = 1780
A0 and A5 denote the empty positions at the left and right ends of the ordering; any bond involving them is 0.
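The three contributions can be reproduced with a short sketch. Here `None` stands for the empty boundary positions A0 and A5 (bond 0), attributes are zero-based column indices, and `bond`/`cont` are illustrative helper names that follow the bond and cont formulas used above.

```python
# AA matrix of the PROJ example (rows/columns A1..A4).
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def bond(aa, x, y):
    """bond(Ax, Ay) = sum over z of aff(Az, Ax) * aff(Az, Ay); None is an empty position."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)


def cont(aa, i, k, j):
    """Contribution of placing column k between columns i and j."""
    return 2 * bond(aa, i, k) + 2 * bond(aa, k, j) - 2 * bond(aa, i, j)


A1, A2, A3 = 0, 1, 2  # zero-based column indices
print(cont(AA, None, A3, A1))  # ordering (0-3-1)
print(cont(AA, A1, A3, A2))    # ordering (1-3-2)
print(cont(AA, A2, A3, None))  # ordering (2-3-5)
```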
BEA – Example
CA after placing A3 between A1 and A2:
      A1  A3  A2
A1    45  45   0
A2     0   5  80
A3    45  53   5
A4     0   3  75
BEA – Example
CA after placing A4 in the last position:
      A1  A3  A2  A4
A1    45  45   0   0
A2     0   5  80  75
A3    45  53   5   3
A4     0   3  75  78
BEA – Example
Row organization: the rows are reordered to match the column order.
CA before reordering rows:
      A1  A3  A2  A4
A1    45  45   0   0
A2     0   5  80  75
A3    45  53   5   3
A4     0   3  75  78
CA after reordering rows:
      A1  A3  A2  A4
A1    45  45   0   0
A3    45  53   5   3
A2     0   5  80  75
A4     0   3  75  78
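Putting the steps together, a minimal sketch of BEA-style greedy placement. Assumptions: the first two attributes seed the ordering (as in the example, where A1 and A2 are placed first), each remaining attribute is inserted at the position with the highest contribution (both ends included, ties keep the leftmost position), and the rows are then permuted to match the columns. The function names are hypothetical.

```python
# AA matrix of the PROJ example (rows/columns A1..A4).
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def bond(aa, x, y):
    if x is None or y is None:  # empty boundary position
        return 0
    return sum(row[x] * row[y] for row in aa)


def cont(aa, i, k, j):
    return 2 * bond(aa, i, k) + 2 * bond(aa, k, j) - 2 * bond(aa, i, j)


def bea(aa):
    """Greedy column ordering: insert each attribute where its contribution is highest."""
    order = [0, 1]  # seed with the first two attributes
    for k in range(2, len(aa)):
        best_pos, best_cont = 0, None
        for pos in range(len(order) + 1):  # every gap, including both ends
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            c = cont(aa, left, k, right)
            if best_cont is None or c > best_cont:
                best_pos, best_cont = pos, c
        order.insert(best_pos, k)
    return order


def reorder(aa, order):
    """Permute both columns and rows into `order` to obtain the CA matrix."""
    return [[aa[r][c] for c in order] for r in order]


order = bea(AA)
print(["A%d" % (i + 1) for i in order])
for row in reorder(AA, order):
    print(row)
```

For A4 the candidate contributions are 270, −7014, 23486 and 23730 (left to right), so A4 lands in the last position, matching the slides.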
VF – Algorithm
How can a set of clustered attributes {A1, A2, …, An} be divided into two (or more) sets {A1, A2, …, Ai} and {Ai+1, …, An} such that no (or a minimal number of) applications access both (or more than one) of the sets?
[Figure: Vertical Splitting. The clustered affinity matrix over A1 … An is split at Ai: the upper-left block TA covers A1 … Ai and the lower-right block BA covers Ai+1 … An.]
VF – Algorithm
Define
TQ = set of applications that access only TA
BQ= set of applications that access only BA
OQ= set of applications that access both TA and BA
and
CTQ = total number of accesses to attributes by applications
that access only TA
CBQ = total number of accesses to attributes by applications
that access only BA
COQ = total number of accesses to attributes by applications
that access both TA and BA
Then find the point along the diagonal that maximizes $z = CTQ \cdot CBQ - COQ^2$
Split Quality
§ Compare candidate split points by computing the split quality
• A positive contribution for the good case: accesses by applications that use only TA or only BA (the product CTQ * CBQ)
• A negative contribution for the bad case: accesses by applications that use both TA and BA (COQ²)
§ Computing the split quality is simple, given our access model
• For each query q1–q4:
– Determine, by inspecting matrix Q, whether it accesses attributes from only one fragment or from both
– Add its total number of accesses over all sites, taken from matrix M, to CTQ, CBQ, or COQ accordingly
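For the PROJ example, the search for the best split point can be sketched as follows, using the split quality z = CTQ · CBQ − COQ². The sketch assumes the clustered order A1, A3, A2, A4 from the BEA example, counts each query's accesses as its total frequency over all sites (one access per execution, as in the affinity calculation), and tries every split point along the diagonal; `split_quality` and the dictionary names are hypothetical.

```python
# Clustered attribute order produced by BEA for PROJ.
ORDER = ["A1", "A3", "A2", "A4"]
# Attributes referenced by each query (from matrix Q).
QUERY_ATTRS = {
    "q1": {"A1", "A3"},
    "q2": {"A2", "A3"},
    "q3": {"A2", "A4"},
    "q4": {"A3", "A4"},
}
# Total accesses per query, summed over sites S1..S3 (from matrix M).
QUERY_ACC = {"q1": 15 + 20 + 10, "q2": 5, "q3": 25 * 3, "q4": 3}


def split_quality(order, query_attrs, query_acc, i):
    """z = CTQ * CBQ - COQ**2 for TA = order[:i] and BA = order[i:]."""
    ta, ba = set(order[:i]), set(order[i:])
    ctq = cbq = coq = 0
    for q, attrs in query_attrs.items():
        if attrs <= ta:
            ctq += query_acc[q]  # only TA
        elif attrs <= ba:
            cbq += query_acc[q]  # only BA
        else:
            coq += query_acc[q]  # both fragments
    return ctq * cbq - coq ** 2


scores = {i: split_quality(ORDER, QUERY_ATTRS, QUERY_ACC, i) for i in range(1, len(ORDER))}
best = max(scores, key=scores.get)
print(scores)
print("TA =", ORDER[:best], "BA =", ORDER[best:])
```

The best split is TA = {A1, A3} and BA = {A2, A4}; by the definition of vertical fragmentation, each resulting fragment would also carry the primary key of PROJ.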
[Figure: Vertical Splitting]
VF – Correctness
A relation R, defined over attribute set A and key K, generates the
vertical partitioning FR = {R1, R2, …, Rr}.
§ Completeness
• Every attribute of R appears in at least one of the fragments R1, R2, …, Rr
§ Reconstruction
• The original relation R can be reconstructed by joining the fragments R1, R2, …, Rr on the key K
§ Disjointness
• Each non-primary-key attribute appears in only one of the fragments
• The duplicated key K is not considered overlap: disjointness applies only to the non-primary-key attributes
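The completeness and disjointness conditions, plus the requirement that each fragment carries K so that a join can rebuild R, can be checked mechanically on attribute sets. In this sketch `check_vertical_partitioning` is a hypothetical helper, and treating A1 as the key of PROJ is an assumption made only for illustration.

```python
def check_vertical_partitioning(attrs, key, fragments):
    """Check a vertical partitioning given as attribute sets.

    attrs:     all attributes of R (key included)
    key:       the primary-key attributes K
    fragments: one attribute set per fragment Ri
    """
    # Completeness: the fragments together cover every attribute of R.
    complete = set().union(*fragments) == attrs
    # Reconstruction precondition: every fragment carries K, so R can be rebuilt by joining on K.
    keyed = all(key <= f for f in fragments)
    # Disjointness: each non-key attribute appears in at most one fragment.
    non_key = [f - key for f in fragments]
    disjoint = all(
        not (non_key[a] & non_key[b])
        for a in range(len(non_key))
        for b in range(a + 1, len(non_key))
    )
    return {"complete": complete, "keyed": keyed, "disjoint": disjoint}


ATTRS = {"A1", "A2", "A3", "A4"}
KEY = {"A1"}  # assumption for illustration: A1 is the primary key

good = check_vertical_partitioning(ATTRS, KEY, [{"A1", "A3"}, {"A1", "A2", "A4"}])
bad = check_vertical_partitioning(ATTRS, KEY, [{"A1", "A3"}, {"A1", "A3", "A4"}])
print(good)
print(bad)
```

The second partitioning fails twice: A2 is missing (not complete) and A3 appears in both fragments (not disjoint).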
VF - Exercise
§ Consider the following Attribute Usage Matrix (Q) and Access Frequencies (M) for a relation R{A1, A2, A3, A4}, which has 4 queries (Q1, Q2, Q3, Q4) running at 3 sites (S1, S2, S3):