DESIGNING A DISTRIBUTED DATABASE FOR A SUPPLY CHAIN MANAGEMENT SYSTEM
SAK5310 DISTRIBUTED DATABASE
List of Figures
Figure 1: Entity Relationship Diagram
Figure 2: Attribute Usage Matrix
Figure 3: Application Access Frequencies Matrix
Figure 4: Attribute Affinity Matrix
Figure 5: Fragmentation Location
1 INTRODUCTION
ACE Sdn. Bhd. is the main supplier of copper-based wire cable in Peninsular Malaysia. It has
several warehouses strategically located in the northern, east coast, central and southern
regions. Each warehouse stores wire cable of different types, based on the diameter of the
copper core. The ACE Sdn. Bhd. main office, which is in Muar, and the regional management
offices have to make sure the warehouses have enough wire cable to meet demand from the
local market. The quantity of product in store, the purchasing of additional product, the
storage capacity and the warehouse capital expenditure need to be considered before any
transaction can take place. Timely decisions are a critical factor in their day-to-day
operations.
In order to cope with this complex process, the company has decided to employ a new
Supply Chain Management System with a Distributed Database Management System. The
following is the set of requirements for the company's new system.
c. A storage table that keeps track of the quantity of each product, measured as the
length of cable in meters, stored in each of the warehouses.
d. A purchase order that keeps track of the purchases made by each of the warehouses
and the quantity purchased, measured as the length of cable in meters.
2 DATABASE INFORMATION
Detailed information regarding the database for the supply chain management system, i.e. the
entity-relationship diagram, relation schemas, instances of the relations and the expression of
relationships among relations using links, is presented below.
Figure 1: Entity Relationship Diagram
WAREHOUSE_INFO
No Fields Type Key Comments
1 W_ID varchar(10) PK Warehouse info identifier
2 W_NAME varchar(10) Warehouse name
3 W_ADDR varchar(100) address
4 W_REGION varchar(15) region
5 W_CAPACITY Integer Floor capacity in cubic meter
6 W_CAPITAL Currency Capital expenditure
PURCHASE
No Fields Type Key Comments
1 PO_ID varchar(10) PK Purchase order identifier
2 PO_AMOUNT integer Length of cable in meter
3 W_ID varchar(10) FK Warehouse id
4 P_ID varchar(10) FK Product id
STORAGE
No Fields Type Key Comments
1 W_ID varchar(10) PK,FK Warehouse id
2 P_ID varchar(10) PK,FK Product id
3 S_AMOUNT integer Quantity of cable in meter
WAREHOUSE_INFO
W_ID W_NAME W_ADDR W_REGION W_CAPACITY W_CAPITAL
W001 Kulim 01 Kulim, Kedah NORTH 1000 1,000,000
W002 Kulim 02 Kulim, Kedah NORTH 1000 1,000,000
W003 Kajang 01 Kajang, Selangor CENTRAL 3000 5,000,000
W004 Muar 01 Muar, Johor SOUTH 4000 10,000,000
W005 Kuantan 01 Kuantan, Pahang EAST COAST 2500 2,000,000
PURCHASE
PO_ID PO_AMOUNT P_ID W_ID
PO_001 1500 P003 W005
PO_002 3000 P001 W004
PO_003 4000 P004 W001
PO_004 4500 P002 W002
PO_005 1000 P001 W005
STORAGE
W_ID P_ID S_AMOUNT
W001 P001 2500
W001 P002 2500
W001 P003 2500
W002 P004 5000
W003 P001 5000
W003 P002 2000
W003 P003 3000
W004 P001 5000
W004 P002 5000
W004 P003 5000
W005 P002 3000
W005 P003 3000
W005 P004 3000
L1
WAREHOUSE_INFO
W_ID W_NAME W_ADDR W_REGION W_CAPACITY W_CAPITAL
L2
PURCHASE
PO_ID PO_AMOUNT PO_DATE P_ID W_ID
L4 L3
STORAGE
W_ID P_ID S_AMOUNT
(Q1) SELECT *
FROM STORAGE
(Q2) SELECT *
FROM PURCHASE
(Q3) SELECT *
FROM WAREHOUSE_INFO
WHERE W_REGION = 'value'
(Q4) SELECT *
FROM WAREHOUSE_INFO
WHERE W_CAPACITY < 3000
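As an illustration, queries Q3 and Q4 can be run against the WAREHOUSE_INFO instance given earlier. The following is only a minimal sketch: it assumes an in-memory SQLite database in place of the distributed DBMS, stores W_CAPITAL as a plain integer, and the helper names q3 and q4 are hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE WAREHOUSE_INFO (
           W_ID TEXT PRIMARY KEY, W_NAME TEXT, W_ADDR TEXT,
           W_REGION TEXT, W_CAPACITY INTEGER, W_CAPITAL INTEGER)"""
)

# Sample instance copied from the WAREHOUSE_INFO table in this report.
rows = [
    ("W001", "Kulim 01", "Kulim, Kedah", "NORTH", 1000, 1000000),
    ("W002", "Kulim 02", "Kulim, Kedah", "NORTH", 1000, 1000000),
    ("W003", "Kajang 01", "Kajang, Selangor", "CENTRAL", 3000, 5000000),
    ("W004", "Muar 01", "Muar, Johor", "SOUTH", 4000, 10000000),
    ("W005", "Kuantan 01", "Kuantan, Pahang", "EAST COAST", 2500, 2000000),
]
conn.executemany("INSERT INTO WAREHOUSE_INFO VALUES (?, ?, ?, ?, ?, ?)", rows)


def q3(region):
    """Q3: warehouses in a given region (returns only the W_ID column)."""
    cur = conn.execute(
        "SELECT * FROM WAREHOUSE_INFO WHERE W_REGION = ? ORDER BY W_ID",
        (region,),
    )
    return [row[0] for row in cur]


def q4():
    """Q4: warehouses whose capacity is below 3000."""
    cur = conn.execute(
        "SELECT * FROM WAREHOUSE_INFO WHERE W_CAPACITY < 3000 ORDER BY W_ID"
    )
    return [row[0] for row in cur]


print(q3("NORTH"))
print(q4())
```

The region value is passed as a bound parameter rather than pasted into the SQL text, which avoids quoting problems with values such as EAST COAST.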
2.5 ASSUMPTIONS
The following assumptions are applied in designing the distributed database system for the
Supply Chain Management System of ACE Sdn. Bhd. These assumptions help us to
understand the data distribution across the various locations.
i. The distributed DBMS software exists at each site where data are stored.
ii. Each DBMS is shared across all sites and, at the same time, retains local
autonomy.
iii. There are four (4) sites for data distribution, one located in each region, i.e. Kulim,
Kajang, Muar and Kuantan. Muar acts as the headquarters.
iv. The company uses a dedicated Metro-E network to connect all the sites. The database
and application have been designed to perform distribution at each site.
3 FRAGMENTATION PHASE
For the purpose of this project we focus on one example for each fragmentation
strategy (hybrid fragmentation is excluded from this exercise).
M10: P1 ^ P2 ^ P3
etc.
iii. Results
WAREHOUSE_INFO1 = σM1 (WAREHOUSE_INFO)
W_ID W_NAME W_ADDR W_REGION W_CAPACITY W_CAPITAL
W001 Kulim 01 Kulim, Kedah NORTH 1000 1,000,000
W002 Kulim 02 Kulim, Kedah NORTH 1000 1,000,000
Completeness:
The selection predicates are complete, so the resulting fragmentation is guaranteed
to be complete: no information is missing. All tuples in the original relation
WAREHOUSE_INFO can be found in the resulting relations {WAREHOUSE_INFO1,
WAREHOUSE_INFO4, WAREHOUSE_INFO6, WAREHOUSE_INFO7}.
Reconstruction:
The original global relation WAREHOUSE_INFO can be reconstructed by applying the
union operator to the resulting fragments:
WAREHOUSE_INFO = WAREHOUSE_INFO1 ∪ WAREHOUSE_INFO4 ∪ WAREHOUSE_INFO6 ∪ WAREHOUSE_INFO7
Disjointness:
No tuple appears in more than one of the resulting fragments, so the fragmentation
is disjoint.
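The three correctness rules can also be checked mechanically. The sketch below is not the exact fragmentation derived in this report: the minterm predicates are not reproduced here, so it assumes that each fragment is selected by W_REGION alone (the region-to-fragment mapping is taken from Figure 5), and it keeps only the W_ID and W_REGION columns. The helper names are hypothetical.

```python
# Reduced WAREHOUSE_INFO instance: only the columns needed for the check.
WAREHOUSE_INFO = [
    {"W_ID": "W001", "W_REGION": "NORTH"},
    {"W_ID": "W002", "W_REGION": "NORTH"},
    {"W_ID": "W003", "W_REGION": "CENTRAL"},
    {"W_ID": "W004", "W_REGION": "SOUTH"},
    {"W_ID": "W005", "W_REGION": "EAST COAST"},
]

# Assumption: one region per fragment, as suggested by Figure 5.
FRAGMENT_REGION = {
    "WAREHOUSE_INFO1": "NORTH",
    "WAREHOUSE_INFO4": "CENTRAL",
    "WAREHOUSE_INFO6": "SOUTH",
    "WAREHOUSE_INFO7": "EAST COAST",
}


def select(relation, region):
    """Selection: keep the tuples whose W_REGION equals the given region."""
    return [t for t in relation if t["W_REGION"] == region]


fragments = {
    name: select(WAREHOUSE_INFO, region)
    for name, region in FRAGMENT_REGION.items()
}


def is_complete(relation, frags):
    """Every tuple of the relation appears in at least one fragment."""
    return all(any(t in f for f in frags.values()) for t in relation)


def is_disjoint(frags):
    """No W_ID appears in more than one fragment."""
    ids = [t["W_ID"] for f in frags.values() for t in f]
    return len(ids) == len(set(ids))


def reconstruct(frags):
    """Union of all fragments, ordered by W_ID for comparison."""
    merged = [t for f in frags.values() for t in f]
    return sorted(merged, key=lambda t: t["W_ID"])


print(is_complete(WAREHOUSE_INFO, fragments))
print(is_disjoint(fragments))
print(reconstruct(fragments) == WAREHOUSE_INFO)
```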
WAREHOUSE_INFO
W_ID W_NAME W_ADDR W_REGION W_CAPACITY W_CAPITAL
L4
STORAGE
W_ID P_ID S_AMOUNT
Owner L4 : WAREHOUSE_INFO
MEMBER L4 : STORAGE
ii. The application results in a fragmentation of STORAGE according to the fragments
WAREHOUSE_INFO1, WAREHOUSE_INFO4, WAREHOUSE_INFO6 and WAREHOUSE_INFO7
obtained earlier. Therefore the derived fragmentation of STORAGE is defined as follows:
iii. Result
STORAGE1
W_ID P_ID S_AMOUNT
W001 P001 2500
W001 P002 2500
W001 P003 2500
W002 P004 5000
STORAGE2
W_ID P_ID S_AMOUNT
W003 P001 5000
W003 P002 2000
W003 P003 3000
STORAGE3
W_ID P_ID S_AMOUNT
W004 P001 5000
W004 P002 5000
W004 P003 5000
STORAGE4
W_ID P_ID S_AMOUNT
W005 P002 3000
W005 P003 3000
W005 P004 3000
Completeness:
The resulting fragmentation is complete. All tuples in the original relation
STORAGE can be found in the resulting relations {STORAGE1, STORAGE2,
STORAGE3, STORAGE4}.
Reconstruction:
The original global relation STORAGE can be reconstructed by applying the union
operator to the resulting fragments:
F_STORAGE = {STORAGE1, STORAGE2, STORAGE3, STORAGE4},
STORAGE = ∪ STORAGEi, ∀ STORAGEi ∈ F_STORAGE
Disjointness:
The join graph below is simple: each STORAGE fragment is linked to exactly one
WAREHOUSE_INFO fragment, so the resulting fragments are disjoint.
STORAGE1 WAREHOUSE_INFO1
STORAGE2 WAREHOUSE_INFO4
STORAGE3 WAREHOUSE_INFO6
STORAGE4 WAREHOUSE_INFO7
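Derived horizontal fragmentation is usually expressed as a semi-join of the member relation with each owner fragment. The following sketch assumes that form, i.e. STORAGEi = STORAGE ⋉ (owner fragment) on W_ID, using the pairing shown in the join graph above. The owner fragments are reduced to their W_ID values, and the function name semijoin is hypothetical.

```python
# STORAGE instance copied from this report: (W_ID, P_ID, S_AMOUNT).
STORAGE = [
    ("W001", "P001", 2500), ("W001", "P002", 2500), ("W001", "P003", 2500),
    ("W002", "P004", 5000),
    ("W003", "P001", 5000), ("W003", "P002", 2000), ("W003", "P003", 3000),
    ("W004", "P001", 5000), ("W004", "P002", 5000), ("W004", "P003", 5000),
    ("W005", "P002", 3000), ("W005", "P003", 3000), ("W005", "P004", 3000),
]

# W_ID values held by each owner fragment, keyed by the STORAGE fragment
# it is paired with in the join graph.
OWNER_KEYS = {
    "STORAGE1": {"W001", "W002"},  # WAREHOUSE_INFO1
    "STORAGE2": {"W003"},          # WAREHOUSE_INFO4
    "STORAGE3": {"W004"},          # WAREHOUSE_INFO6
    "STORAGE4": {"W005"},          # WAREHOUSE_INFO7
}


def semijoin(member, owner_keys):
    """Keep the member tuples whose W_ID matches some owner tuple."""
    return [t for t in member if t[0] in owner_keys]


storage_fragments = {
    name: semijoin(STORAGE, keys) for name, keys in OWNER_KEYS.items()
}

for name, frag in storage_fragments.items():
    print(name, len(frag))
```

Because every W_ID belongs to exactly one owner fragment, each STORAGE tuple lands in exactly one derived fragment, which is what the simple join graph expresses.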
Q3: Find all the products and purchase orders purchased by a particular
warehouse
SELECT P_ID, PO_ID
FROM PURCHASE
WHERE W_ID = 'value'
Q1 = 20, Q2 = 0, Q3 = 20
A1 A2 A3 A4
Q1 1 0 0 0
Q2 0 1 1 0
Q3 1 0 1 1
Figure 2: Attribute Usage Matrix
ii. The access frequency of each query at each site is defined in a matrix called the
Application Access Frequencies Matrix, where entry (i,j) denotes acc(Qi,Sj):
S1 S2 S3 S4
Q1 10 20 10 10 = 50
Q2 5 0 10 5 = 20
Q3 25 20 0 25 = 70
Figure 3: Application Access Frequencies Matrix
iii. The Attribute Affinity Matrix (AA), in which entry (i,j) shows how many times
attributes Ai and Aj are accessed together, is calculated from the Attribute Usage
Matrix and the Application Access Frequencies Matrix. The following matrix is the
result of the calculation.
A1 A2 A3 A4
A1 120 0 70 70
A2 0 20 20 0
A3 70 20 90 70
A4 70 0 70 70
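The affinity values above can be reproduced from the two input matrices. The sketch below assumes the usual definition, in which aff(Ai, Aj) is the total access frequency, summed over all sites, of every query that uses both Ai and Aj, with one attribute reference per query execution. The function name attribute_affinity is hypothetical.

```python
# Attribute Usage Matrix (Figure 2): rows Q1..Q3, columns A1..A4.
USE = [
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
]

# Application Access Frequencies Matrix (Figure 3): rows Q1..Q3, columns S1..S4.
ACC = [
    [10, 20, 10, 10],
    [5, 0, 10, 5],
    [25, 20, 0, 25],
]


def attribute_affinity(use, acc):
    """Build the attribute affinity matrix from usage and access frequencies."""
    totals = [sum(row) for row in acc]  # total accesses per query over all sites
    n_attrs = len(use[0])
    return [
        [
            sum(totals[q] for q in range(len(use)) if use[q][i] and use[q][j])
            for j in range(n_attrs)
        ]
        for i in range(n_attrs)
    ]


AA = attribute_affinity(USE, ACC)
for row in AA:
    print(row)
```

For example, A1 and A3 are used together only by Q3, whose total frequency is 70, so aff(A1, A3) = 70.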
iv. Next, the Clustered Affinity (CA) Matrix is calculated in three steps:
Initialization, Iteration and Row Ordering.
Initialization
In the initialization step, we copy columns 1 and 2 of the Attribute
Affinity Matrix to the Clustered Affinity (CA) Matrix.
A1 A2
A1 120 0
A2 0 20
A3 70 20
A4 70 0
Iteration
Based on the initialization above, we need to decide the best placement for
columns 3 and 4. For this purpose we use the Bond Energy Algorithm (BEA),
applied as follows:
For Column 3:
Ordering (0-3-1)
cont(A0, A3, A1) = 2bond(A0, A3) + 2bond(A3, A1) – 2bond(A0, A1)
bond(A0, A3) = bond(A0, A1) = 0
bond(A3, A1) = (120 x 70) + (0 x 20) + (70 x 90) + (70 x 70) = 19600
cont(A0, A3, A1) = (2 x 0) + (2 x 19600) – (2 x 0) = 39200
Ordering (1-3-2)
cont(A1, A3, A2) = 2bond(A1, A3) + 2bond(A3, A2) – 2bond(A1, A2)
bond(A1, A3) = 19600
bond(A3, A2) = (0 x 70) + (20 x 20) + (20 x 90) + (0 x 70) = 2200
bond(A1, A2) = (120 x 0) + (0 x 20) + (70 x 20) + (70 x 0) = 1400
cont(A1, A3, A2) = (2 x 19600) + (2 x 2200) – (2 x 1400) = 40800
Ordering (2-3-4)
cont(A2, A3, A4) = 2bond(A2, A3) + 2bond(A3, A4) – 2bond(A2, A4)
bond(A2, A3) = 2200
bond(A3, A4) = 0
bond(A2, A4) = 0
cont(A2, A3, A4) = (2 x 2200) + 0 – 0 = 4400
Since the contribution of the ordering (1-3-2) is the largest, we place A3
between A1 and A2.
A1 A3 A2
A1 120 70 0
A2 0 20 20
A3 70 90 20
A4 70 70 0
For Column 4:
Ordering (0-4-1)
cont(A0, A4, A1) = 2bond(A0, A4) + 2bond(A4, A1) – 2bond(A0, A1)
bond(A0, A4) = bond(A0, A1) = 0
bond(A4, A1) = (120 x 70) + (0 x 0) + (70 x 70) + (70 x 70) = 18200
cont(A0, A4, A1) = (2 x 0) + (2 x 18200) – (2 x 0) = 36400
Ordering (1-4-3)
cont(A1, A4, A3) = 2bond(A1, A4) + 2bond(A4, A3) – 2bond(A1, A3)
bond(A1, A4) = 18200
bond(A4, A3) = (70 x 70) + (20 x 0) + (90 x 70) + (70 x 70) = 16100
bond(A1, A3) = 19600
cont(A1, A4, A3) = (2 x 18200) + (2 x 16100) – (2 x 19600) = 29400
Ordering (3-4-2)
cont(A3, A4, A2) = 2bond(A3, A4) + 2bond(A4, A2) – 2bond(A3, A2)
bond(A3, A4) = 16100
bond(A4, A2) = (0 x 70) + (20 x 0) + (20 x 70) + (0 x 70) = 1400
bond(A3, A2) = 2200
cont(A3, A4, A2) = (2 x 16100) + (2 x 1400) – (2 x 2200) = 30600
Ordering (2-4-5)
cont(A2, A4, A5) = 2bond(A2, A4) + 2bond(A4, A5) – 2bond(A2, A5)
bond(A2, A4) = 1400
bond(A4, A5) = 0
bond(A2, A5) = 0
cont(A2, A4, A5) = (2 x 1400) + 0 – 0 = 2800
Since the contribution of the ordering (0-4-1) is the largest, we place
A4 before A1.
A4 A1 A3 A2
A1 70 120 70 0
A2 0 0 20 20
A3 70 70 90 20
A4 70 70 70 0
Row Ordering
Finally, the rows are organized in the same order as the columns. The result
is shown below.
A4 A1 A3 A2
A4 70 70 70 0
A1 70 120 70 0
A3 70 70 90 20
A2 0 0 20 20
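The bond and contribution values worked out by hand above can be recomputed with a short script. This is a minimal sketch of the column placement step, assuming bond(Ax, Ay) is the sum over all rows z of aff(Az, Ax) x aff(Az, Ay) and that a boundary position (A0, or the empty position after the last placed column) contributes a bond of 0. The function names, and the use of None for a boundary, are hypothetical.

```python
# Attribute Affinity Matrix from this report (rows and columns A1..A4).
AA = [
    [120, 0, 70, 70],
    [0, 20, 20, 0],
    [70, 20, 90, 70],
    [70, 0, 70, 70],
]


def bond(aa, x, y):
    """Bond between columns x and y; None marks a boundary position."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)


def cont(aa, left, mid, right):
    """Net contribution of placing column mid between left and right."""
    return (2 * bond(aa, left, mid) + 2 * bond(aa, mid, right)
            - 2 * bond(aa, left, right))


def bea_order(aa):
    """Place columns one at a time where their contribution is largest."""
    order = [0, 1]  # initialization: columns A1 and A2
    for k in range(2, len(aa)):
        best_pos, best_value = 0, None
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            value = cont(aa, left, k, right)
            if best_value is None or value > best_value:
                best_pos, best_value = pos, value
        order.insert(best_pos, k)
    return order


order = bea_order(AA)
# Row ordering: arrange the rows in the same order as the columns.
CA = [[AA[i][j] for j in order] for i in order]

print(["A%d" % (i + 1) for i in order])
for row in CA:
    print(row)
```

Indices are zero-based in the code, so index 2 corresponds to A3 and index 3 to A4.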
There are two potential partitionings of the Clustered Affinity Matrix above,
depicted as Option 1 and Option 2 below. To decide which is the better option,
we compare the efficiency of the two options with regard to the current set of
queries. The idea is to maximize the total number of accesses that touch only one
fragment and to minimize the total number of accesses that touch both fragments.
(OPTION 1)
A4 A1 A3 A2
A4 70 70 70 0
A1 70 120 70 0
A3 70 70 90 20
A2 0 0 20 20
Accesses to both PURCHASE 1 and PURCHASE 2: Q2 = 20
Accesses to PURCHASE 1 only: Q1 = 50, Q3 = 70
(OPTION 2)
A4 A1 A3 A2
A4 70 70 70 0
A1 70 120 70 0
A3 70 70 90 20
A2 0 0 20 20
Accesses to both PURCHASE 1 and PURCHASE 2: Q3 = 70
Accesses to PURCHASE 1 only: Q1 = 50
Accesses to PURCHASE 2 only: Q2 = 20
From the above illustrations, we can conclude that option 1 is a better
selection than option 2: in option 1 only 20 accesses (Q2) touch both fragments,
whereas in option 2, 70 accesses (Q3) do.
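The comparison of the two options can be made explicit by counting, for each candidate split, the accesses that stay within one fragment and those that span both. The sketch below rests on two assumptions inferred from the access figures rather than stated in the report: option 1 splits the clustered ordering as {A4, A1, A3} and {A2}, and option 2 as {A4, A1} and {A3, A2}. The function name evaluate_split is hypothetical.

```python
# Attributes used by each query (Figure 2) and total accesses (Figure 3).
QUERY_ATTRS = {
    "Q1": {"A1"},
    "Q2": {"A2", "A3"},
    "Q3": {"A1", "A3", "A4"},
}
QUERY_FREQ = {"Q1": 50, "Q2": 20, "Q3": 70}


def evaluate_split(frag1, frag2):
    """Return (accesses to one fragment only, accesses to both fragments)."""
    one_fragment = 0
    both_fragments = 0
    for query, attrs in QUERY_ATTRS.items():
        if attrs <= frag1 or attrs <= frag2:
            one_fragment += QUERY_FREQ[query]
        else:
            both_fragments += QUERY_FREQ[query]
    return one_fragment, both_fragments


# Assumed split points for the two options.
option1 = evaluate_split({"A4", "A1", "A3"}, {"A2"})
option2 = evaluate_split({"A4", "A1"}, {"A3", "A2"})

print("option 1:", option1)
print("option 2:", option2)
```

Under these assumptions option 1 keeps 120 accesses local to a single fragment against 20 that span both, while option 2 gives 70 and 70, which agrees with the conclusion above.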
v. Result
PURCHASE 1
PO_ID P_ID W_ID
PO_001 P003 W005
PO_002 P001 W004
PO_003 P004 W001
PO_004 P002 W002
PO_005 P001 W005
PURCHASE 2
PO_ID PO_AMOUNT
PO_001 1500
PO_002 3000
PO_003 4000
PO_004 4500
PO_005 1000
vi. Checking for Correctness
Completeness:
Guaranteed by the partitioning algorithm, which assigns each attribute of the
global relation to one of the fragments {PURCHASE1, PURCHASE2}.
Reconstruction:
Reconstruction can be achieved by joining the fragments on the key PO_ID:
PURCHASE = PURCHASE1 ⋈ PURCHASE2
Disjointness:
The fragments do not overlap. The duplicated key PO_ID is not considered
to be overlapping.
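The reconstruction rule for the vertical fragments can be verified on the sample data. The sketch below projects PURCHASE onto the attribute sets of PURCHASE 1 and PURCHASE 2 and joins them back on PO_ID. The helper names project and join_on_key are hypothetical, and the in-memory join stands in for a distributed join.

```python
# PURCHASE instance copied from this report.
PURCHASE = [
    {"PO_ID": "PO_001", "PO_AMOUNT": 1500, "P_ID": "P003", "W_ID": "W005"},
    {"PO_ID": "PO_002", "PO_AMOUNT": 3000, "P_ID": "P001", "W_ID": "W004"},
    {"PO_ID": "PO_003", "PO_AMOUNT": 4000, "P_ID": "P004", "W_ID": "W001"},
    {"PO_ID": "PO_004", "PO_AMOUNT": 4500, "P_ID": "P002", "W_ID": "W002"},
    {"PO_ID": "PO_005", "PO_AMOUNT": 1000, "P_ID": "P001", "W_ID": "W005"},
]

FRAG1_ATTRS = ["PO_ID", "P_ID", "W_ID"]
FRAG2_ATTRS = ["PO_ID", "PO_AMOUNT"]


def project(relation, attrs):
    """Projection: keep only the listed attributes of every tuple."""
    return [{a: t[a] for a in attrs} for t in relation]


def join_on_key(left, right, key):
    """Join two fragments that share the given key attribute."""
    index = {t[key]: t for t in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


PURCHASE1 = project(PURCHASE, FRAG1_ATTRS)
PURCHASE2 = project(PURCHASE, FRAG2_ATTRS)
reconstructed = join_on_key(PURCHASE1, PURCHASE2, "PO_ID")

# Only the key should be shared between the two fragments.
shared = set(FRAG1_ATTRS) & set(FRAG2_ATTRS)

print(reconstructed == PURCHASE)
print(shared)
```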
4 ALLOCATION PHASE
This phase allocates the fragments obtained from the Primary Horizontal
Fragmentation of relation WAREHOUSE_INFO, the Derived Horizontal Fragmentation
of relation STORAGE and the Vertical Fragmentation of relation PURCHASE. The
figure below illustrates the distribution of the relations. The distribution of
the WAREHOUSE_INFO and STORAGE fragments depends on the location of each site,
and the distribution of the PURCHASE fragments depends on where they are
accessed most frequently. This distribution is intended to make data access,
processing and retrieval efficient and to reduce remote access, since each
fragment is stored at the site where its data is used. It also maximizes local
processing and minimizes global processing.
SITE 1 SITE 3
NORTH EAST COAST
WAREHOUSE_INFO 1 WAREHOUSE_INFO 7
STORAGE 1 STORAGE 4
PURCHASE 1
(Sites connected by METRO-E network communication)
SITE 2 SITE 4
SOUTH CENTRAL
WAREHOUSE_INFO 6 WAREHOUSE_INFO 4
STORAGE 3 STORAGE 2
PURCHASE 2
Figure 5: Fragmentation Location