
MINI PROJECT

DESIGNING A DISTRIBUTED DATABASE FOR A SUPPLY CHAIN MANAGEMENT SYSTEM
SAK5310 DISTRIBUTED DATABASE

Mohammad Fairus Khalid (GS27814), Zunaidi Abdullah (GS27994), Kumaraveel Ganeson (GS28253)

Table of Contents
1 INTRODUCTION
2 DATABASE INFORMATION
2.1 Relation Schemas
2.2 Instances of the Relations
2.3 Expression of Relationships among Relations using links
2.4 List of Queries to Be Performed in This Project
2.5 ASSUMPTIONS
3 FRAGMENTATION PHASE
3.1 Primary Horizontal Fragmentation
3.2 Derived Horizontal Fragmentation
3.3 Vertical Fragmentation
4 ALLOCATION PHASE

List of Figures
Figure 1: Entity Relationship Diagram
Figure 2: Attribute Usage Matrix
Figure 3: Application Access Frequencies Matrix
Figure 4: Attribute Affinity Matrix
Figure 5: Fragmentation Location

1 INTRODUCTION
A company named ACE Sdn. Bhd. is the main supplier of copper-based wire cable in
Peninsular Malaysia. It has several warehouses strategically located in the northern, east coast,
central and southern regions. Each warehouse stores wire cable of different types, classified
by the diameter of the copper core. The ACE Sdn. Bhd. main office, which is in Muar, and the
regional management offices have to make sure the warehouses have enough wire cable to
meet demand from the local market. The quantity of product in store, the purchasing of
additional product, the storage capacity and the warehouse capital expenditure need to be
considered before any transaction can take place. Timely decisions are a critical factor in
day-to-day operations.
To cope with this complex process, the company has decided to deploy a new
Supply Chain Management System built on a Distributed Database Management System. The
following is the set of requirements for the company's new system.

a. Product information that keeps track of the product type, product specification
and price per 100 meters.

b. Warehouse information that keeps track of the warehouse name, address,
covering region, floor capacity in cubic meters and capital expenditure for making
purchases.

c. A storage table that keeps track of the quantity of each product, measured as the length
of the cable in meters, stored in each of the warehouses.

d. A purchase order that keeps track of the purchases made by each of the warehouses
and the quantity ordered, measured as the length of the cable in meters.

2 DATABASE INFORMATION

Detailed information regarding the database for the supply chain management system, i.e. the
entity-relationship diagram, relation schemas, instances of the relations and the expression of
relationships among relations using links, is presented below.

Figure 1: Entity Relationship Diagram

2.1 Relation Schemas


PRODUCT_INFO
No  Field    Type         Key  Comments
1   P_ID     varchar(10)  PK   Product info identifier
2   P_TYPE   varchar(10)       Product type
3   P_SPEC   Integer           Copper core size
4   P_PRICE  Currency          Price per 100 meters

WAREHOUSE_INFO
No  Field       Type          Key  Comments
1   W_ID        varchar(10)   PK   Warehouse info identifier
2   W_NAME      varchar(10)        Warehouse name
3   W_ADDR      varchar(100)       Address
4   W_REGION    varchar(15)        Region
5   W_CAPACITY  Integer            Floor capacity in cubic meters
6   W_CAPITAL   Currency           Capital expenditure

PURCHASE
No  Field      Type         Key    Comments
1   PO_ID      varchar(10)  PK     Purchase order identifier
2   PO_AMOUNT  Integer             Length of cable in meters
3   W_ID       varchar(10)  FK     Warehouse id
4   P_ID       varchar(10)  FK     Product id

STORAGE
No  Field     Type         Key    Comments
1   W_ID      varchar(10)  PK,FK  Warehouse id
2   P_ID      varchar(10)  PK,FK  Product id
3   S_AMOUNT  Integer             Quantity of cable in meters

2.2 Instances of the Relations


PRODUCT_INFO
P_ID  P_TYPE  P_SPEC  P_PRICE
P001  Cat A   5       100
P002  Cat B   4       80
P003  Cat C   3       60
P004  Cat D   2       40

WAREHOUSE_INFO
W_ID  W_NAME      W_ADDR            W_REGION    W_CAPACITY  W_CAPITAL
W001  Kulim 01    Kulim, Kedah      NORTH       1000        1,000,000
W002  Kulim 02    Kulim, Kedah      NORTH       1000        1,000,000
W003  Kajang 01   Kajang, Selangor  CENTRAL     3000        5,000,000
W004  Muar 01     Muar, Johor       SOUTH       4000        10,000,000
W005  Kuantan 01  Kuantan, Pahang   EAST COAST  2500        2,000,000

PURCHASE
PO_ID   PO_AMOUNT  P_ID  W_ID
PO_001  1500       P003  W005
PO_002  3000       P001  W004
PO_003  4000       P004  W001
PO_004  4500       P002  W002
PO_005  1000       P001  W005

STORAGE
W_ID  P_ID  S_AMOUNT
W001  P001  2500
W001  P002  2500
W001  P003  2500
W002  P004  5000
W003  P001  5000
W003  P002  2000
W003  P003  3000
W004  P001  5000
W004  P002  5000
W004  P003  5000
W005  P002  3000
W005  P003  3000
W005  P004  3000

2.3 Expression of Relationships among Relations using links


PRODUCT_INFO
P_ID P_TYPE P_SPEC P_PRICE

L1
WAREHOUSE_INFO
W_ID W_NAME W_ADDR W_REGION W_CAPACITY W_CAPITAL

L2
PURCHASE
PO_ID PO_AMOUNT PO_DATE P_ID W_ID
L4 L3
STORAGE
W_ID P_ID S_AMOUNT

2.4 List of Queries to Be Performed in This Project

(Q1) SELECT *
FROM STORAGE

(Q2) SELECT *
FROM PURCHASE

(Q3) SELECT *
FROM WAREHOUSE_INFO
WHERE W_REGION = 'value'

(Q4) SELECT *
FROM WAREHOUSE_INFO
WHERE W_CAPACITY < 3000

(Q5) SELECT SUM(W_CAPITAL)
FROM WAREHOUSE_INFO
WHERE W_REGION = 'value'

(Q6) SELECT PO_ID
FROM PURCHASE

(Q7) SELECT SUM(PO_AMOUNT)
FROM PURCHASE
WHERE W_ID = 'value'

(Q8) SELECT P_ID, PO_ID
FROM PURCHASE
WHERE W_ID = 'value'
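To illustrate how these queries behave on the sample instances from Section 2.2, the following is a minimal sketch that loads WAREHOUSE_INFO and PURCHASE into an in-memory SQLite database and runs Q4, Q5, Q7 and Q8. SQLite, the use of INTEGER for the Currency columns and the helper names q4, q5, q7 and q8 are assumptions made for illustration only; they are not part of the project design.

```python
import sqlite3

# In-memory database; SQLite and the INTEGER currency type are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE WAREHOUSE_INFO (
    W_ID TEXT PRIMARY KEY, W_NAME TEXT, W_ADDR TEXT,
    W_REGION TEXT, W_CAPACITY INTEGER, W_CAPITAL INTEGER
);
CREATE TABLE PURCHASE (
    PO_ID TEXT PRIMARY KEY, PO_AMOUNT INTEGER, P_ID TEXT, W_ID TEXT
);
""")

# Sample instances copied from Section 2.2.
conn.executemany("INSERT INTO WAREHOUSE_INFO VALUES (?, ?, ?, ?, ?, ?)", [
    ("W001", "Kulim 01", "Kulim, Kedah", "NORTH", 1000, 1000000),
    ("W002", "Kulim 02", "Kulim, Kedah", "NORTH", 1000, 1000000),
    ("W003", "Kajang 01", "Kajang, Selangor", "CENTRAL", 3000, 5000000),
    ("W004", "Muar 01", "Muar, Johor", "SOUTH", 4000, 10000000),
    ("W005", "Kuantan 01", "Kuantan, Pahang", "EAST COAST", 2500, 2000000),
])
conn.executemany("INSERT INTO PURCHASE VALUES (?, ?, ?, ?)", [
    ("PO_001", 1500, "P003", "W005"),
    ("PO_002", 3000, "P001", "W004"),
    ("PO_003", 4000, "P004", "W001"),
    ("PO_004", 4500, "P002", "W002"),
    ("PO_005", 1000, "P001", "W005"),
])


def q4():
    """Q4: identifiers of warehouses with floor capacity below 3000."""
    rows = conn.execute(
        "SELECT W_ID FROM WAREHOUSE_INFO WHERE W_CAPACITY < 3000 ORDER BY W_ID")
    return [row[0] for row in rows]


def q5(region):
    """Q5: total capital expenditure of the warehouses in one region."""
    return conn.execute(
        "SELECT SUM(W_CAPITAL) FROM WAREHOUSE_INFO WHERE W_REGION = ?",
        (region,)).fetchone()[0]


def q7(w_id):
    """Q7: total length of cable ordered by one warehouse."""
    return conn.execute(
        "SELECT SUM(PO_AMOUNT) FROM PURCHASE WHERE W_ID = ?",
        (w_id,)).fetchone()[0]


def q8(w_id):
    """Q8: products and purchase orders of one warehouse."""
    return conn.execute(
        "SELECT P_ID, PO_ID FROM PURCHASE WHERE W_ID = ? ORDER BY PO_ID",
        (w_id,)).fetchall()


print(q4(), q5("NORTH"), q7("W005"), q8("W005"))
```

The parameter placeholders (?) take the place of the 'value' literals in the queries above.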

2.5 ASSUMPTIONS
The following assumptions are applied in designing the distributed database system for the
Supply Chain Management System of ACE Sdn. Bhd. These assumptions help us to
understand how the data are distributed across the various locations.

i. The distributed DBMS software exists at each site where data are stored.
ii. The DBMS at each site takes part in the shared distributed system and at the same time
retains local autonomy.
iii. There are four (4) sites for data distribution, one in each region, i.e. Kulim,
Kajang, Muar and Kuantan. Muar acts as the headquarters.
iv. The company uses a dedicated Metro-E network to connect all the sites. The database
and applications have been designed to work with data distributed across the sites.

3 FRAGMENTATION PHASE
For the purpose of this project we focus on one example of each fragmentation
strategy (hybrid fragmentation is excluded from this exercise).

3.1 Primary Horizontal Fragmentation


Requirements:
i. The target of the fragmentation is the WAREHOUSE_INFO table.
ii. Two applications access this table. The first application accesses
information according to the region where the warehouses are located. The second
application accesses information for warehouses with a floor capacity of less than
3000 cubic meters.

Primary Horizontal Fragmentation Steps:


i. List of simple predicates based on the above requirements

P1: W_REGION = “NORTH”
P2: W_REGION = “CENTRAL”
P3: W_REGION = “SOUTH”
P4: W_REGION = “EAST COAST”
P5: W_CAPACITY < 3000
P6: W_CAPACITY >= 3000

ii. List of minterm predicates

Note: the minterm predicates are chosen based on the semantics of the database, not on the
values currently stored in it.

(Meaningful minterm predicates)

M1: W_REGION = “NORTH” ^ W_CAPACITY < 3000
M2: W_REGION = “NORTH” ^ W_CAPACITY >= 3000
M3: W_REGION = “CENTRAL” ^ W_CAPACITY < 3000
M4: W_REGION = “CENTRAL” ^ W_CAPACITY >= 3000
M5: W_REGION = “SOUTH” ^ W_CAPACITY < 3000
M6: W_REGION = “SOUTH” ^ W_CAPACITY >= 3000
M7: W_REGION = “EAST COAST” ^ W_CAPACITY < 3000
M8: W_REGION = “EAST COAST” ^ W_CAPACITY >= 3000

(Meaningless minterm predicates)

Conjunctions that can never be satisfied are discarded, for example because a warehouse
cannot be in two regions at once:

M9: P1 ^ P2
M10: P1 ^ P2 ^ P3
etc.
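The selection of meaningful minterm predicates can be sketched in code. The example below enumerates every combination of the six simple predicates in natural or negated form and keeps only the combinations that some warehouse could satisfy. It assumes that W_REGION can take only the four region values listed above and that the capacities 2999 and 3000 are enough to represent the two capacity ranges; the names witnesses and satisfiable are hypothetical helpers, not part of the project.

```python
from itertools import product

REGIONS = ["NORTH", "CENTRAL", "SOUTH", "EAST COAST"]

# Simple predicates P1..P6 as functions over a warehouse tuple.
simple_predicates = {
    "P1": lambda w: w["W_REGION"] == "NORTH",
    "P2": lambda w: w["W_REGION"] == "CENTRAL",
    "P3": lambda w: w["W_REGION"] == "SOUTH",
    "P4": lambda w: w["W_REGION"] == "EAST COAST",
    "P5": lambda w: w["W_CAPACITY"] < 3000,
    "P6": lambda w: w["W_CAPACITY"] >= 3000,
}

# Representative tuples covering the assumed domain: every region combined
# with one capacity below 3000 and one capacity of at least 3000.
witnesses = [
    {"W_REGION": region, "W_CAPACITY": capacity}
    for region in REGIONS
    for capacity in (2999, 3000)
]


def satisfiable(signs):
    """True if some representative tuple gives every simple predicate the
    truth value demanded by the minterm (True = natural, False = negated)."""
    predicates = list(simple_predicates.values())
    return any(
        all(pred(w) == sign for pred, sign in zip(predicates, signs))
        for w in witnesses
    )


# Every minterm picks each simple predicate in natural or negated form.
all_minterms = list(product([True, False], repeat=len(simple_predicates)))
meaningful = [m for m in all_minterms if satisfiable(m)]

print(len(all_minterms), "minterms in total,", len(meaningful), "meaningful")
```

Each surviving combination has exactly one region predicate and one capacity predicate in natural form, which corresponds to M1 to M8 once the redundant negated predicates are dropped.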

iii. Results
WAREHOUSE_INFO1 = σM1 (WAREHOUSE_INFO)
W_ID  W_NAME    W_ADDR        W_REGION  W_CAPACITY  W_CAPITAL
W001  Kulim 01  Kulim, Kedah  NORTH     1000        1,000,000
W002  Kulim 02  Kulim, Kedah  NORTH     1000        1,000,000

WAREHOUSE_INFO2 = σM2 (WAREHOUSE_INFO) (empty table)

WAREHOUSE_INFO3 = σM3 (WAREHOUSE_INFO) (empty table)

WAREHOUSE_INFO4 = σM4 (WAREHOUSE_INFO)
W_ID  W_NAME     W_ADDR            W_REGION  W_CAPACITY  W_CAPITAL
W003  Kajang 01  Kajang, Selangor  CENTRAL   3000        5,000,000

WAREHOUSE_INFO5 = σM5 (WAREHOUSE_INFO) (empty table)

WAREHOUSE_INFO6 = σM6 (WAREHOUSE_INFO)
W_ID  W_NAME   W_ADDR       W_REGION  W_CAPACITY  W_CAPITAL
W004  Muar 01  Muar, Johor  SOUTH     4000        10,000,000

WAREHOUSE_INFO7 = σM7 (WAREHOUSE_INFO)
W_ID  W_NAME      W_ADDR           W_REGION    W_CAPACITY  W_CAPITAL
W005  Kuantan 01  Kuantan, Pahang  EAST COAST  2500        2,000,000

WAREHOUSE_INFO8 = σM8 (WAREHOUSE_INFO) (empty table)

iv. Checking for Correctness

Note: from this point onward the empty fragments are omitted.

Completeness:
• The selection predicates used for the fragmentation are complete, so the resulting
fragmentation is guaranteed to be complete: no information is lost.
• All tuples in the original relation WAREHOUSE_INFO can be found in the resulting
relations {WAREHOUSE_INFO1, WAREHOUSE_INFO4, WAREHOUSE_INFO6,
WAREHOUSE_INFO7}.

Reconstruction:
• The original global relation WAREHOUSE_INFO can be reconstructed by applying the union
operator to the resulting fragments:
F_WAREHOUSE_INFO = {WAREHOUSE_INFO1, WAREHOUSE_INFO4, WAREHOUSE_INFO6, WAREHOUSE_INFO7}
WAREHOUSE_INFO = ∪ WAREHOUSE_INFOi, ∀ WAREHOUSE_INFOi ∈ F_WAREHOUSE_INFO

Disjointness:
• The minterm predicates are mutually exclusive, so no tuple appears in more than one
fragment and the fragments do not overlap.
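The fragmentation and the three correctness rules can be checked mechanically on the sample data. The following is a minimal sketch, assuming the WAREHOUSE_INFO instance from Section 2.2 and tuples laid out as (W_ID, W_NAME, W_ADDR, W_REGION, W_CAPACITY, W_CAPITAL); the function name minterm_fragments is hypothetical.

```python
REGIONS = ["NORTH", "CENTRAL", "SOUTH", "EAST COAST"]

# WAREHOUSE_INFO instance from Section 2.2.
warehouses = [
    ("W001", "Kulim 01", "Kulim, Kedah", "NORTH", 1000, 1000000),
    ("W002", "Kulim 02", "Kulim, Kedah", "NORTH", 1000, 1000000),
    ("W003", "Kajang 01", "Kajang, Selangor", "CENTRAL", 3000, 5000000),
    ("W004", "Muar 01", "Muar, Johor", "SOUTH", 4000, 10000000),
    ("W005", "Kuantan 01", "Kuantan, Pahang", "EAST COAST", 2500, 2000000),
]


def minterm_fragments(rows):
    """Apply the minterm predicates M1..M8 in order: for each region, first
    W_CAPACITY < 3000 and then W_CAPACITY >= 3000."""
    fragments = {}
    index = 0
    for region in REGIONS:
        for below_3000 in (True, False):
            index += 1
            fragments["WAREHOUSE_INFO%d" % index] = [
                r for r in rows
                if r[3] == region and (r[4] < 3000) == below_3000
            ]
    return fragments


fragments = minterm_fragments(warehouses)
non_empty = {name: rows for name, rows in fragments.items() if rows}

# Reconstruction: the union of the fragments.
union = [row for rows in non_empty.values() for row in rows]

# Completeness and reconstruction: the union gives back every original tuple.
is_complete = set(union) == set(warehouses)
# Disjointness: no tuple occurs in more than one fragment.
is_disjoint = len(union) == len(set(union))

print(sorted(non_empty), is_complete, is_disjoint)
```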

3.2 Derived Horizontal Fragmentation


Requirements:
i. The target of the derived horizontal fragmentation is the STORAGE table.
ii. One application accesses this table. The application finds the
product type and the quantity (amount) of product held in a given region. This is useful for
determining whether each site has enough supply of wire cable.

Derived Horizontal Fragmentation Steps:


i. Based on the application requirement above we will need to derive the fragmentation
of STORAGE with respect to WAREHOUSE_INFO.

WAREHOUSE_INFO
W_ID W_NAME W_ADDR W_REGION W_CAPACITY W_CAPITAL

L4
STORAGE
W_ID P_ID S_AMOUNT

owner(L4) = WAREHOUSE_INFO
member(L4) = STORAGE

ii. The application leads to a fragmentation of STORAGE that follows the fragments
WAREHOUSE_INFO1, WAREHOUSE_INFO4, WAREHOUSE_INFO6 and WAREHOUSE_INFO7
obtained earlier. The derived fragmentation of STORAGE is therefore defined with the
semijoin operator (⋉) as follows:

STORAGE1 = STORAGE ⋉ WAREHOUSE_INFO1
STORAGE2 = STORAGE ⋉ WAREHOUSE_INFO4
STORAGE3 = STORAGE ⋉ WAREHOUSE_INFO6
STORAGE4 = STORAGE ⋉ WAREHOUSE_INFO7

iii. Result

STORAGE1
W_ID P_ID S_AMOUNT
W001 P001 2500
W001 P002 2500
W001 P003 2500
W002 P004 5000

STORAGE2
W_ID P_ID S_AMOUNT
W003 P001 5000
W003 P002 2000
W003 P003 3000

STORAGE3
W_ID P_ID S_AMOUNT
W004 P001 5000
W004 P002 5000
W004 P003 5000

STORAGE4
W_ID P_ID S_AMOUNT
W005 P002 3000
W005 P003 3000
W005 P004 3000

iv. Checking for Correctness

Completeness:
• The resulting fragmentation is complete. All tuples in the original relation
STORAGE can be found in the resulting relations {STORAGE1, STORAGE2,
STORAGE3, STORAGE4}.

Reconstruction:
• The original global relation STORAGE can be reconstructed by applying the union
operator to the resulting fragments:
F_STORAGE = {STORAGE1, STORAGE2, STORAGE3, STORAGE4}
STORAGE = ∪ STORAGEi, ∀ STORAGEi ∈ F_STORAGE

Disjointness:
• The join graph below is simple: each fragment of STORAGE is linked to exactly one
fragment of WAREHOUSE_INFO, so the fragments of STORAGE are disjoint.

STORAGE1 WAREHOUSE_INFO1
STORAGE2 WAREHOUSE_INFO4
STORAGE3 WAREHOUSE_INFO6
STORAGE4 WAREHOUSE_INFO7
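The derived fragmentation can be sketched as a semijoin on W_ID. In the example below each owner fragment of WAREHOUSE_INFO is represented only by its W_ID values, since that is the only attribute the semijoin needs; this simplification and the helper name semijoin are assumptions made for illustration.

```python
# STORAGE instance from Section 2.2 as (W_ID, P_ID, S_AMOUNT).
storage = [
    ("W001", "P001", 2500), ("W001", "P002", 2500), ("W001", "P003", 2500),
    ("W002", "P004", 5000),
    ("W003", "P001", 5000), ("W003", "P002", 2000), ("W003", "P003", 3000),
    ("W004", "P001", 5000), ("W004", "P002", 5000), ("W004", "P003", 5000),
    ("W005", "P002", 3000), ("W005", "P003", 3000), ("W005", "P004", 3000),
]

# Owner fragments from Section 3.1, reduced to their W_ID values.
warehouse_fragments = {
    "WAREHOUSE_INFO1": ["W001", "W002"],
    "WAREHOUSE_INFO4": ["W003"],
    "WAREHOUSE_INFO6": ["W004"],
    "WAREHOUSE_INFO7": ["W005"],
}


def semijoin(member_rows, owner_ids):
    """Keep the member tuples whose W_ID matches a tuple of the owner fragment."""
    ids = set(owner_ids)
    return [row for row in member_rows if row[0] in ids]


# STORAGE1..STORAGE4 follow the order of the owner fragments above.
storage_fragments = {
    "STORAGE%d" % i: semijoin(storage, owner_ids)
    for i, owner_ids in enumerate(warehouse_fragments.values(), start=1)
}

union = [row for rows in storage_fragments.values() for row in rows]
is_complete = set(union) == set(storage)   # completeness and reconstruction
is_disjoint = len(union) == len(storage)   # no tuple lands in two fragments

for name, rows in storage_fragments.items():
    print(name, rows)
```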

3.3 Vertical Fragmentation


Requirements:
i. The target of the fragmentation is the PURCHASE table.
ii. The application accessing the table issues the following queries:

Q1: List all purchase orders

SELECT PO_ID
FROM PURCHASE

Q2: Find the total amount ordered by a particular warehouse

SELECT SUM(PO_AMOUNT)
FROM PURCHASE
WHERE W_ID = 'value'

Q3: Find all the products and purchase orders placed by a particular
warehouse

SELECT P_ID, PO_ID
FROM PURCHASE
WHERE W_ID = 'value'

Access frequencies from all 4 regions are as follows:

Site 1: North region
Q1 = 10, Q2 = 5, Q3 = 25

Site 2: Central region
Q1 = 20, Q2 = 0, Q3 = 20

Site 3: South region
Q1 = 10, Q2 = 10, Q3 = 0

Site 4: East Coast region
Q1 = 10, Q2 = 5, Q3 = 25

Vertical Fragmentation Steps:


i. First we need to find the attribute usage values based on the information provided above.
For notational convenience, we let A1 = PO_ID, A2 = PO_AMOUNT, A3 = W_ID and A4 =
P_ID. The usage values are defined in matrix form below, where entry (i, j) denotes
use(Qi, Aj).

     A1  A2  A3  A4
Q1    1   0   0   0
Q2    0   1   1   0
Q3    1   0   1   1
Figure 2: Attribute Usage Matrix

ii. The access frequencies of the queries at each site are defined in a matrix called the
Application Access Frequencies Matrix, where entry (i, j) denotes access(Qi, Sj). The last
column is the row total.

     S1  S2  S3  S4  Total
Q1   10  20  10  10     50
Q2    5   0  10   5     20
Q3   25  20   0  25     70
Figure 3: Application Access Frequencies Matrix
iii. The Attribute Affinity Matrix (AA), whose entry (i, j) shows how many times
attributes Ai and Aj are accessed together, is calculated as follows:

aff(Ai, Aj) = sum of access(Qk, Sl) over all sites Sl and all queries Qk
              with use(Qk, Ai) = 1 and use(Qk, Aj) = 1

For example, A1 and A3 are used together only by Q3, so aff(A1, A3) = 70. The
following matrix is the result of the calculation.

      A1   A2   A3   A4
A1   120    0   70   70
A2     0   20   20    0
A3    70   20   90   70
A4    70    0   70   70

Figure 4: Attribute Affinity Matrix
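The affinity values can be recomputed from Figures 2 and 3 with a short script. This is a minimal sketch that assumes each execution of a query accesses its attributes once, so that the affinity of two attributes is simply the total frequency of the queries that use both; the function name affinity_matrix is hypothetical.

```python
ATTRIBUTES = ["A1", "A2", "A3", "A4"]  # PO_ID, PO_AMOUNT, W_ID, P_ID

# Attribute usage matrix (Figure 2): use[q][j] = 1 if the query uses attribute j.
use = {
    "Q1": [1, 0, 0, 0],
    "Q2": [0, 1, 1, 0],
    "Q3": [1, 0, 1, 1],
}

# Application access frequencies (Figure 3): one value per site S1..S4.
access = {
    "Q1": [10, 20, 10, 10],
    "Q2": [5, 0, 10, 5],
    "Q3": [25, 20, 0, 25],
}


def affinity_matrix(use, access):
    """aff(Ai, Aj) = total frequency, over all sites, of the queries
    that use both Ai and Aj."""
    n = len(ATTRIBUTES)
    aa = [[0] * n for _ in range(n)]
    for query, usage in use.items():
        total = sum(access[query])
        for i in range(n):
            for j in range(n):
                if usage[i] and usage[j]:
                    aa[i][j] += total
    return aa


AA = affinity_matrix(use, access)
for name, row in zip(ATTRIBUTES, AA):
    print(name, row)
```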

iv. Next we calculate the Clustered Affinity (CA) Matrix. This takes three steps:
Initialization, Iteration and Row Ordering.

Initialization
In the initialization step, we copy columns 1 and 2 of the Attribute
Affinity Matrix to the Clustered Affinity (CA) Matrix.

A1 A2
A1 120 0

A2 0 20

A3 70 20

A4 70 0

Iteration
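Starting from the initialization above, we need to decide on the best placement for
columns 3 and 4. For this purpose we use the Bond Energy Algorithm (BEA), in which the
contribution of placing Ak between Ai and Aj is:

cont(Ai, Ak, Aj) = 2bond(Ai, Ak) + 2bond(Ak, Aj) - 2bond(Ai, Aj)

where bond(Ax, Ay) is the sum over all attributes Az of aff(Az, Ax) x aff(Az, Ay).
A0 denotes the empty position to the left of the first column; any bond involving
an empty position is 0.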

For Column 3:
Ordering (0-3-1)
cont(A0, A3, A1) = 2bond(A0, A3) + 2bond(A3, A1) - 2bond(A0, A1)
bond(A0, A3) = bond(A0, A1) = 0
bond(A3, A1) = (120 x 70) + (0 x 20) + (70 x 90) + (70 x 70) = 19600
cont(A0, A3, A1) = (2 x 0) + (2 x 19600) - (2 x 0) = 39200

Ordering (1-3-2)
cont(A1, A3, A2) = 2bond(A1, A3) + 2bond(A3, A2) - 2bond(A1, A2)
bond(A1, A3) = 19600
bond(A3, A2) = (0 x 70) + (20 x 20) + (20 x 90) + (0 x 70) = 2200
bond(A1, A2) = (120 x 0) + (0 x 20) + (70 x 20) + (70 x 0) = 1400
cont(A1, A3, A2) = (2 x 19600) + (2 x 2200) - (2 x 1400) = 40800

Ordering (2-3-4)
Here index 4 is the empty position to the right of A2 (attribute A4 has not been
placed yet), so the bonds involving it are 0.
cont(A2, A3, A4) = 2bond(A2, A3) + 2bond(A3, A4) - 2bond(A2, A4)
bond(A2, A3) = 2200
bond(A3, A4) = 0
bond(A2, A4) = 0
cont(A2, A3, A4) = (2 x 2200) + 0 - 0 = 4400

Since the contribution of the ordering (1-3-2) is the largest, we place A3
between A1 and A2.

      A1   A3   A2
A1   120   70    0
A2     0   20   20
A3    70   90   20
A4    70   70    0

For Column 4:
Ordering (0-4-1)
cont(A0, A4, A1) = 2bond(A0, A4) + 2bond(A4, A1) - 2bond(A0, A1)
bond(A0, A4) = bond(A0, A1) = 0
bond(A4, A1) = (120 x 70) + (0 x 0) + (70 x 70) + (70 x 70) = 18200
cont(A0, A4, A1) = (2 x 0) + (2 x 18200) - (2 x 0) = 36400

Ordering (1-4-3)
cont(A1, A4, A3) = 2bond(A1, A4) + 2bond(A4, A3) - 2bond(A1, A3)
bond(A1, A4) = 18200
bond(A4, A3) = (70 x 70) + (20 x 0) + (90 x 70) + (70 x 70) = 16100
bond(A1, A3) = 19600
cont(A1, A4, A3) = (2 x 18200) + (2 x 16100) - (2 x 19600) = 29400

Ordering (3-4-2)
cont(A3, A4, A2) = 2bond(A3, A4) + 2bond(A4, A2) - 2bond(A3, A2)
bond(A3, A4) = 16100
bond(A4, A2) = (0 x 70) + (20 x 0) + (20 x 70) + (0 x 70) = 1400
bond(A3, A2) = 2200
cont(A3, A4, A2) = (2 x 16100) + (2 x 1400) - (2 x 2200) = 30600

Ordering (2-4-5)
Here A5 denotes the empty position to the right of A2, so the bonds involving it are 0.
cont(A2, A4, A5) = 2bond(A2, A4) + 2bond(A4, A5) - 2bond(A2, A5)
bond(A2, A4) = 1400
bond(A4, A5) = 0
bond(A2, A5) = 0
cont(A2, A4, A5) = (2 x 1400) + 0 - 0 = 2800

Since the contribution of the ordering (0-4-1) is the largest, we place
A4 before A1.

      A4   A1   A3   A2
A1    70  120   70    0
A2     0    0   20   20
A3    70   70   90   20
A4    70   70   70    0

Row Ordering
Finally, the rows are arranged in the same order as the columns. The result is
shown below.

      A4   A1   A3   A2
A4    70   70   70    0
A1    70  120   70    0
A3    70   70   90   20
A2     0    0   20   20
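The initialization, iteration and row ordering steps can be reproduced with a short script. The following is a minimal sketch of the Bond Energy Algorithm as used in this section, assuming the Attribute Affinity Matrix of Figure 4. Attributes are referred to by zero-based index (0 = A1, ..., 3 = A4), None stands for an empty boundary position, and the function names are hypothetical.

```python
# Attribute Affinity Matrix from Figure 4 (rows and columns A1..A4).
AA = [
    [120, 0, 70, 70],
    [0, 20, 20, 0],
    [70, 20, 90, 70],
    [70, 0, 70, 70],
]


def bond(x, y):
    """bond(Ax, Ay) = sum over z of aff(Az, Ax) * aff(Az, Ay).
    A boundary position (None) has a bond of 0 with everything."""
    if x is None or y is None:
        return 0
    return sum(AA[z][x] * AA[z][y] for z in range(len(AA)))


def cont(left, mid, right):
    """Contribution of placing column `mid` between `left` and `right`."""
    return 2 * bond(left, mid) + 2 * bond(mid, right) - 2 * bond(left, right)


def bond_energy_order(n):
    """Start with columns A1 and A2, then insert each remaining column at
    the position with the largest contribution."""
    order = [0, 1]
    for k in range(2, n):
        best_pos, best_val = None, None
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            value = cont(left, k, right)
            if best_val is None or value > best_val:
                best_pos, best_val = pos, value
        order.insert(best_pos, k)
    return order


order = bond_energy_order(len(AA))

# Row ordering: arrange the rows in the same order as the columns.
CA = [[AA[i][j] for j in order] for i in order]

print(["A%d" % (i + 1) for i in order])
for i, row in zip(order, CA):
    print("A%d" % (i + 1), row)
```

Ties between positions are resolved in favor of the leftmost position, which is an arbitrary choice; no ties occur with the values used here.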

There are two candidate partitions of the Clustered Affinity Matrix above. To decide
which is the better option, we compare how the current set of queries would access the
resulting fragments. The aim is to maximize the number of accesses that touch only one
fragment and to minimize the number of accesses that touch both fragments.

(OPTION 1) Split between A3 and A2

      A4   A1   A3 |  A2
A4    70   70   70 |   0
A1    70  120   70 |   0
A3    70   70   90 |  20
      -------------+----
A2     0    0   20 |  20

PURCHASE 1 = {A4, A1, A3}, PURCHASE 2 = {A2}
Accesses to PURCHASE 1 only: Q1 = 50, Q3 = 70
Accesses to PURCHASE 2 only: none
Accesses to both fragments: Q2 = 20

(OPTION 2) Split between A1 and A3

      A4   A1 |  A3   A2
A4    70   70 |  70    0
A1    70  120 |  70    0
      --------+---------
A3    70   70 |  90   20
A2     0    0 |  20   20

PURCHASE 1 = {A4, A1}, PURCHASE 2 = {A3, A2}
Accesses to PURCHASE 1 only: Q1 = 50
Accesses to PURCHASE 2 only: Q2 = 20
Accesses to both fragments: Q3 = 70

OPTION 1 = 120 accesses to 1 fragment; 20 accesses to 2 fragments
OPTION 2 = 70 accesses to 1 fragment; 70 accesses to 2 fragments

From the comparison above, we conclude that Option 1 is the better
choice.
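The comparison of the two options can be expressed as a small calculation. The sketch below counts, for a given split point in the column order A4, A1, A3, A2, how many query accesses fall entirely within one fragment and how many need both. It assumes the query totals from Figure 3 and ignores the key PO_ID that is replicated into PURCHASE 2 in the final result; the function name evaluate_split is hypothetical.

```python
# Attributes used by each query (Figure 2) and total accesses (Figure 3).
use = {
    "Q1": {"A1"},
    "Q2": {"A2", "A3"},
    "Q3": {"A1", "A3", "A4"},
}
total_access = {"Q1": 50, "Q2": 20, "Q3": 70}

# Column order of the Clustered Affinity Matrix.
order = ["A4", "A1", "A3", "A2"]


def evaluate_split(split_at):
    """Split the ordered attributes into order[:split_at] and order[split_at:],
    and return (accesses to one fragment, accesses to both fragments)."""
    first, second = set(order[:split_at]), set(order[split_at:])
    single = sum(
        freq for query, freq in total_access.items()
        if use[query] <= first or use[query] <= second
    )
    both = sum(total_access.values()) - single
    return single, both


option1 = evaluate_split(3)  # {A4, A1, A3} | {A2}
option2 = evaluate_split(2)  # {A4, A1} | {A3, A2}

print("OPTION 1:", option1)
print("OPTION 2:", option2)
```

Option 1 keeps more accesses within a single fragment, in line with the conclusion above.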

v. Result
PURCHASE 1
PO_ID P_ID W_ID
PO_001 P003 W005
PO_002 P001 W004
PO_003 P004 W001
PO_004 P002 W002
PO_005 P001 W005

PURCHASE 2
PO_ID PO_AMOUNT
PO_001 1500
PO_002 3000
PO_003 4000
PO_004 4500
PO_005 1000

vi. Checking for Correctness
Completeness:
• Guaranteed by the partitioning algorithm, which assigns each attribute of the
global relation to one of the fragments {PURCHASE1, PURCHASE2}.
Reconstruction:
• The original relation can be reconstructed by joining the fragments on the key PO_ID:
PURCHASE = PURCHASE1 ⋈ PURCHASE2
Disjointness:
• The fragments do not overlap, apart from the key PO_ID, which appears in both.
Duplicated keys are not considered to be overlapping.
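The reconstruction rule can be checked on the sample data. The following is a minimal sketch, assuming the PURCHASE instance from Section 2.2 with tuples laid out as (PO_ID, PO_AMOUNT, P_ID, W_ID); the helper name join_on_po_id is hypothetical.

```python
# PURCHASE instance from Section 2.2 as (PO_ID, PO_AMOUNT, P_ID, W_ID).
purchase = [
    ("PO_001", 1500, "P003", "W005"),
    ("PO_002", 3000, "P001", "W004"),
    ("PO_003", 4000, "P004", "W001"),
    ("PO_004", 4500, "P002", "W002"),
    ("PO_005", 1000, "P001", "W005"),
]

# Vertical fragments: the key PO_ID is kept in both.
purchase1 = [(po_id, p_id, w_id) for po_id, _amount, p_id, w_id in purchase]
purchase2 = [(po_id, amount) for po_id, amount, _p_id, _w_id in purchase]


def join_on_po_id(fragment1, fragment2):
    """Natural join of the two fragments on the shared key PO_ID."""
    amount_by_po = dict(fragment2)
    return [
        (po_id, amount_by_po[po_id], p_id, w_id)
        for po_id, p_id, w_id in fragment1
        if po_id in amount_by_po
    ]


reconstructed = join_on_po_id(purchase1, purchase2)
print(sorted(reconstructed) == sorted(purchase))
```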

4 ALLOCATION PHASE
In this phase the fragments are allocated to sites, based on the fragmentation obtained
from Primary Horizontal Fragmentation of the relation WAREHOUSE_INFO, Derived
Horizontal Fragmentation of the relation STORAGE and Vertical Fragmentation of the
relation PURCHASE. The figure below illustrates the distribution of the fragments. The
fragments of WAREHOUSE_INFO and STORAGE are placed according to the location of each
site, and the fragments of PURCHASE are placed where they are accessed most frequently.
This distribution is intended to make data access, processing and retrieval efficient and
to reduce remote access, since each fragment is stored at the site of the region it
describes. It also favors local processing over global processing.

The four sites are connected by the Metro-E communication network.

Site    Region      Fragments
SITE 1  NORTH       WAREHOUSE_INFO 1, STORAGE 1, PURCHASE 1
SITE 2  SOUTH       WAREHOUSE_INFO 6, STORAGE 3, PURCHASE 2
SITE 3  EAST COAST  WAREHOUSE_INFO 7, STORAGE 4
SITE 4  CENTRAL     WAREHOUSE_INFO 4, STORAGE 2
Figure 5: Fragmentation Location
