Professional Documents
Culture Documents
Spring 2018
1 External Memory
Motivation
External Dictionaries
2-3 Trees
(a, b)-Trees
B-Trees
External sorting
1 External Memory
Motivation
External Dictionaries
2-3 Trees
(a, b)-Trees
B-Trees
External sorting
Current architectures:
registers (very fast, very small)
cache L1, L2 (still fast, less small)
main memory
external memory: disk or cloud (slow, very large)
1 External Memory
Motivation
External Dictionaries
2-3 Trees
(a, b)-Trees
B-Trees
External sorting
1 External Memory
Motivation
External Dictionaries
2-3 Trees
(a, b)-Trees
B-Trees
External sorting
(keys < k) (keys > k) (keys < k1 ) (keys > k1 and < k2 ) (keys < k1 )
23TreeSearch(k, v ← root)
1. Let c0 , k1 , . . . , kd , cd be keys and children at v , in order
2. if k ≥ k1
3. i ← maximal index such that ki ≤ k
4. if ki = k return ki
5. else i ← 0
6. 23TreeSearch(k, ci )
18 31 36 51
12 21 24 28 33 39 42 48 56 62
NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL
18 31 36 51
12 21 24 28 33 39 42 48 56 62
NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL
18 31 36 51
12 19 21 24 28 33 39 42 48 56 62
NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL
18 21 31 36 51
12 19 24 28 33 39 42 48 56 62
NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL
18 21 31 36 51
12 19 24 28 33 39 42 48 56 62
18 21 31 36 51
12 19 24 28 33 39 41 42 48 56 62
18 21 31 36 41 51
12 19 24 28 33 39 42 48 56 62
18 21 31 41 51
12 19 24 28 33 39 42 48 56 62
25 43
18 21 31 41 51
12 19 24 28 33 39 42 48 56 62
(NIL-nodes not shown to simplify picture)
As with BSTs and AVL trees, we first swap the KVP k with its
successor, so that it is now at a leaf `.
Delete k and one NIL-child from `.
If ` now has 0 keys (underflow )
I If ` is the root, simply delete it. Else let p be the parent of `.
I If some immediate sibling u is a 2-node, perform a transfer :
F Find the key kp in p that is between keys of ` and u.
F “Rotate:” move kp into `, move adjacent KVP from u into p, and
re-arrange children suitably.
I Otherwise, we merge ` and a 1-node sibling u:
F Find the key kp in parent p between keys of ` and u.
F Combine ` and u into one node and move kp into it.
F Recurse in p if it now has underflow.
36
25 43
18 21 31 41 51
12 19 24 28 33 39 42 48 56 62
36
25 48
18 21 31 41 51
12 19 24 28 33 39 42 56 62
36
25 48
18 21 31 41 56
12 19 24 28 33 39 42 51 62
36
25 48
18 21 31 41 56
12 19 24 28 33 39 42 51 62
36
25 48
18 31 41 56
12 21 24 28 33 39 42 51 62
36
25 48
18 31 41 56
12 21 24 28 33 39 42 51 62
36
25 48
18 31 41 56
12 21 24 28 33 39 42 51 62
36
25 48
18 31 56
12 21 24 28 33 39 41 51 62
36
25
18 31 48 56
12 21 24 28 33 39 41 51 62
25 36
18 31 48 56
12 21 24 28 33 39 41 51 62
25 36
18 31 48 56
12 21 24 28 33 39 41 51 62
1 External Memory
Motivation
External Dictionaries
2-3 Trees
(a, b)-Trees
B-Trees
External sorting
If a ≥ b/2, then search, insert, delete work just like for 2-3 trees, after
re-defining underflow/overflow to consider the above constraints.
25
14 20 26 38 44 50 56
10 12 16 18 22 24 28 30 32 34 36 40 42 46 48 52 54 58 60
NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL
h−1
X
Total: n ≥ 1 + 2(a − 1) ai = 2ah − 1
i=0
so height of tree with n KVPs is O(loga (n)) = O log n/ log a .
Can also prove height is Ω(logb (n)).
Biniaz, Jamshidpey, Schost (SCS, UW) CS240 – Module 11 Spring 2018 12 / 18
Outline
1 External Memory
Motivation
External Dictionaries
2-3 Trees
(a, b)-Trees
B-Trees
External sorting
log n
Total cost is O · (log m) = O(log n).
log m
This is no better than 2-3-trees or AVL-trees.
Biniaz, Jamshidpey, Schost (SCS, UW) CS240 – Module 11 Spring 2018 13 / 18
Dictionaries in external memory
Recall: In an AVL tree or 2-3 tree, Θ(log n) pages are loaded in the worst
case.
Instead, use a B-tree of order m, where m is chosen so that an m-node fits
into a single page.
Each operation can be done with Θ(height) page loads.
The height of a B-tree is Θ log n/ log m ; this results in large savings of
page loads.
1 External Memory
Motivation
External Dictionaries
2-3 Trees
(a, b)-Trees
B-Trees
External sorting
d-Way-Merge(S1 , . . . , Sd )
S1 , . . . , Sd are sorted sets (arrays/lists/stacks/queues)
1. P ← empty min-priority queue
2. S ← empty set
3. for i ← 1 to d do
4. P.insert((first element of Si ,i))
5. while P is not empty do
6. (x , i) ← deleteMin(P)
7. remove x from Si and append it to S
8. if Si is not empty do
9. P.insert((first element of Si ,i))
Internal (M = 8):
Internal (M = 8):
39 5 28 22 10 33 29 37
Internal (M = 8):
5 10 22 28 29 33 37 39
sorted run
Internal (M = 8):
sorted run sorted run sorted run sorted run sorted run sorted run
Internal (M = 8):
sorted run sorted run sorted run sorted run sorted run sorted run
Internal (M = 8):
5 10 8 21 11 12 (priority queue not shown)
S1 S2 S3 S
sorted run sorted run sorted run sorted run sorted run sorted run
Internal (M = 8):
10 8 21 11 12 5 (priority queue not shown)
S1 S2 S3 S
sorted run sorted run sorted run sorted run sorted run sorted run
Internal (M = 8):
10 21 11 12 5 8 (priority queue not shown)
S1 S2 S3 S
sorted run sorted run sorted run sorted run sorted run sorted run
5 8
Internal (M = 8):
10 21 11 12 (priority queue not shown)
S1 S2 S3 S
sorted run sorted run sorted run sorted run sorted run sorted run
5 8
Internal (M = 8):
21 11 12 10 (priority queue not shown)
S1 S2 S3 S
sorted run sorted run sorted run sorted run sorted run sorted run
5 8
Internal (M = 8):
22 28 21 11 12 10 (priority queue not shown)
S1 S2 S3 S
sorted run sorted run sorted run sorted run sorted run sorted run
5 8
Internal (M = 8):
22 28 21 12 10 11 (priority queue not shown)
S1 S2 S3 S
sorted run sorted run sorted run sorted run sorted run sorted run
5 8 10 11
Internal (M = 8):
22 28 21 12 (priority queue not shown)
S1 S2 S3 S
5 8 10 11 12 13 21 22 28 29 30 31 33 35 36 37 39 40 42 45 49 52 53 54
sorted run
Internal (M = 8):
(priority queue not shown)
S1 S2 S3 S
5 8 10 11 12 13 21 22 28 29 30 31 33 35 36 37 39 40 42 45 49 52 53 54 1 2 3 4 6 7 9 14 15 16 17 18 20 23 24 26 27 32 43 44 46 47 48 50
Internal (M = 8):
(priority queue not shown)
S1 S2 S3 S
sorted run
5 8 10 11 12 13 21 22 28 29 30 31 33 35 36 37 39 40 42 45 49 52 53 54 1 2 3 4 6 7 9 14 15 16 17 18 20 23 24 26 27 32 43 44 46 47 48 50
Internal (M = 8):
(priority queue not shown)
S1 S2 S3 S