
Dictionaries 1/19/2005 11:37 PM

Dictionaries and Hash Tables

  0  ∅
  1  025-612-0001
  2  981-101-0002
  3  ∅
  4  451-229-0004

Hash Functions and Hash Tables (§2.5.2)

A hash function h maps keys of a given type to integers in a fixed interval [0, N − 1]
Example: h(x) = x mod N is a hash function for integer keys
The integer h(x) is called the hash value of key x
A hash table for a given key type consists of
- Hash function h
- Array (called table) of size N
When implementing a dictionary with a hash table, the goal is to store item (k, o) at index i = h(k)

Dictionary ADT (§2.5.1)

The dictionary ADT models a searchable collection of key-element items
The main operations of a dictionary are searching, inserting, and deleting items
Multiple items with the same key are allowed
Applications:
- address book
- credit card authorization
- mapping host names (e.g., cs16.net) to internet addresses (e.g., 128.148.34.101)
Dictionary ADT methods:
- findElement(k): if the dictionary has an item with key k, returns its element, else returns the special element NO_SUCH_KEY
- insertItem(k, o): inserts item (k, o) into the dictionary
- removeElement(k): if the dictionary has an item with key k, removes it from the dictionary and returns its element, else returns the special element NO_SUCH_KEY
- size(), isEmpty()
- keys(), elements()

Example

We design a hash table for a dictionary storing items (SSN, Name), where SSN (social security number) is a nine-digit positive integer
Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x

     0  ∅
     1  025-612-0001
     2  981-101-0002
     3  ∅
     4  451-229-0004
     …
  9997  ∅
  9998  200-751-9998
  9999  ∅
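The example above can be sketched in a few lines of Python. This is only an illustration: the keys are the ones shown in the figure, the names are hypothetical placeholders, and collisions are not handled here.

```python
N = 10_000  # table size from the example


def h(ssn: str) -> int:
    """Hash function from the example: the last four digits of the key."""
    digits = int(ssn.replace("-", ""))  # drop the dashes, read as an integer
    return digits % N                   # x mod 10,000 keeps the last four digits


# The table is an array of N cells; None plays the role of the empty cell.
table = [None] * N

# Keys as they appear in the figure; the names are hypothetical.
items = [
    ("025-612-0001", "name-a"),
    ("981-101-0002", "name-b"),
    ("451-229-0004", "name-c"),
    ("200-751-9998", "name-d"),
]

for ssn, name in items:
    table[h(ssn)] = (ssn, name)  # store item (k, o) at index i = h(k)
```

With these four keys the occupied cells are 1, 2, 4 and 9998, as in the figure.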

Log File (§2.5.1)

A log file is a dictionary implemented by means of an unsorted sequence
- We store the items of the dictionary in a sequence (based on a doubly-linked list or a circular array), in arbitrary order
Performance:
- insertItem takes O(1) time since we can insert the new item at the beginning or at the end of the sequence
- findElement and removeElement take O(n) time since in the worst case (the item is not found) we traverse the entire sequence to look for an item with the given key
The log file is effective only for dictionaries of small size or for dictionaries on which insertions are the most common operations, while searches and removals are rarely performed (e.g., historical record of logins to a workstation)

Hash Functions (§2.5.3)

A hash function is usually specified as the composition of two functions:
- Hash code map: h1: keys → integers
- Compression map: h2: integers → [0, N − 1]
The hash code map is applied first, and the compression map is applied next on the result, i.e., h(x) = h2(h1(x))
The goal of the hash function is to “disperse” the keys in an apparently random way
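A minimal sketch of the composition h(x) = h2(h1(x)). The particular hash code used here (the sum of the key's byte values) is an assumption chosen only to keep the example short; it is not claimed to disperse keys well.

```python
def hash_code(key: str) -> int:
    """h1: keys -> integers. Illustrative choice: sum of the UTF-8 byte values."""
    return sum(key.encode("utf-8"))


def compress(y: int, N: int) -> int:
    """h2: integers -> [0, N - 1], here by taking the remainder mod N."""
    return y % N


def h(key: str, N: int) -> int:
    """The hash function is the composition h2(h1(key))."""
    return compress(hash_code(key), N)
```

For instance, h("cs16.net", 13) first maps the host name to an integer and then folds that integer into the range [0, 12].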


Hash Code Maps (§2.5.3)

Memory address:
- We reinterpret the memory address of the key object as an integer (default hash code of all Java objects)
- Good in general, except for numeric and string keys
Integer cast:
- We reinterpret the bits of the key as an integer
- Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., byte, short, int and float in Java)
Component sum:
- We partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows)
- Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double in Java)

Collision Handling (§2.5.5)

Collisions occur when different elements are mapped to the same cell
Chaining: let each cell in the table point to a linked list of elements that map there
Chaining is simple, but requires additional memory outside the table

  0  ∅
  1  → 025-612-0001
  2  ∅
  3  ∅
  4  → 451-229-0004 → 981-101-0004
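Chaining as described above might be sketched as follows. The assumptions are labeled in the code: a Python list stands in for each linked list, and the hash function h(x) = x mod N with N = 5 is chosen only because it is consistent with the cells shown in the figure.

```python
NO_SUCH_KEY = object()  # sentinel returned when no item has the given key


class ChainedHashTable:
    def __init__(self, capacity: int):
        self.N = capacity
        # One chain per cell; a Python list stands in for the linked list.
        self.table = [[] for _ in range(capacity)]

    def h(self, k: int) -> int:
        return k % self.N  # assumed hash function for integer keys

    def insert_item(self, k: int, o) -> None:
        # Items that collide are simply appended to the same chain.
        self.table[self.h(k)].append((k, o))

    def find_element(self, k: int):
        # Only the chain of cell h(k) has to be scanned.
        for key, element in self.table[self.h(k)]:
            if key == k:
                return element
        return NO_SUCH_KEY


t = ChainedHashTable(5)
for key in (256120001, 4512290004, 9811010004):  # keys from the figure, dashes removed
    t.insert_item(key, f"element-{key}")
```

The last two keys both hash to cell 4 and end up in the same chain, which is the extra memory outside the table that the slide mentions.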

Hash Code Maps (cont.)

Polynomial accumulation:
- We partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits)
    a_0 a_1 … a_{n−1}
- We evaluate the polynomial
    p(z) = a_0 + a_1 z + a_2 z^2 + … + a_{n−1} z^{n−1}
  at a fixed value z, ignoring overflows
- Especially suitable for strings (e.g., the choice z = 33 gives at most 6 collisions on a set of 50,000 English words)
Polynomial p(z) can be evaluated in O(n) time using Horner’s rule:
- The following polynomials are successively computed, each from the previous one in O(1) time
    p_0(z) = a_{n−1}
    p_i(z) = a_{n−i−1} + z p_{i−1}(z)   (i = 1, 2, …, n − 1)
We have p(z) = p_{n−1}(z)

Linear Probing (§2.5.5)

Open addressing: the colliding item is placed in a different cell of the table
Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell
Each table cell inspected is referred to as a “probe”
Colliding items lump together, causing future collisions to cause a longer sequence of probes
Example:
- h(x) = x mod 13
- Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

  Index:  0   1   2   3   4   5   6   7   8   9  10  11  12
  Key:    ∅   ∅  41   ∅   ∅  18  44  59  32  22  31  73   ∅
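Polynomial accumulation with Horner's rule can be sketched as below for string keys. Two assumptions are made for illustration: each character code is one component a_i, and "ignoring overflows" is modeled by keeping only the low 32 bits after every step.

```python
def polynomial_hash_code(key: str, z: int = 33, bits: int = 32) -> int:
    """Evaluate p(z) = a_0 + a_1 z + ... + a_{n-1} z^(n-1) by Horner's rule.

    The components a_i are the character codes of `key` (an assumption),
    and overflow is modeled by masking to `bits` bits.
    """
    mask = (1 << bits) - 1
    p = 0
    # Process a_{n-1} first: p_0 = a_{n-1}, then p_i = a_{n-i-1} + z * p_{i-1}.
    for ch in reversed(key):
        p = (ord(ch) + z * p) & mask
    return p
```

Each loop iteration does a constant amount of work, so a key with n components is hashed in O(n) time, as stated above.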

Compression Maps (§2.5.4)

Division:
- h2(y) = y mod N
- The size N of the hash table is usually chosen to be a prime
- The reason has to do with number theory and is beyond the scope of this course
Multiply, Add and Divide (MAD):
- h2(y) = (ay + b) mod N
- a and b are nonnegative integers such that a mod N ≠ 0
- Otherwise, every integer would map to the same value b

Search with Linear Probing

Consider a hash table A that uses linear probing
findElement(k)
- We start at cell h(k)
- We probe consecutive locations until one of the following occurs
  - An item with key k is found, or
  - An empty cell is found, or
  - N cells have been unsuccessfully probed

Algorithm findElement(k)
  i ← h(k)
  p ← 0
  repeat
    c ← A[i]
    if c = ∅
      return NO_SUCH_KEY
    else if c.key() = k
      return c.element()
    else
      i ← (i + 1) mod N
      p ← p + 1
  until p = N
  return NO_SUCH_KEY
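The findElement pseudocode above translates almost line by line into Python. In this sketch the class name and the (key, element) tuple representation are assumptions; insertion without deletions is included only so that there is a table to search, and it is filled with the keys of the h(x) = x mod 13 example.

```python
NO_SUCH_KEY = object()  # sentinel for an unsuccessful search
EMPTY = None            # marks an empty cell


class LinearProbingTable:
    def __init__(self, capacity: int):
        self.N = capacity
        self.A = [EMPTY] * capacity

    def h(self, k: int) -> int:
        return k % self.N

    def insert_item(self, k: int, o) -> int:
        """Place (k, o) in the next (circularly) available cell; return its index."""
        i = self.h(k)
        for _ in range(self.N):
            if self.A[i] is EMPTY:
                self.A[i] = (k, o)
                return i
            i = (i + 1) % self.N
        raise RuntimeError("table is full")

    def find_element(self, k: int):
        i = self.h(k)
        for _ in range(self.N):        # at most N probes
            c = self.A[i]
            if c is EMPTY:             # an empty cell ends the search
                return NO_SUCH_KEY
            if c[0] == k:              # an item with key k is found
                return c[1]
            i = (i + 1) % self.N       # probe the next cell, wrapping around
        return NO_SUCH_KEY             # N cells probed unsuccessfully


table = LinearProbingTable(13)
for key in [18, 41, 22, 44, 59, 32, 31, 73]:
    table.insert_item(key, f"element-{key}")
```

Searching for key 31 starts at cell 5 and probes cells 5 through 10 before finding the item, which illustrates how clustered keys lengthen the probe sequence.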


Updates with Linear Probing

To handle insertions and deletions, we introduce a special object, called AVAILABLE, which replaces deleted elements
removeElement(k)
- We search for an item with key k
- If such an item (k, o) is found, we replace it with the special item AVAILABLE and we return element o
- Else, we return NO_SUCH_KEY
insertItem(k, o)
- We throw an exception if the table is full
- We start at cell h(k)
- We probe consecutive cells until one of the following occurs
  - A cell i is found that is either empty or stores AVAILABLE, or
  - N cells have been unsuccessfully probed
- We store item (k, o) in cell i

Performance of Hashing

In the worst case, searches, insertions and removals on a hash table take O(n) time
The worst case occurs when all the keys inserted into the dictionary collide
The load factor α = n/N affects the performance of a hash table
Assuming that the hash values are like random numbers, it can be shown that the expected number of probes for an insertion with open addressing is 1 / (1 − α)
The expected running time of all the dictionary ADT operations in a hash table is O(1)
In practice, hashing is very fast provided the load factor is not close to 100%
Applications of hash tables:
- small databases
- compilers
- browser caches
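The update rules for linear probing with the AVAILABLE marker can be sketched as follows. The class and helper names are hypothetical, h(k) = k mod N is assumed for integer keys, and the choice of OverflowError for a full table is an assumption as well.

```python
NO_SUCH_KEY = object()  # sentinel for "no item with this key"
AVAILABLE = object()    # special object that replaces deleted elements
EMPTY = None            # a cell that has never held an item


class ProbingDict:
    def __init__(self, capacity: int):
        self.N = capacity
        self.A = [EMPTY] * capacity
        self.n = 0  # number of stored items

    def _probe(self, k: int):
        """Yield the N cells of the probe sequence h(k), h(k)+1, ... (mod N)."""
        i = k % self.N
        for _ in range(self.N):
            yield i
            i = (i + 1) % self.N

    def _locate(self, k: int):
        """Return the index of the item with key k, or None if there is none."""
        for i in self._probe(k):
            c = self.A[i]
            if c is EMPTY:
                return None                  # an empty cell ends the search
            if c is not AVAILABLE and c[0] == k:
                return i                     # AVAILABLE cells are skipped over
        return None

    def insert_item(self, k: int, o) -> None:
        if self.n == self.N:
            raise OverflowError("hash table is full")
        for i in self._probe(k):
            if self.A[i] is EMPTY or self.A[i] is AVAILABLE:
                self.A[i] = (k, o)           # reuse the first free cell
                self.n += 1
                return

    def find_element(self, k: int):
        i = self._locate(k)
        return NO_SUCH_KEY if i is None else self.A[i][1]

    def remove_element(self, k: int):
        i = self._locate(k)
        if i is None:
            return NO_SUCH_KEY
        element = self.A[i][1]
        self.A[i] = AVAILABLE                # leave a marker, not an empty cell
        self.n -= 1
        return element
```

Replacing a removed item with AVAILABLE rather than emptying the cell matters because a search stops at the first empty cell; emptying it would hide items stored further along the same probe sequence.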

Double Hashing

Double hashing uses a secondary hash function d(k) and handles collisions by placing an item in the first available cell of the series
  (i + jd(k)) mod N   for j = 0, 1, …, N − 1
where i = h(k)
The secondary hash function d(k) cannot have zero values
The table size N must be a prime to allow probing of all the cells
Common choice of compression map for the secondary hash function:
  d2(k) = q − k mod q
where
- q < N
- q is a prime
The possible values for d2(k) are 1, 2, …, q

Universal Hashing (§2.5.6)

A family of hash functions is universal if, for any two distinct keys 0 ≤ j, k ≤ M − 1, a function h chosen at random from the family satisfies Pr(h(j) = h(k)) ≤ 1/N.
Choose p as a prime between M and 2M.
Randomly select 0 < a < p and 0 ≤ b < p, and define h(k) = (ak + b mod p) mod N
Theorem: The set of all functions, h, as defined here, is universal.
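Drawing one function from the family h(k) = (ak + b mod p) mod N might look like the sketch below. The helper names are hypothetical, p is taken to be the smallest prime in [M, 2M], and b is drawn from [0, p − 1] on the assumption that the family is meant to have p(p − 1) members, as the universality proof counts.

```python
import random


def is_prime(n: int) -> bool:
    """Trial division; adequate for the small values used in this sketch."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def make_universal_hash(M: int, N: int, rng=random):
    """Pick h(k) = ((a*k + b) mod p) mod N for keys in [0, M - 1], with M >= 1."""
    # A prime between M and 2M (one always exists for M >= 1).
    p = next(q for q in range(M, 2 * M + 1) if is_prime(q))
    a = rng.randrange(1, p)  # 0 < a < p
    b = rng.randrange(0, p)  # 0 <= b < p (assumption, see the lead-in)

    def h(k: int) -> int:
        return ((a * k + b) % p) % N

    return h, (a, b, p)
```

A call such as make_universal_hash(10, 4) returns the function together with the parameters that were drawn, so that the same function can be rebuilt later.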

Example of Double Hashing

Consider a hash table storing integer keys that handles collision with double hashing
- N = 13
- h(k) = k mod 13
- d(k) = 7 − k mod 7
Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

   k   h(k)  d(k)  Probes
  18    5     3    5
  41    2     1    2
  22    9     6    9
  44    5     5    5 10
  59    7     4    7
  32    6     3    6
  31    5     4    5 9 0
  73    8     4    8

  Index:  0   1   2   3   4   5   6   7   8   9  10  11  12
  Key:   31   ∅  41   ∅   ∅  18  32  59  73  22  44   ∅   ∅

Proof of Universality (Part 1)

Let f(k) = ak + b mod p
Let g(k) = k mod N
So h(k) = g(f(k)).
f causes no collisions:
- Let f(k) = f(j).
- Suppose k < j. Then

\[ aj + b - \left\lfloor \frac{aj + b}{p} \right\rfloor p = ak + b - \left\lfloor \frac{ak + b}{p} \right\rfloor p \]

\[ a(j - k) = \left( \left\lfloor \frac{aj + b}{p} \right\rfloor - \left\lfloor \frac{ak + b}{p} \right\rfloor \right) p \]

So a(j − k) is a multiple of p
But a and j − k are both less than p, and p is prime
So a(j − k) = 0. Since a > 0, j = k (contradiction)
Thus, f causes no collisions.
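The double hashing example above (N = 13, h(k) = k mod 13, d(k) = 7 − k mod 7) can be replayed with a short script. This is a sketch with a hypothetical function name; it records the probe sequence of every key so that the result can be compared with the table.

```python
def double_hash_insert(keys, N=13, q=7):
    """Insert integer keys with double hashing; return the table and the probes."""
    table = [None] * N           # None marks an empty cell
    probes = {}
    for k in keys:
        i = k % N                # primary hash h(k)
        d = q - k % q            # secondary hash d(k), never zero
        seq = []
        for j in range(N):
            cell = (i + j * d) % N
            seq.append(cell)
            if table[cell] is None:
                table[cell] = k  # first available cell of the series
                break
        else:
            raise RuntimeError("no free cell found")
        probes[k] = seq
    return table, probes


table, probes = double_hash_insert([18, 41, 22, 44, 59, 32, 31, 73])
```

Key 31, for example, probes cells 5, 9 and 0 before it is placed, because cells 5 and 9 are already taken by 18 and 22.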


Proof of Universality (Part 2)

If f causes no collisions, only g can make h cause collisions.
Fix a number x. Of the p integers y = f(k), different from x, the number such that g(y) = g(x) is at most \lceil p/N \rceil - 1
Since there are p choices for x, the number of h’s that will cause a collision between j and k is at most

\[ p \left( \lceil p/N \rceil - 1 \right) \le \frac{p(p - 1)}{N} \]

There are p(p − 1) functions h. So probability of collision is at most

\[ \frac{p(p - 1)/N}{p(p - 1)} = \frac{1}{N} \]

Therefore, the set of possible h functions is universal.
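For small parameters the bound can also be checked by brute force. The sketch below enumerates all p(p − 1) functions h(k) = ((ak + b) mod p) mod N with 0 < a < p and 0 ≤ b < p, and counts, for every pair of distinct keys, how many of them cause a collision. The values p = 13, N = 4 and M = 10 are arbitrary choices for illustration.

```python
from itertools import combinations


def collision_counts(p: int, N: int, M: int) -> dict:
    """For each pair j < k in [0, M - 1], count the functions h with h(j) = h(k)."""
    counts = {}
    for j, k in combinations(range(M), 2):
        c = 0
        for a in range(1, p):        # 0 < a < p
            for b in range(p):       # 0 <= b < p
                if ((a * j + b) % p) % N == ((a * k + b) % p) % N:
                    c += 1
        counts[(j, k)] = c
    return counts


p, N, M = 13, 4, 10                  # p is a prime between M and 2M
counts = collision_counts(p, N, M)
worst = max(counts.values())         # most collisions over all key pairs
bound = p * (p - 1) / N              # the bound derived in the proof
```

Dividing worst by p(p − 1) gives the largest collision probability over all key pairs, which the theorem says is at most 1/N.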
