
Data Compression Basics

HUFFMAN CODING

Overview

In this chapter, we:
- describe a very popular coding algorithm called the Huffman coding algorithm;
- present a procedure for building Huffman codes when the probability model for the source is known;
- present a procedure for building codes when the source statistics are unknown;
- describe a technique for code design that is in some sense similar to the Huffman coding approach.

Huffman Coding Algorithm

The Huffman coding algorithm builds an optimal prefix code for a source whose symbol probabilities are known. It rests on two observations about optimum prefix codes: symbols that occur more frequently get shorter codewords than symbols that occur less frequently, and the two least frequent symbols get codewords of the same length. The code is constructed by repeatedly merging the two least probable symbols into a compound symbol until only one remains, and then assigning 0s and 1s along the branches of the resulting binary tree.


Minimum Variance Huffman Codes

When probabilities tie during source reduction, more than one Huffman code exists for the same source; all have the same average length, but the individual codeword lengths can differ. Placing a combined (compound) symbol as high as possible in the ordered list yields the code whose codeword lengths vary least: the minimum variance Huffman code. This is the preferred choice when transmitting over a fixed-rate channel, since it needs the least buffering.

Huffman Coding (using binary tree)


Algorithm in 5 steps:
1. Find the gray-level probabilities for the image by finding the histogram.
2. Order the input probabilities (histogram magnitudes) from smallest to largest.
3. Combine the smallest two by addition.
4. Go to step 2 until only two probabilities are left.
5. Working backward along the tree, generate code by alternating assignments of 0 and 1.

(A Python sketch of these five steps follows the list.)
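A minimal Python sketch of these five steps (the function name image_huffman and the 0/1 labeling convention are ours; ties are broken by sort order, so other equally valid codes can result):

```python
from collections import Counter

def image_huffman(pixels):
    """Huffman code for a gray-level image, following the five steps above."""
    # Step 1: histogram of gray levels. Ordering by count is equivalent to
    # ordering by probability (count / total) and avoids floating-point ties.
    hist = Counter(pixels)
    nodes = [(count, [g]) for g, count in hist.items()]
    codes = {g: "" for g in hist}
    while len(nodes) > 1:
        nodes.sort(key=lambda node: node[0])       # step 2: order
        (c1, g1), (c2, g2) = nodes[0], nodes[1]    # the two smallest
        # Step 5 is realized incrementally: each time a branch is merged,
        # one more bit is prepended to every gray level in that branch.
        for g in g1:
            codes[g] = "0" + codes[g]
        for g in g2:
            codes[g] = "1" + codes[g]
        nodes = [(c1 + c2, g1 + g2)] + nodes[2:]   # steps 3-4: combine, repeat
    return codes

# The 10x10 image of Example 1 below: 20, 30, 10 and 40 pixels
# of gray levels 0, 1, 2 and 3 respectively.
pixels = [0] * 20 + [1] * 30 + [2] * 10 + [3] * 40
print(image_huffman(pixels))   # {0: '101', 1: '11', 2: '100', 3: '0'}
```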


Coding procedure for an N-symbol source:

Source reduction:
1. List all probabilities in descending order.
2. Merge the two symbols with the smallest probabilities into a new compound symbol.
3. Repeat the above two steps N-2 times.

Codeword assignment:
1. Start from the smallest reduced source and work back to the original source.
2. Each merging point corresponds to a node in the binary codeword tree.

Example 1
We have an image with 2 bits/pixel, giving 4 possible gray levels. The image is 10 rows by 10 columns. In step 1 we find the histogram for the image.


The histogram counts are converted into probabilities by normalizing to the total number of pixels (step 1):

  gray level 0:  20 pixels  ->  p = 20/100 = 0.2
  gray level 1:  30 pixels  ->  p = 30/100 = 0.3
  gray level 2:  10 pixels  ->  p = 10/100 = 0.1
  gray level 3:  40 pixels  ->  p = 40/100 = 0.4

In step 2 the probabilities are ordered from smallest to largest: 0.1, 0.2, 0.3, 0.4.


In step 3 the smallest two are combined by addition: 0.1 + 0.2 = 0.3, leaving the values 0.3, 0.3 and 0.4.

Step 4 repeats steps 2 and 3: reorder (if necessary) and add the two smallest probabilities until only two values remain. Here 0.3 + 0.3 = 0.6, leaving 0.6 and 0.4.


In step 5 the actual code assignment is made. Start at the right-hand side of the tree and assign 0s and 1s: 0 is assigned to the 0.6 branch and 1 to the 0.4 branch.

The assigned 0 and 1 are brought back along the tree, and wherever a branch occurs the code bit is put on both branches.


Next, 0 and 1 are assigned to the two branches labeled 0.3, appending to the existing code (giving the codes 00 and 01).

Finally, the codes are brought back one more level, and where the branch splits another assignment of 0 and 1 occurs (at the 0.1 and 0.2 branches, producing the 3-bit codewords).


Now we have a Huffman code for this image. One valid assignment is:

  gray level 3 (p = 0.4):  1
  gray level 1 (p = 0.3):  00
  gray level 0 (p = 0.2):  010
  gray level 2 (p = 0.1):  011

Two gray levels are represented with 3 bits, one with 2 bits, and one with 1 bit. The gray level represented by 1 bit, g3, is the most likely to occur (40% of the time) and thus carries the least information in the information-theoretic sense. The average length is 0.4×1 + 0.3×2 + 0.2×3 + 0.1×3 = 1.9 bits/pixel, compared with 2 bits/pixel for the original fixed-length representation.

Exercise

Using Example 1, find a Huffman code using the minimum variance procedure. (A sketch for experimenting with the tie-breaking rules follows.)
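One way to experiment with both tie-breaking rules is a heap-based sketch like the following (the helper name huffman and the compound_high flag are ours, not from the slides). With compound_high=True, compound symbols are kept as high as possible among equal probabilities, which is the minimum variance rule. For Example 1's probabilities both rules happen to force the same code lengths (1, 2, 3, 3), so the demonstration uses the five-symbol vowel source from a later example, where the difference is visible:

```python
import heapq

def huffman(probs, compound_high=True):
    """Build a Huffman code from {symbol: probability}.

    compound_high=True breaks probability ties by keeping compound
    (merged) symbols as high as possible in the ordering, i.e. the
    minimum variance rule; False merges compounds again as early as
    possible, giving the largest spread of codeword lengths.
    """
    counter = 0
    # Heap entries are (probability, tie-breaker, symbols in subtree).
    # Original symbols carry tie-breaker 0; compounds get +counter
    # (popped later: kept high) or -counter (popped earlier: kept low).
    heap = [(p, 0, [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    codes = {s: "" for s in probs}
    while len(heap) > 1:
        p1, _, grp1 = heapq.heappop(heap)   # two smallest probabilities
        p2, _, grp2 = heapq.heappop(heap)
        for s in grp1:                      # label one branch 0 ...
            codes[s] = "0" + codes[s]
        for s in grp2:                      # ... and the other branch 1
            codes[s] = "1" + codes[s]
        counter += 1
        tie = counter if compound_high else -counter
        heapq.heappush(heap, (p1 + p2, tie, grp1 + grp2))
    return codes

probs = {"e": 0.4, "a": 0.2, "i": 0.2, "o": 0.1, "u": 0.1}
print(huffman(probs, compound_high=True))   # codeword lengths 2, 2, 2, 3, 3
print(huffman(probs, compound_high=False))  # lengths 1, 2, 3, 4, 4 (same 2.2-bit average)
```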


Example 2
Step 1: Source reduction (probabilities listed in descending order at each stage):

  original:  S 0.5, N 0.25, E 0.125, W 0.125
  stage 1:   S 0.5, N 0.25, (EW) 0.25          where (EW) = E + W
  stage 2:   S 0.5, (NEW) 0.5                  where (NEW) = N + (EW)

The compound symbols (EW) and (NEW) are formed by merging the two least probable entries at each stage.

Step 2: Codeword assignment

  symbol x   p(x)     codeword
  S          0.5      0
  N          0.25     10
  E          0.125    110
  W          0.125    111

Working back from the last reduction: S and (NEW) receive 0 and 1; within (NEW), N receives 10 and (EW) 11; within (EW), E receives 110 and W 111.


The codeword assignment is not unique. At each merging point (node) of the binary codeword tree we can arbitrarily assign 0 and 1 to the two branches, and the average code length stays the same. Flipping the labels, for example, gives S = 1, N = 01, E = 000, W = 001 instead of 0, 10, 110, 111.
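A quick numeric check that both labelings give the same average length (a small sketch; the helper name average_length is ours):

```python
def average_length(codes, probs):
    """Average code length: sum of p_i * l_i, in bits per symbol."""
    return sum(probs[s] * len(codes[s]) for s in probs)

probs = {"S": 0.5, "N": 0.25, "E": 0.125, "W": 0.125}
first = {"S": "0", "N": "10", "E": "110", "W": "111"}
second = {"S": "1", "N": "01", "E": "000", "W": "001"}
print(average_length(first, probs), average_length(second, probs))  # 1.75 1.75
```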

Example 3
Step 1: Source reduction (probabilities listed in descending order at each stage):

  original:  e 0.4, a 0.2, i 0.2, o 0.1, u 0.1
  stage 1:   e 0.4, a 0.2, i 0.2, (ou) 0.2     where (ou) = o + u
  stage 2:   e 0.4, (iou) 0.4, a 0.2           where (iou) = i + (ou)
  stage 3:   (aiou) 0.6, e 0.4                 where (aiou) = a + (iou)


Step 2: Codeword assignment

  symbol x   p(x)    codeword
  e          0.4     1
  a          0.2     01
  i          0.2     000
  o          0.1     0010
  u          0.1     0011

Working back: (aiou) and e receive 0 and 1; within (aiou), (iou) receives 00 and a 01; within (iou), i receives 000 and (ou) 001; within (ou), o receives 0010 and u 0011.

Binary codeword tree representation:

              r
           0/   \1
       (aiou)    e
        0/   \1
    (iou)     a
     0/   \1
     i     (ou)
         0/   \1
         o     u

Reading the branch labels from the root gives e = 1, a = 01, i = 000, o = 0010, u = 0011.


Average code length and entropy:

  symbol x   p(x)    codeword   length
  e          0.4     1          1
  a          0.2     01         2
  i          0.2     000        3
  o          0.1     0010       4
  u          0.1     0011       4

Average code length:
  L = Σ p_i l_i = 0.4×1 + 0.2×2 + 0.2×3 + 0.1×4 + 0.1×4 = 2.2 bps

Entropy:
  H(X) = -Σ p_i log2 p_i ≈ 2.122 bps

Redundancy:
  r = L - H(X) ≈ 0.078 bps

If we used fixed-length codes we would have to spend three bits per symbol, which gives a code redundancy of 3 - 2.122 = 0.878 bps.
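The same figures can be verified in a few lines (a sketch using the codeword lengths from the table above):

```python
from math import log2

probs = {"e": 0.4, "a": 0.2, "i": 0.2, "o": 0.1, "u": 0.1}
lengths = {"e": 1, "a": 2, "i": 3, "o": 4, "u": 4}

avg = sum(probs[s] * lengths[s] for s in probs)       # average code length L
entropy = -sum(p * log2(p) for p in probs.values())   # H(X)
print(round(avg, 3), round(entropy, 3), round(avg - entropy, 3))  # 2.2 2.122 0.078
```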

Example 4

Step 1: Source reduction, forming compound symbols as in the previous examples.

Step 2: Codeword assignment at each merging point of the source reduction.

Adaptive Huffman Coding

In adaptive Huffman coding the code is built while the symbols are being transmitted, so the source statistics need not be known (or sent) in advance. Transmitter and receiver start from the same initial tree, containing only a null node, and apply the identical update procedure after every symbol, so their trees stay synchronized.

Update Procedure

After each symbol is coded, the count (weight) of its leaf is incremented and the increment is propagated up toward the root. Listed in order of increasing weight, the nodes must remain consistent with their position in the tree (the sibling property). Whenever an increment breaks this ordering (a "misfit" in the stages that follow), the offending nodes are swapped (reordered) before coding continues, so that transmitter and receiver keep identical trees.
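The misfit test itself is simple to state in code (a sketch with our own naming; each node is a (label, weight) pair listed in the order shown in the stages):

```python
def find_misfit(order):
    """Return the index of the first node whose weight breaks the
    non-decreasing ordering required by the sibling property, or None."""
    weights = [w for _, w in order]
    for i in range(1, len(weights)):
        if weights[i] < weights[i - 1]:
            return i
    return None

# Stage 3 of the TENNESSEE example below: 0, n(1), 1, e(1), 2, t(1)
stage3 = [("null", 0), ("n", 1), ("node", 1), ("e", 1), ("node", 2), ("t", 1)]
print(find_misfit(stage3))   # 5 -> t(1) sits after the weight-2 node: a misfit
```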

Dynamic Huffman Coding

The stages below trace the adaptive procedure on the string TENNESSEE, one letter at a time.


T

Stage 1 (first occurrence of t):

      r
     / \
    0   t(1)

Order: 0, t(1)

Here r represents the root, 0 represents the null node, and t(1) denotes the occurrence of t with a count of 1.

TE

Stage 2 (first occurrence of e):

        r
       / \
      1   t(1)
     / \
    0   e(1)

Order: 0, e(1), 1, t(1)


TEN

Stage 3 (first occurrence of n):

          r
         / \
        2   t(1)
       / \
      1   e(1)
     / \
    0   n(1)

Order: 0, n(1), 1, e(1), 2, t(1) : misfit

Reorder: TEN

          r
         / \
     t(1)   2
           / \
          1   e(1)
         / \
        0   n(1)

Order: 0, n(1), 1, e(1), t(1), 2


TENN

Stage 4 (repetition of n):

          r
         / \
     t(1)   3
           / \
          2   e(1)
         / \
        0   n(2)

Order: 0, n(2), 2, e(1), t(1), 3 : misfit

Reorder: TENN

          r
         / \
     n(2)   2
           / \
          1   e(1)
         / \
        0   t(1)

Order: 0, t(1), 1, e(1), n(2), 2 (t(1) and n(2) are swapped)


TENNE

Stage 5 (repetition of e):

          r
         / \
     n(2)   3
           / \
          1   e(2)
         / \
        0   t(1)

Order: 0, t(1), 1, e(2), n(2), 3

TENNES

Stage 6 (first occurrence of s):

          r
         / \
     n(2)   4
           / \
          2   e(2)
         / \
        1   t(1)
       / \
      0   s(1)

Order: 0, s(1), 1, t(1), 2, e(2), n(2), 4


TENNESS

Stage 7 (repetition of s):

          r
         / \
     n(2)   5
           / \
          3   e(2)
         / \
        2   t(1)
       / \
      0   s(2)

Order: 0, s(2), 2, t(1), 3, e(2), n(2), 5 : misfit

Reorder: TENNESS

          r
         / \
     n(2)   5
           / \
          3   e(2)
         / \
        1   s(2)
       / \
      0   t(1)

Order: 0, t(1), 1, s(2), 3, e(2), n(2), 5 (s(2) and t(1) are swapped)


TENNESSE

Stage 8 (second repetition of e):

          r
         / \
     n(2)   6
           / \
          3   e(3)
         / \
        1   s(2)
       / \
      0   t(1)

Order: 0, t(1), 1, s(2), 3, e(3), n(2), 6 : misfit

Reorder: TENNESSE

          r
         / \
     e(3)   5
           / \
          3   n(2)
         / \
        1   s(2)
       / \
      0   t(1)

Order: 0, t(1), 1, s(2), 3, n(2), e(3), 5 (n(2) and e(3) are swapped)


TENNESSEE

Stage 9 (third repetition of e):

            r
         0/   \1
      e(4)     5
            0/   \1
            3     n(2)
         0/   \1
         1     s(2)
      0/   \1
      0     t(1)

Order: 0, t(1), 1, s(2), 3, n(2), e(4), 5

Following the labeled branches from the root gives the codewords below.

ENCODING

The letters can be encoded as follows:

  e : 0
  n : 11
  s : 101
  t : 1001
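As an illustration, applying this final code table to the whole string gives the 18-bit total used in the average below (a sketch; in true adaptive coding each letter is encoded with the tree as it existed at that moment, so the actually transmitted stream differs):

```python
codes = {"e": "0", "n": "11", "s": "101", "t": "1001"}

bits = "".join(codes[ch] for ch in "tennessee")
print(bits)        # 100101111010110100
print(len(bits))   # 18 bits for 9 letters -> 2 bits per letter on average
```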


Average Code Length

Average code length = Σ_i (length_i × frequency_i) / Σ_i frequency_i
                    = (1×4 + 2×2 + 3×2 + 4×1) / (4 + 2 + 2 + 1)
                    = 18 / 9
                    = 2 bits per symbol

ENTROPY

Entropy H = -Σ_i p_i log2 p_i

With p(e) = 4/9 ≈ 0.44, p(n) = p(s) = 2/9 ≈ 0.22, and p(t) = 1/9 ≈ 0.11:

H = -( 4/9·log2(4/9) + 2/9·log2(2/9) + 2/9·log2(2/9) + 1/9·log2(1/9) ) ≈ 1.8366 bits per symbol


Ordinary Huffman Coding

A static Huffman tree for TENNESSEE (frequencies e = 4, n = 2, s = 2, t = 1):

            9
         0/   \1
         5     e(4)
      0/   \1
   s(2)     3
         0/   \1
      t(1)     n(2)

ENCODING:
  e : 1
  s : 00
  t : 010
  n : 011

Average code length = (1×4 + 2×2 + 3×1 + 3×2) / 9 = 17/9 ≈ 1.89 bits per symbol

SUMMARY

In this exercise the average code length of ordinary Huffman coding (1.89 bits per symbol) looks better than that of the dynamic version (2 bits per symbol), but in practice the dynamic coder performs better. The problem with static coding is that the tree has to be constructed at the transmitter and sent to the receiver, and the tree may change, because the frequency distribution of the letters differs between kinds of text (plain prose, a technical paper, a piece of code, and so on). Since in dynamic coding the tree is constructed at the receiver as well, it need not be sent. Considering this, dynamic coding is better; its average code length also improves as the transmitted text gets longer.


Summary of Huffman Coding Algorithm

- Achieves minimal redundancy subject to the constraint that the source symbols are coded one at a time.
- Sorting the symbols in descending order of probability is the key step in source reduction.
- The codeword assignment is not unique: exchanging the 0 and 1 labels at any node of the binary codeword tree produces another code that works equally well.
- It only works for a source with a finite number of symbols (otherwise the source reduction does not know where to start).
