Multimedia Data Compression
• Lossy and lossless compression
• Huffman coding
• Entropy coding
• Adaptive coding
• Dictionary-based coding (LZW)
chapter3: Multimedia Compression
Why Compress
• Raw data are huge.
• Audio: CD-quality music
  44.1 kHz × 16 bit × 2 channels ≈ 1.4 Mbps
• Video: near-DVD-quality true-color animation
  640 px × 480 px × 30 fps × 24 bit ≈ 221 Mbps
• Impractical for storage and bandwidth
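The bullet figures above follow from straightforward arithmetic; a quick check:

```python
# Back-of-envelope raw data rates from the slide
audio_bps = 44_100 * 16 * 2       # CD audio: 44.1 kHz sampling, 16 bits/sample, stereo
video_bps = 640 * 480 * 30 * 24   # 640x480 pixels, 30 fps, 24-bit true color

print(f"audio: {audio_bps / 1e6:.2f} Mbps")   # 1.41 Mbps
print(f"video: {video_bps / 1e6:.2f} Mbps")   # 221.18 Mbps
```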
Huffman Coding
• Huffman codes can be used to compress information
– As in ZIP tools: WinZip's DEFLATE format combines LZ77 dictionary coding with Huffman coding
– JPEG also uses Huffman coding as part of its compression process
• The basic idea: instead of storing each character in a file as an 8-bit ASCII value, store the more frequently occurring characters using fewer bits and the less frequently occurring characters using more bits
– On average this decreases the file size (often by roughly half)
Example frequency counts: e:3, d:2, u:2, l:2, sp:2, k:1, b:1, v:1, i:1, s:1
Huffman Coding
• We then pick the two nodes with the smallest frequencies and combine them into a new node
– The selection of these nodes is the greedy part of the algorithm
• The two selected nodes are removed from the set but replaced by the combined node
• This continues until only one node is left in the set
e:3, d:2, u:2, l:2, sp:2, k:1, b:1, v:1, i:1, s:1
[Diagram: the lowest-weight pairs (i,1 + s,1; b,1 + v,1; etc.) are merged step by step into internal nodes of weight 2, 3, 4, 4, and 5, then 7 and 9, until a single root of weight 16 remains.]
Huffman Coding
• Now we assign codes by placing a 0 on every left branch of the tree and a 1 on every right branch
• A traversal of the tree from the root to a leaf gives the Huffman code for that leaf's character
• Note that no code is the prefix of another code
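The greedy merge procedure can be sketched with a priority queue. The counts in the example total 16 and appear to match the string "duke blue devils", which is used here for illustration; names like `huffman_codes` are my own, not from any particular library:

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table for the symbols in `text`."""
    freq = Counter(text)
    # Heap entries: (weight, tiebreaker, node); a node is either a symbol
    # (a leaf) or a (left, right) pair (an internal node).
    heap = [(w, i, sym) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate one-symbol input
        return {heap[0][2]: "0"}
    tiebreak = len(heap)
    while len(heap) > 1:                    # greedily merge the two lightest nodes
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, tiebreak, (left, right)))
        tiebreak += 1
    codes = {}
    def walk(node, prefix):                 # 0 on left branches, 1 on right
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("duke blue devils")   # 'e' (highest count) gets a shortest code
```

The resulting table is prefix-free, so encoded bits can be decoded unambiguously by walking the tree from the root.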
Variations of RLE (zero-suppression technique)
• Assumes that only one symbol, the blank, appears often
• Replace each blank sequence by an M-byte (marker) followed by a byte holding the number of blanks in the sequence
– Example: M3, M4, M14, …
• Other definitions are possible
– Example: M4 = 8 blanks, M5 = 16 blanks, M4M5 = 24 blanks
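A minimal sketch of the first variant, assuming a printable marker `M` and a threshold of two blanks per run (both choices are mine):

```python
def suppress_blanks(text, marker="M"):
    """Replace each run of 2+ blanks with the marker followed by the run length."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == " ":
            run = 0
            while i + run < len(text) and text[i + run] == " ":
                run += 1                    # measure the blank run
            out.append(f"{marker}{run}" if run >= 2 else " ")
            i += run
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

print(suppress_blanks("a    b c"))   # aM4b c
```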
Adaptive Coding
Motivations:
– The previous algorithms (e.g., Huffman) require statistical knowledge of the source, which is often not available (e.g., live audio or video).
– Even when it is available, transmitting it can be a heavy overhead.
– Higher-order models incur even more overhead. For example, an order-0 model over bytes needs a single 256-entry probability table, while an order-1 model needs 256 such tables. (An order-1 model considers the probabilities of occurrences of pairs of symbols.)
The solution is to use adaptive algorithms. Adaptive Huffman coding is one such mechanism, which we will study.
The idea of "adaptiveness" is applicable to other compression algorithms as well.
ENCODER:
Initialize_model();
do {
    c = getc( input );
    encode( c, output );
    update_model( c );
} while ( c != eof );

DECODER:
Initialize_model();
while ( (c = decode( input )) != eof ) {
    putc( c, output );
    update_model( c );
}

The key is that both the encoder and decoder use exactly the same initialize_model and update_model routines.
[Flowchart of the update procedure: if the symbol already has a node in the tree, its node weight is incremented; otherwise the NYT code is sent, followed by the ASCII encoding of the character, and the procedure creates a new node for it.]
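The encoder/decoder symmetry can be demonstrated with a deliberately simplified adaptive scheme: both sides start from identical counts and rebuild a static Huffman code after every symbol. This is only an illustration of the update_model idea, not the incremental tree update used by real Adaptive Huffman (FGK/Vitter); all names here are my own:

```python
import heapq

def build_codes(freq):
    # Static Huffman over the current counts (deterministic tie-breaking)
    heap = [(w, i, s) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    n = len(heap)
    if n == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, n, (a, b)))
        n += 1
    codes, stack = {}, [(heap[0][2], "")]
    while stack:
        node, pfx = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], pfx + "0"))
            stack.append((node[1], pfx + "1"))
        else:
            codes[node] = pfx
    return codes

def adaptive_encode(text, alphabet):
    freq = {s: 1 for s in alphabet}     # initialize_model: every symbol has count 1
    bits = []
    for c in text:
        bits.append(build_codes(freq)[c])   # encode with the current model
        freq[c] += 1                        # update_model(c)
    return "".join(bits)

def adaptive_decode(bits, alphabet, n):
    freq = {s: 1 for s in alphabet}     # identical initialize_model
    out, i = [], 0
    for _ in range(n):
        inv = {v: k for k, v in build_codes(freq).items()}
        j = i + 1
        while bits[i:j] not in inv:     # prefix-free, so first match is correct
            j += 1
        c = inv[bits[i:j]]
        out.append(c)
        freq[c] += 1                    # identical update_model keeps models in sync
        i = j
    return "".join(out)
```

Rebuilding the whole code per symbol is wasteful; the point of FGK/Vitter is to achieve the same synchronized adaptation with a cheap incremental tree update.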
Example
Counts (number of occurrences): B:2, C:2, D:2, E:10.
[Diagram sequence:]
• Initial Huffman tree: the root has weight 16, with E (weight 10) on one side and a weight-6 subtree holding B, C, D (weight 2 each) and the NYT node #0.
• After the first A (Counts A:1, B:2, C:2, D:2, E:10): a new leaf A with weight 1 is spawned from the NYT node, and the weights along its path are incremented (2+1, 6+1, 16+1).
• After the second A: A's weight becomes 1+1 = 2; to keep node weights non-decreasing in node-number order, nodes 1 and 5 are swapped (A trades places with D).
• After the third A (Counts A:3, B:2, C:2, D:2, E:10): the weights along A's path are incremented again (2+1, 4+1, 8+1, 18+1).
Swapping … contd.
• After the fourth A (Counts A:4, B:2, C:2, D:2, E:10): the weights along A's path are incremented (3+1, 5+1, 9+1, 19+1).
• The increments break the weight ordering twice, so two swaps follow: first nodes 5 and 6 are swapped, then nodes 8 and 9.
LZW compression of BABAABAAA (P = current prefix, C = next input character; new codewords start at 256):

ENCODER OUTPUT              STRING TABLE
output code  representing   codeword  string
66           B              256       BA
65           A              257       AB
256          BA             258       BAA
257          AB             259       ABA
65           A              260       AA
260          AA
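The trace above can be reproduced with a direct implementation of the LZW encoding loop (a sketch; the function name and table layout are mine):

```python
def lzw_encode(data):
    """LZW compression: codes 0-255 are single characters, new entries start at 256."""
    table = {chr(i): i for i in range(256)}
    next_code = 256
    p, out = "", []
    for c in data:
        if p + c in table:          # extend the current prefix P
            p += c
        else:                       # emit the code for P, add P+C to the table
            out.append(table[p])
            table[p + c] = next_code
            next_code += 1
            p = c
    if p:
        out.append(table[p])        # flush the final prefix
    return out

print(lzw_encode("BABAABAAA"))   # [66, 65, 256, 257, 65, 260]
```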
LZW Decompression
• The decoder receives <66><65><256><257><65><260> and rebuilds the same string table on the fly, recovering BABAABAAA.
• When the string table fills up, one option is simply to stop adding entries and use the table as is.
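A matching decoder sketch; the special-case branch handles a code the encoder defined on the very step that produced it (in this trace, code 260):

```python
def lzw_decode(codes):
    """Inverse of LZW encoding; rebuilds the string table on the fly."""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:                        # code not yet in table: it must be prev + prev[0]
            entry = prev + prev[0]
        out.append(entry)
        table[next_code] = prev + entry[0]   # same entry the encoder just added
        next_code += 1
        prev = entry
    return "".join(out)

print(lzw_decode([66, 65, 256, 257, 65, 260]))   # BABAABAAA
```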