
Module 6:

Image Compression: Need for compression, redundancy, classification of image compression schemes, Huffman coding, arithmetic coding, dictionary based compression, transform based compression.
Image compression standards - JPEG & MPEG, vector quantization, wavelet based image compression.

Module #6
Title: Image compression: redundancy

Explanation:

Image compression:

 Data compression refers to the process of reducing the amount of data required to
represent a given quantity of information.
 Data redundancy is a central issue in digital image processing (DIP).
 Data that either provide no relevant information or simply restate what is already known
are said to contain data redundancy.
 To achieve compression, we have to reduce this redundancy.
 If n1 and n2 denote the number of information-carrying units in two data sets that convey the
same information, the relative data redundancy RD of the first data set is defined as

RD = 1 - 1/CR

where CR is called the compression ratio and is given by

CR = n1/n2
 In digital image compression, 3 basic data redundancies can be identified and exploited.
They are coding redundancy, interpixel redundancy and psychovisual redundancy.
 Coding Redundancy: Data compression can be achieved by encoding the data using an
appropriate encoding scheme. The elements of an encoding scheme are
i. Code: a system of symbols used to represent a body of information or set of
events (letters, numbers etc.)
ii. Code word: a sequence of symbols used to represent a piece of information
or an event
iii. Word length: number of symbols in each code word
 Let a discrete random variable rk in the interval [0, 1] represent the gray values
constituting an image, and let each rk occur with probability pr(rk). Then pr(rk) is given by

pr(rk) = nk/n ;  k = 0, 1, 2, ..., L-1

where L is the number of gray levels, nk is the number of pixels with gray level rk (the
histogram value) and n is the total number of pixels in the image.
 Let the number of bits used to represent rk be l(rk); then the average number of bits
required to represent each pixel is

Lavg = sum over k = 0 to L-1 of l(rk) pr(rk)

 Thus the total number of bits required to code an M x N image is M*N*Lavg.
 Two types of codes are there: constant length coding (fixed length coding) and
variable length coding.
 Consider the table given below:

 Code 1 represents a fixed-length coding scheme and code 2 a variable-length coding
scheme: Lavg of code 1 is 3 bits and that of code 2 is 2.7 bits. The resulting
compression ratio is CR = 3/2.7 = 1.11 and the redundancy is RD = 1 - (1/1.11) = 0.099,
i.e. 9.9% of the data in the first coding scheme is redundant. This is coding
redundancy.
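The calculation above can be reproduced with a short sketch. The table itself is not reproduced in these notes, so the probabilities and code-word lengths below are illustrative values (they match the classic 8-level example in Gonzalez and Woods, which yields the same Lavg = 2.7 bits):

```python
# Illustrative gray-level probabilities and variable-length code-word lengths
# (hypothetical values chosen so that Lavg works out to 2.7 bits).
probs   = [0.19, 0.25, 0.21, 0.16, 0.08, 0.06, 0.03, 0.02]
lengths = [2, 2, 2, 3, 4, 5, 6, 6]   # code 2: variable-length code
fixed   = 3                          # code 1: fixed-length, 3 bits/symbol

L_avg = sum(l * p for l, p in zip(lengths, probs))  # average bits per pixel
C_R = fixed / L_avg                                 # compression ratio
R_D = 1 - 1 / C_R                                   # relative data redundancy

print(L_avg, C_R, R_D)   # 2.7 bits, ratio ~1.11, redundancy ~0.1
```

Note that frequent symbols get the short code words, which is exactly why Lavg drops below the 3 bits of the fixed-length code.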

 Interpixel Redundancy:
 Interpixel redundancy implies that any pixel value can be reasonably predicted by
its neighbors (i.e., the pixels are correlated).
 Structural or geometric relationships between the objects in an image can lead to
interpixel redundancy.
 A variety of names, including spatial redundancy, interframe redundancy and
geometric redundancy have been coined to refer these interpixel dependencies.
We use the term interpixel redundancy to encompass them all.
 In order to reduce this redundancy, the pixel array is transformed into a more efficient
format. For example, the differences between adjacent pixels can be used to represent an image.
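As a minimal sketch of this idea (with a hypothetical scan line), a slowly varying row can be stored as its first pixel plus small adjacent differences, which cluster near zero and are cheaper to code; the running sum recovers the original values exactly:

```python
row = [100, 102, 103, 103, 105, 108, 110, 110]   # hypothetical scan line

# Forward mapping: keep the first pixel, then store adjacent differences.
diffs = [row[0]] + [b - a for a, b in zip(row, row[1:])]

# Inverse mapping: a running sum recovers the original pixels (lossless).
recovered = []
total = 0
for d in diffs:
    total += d
    recovered.append(total)

print(diffs)   # differences are small -> fewer bits needed per value
```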

 Psychovisual Redundancy:
 Certain information simply has less relative importance than other information in
normal visual processing. This information is said to be psychovisually redundant.
 Unlike the other two, this type of redundancy is associated with real or quantifiable
visual information.
 Since the elimination of this information results in a loss of quantitative information,
the process is referred to as quantization.
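A minimal sketch of quantization (with hypothetical parameters): uniform scalar quantization maps 256 gray levels onto a coarser set of reconstruction values, discarding psychovisually redundant precision. The original values cannot be recovered, which is why this step is lossy:

```python
def quantize(pixels, levels=16, max_val=255):
    # Map [0, max_val] onto `levels` reconstruction values (bin midpoints).
    step = (max_val + 1) / levels
    return [int(min(p // step, levels - 1) * step + step / 2) for p in pixels]

print(quantize([0, 100, 255]))   # [8, 104, 248] with 16 levels
```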

Questions:

1. What do you mean by redundancy? Explain the different types of redundancies.


2. Explain image compression.

References:

1. “Digital image processing”, by Gonzalez and Woods


2. www.wikipedia.org
Module #6
Title: Image compression: image compression model

Explanation:

Image compression model:

 A compression system consists of 2 structural blocks: encoder and decoder.


 The encoder is made of a source encoder, which removes input redundancies, and a
channel encoder, which increases the noise immunity of the source encoder’s output.

 The mapper transforms the input image into a format designed to reduce interpixel
redundancies.
 The quantizer reduces the accuracy of the mapper's output in accordance with some
predefined fidelity criterion. This reduces the psychovisual redundancy.
 The symbol encoder reduces the coding redundancy by assigning the shortest code words to
the most frequently occurring data.
 The reverse process is performed in the source decoder section.

Questions:

1. Explain the image compression model with the help of a block diagram.
2. Explain how the various types of redundancies are reduced by the compression system.

References:
1. “Digital image processing”, by Gonzalez and Woods
2. www.wikipedia.org
Module #6
Title: Image compression: elements of information theory

Explanation:

Elements of information theory:

 The information carried by an event E of probability P(E) is given by

I(E) = log(1/P(E)) = -log(P(E))
 Logarithm to the base ‘m’ is used. If m=2, then the unit of I(E) is bits.
 Consider a source whose outputs are discrete random variables. The set of source
symbols {a1, a2, ..., an} is referred to as the source alphabet A, and its elements are
called symbols.
 The average information, called the ‘entropy’, is given by

H = - sum over i = 1 to n of P(ai) log(P(ai))
 Entropy defines the average information obtained by observing a single source output.
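The definitions above can be checked numerically; the source probabilities here are hypothetical:

```python
import math

def entropy(probs):
    # Average information in bits per symbol (logarithm to base 2).
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A source with probabilities 1/2, 1/4, 1/4 yields 1.5 bits/symbol on average.
print(entropy([0.5, 0.25, 0.25]))   # 1.5
```

A deterministic source (one symbol with probability 1) carries no information, and the entropy is maximized when all symbols are equally likely.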

Shannon’s first theorem (noiseless coding theorem):

 It defines the minimum average code word length per source symbol that can be
achieved.
 A source of information with a finite ensemble of statistically independent source
symbols is called a zero-memory source.
 According to this theorem,

lim (n -> infinity) of Lavg,n / n = H(z)

where Lavg,n is the average number of code symbols required to represent groups of n
source symbols.

Questions:

1. Define the terms information and entropy.


2. State the noiseless coding theorem.

References:
1. “Digital image processing”, by Gonzalez and Woods
2. www.wikipedia.org
Module #6
Title: Image compression: variable length coding (Huffman coding)

Explanation:

Variable length coding:

 It is used to reduce coding redundancy.


 There are several variable-length coding techniques.
 Examples: Huffman coding, arithmetic coding, Golomb coding, etc.

Huffman coding:

 It is a variable-length coding technique.
 It generates block codes (each source symbol is mapped to a fixed sequence of code
symbols).
 It is a probabilistic coding technique, because performing Huffman coding requires
the probability of occurrence of each symbol.
 The Huffman algorithm is given by
o Sort the symbols in decreasing order of probability.
o Perform source reduction according to the following table.

o After the above step, assign codes according to the following table.
Huffman decoding:
 It is uniquely decodable because any string of code symbols can be decoded in only one way.
 For the binary code of the above figure, a left-to-right scan of the encoded string
010100111100 reveals that the first valid code word is 01010, which is the code for
symbol a3. The next valid code word is 011, which corresponds to symbol a1. Continuing in
this manner reveals the completely decoded message to be a3a1a2a2a6.
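The source-reduction and code-assignment steps can be sketched with a priority queue; this is a generic illustration with hypothetical probabilities, not the exact table from the figure:

```python
import heapq

def huffman_codes(probs):
    # probs: {symbol: probability}. Repeatedly merge the two least probable
    # nodes (source reduction), prefixing '0'/'1' to every symbol in each group.
    heap = [(p, i, [s]) for i, (s, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    codes = {s: "" for s in probs}
    tie = len(heap)   # tie-breaker index keeps heap tuples comparable
    while len(heap) > 1:
        p1, _, grp1 = heapq.heappop(heap)
        p2, _, grp2 = heapq.heappop(heap)
        for s in grp1:
            codes[s] = "0" + codes[s]
        for s in grp2:
            codes[s] = "1" + codes[s]
        heapq.heappush(heap, (p1 + p2, tie, grp1 + grp2))
        tie += 1
    return codes

codes = huffman_codes({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
print(codes)   # the more probable a symbol, the shorter its code word
```

For these probabilities the code lengths are 1, 2, 3, 3 bits, so Lavg equals the source entropy of 1.75 bits — the best any symbol-by-symbol code can do.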

Questions:

1. Explain the Huffman coding algorithm.


2. Code the following symbols using Huffman coding: symbols and their probability
values are given.

References:

1. “Digital image processing”, 3/e, by Gonzalez and Woods (Page no: 564-565)
2. www.wikipedia.org
Module #6
Title: Transform based compression techniques

Explanation:

 In transform coding, a reversible, linear transform (such as the Fourier transform or the DCT)
is used to map the image into a set of transform coefficients, which are then quantized and coded.

 Transform coding is a type of data compression for "natural" data like audio signals or
photographic images. The transformation is typically lossless (perfectly reversible) on its
own but is used to enable better (more targeted) quantization, which then results in a
lower quality copy of the original input (lossy compression).
 In transform coding, knowledge of the application is used to choose information to
discard, thereby lowering its bandwidth. The remaining information can then be
compressed via a variety of methods. When the output is decoded, the result may not be
identical to the original input, but is expected to be close enough for the purpose of the
application.
 On the encoder side, 4 relatively straightforward operations are performed: subimage
decomposition, transformation, quantization and coding.
 An N x N image is first subdivided into subimages of size n x n, which are then
transformed to generate (N/n)^2 subimage transform arrays, each of size n x n. The goal of
the transformation process is to decorrelate the pixels of each subimage, or to pack as
much information as possible into the smallest number of transform coefficients.
 The quantization stage then selectively eliminates, or more coarsely quantizes, the
coefficients that carry the least information. These coefficients have the smallest impact
on reconstructed subimage quality. The encoding process ends with a coding step, normally
a variable-length code such as Huffman coding.
 In the decoding stage, the reverse of all operations performed in the encoding stage is
carried out to reconstruct (an approximation of) the original image.
 Examples are image compression standards like JPEG, JPEG2000, etc.
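The "packing" effect of the transform step can be illustrated with a naive 1-D DCT-II on a hypothetical smooth signal: almost all of the energy lands in the first few coefficients, so the remainder can be quantized coarsely or discarded.

```python
import math

def dct(x):
    # Orthonormal 1-D DCT-II (naive O(N^2) version, for illustration only).
    N = len(x)
    out = []
    for k in range(N):
        s = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(s * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N)
                           for n in range(N)))
    return out

x = [float(i) for i in range(8)]         # a smooth (ramp) signal
X = dct(x)
energy = sum(v * v for v in X)           # equals input energy (Parseval)
share = (X[0] ** 2 + X[1] ** 2) / energy
print(round(share, 3))                   # first two coefficients dominate
```

For this ramp, the first two coefficients carry over 99% of the signal energy, which is why discarding high-order coefficients degrades the reconstruction so little.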
Module #6
Title: Image compression: MPEG, JPEG, JPEG2000

Explanation:

MPEG standard with the help of its block diagram.


(Elaborate the explanation based on each block)

 The key idea is to combine transform coding (in the form of the Discrete
Cosine Transform (DCT) of 8 × 8 pixel blocks) with predictive coding (in
the form of differential Pulse Code Modulation (PCM)) in order to reduce
storage and computation of the compressed image, and at the same time to
give a high degree of compression and adaptability.
 Since motion compensation is difficult to perform in the transform domain,
the first step in the interframe coder is to create a motion compensated
prediction error in the pixel domain.
 For each block of the current frame, a prediction block in the reference frame is
found using the motion vector computed during motion estimation, and differenced
with the current block to generate the prediction error signal. This computation
requires only a single frame store in the encoder and decoder.
 The resulting error signal is transformed using a 2-D DCT, quantized by an
adaptive quantizer, entropy encoded using a Variable-Length Coder (VLC)
and buffered for transmission over a fixed rate channel.

JPEG2000 standard with the help of its block diagram.


(Elaborate the explanation based on each block)

General block diagram is given by

 Pre-processing:
o Image tiling (optional) - for each image component
o DC level shifting - the same quantity is subtracted from the samples of each tile
o Color transformation (optional) - from RGB to Y Cb Cr
 The Discrete Wavelet Transform (DWT) is used to decompose each tile component
into different sub-bands.
 The transform is in the form of a dyadic decomposition and uses biorthogonal
wavelets.
 1-D sets of samples are decomposed into low-pass and high-pass samples.
 Low-pass samples represent a down-sampled, low-resolution version of the
original set.
 High-pass samples represent a down-sampled residual version of the original
set (details).

 After transformation, all coefficients are quantized using scalar quantization.

 Quantization reduces the precision of the coefficients.
 The coefficients in a code block are separated into bit-planes. The individual
bit-planes are coded in 1-3 coding passes.
 Each of these coding passes collects contextual information about the bit-plane
data. The contextual information along with the bit-planes is used by the
arithmetic encoder to generate the compressed bit-stream.
 The bit-planes of the coefficients in a code block (i.e., the bits of equal
significance across the coefficients in a code block) are entropy coded.
 The encoding can be done in such a way that certain regions of interest are
coded at a higher quality than the background.
 Markers are added to the bit-stream to allow for error resilience.

JPEG standard with the help of its block diagram.


(Elaborate the explanation based on each block)
 It is a compression standard based on transform coding techniques.
 Its block diagram is given below.
 At first the whole image is divided into 8x8 blocks.
 Each block is DCT coded.
 The DC coefficient of each block is separated from the remaining AC coefficients and is
DPCM coded.
 The AC coefficients are scanned in a zig-zag manner as shown and are entropy coded
(Huffman coding).
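The zig-zag scan can be generated programmatically; a sketch that walks the anti-diagonals of an 8x8 block, reversing direction on every other diagonal:

```python
def zigzag_order(n=8):
    # Visit (row, col) positions of an n x n block along anti-diagonals,
    # reversing direction on alternate diagonals (JPEG-style scan).
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diag.reverse()
        order.extend(diag)
    return order

print(zigzag_order()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

This ordering groups the low-frequency coefficients first, so the many near-zero high-frequency coefficients end up in long runs that run-length and Huffman coding compress well.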

Module #6
Title: Image compression: Dictionary based compression

Explanation:
LZW coding:
 A general purpose compression technique, proposed by Welch as an improvement of the Lempel-Ziv algorithm (hence LZW, Lempel-Ziv-Welch).
 LZW uses fixed-length codewords to represent variable-length strings of
symbols/characters that commonly occur together, e.g., words in English text.
 LZW encoder and decoder build up the same dictionary dynamically while receiving
the data.
 LZW places longer and longer repeated entries into a dictionary, and then emits the
code for an element, rather than the string itself, if the element has already been placed
in the dictionary.
 It is conceptually very simple. At the onset of the coding process, a codebook or
dictionary containing the source symbols to be coded is constructed.
 With 8-bit image data, an LZW coding method could employ 10-bit code words. The
corresponding string table would then have 2^10 = 1024 entries.
 This table consists of the original 256 entries, corresponding to the original 8-bit data,
and allows 768 other entries for string codes.
 The string codes are assigned during the compression process, but the actual string
table is not stored with the compressed data.
 During decompression the information in the string table is extracted from the
compressed data itself
 For the GIF (and TIFF) image file formats the LZW algorithm is specified, but there
has been some controversy over this, since the algorithm was patented by Unisys
Corporation (the patents have since expired).
 Since these image formats are widely used, other methods similar in nature to the
LZW algorithm have been developed for use with these, or similar, image file
formats.
 LZW (Lempel-Ziv-Welch) coding, assigns fixed-length code words to variable length
sequences of source symbols, but requires no a priori knowledge of the probability of
the source symbols.
 LZW is used in:
o Tagged Image file format (TIFF)
o Graphic interchange format (GIF)
o Portable document format (PDF)
 LZW was formulated in 1984
 A codebook or “dictionary” containing the source symbols is constructed.
 For 8-bit monochrome images, the first 256 words of the dictionary are assigned to the
gray levels 0-255
 Remaining part of the dictionary is filled with sequences of the gray levels
 Special features are
o The dictionary is created while the data are being encoded, so encoding can be
done "on the fly".
o The dictionary is not required to be transmitted; it is rebuilt during
decoding.
o If the dictionary "overflows", we have to reinitialize the dictionary and add
a bit to each one of the code words.
o Choosing a large dictionary size avoids overflow, but spoils compression.

 LZW coding example


 Let the code stream received be:
39 39 126 126 256 258 260 259 257 126
 In LZW, the dictionary which was used for encoding need not be sent with the image.
A separate dictionary is built by the decoder, on the fly, as it reads the received code
words.
Recognized   Encoded value   Pixels       Dic. address   Dic. entry
--           39              39           --             --
39           39              39           256            39-39
39           126             126          257            39-126
126          126             126          258            126-126
126          256             39-39        259            126-39
256          258             126-126      260            39-39-126
258          260             39-39-126    261            126-126-39
260          259             126-39       262            39-39-126-126
259          257             39-126       263            126-39-39
257          126             126          264            39-126-126
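The decoding walk-through above can be sketched directly in code; the special case handles a received code word that is not yet in the dictionary (it can only be the previous entry extended by its own first pixel):

```python
def lzw_decode(codes):
    # The dictionary starts with the 256 single-pixel entries for 8-bit data.
    dictionary = {i: [i] for i in range(256)}
    next_code = 256
    prev = dictionary[codes[0]]
    out = list(prev)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # Code not yet defined: it must be prev + prev's first pixel.
            entry = prev + [prev[0]]
        out.extend(entry)
        dictionary[next_code] = prev + [entry[0]]  # rebuilt on the fly
        next_code += 1
        prev = entry
    return out, dictionary

pixels, d = lzw_decode([39, 39, 126, 126, 256, 258, 260, 259, 257, 126])
print(pixels[:4], d[260])   # [39, 39, 126, 126] [39, 39, 126]
```

Running this reproduces the table above: the decoder recovers the 16 original pixels and builds dictionary entries 256-264 identical to those the encoder created.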
Module #6
Title: Image compression: Vector quantization

Explanation:
Vector quantization
 Vector quantization (VQ) is a classical quantization technique from signal processing that
allows the modeling of probability density functions by the distribution of prototype
vectors. It was originally used for data compression. It works by dividing a large set of
points (vectors) into groups having approximately the same number of points closest to
them. Each group is represented by its centroid point, as in k-means and some
other clustering algorithms, and is referred to by an index.
 The density matching property of vector quantization is powerful, especially for
identifying the density of large and high-dimensional data. Since data points are
represented by the index of their closest centroid, commonly occurring data have low
error, and rare data high error. This is why VQ is suitable for lossy data compression. It
can also be used for lossy data correction and density estimation.
 Vector quantization is based on the competitive learning paradigm, so it is closely related
to the self-organizing map model and to sparse coding models used in deep learning
algorithms such as autoencoder.
 Group into vectors (non-overlapping) and “quantize” each vector.
o For time signals… we usually form vectors from temporally-sequential samples.
o For images… we usually form vectors from spatially-sequential samples.
 The source symbols are grouped into vectors, and each vector is entered
into the codebook and assigned an index in the index table (look-up
table). Thus the encoder is created, and the decoder holds the same table as
the encoder.
 When a query comes, the encoder finds the closest match in the codebook and
transmits its index to the decoder side.
 On receiving the index, the decoder finds the corresponding entry in the
look-up table and outputs the matching code vector from the codebook. The
message is then unblocked to recover the actual message.
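A minimal sketch of the encoder/decoder pair with a hypothetical two-entry codebook: the encoder transmits only the index of the nearest code vector, and the decoder looks it up in its identical copy of the codebook:

```python
def vq_encode(vectors, codebook):
    # For each input vector, transmit the index of the closest code vector.
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return [min(range(len(codebook)), key=lambda k: dist2(v, codebook[k]))
            for v in vectors]

def vq_decode(indices, codebook):
    # The decoder holds the same codebook and simply looks the indices up.
    return [codebook[k] for k in indices]

codebook = [(0, 0), (10, 10)]                 # hypothetical 2-entry codebook
indices = vq_encode([(1, 1), (9, 8), (2, 0)], codebook)
print(indices, vq_decode(indices, codebook))
```

The reconstruction is lossy: each decoded vector is the centroid of its group, not the original input, which is exactly the trade-off VQ makes for compression.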
Module #6
Title: Image compression: Wavelet- based image compression

Explanation:
Wavelet- based image compression
 It utilizes an unconditional basis function that decreases the size of the expansion
coefficients to a negligible value as the index values increase.
 The wavelet expansion allows for a more precise and localized isolation and
description of the signal characteristics. This ensures that DWT is very much effective
in image compression applications.
 Secondly, the inherent flexibility in choosing a wavelet gives scope to design wavelets
customized to fit individual requirements.
 The basis functions employed by the wavelet transform are called wavelets. The
transform is built from a mother wavelet Ψ(t), which is scaled and shifted
according to the variations in the signal to be analysed. By scaling and
translating the mother wavelet, we obtain the rest of the functions for the
transformation (the child wavelets Ψa,b(t)):

Ψa,b(t) = (1/√a) Ψ((t − b)/a)
 Wavelet analysis can be used to represent the image in terms of two sub-signals:
the approximation sub-signal, which captures the general trends in the image
samples, and the detail sub-signal, which contains the high-frequency vertical, horizontal
and diagonal information.
 There is no need to block the input image, and its basis functions have variable length,
which avoids blocking artifacts.
 More robust under transmission and decoding errors.
 Better matched to the HVS characteristics
 Good frequency resolution at lower frequencies, good time resolution at higher
frequencies – good for natural images.
 LL – approximation coefficients.
 HL – Horizontal edges
 LH – Vertical edges
 HH – Diagonal edges
 The second figure shows the 3 level decomposition of an image.
 Advantages of DWT over DCT
o Blocking artifacts are avoided, since the input need not be partitioned into
non-overlapping blocks while coding.
o It allows good localization both in the time and spatial-frequency domains.
o It introduces inherent scaling and performs the transformation of the whole
image.
o Since it can identify more accurately the data relevant to human perception, it
achieves a higher compression ratio.
o Higher flexibility: the inherent flexibility in choosing a wavelet gives scope to
design wavelets customized to fit individual requirements.
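One level of the decomposition can be sketched with the Haar wavelet, the simplest choice: the low-pass (approximation) samples are scaled pairwise sums and the high-pass (detail) samples are scaled pairwise differences. The input signal here is hypothetical:

```python
import math

def haar_level(x):
    # One level of the 1-D Haar DWT with orthonormal scaling.
    s = 1 / math.sqrt(2)
    approx = [s * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    detail = [s * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return approx, detail

# Smooth regions produce near-zero detail coefficients -> easy to compress.
approx, detail = haar_level([4.0, 4.0, 6.0, 6.0, 5.0, 7.0, 3.0, 3.0])
print(detail)
```

Repeating the split on the approximation signal (and, for images, applying it along rows and then columns) yields the multi-level LL/HL/LH/HH decomposition described above.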
