Data Compression
1. Image data properties
a. Coding redundancy (based on intensity values)
- Code word: a sequence of symbols used to represent a piece of information or an event.
- Code length: the number of symbols in the code word.
- If some set of code words can represent the original data with fewer bits, the original data is said to have coding redundancy.
- Coding redundancy is present when the codes do not take full advantage of the probabilities of the events.
- Variable-length coding - assigning fewer bits to the more probable gray levels than to the less probable ones - achieves data compression.

b. Spatial and temporal / interpixel redundancy
- Removes values that are unnecessarily replicated in the representations of correlated pixels.
- Interpixel redundancy results from the correlation between neighboring pixels.
- Because the value of any given pixel can be reasonably predicted from the values of its neighbors, the information carried by individual pixels is relatively small.
- To reduce interpixel redundancy, the original image is transformed into a more efficient, nonvisual format; this transformation is called mapping.
- Variable-length coding can reduce the coding redundancy that would result from a straight or natural binary coding of the pixels, but it does not alter the level of correlation between the pixels within the image.
- The 2-D pixel array used for human viewing and interpretation must be transformed into a more efficient format.
- Mapping can represent an image by the differences between adjacent pixels.
- Reversible mapping: the original image elements can be reconstructed from the transformed data set, as in the sketch below.
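A minimal sketch of such a reversible difference mapping, assuming 8-bit scan lines (the function names are ours, for illustration only):

import numpy as np

def difference_map(row):
    """Reversible mapping: keep the first pixel, then the differences
    between adjacent pixels (small values on correlated data)."""
    row = np.asarray(row, dtype=np.int16)       # widen so differences can be negative
    return np.concatenate(([row[0]], np.diff(row)))

def inverse_difference_map(mapped):
    """Exact inverse: a cumulative sum reconstructs the original row."""
    return np.cumsum(mapped).astype(np.uint8)

row = np.array([100, 101, 101, 102, 104, 104], dtype=np.uint8)
mapped = difference_map(row)                    # [100, 1, 0, 1, 2, 0]
assert np.array_equal(inverse_difference_map(mapped), row)

The mapped values cluster near zero on correlated data, so a subsequent variable-length coder can represent them with fewer bits.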


- Map the pixels along each scan line f(x, 0), f(x, 1), ..., f(x, N-1) into a sequence of pairs (g1, w1), (g2, w2), ..., where gi is the gray level of the i-th run and wi is its length.
- A thresholded image can be represented more efficiently by the values and lengths of its constant gray-level runs than by a 2-D array of binary pixels.

c. Psychovisual redundancy / irrelevant information
- Certain information simply has less relative importance than other information in normal visual processing; this information is called psychovisually redundant.
- It can be eliminated without significantly impairing the quality of image perception.
- Human perception of the information in an image does not involve quantitative analysis of every pixel value.
- Unlike the other two redundancies, psychovisual redundancy is associated with real or quantifiable visual information. Its elimination is possible only because the information itself is not essential for normal visual processing.
- Since eliminating psychovisually redundant data results in a loss of quantitative information, it is commonly referred to as quantization.
- IGS (improved gray-scale) quantization reduces false contouring at the expense of some additional, but less objectionable, graininess: it breaks up edges by adding to each pixel a pseudo-random number, generated from the low-order bits of neighboring pixels, before quantizing. See the sketch below.
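A minimal sketch of IGS quantization of one 8-bit scan line down to 4 bits, following the usual formulation in which the low-order bits of the running sum serve as the pseudo-random number (the variable names are ours):

import numpy as np

def igs_quantize(row, keep_bits=4):
    """IGS-quantize one scan line of 8-bit pixels to keep_bits bits.
    The low-order bits of the previous sum are carried into the next
    pixel, which breaks up false contours along smooth edges."""
    drop = 8 - keep_bits
    low_mask = (1 << drop) - 1                  # e.g. 0x0F when keep_bits = 4
    codes, s = [], 0
    for p in np.asarray(row, dtype=np.uint16):
        # Near saturation, skip the carry so the sum cannot overflow 8 bits.
        s = int(p) if (p >> drop) == (0xFF >> drop) else int(p) + (s & low_mask)
        codes.append(s >> drop)                 # keep only the high-order bits
    return np.array(codes, dtype=np.uint8)

print(igs_quantize([108, 139, 135, 244, 172]))  # -> [ 6  9  8 15 11]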

2. Lossless image compression


In lossless data compression, the integrity of the data is preserved. The original data and the data after compression and decompression are exactly the same because, in these methods, the compression and decompression algorithms are exact inverses of each other: no part of the data is lost in the process. Redundant data is removed during compression and added back during decompression. Lossless compression methods are normally used when we cannot afford to lose any data. Lossless image compression is required, or highly desired, by many applications, such as medical imaging, remote sensing, and image archiving.


a. Run-length coding: Run-length encoding is probably the simplest method of compression. It can be used to compress data made of any combination of symbols. It does not need to know the frequency of occurrence of symbols, and it can be very efficient if the data is represented as 0s and 1s. The general idea behind this method is to replace consecutive repeating occurrences of a symbol by one occurrence of the symbol followed by the number of occurrences. Sampled images and audio and video data streams often contain sequences of identical bytes; by replacing these sequences with the byte pattern to be repeated and the number of its occurrences, the amount of data can be reduced substantially.
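A minimal encoder/decoder sketch of this idea (illustrative only; the names are ours):

from itertools import groupby

def rle_encode(data):
    """Replace each run of identical symbols with a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(data)]

def rle_decode(pairs):
    """Exact inverse: expand each (symbol, count) pair back into its run."""
    return [symbol for symbol, count in pairs for _ in range(count)]

scanline = [0, 0, 0, 0, 1, 1, 0, 0, 0, 1]
pairs = rle_encode(scanline)    # [(0, 4), (1, 2), (0, 3), (1, 1)]
assert rle_decode(pairs) == scanline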

b. Shannon-Fano coding


c. Huffman coding
Huffman coding assigns shorter codes to symbols that occur more frequently and longer codes to those that occur less frequently.
1. Rank all symbols in order of their probability of occurrence.
2. Locate the two symbols with the smallest probabilities.
3. Replace these two symbols with a new composite symbol whose probability is the sum of the individual probabilities.
4. Create a node whose children are the two symbols.
5. Repeat steps 1-4 until only one symbol remains. Eventually, we have a tree in which each node is the sum of the probabilities of all leaf nodes beneath it.
6. Traverse the tree from the root to each leaf, recording 0 for a left branch and 1 for a right branch.
For example, imagine we have a text file that uses only five characters (A, B, C, D, E). Before we can assign bit patterns to each character, we assign each character a weight based on its frequency of use. In this example, assume that the frequencies of the characters are as shown below:
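A sketch of this procedure in Python, assuming illustrative frequencies for A-E (the counts below are example values, not necessarily the original figures):

import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman tree bottom-up and return {symbol: bit string}.
    Repeatedly merges the two least-probable nodes (steps 2-5 above),
    then reads codes off the tree: left branch 0, right branch 1 (step 6)."""
    tiebreak = count()                       # keeps the heap from comparing tree nodes
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # the two smallest probabilities
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: descend both branches
            walk(node[0], prefix + "0")      # left branch records a 0
            walk(node[1], prefix + "1")      # right branch records a 1
        else:
            codes[node] = prefix or "0"      # leaf: store the accumulated bits
    walk(heap[0][2], "")
    return codes

# Assumed example weights for the five characters (our assumption).
print(huffman_codes({"A": 17, "B": 12, "C": 12, "D": 27, "E": 32}))

With these assumed weights, the frequent characters D and E receive 2-bit codes while the rare B and C receive 3-bit codes.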


A character's code is found by starting at the root and following the branches that lead to that character. The code itself is the bit value of each branch on the path, taken in sequence. An important property of Huffman coding is that no code forms a prefix of any other, so an encoded bit stream can be decoded unambiguously without separators between codes.

d. LZW coding
e. Linear/lossless prediction coding

3. Lossy image compression
a. Lossy predictive coding: quantization
