
Chapter 1

INTRODUCTION
Digital image processing is an area characterized by the need for extensive experimental work to establish the viability of proposed solutions to a given problem. An important characteristic underlying the design of image processing systems is the significant level of testing and experimentation that is normally required before arriving at an acceptable solution. This characteristic implies that the ability to formulate approaches and quickly prototype candidate solutions generally plays a major role in reducing the cost and time required to arrive at a viable system implementation.

1.1 What is DIP?


An image may be defined as a two-dimensional function f(x, y), where x and y are spatial coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point. When x, y, and the amplitude values of f are all finite discrete quantities, we call the image a digital image. The field of digital image processing (DIP) refers to processing digital images by means of a digital computer. A digital image is composed of a finite number of elements, each of which has a particular location and value; these elements are called pixels. Vision is the most advanced of our senses, so it is not surprising that images play the single most important role in human perception. However, unlike humans, who are limited to the visual band of the electromagnetic (EM) spectrum, imaging machines cover almost the entire EM spectrum, ranging from gamma rays to radio waves. They can also operate on images generated by sources that humans are not accustomed to associating with images. There is no general agreement among authors regarding where image processing stops and other related areas, such as image analysis and computer vision, start. Sometimes a distinction is made by defining image processing as a discipline in which both the input and output of a process are images. This is a limiting and somewhat artificial boundary. The area of image analysis (image understanding) lies between image processing and computer vision. There are no clear-cut boundaries in the continuum from image processing at one end to complete vision at the other. However, one useful paradigm is to consider three

types of computerized processes in this continuum: low-, mid-, and high-level processes. A low-level process involves primitive operations such as image preprocessing to reduce noise, contrast enhancement, and image sharpening; it is characterized by the fact that both its inputs and outputs are images. A mid-level process on images involves tasks such as segmentation (partitioning an image into regions or objects), description of those objects to reduce them to a form suitable for computer processing, and classification of individual objects; it is characterized by the fact that its inputs generally are images but its outputs are attributes extracted from those images. Finally, high-level processing involves making sense of an ensemble of recognized objects, as in image analysis, and, at the far end of the continuum, performing the cognitive functions normally associated with human vision. Digital image processing, as already defined, is used successfully in a broad range of areas of exceptional social and economic value. Images are an everyday aspect of computing now: web sites on the Internet are generally made up of many pictures, and a large proportion of transmission bandwidth and storage capacity is taken up by images. Reducing the storage requirements of an image while retaining its quality is therefore very important; otherwise systems would become completely clogged. Since 1990, the JPEG picture format has been adopted as the standard for photographic images on the Internet. This project looks at another method for compressing images, using the Singular Value Decomposition (SVD).

1.2 What is an Image?


An image is represented as a two-dimensional function f(x, y), where x and y are spatial coordinates and the amplitude of f at any pair of coordinates (x, y) is called the intensity of the image at that point.

1.3 Coordinate Convention


The result of sampling and quantization is a matrix of real numbers. We use two principal ways to represent digital images. Assume that an image f(x, y) is sampled so that the resulting image has M rows and N columns. We say that the image is of size M x N. The values of the coordinates (x, y) are discrete quantities. For notational clarity and convenience, we use integer values for these discrete coordinates. In many image

processing books, the image origin is defined to be at (x, y) = (0, 0). The next coordinate values along the first row of the image are (x, y) = (0, 1). It is important to keep in mind that the notation (0, 1) is used to signify the second sample along the first row; it does not mean that these are the actual values of the physical coordinates when the image was sampled. The following figure shows the coordinate convention. Note that x ranges from 0 to M-1 and y from 0 to N-1 in integer increments. The coordinate convention used in the toolbox to denote arrays differs from the preceding paragraph in two minor ways. First, instead of using (x, y), the toolbox uses the notation (r, c) to indicate rows and columns. Note, however, that the order of coordinates is the same as in the previous paragraph, in the sense that the first element of a coordinate tuple, (a, b), refers to a row and the second to a column. The other difference is that the origin of the coordinate system is at (r, c) = (1, 1); thus, r ranges from 1 to M and c from 1 to N in integer increments. The IPT documentation refers to these as pixel coordinates. Less frequently, the toolbox also employs another convention, called spatial coordinates, which uses x to refer to columns and y to refer to rows. This is the opposite of our use of the variables x and y.

1.4 Image as Matrices


The preceding discussion leads to the following representation for a digitized image function:

          [ f(0,0)     f(0,1)     ...  f(0,N-1)
f(x, y) =   f(1,0)     f(1,1)     ...  f(1,N-1)
            ...        ...             ...
            f(M-1,0)   f(M-1,1)   ...  f(M-1,N-1) ]

The right side of this equation is a digital image by definition. Each element of this array is called an image element, picture element, pixel, or pel. The terms image and pixel are used throughout the rest of our discussion to denote a digital image and its elements. A digital image can be represented naturally as a MATLAB matrix:

          [ f(1,1)   f(1,2)   ...  f(1,N)
f(x, y) =   f(2,1)   f(2,2)   ...  f(2,N)
            ...      ...           ...
            f(M,1)   f(M,2)   ...  f(M,N) ]

where f(1,1) = f(0,0) (note the use of a monospace font to denote MATLAB quantities). Clearly, the two representations are identical except for the shift in origin. f(p, q) denotes the element located in row p and column q. Matrices in MATLAB are stored in variables with names such as A, a, RGB, real_array, and so on. Variables must begin with a letter and contain only letters, numerals, and underscores. As noted in the previous paragraph, all MATLAB quantities are written using monospace characters. We use conventional Roman italic notation, such as f(x, y), for mathematical expressions.
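As a quick illustration of this convention, here is a minimal sketch using the cameraman.tif image that ships with the toolbox:

I = imread('cameraman.tif');   % a digital image stored as a MATLAB matrix
[M, N] = size(I);              % the image is M-by-N
topLeft = I(1, 1);             % toolbox origin: (r, c) = (1, 1), i.e. f(0,0)
pixel   = I(3, 2);             % row 3, column 2: f(2,1) in the 0-based notation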

1.5 Image Types


The toolbox supports four types of images:

Intensity images
Binary images
Indexed images
RGB images

Most monochrome image processing operations are carried out using binary or intensity images, so our initial focus is on these two image types. Indexed and RGB color images are described in Sections 1.5.3 and 1.5.4.

1.5.1 Intensity Images

An intensity image is a data matrix whose values have been scaled to represent intensities. When the elements of an intensity image are of class uint8 or class uint16, they have integer values in the range [0, 255] and [0, 65535], respectively.

1.5.2 Binary Images

Binary images have a very specific meaning in MATLAB. A binary image is a logical array of 0s and 1s. Thus, an array of 0s and 1s whose values are of a numeric data class, say uint8, is not considered a binary image in MATLAB. A numeric array is converted to binary using the function logical. Thus, if A is a numeric array consisting of 0s and 1s, we create a logical array B using the statement

B = logical(A)

If A contains elements other than 0s and 1s, the logical function converts all nonzero quantities to logical 1s and all entries with value 0 to logical 0s. Using relational and logical operators also creates logical arrays. To test whether an array is logical we use the function islogical: islogical(C) returns 1 if C is a logical array and 0 otherwise. Logical arrays can be converted to numeric arrays using the data class conversion functions.

1.5.3 Indexed Images

An indexed image has two components: a data matrix of integers, X, and a colormap matrix, map. Matrix map is an m x 3 array of class double containing floating-point values in the range [0, 1]. The length m of the map is equal to the number of colors it defines. Each row of map specifies the red, green, and blue components of a single color. An indexed image uses direct mapping of pixel intensity values to colormap values. The color of each pixel is determined by using the corresponding value of the integer matrix X as a pointer into map. If X is of class double, then all of its components with values less than or equal to 1 point to the first row in map, all components with value 2 point to the second row, and so on. If X is

of class uint8 or uint16, then all components with value 0 point to the first row in map, all components with value 1 point to the second, and so on.

1.5.4 RGB Images

An RGB color image is an M x N x 3 array of color pixels, where each color pixel is a triplet corresponding to the red, green, and blue components of an RGB image at a specific spatial location. An RGB image may be viewed as a stack of three gray-scale images that, when fed into the red, green, and blue inputs of a color monitor, produce a color image on the screen. By convention, the three images forming an RGB color image are referred to as the red, green, and blue component images. The data class of the component images determines their range of values. If an RGB image is of class double, the range of values is [0, 1]; similarly, the range of values is [0, 255] or [0, 65535] for RGB images of class uint8 or uint16, respectively. The number of bits used to represent the pixel values of the component images determines the bit depth of an RGB image. For example, if each component image is an 8-bit image, the corresponding RGB image is said to be 24 bits deep. Generally, the number of bits in all component images is the same. In this case the number of possible colors in an RGB image is (2^b)^3, where b is the number of bits in each component image. For the 8-bit case the number is 16,777,216 colors.
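The following minimal sketch illustrates the binary conversion and the bit depth arithmetic described above (the variable names are our own):

A = uint8([0 1 0; 1 1 0]);   % numeric 0s and 1s of class uint8: not binary
B = logical(A);              % a true binary image in the MATLAB sense
islogical(A)                 % returns 0
islogical(B)                 % returns 1
numColors = (2^8)^3          % 24-bit RGB: 16,777,216 possible colors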

1.6 Need for Image Compression


One of the important aspects of image storage is its efficient compression. To make this fact clear, consider an example. An image of 1024 pixels x 1024 pixels x 24 bits without compression would require 3 MB of storage and about 7 minutes for transmission over a high-speed 64 kbit/s ISDN line. If the image is compressed at a 10:1 compression ratio, the storage requirement is reduced to 300 KB and the transmission time drops to about 40 seconds. Seven 1 MB images can be compressed and transferred to a floppy disk in less time than it takes to send one of the original files, uncompressed, over an AppleTalk network. In a distributed environment, large image files remain a major bottleneck within systems. Compression is an important component of the solutions available for creating file

sizes of manageable and transmittable dimensions. Increasing the bandwidth is another method, but the cost sometimes makes this a less attractive solution.
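The arithmetic above can be reproduced with a few lines of MATLAB (a sketch of the numbers only; the 64 kbit/s line speed is taken from the example):

bits = 1024 * 1024 * 24;       % uncompressed image size in bits
storageMB = bits / 8 / 2^20    % = 3 MB of storage
tFull = bits / 64e3 / 60       % about 6.5 minutes over a 64 kbit/s line
tComp = bits / 10 / 64e3       % about 39 seconds at a 10:1 compression ratio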

1.7 Data Compression


Data compression is similar in concept to lossless image compression; however, it does not try to interpret the data as an image. Data compression searches for patterns in a data stream. Common data compression methods are Deflate and LZW. The lossless compression file format GIF simply turns the image into a long string of data (all the horizontal lines appended) and applies LZW data compression. The lossy file format JPEG uses Huffman encoding to compress the data stream that is output from the Discrete Cosine Transform process. This project will not deal with data compression as such; however, as the main goal of this project is to compare SVD against JPEG, a general-purpose data compression algorithm is applied to the output data stream to give meaningful comparisons.

Chapter 2

VIDEO COMPRESSION
Digital video compression technology has been booming for many years. Today, when people chat with their friends through a visual telephone, or enjoy a movie broadcast over the Internet or digital music such as MP3, the convenience that the digital video industry brings to us cannot be overlooked. All of this is attributable to advances in mass storage media and streaming video/audio services, which have influenced our daily life deeply. In this project the system is implemented in Simulink. Simulink is a platform for multidomain simulation and Model-Based Design for dynamic systems; it provides an interactive graphical environment and a customizable set of block libraries, and can be extended for specialized applications. The system implements video compression using motion compensation and Discrete Cosine Transform (DCT) techniques with the Video and Image Processing Blockset. The demo calculates motion vectors between successive frames and uses them to reduce redundant information. Then it divides each frame into sub-matrices and applies the discrete cosine transform to each sub-matrix. Finally, the demo applies a quantization technique to achieve further compression. The Decoder subsystem performs the inverse process to recover the original video.

2.1 Why is Digital Video Compressed?


Digital video is compressed because it takes up a staggering amount of room in its original form. By compressing the video, you make it easier to store. Digital video can be compressed without impacting the perceived quality of the final product because compression affects only the parts of the video that humans can't really detect; compressing video is essentially the process of throwing away data for things we can't perceive. Standard digital video cameras compress video at a ratio of 5 to 1, and there are formats that allow you to compress video by as much as 100 to 1. But too much compression can be a bad thing. The more you compress, the more data you throw away.

Throw away too much, and the changes become noticeable; with heavy compression you can get video that's nearly unrecognizable. When you compress video, always try several compression settings. The goal is to compress as much as possible until the data loss becomes noticeable, and then notch the compression back a little. That will give you the right balance between file size and quality. And remember that every video is different.

2.2 Categories of Data Compression Algorithms


Two categories of data compression algorithm can be distinguished: lossless and lossy. Lossy techniques cause image quality degradation in each compression/decompression step. Careful consideration of human visual perception ensures that the degradation is often unrecognizable, though this depends on the selected compression ratio. In general, lossy techniques provide far greater compression ratios than lossless techniques. Here we discuss the roles of the following data compression techniques.

2.2.1 Lossless Coding Techniques

Lossless coding guarantees that the decompressed image is absolutely identical to the image before compression. This is an important requirement for some application domains, e.g. medical imaging, where not only high quality is in demand but unaltered archiving is a legal requirement. Lossless techniques can also be used for the compression of other data types where loss of information is not acceptable, e.g. text documents and program executables. Some compression methods can be made more effective by adding a 1D or 2D delta coding to the process of compression. These deltas make more effective use of run length encoding, have (statistically) higher maxima in code tables (leading to better results in Huffman and general entropy coding), and build greater equal-value areas usable for area coding. Some of these methods can easily be modified to be lossy: a lossy element fits perfectly into a 1D/2D run length search, and logarithmic quantization may be inserted to provide better or more effective results.

Run Length Encoding: Run length encoding is a very simple method for the compression of sequential data. It takes advantage of the fact that, in many data streams, consecutive single tokens are often identical. Run length encoding checks the stream for this fact and inserts a special token each time a chain of more than two equal input tokens is found. This special token advises the decoder to insert the following token n times into its output stream. The effectiveness of run length encoding is a function of the number of equal tokens in a row in relation to the total number of input tokens. This relation is very high in undithered two-tone images of the type used for facsimile. Obviously, effectiveness degrades when the input does not contain many equal tokens: with a rising density of information, the likelihood of two consecutive tokens being the same sinks significantly, as there is always some noise distortion in the input. Run length coding is easily implemented, either in software or in hardware; it is fast and very well verifiable, but its compression ability is very limited.
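The following is a minimal MATLAB sketch of a run-length encoder (our own illustration, not the coder used by any particular standard; save it as rleEncode.m):

function pairs = rleEncode(tokens)
% Encode a row vector of tokens as (value; count) pairs.
    runEnds = [find(diff(tokens) ~= 0), numel(tokens)]; % last index of each run
    runLens = diff([0, runEnds]);                       % length of each run
    pairs   = [tokens(runEnds); runLens];               % row 1: values, row 2: counts
end

For example, rleEncode([5 5 5 2 2 9]) returns [5 2 9; 3 2 1].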

Huffman Encoding: This algorithm, developed by D. A. Huffman, is based on the fact that in an input stream certain tokens occur more often than others. Based on this knowledge, the algorithm builds up a weighted binary tree of the tokens according to their rate of occurrence. Each element of this tree is assigned a new code word, where the length of the code word is determined by its position in the tree. Therefore, the most frequent token, placed closest to the root of the tree, is assigned the shortest code, and each less common element is assigned a longer code word. The least frequent element may be assigned a code word up to twice as long as the input token. The compression ratio achieved by Huffman encoding on uncorrelated data is something like 1:2. On slightly correlated data, as in images, the compression rate may become much higher, the absolute maximum being defined by the size of a single input token and the size of the shortest possible output token (max. compression = token size [bits] / 2 [bits]). While standard palettized images with a limit of 256 colors may be compressed by 1:4 if they use only one color, more typical images give results in the range of 1:1.2 to 1:2.5.

Entropy Coding: The typical implementation of an entropy coder follows J. Ziv and A. Lempel's approach. Nowadays there is a wide range of so-called modified Lempel-Ziv codings. These algorithms all work in a common way: the coder and the decoder both build up an equivalent dictionary of metasymbols, each of which represents a whole sequence of input tokens. If a sequence is repeated after a symbol has been found for it, then only the symbol becomes part of the coded data, and the sequence of tokens referenced by the symbol becomes part of the decoded data later. As the dictionary is built up from the data itself, it is not necessary to include it in the coded data, as it is with the tables in a Huffman coder.


Entropy coders are a little tricky to implement, as there are usually several tables, all growing while the algorithm runs.

Area Coding: Area coding is an enhanced form of run length coding, reflecting the two-dimensional character of images. This is a significant advance over the other lossless methods. For coding an image it does not make much sense to interpret it as a sequential stream, as it is in fact an array of sequences building up a two-dimensional object. Since the two dimensions are independent and of the same importance, it is obvious that a coding scheme aware of this has advantages. The algorithms for area coding try to find rectangular regions with the same characteristics. These regions are coded in a descriptive form as an element with two points and a certain structure. The whole input image has to be described in this form to allow lossless decoding afterwards. Practical implementations use recursive algorithms that reduce the whole area to equal-sized sub-rectangles until a rectangle fulfills the criterion of having the same characteristic for every pixel. This type of coding can be highly effective, but it has the problem of being a nonlinear method, which cannot easily be implemented in hardware; therefore, the performance in terms of compression time is not competitive.

2.2.2 Lossy Coding Techniques

In most applications we have no need for the exact restoration of the stored image. This fact can help to make the storage more effective, and in this way we arrive at lossy compression methods. Lossy image coding techniques normally have three components: image modeling, which defines such things as the transformation to be applied to the image; parameter quantization, whereby the data generated by the transformation is quantized to reduce the amount of information; and encoding, where a code is generated by associating appropriate code words with the raw data produced by the quantization. Each of these operations is in some part responsible for the compression. Image modeling is aimed at the exploitation of statistical characteristics of the image (i.e. high correlation, redundancy). Typical examples are transform coding methods, in which the data is represented in a different domain (for example, frequency in the case of the Fourier

Transform [FT], the Discrete Cosine Transform [DCT], the Karhunen-Loève Transform [KLT], and so on), where a reduced number of coefficients contains most of the original information. In many cases this first phase does not result in any loss of information. The aim of quantization is to reduce the amount of data used to represent the information within the new domain. Quantization is in most cases not a reversible operation; therefore, it belongs to the so-called 'lossy' methods. Encoding is usually error free. It optimizes the representation of the information (sometimes helping to further reduce the bit rate), and may introduce some error detection codes. In the following sections, a review of the most important coding schemes for lossy compression is provided. Some methods are described in their canonical form (transform coding, region-based approximations, fractal coding, wavelets, hybrid methods) and some variations and improvements presented in the scientific literature are reported and discussed.

Transform Coding (DCT/Wavelets/Gabor): A general transform coding scheme involves subdividing an NxN image into smaller nxn blocks and performing a unitary transform on each sub-image. A unitary transform is a reversible linear transform whose kernel describes a set of complete, orthonormal discrete basis functions. The goal of the transform is to decorrelate the original signal, and this decorrelation generally results in the signal energy being redistributed among only a small set of transform coefficients. In this way, many coefficients may be discarded after quantization and prior to encoding. Also, visually lossless compression can often be achieved by incorporating the HVS (human visual system) contrast sensitivity function in the quantization of the coefficients. Transform coding can be generalized into four stages:

Image subdivision
Image transformation
Coefficient quantization
Huffman encoding

For a transform coding scheme, logical modeling is done in two steps: a segmentation step, in which the image is subdivided into bidimensional vectors (possibly of different sizes), and a transformation step, in which the chosen transform (e.g. KLT, DCT, Hadamard) is applied. Quantization can be performed in several ways. Most classical approaches use

'zonal coding', consisting in the scalar quantization of the coefficients belonging to a predefined area (with a fixed bit allocation), and 'threshold coding', consisting in the choice of the coefficients of each block whose absolute value exceeds a predefined threshold. Another possibility, which leads to higher compression factors, is to apply a vector quantization scheme to the transformed coefficients. The same type of encoding is used for each coding method; in most cases a classical Huffman code can be used successfully. The JPEG and MPEG standards are examples of standards based on transform coding.

Vector Quantization: A vector quantizer can be defined mathematically as a transform operator T from a K-dimensional Euclidean space R^K to a finite subset X in R^K made up of N vectors. This subset X becomes the vector codebook, or more generally, the codebook. Clearly, the choice of the set of vectors is of major importance. The level of distortion due to the transformation T is generally computed as the mean square error (MSE) between the 'real' vector x in R^K and the corresponding vector x' = T(x) in X. This error should be such as to minimize the Euclidean distance d. An optimum scalar quantizer was proposed by Lloyd and Max. Later, Linde, Buzo, and Gray resumed and generalized this method, extending it to the case of a vector quantizer. The LBG algorithm for the design of a vector codebook always reaches a local minimum for the distortion function, but often this solution is not the optimal one. A careful analysis of the LBG algorithm's behavior allows one to detect two critical points: the choice of the starting codebook and the uniformity of the Voronoi regions' dimensions. For this reason some algorithms have been designed that give better performance. With respect to the initialization of the LBG algorithm, for instance, one can observe that a random choice of the starting codebook requires a large number of iterations before reaching an acceptable amount of distortion. Moreover, if the starting point leads to a local minimum solution, the relative stopping criterion prevents further optimization steps.

Segmentation and Approximation Methods: With segmentation and approximation coding methods, the image is modeled as a mosaic of regions, each one characterized by a sufficient degree of uniformity of its pixels with respect to a certain feature (e.g. grey level, texture); each region then has some parameters related to the characterizing feature associated with it. The operations of finding a suitable segmentation and an optimum set of approximating parameters are highly correlated, since the segmentation algorithm must

take into account the error produced by the region reconstruction (in order to limit this value within determined bounds). These two operations constitute the logical modeling for this class of coding schemes; quantization and encoding are strongly dependent on the statistical characteristics of the parameters of the approximation. In polynomial approximation, regions are reconstructed by means of polynomial functions in (x, y); the task of the encoder is to find the optimum coefficients. In texture approximation, regions are filled by synthesizing a parameterized texture based on some model (e.g. fractals, statistical methods, Markov Random Fields [MRF]). It must be pointed out that, while in polynomial approximation the problem of finding optimum coefficients is quite simple (it is possible to use least squares approximation or similar exact formulations), for texture-based techniques this problem can be very complex.

Fractal Compression: Fractal compression is a form of vector quantization and is lossy. Compression is performed by locating self-similar sections of an image, then using a fractal algorithm to generate the sections. Like the DCT, the discrete wavelet transform mathematically transforms an image into frequency components. The process is performed on the entire image, which differs from the other methods (such as the DCT) that work on smaller pieces of the data. The result is a hierarchical representation of an image, where each layer represents a frequency band.

2.3 Compression Standards


MPEG stands for the Moving Picture Experts Group. MPEG is an ISO/IEC working group, established in 1988 to develop standards for digital audio and video formats. There are five MPEG standards being used or in development. Each compression standard was designed with a specific application and bit rate in mind, although MPEG compression scales well with increased bit rates. They include:

MPEG-1: Designed for up to 1.5 Mbit/s. The standard for the compression of moving pictures and audio. It was based on CD-ROM video applications, and is a popular standard for video on the Internet, transmitted as .mpg files. In addition, Layer 3 of MPEG-1 is the most popular standard for digital compression of audio, known as MP3. MPEG-1 is the compression standard for VideoCD, the most popular video distribution format throughout much of Asia.


MPEG-2: Designed for between 1.5 and 15 Mbit/s. The standard on which digital television set-top boxes and DVD compression are based. It is based on MPEG-1, but designed for the compression and transmission of digital broadcast television. The most significant enhancement over MPEG-1 is its ability to efficiently compress interlaced video. MPEG-2 scales well to HDTV resolutions and bit rates, obviating the need for an MPEG-3.

MPEG-4: A standard for multimedia and Web compression. MPEG-4 is based on object-based compression, similar in nature to the Virtual Reality Modeling Language. Individual objects within a scene are tracked separately and compressed together to create an MPEG-4 file. This results in very efficient compression that is very scalable, from low bit rates to very high ones. It also allows developers to control objects independently in a scene, and therefore to introduce interactivity.

JPEG: JPEG stands for Joint Photographic Experts Group. It is also an ISO/IEC working group, but one that builds standards for continuous-tone image coding. JPEG is a lossy compression technique used for full-color or gray-scale images, exploiting the fact that the human eye will not notice small color changes. JPEG 2000 is an initiative that provides an image coding system using compression techniques based on wavelet technology.

2.4 Transforms
There are several transforms commonly used in signal processing, such as the Discrete Fourier Transform (DFT), the Discrete Cosine Transform (DCT), and the Discrete Wavelet Transform (DWT), among others. The DCT is the most common transform used when processing images and video. The DWT is used in the image compression standard JPEG 2000, and is used in this application as well. Both the DCT and the DWT are described more thoroughly below. The basic idea of using transforms when processing, for example, an image is to decorrelate the pixels from one another. By doing so, compression is achieved, since the amount of redundant information is minimized. A transform can be seen as a projection onto orthonormal bases, separated in time and/or frequency. By transforming a signal, its energy is separated into sub-bands. By describing each sub-band with a different precision (higher precision for high-energy sub-bands and less precision for low-energy sub-bands), the signal can be compressed.


The elements of the N x N DCT transform matrix C are given by:

C(k, n) = sqrt(1/N),                              k = 0,         0 <= n <= N-1
C(k, n) = sqrt(2/N) cos( (2n + 1) k pi / (2N) ),  1 <= k <= N-1, 0 <= n <= N-1

To transform a matrix Y, the transform matrix C is multiplied with Y, giving the transformed matrix X = CY. The cosine transform matrix is real-valued and orthogonal, i.e. C = C* and C^(-1) = C^T. The DCT is also excellent at energy compaction, which means that the energy of the matrix is concentrated in a small region of the transformed matrix, and it has good decorrelation properties. These properties are very suitable for image and video processing, and the DCT is therefore widely used (e.g. in JPEG, MPEG, and H.263). In the two-dimensional DCT of a typical image (Figure 2.1), the energy is compacted into the upper left corner.
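These properties are easy to verify numerically; the sketch below uses the IPT function dctmtx and the cameraman.tif test image:

N = 8;
C = dctmtx(N);                 % N-by-N DCT transform matrix
orthErr = norm(C*C' - eye(N))  % ~0: C is orthogonal, so inv(C) = C'
Y = im2double(imread('cameraman.tif'));
X = C * Y(1:N, 1:N) * C';      % 2-D DCT of one 8-by-8 block
% Most of the energy of X is concentrated in the upper left corner.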

2.5 DCT and Video Compression


In the JPEG image compression algorithm, the input image is divided into 8-by-8 or 16-by-16 blocks, and the two-dimensional DCT is computed for each block. The DCT coefficients are then quantized, coded, and transmitted. The JPEG receiver (or JPEG file reader) decodes the quantized DCT coefficients, computes the inverse two-dimensional DCT of each block, and then puts the blocks back together into a single image. For typical images, many of the DCT coefficients have values close to zero; these coefficients can be discarded without seriously affecting the quality of the reconstructed image. The example code below computes the two-dimensional DCT of 8-by-8 blocks in the input image, discards (sets to zero) all but 10 of the 64 DCT coefficients in each block, and then reconstructs the image using the two-dimensional inverse DCT of each block. The transform matrix computation method is used.

I = imread('cameraman.tif');
I = im2double(I);
T = dctmtx(8);                             % 8-by-8 DCT transform matrix
B = blkproc(I, [8 8], 'P1*x*P2', T, T');   % forward DCT of each block

mask = [1 1 1 1 0 0 0 0                    % keep only the 10 lowest-frequency
        1 1 1 0 0 0 0 0                    % coefficients of each block (the
        1 1 0 0 0 0 0 0                    % standard mask from the MATLAB
        1 0 0 0 0 0 0 0                    % DCT compression example)
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0];

B2 = blkproc(B, [8 8], 'P1.*x', mask);     % zero out the other 54 coefficients
I2 = blkproc(B2, [8 8], 'P1*x*P2', T', T); % inverse DCT of each block

Although there is some loss of quality in the reconstructed image, it is clearly recognizable, even though almost 85% of the DCT coefficients were discarded. To experiment with discarding more or fewer coefficients, and to apply this technique to other images, try running the demo function dctdemo.

2.6 General Description


The Discrete Cosine Transform (DCT) is a technique that converts a spatial domain waveform into its constituent frequency components, as represented by a set of coefficients. The process of reconstructing a set of spatial domain samples from these coefficients is called the Inverse Discrete Cosine Transform (IDCT). For data compression of image/video frames, a block of data is usually converted from spatial domain samples to another domain (usually the frequency domain), which offers a more compact representation. The optimal transform is the Karhunen-Loève Transform (KLT), as it packs most of the block energy into the fewest frequency domain elements, minimizes the total entropy of the block, and completely decorrelates its elements. However, its main disadvantage is that its basis functions are image-dependent, which complicates digital implementation. The Discrete Cosine Transform, introduced by Ahmed in 1974, has the next best compaction efficiency, while also having image-independent basis functions. Hence the DCT is used to provide the necessary transform, and the resulting data is then compressed using quantization and various coding techniques to offer lossless as well as lossy compression.


Chapter 3

SINGULAR VALUE DECOMPOSITION (SVD)


Singular Value Decomposition (SVD) is regarded as a significant topic in linear algebra by many renowned mathematicians. The SVD has many practical and theoretical values; a special feature is that it can be performed on any real m x n matrix. In this chapter we demonstrate how to use the SVD to factorize and approximate large matrices, specifically images. The singular value decomposition takes a rectangular m x n matrix A and calculates three matrices U, S, and V. S is a diagonal m x n matrix (the same dimensions as A). U and V are unitary (orthogonal, in the real case) matrices of sizes m x m and n x n respectively. The matrices are related by the equation

A = U S V^H

Calculating the SVD consists of finding the eigenvalues and eigenvectors of A A^H and A^H A. The eigenvectors of A^H A make up the columns of V; the eigenvectors of A A^H make up the columns of U. The eigenvalues of A^H A (or A A^H) are the squares of the singular values of A. The singular values are the diagonal entries of the S matrix and are arranged in descending order; they are always real numbers. If the matrix A is real, then U and V are also real. Equation (2) can be expressed as the outer product expansion

A = s1 u1 v1^T + s2 u2 v2^T + ... + sr ur vr^T

The matrix A can then be approximated by a matrix of rank k by truncating this sum after the first k terms.

The matrix U contains one orthonormal basis. U is also known as the left singular vectors. The matrix V contains another orthonormal basis. V is also known as the right singular vectors. The diagonal matrix S contains the singular values.


3.1 Factoring V and S


First we will find V. To eliminate U from the equation A = U S V^T, multiply on the left by A^T:

A^T A = (U S V^T)^T (U S V^T) = V S^T U^T U S V^T

Since U is an orthogonal matrix, U^T U = I, which gives

A^T A = V S^2 V^T

Notice that this is similar to the diagonalization of a symmetric matrix A, where A = Q D Q^T. But now the symmetric matrix is not A, it is A^T A. To find V and S we diagonalize A^T A by finding its eigenvalues and eigenvectors. The eigenvalues are the squares of the elements of S (the singular values) and the eigenvectors are the columns of V (the right singular vectors).
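This relationship is easy to check numerically (a minimal sketch; the test matrix is an arbitrary choice):

A = magic(4) + eye(4);            % any real matrix will do
[U, S, V] = svd(A);
[Q, D] = eig(A' * A);             % eigen-decomposition of A'A
sort(sqrt(diag(D)), 'descend')    % matches diag(S), the singular values of A
norm(A - U*S*V')                  % ~0: the factors reconstruct A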

3.2 Factoring U
Eliminating V from the equation is very similar to eliminating U. Instead of multiplying on the left by A^T, we multiply on the right by A^T. This gives

A A^T = (U S V^T)(U S V^T)^T = U S V^T V S^T U^T

Since V^T V = I, this gives

A A^T = U S^2 U^T

Again we find the eigenvectors, but this time for A A^T. These are the columns of U (the left singular vectors).

3.3 Properties of the SVD


There are many properties and attributes of the SVD; here we present only those used in this project.

The singular values s1, s2, ..., sn are unique; the matrices U and V, however, are not unique.
Since A^T A = V S^T S V^T, V diagonalizes A^T A, so the vi's are the eigenvectors of A^T A.
Since A A^T = U S S^T U^T, U diagonalizes A A^T, so the ui's are the eigenvectors of A A^T.
If A has rank r, then v1, v2, ..., vr form an orthonormal basis for the range space of A^T, R(A^T), and u1, u2, ..., ur form an orthonormal basis for the range space of A, R(A).
The rank of the matrix A is equal to the number of its nonzero singular values.

3.4 Using SVD for Image Compression


Image compression deals with the problem of reducing the amount of data required to represent a digital image. Compression is achieved by the removal of three basic data redundancies: coding redundancy, which is present when less than optimal code words are used; interpixel redundancy, which results from correlations between the pixels; and psychovisual redundancy, which is due to data that is ignored by the human visual system. To illustrate the SVD image compression process, consider the following equation:

That is, A can be represented by the outer product expansion:

A = s1 u1 v1^T + s2 u2 v2^T + ... + sr ur vr^T

When compressing the image, the sum is not carried to the very last singular values; the SVs with small enough values are dropped. (Remember that the SVs are ordered in decreasing magnitude along the diagonal.) The closest matrix of rank k is obtained by truncating the sum after the first k terms:

Ak = s1 u1 v1^T + s2 u2 v2^T + ... + sk uk vk^T

The total storage for Ak is k(m + n + 1). The integer k can be chosen considerably smaller than n, and the digital image corresponding to Ak will still be very close to the original image. Different choices of k give different corresponding images and storage requirements; for typical choices of k, the storage required for Ak is less than 20 percent of that of the original image.
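In MATLAB the truncation is a one-liner; the sketch below (the image and the value of k are our own choices) also computes the compression ratio defined in Section 3.8:

A = im2double(imread('cameraman.tif'));
[U, S, V] = svd(A);
k = 20;                                     % keep the 20 largest singular values
Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';  % best rank-k approximation of A
[m, n] = size(A);
CR = (m * n) / (k * (m + n + 1))            % compression ratio
imshow(Ak)                                  % visually close to the original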


3.5 Splitting Image into Smaller Blocks


The SVD process has order n^3, which makes it very slow for large pictures. However, if the picture is broken up into smaller sections and each is handled separately, the overall processing time is much lower. This is not merely a trade-off: as will be seen, it is in fact necessary for good rates of compression. The key to SVD compression is using low rank approximations to the image. The less complicated an image, the lower the rank necessary to accurately represent it. For example, a picture that is a single color block can be perfectly represented by a rank-1 SVD. Let X be an n x n matrix with every value equal to some constant c in R. Then

X = c j j^T, where j = (1, 1, ..., 1)^T

Now if u = v = j / sqrt(n) and s = c n, it can be seen that

X = c j j^T = s u v^T

so X has an exact rank-1 SVD. A realistic photo (for example, the figure shown below), however, is generally complex overall, but may contain sections of simpler images. In the test image Frog Rock there are areas with simple content: the sky has very little detail, and the side of the hut is fairly monotone, while other sections, such as the person, are more complicated. It would make sense to break up this photo so that the simple sections can be represented with low rank approximations, while the complicated sections have higher rank to include the detail.

Figure 3.1: Frog Rock Test Image

A human can quickly look at a photograph and isolate the sections of high detail from the sections of low detail. However, this can be a difficult task for a computer, requiring a lot of

processing. Ideally the picture would be perfectly split into separate regions based on complexity, but in practice this would be too time consuming and require too much overhead information to keep track of the regions. A simple approach is to break the image into smaller blocks of the same size. Although the blocks won't perfectly align with the different regions of complexity, if there are enough blocks then the blocks will generally match the regions of complexity. This is the approach used by JPEG; pictures are divided into blocks of 8 x 8 (the JPEG specification allows block sizes of 16 x 16, but this is rarely used). The second approach used in this project is to have adaptive block sizes. Initially the picture is broken up into a series of large blocks. Then each block is split into four quarter-size blocks. If less storage is required when the block is split into quarters, then these new blocks are accepted; otherwise the original block is kept. This process can be repeated on the new blocks, getting smaller and smaller each time.

3.6 Adaptive Block Sizes


For a given rank, the larger the block size the more efficient the storage. For example, a 100 x 100 matrix approximated with rank k requires (100 + 100)k = 200k elements. If the matrix were split into four 50 x 50 matrices, each also approximated with rank k, then each sub-matrix would require (50 + 50)k = 100k elements, so all four together would require 400k elements, twice the amount required for the single block. However, it is to be hoped that the smaller blocks are simpler and require a lower rank to represent them. An n x n block with rank k requires 2nk elements. If this block is split into four quarter blocks of size (n/2) x (n/2), with the sub-blocks having ranks k1, k2, k3, k4 respectively, the number of storage elements is n(k1 + k2 + k3 + k4). So the decision as to whether a block should be subdivided is based on

n(k1 + k2 + k3 + k4) < 2nk, i.e. k1 + k2 + k3 + k4 < 2k

Unfortunately, to calculate the values k1, k2, k3, k4, the block has to be divided and an SVD applied to each sub-block. As a result, many more SVDs are performed than are used in the output. This results in a much slower compression time, but it does not affect the


speed of decompression. The advantage of this adaptive block size technique is that it can better map the regions of complexity of the picture, as the sketch below illustrates.
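A sketch of the subdivision test, under stated assumptions (rankOf is a hypothetical helper that returns the rank used to approximate a block to the chosen quality):

function doSplit = shouldSplit(block)
% Decide whether an n-by-n block should be split into four quarter blocks.
    n  = size(block, 1);
    k  = rankOf(block);                   % rank for the whole block (hypothetical helper)
    h  = n / 2;
    k1 = rankOf(block(1:h,   1:h));       % ranks for the four quarter blocks
    k2 = rankOf(block(1:h,   h+1:n));
    k3 = rankOf(block(h+1:n, 1:h));
    k4 = rankOf(block(h+1:n, h+1:n));
    % split only if n*(k1+k2+k3+k4) < 2*n*k, i.e. k1+k2+k3+k4 < 2*k
    doSplit = (k1 + k2 + k3 + k4) < 2 * k;
end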

3.7 Mixed Mode


With the dividing block technique we could start with a block the size of the whole picture. The block size needs to be chosen so that it divides evenly into quarters at each step, so preferably the dimensions should be a power of 2. The picture is therefore padded out so that the whole image is a square, with the extra pixels set to zero. The zero sections of the picture do not require much storage; in fact, any sub-square consisting entirely of zeros has rank 0. This technique has the disadvantage that the compression can take a very long time, as the first few SVDs to be performed are on very large blocks. When this technique was used on the test images, none of the first few blocks were accepted: the blocks had to be reduced to a small enough size before the algorithm determined that it was not worth subdividing further. A combination of the fixed block and the subdividing block techniques solved this problem. First the picture is divided into moderate-size fixed blocks. Then the adaptive block size technique is applied to each fixed block separately, so there is an upper limit on the block sizes, which saves a lot of unnecessary processing time. Similarly, the algorithm never accepted a block size that was too small, so a lower limit on the block size could also be used. For the test images used, sensible upper and lower bounds were found to be 64 x 64 and 8 x 8. Therefore only four block sizes were allowed: 64 x 64, 32 x 32, 16 x 16, and 8 x 8, so the extra processing required for rejected blocks was not too great.

3.8 Picture Quality


Image Compression Measures: In order to measure the performance of the SVD image compression method, we can compute the compression factor and the quality of the compressed image. The image compression factor can be computed using the compression ratio

CR = (m * n) / (k (m + n + 1))

Measuring Picture Quality: The original image is represented by the matrix A; the approximating image is the rank-k matrix Ak. It is necessary to have a measure of image quality.

Unfortunately, image quality as perceived by the eye is a very subjective measurement. A human can quickly look at an image and determine whether the quality is acceptable or not, but it is difficult to represent this mathematically. The most common measurement used in image processing is the Peak Signal-to-Noise Ratio (PSNR), measured in decibels (dB). Although not a great model of the human eye, it is simple to calculate:

PSNR = 10 log10( (max range)^2 / RMSE^2 )

RMSE = sqrt( (1 / (M N)) * sum over i,j of (A(i,j) - Ak(i,j))^2 )

Max range is the allowed value range of the pixels. For convenience, pixels will be in the range [0, 1], hence max range = 1. RMSE is the Root Mean Square Error.
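With A and Ak as in the rank-k sketch of Section 3.4, both measures are a few lines of MATLAB (a minimal illustration):

err  = A - Ak;
RMSE = sqrt(mean(err(:).^2));         % root mean square error
PSNR = 10 * log10(1 / RMSE^2)         % max range = 1 for images of class double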

Higher Order SVD: Tensor decomposition was studied in psychometric data analysis during the 1960s, when data sets having more than two dimensions (generally called three-way data sets) became widely used. A fundamental step was taken by Tucker (1963), who proposed to decompose a 3-D signal using a 3-D principal component analysis (PCA) directly, instead of unfolding the data along one dimension and using the standard SVD. This three-way PCA is also known as the Tucker3 decomposition. In the 1980s, such multidimensional techniques were also applied to chemometrics. The signal processing community only recently showed interest in the Tucker3 decomposition. The work of Lathauwer et al. (2000) proved that this decomposition is a multilinear generalization of the SVD to multidimensional data. Studying its properties with a notation more familiar to the signal processing community, the authors highlighted its properties concerning rank, oriented energy, and best reduced-rank approximation. As the decomposition can have more than three dimensions, they called it the higher order SVD (HOSVD). In the following we adopt this notation and define the HOSVD decomposition.

Multiple-Level Decomposition: The decomposition process can be iterated, with successive approximations being decomposed in turn, so that one signal is broken down into many lower-resolution components. This is called the wavelet decomposition tree.

Figure 3.2: Multiple-Level Decomposition. The signal S is split into an approximation cA1 and a detail cD1; cA1 is split in turn into cA2 and cD2, and so on down to cA3, cA4.

Number of Levels: Since the analysis process is iterative, in theory it can be continued indefinitely. In reality, the decomposition can proceed only until the individual details consist of a single sample or pixel. In practice, you'll select a suitable number of levels based on the nature of the signal, or on a suitable criterion such as entropy. Recently, the parametric model proposed by Doretto et al. was shown to be a valid approach for the analysis/synthesis of dynamic textures. Each video frame is unfolded into a column vector and constitutes a point that follows a trajectory as time evolves. The analysis consists in finding an appropriate space in which to describe this trajectory and in identifying the trajectory using methods of dynamical system theory. The first part is done by using the singular value decomposition (SVD) to perform dimension reduction to a lower dimensional space. The point trajectory is then described using a multivariate autoregressive (MAR) process of order 1. Dynamic textures are thus modeled using a linear dynamic system, and synthesis is obtained by driving this system with white noise. In this model, the SVD exploits the temporal correlation between the video frames, but the unfolding operation prevents the possibility of exploiting spatial and chromatic correlations. We use this parametric approach but perform the dynamic texture analysis with a higher order SVD, which makes it possible to simultaneously decompose the temporal, spatial, and chromatic components of the video sequence. This approach was proposed by the authors in [10] and here it is described in detail. Our scheme is depicted in Figure 3.3: the SVD in the analysis is replaced by the HOSVD.


Figure 3.3: Schematic Representation of the Tensor-Based Linear Model Approach for Analysis and Synthesis

HOSVD is an extension of the SVD to higher order dimensions. It is not an optimal tensor decomposition in the sense of least squares data fitting, and it does not have the truncation property of the SVD, where keeping the first singular values gives the best rank-k approximation of a given matrix. Despite this, the approximation obtained is not far from the optimal one and can be computed much faster: the computation of the HOSVD does not require iterative alternating least squares algorithms, but needs standard SVD computations only. The major advantage of the HOSVD is its ability to simultaneously consider the spatial, temporal, and chromatic correlations. This allows for better data modeling than a standard SVD, since dimension reduction can be performed not only in the time dimension but also separately for the spatial and chromatic content. The separate analysis of each signal component allows the signal compression given by the dimension reduction to be adapted to the characteristics of each dynamic texture. For comparable visual synthesis quality, we thus obtain a number of model coefficients that is on average five times smaller than that obtained using the standard SVD.

Creating more compact models is also addressed in, where dynamic texture shape and visual appearance are jointly modeled, thus enabling the treatment of complex video sequences containing sharp edges. Both their approach and ours are characterized by a more computationally expensive analysis but a fast synthesis; in our case, synthesis can be done in real time. This makes our technique very appropriate for applications with memory constraints, such as mobile devices. We believe that the HOSVD is a very promising technique for other video analysis and approximation applications. Recently, it has been successfully used in image-based texture rendering, face super-resolution, and face analysis and recognition. In the framework of video compression and transmission, it is useful to find a way to analyze/synthesize dynamic textures. An efficient compression would open the possibility of access to realistic video animations on devices with strong constraints on the available bandwidth, such as mobile phones. The approaches used to model dynamic textures can be classified into non-parametric and parametric. In the first case, the analysis and synthesis are conducted directly on a given representation of the image (the pixel values, or a description in a transformed domain obtained using certain bases, such as wavelets). In the second case, researchers aim to describe the dynamic texture using dynamical models. An interesting approach is to consider a linear dynamic system (LDS). In fact, if some simplifications are made, a closed-form solution for the estimation of the model's parameters can be found for such systems. Unfortunately, the synthesized sequences obtained using this method are not visually appealing compared to the original sequence; periodicity (oscillation) has therefore been introduced into the model by forcing the poles of the dynamic system to lie on the unit circle. This solution yields more realistic sequences, but is still based on the same assumptions used for the construction. A dynamic texture can be considered as a multidimensional signal. In the case of a grayscale video, it can be represented by a 3-D tensor by assigning spatial information to the first two dimensions and time to the third. In a color video sequence, the chromatic components add another dimension, and the input signal becomes 4-D. The analysis is done by first decomposing the input signal using the HOSVD and then by considering the orthogonal matrix derived from the decomposition along the time dimension. This matrix contains the dynamics of the video sequence, since its columns,

ordered along the time axis, correspond to the weights that control the appearance of the dynamic texture as time evolves.

3.9 Dynamic Texture Synthesis


Dynamic textures are textures that change over time. Videos of fire, water, smoke, and so on are typical examples of dynamic textures. Dynamic texture synthesis is the process of creating such textures artificially. This can be achieved starting either from a description (model) of a physical phenomenon or from existing video sequences. The first approach is called physics-based and leads to a description of the dynamic texture that usually requires few parameters. It has been extensively adopted for the reproduction of synthetic flames or fire, since these are often used in gaming applications or digital movies. Even though parameter tuning is not always straightforward, the synthesis results are impressive, but computationally extremely expensive. This limits the use of this type of model to cases where synthesis can be done offline, such as during editing in the movie-making process. The second approach is called image-based, as it does not aim at modeling the physics underlying the natural process, but at replicating existing videos. This can be done in two ways. In the first, synthesis is done by extracting different clips from the original video and patching them together to obtain a longer video, ensuring that the temporal joints are not noticeable and that the dynamic appearance is maintained. This type of synthesis is called nonparametric or patch-based, since it is not based on a model and reduces the synthesis to a collage of patches. It has the advantage of ensuring high visual quality, because the synthetic video is composed of the original video frames, marginally modified by morphing operations only along clip discontinuities. However, the entire synthetic video has to be created in one step and stored in memory, which rules out on-the-fly synthesis. In addition, this technique is not flexible, since it allows modifying the appearance of single frames but not the texture dynamics. In the second way, a parametric image-based approach is used to build a model of the dynamic texture. The dynamic texture is analyzed and model parameters are computed. The visual quality of the synthesis is generally lower than for patch-based techniques, but the parametric approach is more flexible, more compact in terms of memory


occupation, and usually permits on-the-fly synthesis. Moreover, it can also be used for other applications, such as segmentation, recognition, and editing. The term specificity indicates whether a given approach is specific to a certain type of dynamic texture, such as fire, water, or smoke, or can be used for all kinds of dynamic textures. The term flexibility indicates whether the characteristics of the generated texture can easily be changed during the synthesis. The physics-based approaches have high specificity, since a model for fire cannot be used for the generation of water or smoke, for instance; they also have high flexibility, since the visual appearance of the synthetic texture can be modified by tuning the model parameters.

3.10 Tensor
Tensor is a general name for multilinear mappings over a set of vector spaces; a vector is a 1-mode tensor and a matrix is a 2-mode tensor. A tensor T is an N-mode tensor where the dimensionality of mode i is di. In the same way as a matrix can be pre-multiplied (mode-1 multiplication) or post-multiplied (mode-2 multiplication) by another matrix, a matrix can be multiplied with a higher order tensor with respect to different modes. The mode-n multiplication of a matrix M in R^(In x dn) with a tensor T is denoted T xn M and results in a tensor U with the same number of modes. The elements of the tensor U are computed in the following way:

U(d1, ..., d(n-1), i, d(n+1), ..., dN) = sum over dn of T(d1, ..., dN) M(i, dn)

Tensor Decomposition: Principal Component Analysis (PCA) is a version of the Singular Value Decomposition (SVD), which is a 2-mode tool commonly used in signal processing to reduce the dimensionality of a space and to reduce noise. The SVD decomposes a matrix into three other matrices, such that

A = U S V^T

where the matrix U spans the row space of A, the matrix V spans the column space of A, and S is a diagonal matrix of singular values. The column vectors of U (likewise for V) are orthonormal to each other, describing a new orthonormal coordinate system for the space spanned by A. N-mode SVD, or Higher Order SVD (HOSVD), is a generalization of the matrix SVD to tensors. It decomposes a tensor T by

29

orthogonolazing its modes, yielding a core tensorand matrices spanning the vector spaces in each mode of the tensor, i.e.: T = S 1 U1 2 U2.... n Un The tensor S is called the core tensor and is analogous to the diagonal singular value matrix in the traditional SVD. However, for HOSVD, the tensor S is not a diagonal tensor but coordinates the interaction of matrices to produce the original tensor. Matrices Ui are again orthonormal and the column vectors of Ui spans the space of tensor T , flattened with respect to mode i. The row vectors of Ui are the coefficient sets describing each dimension in mode i. These coefficients can be thought as the coefficients extracted from PCA but there are different sets of coefficients for each mode in HOSVD analysis. Dimensionality Reduction: After decomposing the original data tensor to yield the core tensor and mode matrices, we are able to reduce the dimensionality with respect to the mode we want, unlike PCA where the dimensionality reduction is only based on the variances. By reducing the number of dimensions in one mode and keeping the other intact, we can have more control over the noise reduction, classification accuracies and complexity of the problem. The dimensionality reduction is achieved by deleting the last mcolumn vectors from the desired mode matrix and deleting the corresponding m hyper planes from the core tensor. It is also defined that the error after dimensionality reduction is bounded by the Frobeniusnorm of the hyperplanes deleted from the core tensor.
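As a concrete sketch of these operations, the following MATLAB fragment computes the HOSVD of a 3-mode tensor via the SVD of each mode unfolding and then truncates mode 1. The tensor contents and the reduced rank r1 are arbitrary illustration values, not data from this project.

% Minimal HOSVD sketch for a 3-mode tensor (illustration values only).
T = rand(10, 12, 8);                   % example tensor of size d1 x d2 x d3
[d1, d2, d3] = size(T);

% Mode-i unfoldings: mode i indexes the rows, the remaining modes the columns.
T1 = reshape(T, d1, d2*d3);
T2 = reshape(permute(T, [2 1 3]), d2, d1*d3);
T3 = reshape(permute(T, [3 1 2]), d3, d1*d2);

% One SVD per unfolding yields the orthonormal mode matrices U1, U2, U3.
[U1, S1, V1] = svd(T1, 'econ');
[U2, S2, V2] = svd(T2, 'econ');
[U3, S3, V3] = svd(T3, 'econ');

% Core tensor S = T x1 U1' x2 U2' x3 U3', computed as three mode products.
S = reshape(U1' * T1, [d1 d2 d3]);
S = permute(reshape(U2' * reshape(permute(S, [2 1 3]), d2, []), [d2 d1 d3]), [2 1 3]);
S = permute(reshape(U3' * reshape(permute(S, [3 1 2]), d3, []), [d3 d1 d2]), [2 3 1]);
% T is recovered (up to round-off) as S x1 U1 x2 U2 x3 U3.

% Dimensionality reduction in mode 1: delete the last columns of U1
% and the corresponding hyperplanes of the core tensor.
r1  = 5;                               % arbitrary reduced rank for mode 1
U1r = U1(:, 1:r1);
Sr  = S(1:r1, :, :);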

Chapter 4

INTRODUCTION to MATLAB

MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include:

Math and computation
Algorithm development
Data acquisition
Modeling, simulation, and prototyping
Data analysis, exploration, and visualization
Scientific and engineering graphics
Application development, including graphical user interface building

MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar noninteractive language such as C or FORTRAN.

The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects. Today, MATLAB engines incorporate the LAPACK and BLAS libraries, embedding the state of the art in software for matrix computation. MATLAB has evolved over a period of years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis.

MATLAB features a family of add-on application-specific solutions called toolboxes. Very important to most users of MATLAB, toolboxes allow you to learn and apply specialized technology. Toolboxes are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. Areas in which toolboxes are available include signal processing, control systems, neural networks, fuzzy logic, wavelets, simulation, and many others.
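As a small illustration of the dimensioning-free, matrix-oriented style described above (the matrix and right-hand side here are arbitrary values), a linear system can be set up and solved in a few lines:

A = pascal(4);        % 4-by-4 matrix created without declaring its dimensions
b = A * ones(4, 1);   % matrix-vector product in a single expression
x = A \ b;            % solve the linear system A*x = b
s = svd(A);           % singular values of A with one call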


4.1 The MATLAB System


The MATLAB system consists of five main parts:

the Development Environment
the MATLAB Mathematical Function Library
the MATLAB Language
Graphics
the MATLAB Application Program Interface (API)

Development Environment: This is the set of tools and facilities that help you use MATLAB functions and files. Many of these tools are graphical user interfaces. It includes the MATLAB desktop and Command Window, a command history, an editor and debugger, and browsers for viewing help, the workspace, files, and the search path.

The MATLAB Mathematical Function Library: This is a vast collection of computational algorithms, ranging from elementary functions such as sum, sine, cosine, and complex arithmetic to more sophisticated functions such as matrix inverse, matrix eigenvalues, Bessel functions, and fast Fourier transforms.

The MATLAB Language: This is a high-level matrix/array language with control flow statements, functions, data structures, input/output, and object-oriented programming features. It allows both "programming in the small", to rapidly create quick and dirty throwaway programs, and "programming in the large", to create complete, large, and complex application programs.

Graphics: MATLAB has extensive facilities for displaying vectors and matrices as graphs, as well as for annotating and printing these graphs. It includes high-level functions for two-dimensional and three-dimensional data visualization, image processing, animation, and presentation graphics. It also includes low-level functions that allow you to fully customize the appearance of graphics and to build complete graphical user interfaces for your MATLAB applications.

The MATLAB Application Program Interface (API): This is a library that allows you to write C and FORTRAN programs that interact with MATLAB. It includes facilities for calling routines from MATLAB (dynamic linking), calling MATLAB as a computational engine, and reading and writing MAT-files.


4.2 MATLAB Working Environment


This section describes the MATLAB Desktop, using the MATLAB editor to create M-files, and getting help.

MATLAB Desktop: The MATLAB Desktop is the main MATLAB application window. The desktop contains five sub-windows: the Command Window, the Workspace Browser, the Current Directory window, the Command History window, and one or more Figure Windows, which are shown only when the user displays a graphic.

The Command Window is where the user types MATLAB commands and expressions at the prompt (>>) and where the output of those commands is displayed. MATLAB defines the workspace as the set of variables that the user creates in a work session. The Workspace Browser shows these variables and some information about them. Double-clicking on a variable in the Workspace Browser launches the Array Editor, which can be used to obtain information about, and in some instances edit, certain properties of the variable.

The Current Directory tab, situated above the Workspace tab, shows the contents of the current directory, whose path is shown in the Current Directory window. For example, in the Windows operating system the path might be C:\MATLAB\Work, indicating that the directory Work is a subdirectory of the main directory MATLAB, which is installed in drive C. Clicking on the arrow in the Current Directory window shows a list of recently used paths, and clicking on the button to the right of the window allows the user to change the current directory.

MATLAB uses a search path to find M-files and other MATLAB-related files, which are organized in directories in the computer file system. Any file run in MATLAB must reside in the current directory or in a directory that is on the search path. By default, the files supplied with MATLAB and MathWorks toolboxes are included in the search path. The easiest way to see which directories are on the search path, or to add or modify the search path, is to select Set Path from the File menu of the desktop, and then use the Set Path dialog box. It is good practice to add commonly used directories to the search path, to avoid repeatedly having to change the current directory.

The Command History window contains a record of the commands a user has entered in the Command Window, in both the current and previous MATLAB sessions. Previously entered MATLAB commands can be selected and re-executed from the Command History window by right-clicking on a command or sequence of commands; this launches a menu from which various options, in addition to executing the commands, can be selected. This is a useful feature when experimenting with various commands in a work session.

Using the MATLAB Editor to Create M-Files: The MATLAB editor is both a text editor specialized for creating M-files and a graphical MATLAB debugger. The editor can appear in a window by itself, or as a sub-window in the desktop. M-files are denoted by the extension .m, as in pixelup.m. The MATLAB editor window has numerous pull-down menus for tasks such as saving, viewing, and debugging files. Because it performs some simple checks and also uses color to differentiate between various elements of code, this text editor is recommended as the tool of choice for writing and editing M-functions. To open the editor, type edit at the prompt; typing edit filename at the prompt opens the M-file filename.m in an editor window, ready for editing. As noted earlier, the file must be in the current directory or in a directory on the search path.

Getting Help: The principal way to get help online is to use the MATLAB Help Browser, opened as a separate window either by clicking on the question mark symbol (?) on the desktop toolbar or by typing helpbrowser at the prompt in the Command Window. The Help Browser is a web browser integrated into the MATLAB desktop that displays Hypertext Markup Language (HTML) documents. It consists of two panes: the Help Navigator pane, used to find information, and the Display pane, used to view that information. The remaining tabs of the Navigator pane are self-explanatory and are used to perform searches.
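As a minimal illustration (the function name and body are hypothetical, not part of this project), an M-file such as the following can be created in the editor, saved as scalegray.m in the current directory, and then called from the Command Window as g = scalegray(f, 1.5):

function g = scalegray(f, k)
% SCALEGRAY Multiply a grayscale image by a constant gain (toy example).
g = uint8(double(f) * k);   % scale in double precision, then return uint8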

4.3 Commands
uigetfile: Open standard dialog box for retrieving files

Description: uigetfile displays a modal dialog box that lists files in the current directory and enables the user to select or type the name of a file to be opened. If the filename is valid and the file exists, uigetfile returns the filename when the user clicks Open. Otherwise, uigetfile displays an appropriate error message, after which control returns to the dialog box; the user can then enter another filename or click Cancel. If the user clicks Cancel or closes the dialog window, uigetfile returns 0.

aviinfo: Information about an Audio/Video Interleaved (AVI) file

Description: fileinfo = aviinfo(filename) returns a structure whose fields contain information about the AVI file specified by the string filename. If filename does not include an extension, .avi is used. The file must be in the current working directory or in a directory on the MATLAB path.

aviread: Read an Audio/Video Interleaved (AVI) file

Description: mov = aviread(filename) reads the AVI movie filename into the MATLAB movie structure mov. If filename does not include an extension, .avi is used. Use the movie function to view the movie mov.

frame2im: Convert movie frame to indexed image

Description: [X, Map] = frame2im(F) converts the single movie frame F into the indexed image X and associated colormap Map. The functions getframe and im2frame create a movie frame. If the frame contains truecolor data, then Map is empty.
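A short sketch combining these functions, using the older AVI interface adopted throughout this report (the selected file, e.g. flame.avi, is hypothetical):

[fname, pname] = uigetfile('*.avi');   % let the user pick an AVI file
info = aviinfo(fname);                 % structure with fields such as NumFrames
mov  = aviread(fname);                 % read the movie into a structure array
[X, map] = frame2im(mov(1));           % first movie frame -> image (+ colormap)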

im2frame: Convert image to movie frame

Description: f = im2frame(X, map) converts the indexed image X and associated colormap map into a movie frame f. If X is a truecolor (m-by-n-by-3) image, then map is optional and has no effect.

imwrite: Write image to graphics file

Description: imwrite(X, map, filename, fmt) writes the indexed image in X and its associated colormap map to filename in the format specified by fmt. If X is of class uint8 or uint16, imwrite writes the actual values in the array to the file. If X is of class double, imwrite offsets the values in the array before writing, using uint8(X-1). map must be a valid MATLAB colormap; note that most image file formats do not support colormaps with more than 256 entries. When writing multiframe GIF images, X should be a 4-dimensional M-by-N-by-1-by-P array, where P is the number of frames to write.

imread: Read image from graphics file

Description: A = imread(filename, fmt) reads a grayscale or color image from the file specified by the string filename. If the file is not in the current directory, or in a directory on the MATLAB path, specify the full pathname.

movie: Play recorded movie frames

Description: movie plays the movie defined by a matrix whose columns are movie frames (usually produced by getframe). movie(M) plays the movie in matrix M once, using the current axes as the default target. To play the movie in the figure instead of the axes, specify the figure handle (or gcf) as the first argument: movie(figure_handle, ...). M must be an array of movie frames (usually from getframe).
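A complementary sketch for the remaining commands (the file names are hypothetical, and the image is assumed to be truecolor so that rgb2ind applies, as in the appendix code):

A = imread('frame1.bmp');      % read a truecolor image from disk
imwrite(A, 'frame1.png');      % write it back in another format
[Y, map] = rgb2ind(A, 255);    % truecolor -> indexed image, 255-entry colormap
F(1) = im2frame(Y, map);       % indexed image -> movie frame
movie(gcf, F, 2);              % play the single-frame movie twice in a figure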

Chapter 5

RESULTS and CONCLUSION



Using Higher Order SVD analysis for dynamic texture synthesis, videos such as Flame, Pond, and Grass are given as input; the output video obtained is one third the size of the input video (3:1 compression).

Figure 5.1: Output Frame for Input Flame Video

Description: This is one of the output frames obtained from the given input video after compression. The following parameters were obtained for the compressed video:

input_file_size  = 20505600
output_file_size = 6835200
compression      = 3

The output file size is 3 times smaller than the input file size.

compression_ratio = 0.3333 (33.3333 %)
PSNR = 37.2036
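These figures are consistent with one another, as a quick arithmetic check on the reported sizes shows:

compression       = 20505600 / 6835200      % = 3
compression_ratio = 6835200 / 20505600      % = 0.3333
percentage        = compression_ratio * 100 % = 33.3333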


Figure 5.2: Output Frame for Input Pond Video

Description: This is one of the output frames obtained from the given input video after compression. The following parameters were obtained for the compressed video:

input_file_size  = 39744000
output_file_size = 13248000
compression      = 3

The output file size is 3 times smaller than the input file size.

compression_ratio = 0.3333 (33.3333 %)
PSNR = 40.8908


Figure 5.3: Output Frame for Input Grass Video

Description: Figure 5.3 is one of the output frames obtained from the given input video after compression. The following parameters were obtained for the compressed video:

input_file_size  = 9676800
output_file_size = 3225600
compression      = 3

The output file size is 3 times smaller than the input file size.

compression_ratio = 0.3333 (33.3333 %)
PSNR = 45.4285


Conclusion
Here it is proposed to decompose the multidimensional signal that represents a dynamic texture by using a tensor decomposition technique. As opposed to techniques that unfold the multidimensional signal into a 2-D matrix, this method analyzes the data in their original dimensions. This decomposition, only recently applied to image and video processing, makes it possible to better exploit the spatial, temporal, and chromatic correlation between the pixels of the video sequence, leading to an important decrease in model size. Compared to algorithms where the unfolding operations are performed in 2-D, or where the spatial information is exploited by carrying out the analysis in the Fourier domain, this method results in models with, on average, five times fewer coefficients, while still ensuring the same visual quality. Despite being a suboptimal solution for the tensor decomposition, the HOSVD ensures close-to-optimal energy compaction and approximation error. The suboptimality derives from the fact that the HOSVD is computed directly from the SVD, without the expensive iterative algorithms required for the optimal solution. This is an advantage, since the analysis can be done faster and with less computational power. The small number of model parameters permits synthesis in real time. Moreover, the small memory occupancy favours the use of the HOSVD-based model on architectures with constraints on memory and computational power, such as PDAs or mobile phones.


APPENDIX
Source Code
clear all; clc;
[filename, pathname] = uigetfile('*.avi');      % select the video file
str2 = '.bmp';
file = aviinfo(filename);                       % get information about the video file
frm_cnt = file.NumFrames;                       % number of frames in the video file
for i = 1:frm_cnt
    frm(i) = aviread(filename, i);              % read frame i of the video
    frm_name = frame2im(frm(i));                % movie frame -> image
    filename1 = strcat(num2str(i), str2);
    imwrite(frm_name, filename1);               % write the image file
end
str3 = '.png';
for j = 1:frm_cnt
    filename_1 = strcat(num2str(j), str2);      % bug fix: index j, not i
    D = double(imread(filename_1));
    im = zeros(size(D));
    for c = 1:size(D, 3)                        % bug fix: the original applied svd to the
        [u1, s1, v1] = svd(D(:,:,c));           % filename string; here each color channel
        im(:,:,c) = u1 * s1 * transpose(v1);    % is decomposed and reconstructed instead
    end
    file_2 = strcat(num2str(j), str3);
    imwrite(uint8(im), file_2);                 % cast back to uint8 before writing
end
for k = 1:frm_cnt
    file_2 = strcat(num2str(k), '.bmp');
    v = imread(file_2);
    [Y, map] = rgb2ind(v, 255);                 % truecolor -> indexed (this yields the 3:1 size reduction)
    F(k) = im2frame(flipud(Y), map);            % bug fix: F(k), since im2frame returns a frame structure
    save F F                                    % save the frame array to F.mat
end

mov = aviread(filename);                        % reload the original video
[h, w, p] = size(mov(1).cdata);
hf = figure('Name', 'INPUT VIDEO');
set(hf, 'position', [150 150 w h]);
movie(gcf, mov);                                % play the input video
[h, w, p] = size(F(1).cdata);
hf = figure('Name', 'HOSVD COMPRESSED VIDEO');
set(hf, 'position', [150 150 w h]);
movie(gcf, F);                                  % play the compressed video
input_file_size = frm_cnt * size(frm(1).cdata, 1) * size(frm(1).cdata, 2) * size(frm(1).cdata, 3)
output_file_size = frm_cnt * size(F(1).cdata, 1) * size(F(1).cdata, 2) * size(F(1).cdata, 3)
compression = input_file_size / output_file_size
fprintf('output file size is %d times compression of input file size', compression);
compression_ratio = output_file_size / input_file_size
compression_ratio = compression_ratio * 100
err = double(mov(1).cdata(:,:,1)) - double(F(1).cdata);   % bug fix: missing minus sign restored; cast to double
mse = (sum(err) .* sum(err)) / input_file_size;
psnr = 20 * log10(255 / sqrt(max(mse)))

