
Extraction of Motion Vectors from an MPEG Stream

Technical report 1999

JOSEPH GILVARRY
School of Electronic Engineering
Dublin City University

Abstract
In 1997, a project was started to capture, compress, store, and index up to 24 hours of digital TV broadcasts. The work described in this report helps to implement it. The first chapter introduces the overall project and the motivation behind this particular focus of work. The second chapter deals with the theory behind digital video compression. The third chapter reports on how the program to extract the motion vectors from the MPEG stream was developed, and on its further development so that the motion from frame to frame can be calculated. Chapter four explains why knowledge of the motion vectors alone is not sufficient to calculate the motion from frame to frame; it identifies the extra information that is needed and describes how all the information is combined to calculate that motion.


Table Of Contents
Abstract
Table of Contents
Table of Figures
Chapter 1
1. Introduction
Chapter 2
2. Digital Video Compression
  2.1 The MPEG-1 bit stream
    2.1.1 Description of a frame
    2.1.2 Bit stream order and display order of frames
    2.1.3 Description of a macroblock
  2.2 Types of macroblock present in a frame
    2.2.1 Types of macroblock in an I frame
    2.2.2 Types of macroblock in a P frame
    2.2.3 Types of macroblock in a B frame
  2.3 Motion estimation and compensation
    2.3.1 Encoding the motion vectors
  Summary
Chapter 3
3. Extraction of the motion vectors
  3.1 Choosing a decoder
    3.1.1 The Berkeley Decoder
    3.1.2 The Java Decoder
  3.2 Description of the source code
  3.3 Storage of the Motion Vectors
    3.3.1 Reordering the bit stream order to the display order
    3.3.2 Storing the motion vectors
    3.3.3 Operation of program
    3.3.4 Alterations made to the decoder
  Summary
Chapter 4
4. Finding the motion from frame to frame
  4.1 Considerations that have to be taken into account - Frame level
  4.2 Considerations that have to be taken into account - Macroblock level
  Summary
Conclusion
References
Appendix A


Table of Figures
Figure 2.1 The layered structure of the MPEG bit stream
Figure 2.2 P frames use only forward prediction
Figure 2.3 B frames use both forward and backward prediction
Figure 2.4 A single frame divided up into slices
Figure 2.5 Only one set of chrominance components is needed for every four luminance components
Figure 2.6 Structure of a macroblock, and the blocks' numbering convention
Figure 2.7 A forward predicted motion vector
Figure 3.1 Converting from bit stream order to display order
Figure 3.2 Diagram of where the motion vectors for the different frames are stored
Figure 3.3 Flow chart of the operational program
Figure 4.1 Motion vectors associated with a moving picture
Figure 4.2 Realistic version of vectors associated with a moving picture


Chapter 1 1. Introduction
With the recent arrival of Digital TV in America and Great Britain, it is only a short time before its use becomes standard. Recent years have also brought huge advances in:
- Networking - high bandwidth networks not only in the workplace, but reaching many homes also;
- Data storage - today we talk only in Gigabytes;
- Video compression - modern techniques allow compression rates of up to one in fifty (this topic is discussed in detail in Chapter 2).
The combination of these developments will bring widespread usage of digital video over the next few years. Following the launch of this new technology will come the launch of many new services. We could see the introduction of the local video server instead of the local video store, where connected residents can select a video from a huge multimedia server. A recording of all television broadcasts for the past week may be stored, allowing subscribers to catch up on any missed viewing. Searching through such large archives will require a navigation tool. There is an ongoing project in DCU at the moment to develop such a tool, of which this project is only a part [1]. When complete, the tool will allow the user to pick a category to search through (sport, drama, action, soap). Clicking on a category will display a list of key frames, each frame representing a program. Clicking on one of these frames will display another list of key frames; using this hierarchical approach, the user can narrow the search down to a single shot of video. One of the challenges of the project is to choose a frame that best represents a clip of film. It has been found that the frame after a sequence of frames with a lot of action is sometimes a good representation for that shot. This is one area where the motion vectors may come in useful. To allow navigation, the material first has to be broken up into elements. For video these elements are shots and scenes. A shot is defined as the continuous recording of a single camera; a scene is made up of multiple shots, while a television broadcast consists of a collection of scenes. For studio broadcasts (take for example the news) it is fairly easy to break the program up, as the boundaries between shots are hard. However, most television programs and films use special techniques to soften the boundaries, which makes them less detectable. There are four different types of boundaries between shots:
- A cut. This is a hard boundary and occurs when there is a complete change of picture over two consecutive frames.
- A fade. There are two types of fade, a fade out and a fade in. A fade out occurs when the picture gradually fades to a dot or black screen, while a fade in occurs when the picture is gradually displayed from a black screen. Both these effects occur over a few frames.
- A dissolve. This is the simultaneous occurrence of a fade out and a fade in; the two pictures are superimposed on each other.
- A wipe. This effect is like a virtual line going across the screen, clearing one picture as it brings in another. Again this occurs over a few frames.

There are a lot of techniques (the pixel based difference method, the colour histogram method, detection of macroblocks and edge detection [5]) which can reliably detect a cut. However, only edge detection is in any way effective in detecting fades, dissolves and wipes. There is another ongoing project in DCU at the moment that uses edge detection to find shot boundaries. The program takes two consecutive frames, uses special techniques to leave just a black & white outline of any objects in the frames, and then compares the two outlines. If there are a lot of differences between them, it concludes a shot cut has occurred. One of the flaws of this method is that it only allows for relatively small movements of the objects from frame to frame. If something large suddenly moves across the screen, it may be interpreted as a cut. To illustrate where this may happen, take the example where a journalist is giving a TV report from outside some building, and suddenly a bus goes by in the background. The inclusion of the bus in the frame could confuse the program into thinking a cut has occurred. This is another case where motion vectors could come in useful, as a lot of movement in a frame is associated with a lot of motion vectors. These motion vectors can be used to compensate for the movement of the bus. Here is a history of the events that led up to the creation of this project:
- Develop a system to capture, compress, store and index up to 24 hours of TV broadcasts in digital format.
- Eight hours of television broadcasts were recorded in MPEG-1 format. These eight hours were broken into twenty minute segments for easier handling.
- A baseline was created by manually going through the entire recording and labelling where each cut, fade, dissolve and wipe occurred. A note of the frame number and the time the boundary occurred was taken. The results of any program written to find these boundaries can be easily compared to the baseline in order to determine its accuracy.
- A program was written using edge detection to find the shot boundaries, but it was found that a lot of motion in a frame caused the program to falsely detect a cut. The use of motion vectors to compensate for the motion should rectify this. It is hoped that the motion vectors can also be used to enhance the program's performance in detecting fades, dissolves and wipes. Another area where the motion vectors may be used is in the choice of key frame for a shot [choose a frame after a lot of action?].

Chapter 2 2. Digital Video Compression


In this chapter the techniques used to compress digital video are discussed, with special emphasis on the factors that need to be considered when finding the motion from one frame to another. Digital video has the advantages of high quality sound and pictures, but its disadvantage is that it cannot easily be transmitted or stored; it needs to be transmitted at a minimum of 100 Mbps, which is impractical for today's infrastructure. To combat this problem, a lot of work was put into video compression. In 1988 the International Standards Organisation (ISO) set up the Moving Picture Expert Group (MPEG) to standardise this compression. Its first standard, IS 11172 (known as MPEG-1), came in five parts:
1. System (11172-1). This was concerned with the multiplexing and synchronisation of the multiple audio and video streams.
2. Video (11172-2). This dealt with the encoding of the video stream.
3. Audio (11172-3). This part dealt with the encoding of the audio stream.
4. Compliance testing (11172-4).
5. Software for MPEG-1 coding (11172-5).
Parts 1, 2 and 3 were approved in November 1992, and parts 4 and 5 in November 1994. This project is only concerned with the second part, the video encoding. A summary of the standard is given in Table 2.1.

Table 2.1 Summary of the constrained parameters of MPEG-1 [2]
Horizontal picture size: less than or equal to 768 pels
Vertical picture size: less than or equal to 576 lines
Picture area: less than or equal to 396 macroblocks
Pel rate: less than or equal to 396x25 macroblocks per second
Picture rate: less than or equal to 30 Hz
Motion vector range: -64 to 63.5 pels (half pel precision)
Input buffer size: less than or equal to 327.68 Kb
Bitrate: less than or equal to 1.856 Mbps (constant bitrate)

The aim of MPEG-1 was to achieve coding of full motion video at a rate of around 1.5 Mbps. This rate was chosen as it would be suitable for transmission over any modern network, and it is also nearly the same rate as a CD (1.412 Mbps). To allow for greater flexibility and ingenuity in compression techniques, MPEG-1 does not specify a standard for the encoding of video. What it does specify is a standard for the decoding process and the video bit stream.

2.1 The MPEG-1 bit stream


The bit stream is in a layered format as shown in Figure 2.1; a brief description of the function of each layer is given in Table 2.2.

[Figure 2.1 The layered structure of the MPEG bit stream: the sequence layer contains groups of pictures (GOPs), each GOP contains frames, each frame contains slices, each slice contains macroblocks, and each macroblock contains the six blocks Y0, Y1, Y2, Y3, Cb and Cr.]

Table 2.2 Function of each layer of the bit stream [2]
Layer                   | Function
Sequence layer          | One or more groups of pictures
Group of pictures (GOP) | Random access into the sequence
Picture                 | Primary coding unit
Slice                   | Resynchronisation unit
Macroblock              | Motion compensation unit
Block                   | DCT unit

Firstly, each layer is briefly described; a more thorough description of the units in the layers is then given.
1. The sequence layer contains general information about the video: the vertical and horizontal size of the frames, the height/width ratio, the picture rate, the VBV buffer size, and the default intra and non-intra quantizer tables.
2. Group of pictures (GOP) layer: pictures are grouped together to support greater flexibility and efficiency in the encoder/decoder [2].
3. The frame layer (picture layer) is the primary coding unit. It contains information regarding the picture's position in the display order (pictures do not arrive in the same order as they are displayed), what type of picture it is (Intra, Predicted or Bi-directionally predicted), and the precision and range of any motion vectors present in the frame.
4. The slice layer is important in the handling of errors. If the decoder comes across a corrupted slice, it skips it and goes straight to the start of the next slice.
5. The macroblock layer is the basic coding unit. It is within this unit that the motion vectors are stored. Each macroblock may have one or more motion vectors associated with it.
6. The block layer is the smallest coding unit, and it contains information on the coefficients of the pixels.

2.1.1 Description of a frame


As mentioned above there are three types of picture/frame:
- Intra (I-type). These frames are encoded using only information from within the frame itself.
- Predicted (P-type). These frames are encoded using a past I or P frame as a reference, as illustrated in Figure 2.2. This is known as forward prediction.

[Figure 2.2 P frames use only forward prediction]

- Bi-directionally predicted (B-type). These frames are encoded using a past (forward predicted) and a future (backward predicted) I or P frame as a reference, as illustrated in Figure 2.3 (a B-type frame is never used as a reference).

[Figure 2.3 B frames use both forward and backward prediction]

Each frame is divided up into arbitrarily sized slices. A slice may contain just one macroblock or all the macroblocks in the frame. As shown in Figure 2.4, a slice is not confined to a single row.

[Figure 2.4 A single frame divided up into slices]

2.1.2 Bit stream order and display order of frames


A typical sequence of frames in display order is shown below.

I  B  B  B  P  B  B  B  B  P   I   B   B   B   I
1  2  3  4  5  6  7  8  9  10  11  12  13  14  15

However, this is not the order in which they are transmitted! The P frame numbered five is needed for the decoding of B frames two, three and four. Therefore frame five has to be decoded before two, three and four, and hence transmitted before them. Similarly, P10 is transmitted before B6, B7, B8 and B9, and I15 is transmitted before B12, B13 and B14. The bit stream order is shown below.

I  P  B  B  B  P   B  B  B  B  I   I   B   B   B
1  5  2  3  4  10  6  7  8  9  11  15  12  13  14

2.1.3 Description of a macroblock


The macroblock is the basic unit in the MPEG stream. It is an area of 16 pixels by 16 pixels, and it is at this stage that the first compression takes place. Each pixel has a luminance (Y) component and two chrominance (Cb and Cr) components associated with it. The human eye is much more sensitive to luminance than it is to chrominance. Therefore the luminance components must be encoded at full resolution, while the chrominance components can be encoded at quarter resolution without any noticeable loss. This already gives a compression of one in two. Figure 2.5 shows this compression.

[Figure 2.5 Only one set of chrominance components is needed for every four luminance components.]

A block is an 8 by 8 pixel area and is the smallest unit in the MPEG stream. It contains the Discrete Cosine Transform (DCT) coefficients of the luminance and chrominance components [3]. Six blocks are needed to make up a macroblock (16 pixels by 16 pixels): four for the luminance components, but only one for each of the two chrominance components due to their compression. Figure 2.6 shows the blocks of a macroblock and their numbering convention.

[Figure 2.6 Structure of a macroblock, and the blocks' numbering convention: the four luminance blocks are numbered 0 (top left), 1 (top right), 2 (bottom left) and 3 (bottom right), followed by the Cb block and the Cr block.]
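To put a number on this saving: a 16 by 16 macroblock holds 256 luminance samples plus one 8 by 8 block of each chrominance component (2 x 64 = 128 samples), giving 384 samples in total. Encoding the chrominance at full resolution would require 256 x 3 = 768 samples, so the subsampled layout alone halves the data before any transform coding takes place.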

2.2 Types of macroblock present in a frame


In a single frame there may be many different types of macroblock (MB). Tables 2.3, 2.4 and 2.5 show the different types of macroblock that can be present in I, P and B frames respectively.

2.2.1 Types of macroblock in an I frame


In an I frame there are only two types of macroblock: Intra-d uses the default quantizer scale, while Intra-q uses a scale defined by the buffer status [2].

2.2.2 Types of macroblock in a P frame


A P frame uses motion estimation and compensation to reduce the amount of information needed to play the video; this process is described later in the chapter. There are eight different types of macroblock in a P frame, but for the purpose of this project they can be divided into three categories:
1. Intra. There are no motion vectors present. These macroblocks don't use any reference frame, and are encoded using only information from the macroblock itself.
2. Predicted. These macroblocks have motion vectors present.
3. Skipped. These macroblocks are exactly the same as the corresponding macroblock in the previous frame.

Table 2.3 Macroblock types in an I frame[2]


Type    | VLC code | MB quant
Intra-d | 1        | 0
Intra-q | 01       | 1

Table 2.4 Macroblock types in a P frame[2]


Type     | VLC     | Intra | MF | Coded pattern | Quant
pred-mc  | 1       |       | 1  | 1             |
pred-c   | 01      |       |    | 1             |
pred-m   | 001     |       | 1  |               |
intra-d  | 0001 1  | 1     |    |               |
pred-mcq | 0001 0  |       | 1  | 1             | 1
pred-cq  | 0000 1  |       |    | 1             | 1
intra-q  | 0000 01 | 1     |    |               | 1
skipped  |         |       |    |               |

Table 2.5 Macroblock types in a B frame[2]


Type     | VLC     | Intra | MF | MB | Coded pattern | Quant
pred-i   | 10      |       | 1  | 1  |               |
pred-ic  | 11      |       | 1  | 1  | 1             |
pred-b   | 010     |       |    | 1  |               |
pred-bc  | 011     |       |    | 1  | 1             |
pred-f   | 0010    |       | 1  |    |               |
pred-fc  | 0011    |       | 1  |    | 1             |
intra-d  | 0001 1  | 1     |    |    |               |
pred-icq | 0001 0  |       | 1  | 1  | 1             | 1
pred-fcq | 0000 11 |       | 1  |    | 1             | 1
pred-bcq | 0000 10 |       |    | 1  | 1             | 1
intra-q  | 0000 01 | 1     |    |    |               | 1
skipped  |         |       |    |    |               |

The meanings of the abbreviations used in the tables above are:
VLC - variable length code
MF - motion forward
MB - motion backward
pred - predictive
m - motion compensated
c - at least one block in the macroblock is coded and transmitted
d - default quantizer is used
q - quantizer scale is changed
i - interpolated (a combination of forward prediction and backward prediction)
b - backward prediction
f - forward prediction

2.2.3 Types of macroblock in a B frame


A B frame uses two reference frames for prediction and can have twelve different types of macroblock. This makes it the most complex frame type, but it gives the highest compression rate. For the purpose of this project, the macroblocks can be categorised in five groups:
1. Forward predicted. The macroblock is encoded using only a past I or P frame.
2. Backward predicted. The macroblock is encoded using only a future I or P frame.
3. Forward and backward predicted (interpolated). The macroblock is encoded using both a past and a future frame as a reference. The two reference macroblocks are interpolated to form the predicted macroblock.
4. Intra. No reference frame is used. The macroblock is encoded using information from itself.
5. Skipped. The macroblock is the same as the one in the previous frame.
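For an interpolated macroblock, the prediction is formed by averaging the forward and backward predictions pixel by pixel. The sketch below is an illustration only; the exact rounding rule is specified in the standard [2], and simple rounding is assumed here.

// Interpolate two 16x16 predictions pixel by pixel (sketch; the exact
// rounding rule is defined in ISO 11172-2).
static int[][] interpolate(int[][] fwd, int[][] bwd) {
    int[][] out = new int[16][16];
    for (int r = 0; r < 16; r++)
        for (int c = 0; c < 16; c++)
            out[r][c] = (fwd[r][c] + bwd[r][c] + 1) / 2; // rounded average
    return out;
}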

2.3 Motion estimation and compensation


MPEG achieves its high compression rate by the use of motion estimation and compensation. MPEG takes advantage of the fact that from frame to frame there is very little change in the picture (usually only small movements). For this reason, macroblock sized areas can be compared between frames, and instead of encoding the whole macroblock again, the difference between the two macroblocks is encoded and transmitted. Figure 2.7 demonstrates how forward motion compensation is achieved (backward compensation is done in the same way, except a future frame in the display order is used as the reference frame).

[Figure 2.7 A forward predicted motion vector: macroblock x in the P or B frame, its counterpart y in the I or P reference frame, and the search area around y.]

Macroblock x is the macroblock we wish to encode; macroblock y is its counterpart in the reference frame. A search is done around y to find the best match for x. This search is limited to a finite area, and even if there is a perfectly matching macroblock outside the search area, it will not be used. The displacement between the two macroblocks gives the motion vector associated with x. There are many search algorithms to find the best matching macroblock. A full search gives the best match but is computationally expensive. Alternatives to this are the logarithmic search, one-at-a-time search, three-step search and the hierarchical search [3]. The choice of search is decided by the encoder, with the usual trade-off between time and accuracy.
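To make the full search concrete, the sketch below scores every candidate displacement in a window of plus or minus range pixels with the sum of absolute differences (SAD), a common matching criterion. It is only an illustration under assumed conditions (a flat, row-major luminance array; all names are invented here), not the algorithm of any particular encoder.

// Minimal full-search block matching sketch. Assumed frame layout:
// one luminance byte per pixel, row-major, width * height.
public class FullSearch {
    static final int MB = 16; // a macroblock is 16x16 pixels

    // Sum of absolute differences between the macroblock at (x, y) in cur
    // and the candidate at (x+dx, y+dy) in ref.
    static int sad(byte[] cur, byte[] ref, int width, int x, int y, int dx, int dy) {
        int total = 0;
        for (int row = 0; row < MB; row++)
            for (int col = 0; col < MB; col++) {
                int a = cur[(y + row) * width + (x + col)] & 0xFF;
                int b = ref[(y + dy + row) * width + (x + dx + col)] & 0xFF;
                total += Math.abs(a - b);
            }
        return total;
    }

    // Returns {dx, dy} of the best match within +/-range (full pel).
    static int[] search(byte[] cur, byte[] ref, int width, int height,
                        int x, int y, int range) {
        int bestSad = Integer.MAX_VALUE;
        int[] best = {0, 0};
        for (int dy = -range; dy <= range; dy++)
            for (int dx = -range; dx <= range; dx++) {
                // stay inside the reference frame
                if (x + dx < 0 || y + dy < 0 ||
                    x + dx + MB > width || y + dy + MB > height) continue;
                int s = sad(cur, ref, width, x, y, dx, dy);
                if (s < bestSad) { bestSad = s; best[0] = dx; best[1] = dy; }
            }
        return best;
    }
}

A three-step or logarithmic search evaluates only a small subset of these candidates, which is why those alternatives are so much cheaper.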


2.3.1 Encoding the motion vectors


Once the motion vector is found it has to be encoded for transmission. The first step in the encoding process is to find the differential motion vectors (DMV). In a lot of situations (e.g. a pan), all motion vectors will be nearly the same. Therefore subtracting the motion vector for a macroblock from the previous motion vector in the slice will reduce a lot of the vectors to zero. Note this differential vector is reset to zero if an I-type macroblock is encountered, and also at the end of a slice. The second step is to make sure all differential vectors are within a permitted range. This range is defined by forward_f_code/backward_f_code and is given in Table 2.6. If a vector is outside this range, a modulus is added/subtracted. Finally, the differential vectors are variable length coded and transmitted. The variable length codes are given in Table 2.7. To illustrate with an example, suppose the vectors (full pel precision) in a slice are:

3  10  30  30  -14  -16  27  24

All vectors lie in the range -32 to 31, therefore a forward_f_code of 2 is used. The differential vectors are:

3  7  20  0  -44  -2  43  -3

Adding/subtracting the modulus (64 in this case) to any values outside the range gives:

3  7  20  0  20  -2  -21  -3

The variable length codes for these values are [2]:

Value | VLC           | Value | VLC
3     | 0010 0        | 20    | 0000 0100 101
7     | 0000 1100     | -2    | 0111
20    | 0000 0100 101 | -21   | 0000 0100 0110
0     | 1             | -3    | 0011 0

The code needed to decode these VLC values is given in the MPEG standard [2].

Table 2.6 Range of motion vectors and their modulus [2]
Forward_f_code or backward_f_code | Half pel precision | Full pel precision | Modulus
1 | -8 to 7.5     | -16 to 15     | 32
2 | -16 to 15.5   | -32 to 31     | 64
3 | -32 to 31.5   | -64 to 63     | 128
4 | -64 to 63.5   | -128 to 127   | 256
5 | -128 to 127.5 | -256 to 255   | 512
6 | -256 to 255.5 | -512 to 511   | 1024
7 | -512 to 511.5 | -1024 to 1023 | 2048


Table 2.7 VLC for the differential motion vectors (DMV) [2]
DMV | VLC code      | DMV | VLC code
-16 | 0000 0011 001 | 1   | 010
-15 | 0000 0011 011 | 2   | 0010
-14 | 0000 0011 101 | 3   | 0001 0
-13 | 0000 0011 111 | 4   | 0000 110
-12 | 0000 0100 001 | 5   | 0000 1010
-11 | 0000 0100 011 | 6   | 0000 1000
-10 | 0000 0100 11  | 7   | 0000 0110
-9  | 0000 0101 01  | 8   | 0000 0101 10
-8  | 0000 0101 11  | 9   | 0000 0101 00
-7  | 0000 0111     | 10  | 0000 0100 10
-6  | 0000 1001     | 11  | 0000 0100 010
-5  | 0000 1011     | 12  | 0000 0100 000
-4  | 0000 111      | 13  | 0000 0011 110
-3  | 0001 1        | 14  | 0000 0011 100
-2  | 0011          | 15  | 0000 0011 010
-1  | 011           | 16  | 0000 0011 000
0   | 1             |     |
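The differencing and wrapping steps of the worked example can be reproduced in a few lines. The following sketch is illustrative only (the method and variable names are not from the standard or the decoder); running it prints the wrapped differential vectors 3 7 20 0 20 -2 -21 -3 computed above.

// Sketch of the encoder-side differential step for full-pel motion vectors.
public class DmvExample {
    // Wrap a differential into the permitted range by adding/subtracting
    // the modulus from Table 2.6.
    static int wrap(int dmv, int fCode) {
        int range = 32 << (fCode - 1); // modulus: 64 for fCode = 2
        int max = (range / 2) - 1;     // 31 for fCode = 2
        int min = -range / 2;          // -32 for fCode = 2
        if (dmv > max) dmv -= range;
        else if (dmv < min) dmv += range;
        return dmv;
    }

    public static void main(String[] args) {
        int[] mv = {3, 10, 30, 30, -14, -16, 27, 24}; // vectors in one slice
        int prev = 0; // predictor starts at zero for the slice
        for (int v : mv) {
            System.out.print(wrap(v - prev, 2) + " "); // 3 7 20 0 20 -2 -21 -3
            prev = v;
        }
    }
}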

Summary
In this chapter the MPEG standard was introduced and described, the layered structure of the bit stream was explained, and the concept of a motion vector was illustrated. The difference between the bit stream order and the display order of the frames was explained and illustrated, and the different types of macroblock present in I, P and B frames were given. To find the motion from frame to frame, all of these factors have to be considered.


Chapter 3 3 Extraction of the motion vectors


This chapter discusses the steps taken to extract the motion vectors from the MPEG stream. It also describes the alterations made to the source code to allow the calculation of the motion from frame to frame.

3.1 Choosing a decoder


The first step was to choose an MPEG-1 decoder. The decoder is used to extract and decode the motion vectors. A search for suitable decoders was undertaken, and this resulted in two candidates: the Berkeley decoder and a Java decoder.

3.1.1 The Berkeley Decoder


The Berkeley decoder can be found at: http://www.bmrc.berkeley.edu:80/frame/research/mpeg/mpeg_play.html Initially it was thought that this would be the best decoder to use, as it was written in C. Speed is an important factor in this project due to the size of the files that have to be processed, and C has a superior processing time to Java. However, the source code proved impossible to read: it was not commented, and there are pointers pointing to pointers pointing to ???

3.1.2 The Java Decoder


The Java decoder can be found at: http://rnvs.informatik.tu-chemnitz.de/~ja/MPEG/MPEG_Play.html The Java program's speed disadvantage compared with C was compensated for by its well structured and documented style. There are two versions of the decoder available. The default version stores all the frames as it decodes them. This version is impractical to use, as all the memory is used up after only a few frames are decoded; it has to be able to decode thirty thousand frames! By making a small alteration to the source code, we get the just-in-time version. This version only stores seven or eight frames at a time, which makes it suitable for our purpose.

3.2 Description of the source code


The motion vectors are decoded by the two classes MPEG_video and motion_data. MPEG_video is the main class in the program. It takes in the bit stream and decodes it. A skeleton of the program is given below.


public class MPEG_video implements Runnable {

    MPEG_video() { }

    public void run() {
        mpeg_stream.next_start_code();
        do {
            Parse_sequence_header();
            do {
                Parse_group_of_pictures();
            } while (/* more groups of pictures */);
        } while (/* more sequence headers */);
    }

    private void Parse_sequence_header() { }

    private void Parse_group_of_pictures() {
        do {
            Parse_picture();
        } while (/* more pictures in the group */);
    }

    private void Parse_picture() {
        do {
            Parse_slice();
        } while (/* more slices in the picture */);
    }

    private void Parse_slice() {
        do {
            Parse_macroblock();
        } while (/* more macroblocks in the slice */);
    }

    private void Parse_macroblock() {
        // parses the macroblock and calls Parse_Block() for each block
    }

    private void Parse_Block() { }
}

It is clear how the program first takes in the highest level layer, and parses it. The program then extracts the information in a section of that layer, and moves down to the next level. This process is repeated for all the layers. The motion vector information is contained in the macroblock layer. Once this information is known, it is passed to a method in motion_data called compute_motion_vector. To decode the motion vectors, compute_motion_vector uses another method in motion_data called motion_displacement. The code in these methods is given in Appendix A.


The two components of the vector are right_x and down_x. The conventional directions used for the components are right and down; negative components represent left and up respectively. For this project it was decided to use half pixel precision for the vectors (recon_right_x and recon_down_x). The vector may not be pointing at a particular pixel, but it is the true vector for that macroblock. The fact that the vector is not pointing at a pixel should not be an issue. If the motion vectors are used for the selection of the key frame in a shot, there is no need for the vector to be pointing at a pixel. If the vectors are used to compensate for movement in a frame, edge detection (the process that will be using the vectors) blows up an area around each pixel when comparing the two frames [5]. By simply halving the extracted (half pixel precision) vector, and using it for any motion compensation, the need for the extra calculations to get the vector pointing at a pixel is eliminated. This will enhance the speed of the program. Any inaccuracies in the motion vector will be compensated for by the edge detection's explosion. Besides, edge detection is not an exact science.
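The halving itself is a single arithmetic shift, mirroring the convention the decoder already uses in Appendix A (for negative values the shift rounds towards negative infinity, which is acceptable given the tolerance just described):

// recon_right_x and recon_down_x are the half pel components (Appendix A)
int right_pels = recon_right_x >> 1; // whole pel part; the half pel bit is dropped
int down_pels = recon_down_x >> 1;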

3.3 Storage of the Motion Vectors


The motion vectors have to be stored in an order that will allow the motion from frame to frame to be calculated. First, the process of reordering the bit stream order to the display order is discussed. This is followed by a description of how selective vector storage allows this reordering.

3.3.1 Reordering the bit stream order to the display order


As described in Chapter 2, the frames do not come into the decoder in the same order as they are displayed. To reorder the frames into the display order, the following procedure is used (see Figure 3.1):
- If an I or P frame (let's call it 1) comes in, it is put in a temporary storage called future. I and P frames always come into the decoder before the B frames that reference them.
- Frame 1 is left in future until another I or P frame (5) comes in. The arrival of 5 indicates it is 1's turn in the display order. Frame 1 is taken out of future and put in the display order; 5 is put in future until another I or P frame arrives.
- All B frames are immediately put in the display order.
- At the end, whatever frame is left in future is taken out and put in the display order.
A typical bit stream is shown in Figure 3.1; the display order number of each frame is also given. Note this process doesn't use the display order number. It is given only to clarify what is happening.


Bit stream order | Display order | future
1I    |      | 1I
5P    | 1I   | 5P
2B    | 2B   | 5P
3B    | 3B   | 5P
4B    | 4B   | 5P
10P   | 5P   | 10P
6B    | 6B   | 10P
7B    | 7B   | 10P
8B    | 8B   | 10P
9B    | 9B   | 10P
11I   | 10P  | 11I
15P   | 11I  | 15P
12B   | 12B  | 15P
13B   | 13B  | 15P
14B   | 14B  | 15P
(end) | 15P  |

Figure 3.1 Converting from bit stream order to display order
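A minimal sketch of this reordering procedure is given below. Frames are represented by their labels from Figure 3.1 purely for illustration; the decoder itself works on decoded frame data, and all names here are assumptions.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Reorders frames from bit stream order to display order using a
// one-slot "future" buffer, as described above.
public class Reorder {
    public static List<String> toDisplayOrder(List<String> bitStream) {
        List<String> display = new ArrayList<>();
        String future = null;
        for (String frame : bitStream) {
            char type = frame.charAt(frame.length() - 1);
            if (type == 'I' || type == 'P') {
                if (future != null) display.add(future); // its turn has come
                future = frame;                          // hold until the next I or P
            } else {
                display.add(frame); // B frames go straight into the display order
            }
        }
        if (future != null) display.add(future); // flush the last held frame
        return display;
    }

    public static void main(String[] args) {
        List<String> bs = Arrays.asList("1I","5P","2B","3B","4B","10P","6B","7B",
                                        "8B","9B","11I","15P","12B","13B","14B");
        System.out.println(toDisplayOrder(bs));
        // [1I, 2B, 3B, 4B, 5P, 6B, 7B, 8B, 9B, 10P, 11I, 12B, 13B, 14B, 15P]
    }
}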

3.3.2 Storing the motion vectors


For ease of handling, it was decided that the motion vectors should be stored in two-dimensional arrays. The size of the array corresponds to the frame size (in macroblocks). The position of an entry in the array corresponds to the macroblock's position in the frame. There is a separate array for each of the two components of the vector: one for the right component and one for the down component. To allow the storage of all the vectors that may be present in a frame, four arrays have to be created: two for the storage of the forward predicted vectors, and two for the backward predicted vectors. To find the motion from one frame to another, a record of the motion vectors in the previous frame has to be kept. This means four more arrays have to be created. Finally, the motion vectors in a P frame have to be stored until it is the P frame's turn in the display order. As a P frame can only have forward predicted vectors, only two more arrays need to be created. The names of all the arrays used in this project are given below:


Array name | Function of array
futureRight, futureDown | Store the motion vectors of a P frame until it is the P frame's turn in the display order
presentForwardRight, presentForwardDown, presentBackwardRight, presentBackwardDown | Store the motion vectors of the present frame in the display order
pastForwardRight, pastForwardDown, pastBackwardRight, pastBackwardDown | Store the motion vectors of the previous frame in the display order

3.3.3 Operation of program


If an I frame comes into the decoder, all the vectors in future are reset to zero (after the values that were in it are taken out and put in present), as an I frame has no motion vectors. If a P frame comes in, all its vectors have to be stored in future (again after the values that were in it are taken out and put in present). The problem is that compute_motion_vector (the method that decodes the motion vector) doesn't know what type of frame is in the decoder, or even what type of predicted vector it has to decode. It could be a forward predicted vector in a P or B frame, or a backward predicted vector in a B frame. To overcome this problem, an extra variable, Pic_Type, is also passed. Pic_Type determines what type of frame is present in the decoder; Pic_Type = 2 means it is a P frame, and the vectors are put in future. If a B frame comes in, all its vectors have to be stored in present. However, present holds two types of vector: presentForward and presentBackward. If forward predicted vectors are to be calculated, compute_motion_vector is called from the same place as it was for the P frame. This time Pic_Type = 3 (for a B frame), and the vectors are stored in presentForward. If backward predicted vectors are to be calculated, compute_motion_vector is called from a different place. The arbitrary value four is passed, to indicate the vectors are to be put in presentBackward. A diagram of where the vectors are stored is given in Figure 3.2.


[Figure 3.2 Diagram of where the motion vectors for the different frames are stored: when a frame comes in, an I frame causes future to be reset; a P frame's vectors are all put in future; a B frame's forward predicted vectors are put in presentForward and its backward predicted vectors in presentBackward.]

3.3.4 Alterations made to the decoder


The processes of putting the motion vectors into the correct arrays and of reordering the frames into the display order were incorporated into the decoder. The end result is that the motion vectors for the present frame in display order are in presentForward and presentBackward, while the motion vectors for the previous frame in the display order are in pastForward and pastBackward. A flow chart of the program is given in Figure 3.3. A skeleton of the two files MPEG_video and motion_data (after the changes were made to them) is given in Appendix A. Also in Appendix A is the new class, Array, that had to be created.


[Figure 3.3 Flow chart of the operational program: when a frame comes in, all the vectors are moved from present into past and present is reset. If the frame is I or P type, the vectors in future are moved into present and future is reset; then an I frame leaves future at zero (it has no vectors), a P frame's vectors are all put in future, and a B frame's vectors are all put in present.]


To bring in these changes it was decided it would be best to create a new class. This was done for a few reasons:
1. MPEG_video.java is a large file. It seemed unsuitable to make it any bigger.
2. Even though MPEG_video is very large, there is a logical flow to it. The bit stream is decoded from top to bottom; introducing new code would only disturb this natural flow and leave the program difficult to read.
3. At some time in the future the MPEG-2 standard may be used instead of the MPEG-1 standard that is being used at the moment. However, most of the code developed for this project may still be relevant. Having the code developed in a single class will make it easier to make the transition from MPEG-1 to MPEG-2.

Summary
A program was developed which extracts the motion vectors from the bit stream. These vectors are stored in a fashion that allows the motion from frame to frame to be easily calculated. However, additional information is needed to calculate this motion. The reasons we need this additional information are explained in the next chapter. Note the source code for the decoder has not been minimised: the code used to calculate the Inverse Discrete Cosine Transform, and also the code used to display the picture, can be deleted.


Chapter 4 4 Finding the motion from frame to frame


To find the motion from frame to frame, the motion vectors of the previous frame are subtracted from the vectors of the present frame. However, depending on what type of frame (I, P or B) is in present and past, not all of the arrays can be used. An explanation of this is given below. A vector defines a distance and a direction; it does not define a position. We have to know the vector's initial position (reference point) to find all the motion from frame to frame. Only vectors with the same reference point can be subtracted from each other. To illustrate, let's take the simple example of an object x moving across a portion of the screen, as shown in Figure 4.1.

[Figure 4.1 Motion vectors associated with a moving picture: x moves across five frames (1 I, 2 B, 3 B, 4 B, 5 P), each predicted frame carrying forward and/or backward vectors.]

In Figure 4.1, one style of arrow represents a forward vector and the other a backward vector. [Note a forward vector doesn't have to be pointing forward, nor a backward vector backward. It is just the naming convention for whether the reference frame is in the past (forward) or future (backward).] The values for the vectors are given below:
In the first frame there are no motion vectors.
Frame 2: forwardRight = 2; forwardDown = -3; (2, -3). backwardRight = -7; backwardDown = 4; (-7, 4)
Frame 3: forward = (4, -6); backward = (-5, 1)
Frame 4: forward = (7, -7); backward = (-2, 0)
Frame 5: forward = (9, -7)

Transition 1
To find the motion in the transition from frame one to frame two, we can only use the forward vector. The backward vector has no reference in the I frame. The motion is just (2, -3).

Transition 2
Here, both the forward and backward vectors can be used, as both forward vectors have the same reference point and both backward vectors have the same reference point.
presentForward - pastForward = forward motion: (4, -6) - (2, -3) = (2, -3)
presentBackward - pastBackward = backward motion: (-5, 1) - (-7, 4) = (2, -3)
To find the total motion, average the two results:
motionRight = (2 + 2)/2 = 2
motionDown = (-3 + -3)/2 = -3
Total motion = (2, -3)
Note in this example the forward motion will always equal the backward motion, but this is not usually the case in video.

Transition 3
forward: (7, -7) - (4, -6) = (3, -1)
backward: (-2, 0) - (-5, 1) = (3, -1)
Total motion = (3, -1)

Transition 4
Both the forward and backward vectors can be used here. Both forward vectors are referenced to the same point, and the B frame's backward vector is referenced to the P frame. The P frame is said to have a zero backward vector.
forward: (9, -7) - (7, -7) = (2, 0)
backward: (0, 0) - (-2, 0) = (2, 0)
Total motion = (2, 0)

The motion for the sequence is: (2, -3), (2, -3), (3, -1), (2, 0)

4.1 Considerations that have to be taken into account - Frame level


Table 4.1 shows which types of vector can be subtracted, depending on what type of frame is in past and present.

Table 4.1 Vector types that can be used in the transition from frame to frame
past | present | Vector types that can be subtracted
I    | B or P  | forward only
I    | I       | none
P    | B or P  | forward only
P    | I       | none
B    | B or P  | forward and backward
B    | I       | backward only

I frame to B or P frame: when going from an I frame to a B or P frame, only the forward motion vectors can be used. The P frame will only have forward vectors; the B frame's backward vectors can't be used, as they have no reference in the I frame.

I frame to I frame: there are no vectors present in either frame.
P frame to P or B frame: none of the backward vectors in the B frame have a reference in the P frame. Therefore only forward vectors can be used.
P frame to I frame: the forward vectors in the P frame do not have a reference in the I frame. No motion can be found.
B frame to B or P frame: both forward and backward vectors can be used, as both have the same reference point from frame to frame.
B frame to I frame: only the backward vectors are referenced in the I frame.
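These frame-level rules reduce to a small lookup. The sketch below is an illustration of Table 4.1 only; the method name and the encoding of frame types as characters are assumptions, not part of the decoder.

// Frame types are 'I', 'P' or 'B'. Returns which vector types can be
// subtracted for the transition from past to present (Table 4.1).
static String usableVectors(char past, char present) {
    if (present == 'I') {
        // only a B frame's backward vectors have a reference in a following I frame
        return (past == 'B') ? "backward only" : "none";
    }
    // present is B or P
    if (past == 'B') return "forward and backward";
    return "forward only"; // past is I or P
}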

4.2 Considerations that have to be taken into account - macroblock level


In Chapter 2, all the different types of macroblock that can be present in a frame were described. Each macroblock in a B frame does not have both forward and backward vectors. Some macroblocks will only have either a forward or a backward vector. Other macroblocks will have no vector at all, either because it is an Intra macroblock, or because it is a skipped macroblock. This complicates the process of finding the motion from frame to frame even further. It is not a simple matter of subtracting all the values in one array from all the values in its corresponding past array. A more accurate representation of x moving across a portion of the screen may be as shown in Figure 4.2.

[Figure 4.2 Realistic version of vectors associated with a moving picture: the same five frames (1 I, 2 B, 3 B, 4 B, 5 P), but now some macroblocks carry only a forward vector or only a backward vector.]

In this example the transition from frame 1 to frame 2 can be calculated as before. If the second transition is calculated as before, we get:
forward motion: (4, -6) - (2, -3) = (2, -3)
backward motion: (-5, 1) - (0, 0) = (-5, 1)
Total motion = (-1.5, -1)
This result is incorrect. To get the correct result, only the forward motion can be used. Similarly, only the backward motion is used for the third transition. The motion for the final transition cannot be found, because there is only a backward vector in frame 4 and only a forward vector in frame 5. Only similar types of vector can be subtracted from each other. Below are further rules to complement the rules that were established in Table 4.1:
- Only if a similar type of vector (forward, backward or both) is present in both frames can the motion be found.
- A reference frame is said to have all vectors equal to (0, 0).
- If there is a skipped macroblock in the present frame, there is zero motion for that transition.
- If there is a skipped macroblock in the previous frame, the motion for that transition can't be calculated. An exception to this is if there is also a skipped macroblock in the present frame, in which case the motion will be zero.
- If there is an Intra macroblock in either the present or previous frame, the motion for that transition can't be calculated.
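Putting the frame-level and macroblock-level rules together, the per-macroblock calculation might look like the sketch below. The boolean flags describing each macroblock (skipped, intra, which vector types are usable) are assumptions about bookkeeping the program would have to add; the arrays of Section 3.3 store only the vector components themselves.

// Motion for one macroblock transition, in half pels. fwdUsable/bwdUsable
// combine the Table 4.1 frame rule with whether a forward/backward vector
// is actually present in both macroblocks. Returns null if the motion
// cannot be found.
static int[] macroblockMotion(int r, int c,
        boolean fwdUsable, boolean bwdUsable,
        boolean presentSkipped, boolean pastSkipped,
        boolean presentIntra, boolean pastIntra,
        Array a) {
    if (presentSkipped) return new int[] {0, 0}; // zero motion, even if past is skipped too
    if (pastSkipped || presentIntra || pastIntra) return null; // can't be calculated
    if (fwdUsable && bwdUsable) {
        // average the forward and backward estimates, as in Transition 2
        // (integer division is a simplification of the averaging above)
        int right = ((a.presentForwardRight[r][c] - a.pastForwardRight[r][c])
                   + (a.presentBackwardRight[r][c] - a.pastBackwardRight[r][c])) / 2;
        int down  = ((a.presentForwardDown[r][c] - a.pastForwardDown[r][c])
                   + (a.presentBackwardDown[r][c] - a.pastBackwardDown[r][c])) / 2;
        return new int[] {right, down};
    }
    if (fwdUsable) return new int[] {a.presentForwardRight[r][c] - a.pastForwardRight[r][c],
                                     a.presentForwardDown[r][c] - a.pastForwardDown[r][c]};
    if (bwdUsable) return new int[] {a.presentBackwardRight[r][c] - a.pastBackwardRight[r][c],
                                     a.presentBackwardDown[r][c] - a.pastBackwardDown[r][c]};
    return null; // no common vector type in both frames
}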

Summary
In this chapter the extra information needed to find the motion from frame to frame was described, and a set of rules was established on how to find the motion. Note this set of rules is not rigid. By keeping track of other information, more vectors can be found. For example, if a record is kept of the vector for a macroblock before a skipped macroblock, the motion in the transition between that skipped macroblock (or the final skipped macroblock in a series of skipped macroblocks) and a non-intra macroblock can also be calculated. However, this would only further complicate the program. As a starting point, the rules created in this chapter should be sufficient. If the program does not perform satisfactorily, this extra motion can be calculated.


Conclusion
This project set out to extract the motion vectors from an MPEG stream. This information was to be used to calculate the motion of all objects from one frame to another. The first step of the project was to choose an MPEG-1 decoder to extract and decode the motion vectors. The choice came down to a Java decoder and a C decoder. Two issues had to be taken into account when choosing the decoder: how fast the decoder could run, and how easily it could be modified. The Java decoder was chosen because, although the MPEG bit stream is quite complicated, it is very well structured. Java's superior ability to deal with the complexity of the bit stream in an easy to follow manner outweighed the C decoder's superior processing time. Using the decoder, the motion vectors were extracted and decoded. The decoder was modified to allow the subtraction of the motion vectors of the previous frame (display order) from those of the present frame (display order). All the modifications were put in a separate class. This means minimal alterations to the decoder's well structured code. The creation of a separate class with all the new code is important because, at some time in the future, the MPEG-2 standard may be used instead of the MPEG-1 standard (the standard we are using at the moment). All the relevant code developed for the MPEG-1 standard can be easily taken and used for the MPEG-2 standard. On completion of the program, it was realised that in order to find the motion from one frame to another, it is not a simple matter of subtracting all the vectors in one frame from all the vectors in the other. A set of rules has to be followed. The rules were developed in two stages. First, a general set of rules was written that only takes into account what type of frame (I, P or B) the vectors are in. Then, at a lower level, the macroblock types present in the frames were taken into consideration and a comprehensive set of rules was written. These rules give the true motion from frame to frame. The next step in this project is to incorporate the rules into the program. Finally, to enhance the program's performance, some of the decoder's source code can be deleted. The code which deals with decoding the pixel coefficients is irrelevant. Also, the code used to display the video can be omitted. To conclude, on accomplishing the task presented in this project (to extract the motion vectors from the MPEG stream), it was discovered that more information is needed in order to achieve the ultimate goal of finding the motion of objects from one frame to another. This extra information has been identified, and a description is given of how to use this information to find the motion from frame to frame.

References
[1] http://www.compapp.dcu.ie/~asmeaton/Video-Proj-summary.html
[2] ISO/IEC 11172-2, Geneva, 1993.
[3] K.R. Rao and J.J. Hwang, Techniques & Standards for Image, Video & Audio Coding, Prentice Hall PTR, New Jersey, 1996.
[4] http://rnvs.informatik.tu-chemnitz.de/~ja/MPEG/MPEG_Play.html
[5] Aidan Totterdell, An Algorithm for Detecting and Classifying Scene Breaks in an MPEG-1 Video Bit Stream, Dublin City University, 1998.

Appendix A
Code for the two methods, compute_motion_vector and motion_displacement [4]

private int motion_displacement(int motion_code, int PMD, int motion_r) {
    int dMD, MD;
    if (x_ward_f == 1 || motion_code == 0) {
        dMD = motion_code;
    } else {
        dMD = 1 + x_ward_f * (Math.abs(motion_code) - 1);
        dMD += motion_r;
        if (motion_code < 0)
            dMD = -dMD;
    }
    MD = PMD + dMD;
    if (MD > max)
        MD -= range;
    else if (MD < min)
        MD += range;
    return MD;
}

public void compute_motion_vector(int motion_horiz_x_code, int motion_verti_x_code,
                                  int motion_horiz_x_r, int motion_verti_x_r) {
    recon_right_x_prev = recon_right_x =
        motion_displacement(motion_horiz_x_code, recon_right_x_prev, motion_horiz_x_r);
    if (Full_pel_x_vector)
        recon_right_x <<= 1;
    recon_down_x_prev = recon_down_x =
        motion_displacement(motion_verti_x_code, recon_down_x_prev, motion_verti_x_r);
    if (Full_pel_x_vector)
        recon_down_x <<= 1;
    right_x = recon_right_x >> 1;
    down_x = recon_down_x >> 1;
    right_half_x = (recon_right_x & 0x1) != 0;
    down_half_x = (recon_down_x & 0x1) != 0;
    right_x_col = recon_right_x >> 2;
    down_x_col = recon_down_x >> 2;
    right_half_x_col = (recon_right_x & 0x2) != 0;
    down_half_x_col = (recon_down_x & 0x2) != 0;
}


MPEG_video
/* This is a skeleton structure of MPEG_video, just to document some of   */
/* the things that have been added in. Once the resolution of the video   */
/* is known, Array is called and the size of all the arrays can be set.   */
/* If the frame is I or P type, the future vectors become the present     */
/* vectors in display order; if the frame is P type, any vectors present  */
/* in the frame are stored in future until its turn in the display order  */
/* comes (when another I or P frame comes in).                            */
/* When compute_motion_vector is called, some added information is        */
/* passed to it: the macroblock's address (row and column), and the type  */
/* of frame if compute_motion_vector is to calculate forward motion       */
/* vectors. If it is to calculate backward motion vectors, the arbitrary  */
/* value 4 (don't confuse this 4 with a D-type frame) is passed, just     */
/* to indicate the vectors are backward.                                  */

import java.io.InputStream;
import java.applet.Applet;

public class MPEG_video implements Runnable {

    private Array VideoArray = new Array();

    MPEG_video() { }

    public void run() {
        mpeg_stream.next_start_code();
        do {
            Parse_sequence_header();
            do {
                Parse_group_of_pictures();
            } while (/* more groups of pictures */);
        } while (/* more sequence headers */);
    }

    private void Parse_sequence_header() {
        Width = mpeg_stream.get_bits(12);
        Height = mpeg_stream.get_bits(12);
        mb_width = (Width + 15) / 16;
        mb_height = (Height + 15) / 16;
        VideoArray.setDimensions(mb_height, mb_width);
    }

    private void Parse_group_of_pictures() {
        do {
            VideoArray.pastEqualsPresent(); // Store vectors of previous frame
            VideoArray.resetPresent();      // All vectors are reset for the new frame
            Parse_picture();
            VideoArray.printArray(1);       // Optional
        } while (/* more pictures in the group */);
    }

    private void Parse_picture() {
        if (Pic_Type == P_TYPE || Pic_Type == I_TYPE) {
            VideoArray.futureEqualsPresent(); // Take what is in future and put in present
            VideoArray.resetFuture();         // Reset future for new values
        }
        do {
            Parse_slice();
        } while (/* more slices in the picture */);
    }

    private void Parse_slice() {
        do {
            Parse_macroblock();
        } while (/* more macroblocks in the slice */);
    }

    private void Parse_macroblock() {
        if (macro_block_motion_forward) {
            Forward.compute_motion_vector(motion_horiz_forw_code, motion_verti_forw_code,
                    motion_horiz_forw_r, motion_verti_forw_r,
                    mb_row, mb_column, Pic_Type);
        }
        if (macro_block_motion_backward) {
            // motion vector for backward prediction exists
            b = 4;
            Backward.compute_motion_vector(motion_horiz_back_code, motion_verti_back_code,
                    motion_horiz_back_r, motion_verti_back_r,
                    mb_row, mb_column, b);
        }
    }
}

motion_data
/* This is a skeleton of the class motion_data; there is very little added to it. */
/* In the method compute_motion_vector some extra information is passed, as was   */
/* documented in MPEG_video. All this extra information is passed straight to     */
/* Array along with the values of the motion vectors (in half pixels).            */

public class motion_data {

    private Array MotionArray = new Array(); // Create instance of the class Array

    public void init() { }

    public void set_pic_data() { }

    public void reset_prev() { }

    /* The internal method "motion_displacement" computes the difference of the */
    /* actual motion vector with respect to the last motion vector. Refer to    */
    /* ISO 11172-2 to understand the coding of the motion displacement.         */
    private int motion_displacement(int motion_code, int PMD, int motion_r) {
        int dMD, MD;
        if (x_ward_f == 1 || motion_code == 0) {
            dMD = motion_code;
        } else {
            dMD = 1 + x_ward_f * (Math.abs(motion_code) - 1);
            dMD += motion_r;
            if (motion_code < 0)
                dMD = -dMD;
        }
        MD = PMD + dMD;
        if (MD > max)
            MD -= range;
        else if (MD < min)
            MD += range;
        return MD;
    }

    /* The method "compute_motion_vector" computes the motion vector according to the */
    /* values supplied by the "ScanThread". It uses the method "motion_displacement". */
    /* The result are the motion vectors for the luminance and the chrominance blocks.*/
    public void compute_motion_vector(int motion_horiz_x_code, int motion_verti_x_code,
                                      int motion_horiz_x_r, int motion_verti_x_r,
                                      int mr, int mc, int chooseArray) {
        recon_right_x_prev = recon_right_x =
            motion_displacement(motion_horiz_x_code, recon_right_x_prev, motion_horiz_x_r);
        if (Full_pel_x_vector)
            recon_right_x <<= 1;
        recon_down_x_prev = recon_down_x =
            motion_displacement(motion_verti_x_code, recon_down_x_prev, motion_verti_x_r);
        if (Full_pel_x_vector)
            recon_down_x <<= 1;
        /* The motion vectors (in half pixels) are sent to Array, along with */
        /* information on which array they are to go into.                   */
        MotionArray.fillArray(mr, mc, recon_right_x, recon_down_x, chooseArray);
    }

    public void get_area() { }

    public void copy_area() { }

    public void copy_unchanged() { }

    public void put_area() { }
}

Array
/* The class Array is used for the storage of the motion vectors.          */
/* Two instances of the class Array will be created: one in the class      */
/* MPEG_video, called VideoArray. This instance will be used first to      */
/* set the size of the arrays, depending on the resolution of the video    */
/* clip. This instance will also pass information regarding which array    */
/* the motion vectors should be in (past or present).                      */
/* The second instance, created in the class motion_data, is called        */
/* MotionArray. This instance passes the values of the motion vectors      */
/* to the arrays, and also information regarding which array they go into  */
/* (futureRight, futureDown, presentForwardRight, presentForwardDown,      */
/* presentBackwardRight or presentBackwardDown).                           */

class Array {

    public Array() { }

    /* All arrays are created as static because we want both instances */
    /* of the class Array to be able to see them.                      */
    static public int[][] futureRight;
    static public int[][] futureDown;
    static public int[][] presentForwardRight;
    static public int[][] presentForwardDown;
    static public int[][] presentBackwardRight;
    static public int[][] presentBackwardDown;
    static public int[][] pastForwardRight;
    static public int[][] pastForwardDown;
    static public int[][] pastBackwardRight;
    static public int[][] pastBackwardDown;

    /* Sets the dimensions of all the arrays */
    public void setDimensions(int mb_h, int mb_w) {
        futureRight = new int[mb_h][mb_w];
        futureDown = new int[mb_h][mb_w];
        presentForwardRight = new int[mb_h][mb_w];
        presentForwardDown = new int[mb_h][mb_w];
        presentBackwardRight = new int[mb_h][mb_w];
        presentBackwardDown = new int[mb_h][mb_w];
        pastForwardRight = new int[mb_h][mb_w];
        pastForwardDown = new int[mb_h][mb_w];
        pastBackwardRight = new int[mb_h][mb_w];
        pastBackwardDown = new int[mb_h][mb_w];
    }

    /* fillArray takes the values from the method compute_motion_vector */
    /* in the class motion_data and puts them in the appropriate array. */
    /* Note all values are in half pixels.                              */
    public void fillArray(int mr, int mc, int right, int down, int chooseArray) {
        if (chooseArray == 2) {            // P frame: forward vectors go to future
            futureRight[mr][mc] = right;
            futureDown[mr][mc] = down;
        }
        if (chooseArray == 3) {            // B frame: forward vectors
            presentForwardRight[mr][mc] = right;
            presentForwardDown[mr][mc] = down;
        }
        if (chooseArray == 4) {            // backward vectors
            presentBackwardRight[mr][mc] = right;
            presentBackwardDown[mr][mc] = down;
        }
    }

    /* This method is only used to print out the values! */
    public void printArray(int printWhich) {
        if (printWhich == 1) {
            for (int j = 0; j < futureDown.length; j++) {
                for (int i = 0; i < futureDown[j].length; i++) {
                    System.out.print("" + pastBackwardRight[j][i] + "\t");
                }
                System.out.print("\n");
            }
            System.out.print("\n");
        }
    }

    /* As each new frame comes in, all present values have first to be set to zero */
    public void resetPresent() {
        for (int j = 0; j < futureDown.length; j++) {
            for (int i = 0; i < futureDown[j].length; i++) {
                presentForwardRight[j][i] = 0;
                presentForwardDown[j][i] = 0;
                presentBackwardRight[j][i] = 0;
                presentBackwardDown[j][i] = 0;
            }
        }
    }

    /* When an I or P picture comes in, we have to take all the motion vectors */
    /* that are in future and put them in present.                             */
    public void futureEqualsPresent() {
        for (int j = 0; j < futureDown.length; j++) {
            for (int i = 0; i < futureDown[j].length; i++) {
                presentForwardRight[j][i] = futureRight[j][i];
                presentForwardDown[j][i] = futureDown[j][i];
            }
        }
    }

    /* After all the values are taken out of future, future has to be reset */
    /* before any more values can be put in.                                */
    public void resetFuture() {
        for (int j = 0; j < futureDown.length; j++) {
            for (int i = 0; i < futureDown[j].length; i++) {
                futureRight[j][i] = 0;
                futureDown[j][i] = 0;
            }
        }
    }

    /* We need to store all the motion vectors from the previous frame */
    /* so the net movement from frame to frame can be calculated.      */
    public void pastEqualsPresent() {
        for (int j = 0; j < futureDown.length; j++) {
            for (int i = 0; i < futureDown[j].length; i++) {
                pastForwardRight[j][i] = presentForwardRight[j][i];
                pastForwardDown[j][i] = presentForwardDown[j][i];
                pastBackwardRight[j][i] = presentBackwardRight[j][i];
                pastBackwardDown[j][i] = presentBackwardDown[j][i];
            }
        }
    }
}
