
Ha Noi University of Technology

Faculty of Electronics and Telecommunications


First Project Report on
Motion Estimation in the ITU-T H.264 Standard
Instructor: Dr. Thang Nguyen Vu
Team: Lý Hoài Nam (Leader)
Nguyễn Đắc Trung
Minh Đức

Motion Estimation and Compensation


Basic Knowledge
Successive video frames may contain the same objects (still or moving). Motion
estimation examines the movement of objects in an image sequence to try to obtain
vectors representing the estimated motion. Motion compensation uses the knowledge of
object motion so obtained to achieve data compression. In interframe coding, motion
estimation and compensation have become powerful techniques to eliminate the temporal
redundancy due to high correlation between consecutive frames.

In real video scenes, motion can be a complex combination of translation and rotation.
Such motion is difficult to estimate and may require large amounts of processing.
However, translational motion is easily estimated and has been used successfully for
motion compensated coding.

Most of the motion estimation algorithms make the following assumptions:


1. Objects move in translation in a plane that is parallel to the camera plane, i.e., the
effects of camera zoom and object rotation are not considered.
2. Illumination is spatially and temporally uniform.
3. Occlusion of one object by another, and uncovered background are neglected.

There are two mainstream techniques of motion estimation: the pel-recursive algorithm
(PRA) and the block-matching algorithm (BMA).

• PRAs iteratively refine the motion estimate for individual pels using gradient
methods.
• BMAs assume that all the pels within a block have the same motion activity.

BMAs estimate motion on the basis of rectangular blocks and produce one motion vector
for each block. PRAs involve more computational complexity and less regularity, so they
are difficult to realize in hardware. In general, BMAs are more suitable for a simple
hardware realization because of their regularity and simplicity.

Figure 1 illustrates the block-matching process. In a typical BMA, each frame is divided
into macroblocks, each of which consists of luminance and chrominance blocks.
Usually, for coding efficiency, motion estimation is performed only on the luminance
block. Each luminance block in the present frame is matched against candidate blocks in
a search area on the reference frame. These candidate blocks are just displaced versions
of the original block. The best (lowest-distortion, i.e., best-matched) candidate block is
found and its displacement (motion vector) is recorded. In a typical interframe coder,
the motion-compensated prediction formed from the reference frame is subtracted from
the input frame. Consequently, the motion vector and the resulting prediction error can
be transmitted instead of the original luminance block; thus interframe redundancy is
removed and data compression is achieved. At the receiver end, the decoder builds the
frame-difference signal from the received data and adds it to the reconstructed reference
frame. The summation gives an exact replica of the current frame. The better the
prediction, the smaller the error signal, and hence the lower the transmission bit rate.
Figure 1: The block-matching process.
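
To make the matching step concrete, the following is a minimal full-search sketch in
Python with NumPy (an illustration under our own assumptions, not the H.264 reference
code): it tests every displacement within a +/-p search range in the reference frame and
keeps the one with the lowest sum of absolute differences (SAD).

import numpy as np

def full_search(current, reference, x, y, N=16, p=7):
    # Estimate the motion vector of the NxN luminance block at (x, y) in
    # `current` by exhaustively testing all displacements within +/-p pels.
    # Assumes the block lies fully inside the frame.
    block = current[y:y+N, x:x+N].astype(np.int32)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            ry, rx = y + dy, x + dx
            # Skip candidate blocks that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + N > reference.shape[0] or rx + N > reference.shape[1]:
                continue
            candidate = reference[ry:ry+N, rx:rx+N].astype(np.int32)
            sad = int(np.abs(block - candidate).sum())  # distortion of this candidate
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad

With N = 16 and p = 7 this evaluates (2p + 1)^2 = 225 candidate blocks per macroblock,
which is what motivates the restricted search areas and fast-search techniques discussed
next.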

1. Search Process:
In motion estimation, the search process can be modified to suit the needs of a particular
algorithm. The search area is typically restricted to lower the computational cost
associated with block matching. Also, in many cases, the objects in the scene tend not to
have large translational movements from one frame to the next; that is, the fact that
frames in a video sequence are taken at small intervals of time is exploited. Many
techniques have been proposed to solve the problem of determining the best match at the
lowest computational cost; one classic example is sketched below.
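
One such fast-search technique (not named in this report) is the three-step search
(TSS), sketched here under the same assumptions as the full-search sketch above. It
starts from zero displacement with a coarse step, keeps the best of nine candidates at
each stage, and halves the step until it reaches one pel.

import numpy as np

def three_step_search(current, reference, x, y, N=16):
    block = current[y:y+N, x:x+N].astype(np.int32)

    def sad_at(dx, dy):
        ry, rx = y + dy, x + dx
        if ry < 0 or rx < 0 or ry + N > reference.shape[0] or rx + N > reference.shape[1]:
            return float('inf')  # candidate falls outside the frame
        cand = reference[ry:ry+N, rx:rx+N].astype(np.int32)
        return int(np.abs(block - cand).sum())

    cx, cy, step = 0, 0, 4  # start at zero displacement with a coarse step
    while step >= 1:
        # Test the current centre and its eight neighbours at this step size.
        candidates = [(cx + i * step, cy + j * step)
                      for i in (-1, 0, 1) for j in (-1, 0, 1)]
        cx, cy = min(candidates, key=lambda mv: sad_at(*mv))
        step //= 2  # 4 -> 2 -> 1
    return (cx, cy)

For a +/-7 search range, TSS evaluates about 25 candidates instead of the 225 needed by
full search, at the risk of missing the globally best match.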

2. Reconstruction:
During reconstruction, the reference frame is used to predict the current frame using the
motion vectors. This technique is known as motion compensation. During motion
compensation, the macroblock in the reference frame that is referenced by the motion
vector is copied into the reconstructed frame.
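
A minimal motion-compensation sketch in the same style (the one-vector-per-macroblock
layout of `vectors` is our assumption): each macroblock of the predicted frame is copied
from the displaced location in the reference frame that its motion vector points to.

import numpy as np

def motion_compensate(reference, vectors, N=16):
    # Build the prediction of the current frame: for each macroblock, copy
    # the reference block its motion vector (dx, dy) points to. Vectors are
    # assumed to keep every copied block inside the reference frame.
    predicted = np.empty_like(reference)
    H, W = reference.shape
    for by in range(0, H, N):
        for bx in range(0, W, N):
            dx, dy = vectors[by // N][bx // N]
            predicted[by:by+N, bx:bx+N] = reference[by+dy:by+dy+N, bx+dx:bx+dx+N]
    return predicted

The decoder then adds the transmitted error signal to this prediction to reconstruct the
current frame, as described above.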
3. Compute Error:
For each of these error types, a pixel-by-pixel comparison is made on the luminance
values. The errors are summed over the macroblock, and if this sum is less than the
previous best, the location of the macroblock in the reference picture is saved. Once all
macroblocks in the search space have been examined, the motion vector is determined
by the macroblock with the lowest error measure.

Different error measures can be used for motion estimation. Among others, the sum of
absolute differences (SAD) and the mean squared error (MSE) are commonly used.
Motion estimation can be done on the luminance component of the frames, or can include
the chrominance components as well.

There are also several types of error measures that can be used. The most common is the
Mean Absolute Error (MAE). This error is computed by summing the absolute differences
of the luminance pixels and dividing by the number of pixels in the macroblock. Another
method is the Mean Square Error (MSE). The results are often similar, but psychovisual
studies have shown that the MSE measure does not match how the eye perceives errors in
images, so something better is needed; the MAE measure is not the optimum measure, either.
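
For reference, these measures can be written out directly; a short sketch under the same
assumptions (the function names are ours):

import numpy as np

def sad(block, candidate):
    # Sum of absolute differences over the macroblock.
    return int(np.abs(block.astype(np.int32) - candidate.astype(np.int32)).sum())

def mae(block, candidate):
    # Mean absolute error: SAD divided by the number of pixels.
    return sad(block, candidate) / block.size

def mse(block, candidate):
    # Mean squared error over the macroblock.
    d = block.astype(np.int32) - candidate.astype(np.int32)
    return float((d * d).mean())

Note that SAD and MAE always rank candidate blocks identically, since they differ only by
the constant factor 1/(N x N), which is why the cheaper SAD is often used in the actual
search.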

Definitions of Key Motion Estimation Terms

Reference Frame--The frame that is used to make a prediction of another frame. The
other frame may be a future or a previous frame.

Current Frame--The frame that is being predicted from a reference frame. A set of
motion vectors results from the prediction.

Motion Vector--A pair of numbers (a vector) representing the displacement between a
macroblock in the current frame and a macroblock in the reference frame.

Error Measure--The measure of how different one macroblock is from another. Some
examples are Mean Absolute Error and Mean Square Error.

Macroblock--A group of 16x16 pixels of the image.

Search Space--The area of the reference frame that is searched when motion estimation
is performed. Examples are a 64 x 64 square block with the macroblock being compared
in the center of the block, or the entire reference frame.
Luminance--This is the black-and-white content of the image, i.e., how light or dark a
pixel is.

Chrominance--This is the color information for the pixel. In many applications, the
luminance and chrominance are combined and displayed in RGB (red, green, blue)
format rather than YUV (luminance and two chrominance components). RGB, YUV, and
others are known as color spaces.


Conclusion: This is our first report on Motion Estimation, so it may still contain errors,
which we have tried to correct. We hope that the Professor will help us further so that we
can complete this project according to the Final Term plan. We will do our best to
complete it.

Thank you, Sir.
