
Multimedia Technology

Overview

- Introduction
- Chapter 1: Background of compression techniques
- Chapter 2: Multimedia technologies
  - JPEG
  - MPEG-1/MPEG-2 Audio & Video
  - MPEG-4
  - MPEG-7 (brief introduction)
  - HDTV (brief introduction)
  - H.261/H.263 (brief introduction)
  - Model-based coding (MBC) (brief introduction)
- Chapter 3: Some real-world systems
  - CATV systems
  - DVB systems
- Chapter 4: Multimedia network

Nguyen Chan Hung, Hanoi University of Technology, 4/2/2003

Introduction (2)

- The Internet was designed in the 1960s for low-speed internetworks with simple textual applications: high delay and high jitter were acceptable.
- Multimedia applications require drastic modifications of the Internet infrastructure.
- Many frameworks have been investigated and deployed to support the next-generation multimedia Internet (e.g. IntServ, DiffServ).
- In the future, all TVs (and PCs) will be connected to the Internet and freely tuned to any of millions of broadcast stations all over the world.
- At present, multimedia networks run over ATM (almost obsolete) and IPv4; in the future, IPv6 should guarantee QoS (Quality of Service).


Introduction

- The importance of multimedia technologies: multimedia is everywhere!
  - On PCs:
    - Real Player, QuickTime, Windows Media.
    - Music and video are freely available on the Internet (mp2, mp3, mp4, asf, mpeg, mov, ra, ram, mid, DivX, etc.)
    - Video/audio conferencing
    - Webcast / streaming applications
    - Distance learning (tele-education)
    - Tele-medicine
    - Tele-xxx (let's imagine!)
  - On TVs and other home electronic devices:
    - DVB-T/DVB-C/DVB-S (Digital Video Broadcasting: Terrestrial/Cable/Satellite) shows MPEG-2's superior quality over traditional analog TV
    - Interactive TV: Internet applications (mail, web, e-commerce) on a TV, with no need to wait for a PC to start up and shut down
    - CD/VCD/DVD/MP3 players
  - Also appearing in handheld devices (3G mobile phones, wireless PDAs)

Chapter 1: Background of compression techniques

- Why compression?
  - For communication: reduces bandwidth in multimedia network applications such as streaming media, Video-on-Demand (VOD), and Internet telephony.
  - For digital storage (VCD, DVD, tape, etc.): reduces size and cost, increases media capacity and quality.
- Compression factor (or compression ratio): the ratio between the source data size and the compressed data size (e.g. 10:1).
- 2 types of compression:
  - Lossless compression
  - Lossy compression

Information content and redundancy

- Information rate
  - Entropy is the measure of information content, expressed in bits per source output unit (such as bits/pixel).
  - The more information in the signal, the higher the entropy.
  - Lossy compression reduces entropy; lossless compression does not.
- Redundancy
  - The difference between the bit rate and the information rate.
  - Usually the information rate is much less than the bit rate.
  - Compression works by eliminating this redundancy.

Lossy compression

- The data from the expander is not identical to the source data, but the difference cannot be distinguished auditorily or visually.
  - Suitable for audio and video compression.
  - The compression factor is much higher than that of lossless compression (up to 100:1).
- Based on an understanding of psychoacoustic and psychovisual perception.
- Can be forced to operate at a fixed compression factor.
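Entropy as the measure of information content can be illustrated with a short calculation (a sketch; the function name `entropy_bits_per_symbol` is chosen here for illustration, not taken from the slides):

```python
import math
from collections import Counter

def entropy_bits_per_symbol(data):
    """Shannon entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A heavily skewed 8-bit source carries far less information than its
# 8 bits/sample bit rate suggests; the difference is the redundancy.
samples = [0] * 900 + [255] * 100   # 90% / 10% split
h = entropy_bits_per_symbol(samples)
redundancy = 8 - h                  # bit rate minus information rate
```

For this 90/10 source the entropy is only about 0.47 bits/sample, so almost all of the 8-bit representation is redundancy that a coder can remove.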

Lossless compression

- The data from the decoder is identical to the source data.
  - Example: archives produced by utilities such as pkzip or gzip.
  - The compression factor is around 2:1.
- Cannot guarantee a fixed compression ratio: the output data rate is variable, which causes problems for recording mechanisms or communication channels.

Process of compression

- Communication (reduces the cost of the data link):
  Data -> Compressor (coder) -> transmission channel -> Expander (decoder) -> Data'
- Recording (extends playing time in proportion to the compression factor):
  Data -> Compressor (coder) -> Storage device (tape, disk, RAM, etc.) -> Expander (decoder) -> Data'
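The lossless round trip can be demonstrated with Python's zlib module (one example codec from the same DEFLATE family as pkzip and gzip; the repetitive sample text compresses far better than the ~2:1 typical of ordinary data):

```python
import zlib

# Highly redundant text compresses well and decompresses to an identical copy.
source = b"lossless compression restores the source exactly " * 100
packed = zlib.compress(source, level=9)
restored = zlib.decompress(packed)

assert restored == source          # decoder output identical to the source
ratio = len(source) / len(packed)  # compression factor (source size : packed size)
```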

Sampling and quantization

- Why sampling? Computers cannot process analog signals directly.
- PCM: sample the analog signal at a constant rate and use a fixed number of bits (usually 8 or 16) to represent each sample.
  - bit rate = sampling rate x number of bits per sample
- Quantization
  - Maps the sampled analog signal (generally of infinite precision) to discrete levels (finite precision).
  - Represents each discrete level with a number.

Statistical coding: the Huffman code

- Assigns short codes to the most probable data patterns and long codes to the less frequent patterns.
- Bit assignment is based on the statistics of the source data.
- The statistics of the data must be known prior to the bit assignment.
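The PCM bit-rate formula and the quantization mapping above can be sketched as follows (the function names are illustrative):

```python
def pcm_bit_rate(sampling_rate_hz, bits_per_sample, channels=1):
    """bit rate = sampling rate x bits per sample (x channels)."""
    return sampling_rate_hz * bits_per_sample * channels

def quantize(x, step):
    """Uniform quantization: map an (ideally infinite-precision) value
    to the nearest discrete level, represented by a level number."""
    return round(x / step)

def dequantize(level, step):
    return level * step

cd_rate = pcm_bit_rate(44_100, 16, channels=2)  # audio CD: 1,411,200 bit/s
level = quantize(0.437, step=0.1)               # level 4
approx = dequantize(level, step=0.1)            # 0.4: quantization error 0.037
```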

Predictive coding

- Prediction
  - Uses previous sample(s) to estimate the current sample.
  - For most signals, the difference between the predicted and actual values is small, so a smaller number of bits can be used to code the difference while maintaining the same accuracy.
- Noise is completely unpredictable: most codecs require the data to be preprocessed, or they may perform badly when the data contains noise.

Drawbacks of compression

- Sensitivity to data errors
  - Compression eliminates the redundancy that is essential to making data resistant to errors.
  - An error-correction code is therefore required, which adds redundancy back to the compressed data.
- Concealment is required for real-time applications.
- Artifacts
  - Artifacts appear when the coder eliminates part of the entropy.
  - The higher the compression factor, the more the artifacts.
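A minimal sketch of predictive (DPCM-style) coding, assuming the simple previous-sample predictor the slide describes:

```python
def dpcm_encode(samples):
    """Code each sample as its difference from the previous one.
    For slowly varying signals the differences are small numbers."""
    prev = 0
    residuals = []
    for s in samples:
        residuals.append(s - prev)
        prev = s
    return residuals

def dpcm_decode(residuals):
    prev = 0
    out = []
    for r in residuals:
        prev += r
        out.append(prev)
    return out

signal = [100, 102, 104, 103, 105, 108, 110]
res = dpcm_encode(signal)            # [100, 2, 2, -1, 2, 3, 2]
assert dpcm_decode(res) == signal    # lossless round trip
# After the first sample, residuals fit in far fewer bits than the samples.
assert all(abs(r) <= 3 for r in res[1:])
```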

A coding example: clustering color pixels

- In an image, pixel values are clustered in several peaks.
- Each cluster represents the color range of one object in the image (e.g. blue sky).
- Coding process:
  1. Separate the pixel values into a limited number of data clusters (e.g. clustered pixels of sky blue or grass green).
  2. Send the average color of each cluster and an identifying number for each cluster as side information.
  3. Transmit, for each pixel:
     - The number of the average cluster color that it is closest to.
     - Its difference from that average cluster color (this can be further coded to reduce redundancy, since the differences are often similar).

Motion Compensated Prediction

- More data in Frame-Differential Coding can be eliminated by comparing the present pixel to the location of the same object in the previous frame (not to the same spatial location in the previous frame).
- The encoder estimates the motion in the image to find the corresponding area in a previous frame: it searches for a portion of a previous frame that is similar to the part of the new frame to be transmitted.
- It then sends (as side information) a motion vector telling the decoder what portion of the previous frame to use to predict the new frame. It also sends the prediction error so that the exact new frame may be reconstituted.
- (Top figure: without motion compensation; bottom figure: with motion compensation.)
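The encoder's motion search can be sketched as exhaustive block matching over a small window (a toy example with 2x2 blocks; real encoders use 16x16 macroblocks and faster search strategies):

```python
def sad(a, b):
    """Sum of absolute differences between two equal-size blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def block(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]

def motion_search(prev, cur, top, left, size=2, radius=2):
    """Exhaustive search in `prev` for the block of `cur` at (top, left).
    Returns the motion vector (dy, dx) with minimum SAD."""
    target = block(cur, top, left, size)
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ty, tx = top + dy, left + dx
            if 0 <= ty and 0 <= tx and ty + size <= len(prev) and tx + size <= len(prev[0]):
                cost = sad(block(prev, ty, tx, size), target)
                if best is None or cost < best[0]:
                    best = (cost, (dy, dx))
    return best[1]

# A bright 2x2 "object" moves one pixel to the right between frames.
prev = [[0] * 6 for _ in range(6)]
cur = [[0] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3):
        prev[y][x] = 9
    for x in (3, 4):
        cur[y][x] = 9
mv = motion_search(prev, cur, top=2, left=3)  # -> (0, -1): look one pixel left
```

The vector (0, -1) is exactly the side information the encoder would transmit; the prediction error for this block is then all zeros.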

Frame-Differential Coding

- Frame-Differential Coding = prediction from a previous video frame.
- A video frame is stored in the encoder for comparison with the present frame, which causes an encoding latency of one frame time.
- For still images:
  - Data need be sent only for the first instance of a frame.
  - All subsequent prediction-error values are zero.
  - The frame is retransmitted occasionally to give receivers that have just been turned on a starting point.
- FDC reduces the information for still images, but leaves significant data for moving images (e.g. a movement of the camera).

Unpredictable information

- Information that cannot be predicted from the previous frame:
  1. Scene changes (e.g. the background landscape changes).
  2. Newly uncovered information due to object motion across a background, or at the edges of a panned scene (e.g. a soccer player's face uncovered by a flying ball).

Dealing with unpredictable information

- Scene changes
  - An intra-coded picture (MPEG I picture) must be sent as a starting point; it requires more data than a predicted picture (P picture).
  - I pictures are sent about twice per second; their timing and frequency may be adjusted to accommodate scene changes.
- Uncovered information
  - Handled by the bi-directionally coded type of picture, the B picture.
  - There must be enough frame storage in the system to wait for the later picture that has the desired information.
  - To limit the amount of decoder memory, the encoder stores pictures and sends the required reference pictures before sending the B picture.

Types of picture transform coding

- Types of picture transform:
  - Discrete Fourier Transform (DFT)
  - Karhunen-Loeve
  - Walsh-Hadamard
  - Lapped orthogonal
  - Discrete Cosine Transform (DCT): used in MPEG-2
  - Wavelets (newer)
- The differences between transform coding methods:
  - The degree of concentration of energy in a few coefficients.
  - The region of influence of each coefficient in the reconstructed picture.
  - The appearance and visibility of coding noise due to coarse quantization of the coefficients.

Transform coding

- Converts spatial image pixel values to transform coefficient values; the number of coefficients produced is equal to the number of pixels transformed.
- The transform process concentrates the energy into particular coefficients (generally the low-frequency coefficients).
- Few coefficients contain most of the energy in a picture, so the coefficients may be further coded by lossless entropy coding.

DCT lossy coding

- Lossless coding cannot achieve a high compression ratio (4:1 or less).
- Lossy coding = discarding selected information so that the reproduction is visually or aurally indistinguishable from the source, or has the fewest artifacts.
- Lossy coding can be achieved by:
  - Eliminating some DCT coefficients.
  - Adjusting the quantizing coarseness of the coefficients (the better approach).

Masking

- Masking makes certain types of coding noise invisible or inaudible, due to psychovisual/psychoacoustic effects.
  - In audio, a pure tone masks energy at higher frequencies and also, with a weaker effect, at lower frequencies.
  - In video, high-contrast edges mask random noise.
- The coder is designed so that noise introduced at low bit rates falls into these masked frequency, spatial, or temporal regions.

Run-Level coding

- "Run-Level" coding = coding a run length of zeros followed by a non-zero level.
  - Instead of sending all the zero values individually, the length of the run is sent.
  - Useful for any data with long runs of zeros.
  - Run lengths are easily encoded with a Huffman code.
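The run-level scheme can be sketched directly (illustrative only; the real MPEG/JPEG entropy coder then maps these pairs to variable-length code words):

```python
def run_level_encode(coeffs):
    """Encode a coefficient sequence as (zero-run-length, level) pairs,
    with a final 'EOB' marker replacing any trailing run of zeros."""
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    pairs.append("EOB")
    return pairs

def run_level_decode(pairs, length):
    out = []
    for p in pairs:
        if p == "EOB":
            break
        run, level = p
        out.extend([0] * run + [level])
    return out + [0] * (length - len(out))

coeffs = [78, -1, 1, 0, 0, 0, 5, 0, 0, 0, 0, 0]
pairs = run_level_encode(coeffs)  # [(0, 78), (0, -1), (0, 1), (3, 5), 'EOB']
assert run_level_decode(pairs, len(coeffs)) == coeffs
```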

Variable quantization

- Variable quantization is the main technique of lossy coding; it greatly reduces the bit rate.
- It coarsely quantizes the less significant coefficients in a transform (those that are less noticeable: low-energy, less visible or audible).
- It can be applied to a complete signal or to individual frequency components of a transformed signal.
- Variable quantization also controls the instantaneous bit rate in order to:
  - Match the average bit rate to a constant channel bit rate.
  - Prevent buffer overflow or underflow.

Key points

- Compression process
- Quantization & sampling
- Coding:
  - Lossless & lossy coding
  - Frame-Differential Coding
  - Motion Compensated Prediction
  - Variable quantization
  - Run-Level coding
- Masking

Chapter 2: Multimedia technologies

Roadmap
- JPEG
- MPEG-1/MPEG-2 Video
- MPEG-1 Layer 3 Audio (mp3)
- MPEG-4
- MPEG-7 (brief introduction)
- HDTV (brief introduction)
- H.261/H.263 (brief introduction)
- Model-based coding (MBC) (brief introduction)

JPEG: zig-zag scanning

(Figure: zig-zag scan order over the 8 x 8 DCT coefficient block.)
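The zig-zag scan order shown in the figure is easy to generate (a sketch; `zigzag_order` is a name chosen here for illustration):

```python
def zigzag_order(n=8):
    """Visit the (row, col) indices of an n x n block in zig-zag order:
    anti-diagonals starting from the top-left, alternating direction."""
    order = []
    for s in range(2 * n - 1):  # s = row + col along each anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

order = zigzag_order(4)
# Starts (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), ...
```

Scanning a quantized DCT block in this order groups the low-frequency (usually non-zero) coefficients first and the high-frequency zeros into long runs at the end.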

JPEG (Joint Photographic Experts Group)

- JPEG encoder:
  - Partitions the image into blocks of 8 x 8 pixels.
  - Calculates the Discrete Cosine Transform (DCT) of each block.
  - A quantizer rounds off the DCT coefficients according to the quantization matrix; this is lossy but allows for large compression ratios.
  - Produces a series of DCT coefficients using zig-zag scanning.
  - Applies a variable-length code (VLC) to these DCT coefficients.
  - Writes the compressed data stream to an output file (*.jpg or *.jpeg).
- JPEG decoder:
  - File input data stream -> variable-length decoder -> IDCT (inverse DCT) -> image.

JPEG - DCT

- The DCT is similar to the Discrete Fourier Transform: it transforms a signal or image from the spatial domain to the frequency domain.
- The DCT requires fewer multiplications than the DFT.
- Input image A:
  - The input image A is N2 pixels wide by N1 pixels high; A(i,j) is the intensity of the pixel in row i and column j.
- Output image B:
  - B(k1,k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix.

JPEG - quantization matrix

- The quantization matrix is the 8-by-8 matrix of step sizes (sometimes called quantums), with one element for each DCT coefficient. It is usually symmetric.
- Step sizes are:
  - Small in the upper left (low frequencies),
  - Large in the lower right (high frequencies).
  - A step size of 1 is the most precise.
- The quantizer divides each DCT coefficient by its corresponding quantum, then rounds to the nearest integer.
- Large quantums drive small coefficients down to zero. The result:
  - Many high-frequency coefficients become zero and are easily removed.
  - The low-frequency coefficients undergo only minor adjustment.

MPEG (Moving Picture Experts Group)

- MPEG is the heart of:
  - Digital television set-top boxes
  - HDTV decoders
  - DVD players
  - Video conferencing
  - Internet video, etc.
- MPEG standards: MPEG-1, MPEG-2, MPEG-4, MPEG-7 (the MPEG-3 standard was abandoned and became an extension of MPEG-2).
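The quantizer's divide-and-round step can be sketched with toy 2 x 2 matrices (the values are illustrative, not the standard JPEG tables; note the small step for the low-frequency corner and the large step for the high-frequency corner):

```python
dct_block = [[240.0, 31.0],
             [18.0, 6.0]]
quant_matrix = [[4, 16],
                [16, 64]]

# Divide each coefficient by its quantum and round to the nearest integer.
quantized = [[round(c / q) for c, q in zip(crow, qrow)]
             for crow, qrow in zip(dct_block, quant_matrix)]
# -> [[60, 2], [1, 0]]: the small high-frequency coefficient is driven to zero.

# The decoder multiplies back; low frequencies are only slightly perturbed.
dequantized = [[c * q for c, q in zip(crow, qrow)]
               for crow, qrow in zip(quantized, quant_matrix)]
```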

JPEG coding process illustrated

(Figure: an 8 x 8 block of DCT coefficients, the same block after quantization with most high-frequency entries driven to zero, and its zig-zag scan.)
- Zig-zag scan result: 78 -1 1 -4 -5 4 4 6 3 2 -1 -3 -5 -4 -1 0 -1 0 1 1 0 0 0 ... 0 EOB
- Easily coded by run-length Huffman coding.

MPEG standards

- MPEG-1 (obsolete)
  - A standard for storage and retrieval of moving pictures and audio on storage media.
  - Application: VCD (Video Compact Disc).
- MPEG-2 (widely implemented)
  - A standard for digital television.
  - Applications: DVD (Digital Versatile Disc), HDTV (high-definition TV), DVB (European Digital Video Broadcasting Group), etc.
- MPEG-4 (newly implemented, still being researched)
  - A standard for multimedia applications.
  - Applications: Internet, cable TV, virtual studio, etc.
- MPEG-7 (future work, ongoing research)
  - A content representation standard for information search (Multimedia Content Description Interface).
  - Applications: Internet, video search engines, digital libraries.

MPEG-2 formal standards

- The international standard ISO/IEC 13818-2, "Generic Coding of Moving Pictures and Associated Audio Information".
- ATSC (Advanced Television Systems Committee) document A/54, "Guide to the Use of the ATSC Digital Television Standard".

Pixel & block

- Pixel = "picture element".
  - A discrete spatial point sample of an image.
  - A color pixel may be represented digitally as a number of bits for each of three primary color values.
- Block = an 8 x 8 array of pixels.
  - The block is the fundamental unit for DCT (discrete cosine transform) coding.

MPEG video data structure

- The MPEG-2 video data stream is constructed in layers, from lowest to highest:
  - PIXEL: the fundamental unit
  - BLOCK: an 8 x 8 array of pixels
  - MACROBLOCK: 4 luma blocks and 2 chroma blocks; field DCT coding or frame DCT coding
  - SLICE: a variable number of macroblocks
  - PICTURE: a frame (or field) of slices
  - GROUP OF PICTURES (GOP): a variable number of pictures
  - SEQUENCE: a variable number of GOPs
  - PACKETIZED ELEMENTARY STREAM (optional)

Macroblock

- A macroblock = a 16 x 16 array of luma (Y) pixels (= 4 blocks, in a 2 x 2 block array).
- The number of chroma pixels (Cr, Cb) varies depending on the chroma pixel structure indicated in the sequence header (e.g. 4:2:0).
- The macroblock is the fundamental unit for motion compensation and will have motion vector(s) associated with it if it is predictively coded.
- A macroblock is classified as:
  - Field coded (an interlaced frame consists of 2 fields), or
  - Frame coded, depending on how the four blocks are extracted from the macroblock.

Slice

- Pictures are divided into slices.
- A slice consists of an arbitrary number of successive macroblocks (going left to right), but is typically an entire row of macroblocks. A slice does not extend beyond one row.
- The slice header carries address information that allows the Huffman decoder to resynchronize at slice boundaries.

I, P, B pictures

Encoded pictures are classified into 3 types: I, P, and B.
- I pictures = intra-coded pictures
  - All macroblocks are coded without prediction.
  - Needed to give the receiver a "starting point" for prediction after a channel change, and to recover from errors.
- P pictures = predicted pictures
  - Macroblocks may be coded with forward prediction from references made from previous I and P pictures, or may be intra coded.
- B pictures = bi-directionally predicted pictures
  - Macroblocks may be coded with forward prediction from previous I or P references,
  - with backward prediction from the next I or P reference,
  - with interpolated prediction from past and future I or P references,
  - or intra coded (no prediction).

Picture

- A source picture is a contiguous rectangular array of pixels.
- A picture may be a complete frame of video ("frame picture") or one of the interlaced fields from an interlaced source ("field picture").
- A field picture does not have any blank lines between its active lines of pixels.
- A coded picture (also called a video access unit) begins with a start code and a header. The header contains:
  - picture type (I, B, P)
  - temporal reference information
  - motion vector search range
  - optional user data
- A frame picture consists of a frame of a progressive source, or a frame (2 spatially interlaced fields) of an interlaced source.

Group of pictures (GOP)

- The group-of-pictures layer is optional in MPEG-2.
- A GOP begins with a start code and a header. The header carries:
  - time code information
  - editing information
  - optional user data
- The first encoded picture in a GOP is always an I picture.
- The typical length is 15 pictures, with the following structure (in display order):
  - I B B P B B P B B P B B P B B
- This provides an I picture with sufficient frequency to allow a decoder to decode correctly.
(Figure: forward and bidirectional motion compensation between I, P, and B pictures along the time axis.)

Sequence

- A sequence begins with a unique 32-bit start code followed by a header. The header carries:
  - picture size
  - aspect ratio
  - frame rate and bit rate
  - optional quantizer matrices
  - required decoder buffer size
  - chroma pixel structure
  - optional user data
- The sequence information is needed for channel changing; the sequence length depends on the acceptable channel-change delay.

Transport stream

- Transport packets (fixed length) are formed from a PES stream:
  - The PES header is placed immediately after a transport packet header.
  - Successive transport packet payloads are filled with the remaining PES packet content until the PES packet is all used.
  - The final transport packet is filled to a fixed length by stuffing with 0xFF bytes (all ones).
- Each PES packet header includes:
  - An 8-bit stream ID identifying the source of the payload.
  - Timing references: PTS (presentation time stamp), the time at which a decoded audio or video access unit is to be presented by the decoder;
  - DTS (decoding time stamp), the time at which an access unit is decoded by the decoder;
  - ESCR (elementary stream clock reference).

Packetized Elementary Stream (PES)

- A Video Elementary Stream (video ES) consists of all the video data for a sequence, including the sequence header and all the subparts of a sequence.
- An ES carries only one type of data (video or audio) from a single video or audio encoder.
- A PES consists of a single ES that has been split into packets, each starting with an added packet header.
- A PES stream contains only one type of data from one source, e.g. from one video or audio encoder.
- PES packets have variable length, not corresponding to the fixed length of transport packets, and may be much longer than a transport packet.

Intra frame coding

- Intra coding is concerned only with information within the current frame (not relative to any other frame in the video sequence).
- MPEG intra-frame coding (see the block diagram at the bottom of the slide) is similar to JPEG; review the JPEG coding mechanism.
- Basic blocks of an intra-frame coder:
  - Video filter
  - Discrete cosine transform (DCT)
  - DCT coefficient quantizer
  - Run-length amplitude / variable-length coder (VLC)

Video filter

- The Human Visual System (HVS) is:
  - Most sensitive to changes in luminance,
  - Less sensitive to variations in chrominance.
- MPEG uses the YCbCr color space to represent data values instead of RGB, where:
  - Y is the luminance signal,
  - Cb is the blue color-difference signal,
  - Cr is the red color-difference signal.
- What are the 4:4:4, 4:2:0, etc., video formats?
  - 4:4:4 is full-bandwidth YCbCr video: each macroblock consists of 4 Y blocks, 4 Cb blocks, and 4 Cr blocks (a waste of bandwidth).
  - 4:2:0 is most commonly used in MPEG-2.

MPEG profiles & levels

- MPEG-2 is classified into several profiles. Main Profile features:
  - 4:2:0 chroma sampling format
  - I, P, and B pictures
  - Non-scalable
- Main Profile is subdivided into levels.
  - MP@ML (Main Profile at Main Level):
    - Designed to match the CCIR 601 standard for interlaced standard digital video.
    - 720 x 576 (PAL) or 720 x 483 (NTSC)
    - 30 Hz progressive, 60 Hz interlaced
    - Maximum bit rate is 15 Mbit/s
  - MP@HL (Main Profile at High Level):
    - Upper bounds: 1152 x 1920, 60 Hz progressive, 80 Mbit/s

Applications of chroma formats

chroma_format     | Multiplex order (time) within macroblock | Application
4:2:0 (6 blocks)  | Y Y Y Y Cb Cr                            | Mainstream television, consumer entertainment
4:2:2 (8 blocks)  | Y Y Y Y Cb Cr Cb Cr                      | Studio production environments, professional editing equipment
4:4:4 (12 blocks) | Y Y Y Y Cb Cr Cb Cr Cb Cr Cb Cr          | Computer graphics

(Figure: block ordering within the macroblock as seen by the MPEG encoder/decoder.)

Prediction

- Backward prediction is done by storing pictures until the desired anchor picture is available before encoding the current stored frames.
- The encoder can decide to use:
  - Forward prediction from a previous picture,
  - Backward prediction from a following picture, or
  - Interpolated prediction,
  whichever minimizes the prediction error.
- The encoder must transmit pictures in an order different from that of the source pictures, so that the decoder has the anchor pictures before decoding predicted pictures. (See the next slide.)
- The decoder must have two frames stored.

DCT and IDCT formulas

- DCT (Eq. 1, normal form; Eq. 2, matrix form):
  F(u,v) = (2/N) C(u) C(v) sum_{x=0}^{N-1} sum_{y=0}^{N-1} f(x,y) cos[(2x+1)u*pi / 2N] cos[(2y+1)v*pi / 2N]
- IDCT (Eq. 3, normal form; Eq. 4, matrix form):
  f(x,y) = (2/N) sum_{u=0}^{N-1} sum_{v=0}^{N-1} C(u) C(v) F(u,v) cos[(2x+1)u*pi / 2N] cos[(2y+1)v*pi / 2N]
- Where:
  - F(u,v) is the two-dimensional N x N DCT.
  - u, v, x, y = 0, 1, 2, ..., N-1.
  - x, y are spatial coordinates in the sample domain.
  - u, v are frequency coordinates in the transform domain.
  - C(u), C(v) = 1/sqrt(2) for u, v = 0; C(u), C(v) = 1 otherwise.
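The DCT normal form above can be implemented directly from the formula (a straightforward O(N^4) sketch for clarity; real codecs use fast factorized forms):

```python
import math

def dct2(block):
    """Two-dimensional N x N DCT-II, computed directly from
    F(u,v) = (2/N) C(u) C(v) sum_x sum_y f(x,y)
             cos((2x+1)u*pi/2N) cos((2y+1)v*pi/2N)."""
    n = len(block)
    c = lambda k: 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            out[u][v] = (2 / n) * c(u) * c(v) * s
    return out

# A flat block concentrates all of its energy in the DC coefficient F(0,0).
flat = [[10.0] * 4 for _ in range(4)]
coeffs = dct2(flat)
```

For the flat 4x4 block of 10s, F(0,0) = 40 and every other coefficient is (numerically) zero, illustrating the energy concentration the slides describe.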

I, P, B picture reordering

- Pictures are coded and decoded in a different order than they are displayed, due to bidirectional prediction for B pictures.
- For example, with a 12-picture GOP:
  - Source order and encoder input order:
    I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
  - Encoding order and order in the coded bitstream:
    I(1) P(4) B(2) B(3) P(7) B(5) B(6) P(10) B(8) B(9) I(13) B(11) B(12)
  - Decoder output order and display order (same as input):
    I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)

DCT versus DFT

- The DCT is conceptually similar to the DFT, except:
  - The DCT concentrates energy into lower-order coefficients better than the DFT.
  - The DCT is purely real; the DFT is complex (magnitude and phase).
  - A DCT operation on a block of pixels produces coefficients that are similar to the frequency-domain coefficients produced by a DFT operation.
  - An N-point DCT has the same frequency resolution as a 2N-point DFT. The N frequencies of a 2N-point DFT correspond to N points on the upper half of the unit circle in the complex frequency plane.
  - Assuming a periodic input, the magnitude of the DFT coefficients is spatially invariant (the phase of the input does not matter). This is not true for the DCT.
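The reordering rule for the GOP example above can be sketched as: emit each I or P anchor before the B pictures that precede it in display order (an illustration of the pattern, not a full encoder):

```python
def coded_order(display):
    """Reorder pictures from display order to coded-bitstream order:
    each I or P anchor is emitted before the B pictures that precede it
    in display order, since those B pictures need it as a future anchor."""
    out, pending_b = [], []
    for pic in display:
        if pic.startswith("B"):
            pending_b.append(pic)
        else:  # I or P anchor
            out.append(pic)
            out.extend(pending_b)
            pending_b = []
    return out + pending_b

display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7", "B8", "B9",
           "P10", "B11", "B12", "I13"]
coded = coded_order(display)
# -> I1 P4 B2 B3 P7 B5 B6 P10 B8 B9 I13 B11 B12, matching the slide.
```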

Quantization matrix

- Note that the quantization step sizes are:
  - Small in the upper left (low frequencies),
  - Large in the lower right (high frequencies).
- Recall the JPEG mechanism.
- Why? The HVS is less sensitive to errors in high-frequency coefficients than in lower-frequency ones, so the higher frequencies should be more coarsely quantized.

MPEG scanning

- Left: zig-zag scanning (as in JPEG).
- Right: alternate scanning, which is better for interlaced frames.
(Figure: zig-zag and alternate scan patterns.)

Resulting DCT matrix (example)

- After adaptive quantization, the result is a matrix containing many zeros.
(Figure: example quantized DCT matrix.)

Huffman / Run-Level coding

- Huffman coding, in combination with Run-Level coding and zig-zag scanning, is applied to the quantized DCT coefficients.
- "Run-Level" = a run length of zeros followed by a non-zero level.
- Huffman coding is also applied to various types of side information.
- A Huffman code is an entropy code that optimally achieves the shortest possible average code word length for a source. This average code word length is >= the entropy of the source.
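Huffman code construction from source statistics can be sketched with a heap (illustrative only; MPEG uses fixed, pre-defined code tables rather than building a code per stream):

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a Huffman code from symbol statistics: the most probable
    symbols receive the shortest code words."""
    freq = Counter(data)
    if len(freq) == 1:  # degenerate single-symbol source
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tiebreak, {symbol: partial code word}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two least probable subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

data = "aaaaaaabbbccd"       # 'a' is by far the most frequent symbol
code = huffman_code(data)    # 'a' gets a 1-bit code word
```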

Huffman / Run-Level coding illustrated

Using the DCT output matrix from the previous slide, the zig-zag scanned output is the sequence: 4, 4, 2, 2, 2, 1, 1, 1, 1, 0 (12 zeros), 1, 0 (41 zeros). These values are looked up in a fixed table of variable-length codes:
- The most probable occurrences are given relatively short codes,
- The least probable occurrences are given relatively long codes.

Zero run-length | Amplitude    | MPEG code value
----------------|--------------|----------------
N/A             | 8 (DC value) | 110 1000
0               | 4            | 00001100
0               | 4            | 00001100
0               | 2            | 01000
0               | 2            | 01000
0               | 2            | 01000
0               | 1            | 110
0               | 1            | 110
0               | 1            | 110
0               | 1            | 110
12              | 1            | 0010 0010 0
EOB             |              | 10

MPEG data transport

- MPEG packages all data into fixed-size 188-byte packets for transport.
- Video or audio payload data, previously placed in PES packets, is broken up into fixed-length transport packet payloads. A PES packet may be much longer than a transport packet, so segmentation is required:
  - The PES header is placed immediately following a transport header.
  - Successive portions of the PES packet are then placed in the payloads of transport packets.
  - Remaining space in the final transport packet payload is filled with stuffing bytes (0xFF, all ones).
- Each transport packet starts with a sync byte (0x47). In the ATSC US terrestrial DTV VSB transmission system, the sync byte is not processed but is replaced by a different sync symbol especially suited to RF transmission.
- The transport packet header contains a 13-bit PID (packet ID), which corresponds to a particular elementary stream of video, audio, or another program element.
- PID 0x0000 is reserved for transport packets carrying the Program Association Table (PAT). The PAT points to the Program Map Tables (PMTs), and each PMT points to the particular elements of a program.

Huffman / Run-Level coding illustrated (2)

- The first run of 12 zeroes has been efficiently coded with only 9 bits.
- The last run of 41 zeroes has been entirely eliminated, represented only by a 2-bit End Of Block (EOB) indicator.
- The quantized DCT coefficients are now represented by a sequence of 61 binary bits (see the table).
- Considering that the original 8 x 8 block of 8-bit pixels required 512 bits for full representation, the compression ratio is approximately 8.4:1.

MPEG transport packet

- Adaptation field:
  - 8 bits specifying the length of the adaptation field.
  - The first group of flags consists of eight 1-bit flags:
    - discontinuity_indicator
    - random_access_indicator
    - elementary_stream_priority_indicator
    - PCR_flag
    - OPCR_flag
    - splicing_point_flag
    - transport_private_data_flag
    - adaptation_field_extension_flag
  - The optional fields are present if indicated by one of the preceding flags.
  - The remainder of the adaptation field is filled with stuffing bytes (0xFF, all ones).

Demultiplexing a Transport Stream (TS)

- Demultiplexing a transport stream involves:
  1. Finding the PAT by selecting packets with PID = 0x0000.
  2. Reading the PIDs of the PMTs from the PAT.
  3. Reading the PIDs of the elements of a desired program from its PMT (for example, a basic program will have a PID for audio and a PID for video).
  4. Detecting packets with the desired PIDs and routing them to the decoders.
- An MPEG-2 transport stream can carry:
  - Video streams
  - Audio streams
  - Any type of data
- The MPEG-2 TS is the packet format for CATV downstream data communication.

Timing - synchronization

- The decoder is synchronized with the encoder by time stamps.
- The encoder contains a master oscillator and counter, called the System Time Clock (STC). (See the previous block diagram.)
  - The STC belongs to a particular program and is the master clock of the video and audio encoders for that program. Multiple programs, each with its own STC, can be multiplexed into a single stream.
- A program component may even have no time stamps, but then it cannot be synchronized with other components.
- At the encoder input (Point A), the time of occurrence of an input video picture or audio block is noted by sampling the STC.
- The total (constant) delay of the encoder and decoder buffers is added to the STC, creating a Presentation Time Stamp (PTS).
  - The PTS is then inserted into the first of the packet(s) representing that picture or audio block, at Point B.

Timing & buffer control

Point A: Encoder input Constant/specifi edrate Point B: Encoder output Variable rate Point C: Encoderbuffer output Constant rate Point D: Communication channel + decoderbuffer Constant rate Point E: Decoder input Variable rate Point F: Decoderoutput Constant/specifi edrate
62
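Step 4 of the demultiplexing list above, reduced to its PID-filtering core, can be sketched as follows. Steps 1-3 (walking the PAT and PMT) are assumed to have already produced the set of wanted PIDs, and for simplicity packets with adaptation fields are not handled:

```python
from collections import defaultdict

def demux(packets, wanted_pids):
    """Route 188-byte TS packets to per-PID buffers (step 4 of the slide).

    For simplicity the payload is taken to start right after the 4-byte
    header, i.e. packets carrying an adaptation field are not handled."""
    streams = defaultdict(list)
    for pkt in packets:
        if len(pkt) != 188 or pkt[0] != 0x47:
            continue                      # drop malformed packets
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        if pid in wanted_pids:
            streams[pid].append(pkt[4:])  # payload after the 4-byte header
    return streams

# Two toy packets on PIDs 0x100 (video) and 0x101 (audio); keep video only.
video = bytes([0x47, 0x01, 0x00, 0x10]) + b"\x00" * 184
audio = bytes([0x47, 0x01, 0x01, 0x10]) + b"\x00" * 184
out = demux([video, audio], wanted_pids={0x100})
print(sorted(out))  # -> [256]
```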

Timing Synchronization (2)
- A Decode Time Stamp (DTS) can optionally be inserted into the bit stream; it represents the time at which the data should be taken instantaneously from the decoder buffer and decoded.
  - DTS and PTS are identical except in the case of picture reordering for B pictures; the DTS is only used where it is needed because of reordering.
  - Whenever a DTS is used, a PTS is also coded.
  - The PTS (or DTS) insertion interval is at most 700 ms.
  - In ATSC, a PTS (or DTS) must be inserted at the beginning of each coded picture (access unit).
- In addition, the output of the encoder buffer (Point C) is time stamped with System Time Clock (STC) values, called:
  - the System Clock Reference (SCR) in a Program Stream
  - the Program Clock Reference (PCR) in a Transport Stream
- The PCR time stamp interval is at most 100 ms; the SCR time stamp interval is at most 700 ms.
- The PCR and/or SCR are used to synchronize the decoder STC with the encoder STC.
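As a worked sketch of the time-stamp arithmetic above: MPEG-2 PTS/DTS values count ticks of a 90 kHz clock derived from the 27 MHz STC, and the PTS field is 33 bits wide. The delay value in the example is invented:

```python
# PTS rule from the slides: PTS = STC at capture time + the constant
# end-to-end (encoder + decoder) buffer delay, in 90 kHz ticks.

PTS_CLOCK_HZ = 90_000

def make_pts(stc_ticks: int, end_to_end_delay_s: float) -> int:
    """PTS for a picture captured when the 90 kHz STC counter read
    `stc_ticks`; the result wraps within the 33-bit PTS field."""
    return (stc_ticks + round(end_to_end_delay_s * PTS_CLOCK_HZ)) % 2**33

# A picture captured at STC = 0 with a 0.5 s total buffer delay:
print(make_pts(0, 0.5))  # -> 45000 ticks, i.e. 0.5 s on the 90 kHz clock
```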

Timing Synchronization (3)
- All video and audio streams included in a program must get their time stamps from a common STC, so that the video and audio decoders can be synchronized with each other.
- The data rate and packet rate on the channel (at the multiplexer output) can be completely asynchronous with the System Time Clock (STC).
- PCR time stamps allow synchronization of different multiplexed programs having different STCs, while allowing STC recovery for each program.
- If there is no buffer underflow or overflow, the delays in the buffers and transmission channel for both video and audio are constant:
  - the encoder input and decoder output run at equal and constant rates;
  - there is a fixed end-to-end delay from encoder input to decoder output.
- If exact synchronization is not required, the decoder clock can be free running; video frames can be repeated or skipped as necessary to prevent buffer underflow or overflow, respectively.

HDTV (High definition television)
- High definition television (HDTV) first came to public attention in 1981, when NHK, the Japanese broadcasting authority, first demonstrated it in the United States.
- HDTV is defined by the ITU-R as:
  - 'A system designed to allow viewing at about three times the picture height, such that the system is virtually, or nearly, transparent to the quality of portrayal that would have been perceived in the original scene ... by a discerning viewer with normal visual acuity.'

HDTV (2)
- HDTV proposals are for a screen which is wider than the conventional TV image by about 33%. It is generally agreed that the HDTV aspect ratio will be 16:9, as opposed to the 4:3 ratio of conventional TV systems. This ratio has been chosen because psychological tests have shown that it best matches the human visual field. It also enables the use of existing cinema film formats as additional source material, since this is the same aspect ratio as normal 35 mm film.
- Figure 16.6(a) shows how the aspect ratio of HDTV compares with that of conventional television, using either the same resolution or the same surface area as the comparison metric.
- To achieve the improved resolution, the video image used in HDTV must contain over 1000 lines, as opposed to the 525 and 625 lines provided by the existing NTSC and PAL systems, giving a much improved vertical resolution. The exact value is chosen to be a simple multiple of one or both of the vertical resolutions used in conventional TV.
- However, due to the higher scan rates, the bandwidth requirement for analogue HDTV is approximately 12 MHz, compared to the nominal 6 MHz of conventional TV.

HDTV (3)
- The introduction of a non-compatible TV transmission format for HDTV would require the viewer either to buy a new receiver, or to buy a converter to receive the picture on their old set. The initial thrust in Japan was therefore towards an HDTV format which is compatible with conventional TV standards, and which can be received by conventional receivers with conventional quality. However, to get the full benefit of HDTV, a new wide-screen, high-resolution receiver has to be purchased.
- One of the principal reasons that HDTV is not already common is that a general standard has not yet been agreed. The 26th CCIR plenary assembly recommended the adoption of a single, worldwide standard for high definition television. Unfortunately, Japan, Europe and North America are all investing significant time and money in their own systems, based on their own current conventional TV standards and other national considerations.
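The "about 33% wider" figure can be checked directly: at equal picture height, a 16:9 screen is (16/9)/(4/3) = 4/3 times as wide as a 4:3 screen.

```python
# Width ratio of a 16:9 screen to a 4:3 screen at equal picture height.
widen = (16 / 9) / (4 / 3)
print(round((widen - 1) * 100))  # -> 33 (percent wider, as the slide states)
```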

H261-H263
- The H.261 algorithm was developed for the purpose of image transmission rather than image storage. It is designed to produce a constant output of p x 64 kbit/s, where p is an integer in the range 1 to 30.
  - This allows transmission over a digital network or data link of varying capacity.
  - It also allows transmission over a single 64 kbit/s digital telephone channel for low-quality video-telephony, or at higher bit rates for improved picture quality.
- The basic coding algorithm is similar to that of MPEG in that it is a hybrid of motion compensation, DCT and straightforward DPCM (intra-frame coding mode), but without the MPEG I, P, B frame structure.
- The DCT operation is performed on 8 x 8 blocks of error samples from the predicted luminance pixel values, with sub-sampled blocks of chrominance data.

H261-H263 (2)
- At the very low bit rates (20 kbit/s or less) associated with video telephony, the requirements for image transmission stretch the compression techniques described earlier to their limits. In order to achieve the necessary degree of compression, they often require a reduction in spatial resolution or even the elimination of frames from the sequence.

H261-H263 (3)
- H.261 is widely used on 176 x 144 pixel images.
- The ability to select a range of output rates allows the algorithm to be used in different applications:
  - Low output rates (p = 1 or 2) are only suitable for face-to-face (videophone) communication. H.261 is thus the standard used in many commercial videophone systems, such as the UK BT/Marconi Relate 2000 and the US AT&T 2500 products.
  - Video-conferencing requires a greater output data rate (p > 6) and might go as high as 2 Mbit/s for high-quality transmission with larger image sizes.
- A further development of H.261 is H.263, for lower fixed transmission rates. It deploys arithmetic coding in place of the variable length coding (see the H.261 diagram); with other modifications, the data rate is reduced to only 20 kbit/s.

Model Based Coding (MBC)
- Model based coding (MBC) attempts to exploit a greater degree of redundancy in images than current techniques, in order to achieve significant image compression without adversely degrading the image content information.
- It relies upon the fact that image quality is largely subjective: provided that the appearance of scenes within an observed image is kept at a visually acceptable level, it may not matter that the observed image is not a precise reproduction of reality.
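The p x 64 kbit/s rate ladder mentioned above can be sketched in a few lines (the function name is my own):

```python
def h261_rate_kbps(p: int) -> int:
    """H.261 constant output rate in kbit/s for channel multiplier p,
    an integer from 1 to 30."""
    if not 1 <= p <= 30:
        raise ValueError("p must be an integer in 1..30")
    return p * 64

print(h261_rate_kbps(1))   # -> 64   (single telephone channel, videophone)
print(h261_rate_kbps(6))   # -> 384  (the slide's video-conferencing threshold)
print(h261_rate_kbps(30))  # -> 1920 (maximum, close to 2 Mbit/s)
```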

Model Based Coding (2)
- One MBC method for producing an artificial image of a head sequence utilizes a feature codebook, where a range of facial expressions sufficient to create an animation is generated from sub-images or templates that are joined together to form a complete face.
- The most important areas of a face for conveying an expression are the eyes and mouth; hence the objective is to create an image in which the movement of the eyes and mouth is a convincing approximation to the movements of the original subject.
- When forming the synthetic image, the feature template vectors that form the closest match to those of the original moving sequence are selected from the codebook and then transmitted as low bit rate coded addresses. Using only 10 eye and 10 mouth templates, for instance, gives a total of 100 combinations, implying that only a 7-bit codebook address need be transmitted.
- It has been found that there are only 13 visually distinct mouth shapes for vowel and consonant formation during speech. However, the number of mouth sub-images is usually increased to include intermediate expressions and hence avoid step changes in the image.

Model based coding (4)
- A synthetic image is created by texture mapping detail from an initial full-face source image over the wire-frame. Facial movement can be achieved by manipulation of the vertices of the wire-frame: head rotation requires simple matrix operations upon the coordinate array, while facial expression requires manipulation of the features controlling the vertices.
- This model based feature codebook approach suffers from the drawback of codebook formation. This has to be done off-line and, consequently, the image sequence is required to be prerecorded, with a consequent delay.
- However, the actual image sequence can be sent at a very low data rate. For a codebook with 128 entries, where 7 bits are required to code each mouth, a 25 frame/s sequence requires less than 200 bit/s to code the mouth movements.
- When finally implemented, rates as low as 1 kbit/s are confidently expected from MBC systems, but they can only transmit image sequences that match the stored model, e.g. head-and-shoulders displays.
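The codebook-address arithmetic behind these slides can be checked in a few lines: addressing n entries takes ceil(log2 n) bits, and the mouth-track bit rate is frames per second times bits per address.

```python
from math import ceil, log2

def address_bits(codebook_entries: int) -> int:
    """Bits needed to address one of `codebook_entries` templates."""
    return ceil(log2(codebook_entries))

print(address_bits(10 * 10))  # 10 eye x 10 mouth templates -> 7
print(address_bits(128))      # 128-entry codebook -> 7
print(25 * 7)                 # 25 frame/s x 7 bits -> 175 bit/s for mouths
```

The 175 bit/s result is consistent with the slide's "less than 200 bit/s" figure for the 128-entry codebook.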

Model Based Coding (3)
- Another common way of representing objects in three-dimensional computer graphics is by a net of interconnecting polygons. A model is stored as a set of linked arrays which specify the coordinates of each polygon vertex, with the lines connecting the vertices forming the sides of the polygons. To make realistic models, the polygon net can be shaded to reflect the presence of light sources.
- The wire-frame model [Welch 1991] can be modified to fit the shape of a person's head and shoulders. The wire-frame, composed of over 100 interconnecting triangles, can produce subjectively acceptable synthetic images, provided that the frame is not rotated by more than 30 degrees from the full-face position.
- The model (see the Figure) uses smaller triangles in areas associated with high degrees of curvature, where significant movement is required. Large flat areas, such as the forehead, contain fewer triangles. A second wire-frame is used to model the mouth interior.

Key points:

- JPEG coding mechanism: DCT / zigzag scanning / adaptive quantization / VLC
- MPEG layered structure:
  - Pixel, Block, Macroblock, Field DCT coding / Frame DCT coding, Slice, Picture, Group of Pictures (GOP), Sequence, Packetized Elementary Stream (PES)
- MPEG compression mechanism:
  - Prediction
  - Motion compensation
  - Scanning
  - YCbCr formats (4:4:4, 4:2:0, etc.)
  - Profiles @ Levels
  - I, P, B pictures & reordering
  - Encoder/decoder process & block diagrams
- MPEG data transport
- MPEG timing & buffer control:
  - STC / SCR / DTS
  - PCR / PTS

Technical terms
- Macroblocks
- HVS = Human Visual System
- GOP = Group of Pictures
- VLC = Variable Length Coding/Coder
- IDCT/DCT = (Inverse) Discrete Cosine Transform
- PES = Packetized Elementary Stream
- MP@ML = Main Profile @ Main Level
- PCR = Program Clock Reference
- SCR = System Clock Reference
- STC = System Time Clock
- PTS = Presentation Time Stamp
- DTS = Decode Time Stamp
- PAT = Program Association Table
- PMT = Program Map Table

A Brief History:
- CATV appeared in the 60s in the US, where tall buildings are major obstacles to the propagation of TV signals.
- Old CATV networks:
  - Coaxial only
  - Tree-and-branch only
  - TV only
  - No return path (high-pass filters are installed in customers' houses to block low-frequency return-path noise)

Chapter 3. CATV systems
- Overview:
  - A brief history
  - Modern CATV networks
  - CATV systems and equipment

Modern CATV networks
- Key elements:
  - CO or Master Headend
  - Headends / Hubs
  - Server complex
  - CMTS
  - TV content provider
  - Optical nodes
  - Taps
  - Amplifiers (GNA/TNA/LE)

Modern CATV networks (2)
- Based on a Hybrid Fiber-Coaxial architecture, also referred to as HFC networks.
- The optical section is based on modern optical communication technologies:
  - Star, ring, mesh, etc. topologies
  - SDH/SONET for digital fibers
  - Various architectures: digital, analog or mixed fiber cabling systems

CATV systems and equipment
- Part of the forward path spectrum is used for high-speed Internet access.
- The return path is exploited for digital data communication - the root of new problems!
- FDM spectrum plan:
  - 5-60 MHz band for upstream
  - 88-860 MHz band for downstream:
    - 88-450 MHz for analog/digital TV channels
    - 450-860 MHz for Internet access
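The FDM plan above can be captured in a small lookup. The band edges are taken from the slide; how to label frequencies falling in the guard gaps is my own assumption:

```python
def catv_band(freq_mhz: float) -> str:
    """Classify a carrier frequency according to the slide's FDM plan."""
    if 5 <= freq_mhz <= 60:
        return "upstream (return path)"
    if 88 <= freq_mhz <= 450:
        return "downstream TV (analog/digital)"
    if 450 < freq_mhz <= 860:
        return "downstream Internet access"
    return "outside the plan / guard band"

print(catv_band(30))   # -> upstream (return path)
print(catv_band(200))  # -> downstream TV (analog/digital)
print(catv_band(700))  # -> downstream Internet access
```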

Spectrum allocation of CATV networks

Vocabulary (English-Vietnamese)
- Perception = Sự nhận thức
- Lap (overlay) = Phủ lên
