
Frame-accurate Compressed Domain Splicing

Dr Kevin W. Moore (Mediaware International)

Abstract
Is the seamless frame-accurate splicing of compressed digital video a myth, or is it madness? Many believe that it is impossible to seamlessly and frame-accurately splice together two streams of long-GOP MPEG-2 or MPEG-4 AVC video without taking both streams back to baseband and re-encoding, or rendering to all I-frames, since both these formats rely heavily on temporal compression. In this paper we show how splicing can occur on any frame, not just I-frames; how to fix frame dependencies which are broken during splicing by frame-type conversion and recoding; and how to ensure that Video Buffering Verifier (VBV) constraints are preserved by massaging the video bit rate in a region around the splice point. Compressed domain splicing technology offers many potential benefits to broadcasters, including improvements to video and audio quality, streamlined technical architecture and workflow, and a significant reduction in cost when properly implemented. To show how compressed domain splicing can be used in a real-world broadcast network, a case study is presented describing how Prime Television deployed a compressed domain splicing solution to deliver locally targeted advertising and region-specific content in each of their markets for both their SD and HD services.

Introduction
Compressed domain splicing is the process of digitally switching from one compressed digital video signal to another without decoding either signal back to baseband (uncompressed). Seamless frame-accurate compressed domain splicing is desirable in a commercial broadcast environment because it can reduce costs, streamline operations and improve the quality of the digital broadcast. Compressed domain video splicing can occur between two live video streams, between a live stream and a file stored on a server, and between two files stored on a server (during play out). The diagrams below show several of the basic configurations.

The perception that MPEG-2 and MPEG-4 AVC video streams cannot be spliced or edited frame-accurately at any frame has been popularly held since their adoption, despite the fact that theoretical papers describing how long-GOP MPEG-2 could be natively edited have been around since 1995, shortly after the MPEG-2 standard was published.

Theory aside, frame-accurately splicing MPEG video streams seamlessly without going back to baseband is not easy, and there are many pitfalls. While compressed domain editing has been around commercially since the late 1990s, the technology has not appeared in mainstream products until recent years. Even today, most Digital Program Insertion (DPI) systems on the market require the original content to be encoded with special encoders that include ad-insertion markers in the MPEG-2 transport stream to trigger splicing at predetermined frames in the video stream.

So why is it hard? It is hard because, unlike baseband video signals, MPEG-2 and MPEG-4 employ inter-frame coding. They exploit the temporal redundancies in the video frames by storing the differences between frames rather than storing every frame in its entirety. These dependencies make it difficult to splice at any frame, since doing so may break the encoding structure and prevent the clean decoding of frames around the splice point. It is also hard because the MPEG standards define buffer models that place constraints on the dynamic behavior of the video stream. These buffer models are required because the number of bytes used to encode each frame may vary significantly depending on frame type and scene content, and the stream must therefore be buffered appropriately by the decoder. When splicing or editing MPEG streams, if the dynamic buffering behavior is not properly handled, downstream decoders may exhibit jerky playback when their buffers overflow or underflow.

This paper reviews a number of different solutions for dealing with frame dependencies and managing buffer levels, in the hope of convincing the reader that frame-accurate compressed domain splicing of MPEG video is possible.
To convince the reader that it is also a practical and cost-effective alternative to baseband switching, a case study with Prime Television will be presented. This case study describes how Prime Television integrated an HD compressed domain splicing solution into their existing regional broadcast network.

Frame-accurate Splicing
Unlike baseband video signals, where every video frame is discrete and independently decodable, both MPEG-2 and MPEG-4 encoding employ inter-frame coding, where frames may be stored as differences from their past and/or future frames (in display order). In MPEG-2, three frame types are defined. Intra-coded pictures (I-pictures) are coded without reference to other pictures and provide only moderate compression. Predictive-coded pictures (P-pictures) are coded more efficiently using motion-compensated prediction from a past I- or P-frame.


Bidirectionally-predictive coded pictures (B-pictures) provide the highest degree of compression but require both past and future reference pictures for motion compensation.


For MPEG-2, the typical frame pattern uses a half-second GOP (12 frames for 25 fps video, 15 for 29.97/30 fps video), as shown below in display order:

... P B B | I B B P B B P B B P B B | I ...
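Note that the coded stream carries frames in decode order rather than display order, because each B-frame needs its future anchor (the next I- or P-frame in display order) to be decoded first. The following minimal sketch (our own illustration; the function name and pattern strings are not from any standard) reorders a display-order pattern into decode order:

```python
# Sketch (illustrative only): reorder a display-order MPEG frame pattern
# into the decode/transmission order used in the coded stream.

def display_to_decode(display: str) -> str:
    """Reorder a display-order frame pattern (e.g. 'IBBP' -> 'IPBB')."""
    decode = []
    pending_b = []
    for ftype in display:
        if ftype in "IP":
            decode.append(ftype)      # the anchor frame is transmitted first...
            decode.extend(pending_b)  # ...then the B-frames preceding it in display order
            pending_b = []
        else:
            pending_b.append(ftype)   # hold B-frames until their forward anchor is sent
    decode.extend(pending_b)  # trailing Bs (referencing the next GOP's I) just appended here
    return "".join(decode)

print(display_to_decode("IBBPBBPBBP"))  # -> IPBBPBBPBB
```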
Splicing or cutting the video at either a P- or B-frame will break these dependencies. The only safe place to splice in an MPEG stream is at the I-frame beginning a closed¹ Group of Pictures (GOP).

If the splice points are known in advance, then a process known as I-frame insertion can be used. In I-frame insertion, the encoder is told ahead of time which frame the splice will occur at, and alters the frame structure of the MPEG video during encoding to ensure that an I-frame and a closed GOP occur at the splice point. Markers or flags in the transport stream precede the splice point to ensure the splice takes place at the appropriate time. The main limitations of this approach are that the encoder has to be under the control of the system inserting the interstitial content and that the splice points must be known in advance, which is not always the case.

When splice points are not known in advance, a common solution is to wait and switch at the next available I-frame. The I-frame switching model is not highly desirable as it does not guarantee that the switch is exactly on a program or scene boundary. While these approaches are able to replace a baseband switching system under some circumstances, their limitations prevent them from being more widely deployed.

When splicing two arbitrary but compatible streams together, the most likely case is that the splice points do not line up with I-frames, as per the following diagram:

[Figure: Stream 1 and Stream 2, with the splice point in each stream falling mid-GOP rather than on an I-frame]

¹ A closed GOP means that frames from the current GOP cannot reference any frames from the previous GOP.

If the streams are spliced at this point, then the resulting stream will contain a number of broken frames, as shown in the following diagram.

[Figure: the spliced stream, with broken frames around the splice point where frame references now cross the splice]
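The broken frames can be identified mechanically. As a minimal sketch (our own helper, reasoning purely in display order and ignoring open-GOP subtleties), a frame in the tail of the spliced-in stream is broken if it needs an anchor that now lies on the wrong side of the cut:

```python
# Sketch (our own helper): given a display-order frame pattern and a cut
# position, list the frames at or after the cut that can no longer be
# decoded. P- and B-frames before the first post-cut I-frame have lost
# their anchor chain; from the first I-frame onward, decoding is clean.

def broken_frames(display: str, cut: int) -> list[int]:
    """Display-order indices of frames at/after `cut` with broken references."""
    broken = []
    have_anchor = False  # True once an I-frame has been seen after the cut
    for i in range(cut, len(display)):
        if display[i] == "I":
            have_anchor = True   # intra coded: always decodable
        elif not have_anchor:
            broken.append(i)     # P or B still referencing a pre-cut frame
    return broken

print(broken_frames("PBBIBBPBB", 1))  # -> [1, 2]
```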

There are a number of published approaches, ranging from straight re-encoding of the broken frames in the region of the splice point, to smart techniques using frame-type conversion that reuse much of the original compression information to minimize image degradation. The following diagram shows several solutions to the splice problem above.

[Figure: three alternative recoded/restructured frame patterns for the spliced stream, each repairing the broken region with a different combination of new I- and P-frames]

While any decoding and re-encoding introduces loss, a properly designed splicing engine will minimize the image degradation around the splice point, and frames outside the region of the splice will suffer no loss of image quality; unlike baseband splicing solutions, which require a full decode and re-encode of the streams. One of the main unspoken issues with recoding splice points is that the coding efficiency of the new region is invariably reduced from the original structure. Any new structure requires the creation of a new I-frame and of shorter GOPs. Since B- and P-frames are more efficient than I-frames, reducing the ratio of B- and P-frames to I-frames reduces the coding efficiency of the video in the region of the splice. Increasing the bits used to represent the new GOP will counter the reduction in coding efficiency; however, as the next section shows, increasing the bits may cause buffer issues in downstream decoders, which result in playback issues.

[Figure: splicing Stream 1 and Stream 2 produces shorter, less efficient GOPs around the splice point in the spliced stream]
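One frame-type conversion strategy can be sketched as follows (a simplified illustration of the restructuring idea, not the exact rule set of any product): the first frame of the spliced-in tail is recoded as an I-frame starting a closed GOP, broken B-frames ahead of the tail's first original anchor become P-frames, and the remaining structure is reused, with its frames recoded against the new references:

```python
# Sketch of one frame-type conversion strategy (simplified illustration):
# repair the display-order tail of the spliced-in stream so it decodes
# cleanly without referencing anything before the splice.

def restructure(tail: str) -> str:
    """Repair the display-order tail of the spliced-in stream."""
    if not tail:
        return tail
    out = ["I"]                      # first spliced frame must be intra coded
    passed_anchor = tail[0] in "IP"
    for ftype in tail[1:]:
        if passed_anchor:
            out.append(ftype)        # beyond the first anchor, keep the original types
        elif ftype in "IP":
            passed_anchor = True
            out.append(ftype)
        else:
            out.append("P")          # broken B recoded as a forward-predicted frame
    return "".join(out)

print(restructure("BBPBBIBB"))  # -> IPPBBIBB
```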

Video Buffer Constraints


The MPEG-2 standard defines the Video Buffering Verifier (VBV) requirement on the video bit stream, and the MPEG-4 AVC standard defines a similar Hypothetical Reference Decoder (HRD) model. In both standards, the video bit stream's dynamic behavior must conform to the constraints imposed by these buffer models. Bytes are fed into these buffers at a certain rate (the bit rate), and all the bytes for each video frame are removed from the buffer at the frame's decoding time. If the buffer never overflows or underflows over the duration of the video sequence, then the stream is well behaved. If overflow or underflow does occur, then downstream decoders may suffer from playback issues.

Normally the rate control mechanism of the encoder is in effect for the entire duration of the video sequence to ensure that the buffer model parameters are followed. When splicing together two video streams, possibly from different encoders, however, the integrity of the original rate control may be lost.

For MPEG-2, I-frames are typically several times larger than P-frames, which in turn are several times larger than B-frames. For scenes with very little movement, the best coding quality is obtained by using larger I-frames and smaller P- and B-frames. For scenes with lots of movement, using smaller I-frames and larger P- and B-frames produces the best quality. The following diagram shows a typical decoder buffer level for an MPEG-2 video stream using an IBP structure and a GOP size of 12.
[Figure: decoder buffer level over time; the level drops sharply when a large I-frame is removed, less for P-frames, and least for B-frames]
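The buffer behavior can be checked with a simple simulation. The sketch below is our own simplification of the VBV/HRD models (constant bit rate, one frame interval per frame, no vbv_delay handling, overflow simply clamped, illustrative frame sizes): bits arrive at the channel rate and each frame's bits leave at its decoding time.

```python
# Sketch: simplified constant-bit-rate decoder buffer check. Bits arrive
# continuously at the channel rate; all bits for a frame are removed at
# the frame's decode instant. Underflow means the frame's bits had not
# fully arrived in time.

def check_buffer(frame_bits, bit_rate, fps, buffer_size, initial_fill):
    """Return (ok, trace) where ok is False on underflow."""
    per_frame_input = bit_rate / fps
    level = initial_fill
    trace = []
    for bits in frame_bits:
        if bits > level:
            return False, trace       # underflow: frame not fully buffered in time
        level -= bits                 # all bits removed at the decode instant
        level = min(level + per_frame_input, buffer_size)  # refill, clamped
        trace.append(level)
    return True, trace

# Illustrative sizes: a large I-frame, two small Bs, a mid-sized P.
ok, _ = check_buffer([400_000, 80_000, 80_000, 150_000],
                     bit_rate=4_000_000, fps=25,
                     buffer_size=1_835_008, initial_fill=600_000)
print(ok)  # -> True
```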

If this sequence is spliced into another sequence midway through the GOP, then the decoder buffer may underflow, as shown in the diagram below, if the next sequence starts with a large I-frame.
[Figure: the second sequence causes an underflow of the buffer]

By massaging the bit rate of the frames around the splice point, this underflow can be avoided, as shown below.
[Figure: massaging the bit rate of a neighbourhood of frames prevents the underflow]

Whether the entire splice region is re-encoded or smart frame-restructuring is performed, it is critical that the VBV/HRD video buffer levels are taken into account and managed. One additional note: if the GOP resulting from the splice is small, the reduction in coding efficiency may mean that any attempt at reducing the bit rate results in undesirable quantization artifacts. Under these conditions, the short GOP can be merged with the neighbouring GOP (turning that GOP's I-frame into a P-frame), which improves the coding efficiency and the resulting image quality.
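The massaging step can itself be sketched (our own illustration; the window size, scaling step and frame sizes are invented): while the simulated buffer underflows, the frames in a window around the splice are requantised more coarsely, modelled here as simply scaling their coded size down.

```python
# Sketch of bit-rate massaging around a splice point: shrink the coded
# size of frames in a window around the splice (i.e. requantise them more
# coarsely when recoding) until the simplified buffer model passes.

def underflows(frame_bits, per_frame_input, initial_fill):
    """True if removing each frame's bits at decode time ever empties the buffer."""
    level = initial_fill
    for bits in frame_bits:
        if bits > level:
            return True
        level = level - bits + per_frame_input
    return False

def massage(frame_bits, splice_idx, window, per_frame_input, initial_fill, step=0.9):
    """Shrink the frames around the splice until no underflow occurs."""
    sizes = list(frame_bits)
    lo, hi = max(0, splice_idx - window), min(len(sizes), splice_idx + window)
    for _ in range(50):
        if not underflows(sizes, per_frame_input, initial_fill):
            return sizes
        for i in range(lo, hi):      # coarser quantisation => fewer coded bits
            sizes[i] = int(sizes[i] * step)
    raise ValueError("buffer model could not be satisfied in the window")
```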

Splicing in a commercial environment


For a frame-accurate compressed domain splicing system to be successfully deployed into a broadcast environment, it must address a number of engineering issues.

If the coding profiles and bit rates of the splicing sources do not match, the system has to be able to convert one source to match the other. Typical conversions include:

- Aspect ratio conversion
- Changing SD to HD or HD to SD
- Matching bit rates (transrating)
- Transcoding and replicating audio streams

If either source carries ancillary streams or service information, these must be spliced or passed through as required. For instance, the Teletext stream can be passed through from the primary source, follow the spliced source, or both (e.g. the Teletext captions can be spliced from the current spliced source and the remaining Teletext channels taken from the primary source).

The system must be able to interface with existing automation systems, so that they can control precisely when splices occur and receive status in return. It must also be able to support the workflows of existing baseband switching systems.

Prime HD Television Case Study


To demonstrate how compressed domain splicing can be used in practice, a case study will be described: a splicing system installed at Prime Television in late 2008 to provide a cost-effective move to regionalised High Definition. The business objectives for Prime were to add high definition services in line with their centralised operation and existing market regionalisation capability, without significantly impacting operational cost or operational workflow.

The key requirements for project success were:

- To improve the quality of the Direct-To-Home service with the addition of regionalised HD services
- To provide a simple mechanism for the addition of new commercial markets
- To add minimal additional operational expense

Prime's original architecture performed ad insertion by first decoding the content, switching in baseband, and subsequently re-encoding for transmission. The major drawback with this approach is that the equipment required is costly, demanding the presence of decoders, encoders, video servers, and baseband upconverters in each market. In addition, the distribution feed has to be encoded at a higher bit rate than necessary for broadcast in order to offset the quality loss due to the decode and re-encode cycle. The combined expense of distribution bandwidth, compression hardware and baseband infrastructure limited the use of local ad insertion to only those markets that could justify the cost. These costs are multiplied for each new SD service introduced, and are higher again for new HD services.

Prime, however, would not consider a compressed domain splicing architecture a viable option unless the insertion and switching paradigm which their operators were used to working with could be supported. In simple terms, the splicing solution had to replicate their existing baseband operation to be a success.

To meet the project's objectives, frame-accurate compressed domain splicers were deployed into all of Prime's local markets, near the transmitters. Conversion servers were deployed to take the ad content from the existing SD video server/library and conform it to match the HD broadcast profile.

A centrally-controlled schedule is used to coordinate the splicing of local ads and region-specific linear content directly into the main network feed. Individual splicers are controlled through a TCP/IP interface using a proprietary protocol (ASCP, the Asynchronous Splicer Control Protocol) derived from the traditional Video Disk Control Protocol (VDCP) and GVG TenXL switcher protocols. The translation of the protocols is bidirectional, thereby providing feedback into the automation using the traditional protocols without any automation changes being required. ASCP provides a command set to initiate splices and to return the status and contents of the local library of interstitial content. Splice commands are referenced against the SMPTE timecode carried in the video, allowing the splicer to frame-accurately insert material into the primary stream.

[Figure: system architecture; the automation interface and the HD service (ASI/IP, MPTS/SPTS) feed the HD splicer, while an SD-to-HD conversion server creates HD TS files from the existing SD video server/library]
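Resolving a splice command's timecode to a frame in the stream is a small but essential step; the sketch below (our own illustration, handling only non-drop-frame timecode at an integer frame rate such as 25 fps PAL; 29.97 fps drop-frame compensation is deliberately omitted, and ASCP itself is proprietary and not shown) converts an SMPTE timecode to an absolute frame number:

```python
# Sketch: resolve a non-drop-frame SMPTE timecode to an absolute frame
# number so a splice command can be matched against a frame in the stream.

def timecode_to_frame(tc: str, fps: int = 25) -> int:
    """'HH:MM:SS:FF' -> frame count since 00:00:00:00."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    if ff >= fps:
        raise ValueError("frame field exceeds frame rate")
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

print(timecode_to_frame("10:00:30:12"))  # -> 900762
```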

This solution allowed Prime to architect their network to deliver simulcast locally targeted advertising and region-specific content in each of their markets, with the HD services matching the SD service, enabling them to maintain continuity without compromising their operational flexibility and workflow. For viewers, this approach has provided a traditional feel when watching HD, with seamless delivery of program and short-form material.

Based on the experience of the HD solution, Prime have now commenced deployment and validation of a new platform to replace the SD baseband insertion, introducing both MPTS and multiple-ASI switching. Future automation integration with the splicers will be via direct ASCP, removing the need for protocol translators from traditional broadcast devices.

Conclusion
This paper has identified and presented solutions to the key challenges in performing seamless frame-accurate compressed domain splicing of MPEG-2 and MPEG-4 AVC streams. It has shown how frame dependencies which are broken during the splice can be corrected by recoding and restructuring GOPs at the splice point. It has highlighted the potential problems with video buffer levels and shown how these can be addressed by massaging the bit rate in a region around the splice point. To further show that seamless frame-accurate compressed domain splicing is a practical and cost-effective solution in a commercial broadcast environment, a case study was presented showing how Prime Television deployed a splicing solution to deliver locally targeted advertising and region-specific content in each of their markets for both their SD and HD services.

Bibliography
[1] ISO/IEC 13818, MPEG-2 Standards.
[2] ISO/IEC 14496, MPEG-4 Standards.
[3] Jianhao Meng and Shih-Fu Chang, "Buffer Control Techniques for Compressed-Domain Video Editing", IEEE Proc. International Symposium on Circuits and Systems (ISCAS), Vol. 2, pp. 600-603, 1996.
[4] R. Egawa, A. A. Alatan, and A. N. Akansu, "Compressed Domain MPEG-2 Video Editing with VBV Requirement", IEEE Proc. International Conference on Image Processing (ICIP), Vol. 1, pp. 1016-1019, 2000.
[5] Akio Yoneyama, Yasuhiro Takishima and Yasuyuki Nakajima, "A Fast Frame-accurate H.264/MPEG-4 AVC Editing Method", IEEE International Conference on Multimedia and Expo (ICME), Vol. 6, pp. 1298-1301, 2005.
[6] K. Talreja and P. V. Rangan, "Editing Techniques for MPEG Multiplexed Streams", IEEE International Conference on Multimedia Computing and Systems '97, pp. 278-285, 1997.
[7] P. J. Brightwell, S. J. Dancer and M. J. Knee, "Flexible Switching and Editing of MPEG-2 Video Bitstreams", International Broadcasting Convention (IBC 97), IEE Conference Publication, pp. 547-552.
