
Adaptive Frame Level Rate Control in 3D-HEVC

Songchao Tan (1), Junjun Si (2), Siwei Ma (2), Shanshe Wang (3), Wen Gao (2)

(1) School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
(2) Institute of Digital Media, Peking University, Beijing 100871, China
(3) School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150080, China

Abstract: In this paper, we propose a new frame-level rate control algorithm for the high efficiency video coding (HEVC) based 3D video (3DV) compression standard. The proposed scheme provides a new initial quantization parameter (QP) decision method and investigates the bit allocation for each view in order to smooth bitrate fluctuation and achieve accurate rate control. In addition, a simplified complexity estimation method for the extended views is introduced to reduce computational complexity while improving coding performance. Experimental results on 3DV test sequences demonstrate that the proposed algorithm achieves better R-D performance and more accurate rate control than the benchmark algorithms in HTM10.0. The maximum performance improvement is 12.4%, and the average BD-rate gains for the three views are 5.2%, 6.5% and 6.6%, respectively.

Index Terms: rate control, initial QP, bit allocation, 3D video coding

I. INTRODUCTION
With today's rapid development of 3D acquisition and display technologies such as auto-stereoscopic 3D TV [1] and free-viewpoint video [2], 3D video has attracted increasing interest. To better support various stereoscopic and 3D video applications, the Moving Picture Experts Group (MPEG) started 3D video (3DV) standardization efforts, which are currently continued in the Joint Collaborative Team on 3D Video Coding (JCT-3V). JCT-3V uses both H.264/AVC and HEVC [3] to develop 3DV standards. One of them, 3D-HEVC, is based on HEVC and targets tools for both texture views and depth views.
The typical 3DV format is stereo-view video, which provides each eye with a separate video at the same time. The disparity between the two videos creates the illusion of depth for the viewer. Because a more comprehensive view is needed, 3D-HEVC addresses the more general case of n-view multi-view video. With the depth image-based rendering (DIBR) technique, a scene with a large viewing angle can be obtained from a smaller amount of data [4].
Rate control (RC) is employed in video coding to regulate the bitrate while guaranteeing good video quality. In 3D-HEVC it becomes more complicated, because multiple views are coded and each view contains two kinds of video sequences (i.e., texture and depth map). Two RC algorithms, Unified Rate Quantization (URQ) [5] and R-lambda [6], have been proposed for 3D-HEVC so far. In [7], an inter-view mean absolute difference (MAD) prediction based on the depth map is introduced to improve the accuracy of the MAD prediction. However, because the bit allocation for each view is not well suited to 3D-HEVC, the PSNR performance, the fluctuation of the generated bits and the bit error are all unsatisfactory. To solve these problems, the proposed RC algorithm uses a bit allocation scheme that takes the particular characteristics of the new video coding standard into account.
As an important part of the RC algorithm, a new initial quantization parameter (QP) decision scheme is proposed for 3D-HEVC. By taking the characteristics of the video into account, our initial QP decision scheme provides smoother quality and less bitrate fluctuation. In addition, to achieve higher quality and coding efficiency, we improve the R-Q model by introducing inter-view complexity estimation.
The rest of this paper is organized as follows. Section II describes the proposed algorithm: Section II-A presents the initial QP decision, Section II-B investigates the bit allocation for the base and extended views, and Section II-C designs the inter-view complexity estimation for the proposed R-Q model. Section III gives experimental results that demonstrate the efficiency of the proposed RC algorithm. Finally, Section IV concludes the paper.
II. PROPOSED RATE CONTROL ALGORITHM FOR HEVC-3DV
A. The decision of Initial QP
The initial QP decision is important for video coding. Existing schemes commonly decide the initial QP from the bits per pixel (bpp), defined in (1):

$$ \mathrm{bpp} = \frac{\mathrm{bitrate}}{w \times h \times f} \qquad (1) $$

where bpp is the number of bits per pixel, w and h are the width and height of the sequence, and f is the frame rate of the video.
However, the bpp-based initial QP strategy widely used in RC algorithms does not work well in 3D-HEVC, because bpp reflects only the bitrate and ignores the characteristics of the video. For instance, as shown in Table I, the bpp of Poznan_Hall2 when the anchor's initial QP is 25 is close to the bpp of Undo_Dancer when the anchor's initial QP is 35. Consequently, a bpp-based scheme chooses an initial QP that is too large for Poznan_Hall2.
In this paper, based on extensive experimental analysis, we propose an improved initial QP decision scheme based on bpp per variance of gradient (BPVG), defined as follows.

978-1-4799-6139-9/14/$31.00 2014 IEEE


$$ \mathrm{BPVG} = \frac{\mathrm{bpp} \times 10^{5}}{VG}, \qquad VG = D(\mathrm{gradient}) \qquad (2) $$

where VG is the variance of the per-pixel gradient in the first frame.
As shown in Table I, because the characteristics of the video are taken into account, the proposed initial QP decision resolves the problem described above.
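The quantities in (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not specify its gradient operator, so the sketch uses the sum of absolute forward differences; the scale factor is read as 10^5; and the function names are hypothetical. The mapping from BPVG to an initial QP is not given in the text and is therefore not shown.

```python
def bits_per_pixel(bitrate_bps, width, height, frame_rate):
    """Equation (1): bits per pixel for a bitrate given in bits per second."""
    return bitrate_bps / (width * height * frame_rate)


def gradient_variance(frame):
    """Variance of a per-pixel gradient magnitude over one luma frame.

    ASSUMPTION: the gradient is |dx| + |dy| from forward differences;
    the paper does not say which gradient operator it uses.
    """
    height, width = len(frame), len(frame[0])
    grads = []
    for y in range(height - 1):
        for x in range(width - 1):
            dx = frame[y][x + 1] - frame[y][x]
            dy = frame[y + 1][x] - frame[y][x]
            grads.append(abs(dx) + abs(dy))
    mean = sum(grads) / len(grads)
    return sum((g - mean) ** 2 for g in grads) / len(grads)


def bpvg(bitrate_bps, width, height, frame_rate, first_frame):
    """Equation (2): bpp scaled by 1e5 and divided by the gradient variance."""
    vg = gradient_variance(first_frame)
    if vg == 0:
        raise ValueError("flat frame: gradient variance is zero")
    return bits_per_pixel(bitrate_bps, width, height, frame_rate) * 1e5 / vg
```

For example, `bits_per_pixel(810690, 1920, 1088, 25)` reproduces the bpp of about 0.0155 listed in Table I for 810.69 Kbps, assuming a frame rate of 25 frames per second.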
TABLE I  INITIAL QP FOR DIFFERENT SEQUENCES

Class      Sequence      Anchor QP  Kbps     bpp     bpp-based initial QP  Proposed initial QP
1920*1088  Poznan_Hall2  25         810.69   0.0155  35                    25
                         30         344.89   0.0066  40                    30
                         35         177.24   0.0034  47                    35
                         40         98.51    0.0019  49                    40
1920*1088  Undo_Dancer   25         4626.44  0.0923  25                    25
                         30         2005.90  0.0384  30                    30
                         35         916.22   0.0175  35                    35
                         40         426.33   0.0082  40                    40

B. Bit allocation for 3D-HEVC
In this section, we first briefly review bit allocation in video coding and then detail the proposed method. Bit allocation plays a key role in video coding. In transform coding, bit allocation distributes the bit budget among different groups of transform coefficients to minimize the overall quantization distortion. The optimal bit allocation problem was first presented by Huang in [8]. In video coding, optimal bit allocation is combined with the rate control scheme to improve coding quality.
In [9], the bits are allocated to each frame by weighted averaging, and the target bits of an I-SLICE are multiplied by five because the quality of the I-SLICE has a great impact on the following frames. This scheme works well for the Low Delay (LD) configuration, but in the Random Access (RA) configuration the performance and the bitrate error are unsatisfactory. Our analysis shows that the reason is that the scheme is based on the rate GOP [10] (in RA, a rate GOP is composed of eight frames):

$$ T_{gop} = \frac{R}{F} \times N \qquad (3) $$

where R is the target bitrate, F is the frame rate of the sequence and N is the number of frames in a rate GOP. When a frame is an I-SLICE, its target bits are multiplied by five, but the extra bits consumed by the I-SLICE are merely absorbed by the overflow/underflow handling strategy instead of being budgeted explicitly. This explains why the scheme works well only in LD, which has just one I-SLICE.
To solve this problem, we carried out a series of experiments and exploit two properties of the I-SLICE: the PSNR of a frame has no relation to the frames before the I-SLICE, and each GOP contains only one I-SLICE. Based on these properties, we propose a GOP-based bit allocation. As shown in Fig. 1, the target bits for a frame are calculated as in (4):

$$ target_{n,i} = \begin{cases} bpf \times (GOP - gop + 1) \times w_i, & n = 1 \\ bpf \times GOP \times w_i, & n \in [2, m-1] \\ bpf \times frame\_last \times w_i, & n = m \end{cases} \qquad (4) $$

$$ bpf = \frac{bitrate}{framerate} \qquad (5) $$

where $target_{n,i}$ is the target bits for the i-th frame of the n-th GOP, $w_i$ is the weight of the frame, and $frame\_last$ is the number of frames in the last GOP.

Fig. 1 Proposed bit allocation scheme

When overflow or underflow occurs, the difference between the target bits and the actual bits of a GOP is distributed equally among the GOPs that remain to be coded. In addition, a video buffer verifier (VBV) model is established to smooth quality fluctuation.

C. Proposed R-Q Model
The trade-off between the output bitrate (R) and the quality (D) of the compressed video is determined by the quantization step size (Qs), which is indexed by the quantization parameter (QP). The R-Q model has been studied extensively for previous video coding standards such as H.264/AVC and H.265/HEVC. Based on experimental analysis, we propose the following R-Q model:

$$ R = \alpha \cdot \frac{X}{QP} \qquad (6) $$

where $\alpha$ is the model parameter, R is the coding rate, X is the complexity estimate for the current picture and QP is the quantization parameter. X is computed as

$$ X = \frac{\sum_{i=0}^{n-1} w_i \cdot SAD_i}{\sum_{i=0}^{n} w_i \cdot SAD_i} \times R_{n-1} \times QP_{n-1} \qquad (7) $$

where n is the current frame number, $QP_{n-1}$ is the quantization parameter of the (n-1)-th frame, $R_{n-1}$ is the actual bits of the (n-1)-th frame, and $w_i$ is the weight of the SAD values of previously encoded frames. The model also uses a constant whose recommended value is 0.6.
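The GOP-based allocation in (4) and (5) and the R-Q model in (6) can be chained: first derive a target for each frame, then turn that target into a QP. The following Python sketch illustrates that flow under several assumptions that are not stated in the text: "GOP" in (4) is read as the intra period and "gop" as the rate-GOP size, the frame weights are normalised so that the targets sum to the GOP budget, the model parameter of (6) is called `alpha`, and the resulting QP is clipped to the HEVC range 0 to 51. All function names are hypothetical.

```python
def frames_in_segment(n, m, intra_period, rate_gop, frames_last):
    """Frame count of the n-th GOP (1-based) out of m, following the cases of (4).

    ASSUMPTION: "GOP" is the intra period and "gop" the rate-GOP size.
    """
    if n == 1:
        return intra_period - rate_gop + 1
    if n == m:
        return frames_last
    return intra_period


def frame_targets(bitrate, frame_rate, frame_count, weights):
    """Split one GOP's budget, bpf * frame_count, across its frames by weight.

    ASSUMPTION: weights are normalised so the targets sum to the budget.
    """
    bpf = bitrate / frame_rate  # equation (5): bits per frame
    budget = bpf * frame_count
    total = sum(weights)
    return [budget * w / total for w in weights]


def qp_from_target(target_bits, complexity, alpha, qp_min=0, qp_max=51):
    """Invert R = alpha * X / QP from (6) and clip to the HEVC QP range."""
    if target_bits <= 0:
        return qp_max  # no budget left: fall back to the coarsest quantizer
    qp = alpha * complexity / target_bits
    return int(min(qp_max, max(qp_min, round(qp))))
```

A caller would typically compute `frames_in_segment(...)` once per GOP, pass one weight per frame to `frame_targets(...)`, and then call `qp_from_target(...)` for each frame with its complexity estimate X.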
Fig. 2 shows the relationship between the extended views and the base view: there is a strong correlation between adjacent viewpoints. Extensive experimental analysis shows that the complexity of a frame in an extended view can be accurately replaced, after appropriate scaling, by that of the base-view frame that has just been coded in the same GOP. Note that the I-SLICE must be handled independently, because the extended views have no I-SLICE. Based on statistical analysis, we propose a linear model to estimate the complexity of the dependent views, as in (8):

$$ \frac{SAD_{view_n}}{SAD_{view_0}} = \frac{target_n}{target_0}, \quad n = 1, 2 \qquad (8) $$

where $SAD_{view_n}$ is the complexity of the n-th view and $target_n$ is the target bits of view n.

Fig. 2 The complexity (SAD) of view 0, view 1 and view 2 (horizontal axis: frame number; vertical axis: sum of the complexity, ×10^6)

III. EXPERIMENTAL RESULTS
The experiments are conducted under the 5-view scenario, in which 3 views are coded and 2 virtual views are synthesized. The test sequences are Kendo, Balloons, Newspaper, GT_Fly, Poznan_Hall2, Poznan_Street, Undo_Dancer and Shark. The texture and depth maps of the three views are encoded separately with the HTM10.0 encoder as I-view, P-view and P-view, respectively. For P-frames, inter-view prediction is involved. 300 frames are encoded for each view.

A. R-D Performance and Rate Accuracy
To evaluate the performance of the proposed RC algorithm, the R-lambda algorithm proposed in [6] is used for comparison. First, the R-D curves of the proposed algorithm and R-lambda are shown in Fig. 3. Then the performance comparison with the RC algorithms in the latest reference software HTM10.0 is given in Table II.
To evaluate the accuracy of the achieved bitrate, the following measure is adopted:

$$ Error = \frac{|R_{actual} - R_{target}|}{R_{target}} \times 100\% \qquad (9) $$

where Error is the bit error, $R_{actual}$ is the total number of bits used to encode the depth maps and texture of the three views, and $R_{target}$ is the corresponding target.
The errors of the proposed scheme, the URQ model and the R-lambda model are presented in Table II. The proposed RC scheme is accurate, with a mismatch of less than 1.0% on average.
Fig. 3 shows the R-D curves of the two algorithms for the sequences Poznan_Hall2 and GT_Fly. In each of the four plots, the proposed scheme achieves a higher PSNR than the R-lambda scheme at each target bitrate.

Fig. 3 The rate-distortion curves of the proposed rate control scheme and of the rate control in the reference software HTM10.0 (R-lambda). Panels: Poznan_Hall2 (view 0), Poznan_Hall2 (view 1), GT_Fly (view 0), GT_Fly (view 1); horizontal axis: bit rate (kbps); vertical axis: PSNR (dB).
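The accuracy measure in (9) is a relative mismatch expressed in percent. A minimal Python sketch follows; the function name and the guard against a non-positive target are illustrative additions, not part of the paper.

```python
def bitrate_error_percent(actual_bits, target_bits):
    """Equation (9): absolute bit mismatch relative to the target, in percent."""
    if target_bits <= 0:
        raise ValueError("target_bits must be positive")
    return abs(actual_bits - target_bits) / target_bits * 100.0
```

Because the absolute value is taken, overshooting and undershooting the target by the same amount give the same error.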


TABLE II  PERFORMANCE COMPARISON BETWEEN THE PROPOSED RC SCHEME, R-LAMBDA AND URQ

                Proposed vs R-lambda (BD-rate)            Proposed vs URQ (BD-rate)                 Bitrate error
Sequence        Video 0  Video 1  Video 2  Video PSNR     Video 0  Video 1  Video 2  Video PSNR     URQ     R-lambda  Proposed
Balloons        -3.2%    -4.5%    -5.2%    -3.9%          -25.3%   -45.0%   -43.4%   -33.0%         0.59%   1.47%     0.26%
Kendo           -4.3%    -4.8%    -5.3%    -4.3%          -33.0%   -50.2%   -50.0%   -39.9%         0.82%   0.58%     0.33%
Newspaper_CC    -1.8%    -2.0%    -2.8%    -2.5%          -27.6%   -55.2%   -42.2%   -37.2%         1.83%   2.61%     0.49%
GT_Fly          -6.8%    -12.5%   -12.3%   -8.4%          -31.1%   -52.4%   -51.8%   -36.4%         0.64%   1.46%     1.58%
Poznan_Hall2    -12.4%   -12.2%   -12.1%   -12.2%         -14.5%   -30.9%   -32.4%   -21.9%         0.69%   2.88%     0.44%
Poznan_Street   -3.9%    -5.2%    -4.3%    -4.1%          -33.4%   -48.7%   -69.4%   -44.7%         0.32%   6.01%     3.53%
Undo_Dancer     -6.6%    -6.1%    -6.5%    -6.3%          -24.8%   -42.2%   -40.9%   -30.4%         0.33%   0.98%     0.31%
Shark           -2.8%    -4.4%    -3.8%    -3.6%          -12.1%   -16.8%   -14.3%   -14.4%         0.51%   1.43%     0.40%
1024*768        -3.1%    -3.7%    -4.4%    -3.6%          -28.7%   -50.1%   -45.2%   -36.7%         1.08%   1.55%     0.36%
1920*1088       -6.5%    -8.1%    -7.8%    -6.9%          -23.2%   -38.2%   -41.7%   -29.6%         0.50%   2.55%     1.25%
average         -5.2%    -6.5%    -6.6%    -5.7%          -25.2%   -42.7%   -43.0%   -32.3%         0.72%   2.18%     0.92%
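As a consistency check on Table II, the class and overall averages of the Video 0 column (Proposed vs R-lambda) can be recomputed from the per-sequence values. The sketch below assumes that the averages are plain arithmetic means rounded to one decimal place and that the 1024*768 class consists of Balloons, Kendo and Newspaper_CC; neither is stated explicitly in the text. The other columns may have been averaged from unrounded values and need not reproduce exactly.

```python
from statistics import mean

# Video 0 values (%), Proposed vs R-lambda, copied from Table II.
video0 = {
    "Balloons": -3.2, "Kendo": -4.3, "Newspaper_CC": -1.8,
    "GT_Fly": -6.8, "Poznan_Hall2": -12.4, "Poznan_Street": -3.9,
    "Undo_Dancer": -6.6, "Shark": -2.8,
}

# ASSUMPTION: class membership by resolution, as in the 3DV test set.
CLASS_1024 = ["Balloons", "Kendo", "Newspaper_CC"]
CLASS_1920 = ["GT_Fly", "Poznan_Hall2", "Poznan_Street", "Undo_Dancer", "Shark"]


def average(names):
    """Arithmetic mean of the listed sequences, rounded to one decimal."""
    return round(mean(video0[name] for name in names), 1)


avg_1024 = average(CLASS_1024)
avg_1920 = average(CLASS_1920)
avg_all = average(list(video0))
```

Under these assumptions the three results agree with the 1024*768, 1920*1088 and average rows of the Video 0 column.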

B. Verification of the Smoothness of Quality
In this section, we verify the effectiveness of the bit allocation scheme proposed in Section II-B. Because the quality of URQ is unacceptable compared with the other two algorithms, only the proposed RC algorithm and the R-lambda algorithm in HTM10.0 are compared. For the sequence Kendo with an initial QP of 25, the curves of generated bits are shown in Fig. 4. The R-lambda model exhibits more severe bit fluctuation, which can cause transmission delay. Thus, the proposed scheme is more suitable than the R-lambda model for strict real-time applications.

Fig. 4 Generated bits curves of Kendo for R-lambda and the proposed scheme (horizontal axis: frame number, 150 to 250; vertical axis: bits, ×10^3)

IV. CONCLUSIONS
In this paper, we proposed a frame-level RC algorithm to achieve better quality for 3D-HEVC. Based on our R-Q model and bit allocation scheme, an R-D optimized RC algorithm is derived. Experiments on different video sequences demonstrate the effectiveness of the proposed algorithm.

ACKNOWLEDGEMENT
This work was supported in part by the National High-tech R&D Program of China (863 Program, 2012AA010805) and the National Science Foundation of China (61322106, 61390515), which are gratefully acknowledged.

REFERENCES
[1] G. Eason, B. Noble, and I. N. Sneddon, "On certain integrals of Lipschitz-Hankel type involving products of Bessel functions," Phil. Trans. Roy. Soc. London, vol. A247, pp. 529-551, Apr. 1955.
[2] "3D-HEVC Test Model Description draft 1," JCT on 3D Video Coding Extension Development of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, JCT2-A1005, Sep. 2012.
[3] B. Bross, W.J. Han, J. Ohm, G. Sullivan and T. Wiegand, "High efficiency video coding (HEVC) text specification draft 8," JCTVC-J1003, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, JCTVC-J1003_d2.doc, 10th Meeting: Stockholm, Sweden, 11-20 July, 2012.
[4] Siwei Ma, Shiqi Wang, and Wen Gao, "Zero-Synthesis View Difference Aware View Synthesis Optimization For HEVC Based 3D Video Compression," Visual Communication and Image Processing, San Diego, USA, Nov. 2012.
[5] H. Choi, J. Nam, J. Yoo, D. Sim, and I. V. Bajić, "Rate control based on unified RQ model for HEVC," JCT-VC of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, JCT-VC H0213 (m23088), San José, CA, USA, Feb. 2012.
[6] B. Li, H. Li, L. Li, and J. Zhang, "Rate control by R-lambda model for HEVC," ITU-T SG16 Contribution, JCTVC-K0103, Shanghai, pp. 1-5, Oct. 2012.
[7] W. Lim, D. Sim and I. V. Bajić, "JCT3V Improvement of the rate control for 3D multi-view video coding," ISO/IEC JTC1/SC29/WG11, JCT3V-C0090, Geneva, Switzerland, Jan. 2013.
[8] J. J. Huang and P. M. Schultheiss, "Block quantization of correlated Gaussian random variables," IEEE Trans. Commun. Syst., vol. CS-11, pp. 289-296, Sept. 1963.
[9] J. Si, S. Ma, W. Gao, "JCT-VC-Adaptive rate control for HEVC," ISO/IEC JTC1/SC29/WG11, JCTVC-I0433, Geneva, CH, 27 April-7 May 2012.
[10] S. Wang, S. Ma, S. Wang, D. Zhao, and W. Gao, "Rate-GOP Based Rate Control for High Efficiency Video Coding," IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 6, pp. 1101-1111, Dec. 2013.