
Real Time Super Resolution Reconstruction for Video Stream based on GPU

Jie Hu
School of Computer Science, Northwestern Polytechnical University
Xi'an, China
jiehu@mail.nwpu.edu.cn

Hailiang Li
Department of Electronic and Information Engineering, The Hong Kong Polytechnic University
Hong Kong, China
Harley.li@connect.polyu.hk

Ying Li
School of Computer Science, Northwestern Polytechnical University
Xi'an, China
lybyp@nwpu.edu.cn

Abstract: This paper proposes an efficient and structure-based image/video (interframe independent) super resolution (SR) scheme. Image/video SR is currently a very active area of research because it is used in various applications. The basic idea of the proposed scheme is based on the concepts of pattern redundancy and parallel computing. Global and local motions of pixels are estimated by exploiting the repeated and structured patterns that exist in most natural images. The value of a missing pixel in the desired high resolution image is obtained by calculating the weighted average of the pixels selected according to the estimated motions, which preserves the intrinsic geometric structure of the original low resolution image/video stream. The estimated pixel values are then corrected by an iterative back projection (IBP) approach in which auxiliary high-frequency information is embedded. The proposed scheme is mainly implemented with a parallel programming strategy based on Compute Unified Device Architecture (CUDA), to reconstruct the desired high resolution image/video stream from its low resolution counterpart. Experimental results show that the proposed method performs well in both visual quality and real-time processing speed.

Index Terms: super resolution, non-local similarity, motion estimation, real time, GPU, CUDA

I. INTRODUCTION

Spatial resolution is one of the most important quality metrics of digital images. The higher the resolution, the more image detail is available. It is desirable to convert low resolution (LR) images or video streams to high resolution (HR) ones in real time before they are displayed on the screens of users' devices, such as televisions, personal computers and mobile terminals. In addition, web videos need video sequence enhancement [1] because they are stored in low quality due to limited bandwidth and server storage. Smartphone users seek higher quality images and videos by converting their LR counterparts online. All of these requirements can potentially be met at low cost by using super resolution (SR) techniques.

SR is a computationally complex and numerically ill-posed problem. In recent years, many researchers have paid much attention to learning-based SR methods [2,3], in which the human-defined constraints of regularization methods are replaced with learned co-occurrence prior knowledge for solving the ill-posed problem. However, learning-based SR methods and sparse-representation-based image SR methods both have high computational complexity and depend strongly on the training set used. These obstacles severely restrict the application of these methods in the real world. Some SR methods are based upon interpolation. In fact, the classical methods commonly used in image/video processing software and hardware products are bilinear interpolation, cubic convolution interpolation [4] and cubic spline interpolation [5]. These common interpolation methods have low complexity, but they adapt poorly to varying pixel structures in a scene because of the inherent drawback of scene-independent interpolators, which often produce jaggy artifacts, blurring, and ringing effects together. Zhang and Wu [6] proposed a two-step interpolation method that interpolates a missing pixel in two orthogonal directions and then fuses the two directional estimates under the minimum mean square error criterion. Dong et al. [7] combined the non-local similarity of natural images with the IBP method to produce sharper edges and fewer artifacts than the interpolation techniques described above.

In order to make a good tradeoff between computational complexity and reproduction quality, and to promote more implementations of SR technology outside research labs, we are inspired by the parallel programming technology that has been applied in the image/video processing field with the development of the Graphics Processing Unit (GPU). This paper proposes an efficient and structure-based image/video (interframe independent) SR scheme. The idea of the proposed scheme is based on the concepts of pattern redundancy and parallel computing.

This work was supported by the Research Fund for the Doctoral Program of Higher Education of China (20126102110041), the Natural Science Basic Research Plan in Shaanxi Province of China (20125153025), the Doctorate Foundation of Northwestern Polytechnical University and the Technology Research Plan of Ministry of Public Security (Key Project 2014JSYJA018).

978-1-4799-6284-6/14/$31.00 ©2014 IEEE
II. PROPOSED METHOD

This section presents a novel, practical pixel-level video super resolution scheme that is implemented to a large extent on the CUDA programming platform. It is motivated by edge-guided single-image super resolution techniques, which focus on preserving edge-like details in the image. The proposed scheme consists of three parts: initial interpolation of the image, motion estimation, and fusion and adjustment of the missing pixel values. Here the bicubic interpolator is used for SR initialization, although any existing interpolation method could be chosen; the choice is meant to demonstrate the efficiency of the proposed scheme.

Fig. 1. Observation model relating LR images to HR images.

A. Observation Model

To present the basic concept of SR reconstruction, this paper uses the observation model for still images, from which the observation model for video sequences can be derived straightforwardly.

A common way to approach the SR reconstruction problem is to establish a forward observation model, which in this paper is written as

$$ y = DBMx + n, \qquad (1) $$

where $M$, $B$, $D$, and $n$ denote the warping, blurring, and down-sampling operators and the noise, respectively; the noise is commonly treated as additive white Gaussian noise. $y$ is the observed LR image corresponding to the desired HR image $x$.

In general, it is crucial to estimate $M$, $B$, and $D$ as accurately as possible in the forward observation model.

B. Non-local Similarity Descriptor

No single image representation model is optimal for all images. It is therefore appropriate to design an efficient feature descriptor, or several descriptors used together, aimed at a specific image/video application.

Fig. 2. Examples of image repetitive structures.

According to surveys and statistics, many patterns and structures appear repeatedly throughout natural images. Fig. 2 gives a straightforward demonstration of this phenomenon. Such non-local self-similarity information is very helpful for improving the quality of reconstructed images. This paper proposes a method that constructs a non-local similarity descriptor to represent the structural features of the image and to help predict the global and local motion information inherently contained in consecutive frames. Moreover, the search is carried out within a single image rather than across the whole image sequence, with the aim of still making good use of the space-time redundancies of the video.

C. Local Motion Estimation

In the early literature, many spatial-domain SR algorithms adopted only a whole-image translation or another parametric global motion model. These models perform well on some scenes, but they can hardly capture the more complex motion present in nearly all videos. Local motion is therefore used in the proposed SR reconstruction method to describe the motion more accurately. This also plays a key role in connecting the single-image SR problem with the video-sequence SR problem.

Fig. 3. (a) Global motion and local motion in a video sequence. (b) Local motion vectors in a searching window.

Fig. 3(a) gives a simple illustration of global and local motion existing simultaneously in a video sequence. From time t to t+1, the randomly selected red and green patches move to new locations with the same motion vector, which represents a global translational motion model. From time t+1 to t+2, the red, green, and blue patches move to their next locations with three different motion vectors, which results from the local motions of different objects in the image. Moreover, this behavior often appears irregularly throughout the video sequence. How to describe this natural motion model therefore becomes a crucial problem.

Non-local similarity, as mentioned above, plays an important role in motion estimation. As shown in Fig. 3(b), with a preset similarity threshold, the missing pixel value can be estimated or updated by the weighted fusion of the similar pixels found in the search. Repeated or similar patterns, represented by 5×5 patches within a search window, reflect the structural similarity at the scale of the patch size in a single image, which can be used to preserve edge-like structural details in most image restoration problems. They also indirectly describe the motion trend within the range of the search window between neighboring frames of a video sequence captured at a normal frame rate; this trend can be treated as local motion, or even as global motion if most of the motion trends are uniform.

Notably, the non-local similarity descriptor not only preserves the edge-like structural details in a still image but also replaces local motion estimation among neighboring frames, since the estimation can be done within the current frame alone using the similarity measurement in a defined search window. One point that needs further explanation is that we avoid processing the heavily corrupted frames that result from motion blur during video capture, because the percentage of such frames in an acceptable video is low. Uniform processing of each frame is also desirable for fast or real-time processing, and it does not noticeably weaken the perceived visual quality.

D. Marked Iterative Back Projection

The IBP technique is a spatial-domain SR reconstruction method. It minimizes the reconstruction error by iteratively back-projecting that error into the reconstructed image. The procedure is

$$ I_h^{(t+1)} = I_h^{(t)} + G^{BP}\left(I_l - I_l^{(t)}\right) = I_h^{(t)} + G^{BP}\left(I_l - G I_h^{(t)}\right), \qquad (2) $$

where $t$ is the iteration index, $I_l$ is the observed LR image, $I_h^{(t)}$ is the HR estimate at iteration $t$, and $G$ is the operator that maps the HR estimate to its simulated LR image, so that $I_l^{(t)} = G I_h^{(t)}$. The number of iterations is chosen empirically as an integer in the interval [5, 8], and convergence has been proved in [8]. $G^{BP}$ denotes the back projection operator; it is hard to select in practice and depends on the specific algorithm, but it is often set to $G^{-1}$. IBP is simple and easy to implement, but it cannot easily exploit prior knowledge of the image, and it often produces jaggy and ringing artifacts around edges, mainly because it treats every pixel isotropically. This suggests focusing on anisotropic post-processing based on IBP when auxiliary constraint information about the high frequencies in the image is available, and this is the improvement to IBP made in this paper. All the matching pixels, grouped according to the motion estimation results, are marked in a binary (one-zero) matrix: the value one indicates that the corresponding pixel takes part in the IBP procedure, and the value zero means that it makes no contribution.

The main framework of the proposed method is shown in Fig. 4.

Fig. 4. Main flow diagram of the proposed method.

In order to reduce the tremendous computational cost of the SR reconstruction procedure, the following section discusses how to accelerate it on CUDA.

III. CUDA IMPLEMENTATION

NVIDIA introduced a powerful GPU architecture called Compute Unified Device Architecture (CUDA) [9], which has been adopted in a wide range of applications. It is basically a single program multiple data (SPMD) computing device. In the CUDA programming framework, the GPU can simultaneously execute thousands of threads organized into logically independent blocks, each of which is mapped onto a multiprocessor in the GPU. The high parallelism of the CUDA platform makes it reasonable to examine the use of the GPU for accelerating SR reconstruction techniques.

Branch instructions are a common bottleneck for algorithms implemented on CUDA. To reduce their negative effect, it is important in algorithm design to make the instructions executed for each pixel as uniform as possible. Following the main flow diagram in Fig. 4, the proposed method parallelizes most of its parts on CUDA, to different degrees, in the order of initial bicubic interpolation, motion estimation, Gaussian filtering for smoothing, and the IBP procedure. In particular, the most time-consuming part, motion estimation, is carried out with the strategy adopted in [10]. The other operations, such as the logical control of the whole algorithm and the import and export of the data stream, are handled by the CPU.

Fig. 5. CUDA programming model of the proposed SR reconstruction algorithm.

Fig. 5 depicts a simplified, not exact, view of how the proposed algorithm is implemented on CUDA. With this implementation, the proposed scheme achieves a high processing speed while maintaining good visual quality.

IV. EXPERIMENTAL RESULTS

To evaluate the performance of the proposed SR reconstruction algorithm on CUDA, all tests were run in the following development environment: (1) Intel Core2 Duo CPU E8400 @ 3.00GHz with 2.96GB memory, (2) NVIDIA GeForce GTX460 with 1GB DRAM, (3) Microsoft Windows XP 32-bit SP2, (4) Microsoft Visual Studio 2005, (5) CUDA Toolkit and SDK 4.0, (6) NVIDIA Driver for Microsoft XP with CUDA Support (259.31).

Because the motion estimation in the proposed method uses a full-search strategy, the whole procedure is largely picture-independent. Several popular test sequences, such as foreman, silent, and coastguard (CIF, 352×288), are examined with an empirical configuration of a 21×21 search range, a 5×5 image patch, and 5 iterations. After studying existing surveys on image/video quality assessment and considering the characteristics of practical SR applications with image/video playback, we evaluate the SR reconstruction results subjectively by directly perceived visual quality and objectively by the blind/referenceless image spatial quality evaluator (BRISQUE) index [11] (the lower the value, the better the visual quality), since no reference HR image/video is available.
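To make the configuration above concrete, the following is a minimal Python sketch of the threshold-gated weighted fusion described in Section II-C, applied to one pixel with a 5×5 patch and a 21×21 search range. It is an illustration under stated assumptions, not the authors' CUDA implementation: the function names `patch_msd` and `nonlocal_fuse`, the exponential weight, the smoothing parameter `h`, and the threshold value are all hypothetical choices, since the paper does not specify its weighting function or threshold.

```python
import math


def patch_msd(img, y0, x0, y1, x1, r):
    """Mean squared difference between two (2r+1)x(2r+1) patches."""
    total = 0.0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            d = img[y0 + dy][x0 + dx] - img[y1 + dy][x1 + dx]
            total += d * d
    return total / ((2 * r + 1) ** 2)


def nonlocal_fuse(img, y, x, patch=5, search=21, threshold=400.0, h=10.0):
    """Estimate pixel (y, x) as a weighted average of similar pixels.

    img is a 2-D list of floats (for example a bicubic-initialised frame).
    (y, x) must lie at least patch // 2 pixels away from the image border.
    threshold and h are assumed values, not taken from the paper.
    """
    r = patch // 2   # patch radius: 2 for a 5x5 patch
    s = search // 2  # search radius: 10 for a 21x21 window
    height, width = len(img), len(img[0])
    num = 0.0
    den = 0.0
    # Clamp the search window so every candidate patch stays inside the image.
    for cy in range(max(r, y - s), min(height - r, y + s + 1)):
        for cx in range(max(r, x - s), min(width - r, x + s + 1)):
            msd = patch_msd(img, y, x, cy, cx, r)
            if msd > threshold:
                continue  # dissimilar patch: contributes nothing
            w = math.exp(-msd / (h * h))  # assumed similarity weight
            num += w * img[cy][cx]
            den += w
    # The pixel's own patch always matches (msd == 0), so den >= 1.
    return num / den


# Vertical stripes: even columns are 100, odd columns are 0.
stripes = [[100.0 if col % 2 == 0 else 0.0 for col in range(24)] for _ in range(24)]
fused = nonlocal_fuse(stripes, 12, 12)
```

On the striped test image, only candidates whose patches match the target patch pass the threshold, so the stripe value at the target pixel is preserved rather than averaged with the neighboring dark columns. Every output pixel runs the same loop independently of the others, which is the property that would make a one-thread-per-pixel mapping on a GPU plausible.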
Fig. 6. An example of visual comparisons. (a) Original 1st LR frame in the foreman.cif video sequence. (b) The SR result using the bicubic interpolation method. (c) The SR result using the proposed method.

Fig. 7. An example of visual comparisons. (a) Original 67th LR frame in the silent.cif video sequence. (b) The SR result using the bicubic interpolation method. (c) The SR result using the proposed method.

Figs. 6 and 7 show that the proposed SR reconstruction algorithm, which builds on the relationship between non-local similarity in the time-space redundancy domain and block-matching motion estimation, together with the constrained IBP post-processing technique, outperforms the traditional bicubic interpolation method commonly used in the major media players in preserving edge-like structural details, image/video contrast, and sharpness, while running in real time. The following two tables quantify the gains of the proposed SR reconstruction algorithm in processing speed and in reconstructed image/video quality. They further indicate the algorithm's high performance and its potential in practical SR applications, such as real-time sharing of high resolution smartphone video over the internet and upgrading the basic zoom function of media players.

TABLE I. SR Running Speed Comparison on GPU (CUDA) and CPU

File name     Input file size   Output file size   CPU (fps)   GPU (fps)
foreman       352×288           704×576            4           30.3
silent        352×288           704×576            4           30.3
coastguard    352×288           704×576            4           30.3

File sizes refer to YUV420 video of 300 frames; speeds are averages in frames per second.

TABLE II. BRISQUE Index Comparison of Bicubic and Proposed Methods (averaged over 300 frames)

Input file        Bicubic   Proposed
foreman.cif       38.35     34.25
silent.cif        41.60     27.77
coastguard.cif    52.36     45.83

V. CONCLUSION

This paper proposes an efficient and structure-based image/video (interframe independent) super resolution (SR) scheme. Although it is motivated by single-image super resolution techniques, the proposed scheme exploits the time-space redundant information within the video stream and fully considers parallelism in algorithm design and in the decomposition of execution on the CUDA platform. Non-local similarity not only performs well for interpolation and denoising in a single image but also predicts the motion information within the video stream, which then constrains the IBP post-processing technique as auxiliary information. As the experimental results indicate, with CUDA acceleration the proposed SR reconstruction algorithm achieves real-time SR processing from a CIF video stream to a 4CIF video stream while achieving high image/video visual quality. Future work along this line should address the following issues. First, the current regularization term based on back-projection is limited, and more efficient regularization terms should be investigated. Second, human-defined prior knowledge adapts poorly to the wide range of applications in image/video processing, so information learned from the data themselves by machine learning methods should be studied.

REFERENCES

[1] S. Mallat and G. Yu (2010) Super-resolution with sparse mixing estimators, IEEE Trans. Image Process., 19:2889-2900.
[2] W. T. Freeman, T. R. Jones, and E. C. Pasztor (2002) Example-based super-resolution, IEEE Computer Graphics and Applications, 22(2):56-65.
[3] J. Yang, Z. Wang, Z. Lin, and T. Huang (2012) Coupled dictionary training for image super-resolution, IEEE Trans. Image Process., 21(8):3467-3478.
[4] R. G. Keys (1981) Cubic convolution interpolation for digital image processing, IEEE Trans. Acoust., Speech, Signal Process., 29(6):1153-1160.
[5] H. S. Hou and H. C. Andrews (1978) Cubic splines for image interpolation and digital filtering, IEEE Trans. Acoust., Speech, Signal Process., 26(6):508-517.
[6] L. Zhang and X. Wu (2006) An edge-guided image interpolation via directional filtering and data fusion, IEEE Trans. Image Process., 15(8):2226-2238.
[7] W. Dong, L. Zhang, G. Shi, and X. Wu (2009) Nonlocal back-projection for adaptive image enlargement, Proc. IEEE International Conference on Image Processing, 349-352.
[8] M. Irani and S. Peleg (1993) Motion analysis for image enhancement: resolution, occlusion and transparency, Journal of Visual Communication and Image Representation, 4(4):324-335.
[9] NVIDIA (2011) NVIDIA CUDA C Programming Guide, Version 4.0.
[10] E. Monteiro, B. Vizzotto, C. Diniz, B. Zatt, and S. Bampi (2011) Applying CUDA architecture to accelerate full search block matching algorithm for high performance motion estimation in video encoding, Computer Architecture and High Performance Computing (SBAC-PAD), 2011 23rd International Symposium on, 128-135.
[11] A. Mittal, A. K. Moorthy, and A. C. Bovik (2012) No-reference image quality assessment in the spatial domain, IEEE Trans. Image Process., 21(12):4695-4708.

