Abstract: This paper proposes an efficient, structure-based image/video (interframe-independent) super resolution (SR) scheme. Image/video SR is currently a very active area of research because it is used in various applications. The proposed scheme is based on the concepts of pattern redundancy and parallel computing. Global and local motions of pixels are estimated by exploiting the repeated and structured patterns that exist in most natural images. The value of a missing pixel in the desired high resolution image is obtained as the weighted average of pixels selected according to the estimated motions, which preserves the intrinsic geometric structure of the original low resolution image/video stream. The estimated pixel values are then corrected by an iterative back projection (IBP) approach in which auxiliary high frequency information is embedded. The proposed scheme is mainly implemented with a parallel programming strategy based on Compute Unified Device Architecture (CUDA) to reconstruct the desired high resolution image/video stream from its low resolution counterpart. Experimental results show that the proposed method performs well in both visual quality and real-time processing speed.

Index Terms: super resolution, non-local similarity, motion estimation, real time, GPU, CUDA

I. INTRODUCTION

Spatial resolution is one of the most important quality metrics of digital images: the higher the resolution, the more image detail. It is desirable to convert a low resolution (LR) image or video stream to a high resolution (HR) one in real time before it is displayed on the screen of a user's device, such as a television, personal computer or mobile terminal. In addition, web videos need video sequence enhancement [1] because they are stored in low quality due to limited bandwidth and server storage. Smartphone users seek higher quality images and videos by converting their LR counterparts online. All of these requirements can potentially be met at low cost by using the super resolution (SR) technique.

SR is a computationally complex and numerically ill-posed problem. In recent years, many researchers have paid much attention to learning-based SR methods [2,3], in which the hand-defined constraints of regularization methods are replaced with learned co-occurrence prior knowledge for solving the ill-posed problem. However, learning-based SR and sparse-representation-based image SR methods are both computationally expensive and strongly dependent on the training set used; these roadblocks severely restrict their application in the real world. Some SR methods are based upon interpolation. In fact, the classical methods commonly used in image/video processing software and hardware products are bilinear interpolation, cubic convolution interpolation [4] and cubic spline interpolation [5]. These common interpolation methods have low complexity but adapt poorly to varying pixel structures in a scene, an inherent drawback of scene-independent interpolators, and they often produce jaggy artifacts, blurring and ringing effects. Zhang and Wu [6] proposed a two-step interpolation method that interpolates a missing pixel in two orthogonal directions and then fuses the two directional estimates under the minimum mean square error criterion. Dong et al. [7] combined the non-local similarity of natural images with the IBP method to produce sharper edges and fewer artifacts than the interpolation techniques described above.

To achieve a good tradeoff between computational complexity and reproduction quality, and to promote more implementations of SR technology outside research labs, we draw on the parallel programming technology that has been applied in the image/video processing field with the development of the Graphics Processing Unit (GPU). This paper proposes an efficient, structure-based image/video (interframe-independent) SR scheme. The idea of the proposed scheme is based on the concepts of pattern redundancy and parallel computing.

This work was supported by the Research Fund for the Doctoral Program of Higher Education of China (20126102110041), the Natural Science Basic Research Plan in Shaanxi Province of China (20125153025), the Doctorate Foundation of Northwestern Polytechnical University and the Technology Research Plan of Ministry of Public Security (Key Project 2014JSYJA018).

978-1-4799-6284-6/14/$31.00 © 2014 IEEE
II. PROPOSED METHOD

This section presents a novel practical pixel-level video super resolution scheme that is to a large extent implemented on the CUDA programming platform. It is motivated by edge-guided single image super resolution techniques, which focus on preserving edge-like details in the image. The proposed scheme consists of three parts: initial interpolation of the image, motion estimation, and fusion and adjustment of the missing pixel values. Here the bicubic interpolator is used for SR initialization, although any existing interpolation method could be chosen; the choice is meant to demonstrate the efficiency of the proposed scheme.

[Figure: block diagram with labels $x$, $x_k$, $y_k$, $(L_1, L_2)$ and $n_k$]

A. Observation Model

To present the basic concept of the SR reconstruction technique, this article uses the observation model for still images, from which the video sequence observation model can be derived in a straightforward way.

A common approach to the SR reconstruction problem is to establish a forward relation model, which in this paper is written as

$$ y = DBMx + n \tag{1} $$

where $M$, $B$ and $D$ denote the warping, blurring and down-sampling operators respectively, and $n$ is the noise, commonly treated as additive white Gaussian noise. $y$ is the observed LR image corresponding to the desired HR image $x$.

In general, it is crucial to estimate $M$, $B$ and $D$ as accurately as possible in the forward observation model.

B. Non-local Similarity Descriptor

There is no single image representation model that is optimal for all images in the image processing field. It is therefore appropriate to design an efficient feature descriptor, or several descriptors used together, aimed at specific image/video applications.

Fig. 2. Examples of image repetitive structures

According to surveys and statistics, many patterns and structures appear repeatedly throughout natural images. Fig. 2 gives a straightforward demonstration of this phenomenon. Such non-local self-similarity information is very helpful for improving the quality of reconstructed images. This paper proposes a method that constructs a non-local similarity descriptor to represent the structural features of the image and to help predict the global and local motion information inherently contained in consecutive frames. Moreover, the search is performed in a single image rather than the whole image sequence, while still making good use of the space-time redundancy of the video.

C. Local Motion Estimation

In the early literature, many spatial-domain SR algorithms adopted only a whole-frame translation or another parametric global motion model. These models perform well on some scenes, but they can hardly capture the more complex motion present in nearly all videos. Local motion is therefore employed in the proposed SR reconstruction method to describe the motion more accurately. Moreover, this makes a key contribution to connecting the single image SR problem with the video sequence SR problem.

Fig. 3. (a) Global motion and local motion in a video sequence (space versus time, frames t, t+1, t+2). (b) Local motion vectors in a search window.

Fig. 3(a) gives a simple illustration of global motion and local motion existing simultaneously in a video sequence. The red patch and green patch, randomly selected in the image, move to new locations with the same motion vector from time t to t+1, which represents a global translation motion model. From time t+1 to t+2, the red, green and blue patches move to their next locations with three different motion vectors, which result from the local motions of different objects in the image. Moreover, this behavior often appears irregularly throughout the video sequence. How to describe this natural motion model becomes a crucial problem.

Non-local similarity, as mentioned above, plays an important role in motion estimation. As shown in Fig. 3(b), once a preset threshold is defined, the missing pixel value can be estimated or updated by the weighted fusion of the similar pixels found in the search. Repeated or similar patterns, represented by 5×5 patches in a relative search window, not only reflect the structural similarity at the scale of the patch size in a single image, which can be used to preserve edge-like structural details in most image restoration problems, but also indirectly describe the motion trend within the range of the search window. This trend can be treated as local motion, and even as global motion if most motion trends stay uniform, between neighboring frames in a video sequence captured at a normal frame rate.

Notably, the non-local similarity descriptor not only preserves the edge-like structural details in the still image but also replaces the local motion estimation among neighboring frames, which can be done in the current frame alone with the similarity measurement in a defined search window. One point that needs further explanation is that we avoid processing heavily corrupted frames that result from motion blur during video capture, because the percentage of heavily corrupted frames in an acceptable video is certainly low. Uniform processing of every frame is also desirable in order to keep processing fast or real-time, and it does not noticeably weaken the perceived visual quality.

D. Marked Iterative Back Projection

The IBP technique is an SR reconstruction method in the spatial domain. It minimizes the reconstruction error by iteratively back-projecting the reconstruction error into the reconstructed image. The following formula shows the IBP procedure:

$$ I_h^{(t+1)} = I_h^{(t)} + G^{BP}\left(I_l - I_l^{(t)}\right) = I_h^{(t)} + G^{BP}\left(I_l - G I_h^{(t)}\right) \tag{2} $$

... platform gives a good reason to examine the possibility of using the GPU to accelerate SR reconstruction techniques.

Numerous branch instructions are always a bottleneck for algorithm implementation on CUDA. To weaken their negative effects, making the instructions uniform for every pixel is an important and desirable goal in algorithm design. Corresponding to the main flow diagram shown in Fig. 4, the proposed method parallelizes most of its parts on CUDA, to different degrees, in the order of initial bicubic interpolation, motion estimation, Gaussian filtering for smoothing, and the IBP procedure. In particular, the most time-consuming part, motion estimation, is done with the strategy adopted in [10]. The other operations, such as the logical control of the whole algorithm and the importing and exporting of data streams, are processed by the CPU.
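As a rough illustration of the IBP update in (2), the sketch below runs the iteration on a tiny grayscale image in plain Python. It is only a sketch under stated assumptions: the operators are not specified in the text above, so `downsample` (2×2 block averaging, standing in for $G$) and `back_project` (nearest-neighbour spreading of the LR error, standing in for $G^{BP}$) are hypothetical choices, and the "marked" variant with auxiliary high frequency information is not modelled.

```python
def downsample(hr, s=2):
    """Assumed G: average each s x s block of the HR image (list of rows)."""
    rows, cols = len(hr) // s, len(hr[0]) // s
    return [
        [sum(hr[r * s + i][c * s + j] for i in range(s) for j in range(s)) / (s * s)
         for c in range(cols)]
        for r in range(rows)
    ]


def back_project(err, s=2):
    """Assumed G^BP: copy each LR error value onto its s x s HR block."""
    return [
        [err[r // s][c // s] for c in range(len(err[0]) * s)]
        for r in range(len(err) * s)
    ]


def ibp(lr, hr_init, iterations=5, s=2):
    """Iterate I_h <- I_h + G^BP(I_l - G I_h), as in equation (2)."""
    hr = [row[:] for row in hr_init]
    for _ in range(iterations):
        simulated = downsample(hr, s)  # G I_h^(t), the simulated LR image
        err = [[a - b for a, b in zip(row_l, row_s)]
               for row_l, row_s in zip(lr, simulated)]  # I_l - G I_h^(t)
        correction = back_project(err, s)
        hr = [[h + e for h, e in zip(row_h, row_c)]
              for row_h, row_c in zip(hr, correction)]
    return hr


def residual(lr, hr, s=2):
    """Largest absolute difference between the LR image and the simulated one."""
    simulated = downsample(hr, s)
    return max(abs(a - b)
               for row_l, row_s in zip(lr, simulated)
               for a, b in zip(row_l, row_s))


lr = [[10.0, 20.0], [30.0, 40.0]]      # observed LR image
hr0 = [[25.0] * 4 for _ in range(4)]   # deliberately poor initial HR estimate
hr = ibp(lr, hr0)
print(residual(lr, hr0), residual(lr, hr))
```

With these particular operators, down-sampling the back-projected error returns the error itself, so a single iteration already makes the estimate consistent with the LR image. With a blur kernel inside $G$, several iterations would typically be needed, and each pixel update would still follow the same instruction sequence, which is the property the CUDA implementation relies on.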
Fig. 6. An example of visual comparisons. (a) Original 1st LR frame in the foreman.cif video sequence, (b) the SR result using the bicubic interpolation method, (c) the SR result using the proposed method.

Fig. 7. An example of visual comparisons. (a) Original 67th LR frame in the silent.cif video sequence, (b) the SR result using the bicubic interpolation method, (c) the SR result using the proposed method.

It can be clearly seen that the proposed SR reconstruction algorithm, which is based on the relationship between non-local similarity in the time-space redundancy domain and motion estimation by block matching, together with the constrained IBP post-processing technique, far outperforms the traditional bicubic interpolation method commonly used in the major media players in terms of preserving edge-like structural details, image/video contrast and sharpness and, notably, in achieving real-time processing speed while keeping good visual quality. The following two tables quantitatively describe the advantage of the proposed SR reconstruction algorithm in processing speed and in the quality of the SR reconstructed image/video. They further indicate the high performance of the proposed algorithm and its strong potential in practical SR applications, such as real-time sharing of high resolution smartphone video over the internet and upgrading the basic zoom function of media players.

TABLE I. SR Running Speed Comparison on GPU (CUDA) and CPU

| File name (YUV420, 300 frames) | Input frame size | Output frame size | CPU average speed (fps) | GPU average speed (fps) |
|---|---|---|---|---|
| foreman | 352×288 | 704×576 | 4 | 30.3 |
| silent | 352×288 | 704×576 | 4 | 30.3 |
| coastguard | 352×288 | 704×576 | 4 | 30.3 |

TABLE II. BRISQUE Index Comparison on Bicubic and Proposed Method

| Input file | Bicubic (averaged on 300 frames) | Proposed (averaged on 300 frames) |
|---|---|---|
| foreman.cif | 38.35 | 34.25 |
| silent.cif | 41.60 | 27.77 |
| coastguard.cif | 52.36 | 45.83 |

V. CONCLUSION

This paper proposes an efficient, structure-based image/video (interframe-independent) super resolution (SR) scheme. Although it is motivated by the single image super resolution technique, the proposed scheme mines the time-space redundant information within the video stream and fully considers parallelism in algorithm design and execution decomposition on the CUDA platform. Non-local similarity not only performs well in interpolating and denoising a single image, but also appropriately predicts the motion information within the video stream, which then constrains the IBP post-processing technique as auxiliary information. As the experimental results indicate, the proposed SR reconstruction algorithm therefore achieves real-time SR processing from a CIF video stream to a 4CIF video stream with CUDA acceleration while reaching very high image/video visual quality. Future work along this line should address the following issues. First, the current regularization term based on back-propagation is limited, so more efficient regularization terms should be investigated. Second, hand-defined prior knowledge adapts poorly to the wide range of applications in image/video processing, so information learned from the data themselves by machine learning methods should be studied.

REFERENCES

[1] S. Mallat and G. Yu (2010) Super-resolution with sparse mixing estimators, IEEE Trans. Image Process., 19:2889-2900.
[2] W. T. Freeman, T. R. Jones, and E. C. Pasztor (2002) Example-based super-resolution, IEEE Computer Graphics and Applications, 22(2):56-65.
[3] J. Yang, Z. Wang, Z. Lin, and T. Huang (2012) Coupled dictionary training for image super-resolution, IEEE Trans. Image Process., 21(8):3467-3478.
[4] R. G. Keys (1981) Cubic convolution interpolation for digital image processing, IEEE Trans. Acoust., Speech, Signal Process., 29(6):1153-1160.
[5] H. S. Hou and H. C. Andrews (1978) Cubic splines for image interpolation and digital filtering, IEEE Trans. Acoust., Speech, Signal Process., 26(6):508-517.
[6] L. Zhang and X. Wu (2006) An edge-guided image interpolation via directional filtering and data fusion, IEEE Trans. Image Process., 15(8):2226-2238.
[7] W. Dong, L. Zhang, G. Shi, and X. Wu (2009) Nonlocal back-projection for adaptive image enlargement, Proc. IEEE International Conference on Image Processing, 349-352.
[8] M. Irani and S. Peleg (1993) Motion analysis for image enhancement: resolution, occlusion and transparency, Journal of Visual Communication and Image Representation, 4(4):324-335.
[9] NVIDIA (2011) NVIDIA CUDA C Programming Guide, Version 4.0.
[10] E. Monteiro, B. Vizzotto, C. Diniz, B. Zatt, and S. Bampi (2011) Applying CUDA architecture to accelerate full search block matching algorithm for high performance motion estimation in video encoding, Computer Architecture and High Performance Computing (SBAC-PAD), 2011 23rd International Symposium on, 128-135.
[11] A. Mittal, A. K. Moorthy, and A. C. Bovik (2012) No-reference image quality assessment in the spatial domain, IEEE Trans. Image Process., 21(12):4695-4708.