Professional Documents
Culture Documents
VOL. 64,
NO. 9,
SEPTEMBER 2015
INTRODUCTION
N. Rajapaksha is with theDepartment of Electrical and Computer Engineering, The University of Akron, Akron, OH 44325.
E-mail: ntr3@uakron.edu.
A. Madanayake is with the Department of Electrical and Computer Engineering, University of Calgary, Calgary, AB, Canada and the Department
of Electrical and Computer Engineering, The University of Akron, Akron,
OH 44325. E-mail: arjuna@uakron.edu.
R. J. Cintra is with the Signal Processing Group, Departamento de Estatstica, Universidade Federal de Pernambuco, Recife, PE, Brazil.
E-mail: rjdsc@ieee.org.
J. Adikari is with Elliptic Technologies Inc., Ottawa, ON, Canada.
E-mail: jithra.adikari@gmail.com.
V. S. Dimitrov is with the Department of Electrical and Computer Engineering, University of Calgary, Calgary, AB, Canada.
E-mail: vdvsd103@gmail.com.
Manuscript received 13 Nov. 2013; revised 3 Oct. 2014; accepted 8 Oct. 2014.
Date of publication 3 Nov. 2014; date of current version 12 Aug. 2015.
Recommended for acceptance by P. A. Montuschi.
For information on obtaining reprints of this article, please send e-mail to:
reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TC.2014.2366732
algorithm for low-complexity computation of a given trigonometric transform, based on number-theoretical results. A
prominent example is the arithmetic Fourier transform
(AFT) proposed by Reed et al. [17], [18]. The AFT allows
multiplication-free calculation of Fourier coefficients using
number-theoretic methods and non-uniformly sampled
inputs. A feature of the AFT is its suitability for parallel
implementation [17], [18].
Recently, an arithmetic transform method for the computation of the DCT, called the arithmetic cosine transform
(ACT) was proposed in [19]. The ACT can provide a multiplication-free framework and leads to the exact computation
of the DCTprovided that the input signal has null-mean
and is non-uniformly sampled [19]. The computational
gains of the ACT are only possible when its prescribed nonuniformly sampled data is available.
Classically the required non-uniform samples are derived
by means of interpolation over uniformly sampled data [19].
Such interpolation implies a computational overhead.
Another aspect of the ACT is that, for arbitrary input signal,
it requires the computation of the input signal mean value
[19]. Usually, the mean value is computed from uniformly
sampled data [19]. In fact, this dependence on uniformly
sampled data has been precluding the implementation of the
ACT based exclusively on non-uniformly sampled data.
On the other hand, the requirement for non-uniform
samples can be satisfied when spatial input signals are considered. In spatial signal processing, non-uniformly sampled signals can be directly obtained, without interpolation
using a non-uniform placement of sensors [7], [20]. This
motivates the search for architectures which could solely
rely on non-uniformly sampled inputs.
In this paper we address two main problems: (i) the
proposition of a method to obtain the mean value of a given
0018-9340 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
RAJAPAKSHA ET AL.: VLSI COMPUTATIONAL ARCHITECTURES FOR THE ARITHMETIC COSINE TRANSFORM
Vk
ml Skl ;
k 1; 2; . . . ; N 1;
(3)
l1
where m is the M
obius function [17], [18], [19]. The derivation of the ACT [19] utilizes the M
obius inversion formula.
Because the M
obius function values are limited to
f1; 0; 1g, (3) results in no additional multiplicative
complexity.
In practice, input sequences are not always null mean,
therefore a correction term is necessary to (3). In [19] an
expression suitable for the non-null mean signals is given as:
r bN1c
r
k
N X
N
N 1
vM
;
ml Skl
Vk
2 l1
2
k
(4)
Pn
where Mn ,
m1 mm is the Mertens function [19] and v
is the mean value of the uniformly sampled input sequence.
N
1
X
wn r vn ;
r 2 R;
(5)
n0
r bN1c
k
N X
2709
2mN 1
;
k
2
DN1
n r ; n 0; 1; . . . ; N 1;
N
and
DN x
sin N 1=2x
sin x=2
(2)
The ACT averages can be employed to computed DCT coefficients according to [19]:
(1)
It is important to notice that the values of r are not necessarily integer. In fact, they are expected to be fractional.
If the signal of interest has zero mean, then the ACT
algorithm can be used to calculate the DCT coefficients as
follows. First, let the ACT averages Sk , k 1; 2; . . . ; N 1,
of the non-uniform sampled inputs be defined as [19]:
Sk ,
k1
1X
v N 1;
k m0 2m k 2
k 1; 2; . . . N 1:
2710
v W vr :
1
61
6
6
61
6
S6
61
61
6
6
41
1
Consequently, the mean value of v can be determined exclusively from the non-uniform samples, according to
1
v w vr ;
8
(7)
(8)
where
2
1 1
60 1
6
6
60 0
6
Mo 6
60 0
60 0
6
6
40 0
0 0
2
1
60
6
6
60
6
D1 6
60
60
6
6
40
0
1
0
1
0
0
0
0
0
1
2
0
0
0
0
0
0
1
0
1
0
0
0
1
0
0
0
1
0
0
0 0 0
0 0 0
1
0 0
3
0 14 0
0 0 15
0 0 0
0 0 0
1
1
1
0
0
1
0
0
0
0
0
0
1
6
3
0
07
7
7
07
7
07
7;
07
7
7
05
1
7
3
1
0 7
7
7
0 7
7
0 7
7;
0 7
7
7
0 5
1
0
0
0
0
0
0
2
0
0
0
0
0
2
0
0
0
0
0
2
0
0
VOL. 64,
0
0
0
2
0
0
0
0
0
0
0
0
0
2
0
0
2
0
0
2
0
NO. 9,
0
0
0
0
2
0
0
0
0
0
0
0
0
2
SEPTEMBER 2015
3
0
17
7
7
07
7
17
7;
07
7
7
15
0
60
6
6
60
6
Me 6
60
60
6
6
40
0
0 0
1
0
4
0 0
0 0
0 0
0 0
0 0
3
0
0
0
0
0
0
0
0 7
7
7
0
0
0
0 7
7
0
0 7
14 0
7 17 ;
0 14 0
0 7
7
7
0
0 14 0 5
0
0
0 14
VLSI ARCHITECTURES
4.1 Architecture I
The ACT expressions for null mean signals in (2) and (3) can
be implemented for N 8 as shown in Fig. 1a. We refer to
this design as Architecture I. The eight-point null mean
ACT block admits the 10 non-uniformly sampled inputs
corresponding to (6). Constant multiplications by two are
implemented as left-shift operations; and the fractional constant multipliers 1=2; 1=3; 1=4; . . . are converted to integers
by scaling them by the least common multiple of their
denominators: 420. The integer constant multipliers can be
implemented as Booth encoded shift-and-add structures
making the architecture multiplier free, and the outputs of
p
the block are scaled by 420 2=N 210. This architecture
is useful in applications that have null mean input sequences, and can be implemented with very low computational
and area complexity.
4.2 Architecture II
The proposed method in Section 3 for the computation of v
from the non-uniformly sampled 10-point signal can be
implemented as shown in Fig. 1b. We will refer to it as the
mean calculation block, which computes (7). The correction
term associated to the Mertens function required in (4) is
RAJAPAKSHA ET AL.: VLSI COMPUTATIONAL ARCHITECTURES FOR THE ARITHMETIC COSINE TRANSFORM
2711
Fig. 1. (a) Null mean ACT and (b) mean calculation block.
Fig. 2. Architecture II: Non-null mean DCT calculation using the Mertens correction block.
2712
TABLE 1
Computational Complexity of Proposed Architecture I
and Architecture II
Architecture II
0
36
11
54
multiplications are implemented as shift-and-add structures, therefore are not counted as multipliers. Note that the
adder count also include the adders required for the Booth
encoded structures.
NO. 9,
SEPTEMBER 2015
TABLE 3
Average Percentage Error and Average Peak Signal to Noise
Ratio of ACT Implementations with Fixed Point Input WordLength L, when Tested with 10,000 Input Vectors
Architecture I
Constant multipliers
Two-input Adders
VOL. 64,
Architecture I
% error
PSNR (dB)
8
4:594 101
12
1:977 102
16 1:840 103
20
2:943 104
24 1:001 105
28
1:167 106
32 2:274 108
50.3
74.3
98.4
122.4
145.6
170.6
194.7
Architecture II
% error
PSNR (dB)
2:262 101
2:149 101
1:550 102
2:565 103
9:462 106
3:137 106
3:207 107
38.8
63.0
87.1
110.8
135.4
159.4
183.4
DL
110
1118
1922, 24
23
25, 26, 31
27, 32, 34
2830
33, 35
0
2
3
1
10
12
11
13
Architecture II
Points
DL
3655
56
57, 59, 61, 62
58
60
6366
0
1
13
11
14
12
CONCLUSIONS
The ACT algorithm is suitable for calculating the eightpoint DCT coefficients exactly using only adders and
integer constant multiplications, also with low
RAJAPAKSHA ET AL.: VLSI COMPUTATIONAL ARCHITECTURES FOR THE ARITHMETIC COSINE TRANSFORM
2713
TABLE 4
Speed of Operation, Resource Utilization and Power Consumption of the XC6VLX240T FPGA Device Used for Input
Fixed Point Word-Lengths L and for Architectures I and II
Architecture
tested
Fixed point
word-length (L)
Slices
Slice FF
Slice LUTs
Norm. power
(W/MHz)
8
12
8
12
263 (1%)
329 (1%)
443 (1%)
495 (1%)
930 (1%)
1205 (1%)
1276 (1%)
1678 (1%)
756 (1%)
1019 (1%)
1386 (1%)
1639 (3%)
1.37
1.16
0.54
0.53
500
333.33
166.66
133.33
2:74 103
3:49 103
3:22 103
3:97 103
Architecture I
Architecture II
TABLE 5
Speed of Operation, Power Consumption and Area Utilization in ASIC Synthesis Results for
Fixed Point Word-Lengths L for Architectures I and II (45 nm technology)
Architecture
Synthesized
Fixed point
Static power Dynamic power Total power
Norm. power
Area (mm2 )
Op. Freq. (GHz)
word-length (L)
(mW)
(mW)
(mW)
(mW /GHzV2 )
Architecture I
Architecture II
8
12
8
12
39,007.27
53,961.52
65,314.36
96,087.77
0.27
0.37
0.46
0.63
67.31
90.32
60.34
79.29
67.59
90.70
60.80
79.92
1.11
1.11
0.625
0.588
50.12
67.25
79.78
111.45
TABLE 6
Comparison of the Proposed Implementation with Published DCT Implementations
Proposed architectures
Gong
et al. [25]
Shams
et al. [26]
Gosh
et al. [27]
Livramento
et al.[28]
Arch. I
Arch. I
Arch. II
2D but 1D
results
available
1D
2D but 1D
results
available
2D but 1D
results
available
1D
1D
1D
1D
No
No
No
No
Yes
Yes
Yes
Yes
Precision
Non-exact
Non-exact
Non-exact
Non-exact
Exact
Exact
Non-exact
Non-exact
Method
Vector
matrix
DCT core
New distributed
arithmetic
based DCT
Coefficient
arithmetic
based DCT
LLM
algorithm
ACT,
null mean
ACT,
null mean
ACT,
Mertens
function
ACT,
Mertens
function
11
11
11
1D/2D DCT
Replicated and
measured results
by authors
Multipliers
Input word-length
Arch. II
12
12
12
0.125 (2D)
1.5
0.05 (2D)
0.00489 (2D)
1.11
1.11
0.625
0.588
0.125
12
0.4
1.792
8.88
8.88
5.00
4.70
Power consumption
(mW)
N/A
210
12.45 (1D)
6.08 (2D)
67.31z
90.32z
60.34z
79.29z
Normalized power
consumption
(mW/GHzV2 )
N/A
12.86
110.67
114.17
50.11
67.25
79.79
111.44
Gate count
30,290
N/A
N/A
N/A
11,491
16,478
17,673
25,197
0.25 mm
CMOS
0.35 mm CMOS
0.12 mm
CMOS
0.35 mm
CMOS
45 nm
CMOS
45 nm
CMOS
45 nm
CMOS
45 nm
CMOS
2.5
3.3
1.5
3.3
1.1
1.1
1.1
1.1
Operating
frequency (GHz)
Implementation
technology
Supply voltage (V)
Some implementations are 2-D but since they are implemented with 1-D DCT module with row column decomposition, results can be taken that can be compared
with the proposed architectures. z Dynamic power.
2714
ACKNOWLEDGMENTS
This work was supported by The University of Akron, Ohio,
USA; Conselho Nacional de Desenvolvimento Cientfico e
Tecnol
ogico (CNPq) and FACEPE, Brazil; and NSERC,
Canada.
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
N. Ahmed, T. Natarajan, and K. R. Rao, Discrete cosine transform, IEEE Trans. Comput., vol. 23, no. 1, pp. 9093, Jan. 1974.
C. Chakrabarti and J. J
aJa, Systolic architectures for the computation of the discrete Hartley and the discrete cosine transforms
based on prime factor decomposition, IEEE Trans. Comput.,
vol. 39, no. 11, pp. 13591368, Nov. 1990.
F. A. Kamangar and K. R. Rao, Fast algorithms for the 2-D discrete cosine transform, IEEE Trans. Comput., vol. 31, no. 9,
pp. 899906, Sep. 1982.
H. Kitajima, A symmetric cosine transform, IEEE Trans. Comput., vol. 21, no. 4, pp. 317323, Apr. 1980.
S. Yu and E. Swartziander Jr, ,DCT implementation with distributed arithmetic, IEEE Trans. Comput., vol. 50, no. 9, pp. 985991,
Sep. 2001.
V. Britanak, P. Yip, and K. R. Rao, Discrete Cosine and Sine Transforms. Amsterdam, The Netherlands: Academic Press, 2007.
N. Romaand L. Sousa,Efficient hybrid DCT-domain algorithm
for video spatial downscaling, EURASIP J. Adv. Signal Process.,
vol. 2007, no. 2, pp. 3030, 2007.
H. Lin and W. Chang, High dynamic range imaging for stereoscopic scene representation, in Proc. 16th IEEE Int. Conf. Image
Process., Nov. 2009, pp. 43054308.
E. Magli and D. Taubman, Image compression practices and
standards for geospatial information systems, in Proc. IEEE Int.
Geosci. Remote Sens. Symp., Jul. 2003, vol. 1, pp. 654656.
M. Bramberger, J. Brunner, B. Rinner, and H. Schwabach, Realtime video analysis on an embedded smart camera for traffic
surveillance, in Proc. 10th IEEE Real-Time Embedded Technol. Appl.
Symp., May. 2004, pp. 174181.
C. F. Chiasserini and E. Magli, Energy consumption and image
quality in wireless video-surveillance networks, in Proc. 13th
IEEE Int. Symp. Pers., Indoor Mobile Radio Commun., Sep. 2002,
vol. 5, pp. 23572361.
T. Tada, K. Cho, H. Shimoda, T. Sakata, and S. Sobue, An evaluation of JPEG compression for on-line satellite images transmission, in Proc. Int. Geosci. Remote Sens. Symp., Aug. 1993,
pp. 15151518.
B. Bennett, C. Dee, and C. Meyer, Emerging methodologies in
encoding airborne sensor video and metadata, in Proc. IEEE Military Commun. Conf., Oct. 2009, pp. 16.
S. Marsi, G. Impoco, A. Ukovich, S. Carrato, and G. Ramponi,
Video enhancement and dynamic range control of HDR sequences for automotive applications, EURASIP J. Adv. Signal Process.,
vol. 2007, p. 080971, 2007.
I. F. Akyildiz, T. Melodia, and K. R. Chowdhury,A survey on
wireless multimedia sensor networks, Comput. Netw., vol. 51,
no. 4, pp. 921960, 2007.
J. G. Proakis and D. G. Manolakis, Digital Signal
Processing. Upper Saddle River, NJ, USA: Prentice-Hall, 2007.
I. S. Reed, D. W. Tufts, X. Yu, T. K. Truong, M. T. Shih, and X. Yin,
Fourier analysis and signal processing by use of the M
obius
inversion formula, IEEE Trans. Acoust., Speech Signal Process.,
vol. ASSP-38, no. 3, pp. 458470, Mar. 1990.
I. S. Reed, M. T. Shih, T. K. Truong, E. Hendon, and D. W. Tufts,
A VLSI architecture for simplified arithmetic Fourier transform
algorithm, IEEE Trans. Signal Process., vol. 40, no. 5, pp. 1122
1133, May 1992.
VOL. 64,
NO. 9,
SEPTEMBER 2015
RAJAPAKSHA ET AL.: VLSI COMPUTATIONAL ARCHITECTURES FOR THE ARITHMETIC COSINE TRANSFORM
Jithra Adikari received the BSc degree in electronic and telecommunication engineering from
the University of Moratuwa, Moratuwa, Sri Lanka,
in 2002, the MSc degree in information technology from the Royal Institute of Technology (KTH),
Stockholm, Sweden, in 2005, and the PhD degree
in electrical and computer engineering from the
University of Calgary, Calgary, AB, Canada, in
2010. He is with Elliptic Technologies, Canada.
He was with the University of Waterloo, Waterloo,
ON, Canada. He is a member of the IEEE.
2715