
First Edition

U. Chuks

6/1/2010









Copyright 2010 by U. Chuks

Cover design by U. Chuks
Book design by U. Chuks

All rights reserved.

No part of this book may be reproduced in any
form or by any electronic or mechanical means
including information storage and retrieval
systems, without permission in writing from the
author. The only exception is by a reviewer, who
may quote short excerpts in a review.

U. Chuks
Visit my page at
http://www.lulu.com/spotlight/Debarge




Contents
Contents ............................................................................................ iii
Preface ............................................................................................... vi
Chapter 1 .......................................................................................... 1
Introduction ...................................................................................... 1
1.1 Overview of Digital Image Processing .................................. 1
1.1.1 Application Areas .................................................... 2
1.2 Digital Image Filtering .......................................................... 2
1.2.1 Frequency Domain .......................................................... 2
1.2.2 Spatial Domain ................................................................. 4
1.3 VHDL Development Environment ......................................... 6
1.3.1 Creating a new project in ModelSim .............................. 7
1.3.2 Creating a new project in Xilinx ISE ............................. 14
1.3.3 Image file data in VHDL image processing ................. 18
1.3.4 Notes on VHDL for Image Processing ......................... 20
References................................................................................... 23
Chapter 2 ........................................................................................ 25
Spatial Filter Hardware Architectures ............................................ 25
2.1 Linear Filter Architectures .................................................... 25
2.1.1 Generic Filter architecture ............................................. 28
2.1.2 Separable Filter architecture ......................................... 30
2.1.3 Symmetric Filter Kernel architecture ............................ 32


2.1.4 Quadrant Symmetric Filter architecture ....................... 34
2.2 Non-linear Filter Architectures ............................................. 35
Summary ...................................................................................... 35
References................................................................................... 36
Chapter 3 ........................................................................................ 37
Image Reconstruction .................................................................. 37
3.1 Image Demosaicking .......................................................... 37
3.2 VHDL implementation........................................................... 44
3.2.1 Image Selection ............................................................. 49
Summary ...................................................................................... 57
References................................................................................... 57
Chapter 4 ......................................................................................... 59
Image Enhancement ....................................................................... 59
4.1 Point-based Enhancement ................................................... 60
4.1.1 Logarithm Transform ..................................................... 60
4.1.2 Gamma Correction ........................................................ 62
4.1.3 Histogram Clipping ........................................................ 62
4.2 Local/neighbourhood enhancement .................................... 64
4.2.1 Unsharp Masking ........................................................... 64
4.2.2 Logarithmic local adaptive enhancement .................... 65
4.3 Global/Frequency Domain Enhancement ........................... 65
4.3.1 Homomorphic filter ......................................................... 66
4.4 VHDL implementation........................................................... 66
Summary ...................................................................................... 68
References................................................................................... 68
Chapter 5 ......................................................................................... 70


Image Edge Detection and Smoothing ......................................... 70
5.1 Image edge detection kernels.............................................. 70
5.1.1 Sobel edge filter ............................................................. 71
5.1.2 Prewitt edge filter ........................................................... 72
5.1.3 High Pass Filter .............................................................. 73
5.2 Image Smoothing Filters ...................................................... 74
5.2.1 Mean/Averaging filter..................................................... 75
5.2.2 Gaussian Lowpass filter ................................................ 75
Summary ...................................................................................... 77
References................................................................................... 77
Chapter 6 ......................................................................................... 78
Colour Image Conversion............................................................... 78
6.1 Additive colour spaces ......................................................... 78
6.2 Subtractive Colour spaces ................................................... 79
6.3 Video Colour spaces ............................................................ 82
6.4 Non-linear/non-trivial colour spaces .................................... 91
Summary ...................................................................................... 95
References................................................................................... 95
Circuit Schematics .......................................................................... 97
Creating Projects/Files in VHDL Environment ............................ 106
VHDL Code ................................................................................... 118
Index............................................................................................... 123



Preface
The relative dearth of books on the know-how involved in implementing image processing algorithms in hardware was the motivating factor in writing this book. It is written for those with a prior understanding of image processing fundamentals who may or may not be familiar with programming environments such as MATLAB and VHDL. Thus, the subject is addressed very early on, bypassing the fundamental theories of image processing, which are better covered in several contemporary books given in the references sections in the chapters of this book.

By delving into the architectural design and implications of the chosen
algorithms, the user is familiarized with the necessary tools to realize an
algorithm from theory to software to designing hardware architectures.

Though the book does not discuss the vast theoretical mathematical processes underlying image processing, it is hoped that, by providing working examples of actual VHDL and MATLAB code along with software simulation results, the concepts of practical image processing can be appreciated.

This first edition attempts to provide a working aid to readers who wish to use the VHDL hardware description language for implementing image processing algorithms from software.




Chapter 1
Introduction
Digital image processing is an extremely broad and ever
expanding discipline as more applications, techniques and
products utilize digital image capture in some form or other. From industrial processes like manufacturing to consumer devices like video games and cameras,
image processing chips and algorithms have become
ubiquitous in everyday life.
1.1 Overview of Digital Image Processing
Image processing can be performed in certain domains
using:
- Point (pixel-by-pixel) processing operations.
- Local /neighbourhood/window mask operations.
- Global processing operations.

A list of the areas of digital image processing includes but is
not limited to:
- Image Acquisition and Reconstruction
- Image Enhancement
- Image Restoration
- Geometric Transformations and Image Registration
- Colour Image Processing
- Image Compression
- Morphological Image Processing
- Image Segmentation
- Object and Pattern Recognition

For the purposes of this book we shall focus on the areas of
Image Reconstruction, Enhancement and Colour Image
Processing and the VHDL implementation of selected
algorithms from these areas.

1.1.1 Application Areas
- Image Reconstruction and Enhancement techniques
are used in digital cameras, photography, TV and
computer vision chips.
- Colour Image and Video Enhancement is used in
digital video, photography, medical imaging, remote
sensing and forensic investigation.
- Colour Image processing involves colour
segmentation, detection, recognition and feature
extraction.

1.2 Digital Image Filtering
Digital image filtering is a very powerful and vital area of
image processing, with convolution as the fundamental and
underlying mathematical operation that underpins the
process makes filtering one of the most important and
studied topics in digital signal and image processing.

Digital image filtering can be performed in the Frequency, Spatial or Wavelet domain. Operating in any of these domains requires a domain transformation: changing the representation of a signal or image into a form in which it is easier to visualize and/or modify the particular aspect of the signal one wishes to analyze, observe or improve upon.
1.2.1 Frequency Domain
Filtering in the frequency domain involves transforming an
image into a representation of its spectral components and
then using a frequency filter to modify and alter the image
by passing particular frequencies and suppressing or eliminating other unwanted frequency components. This frequency transform can involve the famous Fourier Transform or the Cosine Transform. Other frequency transforms also exist in the literature, but these are the most
popular. The (Discrete) Fourier transform is another core
component in digital image processing and signal analysis.
The transform is built on the premise that complex signals
can be formed from fundamental and basic signals when
combined together spectrally. For a discrete image function $f(x, y)$ of $M \times N$ dimensions with spatial coordinates $x$ and $y$, the DFT is given as

$$F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j2\pi\left(\frac{ux}{M} + \frac{vy}{N}\right)} \qquad (1.2.1\text{-}1)$$

and its inverse transform back to the spatial domain is

$$f(x, y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u, v)\, e^{j2\pi\left(\frac{ux}{M} + \frac{vy}{N}\right)} \qquad (1.2.1\text{-}2)$$

where $F(u, v)$ is the discrete image function in the frequency domain with frequency coordinates $u$ and $v$, and $j$ is the imaginary unit. The basic steps involved in
frequency domain processing are shown in Figure 1.2.1(i).



[Figure 1.2.1(i): Pre-Processing -> Fourier Transform -> Frequency Domain Filter -> Inverse Fourier Transform -> Post-Processing]

Figure 1.2.1(i) - Fundamental steps of frequency domain filtering

The frequency domain is more intuitive due to the transformation of the spatial image information into frequency-dependent information. The frequency transformation makes it easier to analyze image features across a range of frequencies. Figure 1.2.1(ii) illustrates the frequency transformation of the spatial information inherent in an image.

(a) (b)
Figure 1.2.1(ii) (a) Image in spatial domain (b) Image in frequency
domain
1.2.2 Spatial Domain
Spatial domain processing operates on signals in two
dimensional space or higher, e.g. grayscale, colour and
MRI images. Spatial domain image processing can be
point-based, neighbourhood/kernel/mask or global
processing operations.

The spatial domain mask filtering involves convolving a
small spatial filter kernel or mask around a local region of
the image, performing the task repeatedly until the entire
image is processed. Linear spatial filtering processes each
pixel as a linear combination of the surrounding, adjacent
neighbourhood pixels while non-linear spatial filtering uses
statistical, set theory or logical if-else operations to process
each pixel in an image. Examples include the median and
variance filters used in image restoration. Figure 1.2.2(i) shows the basics of spatial domain processing, where $I_i(x, y)$ is the input image and $I_o(x, y)$ is the processed output image.

[Figure 1.2.2(i): Pre-processing -> Filter Function -> Post-processing]

Figure 1.2.2(i) - Basic steps in spatial domain filtering
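For reference, linear mask filtering of this kind computes each output pixel as a weighted sum of the input pixels under the kernel window; for a kernel $h$ of size $(2a+1) \times (2b+1)$, the convolution is

$$I_o(x, y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} h(s, t)\, I_i(x - s, y - t)$$

(correlation uses $I_i(x + s, y + t)$ instead).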

Spatial domain filtering is highly favoured in hardware
image processing filtering implementations due to the
practical feasibility of employing it in real-time industrial
processes. Figure 1.2.2(ii) shows the frequency response plots of a filter and the spatial domain equivalents for high- and low-pass filters.

(a) (b)

(c) (d)
Figure 1.2.2(ii) Low-pass filter in the (a) frequency domain (b) spatial
domain and High-pass filter in the (c) frequency domain (d) spatial
domain

This gives an idea of the span of the spatial domain filter
kernels relative to their frequency domain counterpart.

Since a lot of the algorithms in this book involve spatial
domain filtering techniques and their implementation in
hardware description languages (HDLs), emphasis will be
placed on spatial domain processing throughout the book.
1.3 VHDL Development Environment
VHDL is one of the languages for describing the behaviour
of digital hardware devices and highly complex circuits such
as FPGAs, ASICs and CPLDs. In other words, it is called a
hardware description language (HDL) and others include
ADA and Verilog, which is the other commonly-used HDL.
VHDL is preferred because of its open source nature in that
it is freely available and has a lot of user input and support
helping to improve and develop the language further. There
has been three or four language revisions of VHDL since its
inception in the 80s, and have varying syntax rules.

Tools for hardware development with VHDL include popular software such as ModelSim for simulation, and the Xilinx ISE tools and Leonardo Spectrum for complete circuit design and development.
MathWorks MATLAB and Microsoft Visual Studio, image
processing algorithms and theory can now be much more
easily implemented and verified in software before being
rolled out into physical, digital hardware.

We will be using the Xilinx software and ModelSim software
for Xilinx devices for the purposes of this book.


1.3.1 Creating a new project in ModelSim
Before proceeding, ModelSim software from Mentor Graphics must be installed and enabled. Free ModelSim software can be downloaded from internet sites like the Xilinx website or other sources. The version used for this example is a much earlier release of ModelSim (version 6.0a) tailored for Xilinx devices.

Once ModelSim is installed, run it and a window like the one in Figure 1.3.1(i) should appear.


Figure 1.3.1(i) ModelSim starting window

Close the welcome page and click on File, select New ->
Project as shown in Figure 1.3.1(ii).

Click on the Project option and a dialog box appears as shown in Figure 1.3.1(iii). You can then enter the project name. However, we would first select an appropriate location to store all project files, to have a more organized work folder. Thus, click on "Browse" and the dialog box shown in Figure 1.3.1(iv) appears. Now we can navigate to an appropriate folder or create one if it doesn't exist. In this case, a previously created folder called "colour space converters" was used to store the project files. Clicking "OK" returns us to the "Create a New Project" dialog box; now we name the project "Colour space converters" and click "OK".



Figure 1.3.1(ii) Creating a new project in ModelSim


A small window appears for us to add a new or existing file
as shown in Appendix B, Figure B1.

Since we would like to add a new file for illustrative
purposes, we create a file called example_file as in Figure
B3 and it appears on the left hand side workspace as
depicted in Figure B4.

Then we add existing files by clicking "Add Existing File", navigating to the relevant files and selecting them as shown in Figure B5. They now appear alongside the newly created file as shown in Figure B6.

The rest of the process is easy to follow. For further
instruction on doing this, refer to Appendix B or the Xilinx
sources listed at the end of the chapter.

Now these files can be compiled before simulation as
shown in the subsequent figures.

Successful compilation is indicated by messages in green, while failed compilations produce messages in red that indicate the errors and their locations, like all smart debugging editors for software code development. Any errors are located and corrected and the files recompiled until there are no more syntax errors.




Figure 1.3.1(iii) Creating a new project


Once there are no more errors, the simulation of the files
can begin. Clicking on the simulation tab will open up a
window to select the files to be simulated. However, you must create a test bench file before running any simulation. A test bench is simply a test file that exercises your designed system to verify its correct functionality.

You can choose to add several more windows to view the
ports and signals in your design.



Figure 1.3.1(iv) Changing directory for new project

The newly created file is empty upon inspection, so we have to add some code to the blank file. We start by including and importing the needed standard IEEE libraries, as shown in Figure 1.3.1(v), at the top of the blank file.

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;

Figure 1.3.1(v) Adding libraries


The IEEE.std_logic_1164 and IEEE.std_logic_arith packages are the standard logic and standard logic arithmetic libraries; they provide the basic types and logic/arithmetic functions needed for the VHDL logic designs in this book.

With that done, the next step would be to add the
architecture of the system we would like to describe in this
example file. Thus, the block diagram for the design we are
going to implement in VHDL is shown in Figure 1.3.1(vi).





Figure 1.3.1(vi) Top level system description of example_file

This leads to the top level architecture description in VHDL
code shown in Figure 1.3.1(vii).

----TOP SYSTEM LEVEL DESCRIPTION-----
entity example_file is
  port ( ---the collection of all input and output ports in top level
    Clk         : in  std_logic; ---clock for synchronization
    rst         : in  std_logic; ---reset signal for new data
    input_port  : in  bit;       ---input port
    output_port : out bit        ---output port
  );
end example_file;

Figure 1.3.1(vii) VHDL code for black box description of example_file


The code in Figure 1.3.1(vii) is the textual or code
description of the black box diagram shown in Figure
1.3.1(vi).

The next step is to detail the actual operation of the system and the relationship between the input and output ports; this is shown in the VHDL code in Figure 1.3.1(viii).

---architecture and behaviour of TOP SYSTEM LEVEL DESCRIPTION in more detail
architecture behaviour of example_file is
  ---list signals which connect input to output ports here
  ---for example
  signal intermediate_port : bit := '0'; --initialize to zero

begin ---start
  process(clk, rst) --process which is triggered by clock or reset pin
  begin
    if rst = '0' then --reset all output ports
      intermediate_port <= '0'; --initialize
      output_port <= '0'; --initialize
    elsif clk'event and clk = '1' then --operate on rising edge of clock
      intermediate_port <= not(input_port); --logical inverter
      output_port <= intermediate_port or input_port; --logical or operation
    end if;
  end process; --self-explanatory
end behaviour; --end of architectural behaviour

Figure 1.3.1(viii) VHDL code for operation of example_file


The first line of code in Figure 1.3.1(viii) defines the
beginning of the behavioural level of the architecture. The
next line defines a signal or wire that will be used in
connecting the input port to the output port. It has been
defined as a single bit and initialized to zero.

The next line indicates the beginning of a triggered process
that responds to both the clock and reset signals.

The if...then...elsif statements indicate what actions and statements to trigger when the stated conditions are met.

The actual logical operation starts at the rising edge of the clock: the intermediate signal takes on the inverted value of the input port, while the output port performs the logical or operation on the inverted and non-inverted signals to produce the output value. Though this is an elaborate circuit design for a simple operation, it was added to illustrate several aspects that will be recurring themes throughout the work discussed in the book.

1.3.2 Creating a new project in Xilinx ISE
Like the ModelSim software, the software for evaluating
VHDL designs in FPGA devices can be downloaded for free
from FPGA Vendors like Leonardo Spectrum for Altera and
Actel FPGAs or the Xilinx Project Navigator software from
Xilinx. The Xilinx ISE version used in this book is 7.1.

Once the software has been fully installed, we can begin; opening the program brings up a welcome screen, just like the one we saw when launching ModelSim.


Creating a project in the Xilinx ISE is similar to the process in ModelSim; however, one must also select the specific FPGA device onto which the design is to be loaded. This is because the design must be physically mapped onto a physical device, and the ISE software comprises special, complicated algorithms that emulate the actual hardware device to ensure that the design is safe and error-free before being downloaded to an actual device. This saves on costly errors and damage to the device from incorrectly routed pins when designing for large and expensive devices like ASICs.

A brief introduction to creating a project in Xilinx is shown in Figures 1.3.2(i) - 1.3.2(iv).


Figure 1.3.2(i) Opening the Xilinx Project Navigator


We then click OK on the welcome dialog box to access the
project workspace. Then click on File, select New Project
as shown in Figure 1.3.2(ii) and enter a new name for the
project as shown in Figure 1.3.2(iii). Then click Next and
the next window shown in Figure 1.3.2(iv) prompts you to
select the FPGA hardware device family your final design is
going to be implemented in. We select the Xilinx Spartan 3 FPGA chip, indicated by the part number xc3s200, with the ft256 package and a speed grade of -4. This device will be referred to as 3s200ft256-4 in the Project Navigator.

We leave all the other options as they are, since we will be using the ModelSim simulator and the VHDL language for most of the work, and only implementing the final design after correct simulation and verification.

Depending on the device you are implementing your design on, the device family name will be different. However, a limitation of the free software is that you do not have access to all the FPGA devices in every available device family in the software's database, and thus will not be able to generate a programming file for download to some actual FPGAs.

The design process from theoretical algorithm description to
circuit development and flashing to an FPGA device is a
non-linear exercise as the design may need to be optimized
and/or modified depending on the design constraints of the
project.


Figure 1.3.2(ii) Creating a new project in Xilinx Project Navigator


Figure 1.3.2(iii) Creating a new project name



Figure 1.3.2(iv) Selecting a Xilinx FPGA target device

Clicking Next to the next set of options allows you to add
HDL source files, similar to ModelSim. The user can add
them from here or just click through to create the project
and then add the files manually like in ModelSim.

1.3.3 Image file data in VHDL image processing
Figure 1.3.3 shows an image in the form of a text file, which
will be read using the textio library in VHDL. A software
program was written to convert image files to text in order to
process them. The images can be converted to any
numerical type including binary, hexadecimal (to save
space). Integers were chosen for easy readability and
debugging and for illustration of the concepts. After doing
this, another software program is written to convert the text
files back to images to be viewed.
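As a rough sketch of how the textio library supports this flow (the file names and the one-integer-per-line format here are assumptions for illustration, not the book's exact format):

library IEEE;
use IEEE.std_logic_1164.all;
use STD.textio.all;

entity tb_image_io is
end tb_image_io;

architecture sim of tb_image_io is
begin
  process
    file infile  : text open read_mode  is "image_in.txt";
    file outfile : text open write_mode is "image_out.txt";
    variable lin, lout : line;
    variable pixel : integer;
  begin
    while not endfile(infile) loop
      readline(infile, lin); --fetch one line of the text image
      read(lin, pixel);      --extract the integer pixel value
      write(lout, pixel);    --write the (possibly processed) value
      writeline(outfile, lout);
    end loop;
    wait;                    --stop once the whole file is read
  end process;
end sim;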


Writing MATLAB code is the easiest and quickest way of doing this when working with VHDL. MATLAB also enables fast and easy prototyping of algorithms without re-inventing the wheel and being forced to write every function needed to perform standard operations, especially image processing algorithms. This is why it was chosen over the .NET environment.

Coding in VHDL is a much different experience than coding
with MATLAB, C++ or JAVA since it is describing hardware
circuits, which have to be designed as circuits rather than
simply software programs.

VHDL makes it much easier to describe highly complex
circuits that would be impractical to design with basic logic
gates and it infers the fundamental logical behaviour based
on the nature of the operation you describe within the code.
In a sense, it is similar to the Unified Modeling Language
(UML) used to design and model large and complex object-
oriented software algorithms and systems in software
engineering.

SIMULINK in MATLAB is also similar to this, and new tools have been developed to allow designers with little to no knowledge of VHDL to work with MATLAB and VHDL code. However, the costs of these tools are quite prohibitive for the average designer with a small budget.

FPGA system development requires a reasonable amount
of financial investment and the actual prototype hardware
chip cost can be quite considerable in addition to the
software tools needed to support the hardware. Thus, with
these free tools and a little time spent on learning VHDL,
designing new systems becomes much more fulfilling and
gives the coder the chance to really learn about how the
code and the system they are trying to build is going to
work on a macro and micro level. Also, extensive periods
debugging VHDL code will definitely make the coder a
much better programmer because of the experience.

Figure 1.3.3 Image as a text file to be read into a VHDL test bench

1.3.4 Notes on VHDL for Image Processing
Most users of this book probably have had some exposure
to programming or at least have heard of programming
languages and packages like C++, JAVA, C, C#, Visual
Basic, MATLAB, etc. But fewer people are aware of
languages like VHDL and other HDLs like Verilog and ADA,
which make it much easier to design larger and more
complex circuits for digital hardware chips like ASICs,
FPGAs, and CPLDs used in highly sophisticated systems
and devices.


When using fourth-generation languages like C# and MATLAB, writing programs to perform mathematical tasks and operations is much easier, and users can make use of existing libraries to build larger-scale systems that perform more complex mathematical computations without thinking much about them.

However, with languages like VHDL, performing certain mathematical computations like statistical calculations or even divisions requires careful system design and planning if the end product is to be a fully synthesizable circuit for downloading to an FPGA. In other words, floating point calculations in VHDL for FPGAs are a painful and difficult task for the uninitiated and those without developer and design resources. Some hardware vendors have developed their own specialized floating point cores, but these come at a premium cost and are not for the average hardware design hobbyist. Floating point calculations take up a lot of system resources, as do operations like division, especially division by values that are not powers of two. Thus, most experienced designers prefer to work with fixed-point mathematical calculations.
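As a small worked example of the fixed-point alternative: a fractional coefficient such as 0.71 can be approximated by an integer over a power of two,

$$0.71 \approx \frac{182}{256} = 0.7109375$$

so that $y = 0.71x$ becomes an integer multiplication by 182 followed by a right shift of 8 bits, both of which are cheap in hardware.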

For example, if we choose to write a program to calculate
the logarithm, cosine or exponential of signal values, this is
usually taken care of in software implementation by calling
a log, cosine or exponential function from the inbuilt library
without even being aware of the algorithm behind the
function. This is not the case with VHDL or hardware
implementation. Though it is vital to note that VHDL has
libraries for all these non-linear functions, the freely
available functions are not synthesizable. This means that
they cannot be realized in digital hardware and thus
hardware design engineers must devise efficient
architectures for these algorithms or purchase hardware IP
cores developed by FPGA vendors before they can implement them on an FPGA.

The first obvious route to building these types of functions is to create a look-up table (LUT) consisting of pre-calculated entries in addressable memory (ROM), which can then be accessed for a defined range of values. However, the size of the LUT can expand to unmanageable proportions and render the entire system inefficient, cumbersome and wasteful. Thus, a better approach would involve a mixture of some pre-computed values and the calculation of other values to reduce the memory size and increase efficiency. The LUT is therefore a constant recurring theme in hardware design for systems that perform intensive mathematical computation and signal processing.
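The following is an illustrative sketch of the LUT approach in VHDL (the 8-bit widths and identity table contents are placeholders; a real table would hold pre-computed values of, say, a logarithm or gamma function, typically generated offline):

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity rom_lut is
  port (
    clk      : in  std_logic;
    addr     : in  std_logic_vector(7 downto 0); --input sample
    data_out : out std_logic_vector(7 downto 0)  --pre-computed result
  );
end rom_lut;

architecture rtl of rom_lut is
  type rom_t is array (0 to 255) of unsigned(7 downto 0);
  --in practice this constant would be generated from the desired
  --non-linear function by an offline (e.g. MATLAB) script
  function init_rom return rom_t is
    variable r : rom_t;
  begin
    for i in 0 to 255 loop
      r(i) := to_unsigned(i, 8); --placeholder: identity mapping
    end loop;
    return r;
  end function;
  constant ROM : rom_t := init_rom;
begin
  process(clk)
  begin
    if rising_edge(clk) then --registered read helps infer block ROM
      data_out <= std_logic_vector(ROM(to_integer(unsigned(addr))));
    end if;
  end process;
end rtl;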

Usually, when a non-linear component is an essential part of an algorithm, the LUT becomes an alternative for implementing that crucial part of the algorithm, or an alternative algorithm may have to be devised in accordance with error trade-off curves. This is a standard theme of research papers and journals on digital logic circuits.

Newer and more expensive FPGAs now have a soft processor core built into them, giving the designer the flexibility of apportioning soft computing tasks to the processor on the FPGA while devoting more appropriate device resources to architectural demands. However, the further challenge of real-time reconfigurable computing, and of linking the soft-core and hard-core aspects of the system to work in tandem, then comes into play.


Most of the images used in this book are well known in the image processing community and were obtained from the University of Southern California Signal and Image Processing Institute website, and others from relevant research papers and online repositories.

References
- R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2
ed.: Prentice Hall, 2002.
- R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital Image
Processing Using MATLAB: Prentice Hall, 2004.
- W. K. Pratt, Digital Image Processing, 4 ed.: Wiley-Interscience,
2007.
- U. Nnolim, FPGA Architectures for Logarithmic Colour Image
Processing, Ph.D. thesis, University of Kent at Canterbury,
Canterbury-Kent, 2009.
- MathWorks, "Image Processing Toolbox 6 User's Guide for use
with MATLAB," The Mathworks, 2008, pp. 285 - 288.
- Mathworks, "Designing Linear Filters in the Frequency Domain," in Image Processing Toolbox for use with MATLAB, T. Mathworks, Ed.: The Mathworks, 2008.
- Mathworks, "Filter Design Toolbox 4.5," 2009.
- Weber, "The USC-SIPI Image Database," University of Southern California Signal and Image Processing Institute (USC-SIPI), 1981.
- Zuloaga, J. L. Martín, U. Bidarte, and J. A. Ezquerra, "VHDL test bench for digital image processing systems using a new image format."
- Cyliax, "The FPGA Tour: Learning the ropes," in Circuit Cellar
online, 1999.
- T. Johnston, K. T. Gribbon, and D. G. Bailey, "Implementing
Image Processing Algorithms on FPGAs," in Proceedings of the
Eleventh Electronics New Zealand Conference (ENZCon04),
Palmerston North, 2004, pp. 118 - 123.
- EETimes, "PLDs/FPGAs," 2009.
- Digilent, "http://www.digilentinc.com," 2009.
- E. R. Davies, Machine Vision: Theory, Algorithms, Practicalities
3rd ed.: Morgan Kaufmann Publishers, 2005.
- Xilinx, "XST User Guide ": http://www.xilinx.com, 2008.
- www.xilinx.com, "FPGA Design Flow Overview (ISE Help)." vol.
2008: Xilinx, 2005.





Chapter 2
Spatial Filter Hardware Architectures
Prior to the implementation of the various filters, it is
necessary to lay the groundwork for the design of spatial
filter hardware architectures in VHDL.

2.1 Linear Filter Architectures
Using spatial filter kernels for image filtering applications in
hardware systems has been a standard route for many
hardware design engineers. As a result, various
architectures in the spatial domain exist in company
technical reports, academic journals and conferences
papers dedicated to digital FPGA hardware-based image
processing. This is not surprising because of the myriad of
image processing applications that incorporate image
filtering techniques.

Such applications include but are not limited to image
contrast enhancement/sharpening, demosaicking,
restoration/noise removal/deblurring, edge detection,
pattern recognition, segmentation, inpainting, etc.

Several authors have published papers implementing a myriad of algorithms built on spatial filtering hardware architectures for FPGA platforms, performing different tasks or serving as add-ons for even more complex and sophisticated processing operations.


A sample of application areas in industrial processes includes the detection of structural defects in manufactured products, using real-time imaging and edge detection techniques to remove damaged products from the assembly line.

Though frequency (Fourier Transform) domain filtering may be faster for larger images and optical processes, spatial filtering using relatively small kernels makes several of these processes feasible for physical, real-time applications and reduces computational costs and resources in FPGA digital hardware systems.

Figure 2.1(i) shows one of the essential components of a
spatial domain filter, which is a window generator for a 5 x 5
kernel for evaluating the local region of the image.

[Figure 2.1(i): five input lines (Line In 1-5), each feeding a row of five flip flops (FF) whose taps form the 5×5 window, producing delayed outputs Line Out 1-5]

Figure 2.1(i) 5×5 window generator hardware architecture

The boxes represent the flip flops (FF) or delay elements, with each box providing one delay. In digital signal processing notation, a flip flop is represented in the z-domain by $z^{-1}$ and in the discrete time domain as $x[n-1]$, where $x$ is the delayed signal. The data comes in from the left hand side of the unit and each line is delayed by 5 cycles. For a 3 × 3 kernel, there would be three lines and each would be delayed by 3 cycles.

Figure 2.1(ii) shows the line buffer array unit, which consists of long shift registers composed of several flip flops. Each line buffer is set to the length of one row of the image. Thus, for a 128 x 128 greyscale image with 8 bits per pixel, each line buffer would be 128 pixels long and 8 bits wide.

[Figure 2.1(ii): Data_in feeding Line Buffers 1-5 in series, producing Line out 1-5]

Figure 2.1(ii) Line buffer array hardware architecture

The rest of the architecture would include adders, dividers,
and multipliers or look up tables. These are not shown as
they are much easier to understand and implement.

The main components of the spatial domain architectures
are the window generator and line delay elements. The
delay elements can be built from First in First out (FIFO) or
shift register components for the line buffers.
Linear Spatial filter architectures

28

The architecture of the processing elements is heavily
determined by the mathematical properties of the filter
kernels. For instance the symmetric or separable nature of
certain kernels is incorporated in the hardware design to
reduce multiply-accumulate operations. There are mainly
three kinds of filter kernels, namely symmetric, separable-
symmetric and non-separable, non-symmetric kernels. To
understand the need for this clarification, it is necessary to
discuss the growth in mathematical operations of image
processing algorithms implemented in digital hardware.

2.1.1 Generic Filter architecture
In the standard spatial filter architectures, the filter kernel is
defined as is and each coefficient of the defined kernel has
its own dedicated multiplier and corresponding image
window coefficient. Thus, this architecture is flexible for a
particular defined size of kernel and any combination of
coefficient values can be loaded to this architecture without
modifying the architecture in any way. However, this
architecture is inefficient when a set of coefficients in the
filter have the same values and redundancy grows as the
number of matching coefficients increases. It also becomes
computationally complex as filter kernel size increases
since more processing elements will be needed to perform
the full operation on a similarly sized image window. The
utility of the filter is limited to small kernel sizes ranging
from 33 to about 99 dimensions. Beyond this, the
definition and instantiation of the architecture and its
coefficients become unwieldy, especially in digital hardware
description languages used to program the hardware
devices. Figure 2.1.1 depicts an example of generic 55
filter kernel architecture.

[Figure 2.1.1: Data_in passes through line buffers into a 5×5 grid of flip flops (FF); each of the five window rows is multiplied by its coefficients (c0-c4, c5-c9, c10-c14, c15-c19, c20-c24) and summed to a Data_out partial result]

Figure 2.1.1 Generic 5×5 spatial filter hardware architecture

The 25 filter coefficients range from c0 to c24 and are
multiplied with the values stored in the window generator
grid made up of flip flops (FF). These coefficients are
weights, which determine the extent of the contribution of
the image pixels in the final convolution output. The partial
products are then summed in the adder blocks. Not shown
in the diagram is another adder block to sum all the five
sums of products. The final sum is divided by a constant value, which is usually chosen as a power of two for good digital design practice, so that the division reduces to a simple shift.
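As a sketch of this multiply-accumulate stage for the smaller 3×3 case (the coefficient values, packed-window port and divide-by-16 are illustrative assumptions; the window pixels are presumed delivered by a window generator):

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity mac3x3 is
  port (
    clk    : in  std_logic;
    window : in  std_logic_vector(9*8-1 downto 0); --nine packed 8-bit pixels
    result : out std_logic_vector(7 downto 0)
  );
end mac3x3;

architecture rtl of mac3x3 is
  type coef_t is array (0 to 8) of integer;
  constant C : coef_t := (1, 2, 1, 2, 4, 2, 1, 2, 1); --placeholder weights, sum = 16
begin
  process(clk)
    variable acc : integer;
    variable pix : unsigned(7 downto 0);
  begin
    if rising_edge(clk) then
      acc := 0;
      for i in 0 to 8 loop
        pix := unsigned(window(8*i+7 downto 8*i));
        acc := acc + C(i) * to_integer(pix); --one multiplier per coefficient
      end loop;
      acc := acc / 16; --dividing by a power of two synthesizes to a shift
      result <= std_logic_vector(to_unsigned(acc, 8));
    end if;
  end process;
end rtl;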

2.1.2 Separable Filter architecture
The separable filter kernel architectures are much more
computationally efficient where applicable. However, these
are more suited to low-pass filtering using Gaussian kernels
(which have the separability property). The architecture reduces a two-dimensional N × N filter kernel to two one-dimensional filters of length N. Thus a one-dimensional convolution (which is much easier than 2-D convolution) is performed along one direction, followed by a second one-dimensional pass in the other. The savings in multiply-accumulate operations, resulting from the reduction in the number of processing elements demanded by the architecture, can be truly appreciated when designing very large filter convolution kernels. Since spatial domain convolution is more computationally efficient for small filter kernel sizes, separable spatial filter kernels further increase this efficiency (especially for large kernels, compared with a generic filter architecture implementation).
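As a brief illustration of the separability property (a standard example, not taken from the book), the common 3 × 3 Gaussian-like kernel factors into an outer product of two 1-D filters:

$$\frac{1}{16}\begin{bmatrix}1 & 2 & 1\\ 2 & 4 & 2\\ 1 & 2 & 1\end{bmatrix} = \frac{1}{16}\begin{bmatrix}1\\ 2\\ 1\end{bmatrix}\begin{bmatrix}1 & 2 & 1\end{bmatrix}$$

so convolving with the full kernel equals a row pass with [1 2 1] followed by a column pass with its transpose.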

Figure 2.1.2 depicts an example of a separable filter kernel architecture in which a 5 × 5 spatial filter is reduced to two length-5 one-dimensional filters, since the row and column filter coefficients are the same, one 1-D filter being the transpose of the other.



Figure 2.1.2 Separable 5×5 spatial filter hardware architecture

Observing the diagram in Figure 2.1.2, it can be seen that
the number of processing elements and filter coefficients
have been dramatically reduced in this filter architecture.
For example, the 25 coefficients in the generic filter
architecture have been reduced to just 5 coefficients which
are reused.
2.1.3 Symmetric Filter Kernel architecture
Symmetric filter kernel architectures are more suited to
high-pass and high-frequency emphasis (boost filtering)
operations with equal weights and reduce the number of
processing elements, thereby reducing the number of
multiply-accumulate operations. A set of pixels in the image
window of interest are added together and then the sum is
multiplied by the corresponding coefficient, which has the
same value for those particular pixels in their respective,
corresponding locations. Figure 2.1.3(i) shows a Gaussian
symmetric high-pass filter generated using the windowing
method while Figure 2.1.3(ii) depicts an example of
symmetric filter kernel architecture


Figure 2.1.3(i) Frequency domain response of symmetric Gaussian
high-pass filter obtained from spatial domain symmetric Gaussian with
windowing method


Figure 2.1.3(ii) 5 x 5 symmetric spatial filter hardware architecture


2.1.4 Quadrant Symmetric Filter architecture
The quadrant symmetric filter stores one quadrant (a quarter) of a circularly symmetric filter kernel, which is effectively rotated through 360 degrees to reproduce the full kernel. The hardware architecture is very efficient since it occupies a quarter of the space normally used for a full filter kernel.

To summarize the discussion of spatial filter hardware
architectures, it is necessary to present a comparison of the
savings of hardware resources with regards to reduced
multiply-accumulate operations.

For an N × N spatial filter kernel, N² multiplications and N² − 1 additions are required. For example, for a 3×3 filter, 9 multiplications and 8 additions are needed for each output pixel calculation, while for a 9×9 filter, 81 multiplications and 80 additions are needed per output pixel computation.
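In general terms, per output pixel:

$$\text{generic: } N^2 \text{ multiplications}, \; N^2 - 1 \text{ additions}; \qquad \text{separable: } 2N \text{ multiplications}, \; 2(N-1) \text{ additions}$$

which matches the entries in Table 2.1.4 (e.g. 6 multiplications and 4 additions for a separable 3 × 3 filter).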

Since multiplications are costly in terms of hardware,
designs are geared towards reducing the number of
multiplication operations or eliminating them entirely.

Table 2.1.4 gives a summary of the number of multiplication
and addition operations per image pixel required for varying
filter kernel sizes using different filter architectures.








Kernel size   */pixel (GFKA)   +/pixel (GFKA)   */pixel (SFKA)   +/pixel (SFKA)   */pixel (Sym FKA)   +/pixel (Sym FKA)
3×3                9                8                6                4                4/3                  8
5×5               25               24               10                8                6/5                 24
7×7               49               48               14               12                8/7                 48
9×9               81               80               18               16               10/9                 80
13×13            169              168               26               24               14/13               168
27×27            729              728               54               52               28/27               728
31×31            961              960               62               60               32/31               960

Table 2.1.4 MAC operations and filter kernel size and type

KEY
*/pixel Multiplications per pixel
+/pixel Additions per pixel
GFKA Generic Filter Kernel Architecture
SFKA Separable Filter Kernel Architecture
Sym FKA Circular Symmetric Filter Kernel Architecture

2.2 Non-linear Filter Architectures
The nature of non-linear filter architectures is more complex
than that of linear filters and depends on the algorithm or
order statistics used in the algorithm. Since most of the
algorithms covered in this book involve linear filtering, we
focus more on linear spatial domain filtering.
Summary
In this chapter, we discussed several linear spatial filter hardware architectures used for implementing algorithms in FPGAs using VHDL, and analyzed the cost savings of each architecture with regard to the use of processing elements in hardware.


References
- U. Nnolim, FPGA Architectures for Logarithmic Colour Image
Processing, Ph.D. thesis, University of Kent at Canterbury,
Canterbury-Kent, 2009.
- Cyliax, "The FPGA Tour: Learning the ropes," in Circuit Cellar
online, 1999.
- E. Nelson, "Implementation of Image Processing Algorithms on
FPGA Hardware," in Department of Electrical Engineering. vol.
Master of Science Nashville, TN: Vanderbilt University, 2000, p.
86.
- T. Johnston, K. T. Gribbon, and D. G. Bailey, "Implementing
Image Processing Algorithms on FPGAs," in Proceedings of the
Eleventh Electronics New Zealand Conference (ENZCon04),
Palmerston North, 2004, pp. 118 - 123.
- S. Saponara, L. Fanucci, S. Marsi, G. Ramponi, D. Kammler,
and E. M. Witte, "Application-Specific Instruction-Set Processor
for Retinex-Like Image and Video Processing," IEEE
Transactions on Circuits and Systems II: Express Briefs, vol.
54, pp. 596 - 600, July 2007.
- EETimes, "PLDs/FPGAs," 2009.
- Google, "Google Directory," in Manufacturers, 2009.
- Digilent, "http://www.digilentinc.com," 2009.
- E. R. Davies, Machine Vision: Theory, Algorithms, Practicalities
3rd ed.: Morgan Kaufmann Publishers, 2005.
- Xilinx, "XST User Guide ": http://www.xilinx.com, 2008.
- www.xilinx.com, "FPGA Design Flow Overview (ISE Help)." vol.
2008: Xilinx, 2005.
- Mathworks, "Designing Linear Filters in the Frequency
Domain," in Image Processing Toolbox for use with MATLAB,
T. Mathworks, Ed.: The Mathworks, 2008.
- Mathworks, "Filter Design Toolbox 4.5," 2009.







Chapter 3
Image Reconstruction
The four stages of image retrieval, from camera sensor acquisition to display device, comprise Demosaicking, White/Colour Balancing, Gamma Correction and Histogram Clipping.
demosaicking stage and the VHDL implementation of the
demosaicking algorithm will also be described. The steps of
colour image acquisition from the colour filter array are
shown in Figure 3.





[Figure 3: Demosaicking -> Colour Balancing -> Gamma Correction -> Histogram Clipping]

Figure 3 Image acquisition process from camera sensor

3.1 Image Demosaicking
The process of demosaicking attempts to reconstruct a full
colour image from incomplete sampled colour data from an
image sensor overlaid with a colour filter array (CFA) using
interpolation techniques.

The Bayer array is the most common type of colour filter array used in colour sampling for image acquisition. Other methods of colour image sampling include the Tri-filter and the Foveon sensor. References to these other methods are listed at the end of the chapter.

Before we delve deeper into the mechanics of
demosaicking, it is necessary to describe the Bayer filter
array. This grid system involves a CCD or CMOS sensor
chip with M columns and N rows. A colour filter is attached
to the sensor in a certain pattern. For example, the colour
filters could be arranged in a particular pattern as shown by
the Bayer Colour Filter Array architecture shown in Figure
3.1(i).

[Figure 3.1(i): Bayer CFA tile with rows alternating R G R G ... and G B G B ...]

Figure 3.1(i) Bayer Colour Filter Array configuration

where R, G and B stand for the red, green and blue colour filters respectively, and the sensor chip produces an M × N array. There are two green pixels for every red and blue pixel in a 2×2 grid because the CFA is designed to match the human eye's greater sensitivity to green light.

The demosaicking process involves splitting a colour image
into its separate colour channels and filtering with an
interpolating filter. The final convolution results from each
channel are recombined to produce the demosaicked
image.

The equation for the basic linear interpolation demosaicking
algorithm is shown for one image channel of an RGB colour
image in (3.1-1 to 3.1-5).

$$I_c(x, y) = I(x, y) \ast h_1(x, y) \qquad (3.1\text{-}1)$$

$$I_s(x, y) = I(x, y) + I_c(x, y) \qquad (3.1\text{-}2)$$

$$I_{cs}(x, y) = I_s(x, y) \ast h_2(x, y) \qquad (3.1\text{-}3)$$

Yielding

$$I_o(x, y) = I(x, y) + I_s(x, y) + I_{cs}(x, y) \qquad (3.1\text{-}4)$$

Expressing the output image as a function of the input image gives the expression:

$$I_o = I \ast \left[ \delta + (\delta + h_1) \ast (\delta + h_2) \right] \qquad (3.1\text{-}5)$$

where $I$, $I_s$ and $I_{cs}$ are the original, interpolated stage 1 and stage 2 images respectively, $I_o$ is the demosaicked output image, $\delta$ is the unit impulse, and $h_1$ and $h_2$ are interpolation kernels usually consisting of an arrangement of ones and zeros. In the case of this implementation, $h_1$ and $h_2$ are 3 × 3 spatial domain kernels, the cross and diagonal (checkerboard) patterns

$$h_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \quad \text{and} \quad h_2 = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{bmatrix}$$

respectively.

Note the redundant summation of $I_s$ and $I_{cs}$ with $I$, the original image.
Keeping in mind that this is for one channel of an RGB colour image, this process can be performed on the R and B channels, and modified for the G channel, as will be explained further in the following subsections.

The system level diagram for the process for an RGB colour image is shown in Figure 3.1(ii).

Figure 3.1(ii) Image Demosaicking process

In the diagram, the convolution involves an interpolation process in addition to the redundant summation.
Some example images have been demosaicked to illustrate the results. The first example is the image shown on the left hand side of Figure 3.1(iii), which needs to be demosaicked. More examples of demosaicking are shown in Figures 3.1(v) and 3.1(vi).


(a) (b)
Figure 3.1(iii) (a) Original undersampled RGB image overlaid with Bayer colour filter array and (b) demosaicked image




(a) (b)
Figure 3.1(iv) (a) Original undersampled R,G and B channels (b)
Interpolated R, G and B channels

The images in Figure 3.1(iv) show the gaps in the image
channel samples. The checkerboard pattern indicates the
loss of colours in between colour pixels by black
spaces/pixels in each channel.

A checkerboard filter kernel is generated and convolved
with the images in Figure 3.1(iv)-(a) to produce the
interpolated images in Figure 3.1(iv)-(b). As can be seen,
most of the holes or black pixels have been filled. The
images in Figure 3.1(iv)-(b) can be filtered again with
checkerboard filters to eliminate all the lines seen in the
blue and red channels.

The reason why the green channel is interpolated in one pass is that there are two green pixels for every red and blue pixel; thus the green channel provides the strongest contribution in each 2 x 2 grid of the array.

It is important to note that there are various demosaicking
algorithms, and they include Pixel Binning/Doubling, Nearest Neighbour, Bilinear, Smooth Hue Transition, Edge-sensing Bilinear, Relative Edge-sensing Bilinear, Edge-sensing Bilinear 2, Variable Number of Gradients, and Pattern Recognition interpolation methods. For more information about these
interpolation methods. For more information about these
methods, consult the sources listed at the end of the
chapter. Some comparisons between some of the methods
are made using energy images in Figure 3.1(vii).

This is by no means an exhaustive list but indicates that
demosaicking is a very important and broad field as
evidenced by the volume of published literature, which can
be found in several research conference papers and
journals.



(a) (b)
Figure 3.1(v) (a) Image with Bayer pattern (b) Demosaicked image


(a) (b) (c)

(d) (e) (f)
Figure 3.1(vi) (a) Image combined with Bayer array pattern and
demosaicked image using (b) bilinear interpolation (c) Original image
(d) demosaicked using bilinear 2 and (e) high quality bilinear (f)
Gaussian-laplacian method

It is important to note that modern digital cameras have the ability to store digital images in raw format, which enables users to accurately demosaick images using software, without restricting them to the camera's hardware.

(a) (b) (c)

(d) (e) (f)
Figure 3.1(vii) Energy images calculated using Sobel kernel operators for (a) Original Image (b) combined with Bayer array pattern and demosaicked image using (c) and (d) bilinear interpolation, (e) Gaussian smoothing with Laplacian and (f) Pixel doubling

3.2 VHDL implementation
In this section, the VHDL implementation of the linear
interpolation algorithm used in the demosaicking of RGB
colour images will be discussed. The first part of the
chapter dealt with the software implementation using
MATLAB as the prototyping platform.

Using MATLAB, the implementation was quite trivial; however, in the hardware domain, the VHDL implementation of a synthesizable digital circuit for the demosaicking algorithm is going to be a lot more involved, as we will discover.


Prior to coding in VHDL, the first step is to understand the
dataflow and to devise the architecture for the algorithm. A
rough start would be to draw a system level diagram that
would include all the major processing blocks of the
algorithm. A top level system diagram is shown in Fig.
3.2(i).




Figure 3.2(i) Black Box system top level description of demosaicking

This is the black box system specification for this
demosaicking algorithm. The next step is to go down a level
into the demosaicking box to add more detail to the system.
Figure 3.2(ii) shows the system level description of the first
interpolation stage of the demosaicking algorithm for the R
channel.






Figure 3.2(ii) System level 1 description showing first interpolation
stage of R channel using demosaicking algorithm

The R channel is convolved with a linear spatial filter mask
as specified in the previous section used in the MATLAB
implementation. The convolved R channel or Rc is then
summed with the original R channel to produce an
interpolated R channel, Rs. The channel, Rs is then passed
on to the second interpolation stage shown in Figure 3.2(iii).

In this stage, the Rs channel is convolved with another linear spatial filter mask to produce a new signal, Rcs, which is subsequently summed with the original R channel and the Rs output channel from the first interpolation stage. This produces the final interpolated channel, R′, shown as the output in Figure 3.2(iii).







Figure 3.2(iii) System level 1 description showing second
interpolation stage of R channel using demosaicking algorithm

The block diagrams shown in the Figure 3.2(ii) and (iii) can
also be used for the B channel. For the G channel, only the
first stage of the interpolation is needed as shown in the
original algorithm equations. Thus, the system level
description for G is as shown in Figure 3.2(iv).






Figure 3.2(iv) System level 1 description showing interpolation stage
of G channel using demosaicking algorithm
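A behavioural sketch of this single-pass interpolation for one output pixel follows, assuming the 3 × 3 window pixels are already available from a window generator and that h1 is the cross-shaped kernel of ones given earlier, so the stage computes centre + north + south + east + west (port names are illustrative):

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

entity interp_stage1 is
  port (
    clk : in std_logic;
    --the five window pixels touched by the cross-shaped kernel h1
    n, s, e, w, c : in  std_logic_vector(7 downto 0);
    sum_out       : out std_logic_vector(10 downto 0) --holds up to 5*255
  );
end interp_stage1;

architecture rtl of interp_stage1 is
begin
  process(clk)
    variable acc : unsigned(10 downto 0);
  begin
    if rising_edge(clk) then
      --Gs = G + (G convolved with h1) at this pixel position
      acc := resize(unsigned(c), 11) + resize(unsigned(n), 11)
           + resize(unsigned(s), 11) + resize(unsigned(e), 11)
           + resize(unsigned(w), 11);
      sum_out <= std_logic_vector(acc);
    end if;
  end process;
end rtl;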

The system design can also be done in SIMULINK, which is the visual system description component of MATLAB. The complete circuit would look like that shown in Figure 3.2(v).

Figure 3.2(v) SIMULINK System description of linear interpolation
demosaicking algorithm

The diagram designed in SIMULINK shown in Figure 3.2(v)
is the system level architecture of the demosaicking
algorithm with the major processing blocks.

The next step is to develop and design the crucial inner
components of the major processing blocks. Based on the
mathematical expression for the algorithm, we know that
the system will incorporate 3x3 spatial filters and adders.
This leads to the design specification for the spatial filter
which is the most crucial component of this algorithm.

Several spatial filter architectures exist in research literature
with various modifications and specifications depending on
the nature of the desired filter. These basic architectures
were discussed in Chapter 2 and include the generic form,
separable, symmetric and separable symmetric filters. In
this section, we choose the generic 3 x 3 filter architecture
using long shift registers for the line buffers to the filter
instead of FIFOs. We remember that hardware spatial filter
architecture comprises a window generator, pixel counter
and line buffers, shift registers, flip flops, adders and
multipliers. Building on the spatial filter architectures
discussed in Chapter 2, all that needs to be modified in the
filter architecture are the coefficients for the filter and the
divider settings. Skeleton VHDL codes, which can be
modified for this design can be found in the Appendices.

A brief snippet of the VHDL code used in constructing the
interpolation step for the R channel is shown in Figure
3.2(vi). The top part of the code in Figure 3.2(vi) includes
the instantiations of the necessary libraries and packages.



Figure 3.2(vi) VHDL code snippet for specifying interpolation filter for
R channel









Figure 3.2(vii) Visual system level description of the VHDL code
snippet for specifying interpolation filter for R channel

The component specification of the interp_mask_5x5_512 part in the VHDL code shown in Figure 3.2(vi) is embedded within the system level description of the interp_filter_r system described in Figures 3.2(ii) and 3.2(iii).

3.2.1 Image Selection
Now we select the image to process. For convenience, we choose the Lena image overlaid on a CFA array, as shown in Figure 3.2.1(i). The criteria for choosing this image include its familiarity to the image processing community, and the fact that it is a square image (256 x 256), which makes it easier to specify in the hardware filter without having to pad the image or add extra pixels.

Figure 3.2.1(i) Original image to be demosaicked



Based on what was discussed about demosaicking, we
know that the easiest channel to demosaick would be the
green channel since there are two green pixels for every
red and blue pixel in a 2 x 2 CFA array, thus only one
interpolation pass is required. Thus we will discuss the
green channel last.

(a) (b)

(c) (d)
Figure 3.2.1(ii) Demosaicked R image channel from left to right
(software simulation) (a) original R channel, (b) filtered channel, Rc,
from first stage interpolation (c) filtered channel, Rcs, from second
stage interpolation (d) demosaicked image

In Figure 3.2.1(ii), we can observe the intermediate interpolation results of the spatial filter. Image (a) is the original red channel, R, of the image in Figure 3.2.1(i); (b) is the interpolated image, Rc, from the first stage shown in Figure 3.2(ii); (c) is the second interpolated image, Rcs, from Figure 3.2(iii); and (d) is the final demosaicked R channel.

The diagrams shown in Figure 3.2.1(iii) are the image results obtained from the software (a) and hardware (b) simulations. The results show no visually perceptible difference, indicating that the hardware filter scheme was implemented correctly.

Since the visual results agree so closely, there is no need to quantify the accuracy by taking the difference between the images obtained from the software and hardware simulations.

The three image channels processed with both the software
and hardware implementation of the demosaicking
algorithm are shown for the purposes of visual analysis.

The three channels are then recombined to create the composite RGB colour image, which is compared with the colour image obtained from the software simulation as well as the original CFA overlaid image in Figure 3.2.1(iv).








(a) (b)
Figure 3.2.1(iii) Demosaicked images with (a) software simulation
and (b) hardware simulation: first row: R channel, second row: G
channel and third row: B channel




(a) (b)
Figure 3.2.1(iv) Demosaicked colour image: (a) software simulation
(b) hardware simulation

Comparing the images in Figure 3.2.1(iv) shows the strikingly good result obtained from the hardware simulation, in addition to the successful removal of the CFA interference in the demosaicked image. However, on closer inspection, one may observe colour artifacts in regions of high/low frequency discontinuities in the image.

Also, because this image contains only a moderate amount of high frequency information, one can get away with this linear interpolation demosaicking method. For images with a lot of high frequency information, the limitations of linear methods become ever more apparent.

In Figure 3.2.1(v), we present the original CFA overlaid
image with the demosaicked results for comparison and the
results are even more striking.

The investigation of more advanced methods is left to the reader who wishes to learn more. Some
useful sources and research papers are listed at the end of
the chapter for further research.


(a) (b) (c)
Figure 3.2.1(v) (a) Image to be demosaicked (b) Demosaicked
image (software) (c) Demosaicked image (hardware simulation)

A snapshot of the ModelSim simulation window is shown in
Figure 3.2.1(vi) indicating the clock signal, the inputs and
outputs of the interpolation process.


Figure 3.2.1(vi) Snapshot of VHDL image processing in ModelSim
simulation window

The system top level description generated by Xilinx ISE
from the VHDL code is shown in Figure 3.2.1(vii). Since we
are dealing with unsigned 8-bit images, we only require 8
bits for each channel leading to 256 levels of gray for each
channel. The data_out_valid signal and the clock are
needed for proper synchronization of the inputs and outputs
of the system. Note that this diagram mirrors the black box
system description defined at the beginning of this section
describing the VHDL implementation of the algorithm.

Figure 3.2.1(vii) Black box top level VHDL description of
demosaicking algorithm


The next level of the top level system shows the major
components of the system for each of the R, G and B
channels.

Further probing reveals structures similar to those that were
described earlier on at the beginning of the VHDL section of
this chapter. Refer to the Appendix for more detailed RTL
technology schematics and levels of the system.



Figure 3.2.1(viii) First level of VHDL description of demosaicking algorithm

The synthesis results for the implemented demosaicking algorithm on the Xilinx Spartan 3 FPGA chip are given as:

Minimum period: 13.437ns (Maximum Frequency: 74.421MHz)
Minimum input arrival time before clock: 6.464ns
Maximum output required time after clock: 10.644ns
Maximum combinational path delay: 4.935ns

The maximum frequency implies that for a 256 x 256 image, the frame rate for this architecture is given by

$$\text{frame rate} = \frac{f_{\max}}{M \times N} = \frac{74.421 \times 10^{6}}{256 \times 256} \approx 1135 \text{ frames/sec}$$

where $M \times N$ is the image size in pixels, which is exceedingly fast.

Using the spatial filter architectures described in Chapter 2,
several of the other demosaicking methods can be
implemented in VHDL and hardware. Some good papers on
image demosaicking are listed in the references section
and enable the reader to start implementing the algorithms
quickly and performing experiments with the various
algorithms.

Summary
In this chapter, the demosaicking process using linear interpolation was described and implemented in software, followed by the VHDL implementation of the linear interpolation algorithm for demosaicking.

References
- W. K. Pratt, Digital Image Processing, 4th ed.: Wiley-Interscience, 2007.
- H. S. Malvar, L.-W. He, and R. Cutler, "High-Quality Linear Interpolation for Demosaicing of Bayer-Patterned Color Images," Microsoft Research, One Microsoft Way, Redmond, WA 98052.
- A. Lukin and D. Kubasov, "An Improved Demosaicing Algorithm," Faculty of Applied Mathematics and Computer Science, State University of Moscow, Russia.
- R. Jean, "Demosaicing with the Bayer Pattern," Department of Computer Science, University of North Carolina.
- R. A. Maschal Jr. et al., "Review of Bayer Pattern Color Filter Array (CFA) Demosaicing with New Quality Assessment Algorithms," Army Research Laboratory, ARL-TR-5061, January 2010.
- Y.-K. Cho et al., "Two Stage Demosaicing Algorithm for Color Filter Arrays," International Journal of Future Generation Communication and Networking, vol. 3, no. 1, March 2010.
- R. Ramanath and W. E. Snyder, "Adaptive demosaicking," Journal of Electronic Imaging, vol. 12, no. 4, pp. 633-642, October 2003.
- B. Ajdin et al., "Demosaicing by Smoothing along 1D Features," MPI Informatik, Saarbrücken, Germany.
- Y. Huang, "Demosaicking Recognition with Applications in Digital Photo Authentication based on a Quadratic Pixel Correlation Model," Shanghai Video Capture Team, ATI Graphics Division, AMD Inc.










Chapter 4
Image Enhancement
This chapter explores some image enhancement concepts,
algorithms, their architectures and implementation in VHDL.

Image enhancement is a process that involves the
improvement of an image by modifying attributes such as
contrast, colour, tone and sharpness. This process can be
performed manually by a human user or automatically using
an image enhancement algorithm, developed as a
computer program. Unlike image restoration, image enhancement is a subjective process: it usually operates without prior objective image information with which to judge or quantify the amount of enhancement. Also, enhancement results are usually targeted at human end users, who visually assess the quality of an enhanced image, an assessment that would be difficult for a machine or program to perform.

Image enhancement can be performed in the spatial, frequency, wavelet and fuzzy domains, and in these domains it can be classified as a local (point and/or mask) or global operation, in addition to being a linear or nonlinear process.

A myriad of algorithms have been developed in this field, both in industry and in academia, as evidenced by the numerous conference papers, journals, reports and books; several useful sources are listed at the end of the chapter for further study.


4.1 Point-based Enhancement
Point-based methods operate on each individual pixel of the image, independent of surrounding points or pixels, to enhance the whole image. Examples include logarithm, cosine, exponential and square-root operations.

4.1.1 Logarithm Transform
An example of a point-based enhancement process is the
logarithm transform. It is used to compress the dynamic
range of the image scene and can also be a pre-processing
step for further image processing processes as will be seen
in the subsequent section. The logarithm transform using the natural logarithm (base e) is given as

$$s = c \ln(1 + r) \qquad (4.1\text{-}1)$$

where $r$ is the input pixel value, $s$ the output pixel value and $c$ a scaling constant.

In digital hardware implementation, it is more convenient and logical to use binary (base 2) logarithms instead. A simple logarithm circuit could consist of a range of pre-computed logarithm values stored in ROM as a look-up table (LUT). This relatively trivial design is shown in Figure 4.1.1(i). More complex designs can be found in the relevant literature.
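As an illustration, the whole LUT can be pre-computed in MATLAB; the 8-bit input/output widths and the scaling are assumed here, not taken from the book's design:

% Precompute scaled log2 values for all 8-bit inputs, then apply the
% enhancement as a single table look-up per pixel.
lut = uint8(round(255 * log2(1 + double(0:255)) / log2(256)));
img = imread('cameraman.tif');   % 8-bit greyscale test image
out = lut(double(img) + 1);      % '+ 1' plays the role of the address generator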

[Figure: Linear Input -> Address Generator -> ROM LUT -> adder (+ Offset) -> Logarithmic Output]

Figure 4.1.1(i) ROM LUT-based binary logarithm hardware architecture

Figure 4.1.1(ii) shows the results of using the design in Figure 4.1.1(i) to enhance the original cameraman image (top left), producing the log-transformed image (top right); the double-precision, floating-point log-transformed image (bottom left) and the error image (bottom right) are shown for comparison.

Figure 4.1.1(ii) Comparison of image processed with fixed-point LUT
logarithm values against double-precision, floating-point logarithm
values

There is a subtle difference between the fixed point and
floating point logarithm results in Figure 4.1.1(ii).

As was mentioned earlier, there are several other, more complex algorithms used to compute binary logarithms in digital logic circuits, and these vary in performance with regard to power, accuracy, efficiency, memory requirements, speed, etc. However, the topic of binary logarithm calculation is quite broad and beyond the scope of this book. The next section discusses the Gamma Correction method used in colour display devices.
4.1.2 Gamma Correction
Gamma correction is a simple process for enhancing images for display on various viewing and printing devices. The formula is straightforward: it is basically a power-law transform whose exponent is a constant known as the gamma factor. An example of an image processed with Gamma Correction is shown in Figure 4.1.2.
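A minimal MATLAB sketch of the transform on a normalized image; the gamma value of 0.5 is illustrative, not the one used for the figure:

% Gamma correction as a per-pixel power-law transform.
img   = im2double(imread('image.png'));  % hypothetical file, values in [0,1]
gamma = 0.5;                             % assumed gamma factor
out   = img .^ gamma;                    % brightens midtones for gamma < 1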

(a) (b)
Figure 4.1.2 (a) Original image (b) Gamma Corrected image

Note the change in colour difference after Gamma Correction, especially for adjacent, similar colours. The next section discusses Histogram Clipping, which belongs to the same class of algorithms as Histogram Equalization.
4.1.3 Histogram Clipping
Histogram clipping involves the re-adjustment of pixel
intensities to enable the proper display of the acquired
image from the camera sensor. It expands the dynamic
range of the captured image to improve colour contrast.


Figure 4.1.3(i) illustrates the image from Figure 4.1.2(a) processed with Histogram Clipping and Gamma Correction.
(a) (b)
Figure 4.1.3(i) (a) Histogram clipped image and (b) Gamma Corrected image after Histogram Clipping

Note the difference between the original and Gamma
Corrected images in Figure 4.1.2 and the Histogram
Clipped image in Figure 4.1.3(i) and its Gamma Corrected
version in (b).

The code snippet for the basic histogram clipping algorithm
is shown in Figure 4.1.3(ii).

Figure 4.1.3(ii) MATLAB code snippet of histogram clipping
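The clipping step can be sketched with Image Processing Toolbox functions; the 1%/99% saturation limits below are illustrative and not necessarily those used in the figure:

% Clip the histogram tails, then re-expand the dynamic range.
img = imread('image.png');            % hypothetical file
lim = stretchlim(img, [0.01 0.99]);   % find the low/high clip points
out = imadjust(img, lim, []);         % clip and stretch to the full range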

4.2 Local/neighbourhood enhancement
These types of enhancement methods process individual pixels as a function of the adjacent neighbourhood image pixels, using linear or non-linear filter processes. Examples of this type of filtering include unsharp masking (linear) and logarithmic local adaptive (non-linear) enhancement.
4.2.1 Unsharp Masking
Unsharp masking involves using sharpening masks like the Laplacian to sharpen an image by magnifying the effects of the high frequency components of the image, where most of the detail in the scene resides. The Laplacian masks used in the software and VHDL hardware implementations are

$$L_1 = \begin{pmatrix} -1 & -1 & -1 \\ -1 & 9 & -1 \\ -1 & -1 & -1 \end{pmatrix} \quad \text{and} \quad L_2 = \begin{pmatrix} 0 & -1 & 0 \\ -1 & 5 & -1 \\ 0 & -1 & 0 \end{pmatrix}$$

respectively.
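For comparison with the hardware results, the masks can be applied in MATLAB directly:

% Software unsharp masking with the L1 mask (sums to 1, so brightness is preserved).
img   = im2double(imread('image.png'));   % hypothetical file
L1    = [-1 -1 -1; -1 9 -1; -1 -1 -1];
sharp = imfilter(img, L1, 'replicate');   % magnifies high-frequency detail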

The image results from the hardware simulation of the low-
pass and Laplacian 3x3 filters are shown in Figure 4.2.1.

(a) (b) (c)
Figure 4.2.1 VHDL-based hardware simulation of (a) - (c) Laplacian-
filtered images using varying kernel coefficients

4.2.2 Logarithmic local adaptive enhancement
This algorithm uses the logarithmic transform and local non-linear statistics (the local image variance) to enhance the image. The method is similar to a spatial filtering operation, in addition to using a logarithm transform. Figure 4.2.2 shows an image processed with the algorithm.

(a) (b)
Figure 4.2.2 (a) Original image (b) Image processed with LLAE

This method produces improved contrast in the processed image, as is evident in Figure 4.2.2, where the lines and details of the mountain terrain can be seen clearly after enhancement, in addition to richer colours.
4.3 Global/Frequency Domain Enhancement
Global/Frequency domain enhancement processes the
image as a function of the cumulative summation of the
frequency components in the entire image. This transforms
the spatially varying image into a spectral one by summing
up all the contributions of each pixel in relation to the entire
image. The image is then processed in the spectral domain
with a spectral filter after which the image is transformed
back to the spatial domain for visual observation.


4.3.1 Homomorphic filter
The operation of the Homomorphic filter is based on the
Illuminance/Reflectance image model and was developed
by Allan Oppenheim initially for filtering of audio signals and
has found numerous applications in digital image
processing. This filtering technique achieves enhancement
by improving the contrast and dynamic range compression
of the image scene. The process follows the scheme in
Figure 1.2 and the equation for the operation is given as
follows:

$$g(x,y) = \exp\!\left( \mathrm{FFT}^{-1}\!\left[ H(u,v) \cdot \mathrm{FFT}\!\left( \ln f(x,y) \right) \right] \right) \qquad (4.3.1\text{-}1)$$

where $g(x,y)$ is the enhanced image, $f(x,y)$ is the input image, FFT stands for the Fast Fourier Transform and $H(u,v)$ is the frequency domain filter.



With the basic introduction to enhancement, the next step is
to describe the VHDL implementation of the key
enhancement algorithm.
4.4 VHDL implementation
Performing the Fourier Transform is much less demanding in software than in hardware, and though there are hardware IP cores for the FFT algorithm, it makes sense to transform frequency domain image filtering processes into the spatial domain because of the ease of implementation in hardware. Thus, the VHDL implementation of the Homomorphic filter is done in the spatial domain, since we can then avoid the Fourier Transform computation and generate a small but effective spatial domain filter kernel for the filtering.
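The overall software flow that the hardware mirrors can be sketched in MATLAB; the kernel below is an illustrative high-boost stand-in for the derived spatial kernel, not the one used in the VHDL design:

% Spatial-domain homomorphic filtering: log, spatial filter, then exp.
f   = im2double(imread('image.png')) + 1e-3;  % hypothetical file; offset avoids log(0)
k   = fspecial('unsharp');                    % assumed high-boost kernel
g   = exp(imfilter(log(f), k, 'replicate'));  % filter in the log domain
out = mat2gray(g);                            % rescale for display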


In implementing this, the main components are the
logarithm transformation components and the spatial
domain filter. By building each individual component
separately, debugging and testing becomes much easier.
Once more, we describe the top level system.

Thus, we have the RGB input and output ports in the top
level. Then the next level in Figure 4.4(ii) shows the inner
main components of the top level system.


Figure 4.4(i) Top level architecture of RGB Homomorphic filter
[Figure: each 8-bit R, G, B channel passes through LIN2LOG, 3x3 Spatial Filter and LOG2LIN stages over a 16-bit internal datapath, with Clk and Data_Valid signals]
Figure 4.4(ii) Top level architecture of RGB Homomorphic system
with inner sub-components

The image shown in Figure 4.4(iii) was processed with an RGB Homomorphic filter implemented in VHDL for an FPGA. The hardware simulation image result is shown alongside the original image for comparison.

(a) (b)
Figure 4.4(iii) (a) Original image (b) processed image with RGB
Homomorphic filter (hardware simulation)

It can be easily observed that the Homomorphic filter clearly
improved the original image as there are more details in the
enhanced image scene where we can now distinguish
foreground and background objects. The maximum speed
of this architecture on the Xilinx Spartan 3 FPGA is around
80 MHz based on synthesis results.

Summary
We discussed several image enhancement algorithms, implemented the more effective and popular ones in VHDL, and analysed the image results of the implemented architectures.
References
- U. Nnolim, FPGA Architectures for Logarithmic Colour Image
Processing, Ph.D. thesis, University of Kent at Canterbury,
Canterbury-Kent, 2009.

- W. K. Pratt, Digital Image Processing, 4 ed.: Wiley-Interscience,
2007.
- R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital Image
Processing Using MATLAB: Prentice Hall, 2004.
- R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2
ed.: Prentice Hall, 2002.
- MathWorks, "Image Processing Toolbox 6 User's Guide for use
with MATLAB," The Mathworks, 2008, pp. 285 - 288.
- Weber, "The USC-SIPI Image Database," University of South
Carolina Signal and Image Processing Institute (USC-SIPI),
1981.
- Zuloaga, J. L. Martn, U. Bidarte, and J. A. Ezquerra, "VHDL
test bench for digital image processing systems using a new
image format."
- Xilinx, "XST User Guide ": http://www.xilinx.com, 2008..
- G. Deng and L. W. Cahill, "Multiscale image enhancement
using the logarithmic image processing model," Electronics
Letters, vol. 29, pp. 803 - 804, 29 Apr 1993.
- G. Deng, L. W. C., and G. R. Tobin, "The Study of Logarithmic
Image Processing Model and Its Application to Image
Enhancement," IEEE Transaction on Image Processing, vol. 4,
pp. 506-512, 1995.
- S. E. Umbaugh, Computer Imaging: Digital Image Analysis and
Processing. Boca Raton, FL: CRC Press, Taylor & Francis
Group, 2005.
- A. Oppenheim, R. W. Schafer, and T. G. Stockham, "Nonlinear
Filtering of Multiplied and Convolved Signals," Proceedings of
the IEEE, vol. 56, pp. 1264 - 1291, August 1968.
- U. Nnolim and P. Lee, "Homomorphic Filtering of colour images
using a Spatial Filter Kernel in the HSI colour space," in IEEE
Instrumentation and Measurement Technology Conference
Proceedings, 2008, (IMTC 2008) Victoria, Vancouver Island,
Canada: IEEE, 2008, pp. 1738-1743.
- F. T. Arslan and A. M. Grigoryan, "Fast Splitting alpha - Rooting
Method of Image Enhancement: Tensor Representation," IEEE
Transactions on Image Processing, vol. 15, pp. 3375 - 3384,
November 2006.
- S. S. Agaian, K. Panetta, and A. M. Grigoryan, "Transform-
Based Image Enhancement Algorithms with Performance
Measure," IEEE Transactions on Image Processing, vol. 10, pp.
367 - 382, March 2001.



Chapter 5
Image Edge Detection and
Smoothing
This chapter deals with the VHDL implementation of image edge detection and smoothing filter kernels using the spatial filter architectures from Chapter 2. The original greyscale images to be processed are shown in Figure 5.

All the filters are modular in their design, thus the RGB
colour versions are simply triplicate instantiations of the
greyscale filters.

Figure 5 Original (256 x 256) images to be processed
5.1 Image edge detection kernels
These kernels are digital mask approximations of derivative
filters for edge enhancements and they include:
- Sobel kernel
- Prewitt kernel
- Roberts kernel

They are derived from discrete approximations to derivative operators such as the gradient and the Laplacian.


This class of filter kernels is used to find and identify edges in an image by computing gradients of the image in the vertical and horizontal directions, which are then combined to produce the gradient magnitude of the image. Some well known kernels are the Sobel, Prewitt and Roberts kernels. The Canny edge detection method also uses these edge-finding filters as part of its algorithm.

The Sobel, Prewitt and Roberts kernel approximations are simple but effective tools for image edge and corner detection. The most effective commonly used method for detecting both weak and strong edges is the famous Canny edge detector. However, though the Canny algorithm is a bit more involved and beyond the focus of this book, the filtering techniques mentioned here provide the basic steps of that algorithm.
5.1.1 Sobel edge filter
The Sobel kernel masks used to find the horizontal and
vertical edges in the image in the VHDL implementation
were
$$S_X = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix}, \quad S_Y = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}$$

The x and y subscripts denote horizontal and vertical
positions respectively.
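Combining the two directional responses into a gradient magnitude can be sketched in MATLAB as follows (file name hypothetical):

% Sobel gradients and their combined magnitude.
img = im2double(imread('image.png'));   % hypothetical greyscale file
Sx  = [1 2 1; 0 0 0; -1 -2 -1];
Sy  = [-1 0 1; -2 0 2; -1 0 1];
Gx  = imfilter(img, Sx, 'replicate');   % horizontal-edge response
Gy  = imfilter(img, Sy, 'replicate');   % vertical-edge response
mag = sqrt(Gx.^2 + Gy.^2);              % gradient magnitude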

The hardware and software simulation results of the images processed with these filter kernels are shown in Figure 5.1.1.


(a) (b)

(c) (d)
Figure 5.1.1 Comparison between (a) & (b) VHDL-based hardware
simulation of Sobel filter (x and y direction) processed image and (c)
and (d) MATLAB-based software simulation of Sobel filter (x and y
direction)
5.1.2 Prewitt edge filter
The Prewitt kernel masks used for finding horizontal and
vertical lines in the image in the VHDL implementation were

$$P_X = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{pmatrix}, \quad P_Y = \begin{pmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{pmatrix}$$




(a) (b)
Figure 5.1.2 VHDL-based hardware simulation of (a) & (b) Prewitt
filter (x and y direction) processed image.

The image results for the edge filters and the high-pass filter appear in this form because most of the (negative) image pixel values fall outside the unsigned 8-bit integer display range. Appropriate scaling within the filter (using the mat2gray function in MATLAB, for example) would ensure that all pixel values are mapped into the display range; the end result would be an embossing effect in the output image.

On further analysis and comparison, the results from the
hardware filter simulation are quite comparable to the
software versions.

5.1.3 High Pass Filter
The high-pass filter only allows high frequency components
in the image (like lines and edges) in the passband and is
the default state of the edge or derivative filters. These
filters are the image processing applications of derivatives
from Calculus as was mentioned earlier.


The kernel for the default high-pass filter used is defined as;

$$\mathrm{HPF} = \begin{pmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{pmatrix}$$


Though the filter kernels mentioned earlier are also types of high-pass filters, the default version is much harsher than the Sobel and Prewitt filters, as can be observed from the VHDL hardware simulation results in Figure 5.1.3. The reason for the harshness is easily seen from the kernel coefficients: all neighbours are weighted equally, so weak edges are not favoured over strong edges as they are with the other edge filter kernels.


Figure 5.1.3 VHDL hardware simulations of high-pass filtered images
5.2 Image Smoothing Filters
These types of filters enhance the low frequency
components of the image scene by reducing the gradients
or sharp changes across frequency components in the
image, which is visually manifested as a smoothing effect.
They can also be called integration or anti-derivative filters
from Calculus. They can be used in demosaicking, noise
removal or suppression and their effectiveness varies
depending on the complexity and level of non-linearity of
the algorithms.

5.2.1 Mean/Averaging filter
Averaging or mean filters are low-pass filters used for image smoothing tasks such as removing noise from an image. They attenuate the high frequency components, which contribute to the visual sharpness and high contrast areas of an image. The easiest way to implement an averaging filter is to use the kernel specified as:

$$\mathrm{LPF} = \frac{1}{9}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$


There is a considerable loss of detail when using the basic mean (box) filter for image smoothing/denoising, as it blurs edges along with the noise it is attempting to remove. Also, note that the low-pass mean filter is the complement of the high-pass filter in 5.1.3.

5.2.2 Gaussian Lowpass filter
The Gaussian lowpass filter is another type of smoothing
filter that produces a better result than the standard
averaging filter because it assigns different weights to
different pixels in the local image neighbourhood. Also,
Gaussian filters can be separable and/or circularly
symmetric depending on the design. Separable filter
kernels are very important in hardware image filtering
operations because of the reduction of multiplications or
operations needed as was discussed in Chapter 2. The
kernel for the spatial Gaussian filter is

$$G = \frac{1}{16}\begin{pmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{pmatrix}$$

which can also be expressed in its separable form

$$G = \frac{1}{4}\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \cdot \frac{1}{4}\begin{pmatrix} 1 & 2 & 1 \end{pmatrix}$$
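A quick MATLAB check that the separable form reproduces the 2-D kernel (file name hypothetical):

% Two 1-D passes versus one 2-D convolution with the outer-product kernel.
img    = im2double(imread('image.png'));   % hypothetical file
h      = [1 2 1] / 4;                      % 1-D Gaussian factor
out2d  = imfilter(img, h' * h);            % direct 2-D kernel (1/16 overall)
outsep = imfilter(imfilter(img, h), h');   % row pass, then column pass
% max(abs(out2d(:) - outsep(:))) is on the order of floating-point eps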



Figure 5.2.2 shows the image results comparing a mean
filter with the Gaussian low-pass filter.

(a) (b)
Figure 5.2.2 VHDL-based hardware simulation of (a) mean filter & (b)
Gaussian low-pass filtered images

In (b), the Gaussian filter with different weights was used
and provides a much better result than the image in (a).

It is important to note that the filter architectures for these types of filters can be further minimized for efficient usage of hardware resources. For example, the high-pass filter can use a symmetric filter architecture, the low-pass filter a separable and symmetric architecture, and the Laplacian and high-boost edge enhancement filters a symmetric architecture. Additionally, the Sobel, Prewitt and Gaussian filters can use symmetric and separable filter architectures.

Summary
In this chapter, we introduced spatial filters used for edge
detection and smoothing and showed the VHDL
implementation of the algorithms compared with the
software versions.

References
- U. Nnolim, FPGA Architectures for Logarithmic Colour Image
Processing, Ph.D. thesis, University of Kent at Canterbury,
Canterbury-Kent, 2009.
- W. K. Pratt, Digital Image Processing, 4 ed.: Wiley-Interscience,
2007.
- R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital Image
Processing Using MATLAB: Prentice Hall, 2004.
- R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2
ed.: Prentice Hall, 2002.
- Weber, "The USC-SIPI Image Database," University of South
Carolina Signal and Image Processing Institute (USC-SIPI),
1981.



Chapter 6
Colour Image Conversion
This chapter deals with the VHDL implementation of colour
space converters for colour image processing.

Colour space conversions are necessary for certain morphological and analytical processes, such as segmentation and pattern and texture recognition, where the colour information of each pixel must be accurately preserved throughout the processing. Processing an image in the RGB colour space with certain algorithms, like histogram equalization, will lead to distorted hues in the output image, since each colour pixel in an RGB image is a vector composed of three scalar values from the individual R, G and B image channels.

Colour space conversions can be additive, subtractive,
linear and non-linear processes. Usually, the more involved
the colour conversion process, the better the results.

Examples of the various types of colour spaces include but
are not limited to:

6.1 Additive colour spaces
The additive colour spaces include:
CIELAB/ L*a*b* Colour Coordinate System
RGB Colour Coordinate System

These colour spaces are used in areas such as digital film photography and television.

The CIELAB colour space system was developed to be independent of display devices and is one of the more complete colour spaces, since it approximates human vision. Additionally, a lot of colours in the L*a*b* space cannot be realized in the real world and so are termed imaginary colours. This implies that this colour space requires a lot of memory for accurate representation; thus conversion to 24-bit RGB is a lossy process, and at least 48-bit RGB is required for good resolution.

The RGB colour space was devised for computer vision, display devices (LCDs, CRTs, etc.) and cameras, and has several variants, which include sRGB (used in HD digital image and video cameras) and Adobe RGB. It is made up of Red, Green and Blue channels, various combinations of which are used to generate a myriad of secondary and higher-order colours.

6.2 Subtractive Colour spaces
CMY Colour Coordinate System
CMYK Colour Coordinate System

Subtractive colour spaces like the CMY (Cyan, Magenta
and Yellow) and CMYK (CMY plus black Key) are used for
printing purposes. For CMY, the simple formula is:

$$\begin{pmatrix} C \\ M \\ Y \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} - \begin{pmatrix} r \\ g \\ b \end{pmatrix} \qquad (6.2\text{-}1)$$

where the R, G and B values are normalized to the range [0, 1] using the expressions

$$r = \frac{R}{255}, \quad g = \frac{G}{255}, \quad b = \frac{B}{255} \qquad (6.2\text{-}2)$$

for 8-bit channels.

However, simple inspection of this formula shows that it is not very good in practice. Thus, the CMYK method is the preferred colour space for printers.

The formula of the CMYK method is a bit more involved and
is dependent on the colour space and the colour ICC
profiles used by the hardware device used to output the
colour image (e.g. scanner, printer, camera, camcorder,
etc).
Some sample formulae include:

(6.2-3)





(6.2-4)

Another variation is given as;

(6.2-5)

(6.2-6)

(6.2-7)

(6.2-8)


(6.2-9)

(6.2-10)
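As a reference point, a minimal MATLAB sketch of the basic CMY conversion together with one widely used CMYK variant is given below; the K = min(C, M, Y) variant with renormalization is an assumption for illustration, since, as noted above, production formulae depend on the device ICC profile:

% RGB to CMY per (6.2-1), then one common CMYK variant.
rgb = im2double(imread('image.png'));   % hypothetical file, values in [0,1]
C = 1 - rgb(:,:,1);  M = 1 - rgb(:,:,2);  Y = 1 - rgb(:,:,3);
K = min(C, min(M, Y));                  % black key
d = max(1 - K, eps);                    % guard against division by zero
Cp = (C - K) ./ d;  Mp = (M - K) ./ d;  Yp = (Y - K) ./ d;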


(a) (b) (c) (d)
Figure 6.2(i) (a) C image (b) M image (c) Y image (d) K image


(a) (b) (c) (d)
Figure 6.2(ii) (a) C image (b) M image (c) Y image (d) K image


(a) (b)
Figure 6.2(iii) (a) CMY image (b) CMYK image (K not added)


The VHDL implementation of the CMYK converter is trivial
and is left as an exercise for the interested reader using the
design approach outlined for the more complex designs.

6.3 Video Colour spaces
YIQ NTSC Transmission Colour Coordinate System
YCbCr Transmission Colour Coordinate System
YUV Transmission Colour Coordinate System

These colour space conversions must be fast and efficient
to be useful in video operation. The typical form for such
transformations is as given in (6.3-1).

$$\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix} \qquad (6.3\text{-}1)$$

where X, Y and Z are the channels of the required colour space, R, G and B are the initial channels from the RGB colour space and the $a_{ij}$ are constant coefficients.



The implemented colour spaces are the YIQ (NTSC) colour
space, YCbCr and the YUV colour spaces. The MATLAB
code for the conversion is given in Figure 6.3(i).

The YIQ transformation matrix is given as

$$T_{YIQ} = \begin{pmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.274 & -0.322 \\ 0.211 & -0.523 & 0.312 \end{pmatrix} \qquad (6.3\text{-}2)$$

A software program is developed to test the algorithm and to serve as a template for the hardware system that will be implemented in VHDL. The program is shown in Figure 6.3(i).

Figure 6.3(i) MATLAB program for the RGB2YIQ colourspace conversion
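For reference, a minimal MATLAB sketch of such a conversion program, using the matrix in (6.3-2); the file name is hypothetical:

% Convert an RGB image to YIQ by applying T_YIQ to every pixel.
T   = [0.299  0.587  0.114;
       0.596 -0.274 -0.322;
       0.211 -0.523  0.312];
rgb = im2double(imread('image.png'));     % hypothetical file
[m, n, ~] = size(rgb);
yiq = reshape(reshape(rgb, [], 3) * T.', m, n, 3);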

The top level system architecture is given in the form shown
in Figure 6.3(ii).





Figure 6.3(ii) Top level black box description of the RGB to YIQ converter



The detailed system is shown in Figure 6.3(iii).
Figure 6.3(iii) Hardware architecture of RGB2YIQ/YUV converter


(a) (b) (c)
Figure 6.3(iv) (a) RGB image, (b) software and (c) hardware simulation results of the RGB2YIQ/NTSC colourspace converter

The transformation matrix for the YUV conversion from RGB is given as

$$T_{YUV} = \begin{pmatrix} 0.299 & 0.587 & 0.114 \\ -0.147 & -0.289 & 0.436 \\ 0.615 & -0.515 & -0.100 \end{pmatrix} \qquad (6.3\text{-}3)$$


(a) (b) (c)
Figure 6.3(v) (a) RGB image, (b) software and (c) hardware simulation results of the RGB2YUV colourspace converter


Figure 6.3(vi) VHDL code snippet for RGB2YIQ/YUV colour converter
showing coefficients

The coding of the signed, floating-point values in VHDL is achieved with a custom program written in MATLAB that converts the values from double-precision floating point to a fixed-point representation in VHDL. The use of fixed-point math is necessary since the system must be feasible and synthesizable in hardware. The RTL level system description generated from the synthesized VHDL code is shown in Figure 6.3(vii).
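The conversion step can be sketched in MATLAB. The 14 fractional bits assumed here are consistent with the package constants in Appendix C (0.299 encodes to "0001001100100011") and with the a10(24 downto 14) output slice in colour_converter.vhd:

% Encode a double-precision coefficient as a 16-bit two's-complement
% fixed-point value with 14 fractional bits.
c = 0.299;                       % coefficient to encode
q = round(c * 2^14);             % scale by 2^14
if q < 0, q = q + 2^16; end      % two's-complement wrap for negatives
bin = dec2bin(q, 16)             % -> '0001001100100011'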

Figure 6.3(vii) RTL top level of the RGB to YIQ/YUV colour converter

Based on synthesis results on a Spartan 3 FPGA device,
the device usage is as shown in Table 6.3.

Device                       Usage               Percentage
Number of Slices             268 out of 1920     13%
Number of Slice Flip Flops   373 out of 3840     9%
Number of 4 input LUTs       174 out of 3840     4%
Number of bonded IOBs        51 out of 173       29%
Number of MULT18X18s         9 out of 12         75%
Number of GCLKs              1 out of 8          12%

Table 6.3 Device utilization summary of RGB2YIQ/YUV converter

The minimum period is 8.313 ns (maximum frequency: 120.293 MHz), which is extremely fast. Thus, for a 256 x 256 image, using the frame rate formula from Chapter 3, we get 1835 frames per second. The results of the software and VHDL hardware simulations are shown and compared in Figure 6.3(viii).


(a) (b) (c)
Figure 6.3(viii) Software (first row) and VHDL hardware (second row)
simulation results of RGB2YIQ converter showing (a) Y (b) I and (c) Q
channels

This particular implementation takes 8-bit colour values and can output up to 10 bits, though it can easily be scaled to output 9 bits, where the extra bit is used for the sign since we expect negative values. The formula for conversion back to RGB from YIQ is given as

$$\begin{pmatrix} R \\ G \\ B \end{pmatrix} = \begin{pmatrix} 1 & 0.956 & 0.621 \\ 1 & -0.272 & -0.647 \\ 1 & -1.106 & 1.703 \end{pmatrix} \begin{pmatrix} Y \\ I \\ Q \end{pmatrix} \qquad (6.3\text{-}4)$$

The architecture for this conversion is the same as for the RGB2YIQ converter, except that the coefficients are different; the image result from the VHDL hardware simulation of the YIQ to RGB conversion is shown in Figure 6.3(ix).


(a) (b)
Figure 6.3(ix) (a) Software and (b) VHDL hardware simulation results
of YIQ2RGB converter

The colour of the image in Figure 6.3(ix) obtained from the hardware result (b) is different, and the solution to improving the colour of the output is left as an exercise for the reader. The next converter to investigate is the RGB to YCbCr architecture.





Figure 6.3(x) Top level black box description of the RGB to YCbCr converter

The equation for the RGB to YCbCr conversion is similar to those of the YIQ and YUV methods (they all involve a simple matrix multiplication) and is shown in (6.3-5):

$$\begin{pmatrix} Y \\ Cb \\ Cr \end{pmatrix} = \begin{pmatrix} 0.299 & 0.587 & 0.114 \\ -0.169 & -0.331 & 0.500 \\ 0.500 & -0.419 & -0.081 \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix} + \begin{pmatrix} 0 \\ 128 \\ 128 \end{pmatrix} \qquad (6.3\text{-}5)$$



The architecture is also similar except that there are
additional adders for the constant integer values.

Figure 6.3(ix) Hardware architecture of RGB2YCbCr converter

The results of the hardware and software simulation are shown in Figure 6.3(x), and it is very difficult to differentiate the two images, considering that the hardware result was generated with fixed-point math and truncated integers, without the floating-point values used in the software version. However, it is left to the reader to investigate the conversion back to RGB space and the likely result in RGB space when using the image results from the VHDL hardware simulation.


Figure 6.3(x) (a) Software and (b) VHDL hardware simulation results of the RGB2YCbCr colourspace converter



(a) (b) (c)
Figure 6.3(xi) Software (first row) and VHDL hardware (second row) simulation results of the RGB2YCbCr converter showing the (a) Y, (b) Cb and (c) Cr channels

The RGB2YCbCr circuit was rapidly realized by including three extra adders in the circuit template used for the NTSC and YUV conversions and loading a different set of coefficients. Thus, the device utilization results and operating frequencies are similar.

The ease of hardware implementation of video colour space
conversion is a great advantage when designing digital
hardware circuits for high speed colour video and image
processing where a lot of colour space conversions are
regularly required.

6.4 Non-linear/non-trivial colour spaces
These are the more complex colour transformations, which
are better models for human colour perception. These
colour spaces decouple the colour information from the
intensity and the saturation information in order to preserve
the values after non-linear processing. They include;
- Karhunen-Loeve Colour Coordinate System
- HSV Colour Coordinate System
- HSI/LHS/IHS Colour Coordinate System
We will focus on the HSI and HSV colour spaces in this
section.

The architecture for the conventional RGB2HSI described
in [] is depicted in Figure 6.4(i).


$$I = \frac{R + G + B}{3} \qquad (6.4\text{-}1)$$

$$S = 1 - \frac{3\min(R, G, B)}{R + G + B} \qquad (6.4\text{-}2)$$

$$H = \begin{cases} \theta & \text{if } B \le G \\ 360^{\circ} - \theta & \text{if } B > G \end{cases} \qquad (6.4\text{-}3)$$

$$\theta = \cos^{-1}\left( \frac{\tfrac{1}{2}\left[ (R - G) + (R - B) \right]}{\left[ (R - G)^2 + (R - B)(G - B) \right]^{1/2}} \right) \qquad (6.4\text{-}4)$$

[Figure: datapath of adders, subtractors, dividers, a comparator, a multiplexer and a cos^-1 unit mapping the R, G, B inputs to the H, S, I outputs]

Figure 6.4(i) RGB2HSI colour converter hardware architecture

The results for the HSI implementation are shown in Figure 6.4(ii). For further information on this implementation, refer to the references at the end of the chapter. The conventional HSI conversion is extremely difficult to implement accurately in digital hardware without using some floating-point facilities or large LUTs.

The results of the implementation are shown in Figure 6.4(ii), where the last two images show the results when the individual channels are processed and recombined after being output, and before being output, respectively.


From visual observation, the hardware simulation results
are quite good.

Figure 6.4(ii) Software and hardware simulation results of RGB2HSI
converter

The equations for conversion to HSV space are:

$$V = \max(R, G, B) \qquad (6.4\text{-}5)$$

$$S = V - \min(R, G, B) \qquad (6.4\text{-}6)$$

$$H = \begin{cases} \dfrac{G - B}{S} & \text{for } R = V \\[4pt] 2 + \dfrac{B - R}{S} & \text{for } G = V \\[4pt] 4 + \dfrac{R - B}{S} & \text{for } B = V \end{cases} \qquad (6.4\text{-}7)$$
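A per-pixel MATLAB sketch of (6.4-5)-(6.4-7) in the book's unscaled form; R, G and B are scalars here, and the S == 0 guard for grey pixels is an added assumption:

% HSV conversion for a single pixel.
V = max([R, G, B]);
S = V - min([R, G, B]);
if S == 0
    H = 0;                      % hue is undefined for greys; assume 0
elseif V == R
    H = (G - B) / S;
elseif V == G
    H = 2 + (B - R) / S;
else
    H = 4 + (R - B) / S;
end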




The diagram of the hardware architecture for RGB to HSV colour space conversion is shown in Figure 6.4(iii). Note the division operations and the digital hardware constraints, and devise a solution for implementing these dividers in a synthesizable circuit for typical FPGA hardware.
[Figure: Max/Min extraction followed by subtractors and dividers feeding a multiplexer, with +2 and +4 offsets and a final divide-by-6 stage, mapping the R, G, B inputs to the H, S, V outputs]

Figure 6.4(iii) Hardware architecture of RGB2HSV converter

The synthesizable HSV conversion is relatively easy to
implement in digital hardware without floating point or large
LUTs.

The results of VHDL implementation are shown in Figure
6.4(iv). Compare the hue from the HSV to the HSI and
decide which one is better for colour image processing.


(a) (b) (c)
Figure 6.4(iv) (a) Software and hardware simulation results of
RGB2HSV converter for (b) individual component channel processing
and (c) combined channel processing

Colour Image Conversion
95

The implementations of these non-linear colour converters
are quite involved and much more complicated than the
VHDL implementations of the other colour conversion
algorithms.
Summary
In this chapter, several types of colour space conversions
were investigated and implemented in VHDL for analysis.
The architectures show varying levels of complexity in the
implementation and can be combined with other
architectures to form a hardware image processing pipeline.
It should also be kept in mind that the architectures
developed here are not the most efficient or compact but
provide a basis for further investigation by the interested
reader.

References

- W. K. Pratt, Digital Image Processing, 4 ed.: Wiley-Interscience,
2007.
- R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital Image
Processing Using MATLAB: Prentice Hall, 2004.
- R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2
ed.: Prentice Hall, 2002.
- U. Nnolim, FPGA Architectures for Logarithmic Colour Image
Processing, Ph.D. thesis, University of Kent at Canterbury,
Canterbury-Kent, 2009.
- MathWorks, "Image Processing Toolbox 6 User's Guide for use
with MATLAB," The Mathworks, 2008, pp. 285 - 288.
- Weber, "The USC-SIPI Image Database," University of South
Carolina Signal and Image Processing Institute (USC-SIPI),
1981.
- E. Welch, R. Moorhead, and J. K. Owens, "Image Processing
using the HSI Colour space," in IEEE Proceedings of
Southeastcon '91, Williamsburg, VA, USA, 1991, pp. 722-725.
- T. Carron and P. Lambert, "Colour Edge Detector using jointly
Hue, Saturation and Intensity," in Proceedings of the IEEE
International Conference on Image Processing (ICIP-94),
Austin, TX, USA, 1994, pp. 977-981.
- Andreadis, "A real-time color space converter for the
measurement of appearance," Journal of Pattern Recognition
vol. 34 pp. 1181-1187, 2001.
- EETimes, "PLDs/FPGAs," 2009.
- Xilinx, "XST User Guide ": http://www.xilinx.com, 2008.



APPENDIX A
Circuit Schematics
Appendix A contains the schematic design files and the
device usage summary generated from the synthesized
VHDL code (relevant sample code sections are also
included) using the Xilinx Integrated Software Environment
(ISE) synthesis tools.










Figure A1 Demosaicking RTL schematic1




Figure A2 Demosaicking RTL schematic2




Figure A3 Demosaicking RTL schematic3




Figure A4 Demosaicking RTL schematic4









Figure A5 Demosaicking RTL schematic5





Figure A6 Demosaicking RTL schematic6





Figure A7 Demosaicking RTL schematic7




Figure A8 Colour Space Converter RTL schematic



APPENDIX B
Creating Projects/Files in VHDL Environment

Appendix B continues the guide to setting up a project in the ModelSim and Xilinx ISE environments.






Figure B1 Naming a new project













Figure B2 Adding a new or existing project file






Figure B3 Creating a new file



Figure B4 Loading existing files

















Figure B5 Addition and Selection of existing files







Figure B6 Loaded files



Figure B7 Inspection of newly created file





Figure B8 Inspection of existing file




Figure B9 Compilation of selected files





Figure B10 Compiling Loaded files


Figure B11 Successful compilation





Figure B12 Code Snippet of newly created VHDL file










Figure B13 Adding a new VHDL source in an open project













Figure B14 Adding an existing file to an open project






APPENDIX C
VHDL Code
Appendix C lists samples of relevant VHDL code sections.



example_file.vhd
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
---- TOP SYSTEM LEVEL DESCRIPTION ----
entity example_file is
  port ( -- the collection of all input and output ports in the top level
    Clk         : in  std_logic;  -- clock for synchronization
    rst         : in  std_logic;  -- reset signal for new data
    input_port  : in  bit;        -- input port
    output_port : out bit);       -- output port
end example_file;
-- architecture and behaviour of the TOP SYSTEM LEVEL DESCRIPTION in more detail
architecture behaviour of example_file is
  -- list signals which connect input to output ports here, for example:
  signal intermediate_port : bit := '0';  -- initialize to zero
begin
  process(clk, rst)  -- process triggered by the clock or reset pin
  begin
    if rst = '0' then  -- reset all output ports
      intermediate_port <= '0';  -- initialize
      output_port       <= '0';  -- initialize
    elsif clk'event and clk = '1' then  -- operate on the rising edge of the clock
      intermediate_port <= not(input_port);              -- logical inverter
      output_port <= intermediate_port or input_port;    -- logical OR operation
    end if;
  end process;
end behaviour;  -- end of architectural behaviour




colour_converter_pkg.vhd
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;

package colour_converter_pkg is
  -- Filter Coefficients ---------------------------------------------
  -- NTSC CONVERSION COEFFICIENTS USING Y, I, Q
  constant coeff0 : std_logic_vector(15 downto 0) := "0001001100100011"; --  0.299
  constant coeff1 : std_logic_vector(15 downto 0) := "0010010110010001"; --  0.587
  constant coeff2 : std_logic_vector(15 downto 0) := "0000011101001100"; --  0.114
  constant coeff3 : std_logic_vector(15 downto 0) := "0010011000100101"; --  0.596
  constant coeff4 : std_logic_vector(15 downto 0) := "1110111001110111"; -- -0.274
  constant coeff5 : std_logic_vector(15 downto 0) := "1110101101100100"; -- -0.322
  constant coeff6 : std_logic_vector(15 downto 0) := "0000110110000001"; --  0.211
  constant coeff7 : std_logic_vector(15 downto 0) := "1101111010000111"; -- -0.523
  constant coeff8 : std_logic_vector(15 downto 0) := "0001001111111000"; --  0.312
  -- End colour coefficients -----------------------------------------
  constant data_width : integer := 16;
end colour_converter_pkg;



colour_converter.vhd
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use work.colour_converter_pkg.all;

entity colour_converter is
  generic (data_width : integer := 16);
  port (
    Clk            : in  std_logic;
    rst            : in  std_logic;
    R, G, B        : in  integer range 0 to 255;
    X, Y, Z        : out integer range -511 to 511;
    Data_out_valid : out std_logic
  );
end colour_converter;

architecture struct of colour_converter is

  signal x11, x12, x13, x21, x22, x23, x31, x32, x33 :
    std_logic_vector(data_width-1 downto 0);

  signal m0, m1, m2, m3, m4, m5, m6, m7, m8 :
    signed((data_width*2) downto 0) := (others => '0');
  signal a10, a20, a30 :
    signed((data_width*2)+1 downto 0) := (others => '0');

begin
  Data_out_valid <= '1';
  x11 <= conv_std_logic_vector(R, 16);
  x21 <= x11; x31 <= x21;
  x12 <= conv_std_logic_vector(G, 16);
  x22 <= x12; x32 <= x22;
  x13 <= conv_std_logic_vector(B, 16);
  x23 <= x13; x33 <= x23;
  ---- multiplication -------------------------------------------------
  m0 <= signed('0' & x11) * signed(coeff0);
  m1 <= signed('0' & x12) * signed(coeff1);
  m2 <= signed('0' & x13) * signed(coeff2);
  m3 <= signed('0' & x21) * signed(coeff3);
  m4 <= signed('0' & x22) * signed(coeff4);
  m5 <= signed('0' & x23) * signed(coeff5);
  m6 <= signed('0' & x31) * signed(coeff6);
  m7 <= signed('0' & x32) * signed(coeff7);
  m8 <= signed('0' & x33) * signed(coeff8);
  ---- addition -------------------------------------------------------
  a10 <= (m0(32) & m0) + m1 + m2;
  a20 <= (m3(32) & m3) + m4 + m5;
  a30 <= (m6(32) & m6) + m7 + m8;
  ---- output ---------------------------------------------------------
  X <= conv_integer(a10(24 downto 14));
  Y <= conv_integer(a20(24 downto 14));
  Z <= conv_integer(a30(24 downto 14));

end struct;




















Index
adders ................ 27, 47, 89, 90
amplitude ............................. 71
anti-derivative ...................... 74
ASICs .............................. 6, 20
averaging filter ..................... 75
background .......................... 68
Bayer Colour Filter Array ...... 38
Bilinear ................................ 42
binary ....................... 18, 60, 61
black box ............ 12, 13, 45, 55
Calculus......................... 73, 74
Canny .................................. 71
CFA array ...................... 49, 50
CIELAB/ L*a*b* .................... 78
circularly symmetric ............. 75
CMY .............................. 79, 81
CMYK ................. 79, 80, 81, 82
colour .. 2, 8, 37, 38, 40, 42, 51,
53, 59, 62, 69, 78, 79, 80,
82, 85, 86, 87, 88, 91, 92,
93, 94, 95, 120, 121
colour filter array ............ 37, 40
colour image processing ...... 94
Colour Image Processing .. 1, 2,
23, 36, 68, 77, 95
colour space ....... 69, 79, 82, 91
Colour space conversions
additive, subtractive, linear,
non-linear .................... 78
contrast... 25, 59, 62, 65, 66, 75
convolution ........... 2, 30, 38, 40
CPLDs ............................. 6, 20
demosaicking ..... 25, 37, 38, 39,
40, 42, 44, 45, 46, 47, 50,
51, 53, 55, 56, 57, 58, 74
derivative filters ......... 70, 73, 74
display range ....................... 73
double-precision floating point
........................................ 85
dynamic range .......... 60, 62, 66
edge detection ... 25, 26, 70, 71,
77
Edge-sensing Bilinear .......... 42
Edge-sensing Bilinear 2 ........ 42
embossing............................ 73
FIFOs ................................... 47
filter kernel 4, 28, 30, 32, 34, 35,
42, 66
fixed-point ................ 21, 61, 86
flip flops.............. 26, 27, 29, 47
floating point . 21, 61, 85, 89, 92,
94
Floating point calculations .... 21
floating point cores ............... 21
foreground ........................... 68
Fourier transform .................... 3
Fourier Transform................... 3
FPGAs ................................... 6
frame rate ............................ 56
Frequency . 2, 23, 36, 56, 65, 86
Domain .............................. 2
Gamma Correction ... 37, 62, 63
Gaussian 30, 32, 43, 44, 75, 76,
77
global ............................... 4, 59
gradients ........................ 71, 74
hardware ................................ 5
hardware description language
..................................... vi, 6
hardware description
languages (HDLs) .............. 6
hardware IP cores ................ 66
hardware simulation . 52, 53, 89
HDL ................................. 6, 18
high boost filtering ................ 77
high frequency components . 64,
73, 75
Histogram Clipping 37, 62, 63
histogram equalization.......... 78
Homomorphic filter ... 66, 67, 68
HSI/LHS/IHS ........................ 91
HSV ......................... 91, 93, 94
ICC profiles .......................... 80
IEEE libraries ....................... 11
Illuminance/Reflectance ....... 66

image vi, 1, 2, 3, 4, 5, 6, 18, 19,
20, 23, 25, 27, 28, 30, 32,
34, 37, 38, 39, 40, 42, 43,
44, 49, 50, 51, 53, 54, 56,
57, 59, 60, 61, 62, 63, 64,
65, 66, 67, 68, 69, 70, 71,
72, 73, 74, 75, 76, 78, 79,
80, 81, 84, 85, 86, 87, 88,
89, 91, 94, 95
image contrast
enhancement/sharpening. 25
Image Enhancement ... 1, 59, 69
Image Reconstruction ............ 1
image scene .................. 66, 68
integration ............................ 74
interpolating filter ................. 38
Karhunen-Loeve .................. 91
kernel coefficients .......... 64, 74
Laplacian ............ 44, 64, 70, 77
line buffers ..................... 27, 47
linear 21, 22, 35, 39, 44, 45, 47,
53, 57, 59, 64, 65, 78, 91, 95
linear interpolation................ 57
logarithm transform ........ 60, 65
logarithmic .......... 62, 64, 65, 69
look-up-table (LUT) .............. 22
low frequency components... 74
LUT .......................... 22, 60, 61
mat2gray ............................. 73
MATLAB ....vi, 6, 19, 20, 23, 36,
44, 45, 46, 63, 69, 72, 73,
77, 82, 85, 95
maximum frequency ............. 56
mean filters .......................... 75
median .................................. 5
median and variance filters .... 5
ModelSim .... 6, 7, 8, 14, 15, 16,
18, 54, 106
morphological ...................... 78
multipliers ...................... 27, 48
multiply-accumulate . 28, 30, 32,
34
natural logarithm .................. 60
Nearest Neighbour ............... 42
neighbourhood ............ 1, 64, 75
kernel ................................. 4
Non-linear .............................. 4
open source ........................... 6
partial differential equations .. 70
passband ............................. 73
pattern and texture recognition
........................................ 78
Pattern Recognition
interpolation ..................... 42
Pixel Binning ........................ 42
pixel counter ......................... 47
Prewitt ..... 70, 71, 72, 73, 74, 77
raw format ............................ 43
Relative Edge-sensing Bilinear
........................................ 42
restoration/noise
removal/deblurring ........... 25
RGB . 39, 40, 44, 51, 67, 68, 70,
78, 79, 82, 84, 85, 86, 87,
88, 89, 93
RGB colour .. 39, 40, 44, 51, 70,
78, 79, 82
RGB2HSI ...... 83, 88, 91, 92, 93
Roberts .......................... 70, 71
ROM .............................. 22, 60
segmentation ............. 2, 25, 78
separable 28, 30, 47, 75, 76, 77
sharpness ...................... 59, 75
shift registers .................. 27, 47
signal . 2, 13, 14, 21, 22, 26, 46,
54, 55, 119, 121
simulation .... vi, 6, 9, 10, 50, 51,
52, 53, 54, 64, 68, 71, 72,
73, 74, 76, 83, 84, 85, 87,
88, 89, 90, 93, 94
SIMULINK ................ 19, 46, 47
Smooth Hue Transition ......... 42
smoothing ..... 44, 70, 74, 75, 77
Sobel ...... 44, 70, 71, 72, 74, 77
Spatial ................ 2, 4, 5, 25, 69
Domain .............................. 2
Spatial domain ................... 4, 5
Spatial domain filtering ........... 5
spatial filtering ...4, 6, 25, 26, 65
spatially varying .................... 65

sRGB ................................... 79
Symmetric filter .................... 32
textio.................................... 18
tone ..................................... 59
Unified Model Language (UML)
........................................ 19
un-sharp masking ................ 64
Unsharp masking ................. 64
unsigned ........................ 54, 73
Variable Number Gradients .. 42
Verilog ................................... 6
VHDL ..... vi, 2, 6, 12, 13, 14, 16,
18, 19, 20, 21, 23, 25, 37,
44, 45, 48, 49, 54, 55, 56,
57, 59, 64, 66, 67, 68, 69,
70, 71, 72, 73, 74, 76, 77,
78, 82, 83, 85, 86, 87, 88,
89, 90, 94, 95, 97, 106, 115,
116, 118
weak edges .......................... 74
window generator 26, 27, 29, 47
Xilinx .... 6, 7, 14, 15, 16, 17, 18,
23, 24, 36, 54, 56, 68, 69,
96, 97, 106
Xilinx Project Navigator . 14, 15,
17
YCbCr ............................ 82, 88
YIQ NTSC ............................ 82
YUV ..................................... 82
