BASICS of FPGAs Design
David Maliniak, Electronic Design Automation Editor
Tradeoffs Abound in FPGA Design
Understanding device types and design flows is key to getting the most out of FPGAs

Field-programmable gate arrays (FPGAs) arrived in 1984 as an alternative to programmable logic devices (PLDs) and ASICs. As their name implies, FPGAs offer the significant benefit of being readily programmable. Unlike their forebearers in the PLD category, FPGAs can (in most cases) be programmed again and again, giving designers multiple opportunities to tweak their circuits.

There’s no large non-recurring engineering (NRE) cost associated with FPGAs. In addition, lengthy, nerve-wracking waits for mask-making operations are squashed. Often, with FPGA development, logic design begins to resemble software design due to the many iterations of a given design. Innovative design often happens with FPGAs as an implementation platform.

But there are some downsides to FPGAs as well. The economics of FPGAs force designers to balance their relatively high piece-part pricing compared to ASICs with the absence of high NREs and long development cycles. They’re also available only in fixed sizes, which matters when you’re determined to avoid unused silicon area.

What are FPGAs?
FPGAs fill a gap between discrete logic and the smaller PLDs on the low end of the complexity scale and costly custom ASICs on the high end. They consist of an array of logic blocks that are configured using software. Programmable I/O blocks surround these logic blocks. Both are connected by programmable interconnects (Fig. 1). The programming technology in an FPGA determines the type of basic logic cell and the interconnect scheme. In turn, the logic cells and interconnection scheme determine the design of the input and output circuits as well as the programming scheme.

Just a few years ago, the largest FPGA was measured in tens of thousands of system gates and operated at 40 MHz. Older FPGAs often cost more than $150 for the most advanced parts at the time. Today, however, FPGAs offer millions of gates of logic capacity, operate at 300 MHz, can cost less than $10, and offer integrated functions like processors and memory (Table 1).

FPGAs offer all of the features needed to implement most complex designs. Clock management is facilitated by on-chip PLL (phase-locked loop) or DLL (delay-locked loop) circuitry. Dedicated memory blocks can be configured as basic single-port RAMs, ROMs, FIFOs, or CAMs. Data processing, as embodied in the devices’ logic fabric, varies widely. The ability to link the FPGA with backplanes, high-speed buses, and memories is afforded by support for various single-ended and differential I/O standards. Also found on today’s FPGAs are system-building resources such as high-speed serial I/Os, arithmetic modules, embedded processors, and large amounts of memory.

Initially seen as a vehicle for rapid prototyping and emulation systems, FPGAs have spread into a host of applications. They were once too simple, and too costly, for anything but small-volume production. Now, with the advent of much larger devices and declining per-part costs,
FPGAs are finding their way off the prototyping bench and into production (Table 2).

Table 1: KEY RESOURCES AVAILABLE IN THE LARGEST DEVICES FROM MAJOR FPGA VENDORS
Features | Xilinx Virtex II Pro | Altera Stratix | Actel Axcelerator | Lattice ispXPGA
Clock management | DCM, up to 12 | PLL, up to 12 | PLL, up to 8 | SysCLOCK PLL, up to 8
Embedded memory blocks | BlockRAM, up to 10 Mbits | TriMatrix memory, up to 10 Mbits | Embedded RAM, up to 338 kbits | SysMEM blocks, up to 414 kbits
Data processing | Configurable logic blocks and 18-bit by 18-bit multipliers; up to 125,000 logic cells and 556 multiplier blocks | Logic elements and embedded multipliers; up to 79,000 LEs and 176 embedded multipliers | Logic modules (C-Cell and R-Cell); up to 10,000 R-Cells and 21,000 C-Cells | Based on programmable functional unit; up to 3844 PFUs
Programmable I/Os | SelectI/O | Advanced I/O support | Advanced I/O support | SysI/O
Special features | Embedded PowerPC 405 cores; RocketI/O multi-gigabit transceiver | DSP blocks; high-speed differential I/O and interface standards support | PerPin FIFOs for bus applications | SysHSI for high-speed serial interface

Comparing FPGA Architectures
FPGAs must be programmed by users to connect the chip’s resources in the appropriate manner to implement the desired functionality. Over the years, various technologies have emerged to suit different requirements. Some FPGAs can only be programmed once. These devices employ antifuse technology. Flash-based devices can be programmed and reprogrammed again after debugging. Still others can be dynamically programmed thanks to SRAM-based technology. Each has its advantages and disadvantages (Table 3).

Most modern FPGAs are based on SRAM configuration cells, which offer the benefit of unlimited reprogrammability. When powered up, they can be configured to perform a given task, such as a board or system test, and then reprogrammed to perform their main task. On the flip side, though, SRAM-based FPGAs must be reconfigured each time their host system is powered up, and additional external circuitry is required to do so. Further, because the configuration file used to program the FPGA is stored in external memory, security issues concerning intellectual property emerge.

Antifuse-based FPGAs aren’t in-system programmable,
In a sense, flash-based FPGAs fulfill the promise of FPGAs in that they can be reprogrammed many times. They’re nonvolatile, retaining their configuration even when powered down. Programming is done either in-system or with a programmer. In some cases, IP security can be achieved using a multibit key that locks the configuration data after programming.

But flash-based FPGAs require extra process steps above and beyond standard CMOS technology, leaving them at least a generation behind. Moreover, the many pull-up resistors result in high static power consumption.

FPGAs can also be characterized as having either fine-, medium-, or coarse-grained architectures. Fine-grained architectures boast a large number of relatively simple logic blocks. Each logic block usually contains either a two-input logic function or a 4-to-1 multiplexer and a flip-flop. Blocks can only be used to implement simple functions. But fine-grained architectures lend them-

Coarse-grained architectures consist of relatively large logic blocks often containing two or more lookup tables and two or more flip-flops. In most of these architectures, a four-input lookup table (think of it as a 16 x 1 ROM) implements the actual logic.

1. Functional Blocks
Just about all FPGAs include a regular, programmable, and flexible architecture of logic blocks surrounded by input/output blocks on the perimeter. These functional blocks are linked together by a hierarchy of highly versatile programmable interconnects.
(Figure labels: Input/output blocks; Logic blocks; Programmable interconnects.)

At one time, design entry was performed in the form of schematic capture. Most designers have moved over to hardware description languages (HDLs) for design entry. Some will prefer a mixture of the two techniques. Schematic-based design-capture tools gave designers a great deal of control over the physical placement and partitioning of logic on the device. But it’s becoming less likely that designers will take that route. Meanwhile, language-based design entry is faster, but often at the expense of performance or density.

For many designers, the choice of whether to use schematic- or HDL-based design entry comes down to their conception of their design. For those who think in software or algorithmic-like terms, HDLs are the better choice. HDLs are well suited for highly complex designs, especially when the designer has a good handle on how the logic must be structured. They can also be very useful for designing smaller functions when you haven’t the time or inclination to work through the actual hardware implementation.

2. The Big Picture
A “big picture” look at an FPGA design flow shows the major steps in the process: design entry, synthesis from RTL to gate level, and physical design. Place and route is done using the FPGA vendors’ proprietary tools that account for the devices’ architectures and logic-block structures.
(Figure labels: Modify design; Achieved timing? No / Yes; Done!)
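The description of a four-input lookup table as a 16 x 1 ROM can be made concrete with a short Python sketch. This is only a behavioral model under stated assumptions, not any vendor's logic cell: the names make_lut4, lut4_read, and LogicCell are hypothetical, as is the example function (a majority vote of three inputs gated by the fourth). The four inputs simply form the address into a 16-entry table of single bits, and the optional flip-flop registers the result.

```python
def make_lut4(func):
    """Precompute the 16-entry truth table (the "16 x 1 ROM") for a 4-input function."""
    table = []
    for address in range(16):
        # Unpack the address into four input bits (a is the least-significant bit).
        a, b, c, d = ((address >> i) & 1 for i in range(4))
        table.append(func(a, b, c, d) & 1)
    return table


def lut4_read(table, a, b, c, d):
    """Evaluate the LUT: the four inputs form the ROM address."""
    return table[a | (b << 1) | (c << 2) | (d << 3)]


class LogicCell:
    """Hypothetical logic cell: one 4-input LUT feeding one flip-flop."""

    def __init__(self, table):
        self.table = table
        self.q = 0  # flip-flop state, cleared at start

    def clock(self, a, b, c, d):
        # On each clock edge, the flip-flop captures the LUT output.
        self.q = lut4_read(self.table, a, b, c, d)
        return self.q


# Illustrative function only: majority vote of a, b, c, enabled by d.
majority_gated = make_lut4(lambda a, b, c, d: ((a & b) | (a & c) | (b & c)) & d)
cell = LogicCell(majority_gated)
```

Changing the function passed to make_lut4 changes only the table contents, which mirrors how reprogramming the configuration cells changes the logic without changing the hardware structure.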
A third option for design entry, state-machine entry, works well for designers who can see their logic design as a series of states that the system steps through. It shines when designing somewhat simple functions, often in the area of system control, that can be clearly represented in visual formats. Tool support for finite state-machine entry is limited, though.

Some designers approach the start of their design from a level of abstraction higher than HDLs, which is algorithmic design using the C/C++ programming languages. A number of EDA vendors have tool flows supporting this design style. Generally, algorithmic design has been thought of as a tool for architectural exploration. But increasingly, as tool flows emerge for C-level synthesis, it’s being accepted as a first step on the road to hardware implementation.

After design entry, the design is simulated at the register-transfer level (RTL). This is the first of several simulation stages, because the design must be simulated at successive levels of abstraction as it

timing and resource usage are still unknowns.

The next step following RTL simulation is to convert the RTL representation of the design into a bit-stream file that can be loaded onto the FPGA. The interim step is FPGA synthesis, which translates the VHDL or Verilog code into a device netlist format that can be understood by a bit-stream converter.

The synthesis process can be broken down into three steps. First, the HDL code is converted into device netlist format. Then the resulting file is converted into a hexadecimal bit-stream file, or .bit file. This step is necessary to change the list of required devices and interconnects into hexadecimal bits to download to the FPGA. Lastly, the .bit file is downloaded to the physical FPGA. This final step completes the FPGA synthesis procedure by programming the design onto the physical FPGA.

It’s important to fully constrain designs before synthesis (Fig. 3). A constraint file is an input to the synthesis process just as the RTL code itself. Constraints can be applied

Table 3: ADVANTAGES/DISADVANTAGES OF VARIOUS FPGA TECHNOLOGIES
Feature | SRAM | Antifuse | Flash
Reprogrammable? | Yes (in-system) | No | Yes (in-system or offline)
Reprogramming speed (including erasure) | Fast | Not applicable | 3X SRAM
Volatile? | Yes | No | No (but can be if required)
External configuration file? | Yes | No | No
Good for prototyping? | Yes | No | Yes
Instant-on? | No | Yes | Yes
IP security | Poor | Very good | Very good
Size of configuration cell | Large (six transistors) | Very small | Small (two transistors)
Power consumption | High | Low | Medium
Radiation hardness? | No | Yes | No

3. Go With The Flow
The implementation flow for FPGAs begins with synthesis of the HDL design description into a gate-level netlist. Accounting for user-defined design constraints on area, power, and speed, the tool performs various optimizations before creating the netlist that’s passed on to place-and-route tools.
(Figure labels: Language input (VHDL/Verilog); HDL files; Initial optimization; Timing analysis; Timing optimization; Design constraints; Placement; Implement.)
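The idea behind state-machine entry, a design seen as a series of states that the system steps through, can be sketched in a few lines of Python. This is a minimal illustration, not the input format of any state-machine entry tool: the controller, its state names, and its event names are all hypothetical, chosen only to resemble a small system-control function.

```python
# Hypothetical system-control state machine; states and events are illustrative only.
TRANSITIONS = {
    ("IDLE", "start"): "CONFIGURE",
    ("CONFIGURE", "config_done"): "RUN",
    ("RUN", "stop"): "IDLE",
    ("RUN", "fault"): "ERROR",
    ("ERROR", "reset"): "IDLE",
}


def step(state, event):
    """Return the next state; hold the current state if the event is not handled."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="IDLE"):
    """Apply a sequence of events and return every state visited, in order."""
    trace = [state]
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace


trace = run(["start", "config_done", "fault", "reset"])
```

Keeping the transitions in a single table is the textual counterpart of a state diagram: each entry corresponds to one arrow, which is why this style suits functions that are easy to draw.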
Following synthesis, device implementation begins. After netlist synthesis, the design is automatically converted into the format supported internally by the FPGA vendor’s place-and-route tools. Design-rule checking and optimization is performed on the incoming netlist and the software partitions the design onto the available logic resources. Good partitioning is required to achieve high routing completion and high performance.

Increasingly, FPGA designers are turning to floorplanning after synthesis and design partitioning. FPGA floorplanners work from the netlist hierarchy as defined by the RTL coding. Floorplanning can help if area is tight. When possible, it’s a good idea to place critical logic in separate blocks.

After partitioning and floorplanning, the placement tool tries to place the logic blocks to achieve efficient routing. The tool monitors routing length and track congestion while placing the blocks. It may also track the absolute path delays to meet the user’s timing constraints. Overall, the process mimics PCB place and route.

Functional simulation is performed after synthesis and before physical implementation. This step ensures correct logic functionality. After implementation, there’s a final verification step with full timing information. After placement and routing, the logic and routing delays are back-annotated to the gate-level netlist for this final simulation. At this point, simulation is a much longer process, because timing is also a factor (Fig. 4). Often, designers substitute static timing analysis for timing simulation. Static timing analysis calculates the timing of combinational paths between registers and compares it against the designer’s timing constraints.

Once the design is successfully verified and found to meet timing, the final step is to actually program the FPGA itself. At the completion of placement and routing, a binary programming file is created. It’s used to configure the device. No matter what the device’s underlying technology, the FPGA interconnect fabric has cells that configure it to connect to the inputs and outputs of the logic blocks. In turn, the cells configure those logic blocks to each other. Most programmable-logic technologies, including the PROMs for SRAM-based FPGAs, require some sort of a device programmer. Devices can also be programmed through their configuration ports using a set of dedicated pins.

4. Simulation Stages
FPGA simulation occurs at various stages of the design process: after RTL design, after synthesis, and once again after implementation. The latter is a final gate-level check, accounting for actual logic and interconnect delays, of logic functionality.
(Figure labels: FPGA gate library; Testbench; RTL design; HDL simulator; Synthesis; Place and route.)

ed system board a very sticky issue. All too often, an FPGA is soldered to a pc board and it doesn’t function as expected or, worse, it doesn’t function at all. That can be the result of errors caused by manual placement of all those pins, not to mention the board-level timing issues created by a complex FPGA.

More than ever, designers must strongly consider an integrated flow that takes them from conception of the FPGA through board design. Such flows maintain complete connectivity between the system-level design and the FPGA; they also do so between design iterations. Not only do today’s integrated FPGA-to-board flows create the schematic connectivity needed for verification and layout of the board, but they also document which signal connections are made to which device pins and how these map to the original board-level bus structures.

Integrated flows for FPGAs make sense in general, considering that FPGA vendors will continue to introduce more complex, powerful, and economical devices over time. An integrated third-party flow makes it easier to re-target a design to different technologies from different vendors as conditions warrant.
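The static timing analysis mentioned in the verification discussion, which calculates the timing of combinational paths between registers and compares it against the designer's constraints, can be sketched as a longest-path calculation over the netlist. The Python below is a simplified illustration under stated assumptions, not how any vendor's timing analyzer works: it ignores setup and hold times, clock skew, and clock-to-output delay, and the function names, node names, and delay values (in picoseconds) are all hypothetical.

```python
from collections import defaultdict


def worst_arrival_times(edges, launch_registers):
    """Longest combinational delay from any launching register to each reachable node.

    edges: list of (source, destination, delay_ps); the graph must be acyclic.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set(launch_registers)
    for src, dst, delay in edges:
        successors[src].append((dst, delay))
        indegree[dst] += 1
        nodes.update((src, dst))

    arrival = {reg: 0 for reg in launch_registers}
    ready = [node for node in nodes if indegree[node] == 0]
    # Visit nodes in topological order so every input is final before a node is used.
    while ready:
        node = ready.pop()
        for dst, delay in successors[node]:
            if node in arrival:
                arrival[dst] = max(arrival.get(dst, 0), arrival[node] + delay)
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return arrival


def slack_ps(edges, launch_registers, capture_registers, clock_period_ps):
    """Slack per capturing register: clock period minus worst path delay."""
    arrival = worst_arrival_times(edges, launch_registers)
    return {reg: clock_period_ps - arrival[reg]
            for reg in capture_registers if reg in arrival}


# Illustrative netlist: two registers drive an AND and an OR gate into a third register.
EXAMPLE_EDGES = [
    ("regA", "and1", 1200),
    ("regB", "and1", 800),
    ("and1", "or1", 1500),
    ("regB", "or1", 500),
    ("or1", "regC", 700),
]
example_slack = slack_ps(EXAMPLE_EDGES, ["regA", "regB"], ["regC"], 4000)
```

In this made-up example the worst path into regC runs through regA, and1, and or1 for a total of 3400 ps, leaving 600 ps of positive slack against a 4000-ps clock. A negative slack would indicate a path that misses its constraint. Because the calculation needs no input vectors, it finishes far faster than timing simulation, which is the reason for the substitution noted in the article.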