Jon Beecroft, David Hewson, Fred Homewood, Duncan Roweth and Ed Turner.
Abstract—The Elan5 is a single chip network processor which acts as a host adapter for high speed network protocols. It is capable of handling both 10Gb Ethernet and the proprietary Quadrics protocols developed for ultra low latency communication in High Performance Computing applications. In order to provide flexibility in the choice of protocols, the device is implemented as an array of identical RISC processors, which can be dedicated to tasks such as input packet handling and host memory DMA handling.

Index Terms—10GbE, HPC, Ethernet, QsNet

INTRODUCTION
… necessitates a much greater degree of programmability in …

Device Architecture
The Elan5 has seven Packet Processing Engines (PPEs), with one assigned to device management, two dedicated to link input and four available for output packet generation and processing of requests from remote Elans. The PPEs are identical with the exception of the two connected to the network links, which have additional input and output buffers. Although the input buffers are owned by their respective PPEs, each of the output buffers can be accessed by any PPE.

Each PPE consists of: a dual issue 500MHz RISC processor core optimized for data communication tasks, a …

[Figure: block diagram; recoverable labels: Host I/F, Local Memory, Local Functions, Bridge]
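The division of the seven PPEs into roles, described under Device Architecture, can be illustrated with a minimal sketch. The 1 + 2 + 4 split, the ownership of input buffers and the shared access to output buffers come from the text. The names `Role`, `assign_roles`, `has_link_buffers` and `may_access_output_buffer`, the PPE index order, and the reading that the two link-connected PPEs are the two link input PPEs are assumptions made for illustration only, not part of the Elan5 firmware.

```python
from collections import Counter
from enum import Enum


class Role(Enum):
    MANAGEMENT = "device management"
    LINK_INPUT = "link input"
    OUTPUT = "output packet generation and remote requests"


# Role split stated in the text: 1 + 2 + 4 = 7 PPEs.
ROLE_COUNTS = {Role.MANAGEMENT: 1, Role.LINK_INPUT: 2, Role.OUTPUT: 4}


def assign_roles(counts=ROLE_COUNTS):
    """Return a list mapping PPE index -> role (index order is hypothetical)."""
    roles = []
    for role, n in counts.items():
        roles.extend([role] * n)
    return roles


def has_link_buffers(role):
    # Assumption: the two PPEs connected to the network links are the
    # two link input PPEs; only they carry the additional buffers.
    return role is Role.LINK_INPUT


def may_access_input_buffer(ppe_index, owner_index):
    # Input buffers are owned by their respective PPE.
    return ppe_index == owner_index


def may_access_output_buffer(ppe_index, roles):
    # Output buffers can be accessed by any PPE on the device.
    return 0 <= ppe_index < len(roles)


roles = assign_roles()
summary = Counter(roles)
```

Because the PPEs are otherwise identical, a table like `ROLE_COUNTS` is the kind of thing firmware could change to rebalance work between protocols, which is the flexibility the abstract argues for.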
The interface is designed to support many outstanding reads to service the requirements of the multiple PPEs. Up to 32 concurrent register load operations can be supported. For register loads, the data is not transferred to the processor until the PCI Express bus CRC has been checked. This requires local buffering in the PCI Express interface. To avoid having to buffer the larger DMA loads in the PCI Express interface, they are allowed to complete directly to the DMA buffers, but are tagged as unchecked until the final CRC is validated. The DMA buffer score-boarding uses this tag to block store operations using that data until it has been both loaded and checked.

Libelan library. Device independent binary interface providing optimized communications primitives, e.g. message passing, put/get, collectives.
libelan3 | libelan4 | libelan5
Device specific firmware and thread code
Figure 2: Elan5 software stack

Elan5 supports secure multi-user access to QsNet through use of job specific capabilities that describe the rights of each user, and network context numbers assigned to each packet by the outputting PPE.
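The "unchecked" tag used by the DMA buffer score-boarding can be illustrated with a small model. This is a sketch under stated assumptions, not the hardware design: the class `DmaBuffer`, its method names and the helper `try_store` are hypothetical, and the behaviour on a failed CRC (discarding the data) is an assumption, since the text does not describe that case.

```python
from dataclasses import dataclass


@dataclass
class DmaBuffer:
    data: bytes = b""
    loaded: bool = False
    checked: bool = False  # False models the "unchecked" tag

    def complete_load(self, payload: bytes) -> None:
        # DMA loads complete directly into the buffer before the CRC is known.
        self.data = payload
        self.loaded = True
        self.checked = False

    def crc_result(self, ok: bool) -> None:
        if ok:
            self.checked = True
        else:
            # Assumption: a failed CRC discards the data; not stated in the text.
            self.data = b""
            self.loaded = False
            self.checked = False

    def store_allowed(self) -> bool:
        # Stores are blocked until the data is both loaded and checked.
        return self.loaded and self.checked


def try_store(buf: DmaBuffer, sink: list) -> bool:
    """Append the buffer contents to sink only if the scoreboard allows it."""
    if not buf.store_allowed():
        return False
    sink.append(buf.data)
    return True


sink = []
buf = DmaBuffer()
blocked_before_load = not try_store(buf, sink)
buf.complete_load(b"payload")
blocked_before_crc = not try_store(buf, sink)
buf.crc_result(True)
stored_after_crc = try_store(buf, sink)
```

The point of the tag is that the large DMA payload never needs a second copy in the PCI Express interface: the data sits in its final buffer while the check is pending, and only the permission to use it is delayed.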
… we write a DMA descriptor to the source Elan instead of the data.

[Figure: diagram with numbered steps 1 to 4 linking host, Elan5 and QsNetIII switch on each side; recoverable labels: MPI Recv, Tag Match, Get request, DMA read, DMA Queue, Buffers, Input PPE, Link, Acks, Host, QsNetIII Switch]
Conclusion
The architecture of the Elan5 enables a single device to
support a range of different communication protocols.
Performing the protocol handling in firmware allows the
device to provide a high level of communications
processing off-load, without the complexity and
verification challenges of a custom hardware
implementation. The architecture has been designed so
that future variants can scale the number of links and the
number of packet processing engines. They will also
utilize higher bandwidth host interfaces.