What is a soft processor? What is the NIOS II? What is the NIOS II architecture, and what are its implications?
NIOS II IDE
TigerSHARC Architecture
NIOS II Architecture
- Thirty-two 32-bit general-purpose registers and six 32-bit control registers
- Configurable cache size, limited by how much FPGA space is available
- 32-bit ALU with two inputs and one output that performs shifts, logic, and arithmetic operations; unlike the TigerSHARC, the shifter is not a separate unit
Avalon Interface
- Separate address, data, and control lines
- Data width of up to 1024 bits; the width can be set to any value (it need not be a power of 2)
- One transfer per clock cycle
Pipeline stalls are caused by:
- Multi-cycle instructions
- Cache misses
- Data dependencies (2 cycles are required between calculating a result and using it)
Hardware multiply
The multiplier option is chosen when the processor is designed:
- No hardware multiply: saves FPGA logic, but multiplication speed depends on the software algorithm
- Embedded multipliers (if the FPGA has them): 1-5 cycles, depending on the FPGA
Compare to TigerSHARC
- No support for parallel instructions
- No support for SIMD operations
- Multi-cycle instructions stall the pipeline
All of these limitations can be overcome by using the FPGA space not occupied by the processor itself (for example, for custom instructions or hardware acceleration).
Speed analysis
Line  Instruction           Comment
0     movi r4,8
1     Loop: ldw r2,0(r6)
2     ldw r3,0(r7)
3     addi r4,r4,-1
4     addi r6,r6,4          coeffPt++
5     mul r2,r2,r3          data = data * coeff
6     addi r7,r7,-4         dataPt--
7     (stall)               waiting for multiplication result
8                           output += data
9                           loop branch: mispredicted 2 times at the beginning and 1 time at the end of the loop (3 cycles wasted each time)
Speed analysis
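- 9 cycles per iteration, except the first two iterations (branch predicted not taken) and the last one (branch predicted taken), which take 9 + 3 = 12 cycles
- The one data stall can be removed by moving the instruction on line 4 (addi r6,r6,4) into the stall slot on line 7, giving 8 cycles per iteration (8 + 3 = 11 cycles for the three mispredicted iterations)
- Speed: 8 cycles * (N-3) + 11 cycles * 3 = 8*(N-3) + 33 cycles
- For a 1024-tap FIR: 8*1021 + 33 = 8201 cycles
- The NIOS II clock period is also 3 times longer than the TigerSHARC's (200 MHz vs. 600 MHz)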
Speed comparison
Slower than unoptimized TigerSHARC assembly, but no hardware acceleration was used, so the result is not bad.
Hardware Acceleration
The profiling tool in the Eclipse-based NIOS II IDE shows how long each function takes. If a function takes too long, it can be sped up with:
- Custom instructions
Hardware Acceleration
Hardware acceleration takes a function and transforms it into dedicated FPGA circuitry.
Hardware Acceleration
This can be done with Altera's C2H compiler, which trades logic size for speedup.
Table 1. User Application Results
Example Algorithm               Speed Increase (vs. Nios II CPU)   System fMAX (MHz)   System Resource Increase (1)
Autocorrelation                 41.0x                              115                 124%
Bit Allocation                  42.3x                              110                 152%
Convolution Encoder             13.3x                              95                  133%
Fast Fourier Transform (FFT)    15.0x                              85                  208%
High Pass Filter                42.9x                              110                 181%
Matrix Rotate                   73.6x                              95                  106%
RGB to CMYK                     41.5x                              120                 84%
RGB to YIQ                      39.9x                              110                 158%
Conclusion
Soft processors such as the NIOS II offer another alternative in the embedded systems space. The NIOS II adds configurability and customization that blur the line between FPGAs and DSPs.