eBACS
?
D. Bernstein, T. Lange
Set of scripts written in Perl aimed at AUTOMATED generation of OPTIMIZED results for MULTIPLE hardware platforms
Why Athena?
"The Greek goddess Athena was frequently called upon to settle disputes between the gods or various mortals. Athena, Goddess of Wisdom, was known for her superb logic and intellect. Her decisions were usually well-considered, highly ethical, and seldom motivated by self-interest."
from "Athena, Greek Goddess of Wisdom and Craftsmanship"
Designers of ATHENa
Venkata "Vinny": MS CpE student
Ekawat "Ice": MS CpE student
Marcin: PhD ECE student
Xin: PhD ECE student
Michal: PhD exchange student from Slovakia
Rajesh: PhD ECE student
Database query
Ranking of designs
ATHENa Server
Database Entries
configuration files
testbench
constraint files
configuration files
heuristic adaptive optimization strategies aimed at maximizing selected performance measures (e.g., speed, area, speed/area ratio, power, cost, etc.)
Minimum clock
ATHENa Applications
single_run:
- one set of options
placement_search:
- one set of options
- multiple starting points for placement
exhaustive_search:
- multiple sets of options
- multiple starting points for placement
- multiple requested clock frequencies
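The three application modes differ only in how many option sets, placement starting points, and requested clock frequencies they combine. A minimal Python sketch of that enumeration follows. The function name `enumerate_runs`, the option-set labels, and all numeric values are hypothetical, chosen for illustration; this is not ATHENa's actual configuration syntax or its Perl implementation.

```python
from itertools import product

def enumerate_runs(option_sets, placement_starts=(None,), requested_freqs=(None,)):
    """Return every (options, placement start, requested frequency) combination."""
    return list(product(option_sets, placement_starts, requested_freqs))

# Hypothetical values, for illustration only.
single_run = enumerate_runs(["opt_speed"])
placement_search = enumerate_runs(["opt_speed"], placement_starts=[1, 11, 21])
exhaustive_search = enumerate_runs(
    ["opt_speed", "opt_area"],
    placement_starts=[31, 41, 51],
    requested_freqs=[100, 200],  # MHz, hypothetical
)

# Number of tool runs each mode would launch in this toy setup.
print(len(single_run), len(placement_search), len(exhaustive_search))  # prints: 1 3 12
```

The sketch only counts combinations; which combination wins would depend on the selected performance measure reported by the tools.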
SHA-1 Results
[Figures: SHA-1 throughput in Mbit/s for different architectures and FPGA families; vertical axis from 200 to 1800 Mb/s]
Ideas (1)
Select several representative FPGA platforms with significantly different properties, e.g., different
- vendor: Xilinx vs. Altera
- process: 90 nm vs. 65 nm
- LUT size: 4-input vs. 6-input
- optimization: low-cost vs. high-performance
Use ATHENa to characterize all SHA-3 candidates and SHA-2 using these platforms in terms of the target performance metrics (e.g. throughput/area ratio)
Ideas (2)
Calculate the ratio of SHA-3 candidate performance vs. SHA-2 performance (for the same security level)
Calculate the geometric average over multiple platforms
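The two steps above can be sketched in a few lines of Python. The platform names and performance figures below are made up purely to show the arithmetic; they are not measured results, and the helper names are hypothetical.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def normalized_score(candidate, reference):
    """Per-platform ratio candidate/reference, then geometric mean over platforms."""
    ratios = [candidate[p] / reference[p] for p in reference]
    return geometric_mean(ratios)

# Made-up throughput/area figures for three platforms; not measured results.
sha2 = {"platform_A": 2.0, "platform_B": 4.0, "platform_C": 1.0}
cand = {"platform_A": 4.0, "platform_B": 2.0, "platform_C": 1.0}

score = normalized_score(cand, sha2)
print(round(score, 6))
```

A geometric average is a natural fit for ratios: a candidate that is twice as good on one platform and half as good on another ends up with a score of 1, i.e. on par with the reference.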
65 nm: Virtex 5
45 nm: Spartan 6
40 nm: Virtex 6
Version: Xilinx ISE 10.1; Xilinx WebPACK 11.1; Xilinx WebPACK 11.3
Low-cost: All up to Virtex 5; Smallest up to Virtex 5; Smallest up to Virtex 5; Smallest Spartan 6, Virtex 6
High-performance: All up to Virtex 5; Smallest up to Virtex 5; Smallest up to Virtex 5; Smallest Spartan 6, Virtex 6
Low-cost: Cyclone, Cyclone II, Cyclone III, Cyclone IV
Mid-range: Arria I, Arria II
High-performance: Stratix, Stratix II
Low-cost: Cyclone IV none, Cyclone III all; Cyclone IV none, Cyclone III all; Cyclone IV none, Cyclone III all; Cyclone IV smallest, Cyclone III all
Mid-range: Arria GX all, Arria II GX none; Arria GX all, Arria II GX none; Arria GX all, Arria II GX none; Arria GX all, Arria II GX smallest
High-performance: Stratix II smallest, Stratix III none; Stratix I, II, III smallest; Stratix I, II, III smallest; Stratix I, II, III all, Stratix IV none
Quartus 8.1
Mbit/s for Throughput
ns for Latency
Allows for easy cross-comparison among implementations in software (microprocessors), FPGAs (various vendors), ASICs (various libraries)
But how to define and measure throughput and latency for hash functions?
Time to hash N blocks of message:
$$\mathrm{Htime}(N, T_{CLK}) = \text{Initialization Time}(T_{CLK}) + N \cdot \text{Block Processing Time}(T_{CLK}) + \text{Finalization Time}(T_{CLK})$$
Latency = time to hash ONE block of message:
$$\mathrm{Latency} = \mathrm{Htime}(1, T_{CLK}) = \text{Initialization Time}(T_{CLK}) + \text{Block Processing Time}(T_{CLK}) + \text{Finalization Time}(T_{CLK})$$
Throughput:
$$\mathrm{Throughput} = \frac{\text{Block size}}{\text{Block Processing Time}(T_{CLK})}$$
$$\text{Initialization Time}(T_{CLK}) = cycles_I \cdot T_{CLK}$$
$$\text{Block Processing Time}(T_{CLK}) = cycles_P \cdot T_{CLK}$$
$$\text{Finalization Time}(T_{CLK}) = cycles_F \cdot T_{CLK}$$
T_CLK: from place & route report (or experiment)
Block size: from specification
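As a worked illustration of these definitions, the sketch below computes the hashing time and latency from cycle counts and the clock period. It also computes throughput under the assumption that throughput is taken as block size divided by block processing time. All function names and numeric values (block size, cycle counts, clock period) are hypothetical and do not describe any particular hash function or device.

```python
def hash_time_ns(n_blocks, t_clk_ns, cycles_i, cycles_p, cycles_f):
    """Htime(N, TCLK): initialization + N * block processing + finalization, in ns."""
    return (cycles_i + n_blocks * cycles_p + cycles_f) * t_clk_ns

def latency_ns(t_clk_ns, cycles_i, cycles_p, cycles_f):
    """Latency is the time to hash ONE block: Htime(1, TCLK)."""
    return hash_time_ns(1, t_clk_ns, cycles_i, cycles_p, cycles_f)

def throughput_mbit_s(block_size_bits, t_clk_ns, cycles_p):
    """Assumed definition: block size / block processing time.

    bits per ns equals Gbit/s, so multiply by 1000 to get Mbit/s.
    """
    return block_size_bits / (cycles_p * t_clk_ns) * 1000.0

# Hypothetical design: 512-bit block, 65 cycles per block, 10 ns clock period.
T_CLK_NS = 10.0
lat = latency_ns(T_CLK_NS, cycles_i=1, cycles_p=65, cycles_f=1)
thr = throughput_mbit_s(512, T_CLK_NS, cycles_p=65)

print(f"latency = {lat:.1f} ns, throughput = {thr:.2f} Mbit/s")
```

With these made-up numbers the latency is 67 cycles times 10 ns, and the throughput depends only on the block size, the per-block cycle count, and the clock period.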
In graphs
Time(n) = Time in clock cycles vs. message size in bytes for n-byte messages, with n = 0, 1, 2, 3, ..., 2048, 4096
In tables
Performance in cycles/byte for n = 8, 64, 576, 1536, 4096, long msg
$$\text{Performance for long message} = \frac{\mathrm{Time}(4096) - \mathrm{Time}(2048)}{2048}$$
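The table metrics can be derived from any measured Time(n) function. The Python sketch below uses a toy cost model in place of real measurements; `toy_time`, its block size, overhead, and per-block cycle count are assumptions made up for illustration.

```python
def cycles_per_byte(time_fn, n):
    """Performance in cycles/byte for an n-byte message."""
    return time_fn(n) / n

def long_message_cycles_per_byte(time_fn):
    """Long-message performance: (Time(4096) - Time(2048)) / 2048."""
    return (time_fn(4096) - time_fn(2048)) / 2048

def toy_time(n, block_bytes=64, overhead=100, cycles_per_block=650):
    """Hypothetical Time(n) in clock cycles: fixed overhead plus per-block cost.

    Assumes padding always adds at least one byte, so an n-byte message
    occupies n // block_bytes + 1 blocks.
    """
    blocks = n // block_bytes + 1
    return overhead + blocks * cycles_per_block

for n in (8, 64, 576, 1536, 4096):
    print(n, round(cycles_per_byte(toy_time, n), 3))
print("long msg", long_message_cycles_per_byte(toy_time))
```

Taking the difference between two long messages cancels the fixed overhead, so in this toy model the long-message figure reduces to the per-block cost divided by the block size in bytes.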
Deliverables (1)
1. Detailed block diagram of the Datapath with names of all signals matching VHDL code
[electronic version a bonus]
2. Interface with the division into the Datapath and the Controller [electronic version]
3. ASM charts of the Controller, and a block diagram of connections among FSMs (if more than one used)
[electronic version a bonus]
4. RTL VHDL code of the Datapath, the Controller, and the Top-Level Circuit
5. Updated timing and area analysis; formulas for timing confirmed through simulation
Deliverables (2)
6. Report on verification
Highest level entity verified for functional correctness through
- Functional simulation
- Post-synthesis simulation
- Timing simulation [bonus]
For each, report
- Name of entity
- Testbench used for verification
- Result of verification, incorrect behavior, possible source of error
Deliverables (3)
7. Results of benchmarking using ATHENa
Entire core or the highest level entity verified for correct functionality
Xilinx Spartan 3, Virtex 4, Virtex 5
Three methods of testing
- Single_run
- Placement_search [cost_table = 1, 11, 21]
- Exhaustive_search [cost_table = 31, 41, 51; speed or area; two sets of requested frequencies]
Results generated by ATHENa
Your own graphs and charts
Observations and conclusions
Altera Cyclone II, Stratix II, Cyclone III, Arria I, Stratix III
Three methods of testing
- Single_run
- Placement_search [seed = 1, 1000, 2000]
- Exhaustive_search [seed = 3000, 4000, 5000; speed or area; two sets of requested frequencies]
Results generated by ATHENa
Your own graphs and charts
Observations and conclusions
Composition of Students
4 GWU PhD candidates
14 international students
After Grading
1. Summary of results published on the course web page
2. Selected students invited to develop articles/reports to be posted on the
- ATHENa web page
- SHA-3 Zoo web page
3. Unification, generalization and optimization of codes by Ice, myself, and other students
4. Presentation to NIST, conference submissions, presentation at the Second SHA-3 Conference in Santa Barbara in August 2010.