Low-Cost Binary128 Floating-Point FMA Unit Design with SIMD Support

AIM

The main aim of the project is to design a low-cost binary128 floating-point FMA unit with SIMD support.

ABSTRACT:

Binary64 arithmetic is rapidly becoming inadequate to cope with today's large-scale computations due to an accumulation of errors. Therefore, binary128 arithmetic is now required to increase the accuracy and reliability of these computations. At the same time, an obvious trend emerging in modern processors is to extend their instruction sets by allowing single instruction multiple data (SIMD) execution, which can significantly accelerate data-parallel applications. To address the combined demands mentioned above, this paper presents the architecture of a low-cost binary128 floating-point fused multiply-add (FMA) unit with SIMD support. The proposed FMA design can execute a binary128 FMA every other cycle with a latency of four cycles, or two binary64 FMAs fully pipelined with a latency of three cycles, or four binary32 FMAs fully pipelined with a latency of three cycles. We use two binary64 FMA units to support binary128 FMA, which requires much less hardware than a fully pipelined binary128 FMA. The presented binary128 FMA design uses both segmentation and iteration hardware vectorization methods to trade off performance, such as throughput and latency, against area and power. Compared with a standard binary128 FMA implementation, the proposed FMA design has 30 percent less area and 29 percent less dynamic power dissipation.
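The accuracy argument for a fused multiply-add can be illustrated with a small software sketch. It is not part of the proposed hardware: the helper names `unfused_multiply_add` and `fused_multiply_add` are hypothetical, and the fused result is emulated in binary64 with exact rational arithmetic followed by a single rounding, which is the defining behaviour of an FMA.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Multiply, round to binary64, then add and round again (two roundings)."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate an FMA: evaluate a*b + c exactly, then round once to binary64.

    Fraction(x) converts a float exactly, and float(Fraction) rounds the
    exact rational to the nearest binary64 value.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Operands chosen (as an illustration) so that the exact product
# 1 - 2**-60 is not representable in binary64 and rounds to 1.0.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(unfused_multiply_add(a, b, c))  # 0.0: the small term is lost in the first rounding
print(fused_multiply_add(a, b, c) == -(2.0**-60))  # True: one rounding keeps it
```

With these operands the unfused version loses the result entirely, while the single rounding of the fused version preserves it. A wider format such as binary128 reduces this kind of loss further because its significand holds more bits.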
V.Mallikarjuna (Project manager) Mobile No: +91-8297578555. Branches: Hyderabad & Nagpur

ISO: 9001- 2008 CERTIFIED COMPANY


Proposed Method

In this architecture, the proposed FMA design can execute a binary128 FMA every other cycle with a latency of four cycles, or two binary64 FMAs fully pipelined with a latency of three cycles, or four binary32 FMAs fully pipelined with a latency of three cycles. We use two binary64 FMA units to support binary128 FMA, which requires much less hardware than a fully pipelined binary128 FMA.

Advantage

Following this approach, the presented binary128 FMA design uses both segmentation and iteration hardware vectorization methods to trade off performance, such as throughput and latency, against area and power. Compared with a standard binary128 FMA implementation, the proposed FMA design has 30 percent less area and 29 percent less dynamic power dissipation.
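The throughput and latency figures quoted for the three operating modes can be compared with a simple timing model. The sketch below is only an illustration under stated assumptions: the `MODES` table and the functions `cycles_for` and `peak_ops_per_cycle` are hypothetical names, and the model assumes an ideal pipeline with no stalls, where a new group of operations is issued every initiation interval and the last result appears one latency after the last issue.

```python
import math

# mode -> (operations per issue, initiation interval in cycles, latency in cycles)
# The numbers follow the text: binary128 issues every other cycle with latency 4;
# binary64 (2 lanes) and binary32 (4 lanes) are fully pipelined with latency 3.
MODES = {
    "binary128": (1, 2, 4),
    "binary64": (2, 1, 3),
    "binary32": (4, 1, 3),
}


def cycles_for(n_ops: int, mode: str) -> int:
    """Cycles until the last of n_ops independent FMAs completes (ideal pipeline)."""
    lanes, interval, latency = MODES[mode]
    if n_ops <= 0:
        return 0
    issues = math.ceil(n_ops / lanes)  # number of issue slots needed
    return latency + (issues - 1) * interval


def peak_ops_per_cycle(mode: str) -> float:
    """Steady-state FMA results per cycle in the given mode."""
    lanes, interval, _ = MODES[mode]
    return lanes / interval


for mode in MODES:
    print(mode, cycles_for(8, mode), peak_ops_per_cycle(mode))
```

Under this model, eight independent FMAs take 18 cycles in binary128 mode, 6 cycles in binary64 mode, and 4 cycles in binary32 mode, which shows the performance given up in binary128 mode in exchange for the smaller area and power.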


BLOCK DIAGRAM

Fig. 1. Block diagram of the proposed SIMD FMA unit


TOOLS:

Xilinx ISE 9.2i, ModelSim 6.4c
