Arithmetic Coprocessor Coprocessor Basic

Arithmetic Coprocessor
Coprocessor basics:
• The 80x87 is able to multiply, divide, add, subtract, find the square root, and calculate transcendental functions and logarithms.
• Data types include
- 16-, 32- and 64-bit signed integers
- 18-digit BCD data
- 32-, 64- and 80-bit (extended precision) floating-point numbers
• An operation performed by the 80x87 generally executes much faster than the equivalent operation written with the microprocessor's normal instructions.
Advanced Microprocessor 1
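Data Formats for the Arithmetic Coprocessor:
Signed Integers:-
• 16 bit ( word ) – range -32768 to +32767
• 32 bit ( short integer ) – range approximately -2 x 10^9 to +2 x 10^9
• 64 bit ( long integer ) – range approximately -9 x 10^18 to +9 x 10^18
3 forms of signed integers ( s = sign bit; negative values are stored in two's complement form )-

Word:          bit 15 = s, bits 14-0 = magnitude
Short integer: bit 31 = s, bits 30-0 = magnitude
Long integer:  bit 63 = s, bits 62-0 = magnitude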
• The directives dw, dd and dq are used for declaring signed integer storage
- dw to define a word
- dd to define a short integer
- dq to define a long integer
• Each microprocessor has a matching coprocessor:
8086 - 8087
8088 - 8087
80186 - 80187
and so on
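The three integer sizes declared with dw, dd and dq can be sketched in Python as follows. This is only an illustration of the storage format, assuming two's complement, little-endian storage; the names `FORMATS`, `encode_signed` and `signed_range` are made up for this sketch.

```python
import struct

# struct codes h, i and q are 2, 4 and 8 bytes wide, like DW, DD and DQ.
FORMATS = {"dw": "<h", "dd": "<i", "dq": "<q"}

def encode_signed(directive, value):
    """Return the little-endian two's complement bytes of a signed integer."""
    return struct.pack(FORMATS[directive], value)

def signed_range(directive):
    """Return the (lowest, highest) value the storage size can hold."""
    bits = 8 * struct.calcsize(FORMATS[directive])
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

print(signed_range("dw"))               # range of a 16-bit word
print(encode_signed("dw", -2).hex())    # low byte first
```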
Binary Coded Decimal ( BCD ):-
• BCD form requires 80 bits (10 bytes) of memory.
• Each number is stored as an 18-digit packed integer in 9 bytes of memory, 2 digits per byte; the 10th byte holds only the sign bit.
Digit positions (most significant first): S 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
• Both positive & negative numbers are stored in true form, never in ten's complement form
ex :
DATA1 DT 20 ; 20 as bcd
00 00 00 00 00 00 00 00 00 20
DATA2 DT -220 ; -220 as bcd
80 00 00 00 00 00 00 00 02 20
DATA3 DT 50000 ; 50000 as bcd
00 00 00 00 00 00 00 05 00 00
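The DT examples above can be reproduced with a short Python sketch. The helper name `pack_bcd` is hypothetical; it prints the bytes most significant first, as the slides do, whereas in memory the least significant byte sits at the lowest address.

```python
def pack_bcd(value):
    """Return the 10 bytes of an 18-digit packed BCD number, sign byte first."""
    if abs(value) >= 10 ** 18:
        raise ValueError("packed BCD holds at most 18 digits")
    digits = f"{abs(value):018d}"              # zero-padded decimal digits
    sign = 0x80 if value < 0 else 0x00         # only bit 7 of the 10th byte is used
    body = bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, 18, 2))     # two digits per byte
    return bytes([sign]) + body

for n in (20, -220, 50000):
    print(n, pack_bcd(n).hex(" "))
```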
Floating point:-
• Holds signed integers, fractions & mixed numbers.
• A floating-point number has 3 parts
- Sign bit
- Biased exponent
- Significand
• The Intel family of arithmetic coprocessors supports 3 types of floating-point numbers
- Short (32 bit) : single precision, with a bias of 7FH
- Long (64 bit) : double precision, with a bias of 3FFH
- Temporary (80 bit) : extended precision, with a bias of 3FFFH

Short:     bit 31 = s, bits 30-23 = exponent, bits 22-0 = fraction
Long:      bit 63 = s, bits 62-52 = exponent, bits 51-0 = fraction
Temporary: bit 79 = s, bits 78-64 = exponent, bit 63 = explicit integer bit (1), bits 62-0 = fraction
Converting Decimal to Floating-point form:
- Convert the decimal number into binary.

- Normalize the binary number.
- Calculate the biased exponent.
- Store the number in the floating-point format.
Ex : convert decimal to single-precision floating-point
Example 1: +100.25 (decimal)
1 - convert to binary
100 -> 1100100
.25 -> .01
100.25 -> 1100100.01
2 - normalize binary
1100100.01 = 1.10010001 x 2^6
3 - calculate biased exponent
bias is 7FH (127) for single precision
add the true exponent to the bias
110 + 01111111 ( 6 + 127 = 133 )
= 10000101
4 - floating-point number
sign -> 0
expo -> 10000101
significand -> 10010001000000000000000
Example 2: -100.25 (decimal)
1 - convert to binary
100 -> 1100100
.25 -> .01
-100.25 -> -1100100.01
2 - normalize binary
-1100100.01 = -1.10010001 x 2^6
3 - calculate biased exponent
bias is 7FH (127) for single precision
add the true exponent to the bias
110 + 01111111 ( 6 + 127 = 133 )
= 10000101
4 - floating-point number
sign -> 1
expo -> 10000101
significand -> 10010001000000000000000 ( same as for +100.25; only the sign bit differs )
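The two worked conversions can be checked with Python's `struct` module, which packs a value in the same single-precision layout. This is a sketch for checking hand conversions; `single_fields` is a name invented here.

```python
import struct

def single_fields(x):
    """Split a number into (sign, exponent, fraction) bit strings of its 32-bit form."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{word:032b}"
    return bits[0], bits[1:9], bits[9:]

print(single_fields(100.25))
print(single_fields(-100.25))
```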
Special Rules:
- The number 0 is stored as all 0s (except for the sign bit).
- +/- infinity is stored as all logic 1s in the exponent, with a significand of all 0s. The sign bit selects + or - infinity.
- A NAN (not-a-number) is an invalid floating-point result that has all 1s in the exponent with a significand that is NOT all zeros.
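As a sketch, the special rules translate into a small classifier for 32-bit single-precision words. The function name `classify_single` and the returned labels are chosen here for illustration; the "denormal" case (zero exponent, non-zero fraction) is added for completeness.

```python
def classify_single(word):
    """Classify a 32-bit single-precision bit pattern using the special rules."""
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    if exponent == 0:
        return "zero" if fraction == 0 else "denormal"
    if exponent == 0xFF:
        return "infinity" if fraction == 0 else "NaN"
    return "normal"

print(classify_single(0x80000000))   # sign bit set, everything else 0
print(classify_single(0x7F800000))   # exponent all 1s, fraction 0
print(classify_single(0x7FC00000))   # exponent all 1s, fraction non-zero
```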
Converting Floating-point to Decimal:
- Separate the sign-bit, biased exponent and significand.

- Convert the biased exponent into a true exponent by
subtracting the bias.
- Write the number as a normalized binary number.
- Convert it to a de-normalized binary number.
- Convert the de-normalized binary number to decimal.
Ex: convert single-precision floating-point to decimal:
Example 1:
1 - separate the fields
sign = 0
expo = 10000011
significand = 10010010000000000000000
2 - convert the biased exponent to the true exponent
10000011 – 01111111 = 100 ( 131 - 127 = 4; the bias is 7FH, 127, for single precision )
3 - normalized binary number
1.1001001 x 2^4
4 - convert to de-normalized binary number
11001.001
5 - convert into decimal
+25.125
Example 2:
1 - separate the fields
sign = 1
expo = 10000011
significand = 10010010000000000000000
2 - convert the biased exponent to the true exponent
10000011 – 01111111 = 100 ( 131 - 127 = 4; the bias is 7FH, 127, for single precision )
3 - normalized binary number
-1.1001001 x 2^4
4 - convert to de-normalized binary number
-11001.001
5 - convert into decimal
-25.125
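The five steps can be sketched in Python for normalized single-precision numbers. `decode_single` is a hypothetical helper; it does not handle zero, infinity, NANs or denormals.

```python
def decode_single(sign, exponent, significand):
    """Convert sign/exponent/significand bit strings to a decimal value."""
    true_exp = int(exponent, 2) - 0x7F                           # subtract the bias
    mantissa = 1 + int(significand, 2) / 2 ** len(significand)   # restore the implied 1
    value = mantissa * 2.0 ** true_exp                           # de-normalize
    return -value if sign == "1" else value

print(decode_single("0", "10000011", "10010010000000000000000"))
print(decode_single("1", "10000011", "10010010000000000000000"))
```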
The 8087 Architecture:
• The 8087 is designed to operate concurrently with the microprocessor
• The 8087 executes 68 different instructions
• The microprocessor & coprocessor can execute their respective instructions simultaneously (concurrently)
• The numeric or arithmetic coprocessor is a special-purpose microprocessor, designed specifically to execute arithmetic & transcendental operations
• The microprocessor executes the normal instruction set; the coprocessor intercepts & executes only the coprocessor instructions
Internal Structure of the 80x87:
• Control unit ( CU ) :- interfaces the coprocessor to the microprocessor data bus. If an instruction is an ESC (coprocessor) instruction, the coprocessor executes it; if not, the microprocessor executes it.
• Numeric execution unit ( NEU ) :-
- Responsible for executing all coprocessor instructions
- Has an 8-register stack that holds the operands for arithmetic instructions & their results
- Also contains the status, tag, control & exception pointer registers
- Each of the 8 stack registers is 80 bits wide and holds an 80-bit extended-precision floating-point number
- Data are converted to extended precision as they move between memory & the coprocessor register stack
Status register:
• Reflects the overall operation of the coprocessor.
• The status register is read by executing the FSTSW instruction, which stores the contents of the status register into a word of memory
• The coprocessor/microprocessor communications are carried out through I/O ports
• B – busy bit : indicates the coprocessor is busy; can be checked by testing the status register or with the FWAIT instruction
• C3 to C0 – condition code bits : indicate conditions about the coprocessor
• TOP – top of stack (ST) : indicates which register is currently addressed as the top of the stack
• ES – error summary : set if any unmasked error bit (PE, UE, OE, ZE, DE, IE) is set. In the 8087 coprocessor the error summary also causes a coprocessor interrupt
• PE – precision error : the result exceeds the selected precision
• UE – underflow error : a non-zero result that is too small to represent with the currently selected precision
• OE – overflow error : the result is too large to be represented. If this error is masked, the coprocessor generates infinity for the result
• ZE – zero error : the divisor is zero while the dividend is a non-infinity, non-zero number
• DE – denormalized error : at least one of the operands is denormalized
• IE – invalid error : indicates stack underflow/overflow, an indeterminate form, or the use of a NAN as an operand, e.g. the square root of a negative number
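A Python sketch of how a stored status word splits into these fields. The bit positions are the standard x87 layout (exception flags in bits 0-5, ES in bit 7, C0-C2 in bits 8-10, TOP in bits 11-13, C3 in bit 14, B in bit 15), which the slides describe but do not number; `decode_status` is a name invented here.

```python
FLAG_BITS = {"IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5,
             "ES": 7, "C0": 8, "C1": 9, "C2": 10, "C3": 14, "B": 15}

def decode_status(word):
    """Split a 16-bit status word into its named single-bit flags and TOP."""
    fields = {name: (word >> bit) & 1 for name, bit in FLAG_BITS.items()}
    fields["TOP"] = (word >> 11) & 0b111    # 3-bit top-of-stack register number
    return fields

status = decode_status(0x3804)              # example value: ZE set, TOP = 7
print(status["ZE"], status["TOP"])
```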
Control register:
- selects precision, rounding control & infinity control
- masks & unmasks the exception bits that correspond to the rightmost 6 bits of the status register
- the FLDCW instruction is used to load a value into the control register

Control register fields:
Infinity control (IC): 0 – projective, 1 – affine
Precision control (PC): 00 – single, 01 – reserved, 10 – double, 11 – extended
Rounding control (RC): 00 – round to nearest or even, 01 – round down toward minus infinity, 10 – round up toward plus infinity, 11 – chop (truncate) toward zero
Exception masks labelled in the figure: invalid operation mask, denormalized operand mask, division by zero mask

• IC – infinity control : affine allows +ve & –ve infinity; projective assumes infinity is unsigned
• RC – rounding control : determines the type of rounding
• PC – precision control : sets the precision of the results
• Exception masks : determine whether the error indicated by an exception affects the error bit in the status register. If a logic 1 is placed in one of the exception control bits, the corresponding bit in the status register is masked off
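The field encodings can be sketched as a decoder in Python. The bit positions (masks in bits 0-5, PC in bits 8-9, RC in bits 10-11, IC in bit 12) are the standard x87 control-word layout and are an assumption as far as the slides go; the mask names and `decode_control` are illustrative.

```python
PRECISION = {0b00: "single", 0b01: "reserved", 0b10: "double", 0b11: "extended"}
ROUNDING = {0b00: "nearest or even", 0b01: "down", 0b10: "up", 0b11: "chop"}
MASK_BITS = ["IM", "DM", "ZM", "OM", "UM", "PM"]   # bit 0 upward

def decode_control(word):
    """Decode a 16-bit control word into masks, precision, rounding, infinity."""
    return {
        "masks": {name: (word >> bit) & 1 for bit, name in enumerate(MASK_BITS)},
        "precision": PRECISION[(word >> 8) & 0b11],
        "rounding": ROUNDING[(word >> 10) & 0b11],
        "infinity": "affine" if (word >> 12) & 1 else "projective",
    }

print(decode_control(0x037F))   # example: all exceptions masked, extended precision
```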
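; Check for a divide-by-zero error after a division
fdiv DATA1
fstsw ax            ;Copy status reg to AX
test ax, 4          ;Test bit position 2 (ZE)
jnz DIVIDE_ERROR

; Compare ST with DATA1 and branch on the result
fcom DATA1          ;Compare ST with DATA1 and set the condition codes
fstsw ax            ;Copy status reg to AX
sahf                ;Copy status bits (AH) to flags
je ST_EQUAL
jb ST_BELOW
ja ST_ABOVE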
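The compare sequence works because SAHF copies AH into the flags, which places C0 in the carry flag, C2 in the parity flag and C3 in the zero flag. The Python sketch below models that mapping; `compare_outcome` is a hypothetical name, and the condition-code patterns assumed are the usual FCOM results (all clear for ST above, C0 for ST below, C3 for equal, C3 = C2 = C0 = 1 for unordered).

```python
def compare_outcome(status_word):
    """Model FSTSW AX + SAHF and report which conditional jump would be taken."""
    ah = (status_word >> 8) & 0xFF   # FSTSW AX leaves the high byte in AH
    cf = ah & 1                      # C0 -> carry flag
    pf = (ah >> 2) & 1               # C2 -> parity flag
    zf = (ah >> 6) & 1               # C3 -> zero flag
    if cf and pf and zf:
        return "unordered"
    if zf:
        return "equal"               # JE is taken
    if cf:
        return "below"               # JB is taken
    return "above"                   # JA is taken

print(compare_outcome(0x4000))       # C3 set
```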
Tag register :
- Indicates the contents of each location in the coprocessor stack
- A program can view the tag register by storing the coprocessor environment with FSTENV or FSAVE and reading it from memory (FRSTOR loads a saved state back)
- 00 – VALID, 01 – ZERO, 10 – INVALID or INFINITY, 11 – EMPTY
TAG 7 TAG 6 TAG 5 TAG 4 TAG 3 TAG 2 TAG 1 TAG 0
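A minimal Python sketch of reading the eight 2-bit tags out of a stored tag word, assuming TAG 0 occupies the two least significant bits as in the layout above. `decode_tags` is a name invented for this example.

```python
TAG_NAMES = {0b00: "valid", 0b01: "zero", 0b10: "invalid or infinity", 0b11: "empty"}

def decode_tags(tag_word):
    """Return the meaning of TAG 0 through TAG 7 of a 16-bit tag word."""
    return [TAG_NAMES[(tag_word >> (2 * i)) & 0b11] for i in range(8)]

print(decode_tags(0xFFFF))   # every register empty
print(decode_tags(0x3FFF))   # TAG 7 valid, the rest empty
```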
Instruction Set:
• executes over 68 different instructions
• the coprocessor uses the data bus for data transfers during coprocessor instructions; the microprocessor uses it during normal instructions
Types of instruction :-
- data transfer instructions
- arithmetic instructions
- comparison instructions
- transcendental operations
- constant operations
- coprocessor control instructions
i) Data transfer instruction:
- floating-point
- signed-integer
- BCD
- Pentium Pro through Pentium 4 FCMOV instruction
The coprocessor stores all data internally as 80-bit extended-precision floating-point numbers
Floating-point data transfer
FLD (Load Real) :
- Loads floating-point data to the stack top (ST).
- The stack pointer is decremented by 1 first, so the loaded value becomes the new ST.
- Data can be retrieved from memory or from another stack position.
Ex :
FLD ST(2) ;copies the contents of register 2 to ST
;the top of the stack is register 0 when the coprocessor is reset or initialized
FLD data7 ;copies the contents of memory location data7 to the top of the stack
;the size of the transfer is determined automatically by the assembler from the directive used to define data7
FST ( store real) :
- Stores a copy of the top of the stack into memory or

another coprocessor register.
- Rounding occurs when the storage operation
completes according to the control register
- copy instruction
FSTP ( floating point store and pop)
- Stores a copy of the top of the stack into memory or
another coprocessor register
- pop the data from the top of stack
- a removal instruction
FXCH ( exchange )
- exchanges the content of register with top of stack
ex : FXCH st2 ; exchanges top of the stack with register 2
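The behavior of FLD, FST, FSTP and FXCH on the register stack can be modeled with a few lines of Python. This is a simplified sketch, not a description of the hardware: the class name `FpuStack` is invented, values are plain Python floats, and tag or exception handling is omitted.

```python
class FpuStack:
    """Toy model of the 8-register coprocessor stack with a TOP pointer."""

    def __init__(self):
        self.regs = [None] * 8
        self.top = 0                          # register 0 is the top after reset

    def _phys(self, i):
        return (self.top + i) % 8             # ST(i) -> physical register number

    def st(self, i=0):
        return self.regs[self._phys(i)]

    def fld(self, value):
        self.top = (self.top - 1) % 8         # decrement TOP, then load
        self.regs[self.top] = value

    def fst(self):
        return self.st(0)                     # copy: the stack is unchanged

    def fstp(self):
        value = self.st(0)
        self.regs[self.top] = None            # removal: register becomes empty
        self.top = (self.top + 1) % 8
        return value

    def fxch(self, i=1):
        a, b = self._phys(0), self._phys(i)
        self.regs[a], self.regs[b] = self.regs[b], self.regs[a]

stack = FpuStack()
stack.fld(1.0)
stack.fld(2.0)
stack.fxch(1)
print(stack.top, stack.st(0), stack.st(1))
```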
Integer data transfer instruction
- FILD ( load integer)
- FIST ( store integer)
- FISTP ( store integer and pop)
While transferring the data, the coprocessor automatically converts between integer data and extended-precision floating-point numbers.
BCD data transfer instruction
- FBLD – loads the top of the stack with BCD memory data
- FBSTP – stores the top of the stack as BCD and does a pop
Pentium Pro through Pentium 4 instruction
FCMOV
- contains a condition
- if the condition is true, copies the source to the destination
- conditions are checked for either an ordered or an unordered result
- NANs and denormalized numbers are not checked
FCMOVB - move if below, FCMOVE - move if equal
FCMOVBE - move if below or equal, FCMOVU - move if unordered
FCMOVNB - move if not below, FCMOVNE - move if not equal
FCMOVNBE - move if not below or equal, FCMOVNU - move if not unordered
ii) Arithmetic instruction:
- addition, subtraction, multiplication, division, calculating square roots
- arithmetic related – scaling, rounding, absolute value, changing sign
- Addressing modes

Mode            Form        Example
Stack           ST(1),ST    FADD
Register        ST,ST(n)    FADD ST,ST(2)
                ST(n),ST    FADD ST(2),ST
Register pop    ST(n),ST    FADDP ST(3),ST
Memory          operand     FADD data2

- Stack addressing is restricted to ST (stack top) and ST1.
- The source operand is ST while the destination operand is ST1.
- After the operation, the source is popped, leaving the destination at ST.
Stack addressing mode,
• uses the top of the stack as the source operand & the next-to-top as the destination.
• afterwards the top is popped, so the result becomes the new top of the stack
ex :
FADD – adds ST and ST1; the result is stored in ST1, which becomes ST after the pop
FSUB – subtracts ST from ST1; after the pop the result is in ST
FSUBR – reverse subtraction: subtracts ST1 from ST; after the pop the result is in ST
FDIVR – reverse division, used to compute a reciprocal; the result is stored in ST
Register addressing mode,
• MUST use ST as one of the operands.
• The other operand can be any register, including ST0, which is ST. Note that the destination can be either ST or STn.
• unlike stack addressing, non-popping versions can be used.
Memory addressing mode,
• always uses ST as the destination, because the coprocessor is a stack-oriented machine
Arithmetic operation,
The following letters are used to additionally qualify the operation:
• P: Perform a register pop after the operation, e.g., FADD versus FADDP.
• R: Reverse mode for subtraction and division.
• I: Indicates that the memory operand is an integer. I appears as the second letter in the instruction, e.g., FIADD, FISUB, FIMUL, FIDIV.
Arithmetic related operations,
• FSQRT: Finds the square root of the operand at ST and leaves the result there. Check the IE bit for an invalid result (e.g., the operand was negative) using FSTSW AX and TEST AX, 1.
• FSCALE: Adds the contents of ST1 (interpreted as an integer) to the exponent of ST. The value of ST1 must be between -2^15 and +2^15
• FPREM1: Performs modulo division of ST by ST1. The resultant remainder is found at ST.
• FRNDINT: Rounds ST to an integer.
• FXTRACT: Decomposes ST into an unbiased exponent and a significand. The extracted significand is at ST and the unbiased exponent at ST1.
• FABS: Changes the sign of ST to positive.
• FCHS: Inverts the sign of ST.
iii) Comparison instruction:
- These instructions compare the data at the top of the stack with another operand and return the result of the comparison in status register condition code bits C3 to C0.
• FCOM: Compares ST with a memory or register operand. FCOM by itself compares ST and ST1.
• FCOMP/FCOMPP: Compare and pop once or twice.
• FICOM/FICOMP: Compare ST with an integer memory operand and optionally pop the stack.
• FTST: Compares ST with 0.0.
• FXAM: Examines ST and modifies the condition code bits to indicate whether the contents are positive, negative, normalized, etc.
• FCOMI/FUCOMI: Pentium Pro and later; same as FCOM, with one additional feature: the result is moved directly to the microprocessor flag register, replacing the FNSTSW AX and SAHF sequence.
iv) Transcendental operations
• FPTAN – finds the partial tangent y/x = tan θ. The value of θ on top of the stack must be between 0 and π/4 for the 8087 & 80287, and must be less than 2^63 for the 80387 through Pentium 4
• FPATAN – finds the partial arctangent θ
• F2XM1: Computes 2^x - 1. Other powers are derived from it:
Function    Equation
10^y        2^(y * log2 10)
e^y         2^(y * log2 e)
x^y         2^(y * log2 x)
• FSIN/FCOS : sine or cosine of ST; the result is found in ST
• FSINCOS : sine & cosine of ST; afterwards ST holds the cosine & ST1 holds the sine
• FYL2X: Computes Y log2 X, with X in ST & Y in ST1; the result is on top of the stack. X must range between 0 and +infinity & Y between -infinity and +infinity
• FYL2XP1: Computes Y log2 (X + 1)
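The way FYL2X and F2XM1 combine to raise a number to a power, x^y = 2^(y * log2 x), can be sketched numerically in Python. The function names mirror the mnemonics but are ordinary Python helpers written for this example; the real F2XM1 accepts only a limited operand range, which this sketch ignores.

```python
import math

def fyl2x(x, y):
    """Model of FYL2X: y * log2(x)."""
    return y * math.log2(x)

def f2xm1(x):
    """Model of F2XM1: 2**x - 1 (operand range limits are ignored here)."""
    return 2.0 ** x - 1.0

def power(x, y):
    """x**y computed as 2**(y * log2 x), adding back the 1 that F2XM1 subtracts."""
    return f2xm1(fyl2x(x, y)) + 1.0

print(power(2.0, 10.0))
print(power(10.0, 2.0))
```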

v) Constant operation
• the coprocessor instruction set includes opcodes that return constants to the top of the stack.
- FLDZ: Pushes +0.0 onto ST.
- FLD1: Pushes +1.0 onto ST.
- FLDPI: Pushes pi onto ST.
- FLDL2T: Pushes log2 10 onto ST.
- FLDL2E: Pushes log2 e onto ST.
- FLDLG2: Pushes log10 2 onto ST.
- FLDLN2: Pushes loge 2 onto ST.

vi) Coprocessor Control instruction
- Control instructions for initialization, exception handling & task switching
• FINIT/FNINIT : performs a reset operation and sets register 0 as the top of the stack
• FSETPM : changes the addressing mode of the coprocessor to the protected addressing mode
• FLDCW : loads the control register with the word addressed by the operand
• FSTCW/FNSTCW : stores the control register into the word-sized memory operand
• FSTSW AX/FNSTSW AX : copies the contents of the status register to AX ( not available on the 8087 )
• FCLEX/FNCLEX : clears the error flags in the status register and also the busy flag
• FSAVE/FNSAVE : writes the entire state of the machine to memory
• FRSTOR : restores the state of the machine from memory
• FSTENV/FNSTENV : stores the environment of the coprocessor (real-mode or protected-mode format)
• FLDENV : reloads the environment
• FINCSTP : increments the stack pointer
• FDECSTP : decrements the stack pointer
• FFREE : frees a register by marking it empty
• FNOP : floating-point coprocessor NOP
• FWAIT : causes the microprocessor to wait for the coprocessor to finish an operation; it should be used before the microprocessor accesses memory data that are affected by the coprocessor
Coprocessor instruction:
- lists the instructions for all coprocessors from the 8087 through the Pentium 4, with the number of clock periods required to execute each instruction.
General:
reg = floating point register, st(0), st(1) ... st(7)
mem = memory address
mem32 = memory address of 32-bit item
EA = effective address calculation time
FX = pairs with FXCH
NP = no pairing
Instruction clock cycles
• F2XM1 Compute 2x-1
8087 287 387 486 Pentium

310-630 310 -630 211-476 140-279 13-57 NP
• FABS Absolute value
8087 287 387 486 Pentium

10-17 10-17 22 3 1 FX
• FADD Floating point add
• FADDP Floating point add and pop
variations/operand   8087          287      387     486    Pentium
fadd                 70-100        70-100   23-34   8-20   3/1 FX
fadd mem32           (90-120)+EA   90-120   24-32   8-20   3/1 FX
fadd mem64           (95-125)+EA   95-125   29-37   8-20   3/1 FX
faddp                75-105        75-105   23-31   8-20   3/1 FX
• FBLD Load BCD
operand   8087           287       387       486      Pentium
mem       (290-310)+EA   290-310   266-275   70-103   48-58 NP
• FBSTP Store BCD and pop

8087 287 387 486 Pentium
(520-540)+EA 520-540 512-534 172-176 148-154 NP
• FCHS Change sign

8087 287 387 486 Pentium
10-17 10-17 24-25 6 1 FX
• FNCLEX Clear exceptions, no wait

variations 8087 287 387 486 Pentium
fclex 2-8 2-8 11 7 9 NP
fnclex 2-8 2-8 11 7 9 NP
The wait version may take additional cycles
• FCOM Floating point compare
• FCOMP Floating point compare and pop
• FCOMPP Floating point compare and pop twice
variations/operand   8087         287     387   486   Pentium
fcom reg             40-50        40-50   24    4     4/1 FX
fcom mem32           (60-70)+EA   60-70   26    4     4/1 FX
fcom mem64           (65-75)+EA   65-75   31    4     4/1 FX
fcomp                42-52        42-52   26    4     4/1 FX
fcompp               45-55        45-55   26    5     4/1 FX
FCOS Floating point cosine (387+)

8087 287 387 486 Pentium
- - 123-772 257-354 18-124 NP
Additional cycles required if operand > pi/4 (~3.141/4 =~.785)
•FDISI Disable interrupts (8087 only, others do fnop)

•FNDISI Disable interrupts, no wait (8087 only, others do fnop)
fdisi 2-8 2 2 3 1 NP
fndisi 2-8 2 2 3 1 NP
•FDIV Floating divide
•FDIVP Floating divide and pop
variations/operand   8087           287       387     486   Pentium
fdiv reg             193-203        193-203   88-91   73    39 FX
fdiv mem32           (215-225)+EA   215-225   89      73    39 FX
fdiv mem64           (220-230)+EA   220-230   94      73    39 FX
fdivp                197-207        197-207   91      73    39 FX
•FDIVR Floating divide reversed
•FDIVRP Floating divide reversed and pop
variations/operand   8087           287       387     486   Pentium
fdivr reg            194-204        194-204   88-91   73    39 FX
fdivr mem32          (216-226)+EA   216-226   89      73    39 FX
fdivr mem64          (221-231)+EA   221-231   94      73    39 FX
fdivrp               198-208        198-208   91      73    39 FX
•FENI Enable interrupts (8087 only, others do fnop)

•FNENI Enable interrupts, nowait (8087 only, others do fnop)
Variations 8087 287 387 486 Pentium
feni 2-8 2 2 3 1 NP
fneni 2-8 2 2 3 1 NP
• FFREE Free register

8087 287 387 486 Pentium
9-16 9-16 18 3 1 NP
• FIADD Integer add
operand   8087           287       387     486     Pentium
mem16     (102-137)+EA   102-137   71-85   20-35   7/4 NP
mem32     (108-143)+EA   108-143   57-72   19-32   7/4 NP
•FINIT Initialize floating point processor

•FNINIT Initialize floating point processor, no wait
finit 2-8 2-8 33 17 16 NP
fninit 2-8 2-8 33 17 12 NP
•FICOM Integer compare
•FICOMP Integer compare and pop
variations/operand   8087         287     387     486     Pentium
ficom mem16          (72-86)+EA   72-86   71-75   16-20   8/4 NP
ficom mem32          (78-91)+EA   78-91   56-63   15-17   8/4 NP
ficomp mem16         (74-88)+EA   74-88   71-75   16-20   8/4 NP
ficomp mem32         (80-93)+EA   80-93   56-63   15-17   8/4 NP
• FIMUL Integer multiply
operand   8087           287       387     486     Pentium
mem16     (124-138)+EA   124-138   76-87   23-27   7/4 NP
mem32     (130-144)+EA   130-144   61-82   22-24   7/4 NP
•FIDIV Integer divide
•FIDIVR Integer divide reversed
variations/operand   8087           287       387       486     Pentium
fidiv mem16          (224-238)+EA   224-238   136-140   85-89   42 NP
fidiv mem32          (230-243)+EA   230-243   120-127   84-86   42 NP
fidivr mem16         (225-239)+EA   225-239   135-141   85-89   42 NP
fidivr mem32         (231-245)+EA   231-245   121-128   84-86   42 NP
• FILD Load integer
operand   8087         287     387     486     Pentium
mem16     (46-54)+EA   46-54   61-65   13-16   3/1 NP
mem32     (52-60)+EA   52-60   45-52   9-12    3/1 NP
mem64     (60-68)+EA   60-68   56-67   10-18   3/1 NP
•FIST Store integer
•FISTP Store integer and pop
variations/
fist mem16 (80-90)+EA 80-90 82-95 29-34 6 NP
fist mem32 (82-92)+EA 82-92 79-93 28-34 6 NP
fistp mem16 (82-92)+EA 82-92 82-95 29-34 6 NP
fistp mem32 (84-94)+EA 84-94 79-93 28-34 6 NP
fistp mem64 (94-105)+EA 94-105 80-97 28-34 6 NP
•FISUB Integer subtract

•FISUBR Integer subtract reversed
variations/
fisub mem16 (102-137)+EA 102-137 71-85 20-35 7/4 NP
fisubr mem32 (108-143)+EA 108-143 57-82 19-32 7/4 NP
• FINCSTP Increment floating point stack pointer
8087 287 387 486 Pentium
6-12 6-12 21 3 1 NP
• FLD Floating point load

reg 17-22 17-22 14 4 1 FX
mem32 (38-56)+EA 38-56 20 3 1 FX
mem64 (40-60)+EA 40-60 25 3 1 FX
mem80 (53-65)+EA 53-65 44 6 3 NP
Load floating point constants
• FLDCW Load control word

mem16 (7-14)+EA 7-14 19 4 7 NP
•FLDZ Load constant onto stack, 0.0

•FLD1 Load constant onto stack, 1.0
•FLDL2E Load constant onto stack, logarithm base 2 (e)
•FLDL2T Load constant onto stack, logarithm base 2 (10)
•FLDLG2 Load constant onto stack, logarithm base 10 (2)
•FLDLN2 Load constant onto stack, natural logarithm (2)
•FLDPI Load constant onto stack, pi (3.14159...)

fldz 11-17 11-17 20 4 2 NP
fld1 15-21 15-21 24 4 2 NP
fldl2e 15-21 15-21 40 8 5/3 NP
fldl2t 16-22 16-22 40 8 5/3 NP
fldlg2 18-24 18-24 41 8 5/3 NP
fldln2 17-23 17-23 41 8 5/3 NP
fldpi 16-22 16-22 40 8 5/3 NP
•FLDENV Load environment state

mem (35-45)+EA 35-45 71 44/34 37/32-33 NP
cycles for real mode/protected mode
•FMUL Floating point multiply

•FMULP Floating point multiply and pop
variations/
fmul reg s 90-105 90-105 29-52 16 3/1 FX
fmul reg 130-145 130-145 46-57 16 3/1 FX
fmul mem32 (110-125)+EA 110-125 27-35 11 3/1 FX
fmul mem64 (154-168)+EA 154-168 32-57 14 3/1 FX
fmulp reg s 94-108 94-108 29-52 16 3/1 FX
fmulp reg 134-148 134-148 29-57 16 3/1 FX
s = register with 40 trailing zeros in fraction
•FNOP no operation
8087 287 387 486 Pentium
10-16 10-16 12 3 1 NP
•FPATAN Partial arctangent

8087 287 387 486 Pentium
250-800 250-800 314-487 218-303 17-173
•FPREM Partial remainder

•FPREM1 Partial remainder (IEEE compatible, 387+)
Variations 8087 287 387 486 Pentium
fprem 15-190 15-190 74-155 70-138 16-64 NP
fprem1 - - 95-185 72-167 20-70 NP
•FPTAN Partial tangent

8087 287 387 486 Pentium
30-540 30-540 191-497 200-273 17-173 NP
Additional cycles required if operand > pi/4 (~3.141/4 =~.785)
•FRNDINT Round to integer

8087 287 387 486 Pentium
16-50 16-50 66-80 21-30 9-20 NP
•FRSTOR Restore saved state

variations/
frstor mem (197-207)+EA 197-207 308 131/120 75-95/70 NP
frstorw mem - - 308 131/120 75-95/70 NP
frstord mem - - 308 131/120 75-95/70 NP
cycles for real mode/protected mode
•FSAVE Save FPU state
•FSAVEW Save FPU state, 16-bit format (387+)
•FSAVED Save FPU state, 32-bit format (387+)
•FNSAVE Save FPU state, no wait
•FNSAVEW Save FPU state, no wait, 16-bit format (387+)
•FNSAVED Save FPU state, no wait, 32-bit format (387+)
variations   8087           287       387       486       Pentium
fsave        (197-207)+EA   197-207   375-376   154/143   127-151/124 NP
fsavew       -              -         375-376   154/143   127-151/124 NP
fsaved       -              -         375-376   154/143   127-151/124 NP
fnsave       (197-207)+EA   197-207   375-376   154/143   127-151/124 NP
fnsavew      -              -         375-376   154/143   127-151/124 NP
fnsaved      -              -         375-376   154/143   127-151/124 NP
Cycles for real mode/protected mode
•FSCALE Scale by factor of 2

8087 287 387 486 Pentium
32-38 32-38 67-86 30-32 20-31 NP
FSETPM Set protected mode (287 only, 387+ = fnop)

8087 287 387 486 Pentium
- 2-8 12 3 1 NP
•FSIN Sine (387+)

•FSINCOS Sine and cosine (387+)
fsin - - 122-771 257-354 16-126 NP
fsincos - - 194-809 292-365 17-137 NP
Additional cycles required if operand > pi/4 (~3.141/4 = ~.785)
•FSQRT Square root
8087 287 387 486 Pentium
180-186 180-186 122-129 83-87 70 NP
•FST Floating point store

•FSTP Floating point store and pop
variations/
fst reg 15-22 15-22 11 3 1 NP
fst mem32 (84-90)+EA 84-90 44 7 2 NP
fst mem64 (96-104)+EA 96-104 45 8 2 NP
fstp reg 17-24 17-24 12 3 1 NP
fstp mem32 (86-92)+EA 86-92 44 7 2 NP
fstp mem64 (98-106)+EA 98-106 45 8 2 NP
fstp mem80 (52-58)+EA 52-58 53 6 3 NP
•FSTCW Store control word

•FNSTCW Store control word, no wait
variations/
fstcw mem 12-18 12-18 15 3 2 NP
fnstcw mem 12-18 12-18 15 3 2 NP
•FSTENV Store FPU environment

•FSTENVW Store FPU environment, 16-bit format (387+)
•FSTENVD Store FPU environment, 32-bit format (387+)
•FNSTENV Store FPU environment, no wait
•FNSTENVW Store FPU environment, no wait, 16-bit format (387+)
•FNSTENVD Store FPU environment, no wait, 32-bit format (387+)
variations/
fstenv mem (40-50)+EA 40-50 103-104 67/56 48-50 NP
fstenvw mem 103-104 67/56 48-50 NP
fstenvd mem 103-104 67/56 48-50 NP
fnstenv mem (40-50)+EA 40-50 103-104 67/56 48-50 NP
fnstenvw mem 103-104 67/56 48-50 NP
fnstenvd mem 103-104 67/56 48-50 NP
Cycles for real mode/protected mode
•FSTSW Store status word
•FNSTSW Store status word, no wait
variations/
fstsw mem 12-18 12-18 15 3 2 NP
fstsw ax - 10-16 13 3 2 NP
fnstsw mem 12-18 12-18 15 3 2 NP
fnstsw ax - 10-16 13 3 2 NP
•FSUB Floating point subtract

•FSUBP Floating point subtract and pop
variations/
fsub reg 70-100 70-100 26-37 8-20 3/1 FX
fsub mem32 (90-120)+EA 90-120 24-32 8-20 3/1 FX
fsub mem64 (95-125)+EA 95-125 28-36 8-20 3/1 FX
fsubp reg 75-105 75-105 26-34 8-20 3/1 FX
•FSUBR Floating point reverse subtract
•FSUBRP Floating point reverse subtract and pop
variations/
fsubr reg 70-100 70-100 26-37 8-20 3/1 FX
fsubr mem32 (90-120)+EA 90-120 24-32 8-20 3/1 FX
fsubr mem64 (95-125)+EA 95-125 28-36 8-20 3/1 FX
fsubrp reg 75-105 75-105 26-34 8-20 3/1 FX
FTST Floating point test for zero

8087 287 387 486 Pentium
38-48 38-48 28 4 4/1 FX
FWAIT Wait while FPU is executing

8087 287 387 486 Pentium
4 3 6 1-3 1-3 NP
•FXAM Examine condition flags
8087 287 387 486 Pentium
12-23 12-23 30-38 8 21 NP
•FXCH Exchange floating point registers
8087 287 387 486 Pentium
10-15 10-15 18 4 0-1 *
• * FXCH is pairable in the V pipe with all FX pairable instructions
•FXTRACT Extract exponent and significand

8087 287 387 486 Pentium
27-55 27-55 70-76 16-20 13 NP
•FYL2X Compute Y * log2(x)

•FYL2XP1 Compute Y * log2(x+1)
fyl2x 900-1100 900-1100 120-538 196-329 22-111 NP
fyl2xp1 700-1000 700-1000 257-547 171-326 22-103 NP
MMX Technology
• MultiMedia eXtensions ( MMX )
• Designed to accelerate multimedia and communication applications
- motion video, image processing, audio synthesis, speech synthesis and compression, video conferencing, 2D and 3D graphics
• Includes new instructions and data types to significantly improve application performance
• Exploits the parallelism inherent in many multimedia and communications algorithms
• Maintains full compatibility with existing operating systems and applications
Data Types:-
• packed data types
- 8 packed, consecutive 8-bit bytes
- 4 packed, consecutive 16-bit words
- 2 packed, consecutive 32-bit doublewords
- 1 64-bit quadword
- each format occupies consecutive memory addresses & uses little-endian form

Packed bytes:       bits 63-56, 55-48, 47-40, 39-32, 31-24, 23-16, 15-8, 7-0
Packed words:       bits 63-48, 47-32, 31-16, 15-0
Packed doublewords: bits 63-32, 31-0
Quadword:           bits 63-0
• MMX technology registers have the same format as a 64-bit quantity in memory
• has 2 data access modes
- 64-bit access mode, for 64-bit memory accesses & 64-bit transfers between MMX registers, which are mapped onto the floating-point coprocessor registers
- 32-bit access mode, for 32-bit memory accesses & 32-bit transfers between MMX registers and microprocessor registers
Registers: MM0 through MM7, together with the coprocessor TAG bits
• adds 57 new instructions to the instruction set of the Pentium through Pentium 4
Instruction Set :-
- arithmetic
- comparison
- conversion
- logical
- shift
- data transfer
• instruction types are similar to the microprocessor's, but MMX instructions use packed data types
Arithmetic instruction:
• addition, subtraction, multiplication & a special multiplication with an addition
• additions are performed on
- packed signed or unsigned bytes ( B )
- packed words ( W )
- packed doubleword data ( D )
• any carry or borrow that is generated is dropped
Comparison instruction:
• 2 comparisons: PCMPEQ (equal) & PCMPGT (greater than)
• compares bytes, words or doublewords
• does not change the microprocessor flag bits; returns all 1's in an element for true & all 0's for false
• e.g., if the bytes of MM2 are compared with MM1 and the least significant bytes are equal, the least significant byte of MM2 contains FFH, otherwise 00H
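The per-element true/false masks can be sketched in Python, treating a 64-bit MMX operand as a list of eight byte values. The helpers `pcmpeqb` and `pcmpgtb` are models written for this example (the greater-than compare is signed, so its inputs are given as signed values).

```python
def pcmpeqb(dest, src):
    """Byte-wise equality: FFH where the bytes match, 00H elsewhere."""
    return [0xFF if d == s else 0x00 for d, s in zip(dest, src)]

def pcmpgtb(dest, src):
    """Byte-wise signed greater-than: FFH where dest > src, 00H elsewhere."""
    return [0xFF if d > s else 0x00 for d, s in zip(dest, src)]

mm2 = [1, 2, 3, 4, 5, 6, 7, 8]
mm1 = [1, 0, 3, 0, 5, 0, 7, 0]
print([f"{b:02X}" for b in pcmpeqb(mm2, mm1)])
```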
Conversion Instruction:
• 2 conversion instructions: PACK, as signed or unsigned, & PUNPCK, as unpack high data or unpack low data
• operate on
- packed bytes ( B )
- packed words ( W )
- packed doubleword data ( D )
• B, W & D must be used in combination
- WB word to byte
- DW doubleword to word
• in an unsigned conversion, if a word does not fit into a byte, the destination byte becomes FFH
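A Python sketch of the saturation applied when words are packed into bytes. It models only the clamping of each element, not the way two source operands are concatenated; `pack_unsigned` and `pack_signed` are hypothetical names. In the unsigned form a negative word clamps to 00H, which is standard behavior not spelled out in the slides.

```python
def pack_unsigned(words):
    """Word to byte with unsigned saturation: clamp each value to 0..255."""
    return [max(0, min(0xFF, w)) for w in words]

def pack_signed(words):
    """Word to byte with signed saturation: clamp each value to -128..127."""
    return [max(-128, min(127, w)) for w in words]

print(pack_unsigned([300, -5, 17, 255]))
print(pack_signed([300, -300, 17, -1]))
```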
Logical instruction:
• AND, AND NOT (PANDN), OR & XOR
• instructions do not have a size extension
• perform bit-wise operations on all 64 bits of the data
Shift instruction:
• logical shift left, logical shift right & arithmetic shift right instructions
• logical shifts are performed on word (W), doubleword (D) & quadword (Q); arithmetic shift right is performed on word & doubleword only
Data transfer instruction:
• data transfers are done register to register or between register and memory
• MOVD copies only the rightmost 32 bits; there is no instruction to transfer only the leftmost 32 bits
• to transfer the leftmost 32 bits, shift right by 32 first
EMMS instruction:
• empty MMX state: marks all the tags in the floating-point unit so that the floating-point registers are listed as empty
• this instruction should be executed before the return instruction at the end of an MMX procedure; otherwise a subsequent floating-point operation will cause a floating-point interrupt error, crashing Windows or the application
• EMMS – empty MMX state
Ex : EMMS
• MOVD – move doubleword
Ex :
MOVD MM3, EAX ; reg to xreg
MOVD EAX, MM4 ; xreg to reg
MOVD MM3, DATA ; mem to xreg
MOVD DATA1, MM3 ; xreg to mem
• MOVQ – move quadword
Ex :
MOVQ MM3, MM1 ; xreg to xreg
MOVQ MM3, DATA ; mem to xreg
MOVQ DATA1, MM3 ; xreg to mem
• PACKSSDW – pack signed doubleword to word
Ex :
PACKSSDW MM1,MM2 ; xreg to xreg
PACKSSDW MM1,DATA ; mem to xreg
• PACKSSWB – pack signed word to byte
Ex :
PACKSSWB MM1,MM2 ; xreg to xreg
PACKSSWB MM1,DATA ; mem to xreg
• PACKUSWB – pack unsigned word to byte
Ex :
PACKUSWB MM1,MM2 ; xreg to xreg
PACKUSWB MM1,DATA ; mem to xreg
• PADD – add with truncation : byte, word & doubleword
Ex :
PADDB MM1,MM3 ; xreg to xreg
PADDW MM1,MM3
PADDD MM1,MM3
PADDB MM1,DATA ; mem to xreg
PADDW MM1,DATA
PADDD MM1,DATA
• PADDS – add with signed saturation : byte & word
Ex :
PADDSB MM1,MM3 ; xreg to xreg
PADDSW MM1,MM3
PADDSB MM1,DATA ; mem to xreg
PADDSW MM1,DATA
• PADDUS – add with unsigned saturation : byte & word
Ex :
PADDUSB MM1,MM3 ; xreg to xreg
PADDUSW MM1,MM3
PADDUSB MM1,DATA ; mem to xreg
PADDUSW MM1,DATA
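The difference between truncating and saturating addition is easiest to see on a single byte lane. The Python sketch below models the three byte forms on lists of lane values; the function names copy the mnemonics but are plain helpers written for this example, and `paddsb` expects signed inputs in the range -128 to 127.

```python
def paddb(a, b):
    """Add with truncation: the carry out of each byte is dropped."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]

def paddusb(a, b):
    """Add with unsigned saturation: results above 255 stick at 255."""
    return [min(x + y, 0xFF) for x, y in zip(a, b)]

def paddsb(a, b):
    """Add with signed saturation: results are clamped to -128..127."""
    return [max(-128, min(127, x + y)) for x, y in zip(a, b)]

print(paddb([200], [100]), paddusb([200], [100]), paddsb([100], [100]))
```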
• PAND – AND
Ex :
PAND MM1,MM2 ; xreg to xreg
PAND MM1,DATA ; mem to xreg
• PANDN – AND NOT ( the destination is inverted, then ANDed with the source )
Ex :
PANDN MM1,MM2 ; xreg to xreg
PANDN MM1,DATA ; mem to xreg
• PCMPEQ – compare for equality
Ex :
PCMPEQB MM1,MM2 ; xreg to xreg
PCMPEQW MM1,MM2
PCMPEQD MM1,MM2
PCMPEQB MM1,DATA ; mem to xreg
PCMPEQW MM1,DATA
PCMPEQD MM1,DATA
• PCMPGT – compare for greater than
Ex :
PCMPGTB MM1,MM2 ; xreg to xreg
PCMPGTW MM1,MM2
PCMPGTD MM1,MM2
PCMPGTB MM1,DATA ; mem to xreg
PCMPGTW MM1,DATA
PCMPGTD MM1,DATA
• PMADDWD – multiply and add
Ex :
PMADDWD MM1,MM4 ; xreg to xreg
PMADDWD MM1,DATA ; mem to xreg
• PMULHW – multiplication - high
Ex :
PMULHW MM1,MM4 ; xreg to xreg
PMULHW MM1,DATA ; mem to xreg
• PMULLW – multiplication - low
Ex :
PMULLW MM1,MM4 ; xreg to xreg
PMULLW MM1,DATA ; mem to xreg
• POR – OR
Ex :
POR MM1,MM4 ; xreg to xreg
POR MM1,DATA ; mem to xreg
• PSLL – shift left : word, doubleword and quadword
Ex :
PSLLW MM1,MM3 ; xreg to xreg
PSLLD MM1,MM3
PSLLQ MM1,MM3
PSLLW MM1,DATA ; mem to xreg
PSLLD MM1,DATA
PSLLQ MM1,DATA
PSLLW MM1,5 ; xreg by count
PSLLD MM1,4
PSLLQ MM1,7
• PSRA – shift arithmetic right : word and doubleword
Ex :
PSRAW MM1,MM3 ; xreg to xreg
PSRAD MM1,MM3
PSRAW MM1,DATA ; mem to xreg
PSRAD MM1,DATA
PSRAW MM1,5 ; xreg by count
PSRAD MM1,4
• PSRL – shift right : word, doubleword and quadword
Ex :
PSRLW MM1,MM3 ; xreg to xreg
PSRLD MM1,MM3
PSRLQ MM1,MM3
PSRLW MM1,DATA ; mem to xreg
PSRLD MM1,DATA
PSRLQ MM1,DATA
PSRLW MM1,5 ; xreg by count
PSRLD MM1,4
PSRLQ MM1,7
MMX Technology
• PSUB – subtraction with truncation : byte, word & doubleword
Ex :
PSUBB MM1,MM3  xreg to xreg
PSUBW MM1,MM3
PSUBD MM1,MM3
PSUBB MM1, DATA  mem to xreg

PSUBW MM1, DATA
PSUBD MM1,DATA
• PSUBS – subtraction with signed saturation : byte & word
Ex :
PSUBSB MM1,MM3 ; register to register
PSUBSW MM1,MM3
PSUBSB MM1,DATA ; memory to register
PSUBSW MM1,DATA
• PSUBUS – subtraction with unsigned saturation : byte & word
Ex :
PSUBUSB MM1,MM3 ; register to register
PSUBUSW MM1,MM3
PSUBUSB MM1,DATA ; memory to register
PSUBUSW MM1,DATA
• PXOR – exclusive OR
Ex :
PXOR MM1,MM3 ; register to register
PXOR MM4,DATA ; memory to register
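The three subtraction variants give different answers only when a result leaves the range of the element. A small Python model of the byte forms, assuming 64-bit integers stand for MMX registers (the function names follow the mnemonics; the helper per_byte is hypothetical):

```python
def per_byte(dst: int, src: int, op) -> int:
    """Apply `op` to each of the eight byte pairs and reassemble the result."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        a = (dst >> shift) & 0xFF
        b = (src >> shift) & 0xFF
        result |= (op(a, b) & 0xFF) << shift
    return result


def signed8(x: int) -> int:
    """Interpret an 8-bit pattern as a two's-complement number."""
    return x - 0x100 if x & 0x80 else x


def psubb(dst: int, src: int) -> int:
    """Wraparound: the borrow out of each byte is simply discarded."""
    return per_byte(dst, src, lambda a, b: a - b)


def psubsb(dst: int, src: int) -> int:
    """Signed saturation: results are clamped to the range -128..+127."""
    return per_byte(dst, src,
                    lambda a, b: max(-128, min(127, signed8(a) - signed8(b))))


def psubusb(dst: int, src: int) -> int:
    """Unsigned saturation: results below 0 are clamped to 0."""
    return per_byte(dst, src, lambda a, b: max(0, a - b))


# 10H - 20H in the low byte under each rule.
print(hex(psubb(0x10, 0x20)))     # 0xf0
print(hex(psubsb(0x10, 0x20)))    # 0xf0  (-16 fits in a signed byte)
print(hex(psubusb(0x10, 0x20)))   # 0x0
```

Saturation is what makes these instructions convenient for pixel and audio data, where wrapping from dark to bright or from quiet to loud would be a visible or audible error.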
• PUNPCKH – unpack high : byte to word, word to doubleword & doubleword to quadword
Ex :
PUNPCKHBW MM1,MM3 ; register to register
PUNPCKHWD MM1,MM3
PUNPCKHDQ MM1,MM3
PUNPCKHBW MM1,DATA ; memory to register
PUNPCKHWD MM1,DATA
PUNPCKHDQ MM1,DATA
• PUNPCKL – unpack low : byte to word, word to doubleword & doubleword to quadword
Ex :
PUNPCKLBW MM1,MM3 ; register to register
PUNPCKLWD MM1,MM3
PUNPCKLDQ MM1,MM3
PUNPCKLBW MM1,DATA ; memory to register
PUNPCKLWD MM1,DATA
PUNPCKLDQ MM1,DATA
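Unpacking interleaves elements from the destination and the source: the low forms use the low half of each operand and the high forms use the high half. The Python sketch below models this, assuming 64-bit integers stand for MMX registers. The function punpck and its width and high parameters are hypothetical; width 8, 16 and 32 correspond to the Intel mnemonics ending in BW, WD and DQ.

```python
def lanes(value: int, width: int) -> list:
    """Split a 64-bit value into elements of `width` bits, lowest element first."""
    mask = (1 << width) - 1
    return [(value >> (width * i)) & mask for i in range(64 // width)]


def pack(elements: list, width: int) -> int:
    """Reassemble elements (lowest first) into one 64-bit value."""
    result = 0
    for i, element in enumerate(elements):
        result |= element << (width * i)
    return result


def punpck(dst: int, src: int, width: int, high: bool) -> int:
    """Interleave half of dst with half of src: d, s, d, s, ... from the lowest element."""
    d, s = lanes(dst, width), lanes(src, width)
    half = len(d) // 2
    start = half if high else 0
    out = []
    for i in range(start, start + half):
        out += [d[i], s[i]]          # the destination element goes in the lower position
    return pack(out, width)


dst = 0x0807060504030201
src = 0x1817161514131211
print(hex(punpck(dst, src, 8, high=False)))   # 0x1404130312021101
print(hex(punpck(dst, src, 8, high=True)))    # 0x1808170716061505
```

Unpacking against a register of zeros is a common way to widen unsigned bytes to words before arithmetic.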