
Q1) What are floating-point numbers? Give examples.

A floating-point number is a computer representation of a real number (that is, a number that can contain a fractional part). The following are floating-point numbers:

3.0
-111.5
3E-5

The last example is computer shorthand for scientific notation. It means 3 x 10**-5 (10 to the negative 5th power, multiplied by 3).

Floating-point representation, the most common solution, basically represents real numbers in scientific notation. Scientific notation represents a number as a base number and an exponent. For example, 123.456 could be represented as 1.23456 x 10**2. In hexadecimal, the number 123.abc might be represented as 1.23abc x 16**2.

In essence, computers are integer machines and are capable of representing real numbers only by using complex codes. The most popular code for representing real numbers is called the IEEE Floating Point Standard.

The term floating point is derived from the fact that there is no fixed number of digits before and after the decimal point; that is, the decimal point can float. There are also representations in which the number of digits before and after the decimal point is set, called fixed-point representations. In general, floating-point representations are slower and less accurate than fixed-point representations, but they can handle a larger range of numbers, for example from 1,000,000,000,000 down to 0.0000000000000001.

Note that most floating-point numbers a computer can represent are just approximations. One of the challenges in programming with floating-point values is ensuring that the approximations lead to reasonable results. If the programmer is not careful, small discrepancies in the approximations can snowball to the point where the final results become meaningless.

Because mathematics with floating-point numbers requires a great deal of computing power, many microprocessors come with a chip, called a floating-point unit (FPU), specialized for performing floating-point arithmetic. FPUs are also called math coprocessors and numeric coprocessors.
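The point about approximations can be seen directly. The following is a minimal sketch in Python (chosen here only for illustration; Python floats are double precision), showing that an exact comparison fails and that small errors accumulate:

```python
import math

# 0.1, 0.2 and 0.3 have no exact binary representation,
# so the computed sum is only close to 0.3, not equal to it.
a = 0.1 + 0.2
print(a == 0.3)              # False: exact comparison fails
print(math.isclose(a, 0.3))  # True: comparison with a tolerance succeeds

# Small discrepancies accumulate: adding 0.1 ten times
# does not give exactly 1.0.
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)          # False
print(abs(total - 1.0))      # the accumulated discrepancy (tiny, but not zero)
```

This is why floating-point results are usually compared with a tolerance rather than with ==.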

Q2) Briefly explain the following terms: Significand (Mantissa), Exponent, Normalization.

Significand: the string of digits.

Exponent: the power of the base by which the significand is multiplied.

Normalization: the requirement that the leftmost digit of the significand be nonzero.

Q3) How do you represent the IEEE single-precision and double-precision floating-point standard?

IEEE floating-point numbers have three basic components:

The sign
The exponent
The mantissa / significand

The mantissa is composed of the fraction and an implicit leading digit (explained below). The exponent base (2) is implicit and need not be stored. The following table shows the layout for single (32-bit) and double (64-bit) precision floating-point values. The number of bits for each field is shown (bit ranges are in square brackets):

                    Sign      Exponent      Fraction      Bias
Single Precision    1 [31]    8 [30-23]     23 [22-00]    127
Double Precision    1 [63]    11 [62-52]    52 [51-00]    1023

The Sign Bit

The sign bit is as simple as it gets. 0 denotes a positive number; 1 denotes a negative number. Flipping the value of this bit flips the sign of the number.

The Exponent

The exponent field needs to represent both positive and negative exponents. To do this, a bias is added to the actual exponent in order to get the stored exponent. For IEEE single-precision floats, this value is 127. Thus, an exponent of zero means that 127 is stored in the exponent field. A stored value of 200 indicates an exponent of (200-127), or 73. For reasons discussed later, exponents of -127 (all 0s) and +128 (all 1s) are reserved for special numbers. For double precision, the exponent field is 11 bits and has a bias of 1023.

The Mantissa

The mantissa, also known as the significand, represents the precision bits of the number. It is composed of an implicit leading bit and the fraction bits. To find out the value of the implicit leading bit, consider that any number can be expressed in scientific notation in many different ways. For example, the number five can be represented as any of these:

5.00 x 10**0
0.05 x 10**2
5000 x 10**-3

In order to maximize the quantity of representable numbers, floating-point numbers are typically stored in normalized form. This basically puts the radix point (the decimal point, in base ten) after the first non-zero digit. In normalized form, five is represented as 5.0 x 10**0.

A nice little optimization is available to us in base two, since the only possible non-zero digit is 1. Thus, we can just assume a leading digit of 1, and don't need to represent it explicitly. As a result, the single-precision mantissa has effectively 24 bits of resolution, by way of 23 fraction bits.

Putting it All Together

So, to sum up:

1. The sign bit is 0 for positive, 1 for negative.
2. The exponent's base is two.
3. The exponent field contains 127 plus the true exponent for single precision, or 1023 plus the true exponent for double precision.
4. The mantissa is typically assumed to be 1.f, where f is the field of fraction bits.
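The three fields summarized above can be inspected on a real value. The following is a small sketch using Python's standard struct module; the helper name float_to_fields is made up for this illustration:

```python
import struct

def float_to_fields(x):
    """Split x, stored as an IEEE single-precision value, into
    (sign, stored exponent, fraction)."""
    # Pack x as a big-endian 32-bit float, then reinterpret the same
    # four bytes as an unsigned 32-bit integer to reach the raw bits.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # bit 31
    exponent = (bits >> 23) & 0xFF   # bits 30-23, still biased by 127
    fraction = bits & 0x7FFFFF       # bits 22-0, hidden leading 1 not stored
    return sign, exponent, fraction

for value in (1.0, 100.0, -0.25):
    s, e, f = float_to_fields(value)
    print(f"{value:>7}: {s} {e:08b} {f:023b}  (true exponent {e - 127})")
```

For 1.0 the stored exponent is 127 (true exponent 0) and the fraction is all zeros, which matches points 3 and 4 of the summary.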

Q4) Define the formula to find the number value in IEEE standard single-precision and double-precision floating-point numbers.

Common Formula

N = (-1)**S x 2**(E-B) x 1.F

Where,

S = sign bit (0 for positive and 1 for negative)
E = exponent biased by B
F = significand with implied 1

Single Precision

N = (-1)**S x 2**(E-127) x 1.F

Double Precision

N = (-1)**S x 2**(E-1023) x 1.F
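As an illustration, the common formula can be evaluated directly. This is a sketch only: the function name and its parameters are chosen for this example, and it covers normalized numbers only (the special cases are not handled here).

```python
def normalized_value(sign, stored_exponent, fraction, fraction_bits=23, bias=127):
    """Evaluate N = (-1)**S x 2**(E-B) x 1.F for a normalized number.

    fraction is the F field as an integer; fraction_bits and bias select
    the format (23 and 127 for single, 52 and 1023 for double).
    """
    significand = 1 + fraction / 2**fraction_bits   # 1.F with the implied 1
    return (-1) ** sign * 2.0 ** (stored_exponent - bias) * significand

# Single precision: S=0, E=133, F=1001000...0  ->  1.5625 x 2**6
print(normalized_value(0, 133, 0b10010000000000000000000))        # 100.0

# Double precision: S=1, E=1021, F=0  ->  -1.0 x 2**-2
print(normalized_value(1, 1021, 0, fraction_bits=52, bias=1023))  # -0.25
```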

Q5).

Single Precision

S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
0 1      8 9                    31

(Bits are numbered from the left, starting at 0.)

The value V represented by the word may be determined as follows:

If E=255 and F is nonzero, then V=NaN ("Not a number")

0 11111111 00000100000000000000000
1 11111111 00100010001001010101010

If E=255 and F is zero and S is 1, then V=-Infinity

1 11111111 00000000000000000000000

If E=255 and F is zero and S is 0, then V=Infinity

0 11111111 00000000000000000000000

If E=0 and F is zero and S is 0, then V=0

0 00000000 00000000000000000000000

If E=0 and F is zero and S is 1, then V=-0

1 00000000 00000000000000000000000

If 0<E<255 then V=(-1)**S * 2 ** (E-127) * (1.F), where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.

If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-126) * (0.F). These are "unnormalized" values.
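The single-precision rules above translate almost line by line into code. The following is a sketch, not a reference implementation; decode_single is a name invented for this example, and it takes the word as a string of 0s and 1s:

```python
import math

def decode_single(bits):
    """Decode a 32-bit pattern given as a string of 0s and 1s
    (spaces allowed) by applying the rules listed above."""
    bits = bits.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    s = int(bits[0], 2)      # sign bit
    e = int(bits[1:9], 2)    # 8-bit stored exponent
    f = int(bits[9:], 2)     # 23-bit fraction
    sign = -1.0 if s else 1.0
    if e == 255:
        # All-ones exponent: NaN if F is nonzero, otherwise +/- Infinity.
        return math.nan if f else sign * math.inf
    if e == 0:
        # Zero (F = 0, keeping its sign) or an unnormalized value 0.F x 2**-126.
        return sign * (f / 2**23) * 2.0**-126
    # Normalized value: 1.F x 2**(E-127).
    return sign * (1 + f / 2**23) * 2.0 ** (e - 127)

print(decode_single("0 11111111 " + "0" * 23))     # inf
print(decode_single("1 00000000 " + "0" * 23))     # -0.0
print(decode_single("0 10000001 101" + "0" * 20))  # 6.5
```

The bit strings are built with string repetition only to avoid miscounting zeros.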

Double Precision

S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
0 1        11 12                                                63

The value V represented by the word may be determined as follows:

If E=2047 and F is nonzero, then V=NaN ("Not a number")

If E=2047 and F is zero and S is 1, then V=-Infinity

If E=2047 and F is zero and S is 0, then V=Infinity

If E=0 and F is zero and S is 0, then V=0

If E=0 and F is zero and S is 1, then V=-0

If 0<E<2047 then V=(-1)**S * 2 ** (E-1023) * (1.F), where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.

If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-1022) * (0.F). These are "unnormalized" values.

Q6) According to the IEEE specification, how do you represent the exponent range?

The exponent range is -2**(n-1) + 2 to 2**(n-1) - 1, where n is the number of bits in the exponent field.

Single Precision (n = 8): -126 to 127

Double Precision (n = 11): -1022 to 1023
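The range can be derived from the bias and the two reserved exponent codes. A short sketch follows; the function name exponent_range is made up for this example:

```python
def exponent_range(n):
    """Return (min, max) true exponent of normalized numbers
    for an n-bit exponent field."""
    bias = 2 ** (n - 1) - 1       # 127 for n = 8, 1023 for n = 11
    smallest_stored = 1           # stored value 0 (all 0s) is reserved
    largest_stored = 2 ** n - 2   # stored value 2**n - 1 (all 1s) is reserved
    return smallest_stored - bias, largest_stored - bias

print(exponent_range(8))    # (-126, 127)
print(exponent_range(11))   # (-1022, 1023)
```

Subtracting the bias from the smallest and largest usable stored exponents gives the same limits as -2**(n-1) + 2 and 2**(n-1) - 1.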
Q7) Add the following decimal values given in scientific notation.

  3.25 x 10 ** 3
+ 2.63 x 10 ** -1
-------------------

First step: align the decimal points.
Second step: add.

  3.25     x 10 ** 3
+ 0.000263 x 10 ** 3
-------------------
  3.250263 x 10 ** 3

(This presumes the use of infinite precision, without regard for accuracy.)

Q8) Add these floating-point values. Convert them into binary before the addition: 0.25 + 100

0.25 = 0 01111101 00000000000000000000000
100  = 0 10000101 10010000000000000000000

To add these floating-point representations:

Step 1: align the radix points.

Shifting the mantissa LEFT by 1 bit DECREASES THE EXPONENT by 1.
Shifting the mantissa RIGHT by 1 bit INCREASES THE EXPONENT by 1.

We want to shift the mantissa right, because the bits that fall off the end should come from the least significant end of the mantissa. So we choose to shift the 0.25, since we want to increase its exponent.

0 01111101 00000000000000000000000 (original value)
0 01111110 10000000000000000000000 (shifted 1 place; note that the hidden bit is shifted into the msb of the mantissa)
0 01111111 01000000000000000000000 (shifted 2 places)
0 10000000 00100000000000000000000 (shifted 3 places)
0 10000001 00010000000000000000000 (shifted 4 places)
0 10000010 00001000000000000000000 (shifted 5 places)
0 10000011 00000100000000000000000 (shifted 6 places)
0 10000100 00000010000000000000000 (shifted 7 places)
0 10000101 00000001000000000000000 (shifted 8 places)

Step 2: add (don't forget the hidden bit for the 100).

  0 10000101 1.10010000000000000000000 (100)
+ 0 10000101 0.00000001000000000000000 (0.25)
------------------------------------------------------------
  0 10000101 1.10010001000000000000000

Step 3: normalize the result (get the "hidden bit" to be a 1).

The hidden bit is already 1, so no shift is needed. The result is

0 10000101 10010001000000000000000
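The three steps can be sketched on integer significands. This is a simplified illustration under stated assumptions: both operands are positive and normalized, bits shifted off the end are simply dropped (real IEEE hardware rounds), and add_aligned is a name invented for this example.

```python
def add_aligned(exp_a, frac_a, exp_b, frac_b):
    """Add two positive, normalized single-precision values, each given as
    (stored exponent, 23-bit fraction). Returns (stored exponent, fraction)."""
    # Restore the hidden bit to get 24-bit significands (1.F as an integer).
    sig_a = (1 << 23) | frac_a
    sig_b = (1 << 23) | frac_b

    # Step 1: align. Make operand a the one with the larger exponent, then
    # shift the other significand right; each shift raises its exponent by 1.
    if exp_a < exp_b:
        exp_a, sig_a, exp_b, sig_b = exp_b, sig_b, exp_a, sig_a
    sig_b >>= exp_a - exp_b   # low bits fall off the end (truncation)

    # Step 2: add the aligned significands.
    total = sig_a + sig_b
    exp = exp_a

    # Step 3: normalize. A carry past the hidden-bit position means the
    # sum is 1x.xxx, so shift right once and bump the exponent.
    if total >> 24:
        total >>= 1
        exp += 1

    return exp, total & 0x7FFFFF   # drop the hidden bit again

exp, frac = add_aligned(0b01111101, 0, 0b10000101, 0b10010000000000000000000)
print(f"0 {exp:08b} {frac:023b}")   # 0 10000101 10010001000000000000000
```

With the operands from Q8 (0.25 and 100) this reproduces the hand-computed result, which is the bit pattern of 100.25.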

Q9) Find the number values of the following IEEE standard single-precision floating-point numbers. (1.101 is a binary fraction, equal to 1.625 in decimal.)

a) 0 10000000 00000000000000000000000 = +1 x 2**(128-127) x 1.0 = 2

b) 0 10000001 10100000000000000000000 = +1 x 2**(129-127) x 1.101 = 6.5

c) 1 10000001 10100000000000000000000 = -1 x 2**(129-127) x 1.101 = -6.5
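These answers can be cross-checked by letting the machine interpret the same bit patterns. A brief sketch using Python's standard struct module follows; bits_to_float is a helper name chosen for this example:

```python
import struct

def bits_to_float(bits):
    """Interpret a 32-bit pattern (string of 0s and 1s, spaces allowed)
    as an IEEE single-precision number."""
    raw = int(bits.replace(" ", ""), 2).to_bytes(4, "big")
    return struct.unpack(">f", raw)[0]

print(bits_to_float("0 10000000 " + "0" * 23))      # 2.0
print(bits_to_float("0 10000001 101" + "0" * 20))   # 6.5
print(bits_to_float("1 10000001 101" + "0" * 20))   # -6.5
```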
