
Stuff

• Assn 5 is available and due Monday, 7pm.

• Pre-exam tutorial time: April 18, room TBA. (Exam is April 27.)
Office hours Wed. 26th, time TBA.

Winter 2006 CISC121 - Prof. McLeod 1


Last Time
• Complexity of Mergesort.

• Started Quicksort.



Today
• Finish up Quicksort.

• How numbers are stored in Java.


• Sources and Effects of Roundoff Error.



Quicksort – Cont.
• Code for “preparation” method:

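public static void quickSort(int[] A) {

    // Nothing to sort for arrays of fewer than two elements.
    if (A.length < 2) return;

    // Find the largest value and move it to the end, where it
    // stops the "lower" index from running off the subarray.
    int max = 0;
    for (int i = 1; i < A.length; i++)
        if (A[i] > A[max]) max = i;
    swap(A, A.length-1, max);
    quickSort(A, 0, A.length-2);

} // end quickSort(array)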



public static void quickSort (int[] A, int first, int last) {

    int lower = first + 1;
    int upper = last;

    // Move the middle value to the front and use it as the pivot.
    swap(A, first, (first+last)/2);
    int pivot = A[first];

    // Move lower up and upper down, swapping out-of-place pairs.
    while (lower <= upper) {
        while (A[lower] < pivot) lower++;
        while (A[upper] > pivot) upper--;
        if (lower < upper) swap(A, lower++, upper--);
        else lower++;
    }

    // Put the pivot into its final position, then sort each side.
    swap(A, upper, first);
    if (first < upper - 1) quickSort(A, first, upper-1);
    if (upper + 1 < last) quickSort(A, upper+1, last);
} // end quickSort(subarrays)
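The two quickSort methods call a swap helper that the slides do not show. The sketch below puts both methods into one runnable class together with an assumed swap implementation. The class name QuickSortDemo, the body of swap and the short-array guard are illustrative assumptions, not part of the original listing.

```java
import java.util.Arrays;

// Sketch only: QuickSortDemo and swap(...) are assumed names and code,
// added so the two quickSort methods from the slides can be run.
class QuickSortDemo {

    // Assumed helper: exchange A[i] and A[j].
    static void swap(int[] A, int i, int j) {
        int temp = A[i];
        A[i] = A[j];
        A[j] = temp;
    }

    // "Preparation" method: park the largest value at the end,
    // then sort everything in front of it.
    static void quickSort(int[] A) {
        if (A.length < 2) return;   // assumed guard for short arrays
        int max = 0;
        for (int i = 1; i < A.length; i++)
            if (A[i] > A[max]) max = i;
        swap(A, A.length - 1, max);
        quickSort(A, 0, A.length - 2);
    }

    // Recursive method: sorts A[first..last]. It relies on A[last + 1]
    // being at least as large as every value in the subarray.
    static void quickSort(int[] A, int first, int last) {
        int lower = first + 1;
        int upper = last;
        swap(A, first, (first + last) / 2);
        int pivot = A[first];
        while (lower <= upper) {
            while (A[lower] < pivot) lower++;
            while (A[upper] > pivot) upper--;
            if (lower < upper) swap(A, lower++, upper--);
            else lower++;
        }
        swap(A, upper, first);
        if (first < upper - 1) quickSort(A, first, upper - 1);
        if (upper + 1 < last) quickSort(A, upper + 1, last);
    }

    public static void main(String[] args) {
        int[] data = {8, 5, 4, 7, 6, 1, 6, 3, 8, 12, 10};
        quickSort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

The array in main is the one used in the worked example, so the intermediate states can be checked by adding print statements inside the while loop.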
• Example - sorting array:
    8 5 4 7 6 1 6 3 8 12 10
• After “preparation” (max value, 12, swapped to the end):
    8 5 4 7 6 1 6 3 8 10 12
• After selection of pivot value and initial assignment of lower
and upper (pivot is the 6 in the first position, lower starts at
the 5 beside it, upper starts at the 10):
    6 5 4 7 8 1 6 3 8 10 12
• Upper and lower move in until a swap has to be made (lower stops
at the 7, upper stops at the 3):
    6 5 4 7 8 1 6 3 8 10 12
• Make the swap and move to next swap to be made (lower stops at
the 8, upper stops at the second 6):
    6 5 4 3 8 1 6 7 8 10 12
• Make the swap, and then upper and lower pass each
other (upper ends on the 1, lower ends to its right):
    6 5 4 3 6 1 8 7 8 10 12

• Put pivot value back in (swap it with the value at upper), and
divide into two subarrays, 1 5 4 3 6 and 8 7 8 10:

    1 5 4 3 6 6 8 7 8 10 12

• (Note how old pivot and max value are already in correct
positions and do not need to be included in next recursive
call).


• Two recursive calls with subarrays (pivots 4 and 7 are moved to
the front of their subarrays):

    4 5 1 3 6 6 7 8 8 10 12

• After upper and lower have moved through arrays and
pivots have been replaced:

    1 3 4 5 6 6 7 8 8 10 12

• Last set of recursive calls with subarrays:

    1 3 4 5 6 6 7 8 8 10 12

• Array is sorted:

    1 3 4 5 6 6 7 8 8 10 12
Quicksort – Cont.
• Note that new arrays are not created, only the
bounds of the subarrays in the array A are
changed. The recursive calls will only contain
different bounding values. And the subarrays get
smaller for each call.
• The anchor case is when the size of the subarray
is one.



Quicksort - Cont.
• The worst case is when a near-median value is
not chosen – the pivot value is always a maximum
or a minimum value. Now the algorithm is O(n^2).
• However, if the pivot values are always near the
median value of the arrays, the algorithm is
O(n log(n)) – which is the best case. (See the
derivation of this complexity for merge sort).
• The average case also turns out to be O(n log(n)).



Choice of Pivot Value
• Several techniques can be used:
– Choose first value
– Choose middle value
– Calculate “pseudo-median”. This is how quicksort is
implemented in Arrays.sort().
– Why not calculate the actual median?

• The technique you use will depend on how random
the data set is.



Quicksort - Cont.
• So, the choice of the pivot value can be critical.
Knowledge of the nature of the data to be sorted
might suggest another algorithm to use in
choosing the pivot value.
• Experiments have shown that Quicksort is usually faster
than the other efficient sorts on random data.
• It has also been suggested that Quicksort can be
made even faster by choosing to use a simple
sort like insertion sort for subarray sizes less than
about 30.
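A minimal sketch of that suggestion follows, assuming a cutoff of 30 and a simple Lomuto-style partition around the middle value. This is not the two-index scheme from the earlier slides. The class name HybridQuickSort, the CUTOFF constant and the helper names are illustrative assumptions. The best cutoff should be found by timing on real data.

```java
import java.util.Arrays;

// Sketch only: all names here are assumptions, not from the slides.
class HybridQuickSort {

    // Assumed cutoff, taken from the "about 30" suggestion.
    static final int CUTOFF = 30;

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    // Quicksort on a[first..last], handing small subarrays to insertion sort.
    static void sort(int[] a, int first, int last) {
        if (last - first + 1 < CUTOFF) {
            insertionSort(a, first, last);
            return;
        }
        int p = partition(a, first, last);
        sort(a, first, p - 1);
        sort(a, p + 1, last);
    }

    // Insertion sort restricted to a[first..last].
    static void insertionSort(int[] a, int first, int last) {
        for (int i = first + 1; i <= last; i++) {
            int item = a[i];
            int j = i - 1;
            while (j >= first && a[j] > item) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = item;
        }
    }

    // Lomuto-style partition using the middle value as the pivot.
    // Returns the pivot's final index.
    static int partition(int[] a, int first, int last) {
        swap(a, (first + last) / 2, last);
        int pivot = a[last];
        int store = first;
        for (int i = first; i < last; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, last);
        return store;
    }

    static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

    public static void main(String[] args) {
        int[] data = {8, 5, 4, 7, 6, 1, 6, 3, 8, 12, 10};
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

This partition scheme degrades when the data contains many equal values, so it is only meant to show where the cutoff test goes.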



Comparison of Quicksort and Mergesort
• Mergesort is slower than Quicksort primarily
because it does not do any preliminary shuffling of
the data when it divides it.
• It also requires more memory because of the
creation of temporary arrays (or lists) to complete
the merge operation.



Comparison of Quicksort and Mergesort - Cont.
• However, mergesort is very useful when sorting
data that does not all have to be in memory at once.
For example, if the data must be
saved on disk because it will not all fit in memory,
then mergesort will make the minimum number of
file reads and writes.
• If you have to sort lists, it is very easy to code
mergesort for them.



Comparison of Sorts
• Sorting 50,000 random ints (average case) on a
PIII using Ready To Program.

Simple sorts:

Sort        Time (msec)   # Comparisons             # Swaps
Insertion   11000         ~623,000,000    O(n^2)    49,999         O(n)
Selection   18000         ~1,250,000,000  O(n^2)    50,000         O(n)
Bubble      40000         ~623,000,000    O(n^2)    ~623,000,000   O(n^2)

• Selection sort is not influenced by the state of the
data, where bubble and insertion are.
Comparison of Sorts – Cont.
Efficient sorts:

Sort             Time (msec)   # Comparisons             # Swaps
Shell            50            ~1,300,000  O(n^1.25)     ~1,784,000  O(n^1.25)
Quicksort        40            360,000     O(n log(n))   218,000     O(n log(n))
Merge (array)    90            718,100     O(n log(n))   784,464     O(n log(n))
Merge (lists)    1810          718,100     O(n log(n))   784,464     O(n log(n))
Radix (queues)   960           N/A                       1,000,000   O(n)

Java library sorts:

Sort                         Time (msec)   # Comparisons   # Swaps
Arrays.sort                  40            ?               ?
Collections.sort (Vectors)   250           ?               ?
Number Representation
• Binary numbers or “base 2” is a natural
representation of numbers to a computer.
• As a transition, hexadecimal (or “hex”) numbers
are also used.
• Octal (base 8) numbers are used to a lesser
degree.
• Decimal (base 10) numbers are *not* naturally
represented in computers.



Number Representation - Cont.
• If the number is in base “r” (the “radix”!) then the
number is represented as:

$$A_{n-1} r^{n-1} + A_{n-2} r^{n-2} + \dots + A_{1} r^{1} + A_{0} r^{0} + A_{-1} r^{-1} + A_{-2} r^{-2} + \dots + A_{-m+1} r^{-m+1} + A_{-m} r^{-m}$$



Number Representation - Cont.
• For example, in base 10:

when r=10, a decimal number: 365.24219878 is:

$$3 \times 10^{2} + 6 \times 10^{1} + 5 \times 10^{0} + 2 \times 10^{-1} + 4 \times 10^{-2} + 2 \times 10^{-3} + 1 \times 10^{-4} + 9 \times 10^{-5} + 8 \times 10^{-6} + 7 \times 10^{-7} + 8 \times 10^{-8}$$

Adding up the terms:

    300
     60
      5
      0.2
      0.04
      0.002
      0.0001
      0.00009
      0.000008
      0.0000007
  +   0.00000008
  _______________
    365.24219878 (in base 10)
Number Representation - Cont.
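• In base 2 (digits either 0 or 1):

r=2, a binary number:

$$(110101.11)_2 = 1 \times 2^{5} + 1 \times 2^{4} + 0 \times 2^{3} + 1 \times 2^{2} + 0 \times 2^{1} + 1 \times 2^{0} + 1 \times 2^{-1} + 1 \times 2^{-2} = 53.75 \text{ (in base 10)}$$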



Number Representation - Cont.
• Octal Numbers: a base-8 system with 8 digits: 0,
1, 2, 3, 4, 5, 6 and 7:

• For example:

$$(127.4)_8 = 1 \times 8^{2} + 2 \times 8^{1} + 7 \times 8^{0} + 4 \times 8^{-1} = 87.5$$



Number Representation - Cont.
• Hexadecimal Numbers: a base-16 system with
16 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E,
and F:

• For example:

$$(\mathrm{B65F})_{16} = 11 \times 16^{3} + 6 \times 16^{2} + 5 \times 16^{1} + 15 \times 16^{0} = 46687$$
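The binary, octal and hex examples all evaluate the same positional series. The sketch below evaluates a digit string in any base from 2 to 16, assuming the string holds only valid digits and at most one point. The class name RadixToDecimal and the method toDecimal are illustrative names, not from the slides.

```java
// Sketch only: RadixToDecimal and toDecimal are assumed names.
class RadixToDecimal {

    // Evaluate a digit string such as "127.4" written in base r (2 to 16).
    // Assumes every character is a valid digit for that base, or '.'.
    static double toDecimal(String digits, int r) {
        String[] parts = digits.split("\\.");
        double value = 0.0;

        // Integral part: each new digit shifts the total up one place.
        for (char c : parts[0].toCharArray())
            value = value * r + Character.digit(c, r);

        // Fractional part: weights are r^-1, r^-2, ...
        if (parts.length > 1) {
            double weight = 1.0 / r;
            for (char c : parts[1].toCharArray()) {
                value += Character.digit(c, r) * weight;
                weight /= r;
            }
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("110101.11", 2));  // prints 53.75
        System.out.println(toDecimal("127.4", 8));      // prints 87.5
        System.out.println(toDecimal("B65F", 16));      // prints 46687.0
    }
}
```

For whole numbers, the library call Integer.parseInt("B65F", 16) does the same job.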



Number Representation - Cont.
• The above series show how you can convert from
binary, octal or hex to decimal.
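• How do you convert from decimal to one of the other
bases?

• Integral part: divide by r repeatedly and keep the remainders
(the first remainder is the least significant digit).
• Fractional part: multiply by r repeatedly and keep the carries
(the first carry is the most significant fractional digit).
• “r” is the base - either 2, 8 or 16.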
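The two rules above can be sketched in Java as follows. The class name DecimalToRadix and the methods integerPart and fractionPart are illustrative names. The sketch assumes a non-negative integer, a fraction between 0 and 1, and a base of at most 16. The maxDigits parameter is an assumption added to stop fractions that never terminate.

```java
// Sketch only: all names here are assumptions, not from the slides.
class DecimalToRadix {

    static final String DIGITS = "0123456789ABCDEF";

    // Integral part: divide by r and keep the remainders.
    // Remainders come out least significant digit first, so reverse at the end.
    static String integerPart(int n, int r) {
        if (n == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (n > 0) {
            sb.append(DIGITS.charAt(n % r));
            n /= r;
        }
        return sb.reverse().toString();
    }

    // Fractional part: multiply by r and keep the carries.
    // Stops after maxDigits digits, or sooner if the fraction reaches zero.
    static String fractionPart(double f, int r, int maxDigits) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < maxDigits && f > 0; i++) {
            f *= r;
            int carry = (int) f;
            sb.append(DIGITS.charAt(carry));
            f -= carry;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(integerPart(625, 2) + "." + fractionPart(0.76, 2, 9));
        System.out.println(integerPart(46687, 16));
    }
}
```

Because 0.76 is already stored in binary as a double, fractionPart reads back the digits of the stored value. They match the exact digits for far more than the 9 places requested here.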



Number Representation - Cont.
• For example, convert (625.76)_10 to binary. Start with the
integral part, 625:

Divisor (r)   Dividend         Remainder
2             625
2             312 (quotient)   1   (least significant digit)
2             156              0
2             78               0
2             39               0
2             19               1
2             9                1
2             4                1
2             2                0
2             1                0
              0                1   (most significant digit)

• So, (625)_10 is (1001110001)_2
Number Representation - Cont.
• For the “(0.76)_10” part:

Multiplier (r)   Multiplicand       Carry
2                0.76
2                1.52 (product)     1
2                1.04               1
2                0.08               0
2                0.16               0
2                0.32               0
2                0.64               0
2                1.28               1
2                0.56               0
2                1.12               1
...

• So, (0.76)_10 is (0.110000101…)_2

• (625.76)_10 is: (1001110001.110000101...)_2
Aside - Roundoff Error
• From the previous example, you can see that
exact base 10 decimals cannot always be exactly
represented in binary.
• For example (0.1)_10 is:

0.0001100110011001100110011… in base 2.

• Since numbers are stored in a finite amount of
computer memory, some of this number will be
lost - this is the source of “Roundoff Error”.
Number Representation - Cont.
• Converting between binary, octal and hex is much
easier - done by “grouping” the numbers:
• For example:

(010110001101011.111100000110)_2 = (?)_8

  010 110 001 101 011 . 111 100 000 110
(  2   6   1   5   3  .  7   4   0   6  )_8



Number Representation - Cont.
• Another example:

(2C6B.F06)_16 = (?)_2

(    2    C    6    B .    F    0    6 )_16
( 0010 1100 0110 1011 . 1111 0000 0110 )_2
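Grouping can be sketched as two small string routines: one expands each hex digit to 4 bits, the other collapses each group of 3 bits to an octal digit. The class and method names are illustrative assumptions. The octal routine assumes the bit string has already been padded so that each side of the point is a multiple of 3 bits long.

```java
// Sketch only: GroupingDemo, hexToBinary and binaryToOctal are assumed names.
class GroupingDemo {

    // Replace each hex digit with its 4-bit group; '.' is copied through.
    static String hexToBinary(String hex) {
        StringBuilder sb = new StringBuilder();
        for (char c : hex.toCharArray()) {
            if (c == '.') {
                sb.append('.');
                continue;
            }
            String bits = Integer.toBinaryString(Character.digit(c, 16));
            // Left-pad to exactly 4 bits.
            sb.append("0000".substring(bits.length())).append(bits);
        }
        return sb.toString();
    }

    // Replace each 3-bit group with its octal digit; '.' is copied through.
    // Assumes each side of the point has a multiple of 3 bits.
    static String binaryToOctal(String bin) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < bin.length()) {
            if (bin.charAt(i) == '.') {
                sb.append('.');
                i++;
                continue;
            }
            sb.append(Integer.parseInt(bin.substring(i, i + 3), 2));
            i += 3;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(binaryToOctal("010110001101011.111100000110")); // prints 26153.7406
        System.out.println(hexToBinary("2C6B.F06")); // prints 0010110001101011.111100000110
    }
}
```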



From Before: Integer Primitive Types in Java
• For byte, from -128 to 127, inclusive (1 byte).
• For short, from -32768 to 32767, inclusive (2
bytes).
• For int, from -2147483648 to 2147483647,
inclusive (4 bytes).
• For long, from -9223372036854775808 to
9223372036854775807, inclusive (8 bytes).

• A “byte” is 8 bits, where a “bit” is either 1 or 0.



Storage of Integers
• An “un-signed” 8 digit binary number can range
from 00000000 to 11111111
• 00000000 is 0 in base 10.
• 11111111 is 1×2^0 + 1×2^1 + 1×2^2 + … + 1×2^7 = 255,
base 10.



Storage of Integers - Cont.
• So, how can a negative binary number be stored?
• One way is to use the “two’s complement” system
of storage.
• Give the most significant bit a negative weight:
• So, the lowest “signed” binary 8 digit number is
now: 10000000, which is -1×2^7, or -128 base 10.



Storage of Integers - Cont.
• Two’s Complement System:
binary base 10
10000000 -128
10000001 -127
11111111 -1
00000000 0
00000001 1
01111111 127



Storage of Integers - Cont.
• For example, the binary number

10010101 is
1×2^0 + 1×2^2 + 1×2^4 - 1×2^7
= 1 + 4 + 16 - 128
= -107 base 10

• Now you can see how the primitive integer type,
byte, ranges from -128 to 127.
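The weighting rule can be checked with a short sketch that adds up the bit weights of an 8-character bit string, giving the leftmost bit a weight of -2^7. The class name TwosComplementDemo and the method signedValue are illustrative names. The sketch assumes the string is exactly 8 characters of '0' and '1'.

```java
// Sketch only: TwosComplementDemo and signedValue are assumed names.
class TwosComplementDemo {

    // Interpret an 8-character bit string as a two's complement byte.
    // The leftmost bit has weight -2^7; the others have weights 2^6 ... 2^0.
    static int signedValue(String bits) {
        int value = 0;
        for (int i = 0; i < 8; i++) {
            if (bits.charAt(i) == '1') {
                int weight = 1 << (7 - i);
                value += (i == 0) ? -weight : weight;
            }
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(signedValue("10010101"));                 // prints -107
        // Same answer from the language itself: read the bits as an
        // unsigned value (149), then keep only the low 8 bits as a byte.
        System.out.println((byte) Integer.parseInt("10010101", 2));  // prints -107
    }
}
```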



Storage of Integers - Cont.
• Suppose we wish to add 1 to the largest byte
value:
01111111
+00000001
• This would be equivalent to adding 1 to 127 in
base 10 - the result would normally be 128.
• In base 2, using two’s complement, the result of
the addition is 10000000, which is -128 in base
10!
• So integer numbers wrap around, in the case of
overflow - no warning is given in Java!
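A small sketch of the wrap-around follows, along with one way to detect it. Math.addExact is a standard library method (Java 8 and later) that throws an ArithmeticException on int overflow. The class name OverflowDemo and the helper names are illustrative assumptions.

```java
// Sketch only: OverflowDemo, increment and checkedAdd are assumed names.
class OverflowDemo {

    // Add 1 to a byte. The addition is done as an int, and the cast
    // keeps only the low 8 bits, which is where the wrap-around happens.
    static byte increment(byte b) {
        return (byte) (b + 1);
    }

    // Add two ints, but throw ArithmeticException instead of wrapping.
    static int checkedAdd(int x, int y) {
        return Math.addExact(x, y);
    }

    public static void main(String[] args) {
        System.out.println(increment(Byte.MAX_VALUE));      // prints -128

        int big = Integer.MAX_VALUE;
        System.out.println(big + 1 == Integer.MIN_VALUE);   // prints true

        try {
            checkedAdd(big, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```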



Storage of Integers - Cont.
• An int is stored in 4 bytes using “two’s
complement”.
• An int ranges from:

10000000 00000000 00000000 00000000


to
01111111 11111111 11111111 11111111

or -2147483648 to 2147483647 in base 10



Real Primitive Types
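• For float, (4 bytes) roughly ±1.2 × 10^-38 (the smallest
normalized value; Float.MIN_VALUE, a denormalized value, is
1.4 × 10^-45) to ±3.4 × 10^38, to 7 significant digits.
• For double, (8 bytes) roughly ±2.2 × 10^-308 (the smallest
normalized value; Double.MIN_VALUE, a denormalized value, is
4.9 × 10^-324) to ±1.7 × 10^308, to 15 significant digits.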



Storage of Real Numbers
• The system used to store real numbers in Java
complies with the IEEE standard number 754.
• Like an int, a float is stored in 4 bytes or 32
bits.
• These bits consist of 1 sign bit, 8 bits for the
exponent and 23 bits for the mantissa (24 bits of
precision, since one leading bit is implied rather than stored):

sign   exponent   mantissa

0      00000000   00000000000000000000000



Storage of Real Numbers - Cont.
• So a value is stored as:

value = mantissa × 2^exponent

• The exponent for a float can range from -128 to
128, so 2^exponent is about 10^-38 to 10^38.
• The float mantissa must lie between -1.0 and
1.0, and will have about 7 significant digits when
converted to base 10.
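The stored fields of a float can be inspected with the library method Float.floatToIntBits. The sketch below assumes the standard IEEE 754 single-precision layout: 1 sign bit, 8 exponent bits stored with a bias of 127, and 23 fraction bits. In that layout the mantissa is read as 1.fraction, which is a different scaling from the one described above but gives the same value. The class and method names are illustrative.

```java
// Sketch only: FloatBitsDemo and its method names are assumed.
class FloatBitsDemo {

    // Top bit: 0 for positive, 1 for negative.
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    // Next 8 bits, with the IEEE 754 bias of 127 removed.
    static int exponent(float f) {
        return ((Float.floatToIntBits(f) >>> 23) & 0xFF) - 127;
    }

    // Low 23 bits: the stored fraction of the mantissa.
    static int fraction(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float f = 0.1f;
        System.out.println("sign     = " + sign(f));
        System.out.println("exponent = " + exponent(f));
        System.out.println("fraction = 0x" + Integer.toHexString(fraction(f)));

        // Rebuild the value as (1 + fraction / 2^23) * 2^exponent.
        double rebuilt = (1.0 + fraction(f) / (double) (1 << 23))
                * Math.pow(2, exponent(f));
        System.out.println("rebuilt  = " + rebuilt);
    }
}
```

The rebuild step only applies to normalized values. Zero, denormalized values, infinities and NaN use reserved exponent patterns and are not handled here.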



Storage of Real Numbers - Cont.
• The double type is stored using 8 bytes or 64
bits - 1 sign bit, 11 bits for the exponent and 52 bits for
the mantissa (53 bits of precision, since one leading bit
is implied rather than stored).
• The exponent gives numbers between 2^-1024 and
2^1024, which is about 10^-308 and 10^308.
• The mantissa allows for the storage of about 16
significant digits in base 10.
• (Double.MAX_VALUE is:
1.7976931348623157E308)



Storage of Real Numbers - Cont.
• See the following web site for more info:

http://grouper.ieee.org/groups/754/



Storage of Real Numbers - Cont.
• So, a real number can only occupy a finite
amount of storage in memory.
• This effect is very important for two kinds of
numbers:
– Numbers like 0.1 that can be written exactly in base
10, but cannot be stored exactly in base 2.
– Real numbers (like π or e) that have an infinite number
of digits in their “real” representation can only be
stored in a finite number of digits in memory.
• And, we will see that it has an effect on the
accuracy of mathematical operations.
Storage of “Real” or “Floating-Point”
Numbers - Cont.
• Consider 0.1:

(0.1)10 = (0.0 0011 0011 0011 0011 0011…)2

• What happens to the part of a real number that
cannot be stored?
• It is lost - the number is either truncated or
rounded (Java rounds to the nearest representable value).
• The “lost part” is called the “roundoff error”.
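One way to see the lost part is to print the exact value that is actually stored. The BigDecimal(double) constructor in the standard library converts the stored binary value to decimal without further rounding. The class name StoredValueDemo and the method exact are illustrative names.

```java
import java.math.BigDecimal;

// Sketch only: StoredValueDemo and exact are assumed names.
class StoredValueDemo {

    // The exact decimal expansion of the binary value held in d.
    static BigDecimal exact(double d) {
        return new BigDecimal(d);
    }

    public static void main(String[] args) {
        // 0.1 cannot be stored exactly, so many extra digits appear.
        System.out.println(exact(0.1));
        // The float version is widened to double without change,
        // so this shows the value stored in the float.
        System.out.println(exact(0.1f));
        // 0.5 is a power of two and is stored exactly.
        System.out.println(exact(0.5));
    }
}
```

The difference between the printed value and 0.1 is the roundoff error for that one stored number.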



Storage of “Real” or “Floating-Point”
Numbers - Cont.
• Back to the storage of 0.1:
• Compute:

$$\sum_{i=1}^{10000} 0.1$$

• And, compare to 1000.

float sum = 0;
for (int i = 0; i < 10000; i++)
    sum += 0.1;
System.out.println(sum);
Storage of “Real” or “Floating-Point”
Numbers - Cont.
• Prints a value of 999.9029 to the screen.
• If sum is declared to be a double then the
value: 1000.0000000001588 is printed to the
screen.
• So, the individual roundoff errors have piled up to
contribute to a cumulative error in this
calculation.
• As expected, the roundoff error is smaller for a
double than for a float.



Roundoff Error

• This error is referred to in two different ways:

• The absolute error:

absolute error = |x - x_approx|

• The relative error:

relative error = (absolute error) / |x|



Roundoff Error - Cont.

• So for the calculation of 1000 as shown above,
the errors are:

Type     Absolute    Relative
float    0.0971      9.71E-5
double   1.588E-10   1.588E-13

• The relative error is the absolute error divided by
the exact value, 1000.
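The error figures can be recomputed with a sketch like the one below, which repeats the summation for both types and applies the two error definitions. The class and method names are illustrative assumptions, and the exact value 1000 is supplied by hand.

```java
// Sketch only: RoundoffErrorDemo and its method names are assumed.
class RoundoffErrorDemo {

    // Add 0.1 to a float sum n times.
    static float floatSum(int n) {
        float sum = 0;
        for (int i = 0; i < n; i++)
            sum += 0.1;
        return sum;
    }

    // Add 0.1 to a double sum n times.
    static double doubleSum(int n) {
        double sum = 0;
        for (int i = 0; i < n; i++)
            sum += 0.1;
        return sum;
    }

    static double absoluteError(double exact, double approx) {
        return Math.abs(exact - approx);
    }

    static double relativeError(double exact, double approx) {
        return absoluteError(exact, approx) / Math.abs(exact);
    }

    public static void main(String[] args) {
        double exact = 1000.0;
        float f = floatSum(10000);
        double d = doubleSum(10000);
        System.out.println("float:  " + f + "  abs " + absoluteError(exact, f)
                + "  rel " + relativeError(exact, f));
        System.out.println("double: " + d + "  abs " + absoluteError(exact, d)
                + "  rel " + relativeError(exact, d));
    }
}
```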



The Effects of Roundoff Error
• Roundoff error can have an effect on any
arithmetic operation carried out involving real
numbers.
• For example, consider subtracting two numbers
that are very close together:
• Use the function

$$f(x) = 1 - \cos(x)$$

for example. As x approaches zero, cos(x)
approaches 1.



The Effects of Roundoff Error
• Using double variables, and a value of x of
1.0E-12, f(x) evaluates to 0.0.
• But, it can be shown that the function f(x) can also
be represented by f’(x) (a second form of f, not its derivative):

$$f'(x) = f(x) = \frac{\sin^{2}(x)}{1 + \cos(x)}$$

• For x = 1.0E-12, f’(x) evaluates to 5.0E-25.
• The f’(x) function is less susceptible to roundoff
error.
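The two forms can be compared directly with a short sketch. The class name CancellationDemo and the method names naive and stable are illustrative, not from the slides.

```java
// Sketch only: CancellationDemo, naive and stable are assumed names.
class CancellationDemo {

    // Direct form: subtracts two nearly equal numbers when x is near zero.
    static double naive(double x) {
        return 1.0 - Math.cos(x);
    }

    // Equivalent form sin^2(x) / (1 + cos(x)): no subtraction of close values.
    static double stable(double x) {
        double s = Math.sin(x);
        return s * s / (1.0 + Math.cos(x));
    }

    public static void main(String[] args) {
        double x = 1.0E-12;
        System.out.println("1 - cos(x)            = " + naive(x));
        System.out.println("sin^2(x)/(1 + cos(x)) = " + stable(x));
    }
}
```

For values of x that are not close to zero the two methods agree to within normal rounding, so the second form can replace the first everywhere.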
The Effects of Roundoff Error - Cont.
• Another example. Consider the smallest root (smallest in
magnitude, taking b > 0) of the polynomial ax^2 + bx + c = 0:

$$x_1 = \frac{-b + \sqrt{b^{2} - 4ac}}{2a}$$

• What happens when ac is small, compared to b^2?
• It is known that for the two roots, x_1 and x_2:

$$x_1 x_2 = \frac{c}{a}$$
The Effects of Roundoff Error - Cont.
• Which leads to an equation for the root which is
not as susceptible to roundoff error:

$$x_1 = \frac{-2c}{b + \sqrt{b^{2} - 4ac}}$$

• This equation approaches –c/b instead of zero
when ac << b^2.
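A sketch comparing the two root formulas follows, assuming b > 0 and real roots. The class and method names are illustrative, and the coefficients a = 1, b = 1.0E8, c = 1 are chosen here only to make ac much smaller than b^2.

```java
// Sketch only: QuadraticRootDemo and its method names are assumed.
class QuadraticRootDemo {

    // Textbook formula: -b and the square root nearly cancel when ac << b^2.
    static double naiveSmallRoot(double a, double b, double c) {
        return (-b + Math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a);
    }

    // Rearranged formula: adds b and the square root instead of subtracting.
    static double stableSmallRoot(double a, double b, double c) {
        return -2.0 * c / (b + Math.sqrt(b * b - 4.0 * a * c));
    }

    public static void main(String[] args) {
        double a = 1.0, b = 1.0E8, c = 1.0;
        System.out.println("naive:  " + naiveSmallRoot(a, b, c));
        System.out.println("stable: " + stableSmallRoot(a, b, c));
        System.out.println("-c/b:   " + (-c / b));
    }
}
```

With these coefficients the true root is very close to -c/b = -1.0E-8, so the printed values show how far each formula lands from it.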



The Effects of Roundoff Error - Cont.
• The examples above show what can happen
when two numbers that are very close are
subtracted.
• Remember that this effect is a direct result of
these numbers being stored with finite accuracy in
memory.



The Effects of Roundoff Error - Cont.
• A similar effect occurs when an attempt is made to add a
comparatively small number to a large number:

boolean aVal = ((1.0E10 + 1.0E-20)==1.0E10);


System.out.println(aVal);

• Prints out true to the screen


• Since 1.0E-20 is just too small to affect any of the bit
values used to store 1.0E10. The small number would
have to be about 1.0E-5 or larger to affect the large
number.
• So, keep this behaviour in mind when designing
expressions!
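The size of the gap between neighbouring double values can be read from the library method Math.ulp. The sketch below uses it to show when a small addend is absorbed. The class name SmallPlusLargeDemo and the method isAbsorbed are illustrative names.

```java
// Sketch only: SmallPlusLargeDemo and isAbsorbed are assumed names.
class SmallPlusLargeDemo {

    // True when adding small leaves large unchanged.
    static boolean isAbsorbed(double large, double small) {
        return large + small == large;
    }

    public static void main(String[] args) {
        System.out.println(isAbsorbed(1.0E10, 1.0E-20));  // prints true
        System.out.println(isAbsorbed(1.0E10, 1.0E-5));   // prints false

        // Spacing between 1.0E10 and the next larger double.
        // An addend well below half of this is lost.
        System.out.println(Math.ulp(1.0E10));
    }
}
```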



The Effect on Summations
• Taylor Series are used to approximate many
functions. For example:

$$\ln(1 + x) = \sum_{i=1}^{\infty} \frac{(-1)^{i+1} x^{i}}{i}$$

• For ln(2):

$$\ln(2) = \sum_{i=1}^{\infty} \frac{(-1)^{i+1}}{i} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \dots$$
The Effect on Summations
• Since we cannot loop to infinity, how many terms
would be sufficient?
• Since the sum is stored in a finite memory space,
at some point the terms to be added will be much
smaller than the sum itself.
• If the sum is stored in a float, which has about 7
significant digits, a term of about 1×10^-8 would not
be significant. So, i would be about 10^8 - that’s a
lot of iterations!



The Effect on Summations - Cont.
• On testing using a float, it took 33554433
iterations and 25540 msec to compute! (sum no
longer changing, value = 0.6931375)
• Math.log(2) = 0.6931471805599453
• So, roundoff error had a significant effect and the
summation did not even provide the correct value.
A float could only provide about 5 correct
significant digits, tops.
• For double, about 10^15 iterations would be
required! (I didn’t try this one…)
• So, this series does not converge quickly!



The Effect on Summations - Cont.
• Here is another way to compute natural logs:

$$\ln\left(\frac{1 + x}{1 - x}\right) = 2 \sum_{i=0}^{\infty} \frac{x^{2i+1}}{2i+1}$$

• Using x = 1/3 will provide ln(2).



The Effect on Summations - Cont.
• For float, this took 8 iterations and <1msec
(value = 0.6931472).
• Math.log(2) = 0.6931471805599453
• For double, it took 17 iterations, <1 msec to give
the value = 0.6931471805599451
• Using the Windows calculator ln(2) =
0.693147180559945309417232121458177 (!!)
• So, the use of the 17 iterations still introduced a
slight roundoff error.
• Note both loops are O(n) - but one is sure faster
than the other!
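Both series can be sketched in a few lines of Java. This is not the code used for the timings above: the class and method names are illustrative, the slow series is cut off at a fixed number of terms rather than run until the sum stops changing, and both sums are kept in a double.

```java
// Sketch only: LogSeriesDemo, slowLn2 and fastLn2 are assumed names.
class LogSeriesDemo {

    // Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... using the given number of terms.
    static double slowLn2(int terms) {
        double sum = 0.0;
        for (int i = 1; i <= terms; i++)
            sum += ((i % 2 == 1) ? 1.0 : -1.0) / i;
        return sum;
    }

    // 2 * sum of x^(2i+1) / (2i+1) with x = 1/3, adding terms
    // until the stored sum no longer changes.
    static double fastLn2() {
        double x = 1.0 / 3.0;
        double power = x;        // holds x^(2i+1)
        double sum = 0.0;
        double previous;
        int i = 0;
        do {
            previous = sum;
            sum += 2.0 * power / (2 * i + 1);
            power *= x * x;
            i++;
        } while (sum != previous);
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("slow, 1000 terms: " + slowLn2(1000));
        System.out.println("fast:             " + fastLn2());
        System.out.println("Math.log(2):      " + Math.log(2));
    }
}
```

Each term of the second series is about one ninth the size of the one before it, which is why it needs so few iterations.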
Numeric Calculations
• Error is introduced into a calculation through two
sources (assuming the formulae are correct!):
– The inherent error in the numbers used in the
calculation.
– Error resulting from roundoff error.
• Often the inherent error dominates the roundoff
error.
• But, watch for conditions of slow convergence or
ill-conditioned matrices, where roundoff error will
accumulate and end up swamping out the
inherent error.



Numeric Calculations - Cont.
• Once a number is calculated, it is very important
to be able to estimate the error using both
sources, if necessary.
• The error must be known in order that the number
produced by your program can be reported in a
valid manner.
• This is a non-trivial topic in numeric calculation
that we will not discuss in this course.

