
J. Symbolic Computation (2002) 33, 385–392 doi:10.1006/jsco.2001.0518 Available online at http://www.idealibrary.com

A Fast Euclidean Algorithm for Gaussian Integers


GEORGE E. COLLINS
Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, U.S.A.

A new version of the Euclidean algorithm is developed for computing the greatest common divisor of two Gaussian integers. It uses approximation to obtain a sequence of remainders of decreasing absolute values. The algorithm is compared with the new (1 + i)-ary algorithm of Weilert and found to be somewhat faster if properly implemented. © 2002 Elsevier Science Ltd

1. Introduction

We present a new algorithm for computing a greatest common divisor of any two non-zero Gaussian integers. It is Euclidean in the sense that it computes a sequence of quotients and remainders. It is approximative in the sense that it approximates the exact quotient of any two successive remainders, $A_i$ and $A_{i+1}$, and then uses a nearest Gaussian integer, $Q_i$, to that approximate quotient to compute the next remainder, $A_{i+2} = A_i - Q_i A_{i+1}$. The approximation is sufficiently accurate that one always has $|A_{i+2}| < c|A_{i+1}|$ for some constant $c < 1$. We further improve this method by utilizing the fact that the quotients $Q_i$ are frequently so small that the product $Q_i A_{i+1}$ can be computed more quickly by addition and shifting. Without this improvement the method is somewhat slower than Weilert's (1 + i)-ary method; with it, the method is somewhat faster.

Recently Weilert (2000a) presented a new gcd algorithm for Gaussian integers, the (1 + i)-ary algorithm, that is analogous to the binary gcd algorithm for rational integers. In that paper he also briefly described an algorithm for Gaussian integers that he referred to as Lehmer-type. This algorithm was also an approximative Euclidean algorithm in the above sense. In a personal communication, Weilert explained that he called this algorithm Lehmer-type because it uses only leading parts of the operands to compute quotients. We would prefer to reserve the name Lehmer-type for algorithms that share the properties of Lehmer's method that the correct quotients are always computed and several small quotients are combined, as in Caviness and Collins (1976). Weilert presented experimental results showing the (1 + i)-ary algorithm to be about three times as fast as his Lehmer-type algorithm.
In this paper we describe our approximative Euclidean algorithm in full detail and present experimental results showing it to be faster than Weilert reported for the (1 + i)-ary algorithm by a factor of about 1.6, and about 5.5 times as fast as Weilert's Lehmer-type algorithm. His Lehmer-type algorithm was not described in sufficient detail for us to explain this discrepancy.

We have implemented both the (1 + i)-ary algorithm and our approximative Euclidean algorithm in the SACLIB computer algebra system. We describe both implementations in full detail to show how we have been able to make each algorithm very efficient. Our implementation of the (1 + i)-ary algorithm takes only about 80% of the time reported by Weilert for his implementation, and our implementation of our approximative Euclidean algorithm takes about 75% of the time taken by our implementation of the (1 + i)-ary algorithm.

The length of any Gaussian integer $A$, denoted by $L(A)$, is defined to be the maximum of the bit lengths of its real and imaginary parts. We define $\gcd(A, B)$ to be the unique greatest common divisor of $A$ and $B$ that is either real or in the first quadrant. We show that the computing time of our algorithm is dominated by $m(m - n + 1) + n(n - k + 1)$, where $m$ and $n$ are the lengths of the inputs $A$ and $B$, $m \ge n$, and $k$ is the length of $C = \gcd(A, B)$. We observe that the same is easy to prove for Weilert's algorithm.

In Weilert (2000a) the computing time of the (1 + i)-ary algorithm is compared with that of the asymptotically faster descent method of Schönhage. Weilert finds that the (1 + i)-ary algorithm is faster for inputs of lengths up to about 13 500. Referring to Weilert's table, we find that our approximative Euclidean algorithm is faster for lengths up to about 32 000. See Weilert (2000b) for more about the descent method.

In Section 2 we present the approximative Euclidean algorithm, prove its validity and analyze its computing time. In Section 3 we discuss our implementation in SACLIB of both the approximative Euclidean algorithm and the (1 + i)-ary algorithm, and compare their observed computing times. In Section 4 we present additional observations regarding the characteristics of both algorithms.

2. The Approximative Euclidean Algorithm

Our algorithm is such that whenever we divide $A$ by $B$ with $L(A) = m$ and $L(B) = n$, we have $m - n \ge -1$. A nearest Gaussian integer, $Q$, to the quotient $A/B$ will have $L(Q)$ approximately equal to $m - n$. For an arbitrary complex number $x + yi$, let $\{x + yi\} = \{x\} + \{y\}i$, where $\{x\}$ is the nearest integer to the real number $x$ (the one of least absolute value in case of a tie).
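The two definitions above, the length $L(A)$ and the rounding $\{x\}$, can be sketched in Python as follows. This is only an illustration under stated assumptions, not code from SACLIB: a Gaussian integer is represented as a tuple `(re, im)` of Python integers, and the names `length`, `nearest_int` and `nearest_gaussian` are hypothetical.

```python
from fractions import Fraction


def length(A):
    """L(A): the larger of the bit lengths of the real and imaginary parts."""
    re, im = A
    return max(abs(re).bit_length(), abs(im).bit_length())


def nearest_int(x):
    """{x}: nearest integer to a rational x; a tie goes to the smaller absolute value."""
    x = Fraction(x)
    magnitude = abs(x)
    whole = magnitude.numerator // magnitude.denominator
    # Round away from zero only when strictly more than half way.
    if magnitude - whole > Fraction(1, 2):
        whole += 1
    return -whole if x < 0 else whole


def nearest_gaussian(x, y):
    """{x + yi} = {x} + {y}i, returned as a tuple (re, im)."""
    return (nearest_int(x), nearest_int(y))
```

Exact rationals are used here so that the tie rule is applied without floating-point error; for example, `nearest_int(Fraction(3, 2))` is 1 rather than 2.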
Our algorithm uses a parameter $d$, a positive integer, to control the accuracy with which quotients of Gaussian integers are computed. Let $\delta = \max(m - n, 0)$. If $\delta + d \le n$, let $a = \{A/2^{n-(\delta+d)}\}$ and $b = \{B/2^{n-(\delta+d)}\}$; otherwise let $a = A$ and $b = B$. Then $a/b$ is an approximation to $A/B$. We proceed to compute $a/b$ in the usual way, dividing $a \cdot \mathrm{conj}(b)$ by $\mathrm{norm}(b)$, obtaining the rational complex number $q$. We then let $Q = \{q\}$. $Q$ may not be the same as $\{A/B\}$, but if $R = A - QB$ and $d$ is large enough, we will nevertheless have, for some positive $c < 1$, $|R| < c|B|$.

We now proceed to prove a sequence of four theorems that will establish that $d \ge 8$ suffices. Theorems 1 and 2 are used as lemmas in the proof of Theorem 3, which bounds the error in the quotient resulting from truncating $A$ and $B$. Theorem 4 then uses Theorem 3 to bound the remainder $R$.

Theorem 1. Let $A$ and $B$ be Gaussian integers, $m$ and $k$ integers, with $m \ge L(A) \ge k$ and $m \ge L(B) \ge k$. Let $\hat{A} = A/2^k$, $\hat{B} = B/2^k$, $\bar{A} = \{A/2^k\}$ and $\bar{B} = \{B/2^k\}$. Then $|\mathrm{re}(\bar{A}\bar{B}) - \mathrm{re}(\hat{A}\hat{B})| < 2^{m-k+1}$ and $|\mathrm{im}(\bar{A}\bar{B}) - \mathrm{im}(\hat{A}\hat{B})| < 2^{m-k+1}$.

Proof. Let $\hat{A} = \hat{a}_1 + \hat{a}_2 i$, $\hat{B} = \hat{b}_1 + \hat{b}_2 i$, $\bar{A} = \bar{a}_1 + \bar{a}_2 i$, and $\bar{B} = \bar{b}_1 + \bar{b}_2 i$. Then $|\bar{a}_1\bar{b}_1 - \hat{a}_1\hat{b}_1| = |(\bar{a}_1 - \hat{a}_1)\bar{b}_1 + (\bar{b}_1 - \hat{b}_1)\hat{a}_1| \le |\bar{a}_1 - \hat{a}_1||\bar{b}_1| + |\bar{b}_1 - \hat{b}_1||\hat{a}_1| \le (1/2)|\bar{b}_1| + (1/2)|\hat{a}_1| < 2^{m-k}$, since $|\hat{a}_1| < 2^{m-k}$ and $|\bar{b}_1| \le 2^{m-k}$. Similarly $|\bar{a}_2\bar{b}_2 - \hat{a}_2\hat{b}_2| < 2^{m-k}$. It then follows immediately that $|\mathrm{re}(\bar{A}\bar{B}) - \mathrm{re}(\hat{A}\hat{B})| < 2^{m-k+1}$. The proof for the imaginary parts is similar. □
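The quotient step described at the start of this section (truncate both operands, divide $a \cdot \mathrm{conj}(b)$ by $\mathrm{norm}(b)$, round) and the resulting remainder loop can be sketched in Python. This is a minimal sketch under stated assumptions and not the SACLIB implementation: Gaussian integers are tuples `(re, im)` of Python integers, all function names are hypothetical, and the result is a gcd only up to a unit.

```python
def length(A):
    """L(A): the larger bit length of the real and imaginary parts."""
    return max(abs(A[0]).bit_length(), abs(A[1]).bit_length())


def round_div(p, q):
    """Nearest integer to p/q for q > 0; a tie goes to the smaller absolute value."""
    quo, rem = divmod(abs(p), q)
    if 2 * rem > q:
        quo += 1
    return -quo if p < 0 else quo


def mul(X, Y):
    """Product of two Gaussian integers."""
    return (X[0] * Y[0] - X[1] * Y[1], X[0] * Y[1] + X[1] * Y[0])


def approx_quotient(A, B, d=8):
    """Q = {a/b}, where a and b keep only the leading bits of A and B."""
    m, n = length(A), length(B)
    delta = max(m - n, 0)
    k = n - (delta + d)
    if k >= 0:  # delta + d <= n: drop the k low-order bits, with rounding
        a = (round_div(A[0], 1 << k), round_div(A[1], 1 << k))
        b = (round_div(B[0], 1 << k), round_div(B[1], 1 << k))
    else:
        a, b = A, B
    num = mul(a, (b[0], -b[1]))          # a * conj(b)
    norm_b = b[0] * b[0] + b[1] * b[1]   # norm(b)
    return (round_div(num[0], norm_b), round_div(num[1], norm_b))


def approx_euclid_gcd(A, B, d=8):
    """Some gcd of A and B (determined only up to a unit)."""
    if length(A) < length(B):
        A, B = B, A
    while B != (0, 0):
        Q = approx_quotient(A, B, d)
        QB = mul(Q, B)
        A, B = B, (A[0] - QB[0], A[1] - QB[1])   # R = A - Q*B
    return A


def divides(D, A):
    """True if the non-zero Gaussian integer D divides A exactly."""
    n = D[0] * D[0] + D[1] * D[1]
    num = mul(A, (D[0], -D[1]))
    return num[0] % n == 0 and num[1] % n == 0
```

The default `d=8` matches the value for which the theorems of this section guarantee that the remainders decrease; the sketch makes no attempt at the word-level optimizations discussed in Section 3.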

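Theorem 2. Let $a$, $\bar{a}$, $b$ and $\bar{b}$ be real numbers. Assume that $|\bar{b} - b| \le \epsilon|b|$ with $\epsilon \le 1/4$. Then $|\bar{a}/\bar{b} - a/b| \le (4/3)\epsilon|a/b| + (1 + 4\epsilon/3)|(\bar{a} - a)/b|$.

Proof. Let $\bar{b} = (1 - \epsilon_1)b$. Then $|\epsilon_1| \le \epsilon$, so $|\bar{a}/\bar{b} - a/b| = |\bar{a}/((1 - \epsilon_1)b) - a/b| = |(1 + \epsilon_1 + \epsilon_1^2/(1 - \epsilon_1))\bar{a}/b - a/b| = |(\epsilon_1 + \epsilon_1^2/(1 - \epsilon_1))a/b + (1 + \epsilon_1 + \epsilon_1^2/(1 - \epsilon_1))(\bar{a} - a)/b| \le |(\epsilon_1 + \epsilon_1^2/(1 - \epsilon_1))a/b| + |(1 + \epsilon_1 + \epsilon_1^2/(1 - \epsilon_1))(\bar{a} - a)/b| \le (4/3)\epsilon|a/b| + (1 + 4\epsilon/3)|(\bar{a} - a)/b|$, since $|\epsilon_1 + \epsilon_1^2/(1 - \epsilon_1)| = |\epsilon_1/(1 - \epsilon_1)| \le \epsilon/(1 - \epsilon) \le 4\epsilon/3$. □

Henceforth let $m = L(A)$ and $n = L(B)$. In the following theorems we will assume that $m - n \ge -1$. If $A$ and $B$ are the two inputs to our algorithm and $L(A) < L(B)$, we interchange $A$ and $B$ so that $m \ge n$. Then we compute a sequence of remainders $A_1, A_2, \ldots, A_r$ with $A_1 = A$, $A_2 = B$, $A_r = 0$, $A_{i+2} = A_i - Q_i A_{i+1}$, $Q_i$ a Gaussian integer for $1 \le i \le r - 2$, and $|A_{i+1}| < |A_i|$ for $i \ge 2$, so that $A_{r-1}$ is a gcd of $A$ and $B$. From $|A_{i+1}| < |A_i|$ it follows easily that $L(A_i) - L(A_{i+1}) \ge -1$.

Theorem 3. Let $A$ and $B$ be Gaussian integers with $L(A) = m$, $L(B) = n$, $m - n \ge -1$, $d \ge 6$, $\delta = \max(m - n, 0)$ and $k = n - (\delta + d)$. Let $\bar{A} = \{A/2^k\}$, $\bar{B} = \{B/2^k\}$. Then $|\mathrm{re}(\bar{A}/\bar{B}) - \mathrm{re}(A/B)| < 3.06 \cdot 2^{-d+4}$ and $|\mathrm{im}(\bar{A}/\bar{B}) - \mathrm{im}(A/B)| < 3.06 \cdot 2^{-d+4}$.

Proof. Let $\hat{A} = A/2^k$, $\hat{B} = B/2^k$. Let $a = \mathrm{re}(\hat{A}\,\mathrm{conj}(\hat{B}))$, $\bar{a} = \mathrm{re}(\bar{A}\,\mathrm{conj}(\bar{B}))$, $b = \mathrm{norm}(\hat{B})$, $\bar{b} = \mathrm{norm}(\bar{B})$. Then $a/b = \mathrm{re}(A/B)$ and $\bar{a}/\bar{b} = \mathrm{re}(\bar{A}/\bar{B})$. By Theorem 1, since $L(B) = n \le m + 1$, $|\bar{a} - a| < 2^{m-k+2}$ and $|\bar{b} - b| < 2^{n-k+1}$.

Next we will apply Theorem 2. Let $\epsilon = |\bar{b} - b|/b$. $|B| \ge 2^{n-1}$, so $|\hat{B}| \ge 2^{n-k-1}$. Therefore $b \ge 2^{2n-2k-2}$ and $\epsilon = |\bar{b} - b|/b < 2^{n-k+1}/2^{2n-2k-2} = 2^{-n+k+3} = 2^{-n+(n-\delta-d)+3} = 2^{-\delta-d+3} \le 2^{-d+3} \le 1/8$. Also $|a/b| = |\mathrm{re}(A/B)| \le |A/B| = |A|/|B| < 2^{m+1/2}/2^{n-1} = 2^{m-n+3/2} \le 2^{\delta+3/2}$. So $(4/3)\epsilon|a/b| \le (4/3)\,2^{-\delta-d+3}\,2^{\delta+3/2} = (4/3)\,2^{-d+9/2}$. Next, $|\bar{a} - a|/b < 2^{m-k+2}/2^{2n-2k-2} = 2^{m-2n+k+4} = 2^{(m-n)-(n-k)+4} \le 2^{\delta-(\delta+d)+4} = 2^{-d+4}$. Since $\epsilon \le 1/8$, $1 + 4\epsilon/3 \le 1 + 1/6$, and hence $(1 + 4\epsilon/3)|\bar{a} - a|/b \le (7/6)\,2^{-d+4}$. Therefore, by Theorem 2, $|\bar{a}/\bar{b} - a/b| \le (4/3)\,2^{-d+9/2} + (7/6)\,2^{-d+4} = (4\sqrt{2}/3 + 7/6)\,2^{-d+4} < 3.06 \cdot 2^{-d+4}$. The proof for the imaginary parts is identical. □

Theorem 4. Assume the hypotheses of Theorem 3 and let $d \ge 8$. Let $\bar{Q} = \bar{A}/\bar{B}$, $Q' = \{\bar{Q}\}$, $R = A - BQ'$. Then $|R| \le \sqrt{0.956}\,|B|$.

Proof. Let $Q = A/B$.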
By Theorem 3, $|\mathrm{re}(\bar{Q}) - \mathrm{re}(Q)| < 3.06 \cdot 2^{-4}$. Also $|\mathrm{re}(\bar{Q}) - \mathrm{re}(Q')| \le 1/2$. So $|\mathrm{re}(Q) - \mathrm{re}(Q')| < 1/2 + 3.06 \cdot 2^{-4}$. Likewise $|\mathrm{im}(Q) - \mathrm{im}(Q')| < 1/2 + 3.06 \cdot 2^{-4}$. Since $R = A - BQ' = A - BQ + B(Q - Q') = B(Q - Q')$, $\mathrm{norm}(R) < 2(1/2 + 3.06 \cdot 2^{-4})^2\,\mathrm{norm}(B) < 0.956\,\mathrm{norm}(B)$. □

Theorem 5. Let $m \ge n \ge k \ge 1$, and let $L(A) = m$, $L(B) = n$ and $L(C) = k$, where $C$ is any gcd of $A$ and $B$. The computing time of the approximative Euclidean algorithm is dominated by $m(m - n + 1) + n(n - k + 1)$.

Proof. Let $A_1 = A$, $A_2 = B$, $A_i = A_{i+1}Q_i + A_{i+2}$ for $1 \le i \le r$, where $Q_1, \ldots, Q_r$ are the quotients computed by the algorithm and $A_3, \ldots, A_{r+2}$ are the remainders. Let $n_i = L(A_i)$ and $\delta_i = \max(n_i - n_{i+1}, 0)$. Let $a_i$ and $a_{i+1}$ be the approximations to $A_i$ and $A_{i+1}$, respectively. Then $L(a_i) \le 2\delta_i + d$ and $L(a_{i+1}) \le \delta_i + d$. It follows that the time to compute $Q_i$ is dominated by $(\delta_i + 1)^2$ and hence by $n_i(\delta_i + 1)$. The time to then multiply $A_{i+1}$ by $Q_i$ is dominated by $n_i(\delta_i + 1)$. The time to subtract $A_{i+1}Q_i$ from $A_i$ is dominated by $n_i$. Therefore the time for all three operations is dominated by $n_i(\delta_i + 1)$ and thus by $n_i(n_i - n_{i+1} + 1)$.

Taking $i = 1$, the time to compute $Q_1$ and $A_3$ is dominated by $m(m - n + 1)$. The time to compute the remaining quotients and remainders is dominated by $\sum_{i=2}^{r} n_i(n_i - n_{i+1} + 1) \le n \sum_{i=2}^{r} (n_i - n_{i+1} + 1) \le n(n - k + r)$. But it is clear from Theorem 4 that $r$ is dominated by $n - k + 1$. So the total computing time for the algorithm is dominated by $m(m - n + 1) + n(n - k + 1)$. □

We remark that it is not difficult to show that the computing time of Weilert's algorithm is also dominated by $m(m - n + 1) + n(n - k + 1)$. Compare this with the time $n(m - k + 1)$ that was derived in Collins (1974) for the Euclidean algorithm for ordinary integers.

3. Implementation and Performance

We have implemented both the (1 + i)-ary algorithm and the approximative Euclidean algorithm in the SACLIB computer algebra system (version 2.1). As background for our subsequent presentation of experimentally observed computing times of both algorithms, we provide some important details about our implementations.

We begin with the (1 + i)-ary algorithm. That algorithm first divides both inputs, $A$ and $B$, by the largest possible powers of $1 + i$. Then it repeats a loop in which, first, $A$ and $B$ are interchanged, if necessary, so that $|A| \ge |B|$, approximately. Weilert does not specify exactly how his implementation achieves this. Our implementation does so by first computing $k = L(A) - L(B)$. If $k > 1$ then $|A| > |B|$ and if $k < -1$ then $|A| < |B|$. Otherwise 10-bit approximations of the norms of $A$ and $B$ are used for the comparison.

The next computation in the loop is to find the Gaussian integer unit $\epsilon$ that minimizes $|A - \epsilon B|$, if now $|A| > |B|$. Here again approximation is permissible. In our implementation, we first determine the quadrant and half-quadrant of both $A$ and $B$.
Quadrants are determined from the signs of the real and imaginary parts, and half-quadrants are then determined by comparing the magnitudes of the real and imaginary parts. If $\epsilon$ is the unit such that $A$ and $-\epsilon B$ are in opposite quadrants, and if, moreover, $A$ and $-\epsilon B$ are in opposite half-quadrants, then $\epsilon$ is the desired unit. Otherwise an approximate norm comparison for two competing units is required, again with 10-bit approximations.

$A - \epsilon B$ is then divisible by $1 + i$, and it is divided by the largest possible power of $1 + i$. This is accomplished as follows. If $2^k$ is the largest power of 2 that divides $A - \epsilon B$ then $(1 + i)^{2k+1}$ is the largest power of $1 + i$ that divides $A - \epsilon B$, since $(1 + i)^2 = 2i$. The division is performed by multiplying by $1 - i$, using additions and subtractions, then dividing by $2^{k+1}$, using shifting. Thus the (1 + i)-ary method never requires multiplying a multi-precision integer by a single-precision integer as in the approximative Euclidean method.

In our implementation of both algorithms, the real and imaginary parts of all Gaussian integers are stored in arrays. The first word of such an array contains the length of the integer in words, and the second word of the array contains the sign of the integer. Each subsequent word of the array contains 29 bits of the integer, for consistency with other parts of SACLIB.

Two features of the implementation of our Euclidean algorithm are important. First, in computing approximate quotients, if $3\delta + 2d \le 29$ then all required arithmetic involves operands of at most 31 bits and thus can be performed with hardware arithmetic. So we treat this as a special case, without the use of arrays or software arithmetic. Second, many of the approximate nearest quotients are very small. If the real or imaginary part of any quotient is less than or equal to 4 in absolute value, we perform the multiplication of the divisor by this integer by addition and/or shifting. We have observed that this accounts for more than 90% of all multiplications, and that this feature decreases the total computing time of the algorithm by about 52%.

Table 1 shows the computing times in milliseconds for the (1 + i)-ary algorithm and for the approximative Euclidean algorithm for inputs of various bit lengths $n$. The column (1 + i)-ary (TP) shows the times reported by Weilert for his TP implementation of the (1 + i)-ary algorithm. Weilert reported the times in Mtus and we have converted them to milliseconds using his conversion factor of 300 Mtus per second. Weilert used an Intel Pentium II running at 400 MHz; our SACLIB programs were run on a SUN4U/400 Ultra-450 operating at 400 MHz. The column (1 + i)-ary (SACLIB) shows the times for our SACLIB implementation of the (1 + i)-ary method. The column Euclidean shows the times for our implementation of the approximative Euclidean algorithm when all multiplications are performed without adding and shifting. The column Euclidean (add/shift) shows the times when adding and shifting are used for integer multipliers less than or equal to 4. The real and imaginary parts of the inputs were generated as random $n$-bit integers.

Table 1. Inputs of equal bit lengths n (times in milliseconds).

n         (1 + i)-ary (TP)   (1 + i)-ary (SACLIB)   Euclidean   Euclidean (add/shift)
320              2.7                  2.7               2.1               1.6
1600            22                   15                17                10
3200            67                   45                60                32
6400           222                  155               233               110
16 000        1232                  927              1408               655
32 000        4723                 3854              5742              2715
64 000      18 454               15 054            23 330            11 332

In our program for the approximative Euclidean algorithm, $d$ is a parameter, which we have set to 8 for our timings. However, we have observed that increasing $d$ to as much as 12 makes very little difference in its computing time. Also, although we have only proved its validity for $d \ge 8$, we have not observed any faulty performance for $d$ as small as 3. Perhaps a more careful analysis could prove validity for some value of $d$ less than 8. However, we have also not observed any decrease in computing time when using values of $d$ that are smaller than 8.
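The add/shift idea for small quotient parts can be sketched as follows. This is an illustrative Python sketch, not the SACLIB code: Python integers stand in for the 29-bit word arrays, the names `small_multiple` and `gaussian_times_small` are hypothetical, and the particular shift/add decomposition for each multiplier is an assumption about one reasonable way to do it.

```python
def small_multiple(x, q):
    """Return q*x; for |q| <= 4 only shifts, one addition and negation are used."""
    m = abs(q)
    if m == 0:
        r = 0
    elif m == 1:
        r = x
    elif m == 2:
        r = x << 1             # 2x
    elif m == 3:
        r = (x << 1) + x       # 2x + x
    elif m == 4:
        r = x << 2             # 4x
    else:
        return q * x           # larger multipliers fall back to ordinary multiplication
    return -r if q < 0 else r


def gaussian_times_small(Q, B):
    """Q*B for Gaussian integers Q = (q1, q2) and B = (b1, b2).

    Each of the four integer products is formed separately, so a small real
    part of Q is handled by add/shift even when the imaginary part is large,
    and vice versa.
    """
    q1, q2 = Q
    b1, b2 = B
    return (small_multiple(b1, q1) - small_multiple(b2, q2),
            small_multiple(b2, q1) + small_multiple(b1, q2))
```

In an array-based multi-precision representation, each shift or addition is a single linear pass over the words of the divisor, which is the saving the text attributes to this feature.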
Table 1 shows that for very large inputs our implementation of the (1 + i)-ary algorithm takes about 80% of the time taken by Weilert's TP implementation, that the use of addition and shifting for multiplications by small integers saves about 52% of the time in the Euclidean algorithm, and that for large integers the Euclidean algorithm, with adding and shifting, takes about 75% of the time taken by our implementation of the (1 + i)-ary algorithm.

We have found that when one input is significantly longer than the other, the Euclidean algorithm gains a somewhat stronger advantage. Table 2 shows the computing times when the bit lengths of the inputs are $n$ and $2n$. Here the times for the approximative Euclidean algorithm are about 30% of those for the (1 + i)-ary algorithm.

Table 2. Inputs of unequal lengths (bit lengths n and 2n; times in milliseconds).

n         (1 + i)-ary (SACLIB)   Euclidean (add/shift)
320                 3                     1
1600               48                    15
3200              174                    53
6400              645                   193
16 000           4370                  1183
32 000         18 107                  5310
64 000         75 565                21 869

4. Further Observations

It is to be expected that the approximative Euclidean algorithm does not always compute a nearest quotient and least remainder. We wanted to know how frequently the computed quotient failed to be nearest, and how this would affect the number of divisions. Rolletschek (1986) proved that if each remainder is minimal (equivalently, each quotient is nearest) then the number of divisions is minimal. The converse is easily seen to be false by example. Let $4 + 4i$ and $2 + 3i$ be the two inputs. Since $(4 + 4i)/(2 + 3i) = (20 - 4i)/13$, the nearest first quotient is $q_1 = 2$, and the least remainder sequence is $4 + 4i$, $2 + 3i$, $-2i$, $i$ with $q_2 = -1 + i$. If we change the first quotient to $q_1 = 1$ we obtain the remainder sequence $4 + 4i$, $2 + 3i$, $2 + i$, $1$ with $q_2 = 1 + i$. This $q_1$ is not nearest, but the number of divisions is three in each case.

We generated 10 pairs of Gaussian integers with real and imaginary parts each consisting of 1000 random bits. For each of the pairs we counted the number of divisions in a minimal remainder sequence. Then we constructed a variant of our program for the approximative Euclidean algorithm that also computed a nearest quotient at each division step, and counted the number of times that the approximate Gaussian integer quotient differed from the nearest quotient. We ran this variant program for each of the 10 input pairs, and for values of the parameter $d$ = 8, 7, 6, 5, 4, 3. Table 3 displays, for each value of $d$, the numbers of deviant approximated quotients and the number of divisions.

Table 3. Numbers of deviant quotients and numbers of divisions (one column per input pair).

d   Count         1     2     3     4     5     6     7     8     9    10
8   deviant       0     3     1     0     2     4     1     3     2     3
8   divisions   640   622   646   634   637   617   630   660   625   624
7   deviant       1     4     3     1     6     8     1     5     3     5
7   divisions   640   622   646   634   637   617   630   660   625   624
6   deviant       2     5     2     4    13     8     9    11     8     5
6   divisions   640   622   646   634   637   617   630   660   625   626
5   deviant       7    14    11    10     9    20    17    16    17    13
5   divisions   640   622   646   634   637   617   632   661   625   624
4   deviant      29    29    33    23    27    31    29    24    24    26
4   divisions   643   624   652   640   638   620   633   661   628   631
3   deviant      74    67    64    57    57    75    62    60    59    53
3   divisions   657   646   666   649   647   641   641   676   640   639

The numbers of divisions shown for $d$ = 8 and 7 are the same as those for a minimal remainder sequence. Although we have not proved that the algorithm will always terminate with $d < 8$, it is remarkable that the method tolerates many incorrect quotients with little or no increase in the number of divisions. This variant program also compared the norm of each remainder with the norm of the divisor. For $d \ge 4$ the norm of the remainder never equaled or exceeded the norm of the divisor. For $d = 3$ it did so in just two cases.
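The small example with inputs $4 + 4i$ and $2 + 3i$ can be checked mechanically. The sketch below is illustrative only: Gaussian integers are tuples `(re, im)`, the function names are hypothetical, and exact nearest quotients are used throughout (not the approximative quotient of Section 2). It builds the remainder sequence, optionally forcing a chosen first quotient.

```python
def round_div(p, q):
    """Nearest integer to p/q for q > 0; a tie goes to the smaller absolute value."""
    quo, rem = divmod(abs(p), q)
    if 2 * rem > q:
        quo += 1
    return -quo if p < 0 else quo


def nearest_quotient(A, B):
    """{A/B}, computed exactly as A*conj(B)/norm(B) with rounding."""
    re = A[0] * B[0] + A[1] * B[1]
    im = A[1] * B[0] - A[0] * B[1]
    norm_b = B[0] * B[0] + B[1] * B[1]
    return (round_div(re, norm_b), round_div(im, norm_b))


def next_remainder(A, B, Q):
    """A - Q*B."""
    return (A[0] - (Q[0] * B[0] - Q[1] * B[1]),
            A[1] - (Q[0] * B[1] + Q[1] * B[0]))


def remainder_sequence(A, B, first_quotient=None):
    """Remainder sequence ending in (0, 0); only the first quotient may be forced."""
    seq = [A, B]
    Q = first_quotient
    while B != (0, 0):
        if Q is None:
            Q = nearest_quotient(A, B)
        A, B = B, next_remainder(A, B, Q)
        seq.append(B)
        Q = None   # all later quotients are nearest quotients
    return seq


def divisions(seq):
    """Number of division steps in a remainder sequence."""
    return len(seq) - 2
```

Calling `remainder_sequence((4, 4), (2, 3))` and `remainder_sequence((4, 4), (2, 3), first_quotient=(1, 0))` gives the two sequences of the example, and `divisions` reports three division steps for each.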
We experimentally determined the average number of iterations of the main loop required by each of the two algorithms, as a function of the number of bits in two inputs of equal lengths. We generated 10 pairs of Gaussian integers whose real and imaginary parts were random 10 000-bit integers, and counted the number of iterations for each of the 10 algorithm executions. For Weilert's algorithm the average was 11 476, or about 1.148 iterations per bit. For the approximative Euclidean algorithm the average was 6316, or about 0.632 iterations per bit, which is 55% as many as for Weilert's algorithm.

It is interesting to compare the average for the approximative Euclidean algorithm, presumably the same as the number for the Gaussian integer least remainder algorithm, with the number of iterations for the integer least remainder algorithm. The latter (Knuth, 1997) is approximately 0.405 iterations per bit, so the ratio is 0.632/0.405 = 1.56. Similarly, we might wish to compare the number of iterations for Weilert's algorithm with the number for the binary algorithm for ordinary integers. According to Knuth (1997) the latter is approximately 0.71 iterations per bit, and the ratio in this case is 1.148/0.71 = 1.62. Thus the ratio of Gaussian integer iterations to ordinary integer iterations is nearly the same for the two kinds of gcd algorithm, namely about 1.6: the Gaussian integer least remainder algorithm needs about 1.6 times as many iterations per bit as the integer least remainder algorithm, and Weilert's Gaussian integer gcd algorithm needs about 1.6 times as many as the binary gcd algorithm for integers.

As noted in Section 3, our approximative Euclidean algorithm capitalizes on the very small size of most quotients, avoiding many multiplications. Table 4 shows the experimentally observed percentage of real and imaginary parts, counted separately, of integer quotients having absolute value $j$, for $j \in \{0, 1, 2, 3, 4\}$ and for $j > 4$. The table clearly demonstrates the importance of this feature.

Table 4. Small integer quotient probabilities.

j             0      1      2      3      4     >4
Percentage   10.7   31.7   30.6   13.4    5.1    8.5

The first quadrant associate of a non-zero Gaussian integer is the associate that is either real or in the first quadrant. Applying the approximative Euclidean algorithm to two random Gaussian integers of lengths 10 000, we computed the first quadrant associate of each quotient that was computed. Table 5 shows the frequency, expressed as a percentage, of each first quadrant quotient $a + bi$ such that $a, b \le 5$.

Table 5. First quadrant quotient frequencies (percentage of quotients a + bi; columns give a, rows give b).

         1      2      3      4      5
5i      0.6    0.6    0.4    0.2    0.2
4i      1.5    1.1    0.7    0.4    0.2
3i      4.4    2.5    1.3    0.7    0.4
2i     13.9    6.6    2.5    1.1    0.6
1i     10.4   13.9    4.4    1.5    0.6
0i      0     10.5    6.6    2.0    0.9
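The first quadrant associate defined above can be computed by repeated multiplication by $i$. The following Python sketch is illustrative only: the name `first_quadrant_associate` is hypothetical, a Gaussian integer is a tuple `(re, im)`, and "real or in the first quadrant" is read here as real part positive and imaginary part non-negative, which is an assumption consistent with the row and column labels of Table 5.

```python
def first_quadrant_associate(z):
    """Return the associate (x, y) of a non-zero z with x > 0 and y >= 0."""
    x, y = z
    if x == 0 and y == 0:
        raise ValueError("zero has no first quadrant associate")
    # Multiplying x + yi by i gives -y + xi, a rotation by 90 degrees.
    # At most three rotations are needed to reach x > 0, y >= 0.
    while not (x > 0 and y >= 0):
        x, y = -y, x
    return (x, y)
```

Exactly one of the four associates of a non-zero Gaussian integer satisfies the stopping condition, so the loop always terminates with a unique result.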

Acknowledgements

I wish to thank the referees for useful criticisms and suggestions. This work was partially supported by NSF Grant CCR-9712246.

References

Caviness, B. F., Collins, G. E. (1976). Algorithms for Gaussian integer arithmetic. In Jenks, R. D. ed., Proceedings of 1976 ACM Symposium on Symbolic and Algebraic Computation (SYMSAC '76), pp. 36–45. New York, ACM Press.

Collins, G. E. (1974). The computing time of the Euclidean algorithm. SIAM J. Comput., 3, 1–10.

Knuth, D. E. (1997). The Art of Computer Programming, 3rd edn, volume 2, Seminumerical Algorithms, Reading, MA, Addison-Wesley.

Rolletschek, H. (1986). On the number of divisions of the Euclidean algorithm applied to Gaussian integers. J. Symb. Comput., 2, 261–291.

Weilert, A. (2000a). (1 + i)-ary GCD computation in Z[i] as an analogue to the binary GCD algorithm. J. Symb. Comput., 30, 605–617.

Weilert, A. (2000b). Asymptotically fast GCD computation in Z[i]. In Bosma, W. ed., Proceedings of the Fourth International Number Theory Symposium ANTS IV (Leiden, The Netherlands, July 2–7, 2000), LNCS 1838, pp. 595–613. Berlin, Springer.

Received 10 May 2001 Accepted 9 November 2001
