
Libya Free Higher Studies Academy / Misrata

Design and Analysis of Algorithms

Report title:

The Standard Matrix Multiplication & Strassen's Algorithm for Matrix Multiplication

Prepared by the student: Basma Mohammed Algubii
Lecturer: Dr. Adel Smeda



Autumn 2011-2012

Contents

Introduction
Standard Matrix Multiplication
Divide-and-Conquer Multiplication Algorithm
Strassen's Method
Comparison
Can we do better?
Calculation time
Conclusion
Code

Introduction
The multiplication of two matrices is one of the most basic operations of linear algebra and scientific computing, and it has provided an important focus in the search for methods to speed up scientific computation. Any speedup in matrix multiplication can improve the performance of a wide variety of numerical algorithms. Much less effort has been given to the investigation of alternative algorithms whose asymptotic complexity is less than the O(m^3) operations required by the conventional algorithm to multiply two m x m matrices. The standard algorithm for multiplying two square matrices of size n requires n^3 multiplications and n^2(n - 1) additions.

Strassen's algorithm reduces the number of required multiplications to n^(log2 7), approximately n^2.81, if n is a power of 2.

Standard Matrix Multiplication


Given two n x n matrices A = (a_ij), i, j = 1..n, and B = (b_ij), i, j = 1..n, their product C = A . B is defined as follows:

    c_ij = sum over k = 1..n of a_ik * b_kj

Therefore, to compute the matrix product, we need to compute n^2 matrix entries. A naive approach takes n multiplications and n - 1 additions for each entry.

IDEA: view an n x n matrix as a 2 x 2 matrix of (n/2) x (n/2) submatrices:

    [ r  s ]   [ a  b ]   [ e  f ]
    [ t  u ] = [ c  d ] . [ g  h ]        (C = A . B)

    r = a.e + b.g
    s = a.f + b.h
    t = c.e + d.g
    u = c.f + d.h

This requires 8 recursive multiplications of (n/2) x (n/2) submatrices and 4 additions of (n/2) x (n/2) submatrices.

Analysis of the D&C algorithm: T(n) = 8 T(n/2) + O(n^2), where 8 is the number of submatrix products, n/2 is the submatrix size, and the O(n^2) term is the work of adding submatrices. By the Master Theorem, T(n) = O(n^3): no better than the ordinary algorithm.

Pseudocode:

    for i := 0 .. n-1 do
        for j := 0 .. n-1 do
            C[i, j] := 0
            for k := 0 .. n-1 do
                C[i, j] := C[i, j] + A[i, k] * B[k, j]
            end
        end
    end
    return C

Running time is O(n^3).

Example: A = [ 5 3 ; 2 1 ], B = [ 1 7 ; 3 4 ]

    r = 5x1 + 3x3 = 14
    s = 5x7 + 3x4 = 47
    t = 2x1 + 1x3 = 5
    u = 2x7 + 1x4 = 18

    C = A . B = [ 14 47 ; 5 18 ]

Divide-and-Conquer Multiplication Algorithm


For simplicity assume that n is a power of 2. To compute the product of the matrices, we subdivide each of them into four (n/2) x (n/2) submatrices, so that the equation C = A . B takes the form:

    [ C11 C12 ]   [ A11 A12 ]   [ B11 B12 ]
    [ C21 C22 ] = [ A21 A22 ] . [ B21 B22 ]

This matrix equation corresponds to the following four equations on the submatrices:

    C11 = A11 . B11 + A12 . B21
    C12 = A11 . B12 + A12 . B22
    C21 = A21 . B11 + A22 . B21
    C22 = A21 . B12 + A22 . B22

Divide-and-Conquer Multiplication Pseudocode

    Matrix-Multiply-Recursive(A, B)
     1. n = A.rows
     2. let C be a new n x n matrix
     3. if n == 1
     4.     c11 = a11 . b11
     5. else partition each of A, B, C into four submatrices
     6.     C11 = Matrix-Multiply-Recursive(A11, B11) + Matrix-Multiply-Recursive(A12, B21)
     7.     C12 = Matrix-Multiply-Recursive(A11, B12) + Matrix-Multiply-Recursive(A12, B22)
     8.     C21 = Matrix-Multiply-Recursive(A21, B11) + Matrix-Multiply-Recursive(A22, B21)
     9.     C22 = Matrix-Multiply-Recursive(A21, B12) + Matrix-Multiply-Recursive(A22, B22)
    10. return C

Divide-and-Conquer Multiplication Running Time. Using index calculation, we can execute Step 5 in O(1) time (in contrast to the O(n^2) that would be required if we created the submatrices and copied their entries). However, that does not make a difference asymptotically. The running time T(n) of Matrix-Multiply-Recursive on n x n matrices satisfies the recurrence:

    T(n) = 8 T(n/2) + O(n^2)

By the Master Theorem, T(n) = O(n^3), which is unfortunately no faster than the naive method Matrix-Multiply.

Divide-and-Conquer Multiplication Drawback. Each time we split the matrix sizes in half, but we do not actually reduce the total amount of work. Assume that naive matrix multiplication takes c . n^3 time. Then computing each product of submatrices takes c . (n/2)^3 = c . n^3 / 8, and we need eight such products, resulting in a total time of 8 . c . n^3 / 8 = c . n^3 (plus overhead), which is no better than simply doing the multiplication in the naive way. In contrast, consider Merge-Sort with the running time recurrence T(n) = 2 T(n/2) + O(n). Even if we did naive quadratic (that is, c . n^2 time) sorting for each of the two subproblems, the total time would be 2 . c . (n/2)^2 = c . n^2 / 2 (plus an overhead of O(n)), which is faster than naive sorting of the whole problem by a factor of 2. This tells us that divide-and-conquer sorting may be more efficient than naive sorting (and it indeed is, as the Master Theorem proves).

Strassen's Method
The idea behind Strassen's method is to reduce the number of multiplications at each recursive call from eight to seven. That makes the recursion tree slightly less bushy. Strassen's method has four steps:

1. Divide the input matrices A and B into submatrices as before, using index calculations in O(1) time.
2. Create ten (n/2) x (n/2) matrices S1, S2, ..., S10, each equal to the sum or difference of two submatrices created in Step 1. This step takes O(n^2) time.
3. Using the submatrices created in Steps 1 and 2, recursively compute seven products P1, P2, ..., P7, each of size (n/2) x (n/2).
4. Compute the submatrices of C by adding and subtracting various combinations of the matrices Pi. This step takes O(n^2) time.

The running time of Strassen's method satisfies the recurrence:

    T(n) = 7 T(n/2) + O(n^2)

Strassen's IDEA: multiply 2 x 2 matrices with only 7 recursive multiplications.

    P1 = a . (f - h)            r = P5 + P4 - P2 + P6
    P2 = (a + b) . h            s = P1 + P2
    P3 = (c + d) . e            t = P3 + P4
    P4 = d . (g - e)            u = P5 + P1 - P3 - P7
    P5 = (a + d) . (e + h)
    P6 = (b - d) . (g + h)
    P7 = (a - c) . (e + f)

7 multiplications, 18 additions/subtractions. Note: the formulas do not rely on commutativity of multiplication, so they remain valid when the entries are submatrices.

Check, for example, the (1,2) entry:

    s = P1 + P2 = a . (f - h) + (a + b) . h = a.f - a.h + a.h + b.h = a.f + b.h

Strassen's algorithm:

1. Divide: partition A and B into (n/2) x (n/2) submatrices. Form the terms to be multiplied using + and -.
2. Conquer: perform 7 multiplications of (n/2) x (n/2) submatrices recursively.
3. Combine: form C using + and - on (n/2) x (n/2) submatrices.

    T(n) = 7 T(n/2) + O(n^2)

Analysis of Strassen: T(n) = 7 T(n/2) + O(n^2). Here n^(log_b a) = n^(lg 7), approximately n^2.81, so Case 1 of the Master Theorem gives T(n) = O(n^(lg 7)). The number 2.81 may not seem much smaller than 3, but because the difference is in the exponent, the impact on running time is significant. In fact, Strassen's algorithm beats the ordinary algorithm on today's machines for n >= 30 or so. The best bound to date (of theoretical interest only) is O(n^2.376).

Pseudocode:

    StrassenMatrixMultiplication(matrix A, matrix B)
        if size < cross-over point then
            calculate matrix using the naive method
        else
            divide matrix A into submatrices A11, A12, A21, A22
            divide matrix B into submatrices B11, B12, B21, B22
            P1 = (A11 + A22) * (B11 + B22)   by recursive StrassenMatrixMultiplication
            P2 = (A21 + A22) * B11           by recursive StrassenMatrixMultiplication
            P3 = A11 * (B12 - B22)           by recursive StrassenMatrixMultiplication
            P4 = A22 * (B21 - B11)           by recursive StrassenMatrixMultiplication
            P5 = (A11 + A12) * B22           by recursive StrassenMatrixMultiplication
            P6 = (A21 - A11) * (B11 + B12)   by recursive StrassenMatrixMultiplication
            P7 = (A12 - A22) * (B21 + B22)   by recursive StrassenMatrixMultiplication
            C11 = P1 + P4 - P5 + P7
            C12 = P3 + P5
            C21 = P2 + P4
            C22 = P1 - P2 + P3 + P6
            combine submatrices C11, C12, C21, C22 into matrix C
        return C as the result of the function

Example: A = [ 5 3 ; 2 1 ], B = [ 1 7 ; 3 4 ]  (so a = 5, b = 3, c = 2, d = 1 and e = 1, f = 7, g = 3, h = 4)

    P1 = a . (f - h) = 5 x (7 - 4) = 15
    P2 = (a + b) . h = (5 + 3) x 4 = 32
    P3 = (c + d) . e = (2 + 1) x 1 = 3
    P4 = d . (g - e) = 1 x (3 - 1) = 2
    P5 = (a + d) . (e + h) = (5 + 1) x (1 + 4) = 30
    P6 = (b - d) . (g + h) = (3 - 1) x (3 + 4) = 14
    P7 = (a - c) . (e + f) = (5 - 2) x (1 + 7) = 24

    r = P5 + P4 - P2 + P6 = 30 + 2 - 32 + 14 = 14
    s = P1 + P2 = 15 + 32 = 47
    t = P3 + P4 = 3 + 2 = 5
    u = P5 + P1 - P3 - P7 = 30 + 15 - 3 - 24 = 18

    C = [ 14 47 ; 5 18 ]

Comparison
                        Standard        Strassen
    Multiplications     n^3             n^2.81
    Add/Sub             n^3 - n^2       6 n^2.81 - 6 n^2
    Time complexity     O(n^3)          O(n^2.81)

Can we do better?
Is O(n^3) the best we can do? Can we multiply matrices in o(n^3) time? It seems like any algorithm to multiply matrices must take Omega(n^3) time: we must compute n^2 entries, and each entry is the sum of n terms. But with Strassen's method, we can multiply matrices in o(n^3) time: Strassen's algorithm runs in O(n^(lg 7)) time, and since 2.80 < lg 7 < 2.81, it runs in O(n^2.81) time.

Notes: Strassen's algorithm was the first to beat O(n^3) time, but it is not the asymptotically fastest known; a method by Coppersmith and Winograd runs in O(n^2.376) time. Several practical issues weigh against Strassen's algorithm: it has a higher constant factor than the obvious O(n^3)-time method; it is not good for sparse matrices; it is not numerically stable, since larger errors accumulate than in the naive method; and the submatrices consume space, especially if they are copied. Various researchers have tried to find the crossover point, where Strassen's algorithm starts to run faster than the naive O(n^3)-time method.


Theoretical analyses (that ignore caches and hardware pipelines) have produced crossover points as low as n = 8, and practical experiments have found crossover points as low as n = 400.

[Figure omitted: calculation time comparison by matrix size (less is better).]




Conclusion

The standard method of multiplying two n x n matrices takes O(n^3) operations. Strassen's algorithm is a divide-and-conquer algorithm that is asymptotically faster, i.e. O(n^(lg 7)). The usual multiplication of two 2 x 2 matrices takes 8 multiplications and 4 additions. Strassen showed how two 2 x 2 matrices can be multiplied using only 7 multiplications and 18 additions. For 2 x 2 matrices of numbers there is no benefit in using the method. To see where it helps, think about multiplying two (2k) x (2k) matrices. For this problem, the scalar multiplications and additions become multiplications and additions of k x k matrices. An addition of two such matrices requires O(k^2) time, while a multiplication requires O(k^3). Hence multiplications are much more expensive, and it makes sense to trade one multiplication operation for 18 additions.



Code of standard matrix multiplication in the C++ language

#include <iostream>
using namespace std;

int main() {
    int a[3][3], b[3][3], c[3][3];
    int i, j, k;

    cout << "Enter Matrix A: ";
    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++)
            cin >> a[i][j];

    cout << "Enter Matrix B: ";
    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++)
            cin >> b[i][j];

    // c[i][j] is the dot product of row i of a and column j of b
    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++) {
            c[i][j] = 0;
            for (k = 0; k < 3; k++)
                c[i][j] += a[i][k] * b[k][j];
        }

    cout << "The resultant matrix is\n";
    for (i = 0; i < 3; i++) {
        for (j = 0; j < 3; j++)
            cout << c[i][j] << " ";   // print the product, not the input matrix
        cout << endl;
    }
    return 0;
}

Code of Strassen matrix multiplication for 2 x 2 matrices in the C++ language

#include <iostream>
using namespace std;

int main() {
    int a[2][2], b[2][2], c[2][2], i, j;
    int m1, m2, m3, m4, m5, m6, m7;

    cout << "Enter the 4 elements of the first matrix: ";
    for (i = 0; i < 2; i++)
        for (j = 0; j < 2; j++)
            cin >> a[i][j];

    cout << "Enter the 4 elements of the second matrix: ";
    for (i = 0; i < 2; i++)
        for (j = 0; j < 2; j++)
            cin >> b[i][j];

    cout << "\nThe first matrix is\n";
    for (i = 0; i < 2; i++) {
        cout << "\n";
        for (j = 0; j < 2; j++)
            cout << a[i][j] << "\t";
    }

    cout << "\nThe second matrix is\n";
    for (i = 0; i < 2; i++) {
        cout << "\n";
        for (j = 0; j < 2; j++)
            cout << b[i][j] << "\t";
    }

    // Strassen's seven products
    m1 = (a[0][0] + a[1][1]) * (b[0][0] + b[1][1]);
    m2 = (a[1][0] + a[1][1]) * b[0][0];
    m3 = a[0][0] * (b[0][1] - b[1][1]);
    m4 = a[1][1] * (b[1][0] - b[0][0]);
    m5 = (a[0][0] + a[0][1]) * b[1][1];
    m6 = (a[1][0] - a[0][0]) * (b[0][0] + b[0][1]);
    m7 = (a[0][1] - a[1][1]) * (b[1][0] + b[1][1]);

    // Combine the products into the entries of C
    c[0][0] = m1 + m4 - m5 + m7;
    c[0][1] = m3 + m5;
    c[1][0] = m2 + m4;
    c[1][1] = m1 - m2 + m3 + m6;

    cout << "\nAfter multiplication using Strassen's method\n";
    for (i = 0; i < 2; i++) {
        cout << "\n";
        for (j = 0; j < 2; j++)
            cout << c[i][j] << "\t";
    }
    return 0;
}


Code of recursive Strassen matrix multiplication for n x n matrices in the C++ language


#include <iostream>
#include <cstdlib>
using namespace std;

int **cremem(int);
int **readmat(int);
int **strassen(int **, int **, int);
int **addmat(int **, int **, int, int);
int **collect(int **, int **, int **, int **, int);
void dismat(int **, int);

int main() {
    int n, **a, **b, **ree;
    cout << "\n\t\tSTRASSEN'S MATRIX MULTIPLICATION USING D AND C";
    cout << "\n\t\t...............................";
    cout << "\nEnter the order of the matrix: ";
    cin >> n;
    cout << "\nEnter the first matrix elements: ";
    a = readmat(n);
    cout << "\nEnter the second matrix elements: ";
    b = readmat(n);
    ree = strassen(a, b, n);
    cout << "\nFirst matrix\n";
    dismat(a, n);
    cout << "\nSecond matrix\n";
    dismat(b, n);
    cout << "\nResult matrix\n";
    dismat(ree, n);
    free(ree);
    return 0;
}

// Read an n x n matrix; allocate one extra row and column of zeros
// when n is odd so the matrix can be split evenly.
int **readmat(int no) {
    int r, c, **mat;
    mat = cremem(no + (no % 2));
    for (r = 0; r < no; r++)
        for (c = 0; c < no; c++)
            cin >> mat[r][c];
    return mat;
}

// Allocate a zero-initialized no x no matrix.
int **cremem(int no) {
    int i, **a;
    a = (int **)calloc(no, sizeof(int *));
    for (i = 0; i < no; i++)
        a[i] = (int *)calloc(no, sizeof(int));
    return a;
}

int **strassen(int **a, int **b, int no) {
    int r, c, **res, mv;
    int **a11, **a12, **a21, **a22;
    int **b11, **b12, **b21, **b22;
    int **c11, **c12, **c21, **c22;
    int **P, **Q, **R, **S, **T, **U, **V, **temp;

    if (no == 1) {               // base case: scalar product
        temp = cremem(no);
        temp[0][0] = a[0][0] * b[0][0];
        return temp;
    }

    no = no + (no % 2);          // pad odd sizes to the next even size
    mv = no / 2;
    a11 = cremem(mv); a12 = cremem(mv); a21 = cremem(mv); a22 = cremem(mv);
    b11 = cremem(mv); b12 = cremem(mv); b21 = cremem(mv); b22 = cremem(mv);

    // Step 1: partition A and B into four (no/2) x (no/2) submatrices.
    for (r = 0; r < mv; r++)
        for (c = 0; c < mv; c++) {
            a11[r][c] = a[r][c];          a12[r][c] = a[r][c + mv];
            a21[r][c] = a[r + mv][c];     a22[r][c] = a[r + mv][c + mv];
            b11[r][c] = b[r][c];          b12[r][c] = b[r][c + mv];
            b21[r][c] = b[r + mv][c];     b22[r][c] = b[r + mv][c + mv];
        }

    // Steps 2-3: the seven recursive products.
    P = strassen(addmat(a11, a22, mv, 1), addmat(b11, b22, mv, 1), mv);
    Q = strassen(addmat(a21, a22, mv, 1), b11, mv);
    R = strassen(a11, addmat(b12, b22, mv, -1), mv);
    S = strassen(a22, addmat(b21, b11, mv, -1), mv);
    T = strassen(addmat(a11, a12, mv, 1), b22, mv);
    U = strassen(addmat(a21, a11, mv, -1), addmat(b11, b12, mv, 1), mv);
    V = strassen(addmat(a12, a22, mv, -1), addmat(b21, b22, mv, 1), mv);

    // Step 4: combine the products into the four blocks of C.
    c11 = addmat(addmat(addmat(P, S, mv, 1), T, mv, -1), V, mv, 1);
    c12 = addmat(R, T, mv, 1);
    c21 = addmat(Q, S, mv, 1);
    c22 = addmat(addmat(addmat(P, R, mv, 1), Q, mv, -1), U, mv, 1);

    res = collect(c11, c12, c21, c22, no);
    return res;
}

// res = a + oper * b, where oper is +1 or -1.
int **addmat(int **a, int **b, int no, int oper) {
    int **res, r, c;
    res = cremem(no);
    for (r = 0; r < no; r++)
        for (c = 0; c < no; c++)
            res[r][c] = a[r][c] + oper * b[r][c];
    return res;
}

// Assemble the four blocks into one no x no matrix.
int **collect(int **c11, int **c12, int **c21, int **c22, int no) {
    int **res, r, c;
    res = cremem(no);
    for (r = 0; r < no / 2; r++)
        for (c = 0; c < no / 2; c++) {
            res[r][c] = c11[r][c];
            res[r][c + no / 2] = c12[r][c];
            res[r + no / 2][c] = c21[r][c];
            res[r + no / 2][c + no / 2] = c22[r][c];
        }
    return res;
}

void dismat(int **d, int no) {
    int r, c;
    for (r = 0; r < no; r++) {
        for (c = 0; c < no; c++)
            cout << "\t" << d[r][c];
        cout << "\n";
    }
}

