Description
-
Below I have appended the three loop versions of matrix multiply. Please do the following experiments (preferably C or C++) using matrix sizes of 200 x 200, 400 x 400 and 800 x 800:
-
-
Implement Loop I and run for various values (sizes) of loop variable n. Make sure that the compiler optimization is turned off.
-
-
-
Compare the performance time of Loop I to Loop II. Again make sure that the optimization is turned off.
-
Recompile and rerun Loop I with the optimization turned on. Compare the execution time Loop I with the compiler optimization turned on and turned off (you will need to look at different values of the problem size n) for Loop I.
-
-
-
Compare the Loop I time with the optimization turned on with the Loop II time with the compiler optimization turned off.
-
With the compiler optimization option turned off and then on, compile and execute Loop III and compare the times.
-
Find the optimum size for the parameter s – i.e., the block size for your cache.
-
-
Loop I
for i = 1 to n do
for j = 1 to n do
for k = 1 to n do
C[i,j] = C[i,j] + A[i,k] x B[k,j]
endfor
endfor
endfor
Loop II
for i = 1 to n do
for k = 1 to n do
for j = 1 to n do
C[i,j] = C[i,j] + A[i,k] x B[k,j]
endfor
endfor
Loop III
for it = 1 to n by s do
for kt = 1 to n by s do
for jt = 1 to n by s do
for i = it to min(it+s-1,n) do
for k = kt to min(kt+s-1,n) do
for j = jt to min(jt+s-1,n) do
C[i,j] = C[i,j] + A[i,k] x B[k,j]
endfor
endfor
endfor
endfor
endfor
endfor
Note: For your convenience, I have attached a matrix times vector program that should compile and run. Use this working program as the basis for you matrix times matrix exercise.