Description
Assume for arithmetic, load/store, and branch instructions, a processor has CPIs of 1, 12, and 5, respectively. Also assume that on a single processor a program requires the execution of 2.56E9 arithmetic instructions, 1.28E9 load/store instructions, and 256 million branch instructions. Assume that each processor has a 2 GHz clock frequency.
Assume that, as the program is parallelized to run over multiple cores, the number of arithmetic and load/store instructions per processor is divided by $0.7 \times p$ (where $p$ is the number of processors) but the number of branch instructions per processor remains the same.
(1) Find the total execution time for this program on 1, 2, 4, and 8 processors, and show the relative speedup of the 2, 4, and 8 processor result relative to the single processor result.
(2) If the CPI of the arithmetic instructions was doubled, what would the impact be on the execution time of the program on 1, 2, 4, or 8 processors?
(3) To what should the CPI of load/store instructions be reduced in order for a single processor to match the performance of four processors using the original CPI values?
Solution: (1) On one processor, the total cycle count is $1 \times 2.56 \times 10^{9} + 12 \times 1.28 \times 10^{9} + 5 \times 2.56 \times 10^{8} = 1.92 \times 10^{10}$. The total execution time is the cycle count divided by the clock frequency, so the execution time for one processor is $1.92 \times 10^{10} / (2 \times 10^{9}) = 9.6$ s, that is, $9.6 \times 10^{9}$ ns.
In the same way, the total execution times for this program on 2, 4, and 8 processors are shown in Table 1. The branch term $5 \times 2.56 \times 10^{8} = 1.28 \times 10^{9}$ cycles is the same in every row.

| processor count | arithmetic inst. | L/S inst. | branch inst. | cycles | execution time (ns) | speedup |
|---|---|---|---|---|---|---|
| 1 | 2.56E9 | 1.28E9 | 2.56E8 | 1.92E10 | 9.60E9 | 1.00 |
| 2 | 1.83E9 | 9.14E8 | 2.56E8 | 1.41E10 | 7.04E9 | 1.36 |
| 4 | 9.14E8 | 4.57E8 | 2.56E8 | 7.68E9 | 3.84E9 | 2.50 |
| 8 | 4.57E8 | 2.29E8 | 2.56E8 | 4.48E9 | 2.24E9 | 4.29 |

Table 1: Execution time and speedup
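The arithmetic behind part (1) can be checked with a short script. This is only a sketch under the problem's stated assumptions; the names `exec_time`, `CPI`, `COUNT` and `CLOCK_HZ` are hypothetical helpers introduced here, not part of the assignment.

```python
# CPIs, instruction counts, and clock rate are taken from the problem statement.
CPI = {"arith": 1, "ls": 12, "branch": 5}
COUNT = {"arith": 2.56e9, "ls": 1.28e9, "branch": 2.56e8}
CLOCK_HZ = 2e9


def exec_time(p, cpi=CPI):
    """Execution time in seconds on p processors (hypothetical helper)."""
    # Arithmetic and load/store counts are divided by 0.7 * p once the
    # program is parallelized; the per-processor branch count is unchanged.
    scale = 1.0 if p == 1 else 0.7 * p
    cycles = (cpi["arith"] * COUNT["arith"] / scale
              + cpi["ls"] * COUNT["ls"] / scale
              + cpi["branch"] * COUNT["branch"])
    return cycles / CLOCK_HZ


t1 = exec_time(1)
for p in (1, 2, 4, 8):
    t = exec_time(p)
    print(f"p={p}: time={t:.2f} s, speedup={t1 / t:.2f}")
```

The branch term is not divided by the scale factor, which is why the speedup stays well below the processor count.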
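(2) Doubling the CPI of arithmetic instructions from 1 to 2 gives the execution times shown in the following table.

| processor count | execution time (ns) |
|---|---|
| 1 | 1.09E10 |
| 2 | 7.95E9 |
| 4 | 4.30E9 |
| 8 | 2.47E9 |

Table 2: Execution time after doubling the CPI of arithmetic instructions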
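As a rough check of part (2), the sketch below recomputes the execution time with the arithmetic CPI as a parameter and reports the slowdown relative to the original CPI. The function name `exec_time` and its keyword arguments are assumptions made for illustration only.

```python
# Instruction counts and clock rate from the problem statement.
ARITH, LS, BRANCH = 2.56e9, 1.28e9, 2.56e8
CLOCK_HZ = 2e9


def exec_time(p, cpi_arith=1, cpi_ls=12, cpi_branch=5):
    """Execution time in seconds on p processors (hypothetical helper)."""
    # Only arithmetic and load/store counts scale with 0.7 * p.
    scale = 1.0 if p == 1 else 0.7 * p
    cycles = (cpi_arith * ARITH + cpi_ls * LS) / scale + cpi_branch * BRANCH
    return cycles / CLOCK_HZ


for p in (1, 2, 4, 8):
    before = exec_time(p)
    after = exec_time(p, cpi_arith=2)  # arithmetic CPI doubled
    print(f"p={p}: {before:.2f} s -> {after:.2f} s "
          f"(x{after / before:.3f})")
```

The relative slowdown shrinks as the processor count grows, because the unscaled branch cycles take up a larger share of the total.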
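(3) Suppose the CPI of load/store instructions is reduced to $x$. Performance is measured by execution time, and both configurations run at the same clock rate, so it suffices to compare cycle counts. The cycle count on one processor is
$2.56 \times 10^{9} + 1.28 \times 10^{9} x + 2.56 \times 10^{8} \times 5 = 3.84 \times 10^{9} + 1.28 \times 10^{9} x$.
With the original CPI values, the cycle count on four processors is
$2.56 \times 10^{9}/(0.7 \times 4) + 1.28 \times 10^{9} \times 12/(0.7 \times 4) + 2.56 \times 10^{8} \times 5 = 7.68 \times 10^{9}$.
To match the performance, minimize the absolute difference of the two results above. It is zero when $3.84 \times 10^{9} + 1.28 \times 10^{9} x = 7.68 \times 10^{9}$, that is, when $x = 3$.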
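The linear equation in part (3) can be solved numerically as a sanity check. This is a minimal sketch; the variable names are illustrative and not from the assignment.

```python
# Instruction counts from the problem statement.
ARITH, LS, BRANCH = 2.56e9, 1.28e9, 2.56e8

# Cycle count on four processors with the original CPIs (1, 12, 5).
target_cycles = (1 * ARITH + 12 * LS) / (0.7 * 4) + 5 * BRANCH

# Single-processor cycles that do not depend on the load/store CPI x.
fixed_cycles = 1 * ARITH + 5 * BRANCH

# Solve fixed_cycles + LS * x == target_cycles for x.
x = (target_cycles - fixed_cycles) / LS
print(f"required load/store CPI: {x:.2f}")
```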
Course Name: Principles of Computer Organization | Assignment 1 | 易翔
Problem 2 (1.11) Score:
The results of the SPEC CPU2006 bzip2 benchmark running on an AMD Barcelona have an instruction count of 2.389E12, an execution time of 750 s, and a reference time of 9650 s.
(1) Find the CPI if the clock cycle time is 0.333 ns.
(2) Find the SPECratio.
(3) Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% without affecting the CPI.
(4) Find the increase in CPU time if the number of instructions of the benchmark is increased by 10% and the CPI is increased by 5%.
(5) Find the change in the SPECratio for this change.
(6) Suppose that we are developing a new version of the AMD Barcelona processor with a 4 GHz clock rate. We have added some additional instructions to the instruction set in such a way that the number of instructions has been reduced by 15%. The execution time is reduced to 700 s and the new SPECratio is 13.7. Find the new CPI.
(7) This CPI value is larger than obtained in 1.11.1 as the clock rate was increased from 3 GHz to 4 GHz. Determine whether the increase in the CPI is similar to that of the clock rate. If they are dissimilar, why?
(8) By how much has the CPU time been reduced?
(9) For a second benchmark, libquantum, assume an execution time of 960 ns, CPI of 1.61, and clock rate of 3 GHz. If the execution time is reduced by an additional 10% without affecting the CPI and with a clock rate of 4 GHz, determine the number of instructions.
(10) Determine the clock rate required to give a further 10% reduction in CPU time while maintaining the number of instructions and with the CPI unchanged.
(11) Determine the clock rate if the CPI is reduced by 15% and the CPU time by 20% while the number of instructions is unchanged.
Solution: (1) CPU time = Instruction count $\times$ CPI $\times$ clock cycle time, so $\mathrm{CPI} = \dfrac{750\ \mathrm{s}}{2.389 \times 10^{12} \times 0.333 \times 10^{-9}\ \mathrm{s}} \approx 0.943$.
(2) The SPECratio is $\dfrac{T_{ref}}{T_{actual}} = \dfrac{9650\ \mathrm{s}}{750\ \mathrm{s}} \approx 12.87$.
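Parts (1) and (2) are direct substitutions into the CPU time formula and the SPECratio definition. The sketch below reproduces them; the variable names are illustrative assumptions.

```python
# Values from the problem statement (bzip2 on AMD Barcelona).
instr_count = 2.389e12
exec_time_s = 750.0
ref_time_s = 9650.0
cycle_time_s = 0.333e-9

# CPU time = instruction count * CPI * clock cycle time, solved for CPI.
cpi = exec_time_s / (instr_count * cycle_time_s)

# SPECratio = reference time / measured execution time.
spec_ratio = ref_time_s / exec_time_s

print(f"CPI = {cpi:.3f}, SPECratio = {spec_ratio:.2f}")
```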
(3) Since CPU time = Instruction count $\times$ CPI $\times$ clock cycle time, if the number of instructions of the benchmark is increased by 10% with the CPI unchanged, the CPU time is increased by 10%.
(4) By the same formula as in (3), $T_{new}/T_{old} = 1.1 \times 1.05 = 1.155$ ($T_{old}$ represents the CPU time before the change, and $T_{new}$ represents the CPU time after the change). That is, the CPU time is increased by 15.5%.
(5) The change in the SPECratio follows from the change in execution time, because
$$\frac{\mathrm{SPECratio}_{new}}{\mathrm{SPECratio}_{old}} = \frac{T_{ref}/T_{actual,new}}{T_{ref}/T_{actual,old}} = \frac{T_{actual,old}}{T_{actual,new}} = \frac{1}{1.155} \approx 0.866.$$
That is, the SPECratio is decreased by 13.4%.
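Parts (3) to (5) only involve scale factors, so they can be checked without the absolute numbers. The helper `cpu_time_factor` below is a hypothetical name used for this sketch.

```python
def cpu_time_factor(instr_factor, cpi_factor, cycle_factor=1.0):
    """Factor by which CPU time changes when each term of
    instruction count * CPI * clock cycle time is scaled."""
    return instr_factor * cpi_factor * cycle_factor


f3 = cpu_time_factor(1.10, 1.00)  # part (3): +10% instructions only
f4 = cpu_time_factor(1.10, 1.05)  # part (4): +10% instructions, +5% CPI

# SPECratio is inversely proportional to execution time.
spec_factor = 1.0 / f4

print(f"(3) CPU time x{f3:.3f}")
print(f"(4) CPU time x{f4:.3f}")
print(f"(5) SPECratio x{spec_factor:.3f}")
```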
(6) CPU time = Instruction count $\times$ CPI $\times$ clock cycle time, so $\mathrm{CPI} = \dfrac{700 \times 4 \times 10^{9}}{0.85 \times 2.389 \times 10^{12}} \approx 1.38$.
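(7) The clock rate ratio is $4\ \mathrm{GHz}/3\ \mathrm{GHz} \approx 1.33$. The CPI ratio is $1.38/0.94 \approx 1.47$. So they are dissimilar. Since CPI = CPU time $\times$ clock rate / instruction count, the CPI ratio equals the clock rate ratio only if the CPU time and the instruction count change by the same factor. Here the number of instructions was reduced by 15% while the CPU time only decreased from 750 s to 700 s, so the CPI grew more than the clock rate.
(8) $700\ \mathrm{s}/750\ \mathrm{s} \approx 0.933$, so the CPU time has been reduced by 6.7%.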
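The comparison of the CPI ratio with the clock rate ratio can be reproduced as follows. This is a sketch with illustrative variable names; it uses unrounded CPI values, so the CPI ratio comes out near 1.46 rather than the 1.47 obtained from the rounded figures.

```python
# Original machine: values from the problem statement.
old_instr = 2.389e12
old_time_s = 750.0
old_cycle_s = 0.333e-9
old_cpi = old_time_s / (old_instr * old_cycle_s)

# New machine: 4 GHz clock, 15% fewer instructions, 700 s execution time.
new_clock_hz = 4e9
new_instr = 0.85 * old_instr
new_time_s = 700.0
new_cpi = new_time_s * new_clock_hz / new_instr

clock_ratio = 4.0 / 3.0            # 4 GHz versus 3 GHz
cpi_ratio = new_cpi / old_cpi      # how much the CPI grew
time_ratio = new_time_s / old_time_s

print(f"new CPI = {new_cpi:.2f}")
print(f"clock ratio = {clock_ratio:.2f}, CPI ratio = {cpi_ratio:.2f}")
print(f"CPU time reduced by {(1 - time_ratio) * 100:.1f}%")
```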
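(9) Number of instructions = CPU time $\times$ clock rate / CPI $= \dfrac{960 \times 0.9 \times 4 \times 10^{9}}{1.61} \approx 2.147 \times 10^{12}$. This reads the execution time as 960 s; with the 960 ns given in the problem statement, the same expression gives about $2.147 \times 10^{3}$ instructions.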
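(10) Only the clock rate is changed to reduce the CPU time. Clock rate = Number of instructions $\times$ CPI / CPU time, so the new clock rate is $3\ \mathrm{GHz} \times \dfrac{1}{0.9} \approx 3.33\ \mathrm{GHz}$.
(11) Both the clock rate and the CPI are changed to reduce the CPU time. Clock rate = Number of instructions $\times$ CPI / CPU time, so the new clock rate is $3\ \mathrm{GHz} \times \dfrac{0.85}{0.8} \approx 3.19\ \mathrm{GHz}$.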
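The last three parts rearrange the same CPU time formula. The sketch below assumes the execution time is 960 seconds, which is the reading that yields an instruction count of about 2.147E12; that unit choice and all variable names are assumptions of this example.

```python
# libquantum values from the problem statement.
cpi = 1.61
base_clock_hz = 3e9
base_time = 960.0  # ASSUMPTION: treated as seconds, not nanoseconds

# Part (9): time reduced by 10%, CPI unchanged, 4 GHz clock.
instr_count = (base_time * 0.9) * 4e9 / cpi

# Part (10): 10% less CPU time, same instruction count and CPI.
# Clock rate is inversely proportional to CPU time.
clock_10 = base_clock_hz / 0.9

# Part (11): CPI reduced by 15%, CPU time reduced by 20%.
# Clock rate = instruction count * CPI / CPU time.
clock_11 = base_clock_hz * 0.85 / 0.8

print(f"(9)  instructions = {instr_count:.3e}")
print(f"(10) clock rate = {clock_10 / 1e9:.2f} GHz")
print(f"(11) clock rate = {clock_11 / 1e9:.2f} GHz")
```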