Description
Notes: You may discuss the problems with your peers but the submitted work must be your own work. No late assignment will be accepted. Submit a SOFT copy of your assignment to the blackboard *AND* submit a HARD copy of it to the COMP 303 mailboxes or bring it to the FINAL EXAM. This assignment is worth 4% of your total grade.
Problem 1
(10 pts) Assume a multiprocessor environment that implements a sequentially consistent memory model. We have two processors P 1 and P 2.
1 |
=====P1===== |
=====P2====== |
||||
2 |
a1: st 0x1, (0x1000) |
b1: st 0x3, (0x1000) |
||||
3 |
a2: ld $r1, (0x1000) |
b2: ld $r2, (0x1000) |
||||
4 |
a3: st |
0x2, |
(0x1000) |
b3: st |
0x4, |
(0x1000) |
5 |
a4: ld |
$r3, |
(0x1000) |
b4: ld |
$r4, |
(0x1000) |
Assume that the value at address 0x1000 is initialized to zero.
-
What are the possible values for $r2 after both processors complete their execution. List all possible values.
-
After both P1 and P2 have nished executing, we realized that ($r1, $r2, $r3, $r4) = (1, 3, 2, 4). List all the di erent instruction interleavings that produce this result.
1
COMP 303 – Comp Arch ( Due Date: Dec 29th, 2019 3.00 pm): HW4 PROBLEM 2
Problem 2
(12 points) Consider a memory system with the following parameters:
-
Translation Lookaside Bu er has 128 entires and it is 2-way set associative.
-
Page size is 8 KB.
-
The tag bits for TLB is 31 bits.
-
Physical memory is 32GB.
-
What is the virtual address size (in bits)?
-
What is the physical address size (in bits)?
-
How many physical pages are there?
-
What is the size of virtual memory in TB?
-
If the hit rate is 99% and hit time is 1 cycle to TLB, what is the average memory access time to the TBL if its miss penalty is 20 cycles when there is no page fault?
-
Assume that 0.02% of the time, there is a page fault, and the disk latency is 10,000 cycles. Then, what is the average access time for TLB?
Student Name: Page 2 of 4
COMP 303 – Comp Arch ( Due Date: Dec 29th, 2019 3.00 pm): HW4 PROBLEM 3
1
2
3
4
5
Problem 3
(10 points – 2 pts each)
for(i=0; i< N; i++){
for(j=0; j< N; j++){
A[i] = A[i] * B[j];
}
}
Assume that this loop nest is executed on a machine with one word cache blocks and a cache that always replaces the least recently used (LRU) block when space is needed. Assume A and B are very large arrays that they cannot t into the cache. Give concise answers to the following questions and use only the space provided under the question.
-
Explain the cache behaviour for A.
-
Explain the cache behaviour for B.
-
Explain whether the nested loop would bene t from a multi-word cache.
-
Explain whether the nested loop would bene t from a fully-set associative cache.
e)Explain whether the nested loop would bene t from random replacement.
Student Name: Page 3 of 4
COMP 303 – Comp Arch ( Due Date: Dec 29th, 2019 3.00 pm): HW4 PROBLEM 4
Problem 4
(10 points) Assuming a directed-mapped cache with 16 one-word blocks (4 byte blocks).
-
Find the number of tag bits, index bits, and byte o set bits in a 10-bit address.
-
Here is a series of byte address references in base ten: 4, 16, 32, 4, 6, 68, 144, 4. For the cache described above, label each reference in the list as a hit or a miss. Show the nal content of the cache.
-
Valid
tag
data
-
Byte Address
Hit or Miss
4
16
32
4
6
68
144
4
Student Name: Page 4 of 4