CS 2506 Spring 2016 MIPS03 --------------------------------------------------------------------------------- 1. Consider two caches. Both are 16 cache lines in size, each cache line is 16 bytes and both start out with all lines marked invalid. The only difference is one is two-way associative and another is direct-mapped. Assume DRAM uses 16 bit addresses. a) [8 points] Find a shortest possible reference stream of addresses where the two-way associative cache would get a hit and the direct-mapped cache would get no hits. Provide addresses of the reference stream in hex. Answer: There are 16 cache lines in size. There are 8 sets in a 2-way cache; 16 sets in a direct-mapped cache. Thus, the 16 bit address is divided as follows. 2-way (8 set) : tag 9 bit, set index 3 bit, block offset 4 bit direct (16 set): tag 8 bit, set index 4 bit, block offset 4 bit We should find a case where two addresses are mapped to a single set in both 2-way and direct-mapped cache. As 2-way associative cache can hold 2 lines in a set, the reference stream can result in a cache hit in 2-way cache, but no hit in direct-mapped cache. An example includes: 2-way direct A: 0x00 000000000 000 0000 miss (set 0, way 0) 00000000 0000 0000 miss (set 0) B: 0x10 000000010 000 0000 miss (set 0, way 1) 00000001 0000 0000 miss (set 0) A: 0x00 000000000 000 0000 hit (set 0, way 0) 00000000 0000 0000 miss (set 0) b) [8 points] Find a shortest possible reference stream of addresses where the direct-mapped cache would get a hit and the two-way associative cache would get no hits. Provide addresses of the reference stream in hex. Answer: There are 16 cache lines in size. There are 8 sets in a 2-way cache; 16 sets in a direct-mapped cache. Thus, the 16 bit address is divided as follows. 2-way (8 set) : tag 9 bit, set index 3 bit, block offset 4 bit direct (16 set): tag 8 bit, set index 4 bit, block offset 4 bit We should find a case where a sequence of addresses are mapped to a single set in a 2-way cache so that there will be a cache miss; while they are mapped to different sets in a direct-mapped cache, resulting in a cache hit. An example includes: 2-way direct A: 0x00 000000000 000 0000 miss (set 0, way 0) 00000000 0000 0000 miss (set 0) B: 0x08 000000001 000 0000 miss (set 0, way 1) 00000000 1000 0000 miss (set 8) C: 0x18 000000011 000 0000 miss (set 0, way 0) 00000001 1000 0000 miss (set 8) A: 0x00 000000000 000 0000 miss (set 0, way 1) 00000000 0000 0000 hit (set 0) 2. Consider the following access pattern: A, B, C, A. Assume that A, B, and C are memory addresses each of which are in a different block of memory. Suppose A, B, and C are generated in a uniformly random way and that a LRU (least recently used) replacement algorithm is used. LRU replacement means that if we have to choose a block to replace, we replace the one (in the set) that has not been accessed for the longest time; ties are resolved by making a random choice. Further, assume that any given block has an equal chance of being placed in either “way”. What is the probability that the second instance of A will be a hit if: a) [6 points] The cache has 2 lines and is fully-associative. Answer: A fully-associative cache has a single set, and a set has a two slots. [A, ] → [A,B] → [C,B] → miss always -- 0% b) [6 points] The cache has 4 lines and is fully-associative. Answer: A fully-associative cache has a single set, and a set has a four slots enough to hold all A, B, and C. [A, , , ] → [A,B, , ] → [A,B,C, ] → hit always -- 100% c) [6 points] The cache has 4 lines and is direct-mapped. Answer: A direct-mapped cache has 4 sets (one line per set). With random accesses, there will be 3/4 of chances that A remain in the acess after another access. [A] [A] [A] [ ] →A remain w/ prob 3/4 → … → 3/4 → … -- hit 3/4 * 3/4 = 9/16 … [B] [B] [ ] [ ] [C] d) [6 points] The cache has 4 lines and is two way associative. Answer: A 2-way cache has 2 sets (two lines per set). With random accesses, there will be 1/2 of chances that the next access belongs to the A's set. [A, ] → 1/2 → [A,B] → 1/2 → [C,B] → miss [ , ] [ , ] [ , ] ↘ 1/2 → [C,B] → hit -- 3/4 (75%) [A, ] ↘ 1/2 → [A, ] → 1/2 → [A,C] → hit [B, ] [B, ] ↘ 1/2 → [A, ] → hit [B,C] 3. [8 points] A system has an (unified) L1 cache and an L2 cache. What is the AMAT (in cycles) for the system with the following parameters? L1 hit time = 2 cycles L1 miss rate = 10% L2 hit time = 5 cycles L2 miss rate = 5% DRAM access time = 100 cycles Answer: AMAT = 2 + 0.1 * 5 + 0.1 * 0.05 * 100 = 3 4. [8 points] A system has a separate L1 cache for instructions (L1i), a L1 cache for data (L1d), and an unified L2 cache. Assume both L1i and L1d access time is 2 clock cycles; the L2 access time is 10 clock cycles; and the DRAM access time is 100 clock cycles. The L1i cache miss rate is 4%; the L1d cache miss rate is 10%; and the L2 cache miss rate is 1%. On average, 30% of instructions are loads or stores. The base CPI, assuming ideal cache performance, is 3. What is the actual average CPI for this system? Answer: Miss penalty cycles per instruction i-cache: 0.04 (L1i miss) * 0.99 (L2 hit) * 10 + 0.04 (L1i miss) * 0.01 (L2 miss) * 100 (mem access) = 0.436 d-cache: 0.30 (load/store) * 0.10 (L1d miss) * 0.99 (L2 hit) * 10 + 0.30 (load/store) * 0.10 (L1i miss) * 0.01 (L2 miss) * 100 (mem access) = 0.327 Actual CPI = 3 + 0.436 + 0.327 = 3.763 5. A system uses 32-bit addresses, and a direct-mapped cache. Byte offsets are determined using bits 4-0, set numbers using bits 9-5, and tags using bits 31-10. Suppose the following byte-addressed cache references are recorded. 0, 24, 6, 150, 238, 172, 1024, 24, 150, 3100, 190, 2200 a) [8 points] How many bytes of user data can the cache hold? Express your answers in terms of powers of 2 Answer: # of bytes per line: 2^5 # of lines per set: 1 # of sets: 2^5 Cache size = 2^5 * 1 * 2^5 = 2^10 b) [8 points] How many blocks are replaced? Answer: A directed map has a single line per set. A caceh block replacement happens on a conflicting access to a set with a valid entry. Therefore, we are interested in a set number (5bits in the middle). 0 …00 00000 00000 tag …00 set 00000 miss 24 …00 00000 11000 tag …00 set 00000 hit 6 …00 00000 00110 tag …00 set 00000 hit 150 …00 00100 10110 tag …00 set 00100 miss 238 …00 00111 01110 tag …00 set 00111 miss 172 …00 00101 01100 tag …00 set 00101 miss 1024 …01 00000 00000 tag …01 set 00000 miss replace …00 24 …00 00000 11000 tag …00 set 00000 miss replace …01 150 …00 00100 10110 tag …00 set 00100 hit 3100 …11 00000 11100 tag …11 set 00000 miss replace …00 190 …00 00101 11110 tag …00 set 00101 hit 2200 …10 00100 11000 tag …10 set 00100 miss replace …00 Four blocks are replaced. c) [8 points] What is the hit ratio? Answer: 4/12 = 1/3 6. [20 points] Consider a two-dimensional, N x N array of words A. This array is laid out in memory so that A[0][0] is next to A[0][1], and so on; A[0][N-1] is next to A[1][0], and so on (i.e., in row-major order). Assume that the cache is initially empty, but that A[0][0] maps to the first word of cache line 0. Consider the following matrix transpose algorithm: int tmp = 0; for (int i = 0; i < N; i++) { for (int j = i+1; j < N; j++) { tmp = A[ i ][ j ]; A[ i ][ j ] = A[ j ][ i ]; A[ j ][ i ] = tmp; } } For the following questions, assume that the variable tmp is register allocated (i.e., an access to tmp does not trigger a memory access, and thus has no cache effects); assume that the blocks of the array A are mapped into the cache so that block 0 goes into cache set 0, block 1 goes into cache set 1, and so forth. Suppose the algorithm is executed with a 4 x 4 matrix (4 rows, 4 columns); and suppose we have a direct-mapped cache with 4 sets, indexed 0 to 3, where each cache block encompasses 2 words (i.e., 2 integer values). For each memory access, find 1) its type (read or write), 2) accessed word, 3) whether it result in cache hit or miss, 4) newly brought cache block (on a miss), and 5) replaced cache block (if a valid block exists on a miss) similar to the following format. Note that the answer below is NOT correct for this question. R/W Accessed word Cache hit/miss Newly brought cache block Replaced cache block R A[0][1] miss A[0][0], A[0][1] R A[0][1] hit W A[2][0] miss A[2][0], A[2][1] A[0][0], A[0][1] … … … … … Answer: R/W Access Hit/Miss (Set) New block Replaced block R A[0][1] miss set 0 A[0][0],A[0][1] R A[1][0] miss set 2 A[1][0],A[1][1] W A[0][1] hit set 0 W A[1][0] hit set 2 R A[0][2] miss set 1 A[0][2],A[0][3] R A[2][0] miss set 0 A[2][0],A[2][1] A[0][0],A[0][1] W A[0][2] hit set 1 W A[2][0] hit set 0 R A[0][3] hit set 1 R A[3][0] miss set 2 A[3][0],A[3][1] A[1][0],A[1][1] W A[0][3] hit set 1 W A[3][0] hit set 2 R A[1][2] miss set 3 A[1][2],A[1][3] R A[2][1] hit set 0 W A[1][2] hit set 3 W A[2][1] hit set 0 R A[1][3] hit set 3 R A[3][1] hit set 2 W A[1][3] hit set 3 W A[3][1] hit set 2 R A[2][3] miss set 1 A[2][2],A[2][3] A[0][2],A[0][3] R A[3][2] miss set 3 A[3][2],A[3][3] A[1][2],A[1][3] W A[2][3] hit set 1 W A[3][2] hit set 3