INF 559 |
This is an individual project. You must run this lab on a 64-bit x86-64 machine. Clarifications and corrections will be posted on the course moodle.
This lab will help you understand the impact that cache memories can have on the performance of your C programs.
The lab consists of two parts. In the first part you will write a small C program (about 200-300 lines) that simulates the behavior of a cache memory. In the second part, you will optimize a small matrix rotate function, with the goal of minimizing the number of cache misses.
Start by downloading the cachelab-handout.tar archive. Copy cachelab-handout.tar to a protected Linux directory in which you plan to do your work. Then give the command
linux> tar xvf cachelab-handout.tar
This will create a directory called cachelab-handout that contains a number of files. You will be modifying two files: csim.c and rotate.c. To compile these files, type:
linux> make clean linux> make
WARNING: Do not let the Windows WinZip program or other unarchivers open up your .tar file (many Web browsers are set to do this automatically). Instead, save the file to your Linux directory and use the Linux tar program to extract the files. In general, for this class you should NEVER use any platform other than Linux to modify your files. Doing so can cause loss of data (and important work!).
The lab has two parts. In Part A you will implement a cache simulator. In Part B you will write a matrix rotate function that is optimized for cache performance.
The traces subdirectory of the handout directory contains a collection of reference trace files that we will use to evaluate the correctness of the cache simulator you write in Part A. The trace files are generated by a Linux program called valgrind. For example, typing
linux> valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l
on the command line runs the executable program “ls -l”, captures a trace of each of its memory accesses in the order they occur, and prints them on stdout.
Valgrind memory traces have the following form:
I 0400d7d4,8 M 0421c7f0,4 L 04f6b868,8 S 7ff0005c8,8
Each line denotes one or two memory accesses. The format of each line is
[space]operation address,size
The operation field denotes the type of memory access: “I” denotes an instruction load, “L” a data load, “S” a data store, and “M” a data modify (i.e., a data load followed by a data store). There is never a space before each “I”. There is always a space before each “M”, “L”, and “S”. The address field specifies a 64-bit hexadecimal memory address. The size field specifies the number of bytes accessed by the operation.
In Part A you will write a cache simulator in csim.c that takes a valgrind memory trace as input, simulates the hit/miss behavior of a cache memory on this trace, and outputs the total number of hits, misses, and evictions.
We have provided you with the binary executable of a reference
cache simulator, called csim-ref
, that simulates the
behavior of a cache with arbitrary size and associativity on a
valgrind trace file. It uses the LRU (least-recently used)
replacement policy when choosing which cache line to evict.
The reference simulator takes the following command-line arguments:
Usage: ./csim-ref [-hv] -s <s> -E <E> -b <b> -t <tracefile>
The command-line arguments are based on the notation (s, E, and b) from page 597 of the CS:APP2e textbook. For example:
linux> ./csim-ref -s 4 -E 1 -b 4 -t traces/yi.trace hits:4 misses:5 evictions:3
The same example in verbose mode:
linux> ./csim-ref -v -s 4 -E 1 -b 4 -t traces/yi.trace L 10,1 miss M 20,1 miss hit L 22,1 hit S 18,1 hit L 110,1 miss eviction L 210,1 miss eviction M 12,1 miss eviction hit hits:4 misses:5 evictions:3
Your job for Part A is to fill in the csim.c file so that it takes the same command line arguments and produces the identical output as the reference simulator. Notice that this file is almost completely empty. You’ll need to write it from scratch.
malloc
function. Type “man
malloc” for information about this function. printSummary(hit_count, miss_count, eviction_count);
In Part B you will write a rotate function in rotate.c that causes as few cache misses as possible. The rotate function must rotate a 2D array (N x N matrix) 90 degrees clockwise.
For example, given the matrix
[[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12], [13, 14, 15, 16]]
the rotate function must return the matrix:
[[ 13, 9, 5, 1 ], [ 14, 10, 6, 2 ], [ 15, 11, 7, 3 ], [ 16, 12, 8, 4 ]]
To help you get started, we have given you an example rotate function in rotate.c that computes the rotate of N × N matrix A storing the results in N × N matrix B:
char rotate_desc[] = "Simple row-wise scan rotate"; void rotate(int N, int A[N][N], int B[N][N])
The example rotate function is correct, but it is inefficient because the access pattern results in relatively many cache misses.
Your job in Part B is to write a similar function, called
rotate_submit
, that minimizes the number of cache misses
across different sized matrices:
char rotate_submit_desc[] = "Rotate submission"; void rotate_submit(int N, int A[N][N], int B[N][N]);
Do not change the description string
(“Rotate submission
”) for your rotate_submit
function. The autograder searches for this string to determine which
rotate function to evaluate for credit.
int
per rotate function.1long
or by using any bit tricks to store more
than one value to a single variable.This section describes how your work will be evaluated. The full score for this lab is 60 points:
For Part A, we will run your cache simulator using different cache parameters and traces. There are eight test cases, each worth 3 points, except for the last case, which is worth 6 points:
linux> ./csim -s 1 -E 1 -b 1 -t traces/yi2.trace linux> ./csim -s 4 -E 2 -b 4 -t traces/yi.trace linux> ./csim -s 2 -E 1 -b 4 -t traces/dave.trace linux> ./csim -s 2 -E 1 -b 3 -t traces/trans.trace linux> ./csim -s 2 -E 2 -b 3 -t traces/trans.trace linux> ./csim -s 2 -E 4 -b 3 -t traces/trans.trace linux> ./csim -s 5 -E 1 -b 5 -t traces/trans.trace linux> ./csim -s 5 -E 1 -b 5 -t traces/long.trace
You can use the reference simulator csim-ref
to obtain the
correct answer for each of these test cases. During debugging, use
the -v option for a detailed record of each hit and miss.
For each test case, outputting the correct number of cache hits, misses and evictions will give you full credit for that test case. Each of your reported number of hits, misses and evictions is worth 1/3 of the credit for that test case. That is, if a particular test case is worth 3 points, and your simulator outputs the correct number of hits and misses, but reports the wrong number of evictions, then you will earn 2 points.
For Part B, we will evaluate the correctness and performance of your
rotate_submit
function on three different-sized output matrices:
For each matrix size, the performance of your rotate_submit
function is evaluated by using valgrind to extract the address
trace for your function, and then using the reference simulator to
replay this trace on a cache with parameters (s=5, E=1, b=5).
Your performance score for each matrix size scales linearly with the number of misses, m, up to some threshold:
Your code must be correct to receive any performance points for a particular size. Your code only needs to be correct for these three cases and you can optimize it specifically for these three cases. In particular, it is perfectly OK for your function to explicitly check for the input sizes and implement separate code optimized for each case.
There are 7 points for coding style. These will be assigned manually by the course staff. Style guidelines can be found on the course website.
The course staff will inspect your code in Part B for illegal arrays and excessive local variables.
We have provided you with an autograding program, called
test-csim
, that tests the correctness of your cache simulator
on the reference traces. Be sure to compile your simulator before
running the test:
linux> make
linux> ./test-csim
Your simulator Reference simulator
Points (s,E,b) Hits Misses Evicts Hits Misses Evicts
3 (1,1,1) 9 8 6 9 8 6 traces/yi2.trace
3 (4,2,4) 4 5 2 4 5 2 traces/yi.trace
3 (2,1,4) 2 3 1 2 3 1 traces/dave.trace
3 (2,1,3) 167 71 67 167 71 67 traces/trans.trace
3 (2,2,3) 201 37 29 201 37 29 traces/trans.trace
3 (2,4,3) 212 26 10 212 26 10 traces/trans.trace
3 (5,1,5) 231 7 0 231 7 0 traces/trans.trace
6 (5,1,5) 265189 21775 21743 265189 21775 21743 traces/long.trace
27
For each test, it shows the number of points you earned, the cache parameters, the input trace file, and a comparison of the results from your simulator and the reference simulator.
Here are some hints and suggestions for working on Part A:
traces/dave.trace
.#include <getopt.h> #include <stdlib.h> #include <unistd.h>See “man 3 getopt” for details.
We have provided you with an autograding program, called test-rotate.c, that tests the correctness and performance of each of the rotate functions that you have registered with the autograder.
You can register up to 100 versions of the rotate function in your rotate.c file. Each rotate version has the following form:
/* Header comment */ char rotate_simple_desc[] = "A simple rotate"; void rotate_simple(int N, int A[N][N], int B[N][N]) { /* your rotate code here */ }
Register a particular rotate function with the autograder by making a call of the form:
registerRotateFunction(rotate_simple, rotate_simple_desc);
in the registerFunctions routine in rotate.c. At runtime,
the autograder will evaluate each registered rotate function and
print the results. Of course, one of the registered functions must be
the rotate_submit
function that you are submitting for
credit:
registerRotateFunction(rotate_submit, rotate_submit_desc);
See the default rotate.c function for an example of how this works.
The autograder takes the matrix size as input. It uses valgrind to generate a trace of each registered rotate function. It then evaluates each trace by running the reference simulator on a cache with parameters (s=5, E=1, b=5).
For example, to test your registered rotate functions on a 32
× 32 matrix, rebuild test-rotate
, and then run it with the
appropriate value for N:
linux> make
linux> ./test-rotate -N 32
Step 1: Evaluating registered rotate funcs for correctness:
func 0 (Rotate submission): correctness: 1
func 1 (Simple row-wise scan rotate): correctness: 1
func 2 (column-wise scan rotate): correctness: 1
func 3 (using a zig-zag access pattern): correctness: 1
Step 2: Generating memory traces for registered rotate funcs.
Step 3: Evaluating performance of registered rotate funcs (s=5, E=1, b=5)
func 0 (Rotate submission): hits:1766, misses:287, evictions:255
func 1 (Simple row-wise scan rotate): hits:870, misses:1183, evictions:1151
func 2 (column-wise scan rotate): hits:870, misses:1183, evictions:1151
func 3 (using a zig-zag access pattern): hits:1076, misses:977, evictions:945
Summary for official submission (func 0): correctness=1 misses=287
In this example, we have registered four different rotate
functions in rotate.c. The test-rotate
program tests each
of the registered functions, displays the results for each, and
extracts the results for the official submission.
Here are some hints and suggestions for working on Part B.
linux> ./csim-ref -v -s 5 -E 1 -b 5 -t trace.f0 S 68312c,1 miss L 683140,8 miss L 683124,4 hit L 683120,4 hit L 603124,4 miss eviction S 6431a0,4 miss ...
http://csapp.cs.cmu.edu/public/waside/waside-blocking.pdffor more information.
We have provided you with a driver program, called
./driver.py
, that performs a complete evaluation of your
simulator and rotate code. This is the same program your instructor
uses to evaluate your handins. The driver uses test-csim
to evaluate your simulator, and it uses test-rotate to evaluate
your submitted rotate function on the three matrix sizes. Then it
prints a summary of your results and the points you have earned.
To run the driver, type:
linux> ./driver.py
Each time you type make in the cachelab-handout
directory,
the Makefile creates a tarball, called userid-handin.tar,
that contains your current csim.c and rotate.c files.
Submit this tarball via the form below.
IMPORTANT: Do not create the handin tarball on a Windows or Mac machine, and do not handin files in any other archive format, such as .zip, .gzip, or .tgz files.
This document was translated from LATEX by HEVEA.