I am working on an Intel Xeon Gold 6338 server running Ubuntu 20.04.6 LTS, and I am trying to monitor cache events while running a C workload. In particular, I want to measure the number of cache references and cache misses at every cache level, regardless of whether they are caused by a load or by a store.
My CPU supports the following cache events:
L1-dcache-load-misses [Hardware cache event]
L1-dcache-loads [Hardware cache event]
L1-dcache-stores [Hardware cache event]
L1-icache-load-misses [Hardware cache event]
LLC-load-misses [Hardware cache event]
LLC-loads [Hardware cache event]
LLC-store-misses [Hardware cache event]
LLC-stores [Hardware cache event]
branch-load-misses [Hardware cache event]
branch-loads [Hardware cache event]
dTLB-load-misses [Hardware cache event]
dTLB-loads [Hardware cache event]
dTLB-store-misses [Hardware cache event]
dTLB-stores [Hardware cache event]
iTLB-load-misses [Hardware cache event]
node-load-misses [Hardware cache event]
node-loads [Hardware cache event]
node-store-misses [Hardware cache event]
node-stores [Hardware cache event]
It also supports these other cache-related events:
cache-misses [Hardware event]
cache-references [Hardware event]
My first question is: what exactly do cache-misses and cache-references measure, and why are they not listed as cache events?
I am using libpfm to measure these cache events. I have integrated it into my code in the following way:
- Background definitions:
#define NUM_EVENTS 9
const char *events[NUM_EVENTS] = {
    "L1-dcache-loads",
    "L1-dcache-stores",
    "L1-dcache-load-misses",
    "LLC-loads",
    "LLC-stores",
    "LLC-load-misses",
    "LLC-store-misses",
    "cache-references",
    "cache-misses"
};
enum event_num {
    L1_CACHE_LOADS,
    L1_CACHE_STORES,
    L1_CACHE_MISSES,
    L3_CACHE_LOADS,
    L3_CACHE_STORES,
    L3_CACHE_LOAD_MISSES,
    L3_CACHE_STORE_MISSES,
    CACHE_REFS,
    CACHE_MISSES
};
- Setup, right before my actual workload begins to run:
int fds[NUM_EVENTS];
uint64_t counts[NUM_EVENTS];
struct perf_event_attr pe[NUM_EVENTS];
pfm_initialize();
for (i = 0; i < NUM_EVENTS; i++) {
    memset(&pe[i], 0, sizeof(struct perf_event_attr));
    pe[i].size = sizeof(struct perf_event_attr);
    pe[i].type = PERF_TYPE_RAW;
    pe[i].disabled = 1;
    pe[i].exclude_kernel = 1;
    pe[i].exclude_hv = 1;
    pfm_perf_encode_arg_t arg;
    memset(&arg, 0, sizeof(arg));
    arg.attr = &pe[i];
    if (pfm_get_os_event_encoding(events[i], PFM_PLM3, PFM_OS_PERF_EVENT, &arg) != PFM_SUCCESS) {
        fprintf(stderr, "Error encoding event %s\n", events[i]);
        exit(1);
    }
    fds[i] = perf_event_open(&pe[i], 0, -1, -1, 0);
    if (fds[i] == -1) {
        perror("perf_event_open failed");
        exit(1);
    }
}
for (i = 0; i < NUM_EVENTS; i++) ioctl(fds[i], PERF_EVENT_IOC_RESET, 0);
for (i = 0; i < NUM_EVENTS; i++) ioctl(fds[i], PERF_EVENT_IOC_ENABLE, 0);
- Counting and cleanup, after my workload finishes:
for (i = 0; i < NUM_EVENTS; i++) ioctl(fds[i], PERF_EVENT_IOC_DISABLE, 0);
for (i = 0; i < NUM_EVENTS; i++) read(fds[i], &counts[i], sizeof(uint64_t));
L1_cache_accesses = counts[L1_CACHE_LOADS] + counts[L1_CACHE_STORES];
L1_cache_misses = counts[L1_CACHE_MISSES];
L3_cache_accesses = counts[L3_CACHE_LOADS] + counts[L3_CACHE_STORES];
L3_cache_misses = counts[L3_CACHE_LOAD_MISSES] + counts[L3_CACHE_STORE_MISSES];
cache_accesses = counts[CACHE_REFS];
cache_misses = counts[CACHE_MISSES];
for (i = 0; i < NUM_EVENTS; i++) close(fds[i]);
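For completeness: the setup code calls perf_event_open() directly, but glibc does not provide a wrapper for that system call, so the snippets above presumably rely on a small helper defined elsewhere. The helper actually used is not shown, so the following is only a minimal sketch of what it might look like, with the argument order assumed to match the call in the setup loop:

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed helper (not shown in the original code): forwards its arguments
 * to the raw perf_event_open system call via syscall(2).
 * Returns a file descriptor on success, or -1 with errno set on failure. */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}
```

With pid = 0 and cpu = -1, as in the setup loop, the event follows the calling thread on whichever CPU it runs.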
This code works relatively well (in the sense that it compiles and does not crash ;)), but sometimes the results it generates are weird. In particular, in some cases it reports more L1 cache misses than L1 cache accesses, which makes me believe I did something wrong.
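One thing that may be worth ruling out is counter multiplexing: nine events are opened independently, and if the kernel cannot schedule all of them on hardware counters at once, each event is counted only for part of the run, so the raw values are not directly comparable. Whether that is what happens here is not established. The sketch below assumes each event is opened with read_format set to PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING, in which case read() returns three 64-bit values instead of one. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Layout returned by read() on a single (non-group) event when read_format is
 * PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING. */
struct read_with_times {
    uint64_t value;        /* raw event count */
    uint64_t time_enabled; /* ns the event was enabled */
    uint64_t time_running; /* ns the event was actually on a counter */
};

/* Returns 1 if the event was on a hardware counter for the whole enabled
 * interval, i.e. it was not multiplexed. */
static int counted_full_time(const struct read_with_times *r)
{
    return r->time_running == r->time_enabled;
}

/* Linear extrapolation of the raw count to the full enabled interval.
 * This is an estimate, not an exact count. Returns 0 if the event never ran. */
static uint64_t scaled_count(const struct read_with_times *r)
{
    if (r->time_running == 0)
        return 0;
    return (uint64_t)((double)r->value * (double)r->time_enabled
                      / (double)r->time_running);
}
```

In the read loop, the buffer passed to read() would then be a struct read_with_times of size sizeof(struct read_with_times) rather than a single uint64_t. If counted_full_time() returns 0 for some events, ratios between their raw counts are unreliable.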
So my questions are:
- What exactly is measured by cache-misses and cache-references, and why are they not considered cache events?
- Is it correct to count L1_cache_accesses as L1-dcache-loads+L1-dcache-stores and L1_cache_misses as L1-dcache-load-misses (considering that I do not care about the instruction cache)? If not, what perf events will better reflect them?
- Is it correct to measure L3_cache_accesses as LLC-loads+LLC-stores and L3_cache_misses as LLC-load-misses+LLC-store-misses? If not, what perf events will better reflect them?
Thank you very much for your help!