linux - Monitoring Cache Events Using libpfm - Stack Overflow

I am working on an Intel Xeon Gold 6338 server running Ubuntu 20.04.6 LTS. I am trying to monitor cache events while running a certain C workload. In particular, I want to measure the number of cache references and cache misses at every cache level, regardless of whether they are caused by a load or by a store.

My CPU supports the following cache events:

  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-stores                                   [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  LLC-load-misses                                    [Hardware cache event]
  LLC-loads                                          [Hardware cache event]
  LLC-store-misses                                   [Hardware cache event]
  LLC-stores                                         [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-loads                                         [Hardware cache event]
  dTLB-store-misses                                  [Hardware cache event]
  dTLB-stores                                        [Hardware cache event]
  iTLB-load-misses                                   [Hardware cache event]
  node-load-misses                                   [Hardware cache event]
  node-loads                                         [Hardware cache event]
  node-store-misses                                  [Hardware cache event]
  node-stores                                        [Hardware cache event]

It also supports these other cache-related events:

  cache-misses                                       [Hardware event]
  cache-references                                   [Hardware event]

My first question is: what exactly do cache-misses and cache-references measure? Why are they not listed as cache events?
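For context, the two labels in the listing correspond to different `type` values in `struct perf_event_attr`: "Hardware event" entries are `PERF_TYPE_HARDWARE` with a fixed `config` such as `PERF_COUNT_HW_CACHE_MISSES`, while "Hardware cache event" entries are `PERF_TYPE_HW_CACHE` with a `config` packed from a cache id, an operation, and a result. On Intel cores the kernel typically maps the two generic events to the architectural last-level-cache reference and miss events, though the exact mapping depends on the CPU model and kernel version. The sketch below only illustrates the encoding; the helper names (`hw_cache_config`, `init_l1d_load_misses`, `init_generic_cache_misses`) are hypothetical and are not part of libpfm or the kernel headers.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Pack a PERF_TYPE_HW_CACHE config value:
 * cache id in bits 0-7, operation in bits 8-15, result in bits 16-23.
 * (Hypothetical helper name, shown for illustration only.) */
static uint64_t hw_cache_config(uint64_t cache, uint64_t op, uint64_t result)
{
    return cache | (op << 8) | (result << 16);
}

/* What "L1-dcache-load-misses" looks like at the perf_event_attr level. */
static void init_l1d_load_misses(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HW_CACHE;
    attr->config = hw_cache_config(PERF_COUNT_HW_CACHE_L1D,
                                   PERF_COUNT_HW_CACHE_OP_READ,
                                   PERF_COUNT_HW_CACHE_RESULT_MISS);
}

/* What "cache-misses" looks like: a generic hardware event, no packing. */
static void init_generic_cache_misses(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = PERF_COUNT_HW_CACHE_MISSES;
}
```

Either attribute can then be passed to `perf_event_open` in place of the libpfm-encoded one, which makes it easy to check which encoding a given event name resolves to.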

I am using libpfm to measure these cache events. I have integrated it into my code in the following way:

  1. Background definitions:
    #define NUM_EVENTS 9

    const char *events[NUM_EVENTS] = {
        "L1-dcache-loads",
        "L1-dcache-stores",
        "L1-dcache-load-misses",
        "LLC-loads",
        "LLC-stores",
        "LLC-load-misses",
        "LLC-store-misses",
        "cache-references",
        "cache-misses"
    };

    enum event_num {
        L1_CACHE_LOADS,
        L1_CACHE_STORES,
        L1_CACHE_MISSES,
        L3_CACHE_LOADS,
        L3_CACHE_STORES,
        L3_CACHE_LOAD_MISSES,
        L3_CACHE_STORE_MISSES,
        CACHE_REFS,
        CACHE_MISSES
    };
  2. Setup, right before my actual workload begins to run:
        int fds[NUM_EVENTS];
        uint64_t counts[NUM_EVENTS];  /* matches the 64-bit value that read() returns */
        struct perf_event_attr pe[NUM_EVENTS];
        if (pfm_initialize() != PFM_SUCCESS) {
            fprintf(stderr, "pfm_initialize failed\n");
            exit(1);
        }

        for (i = 0; i < NUM_EVENTS; i++) {
            memset(&pe[i], 0, sizeof(struct perf_event_attr));
            pe[i].size = sizeof(struct perf_event_attr);
            pe[i].type = PERF_TYPE_RAW;
            pe[i].disabled = 1;
            pe[i].exclude_kernel = 1;
            pe[i].exclude_hv = 1;

            pfm_perf_encode_arg_t arg;
            memset(&arg, 0, sizeof(arg));
            arg.attr = &pe[i];

            if (pfm_get_os_event_encoding(events[i], PFM_PLM3, PFM_OS_PERF_EVENT, &arg) != PFM_SUCCESS) {
                fprintf(stderr, "Error encoding event %s\n", events[i]);
                exit(1);
            }

            fds[i] = perf_event_open(&pe[i], 0, -1, -1, 0);
            if (fds[i] == -1) {
                perror("perf_event_open failed");
                exit(1);
            }
        }
        for (i = 0; i < NUM_EVENTS; i++) ioctl(fds[i], PERF_EVENT_IOC_RESET, 0);
        for (i = 0; i < NUM_EVENTS; i++) ioctl(fds[i], PERF_EVENT_IOC_ENABLE, 0);
  3. Counting and cleanup, after my workload finishes:
        for (i = 0; i < NUM_EVENTS; i++) ioctl(fds[i], PERF_EVENT_IOC_DISABLE, 0);
        for (i = 0; i < NUM_EVENTS; i++) read(fds[i], &counts[i], sizeof(counts[i]));

        L1_cache_accesses = counts[L1_CACHE_LOADS] + counts[L1_CACHE_STORES];
        L1_cache_misses = counts[L1_CACHE_MISSES];
        L3_cache_accesses = counts[L3_CACHE_LOADS] + counts[L3_CACHE_STORES];
        L3_cache_misses = counts[L3_CACHE_LOAD_MISSES] + counts[L3_CACHE_STORE_MISSES];
        cache_accesses = counts[CACHE_REFS];
        cache_misses = counts[CACHE_MISSES];

        for (i = 0; i < NUM_EVENTS; i++) close(fds[i]);

This code works relatively well (in the sense that it compiles and does not crash ;)), but sometimes the results it generates are weird. In particular, in some cases it reports more L1 cache misses than L1 cache accesses, which makes me believe I did something wrong.
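One thing worth ruling out is counter multiplexing: when more events are open than the PMU has counters for, the kernel time-slices them, and each raw count then covers only part of the run. Whether that explains the numbers here is not established, so the following is only a sketch of how to detect and correct for it. It assumes each event is opened with `read_format = PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING`, in which case `read()` returns three 64-bit values instead of one. The struct and function names (`scaled_read`, `was_multiplexed`, `scale_count`) are hypothetical.

```c
#include <stdint.h>

/* Layout returned by read() on a perf fd opened with
 * PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING
 * (and no PERF_FORMAT_GROUP). Hypothetical struct name. */
struct scaled_read {
    uint64_t value;         /* raw count while the event was on a counter */
    uint64_t time_enabled;  /* ns the event was enabled */
    uint64_t time_running;  /* ns the event was actually counting */
};

/* Returns 1 if the event was scheduled out for part of the run. */
static int was_multiplexed(const struct scaled_read *r)
{
    return r->time_running < r->time_enabled;
}

/* Estimate the full-run count by scaling with enabled/running time.
 * Returns 0 if the event never ran, and the raw value if it ran the
 * whole time. The result is an extrapolation, not an exact count. */
static uint64_t scale_count(const struct scaled_read *r)
{
    if (r->time_running == 0)
        return 0;
    if (r->time_running >= r->time_enabled)
        return r->value;
    return (uint64_t)((double)r->value * (double)r->time_enabled
                      / (double)r->time_running);
}
```

With this in place, the read loop would fill a `struct scaled_read` per event, for example `read(fds[i], &r, sizeof(r))`, and compare `scale_count(&r)` across events rather than the raw values. If `was_multiplexed` reports 1 for some events, ratios between raw counts are not directly comparable.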

So my questions are:

  1. What exactly do cache-misses and cache-references measure, and why are they not listed as cache events?
  2. Is it correct to count L1_cache_accesses as L1-dcache-loads+L1-dcache-stores and L1_cache_misses as L1-dcache-load-misses (considering that I do not care about the instruction cache)? If not, what perf events will better reflect them?
  3. Is it correct to measure L3_cache_accesses as LLC-loads+LLC-stores and L3_cache_misses as LLC-load-misses+LLC-store-misses? If not, what perf events will better reflect them?
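To make question 2 concrete: the L1 figures above divide a load-only miss count by a load-plus-store access count, so numerator and denominator do not describe the same set of operations. A ratio that stays internally consistent, assuming `L1-dcache-load-misses` counts only load-side misses as its name suggests, pairs load misses with loads. The helper below is a hypothetical sketch of that calculation, not a statement about what the underlying hardware events count on this CPU.

```c
#include <stdint.h>

/* Miss ratio for one consistent pair of counters, e.g.
 * L1-dcache-load-misses over L1-dcache-loads.
 * Returns 0.0 when there were no accesses, to avoid dividing by zero.
 * A result above 1.0 means the two events do not count the same
 * population of operations. (Hypothetical helper name.) */
static double miss_ratio(uint64_t accesses, uint64_t misses)
{
    if (accesses == 0)
        return 0.0;
    return (double)misses / (double)accesses;
}
```

For example, `miss_ratio(counts[L1_CACHE_LOADS], counts[L1_CACHE_MISSES])` gives a load-only L1 miss ratio using the indices defined earlier.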

Thank you very much for your help!
