
C++ Profiling - Called method from coroutine function has a higher hit count than its caller - Stack Overflow


I am profiling some code that uses the cppgraphqlgen library, which uses C++20 coroutines extensively in its internals.

I have profiled an application and found that some called methods have a higher hit count than the parents that call them.

I have searched for clone .actor in connection with profiling and found nothing useful.

In the classic synchronous code elsewhere in the program, a child's cost is always <= its parent's cost, as expected; it is only in the coroutine code that this breaks down.

What is clone .actor in this context, and why do the "children" cost more than their parents in this case? Is there any way to tell what this operation is actually doing?

For context, this is how I gathered my profiling data:

  1. Get a profiling dump by running
     LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libprofiler.so.0 CPUPROFILE=./prof.out ./my-program
  2. Run that through Google Perftools (gperftools) to make a callgrind-style file
     /usr/bin/google-pprof --callgrind "$(realpath ./my-program)" ./prof.out > ./callgrind.out
  3. Open that dump using kcachegrind

Asked Feb 3 at 21:01 by Andrew Lipscomb, edited Feb 3 at 22:00

1 Answer

  1. Please (strongly) consider switching to the much better and more capable Go implementation of pprof (github.com/google/pprof). Sadly, distros continue to ship our old Perl implementation, but the upcoming 2.17 release has already had that pprof implementation removed, so hopefully that will push distros to switch as well.

  2. The clone suffix is an artifact of compiler optimizations and demangling. Compilers sometimes create optimized copies of certain functions (e.g. by constant-propagating some values), and the demangled name marks each such copy as a clone. pprof is supposed to remove this detail from the function name. (Sometimes you do want to see those details and more, such as template arguments; see the --symbolize option for that.)

  3. If you see mixed-up parent/child relations, consider checking your stack trace capturing method. Skipping the "first parent" stack frame is a known issue with frame-pointer-based stack trace capturing. See here: https://github.com/gperftools/gperftools/wiki/gperftools'-stacktrace-capturing-methods-and-their-issues#frame-pointers
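
To make point 2 a little more concrete, here is a minimal sketch of what stripping the clone detail from a demangled name could look like, so that entries such as "Foo::bar() [clone .actor]" can be grouped with "Foo::bar()" when reading a profile. As far as I know, GCC emits the body of a C++20 coroutine as a separate compiler-generated function with an .actor suffix, which would be why that suffix shows up in coroutine-heavy code. The function name strip_clone_suffixes and the sample symbol names are hypothetical, and this is not how pprof itself is implemented.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not part of pprof or gperftools): removes trailing
// " [clone .xyz]" annotations that a demangler appends to compiler-generated
// copies of a function, so that all copies can be grouped under one name.
std::string strip_clone_suffixes(std::string_view name) {
    constexpr std::string_view marker = " [clone ";
    while (!name.empty() && name.back() == ']') {
        const auto pos = name.rfind(marker);
        if (pos == std::string_view::npos) {
            break;  // ends in ']' but is not a clone annotation, e.g. "v[0]"
        }
        // Only strip if the annotation found is the one that ends the string.
        if (name.find(']', pos) != name.size() - 1) {
            break;
        }
        name = name.substr(0, pos);  // drop this annotation, then look for another
    }
    return std::string(name);
}
```

A name carrying several annotations, such as "f(int) [clone .constprop.0] [clone .cold]", is reduced one annotation at a time until only "f(int)" remains. Names without a clone annotation are returned unchanged.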
