
profiling - How do I profile the inside of a CUDA kernel? - Stack Overflow


I have a really big CUDA kernel that does a lot of work, for example:

__global__ void bigkernel(args)
{
    func1();
    func2();
    func3();
    func4();
    func5();
    ....
}

I want to profile each of those functions and visualize the results in Nsight. When I run the program under Nsight, it shows only bigkernel as a whole, with no breakdown for func1() and the rest.

Right now I time each function with the built-in clock64() and store the results in a structure:

struct time_stuff
{
    uint64_t start, end,
        func1, func2, func3, func4,...
};
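
For concreteness, here is a minimal sketch of the clock64() approach described above. It is an illustration, not my actual kernel. The bodies of func1() and func2() are hypothetical placeholders, and the kernel arguments, launch configuration, and the reading of start/end as raw timestamps and func1/func2 as elapsed cycles are all assumptions. Note that clock64() reads a per-multiprocessor cycle counter, so the values are in cycles rather than seconds, they are only comparable between threads on the same SM, and the compiler may reorder instructions around the reads.

```cuda
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-ins for the real device functions; they only burn cycles.
__device__ void func1(float *x) { for (int i = 0; i < 100; ++i)  *x = *x * 1.0001f + 0.5f; }
__device__ void func2(float *x) { for (int i = 0; i < 1000; ++i) *x = *x * 0.9999f + 0.25f; }

// Assumed layout: start/end are raw counter values, func1/func2 are elapsed cycles.
struct time_stuff
{
    uint64_t start, end,
        func1, func2;
};

__global__ void bigkernel(float *data, time_stuff *times)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float v = data[tid];

    // Read the per-SM cycle counter before and after each call.
    uint64_t t0 = (uint64_t)clock64();
    func1(&v);
    uint64_t t1 = (uint64_t)clock64();
    func2(&v);
    uint64_t t2 = (uint64_t)clock64();

    data[tid] = v;  // keep the result live so the calls are not optimized away

    time_stuff t;
    t.start = t0;
    t.end   = t2;
    t.func1 = t1 - t0;
    t.func2 = t2 - t1;
    times[tid] = t;  // one record per thread
}

int main()
{
    const int threads = 128, blocks = 2, n = threads * blocks;  // assumed sizes

    float *d_data = nullptr;
    time_stuff *d_times = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_times, n * sizeof(time_stuff));
    cudaMemset(d_data, 0, n * sizeof(float));

    bigkernel<<<blocks, threads>>>(d_data, d_times);
    cudaDeviceSynchronize();

    // Copy the per-thread records back; these can be dumped for plotting.
    std::vector<time_stuff> h_times(n);
    cudaMemcpy(h_times.data(), d_times, n * sizeof(time_stuff), cudaMemcpyDeviceToHost);

    printf("thread 0: func1 = %llu cycles, func2 = %llu cycles, total = %llu cycles\n",
           (unsigned long long)h_times[0].func1,
           (unsigned long long)h_times[0].func2,
           (unsigned long long)(h_times[0].end - h_times[0].start));

    cudaFree(d_data);
    cudaFree(d_times);
    return 0;
}
```

Storing one record per thread keeps the kernel free of atomics, at the cost of an extra global write per thread.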

To visualize the results I use Python, but I would like to know whether there is a better method.

I can use Nsight Compute and Nsight Systems to understand the program as a whole, but timing with clock64() seems the easiest way to measure the individual functions. For reference, this is how I run Nsight Systems:

nsys profile --trace=nvtx,cuda --sample=cpu -o cu_trace ./cu_alg /datasets/collisions.txt 15000000 \\s 96 128
