最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

graphics - Cache-Friendly Scene Graph Storage for Hierarchical Transforms - Stack Overflow

programmeradmin5浏览0评论

Context

I am optimizing hierarchical transformation updates in a scene graph. My goal is to minimize cache misses during the parent-child transform propagation.

A common object-oriented scene graph suffers from lots of indirect accesses, so I tried a topologically sorted (i.e. the nodes are sorted by non decreasing ParentID) vector representation:


struct Mat4x4 {
    float e00, e01, e02, e03;
    float e10, e11, e12, e13;
    float e20, e21, e22, e23;
    float e30, e31, e32, e33;
};

struct Node {
    Mat4x4 Transform;
    int ParentID;
};

std::vector<Node> nodes;

The topological sort of the nodes vector is necessary to ensure correct order of propagation, i.e. not updating a children transform before its parent; the update process propagates transforms linearly:

for (size_t i = 0; i < nodes.size(); ++i) {
    if (nodes[i].ParentID >= 0) { // the node has a parent
        const Node &parent = nodes[nodes[i].ParentID];
        nodes[i].Transform = ApplyTransform(parent.Transform, nodes[i].Transform);
    }
}

The hot point within this loop seems to be the access to the parent node, and not the actual ApplyTransform computations. This observation comes from running a simple benchmark of the update function on a scene containing ~2000 characters, each represented as a tree of ~50 elements (a total of ~100000 nodes). In that case perf stat gives

stalled-cycles-backend:u         #   48.04% backend cycles idle

and by looking at recorded profiling data, most of the time is spent on movups and movaps operations, indicating frequent cache misses when loading the data for the computations. The benchmark code was compiled using g++ -O3, and is run on a single core.

Question

I was wondering: How can I improve the memory layouts or traversal strategies for better cache efficiency for hierarchical transforms? Are there any specific techniques or patterns used in large scale real-time rendering for solving this problem?

I am not looking for open-ended discussions on scene graphs but rather concrete data structures or patterns that improve memory locality.

发布评论

评论列表(0)

  1. 暂无评论