c++ - Why does setting a class member not work consistently inside a kernel

In CUDA I have a simple class, FP_union5, with two members: rows and cols. A constructor takes two ints and sets rows and cols to those values. A kernel calls a function with a value for rows and cols, and that function returns an FP_union5. Yes, the function is a lot like a constructor; all it does is call the constructor and return the resulting variable.

Stepping through the code in the Nsight debugger, I see something strange: inside the constructor, rows and cols for "this" are correctly set to 1, but on returning from the constructor the variable that is about to be returned shows rows and cols of 0. Then, inside the kernel itself, after the function has returned, rows and cols are back to 1. This seems inconsistent. If I have a second FP_union5 variable in that function, its rows and cols are set correctly. The problem seems to happen only with the variable that is being returned. Here is a small example:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>

#define mydevice __device__

class FP_union5
{
public:
    // member variables
    int rows;
    int cols;


    // destructor
    mydevice ~FP_union5();

    // default constructors




    // other constructors

    mydevice    inline FP_union5(void) : rows(0), cols(0)
    {
    }

    mydevice FP_union5(int r, int c);

    mydevice FP_union5(const FP_union5& other); // Copy constructor
    mydevice FP_union5& operator=(const FP_union5& other); // Assignment operator



};

mydevice FP_union5::~FP_union5() {}

mydevice FP_union5::FP_union5(int r, int c)
{
    rows = r;
    cols = c;

    if (rows * cols <= 0)
    {
        rows = cols = 0;
        return;
    }

}

mydevice FP_union5::FP_union5(const FP_union5& other) {
    rows = other.rows;
    cols = other.cols;
}

mydevice FP_union5& FP_union5::operator=(const FP_union5& other) {
    if (this != &other) {
        rows = other.rows;
        cols = other.cols;
    }
    return *this;
}

mydevice FP_union5 callfpu5(int i)

{
    FP_union5 myfp2(i, i);
    FP_union5 myfp3(i, i);
 
    return(myfp3);

}

__global__ void addKernel(int j)
{
    int i = threadIdx.x;
    FP_union5 myfp5 = callfpu5(1);

}


int main()
{
    const int arraySize = 5;

    addKernel << <1, arraySize >> > (arraySize);

}

asked 22 hours ago by Rich Tanenbaum

Just curious: what is the purpose of the i and j variables in addKernel? They look unused. – PiotrNycz Commented 22 hours ago
The compiler will get rid of code that has no observable side effects, such as printing or writing to memory. In theory debug mode (-G) should avoid optimizations, but in my experience it does not avoid all of them. – paleonix Commented 21 hours ago
Also please brush up on your C++: Constructors should not initialize data members in their general code block but in initializers before that code block. You don't need to define any special member functions (destructor, copy constructor, assignment operator) in this minimal reproducible example at all, see the rule of zero. – paleonix Commented 20 hours ago
i and j are for an example for this problem. Also, Copilot suggested adding the copy and assignment operator, but it didn't help. I kept it in to preclude anyone here from suggesting them. I also did initially have initializers but the problem exists either way. I did compile with -G and no optimizations, though I appreciate the compiler may not follow that 100%. But it does not explain why it works OK for a variable that is never used (myfp2) and not for a variable that is returned (myfp3). That seems like the opposite of optimizing it away. – Rich Tanenbaum Commented 14 hours ago
You may be trying to inspect a variable that is out of scope. The debugger will not be able to correctly show variables that are not in scope. Inside the "constructor", i.e. stopped at a breakpoint on this line: return(myfp3); then myfp3 will be in scope. In other settings (e.g. outside that constructor) it is not in scope, and you won't be able to inspect it. And, as indicated in the answer, you should make sure (even in a debug setting) that all desired visibility is enforced by a dependency chain to something visible at global scope. myfp5 is not at global scope. – Robert Crovella Commented 12 hours ago

1 Answer

CUDA's nvcc compiler supports most of C++ in device code, so the normal language rules apply.
However, like many C++ compilers, it will optimize away code (including assignments) that is never executed or that has no observable effect on the final outcome.

Disabling optimizations does not guarantee that all unreachable, unexecuted, or no-effect code is preserved.
The CUDA compilation model first compiles to PTX, an intermediate representation (comparable to Java bytecode or LLVM IR), and then compiles the PTX to SASS, the actual machine code. This second compilation happens either ahead of time (if a specific target has been given to the compiler) or on the fly using the just-in-time compiler.

Further optimization can be applied at this second stage.

A reliable way to ensure something is not optimized out is to print it.

#include <cuda.h>
#include <stdio.h> //for printf

__global__ void do_stuff() {
    auto keepme = int(clock64());
    printf("keepme = %i\n", keepme);
}

Because printf has a side effect, nothing used by the print statement (directly or indirectly) can be optimized away.
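Applied to the class from the question, the same idea might look like the sketch below. It is a simplified stand-in rather than the asker's exact code: the special member functions are dropped (rule of zero, as suggested in the comments), and the `out` buffer and the host-side readback are assumptions added for illustration. Writing the result to global memory gives the returned object a dependency chain to something visible outside the kernel, and the device-side printf does the same through a side effect.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Simplified stand-in for the question's FP_union5: members are set in
// initializer lists and no destructor/copy operations are declared.
struct FP_union5 {
    int rows;
    int cols;

    __device__ FP_union5() : rows(0), cols(0) {}

    // Same rule as the original constructor: a non-positive product
    // resets both members to 0.
    __device__ FP_union5(int r, int c)
        : rows(r * c > 0 ? r : 0), cols(r * c > 0 ? c : 0) {}
};

__device__ FP_union5 callfpu5(int i) {
    FP_union5 myfp3(i, i);
    return myfp3;
}

// `out` is a hypothetical output buffer, not part of the original kernel.
__global__ void addKernel(int2* out) {
    int i = threadIdx.x;
    FP_union5 myfp5 = callfpu5(i + 1);

    // Side effect 1: printing keeps rows and cols observable.
    printf("thread %d: rows = %d, cols = %d\n", i, myfp5.rows, myfp5.cols);

    // Side effect 2: a write to global memory that the host reads back.
    out[i] = make_int2(myfp5.rows, myfp5.cols);
}

int main() {
    const int arraySize = 5;
    int2* d_out = nullptr;
    if (cudaMalloc(&d_out, arraySize * sizeof(int2)) != cudaSuccess) return 1;

    addKernel<<<1, arraySize>>>(d_out);

    // Wait for the kernel to finish; this also flushes device-side printf.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int2 h_out[arraySize];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    for (int i = 0; i < arraySize; ++i)
        printf("host sees [%d]: rows = %d, cols = %d\n", i, h_out[i].x, h_out[i].y);

    cudaFree(d_out);
    return 0;
}
```

If the values printed on the device and read back on the host agree, the constructor and the return path are behaving correctly, and any odd intermediate values shown in the debugger are more likely an artifact of how the returned temporary is tracked than a real miscomputation.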

TL;DR
CUDA/nvcc never optimizes away anything that cannot safely be removed.
