In CUDA I have a simple class, FP_union5, with two members: rows and cols. A constructor takes two ints and sets rows and cols to those values. A kernel calls a function, passing a value for rows and cols, and that function returns an FP_union5 variable. Yes, this function is a lot like a constructor; in fact, all it does is call the constructor and return the resulting variable. When stepping through the code in the Nsight debugger I see something strange: inside the constructor, rows and cols for "this" are correctly set to 1, but upon returning from the constructor the variable being returned has rows and cols of 0. Then, inside the kernel itself, after the function has returned, rows and cols are back to 1. This seems like inconsistent behavior. If I have a second FP_union5 variable in that function, it has rows and cols set correctly. The problem seems to happen only with variables that are being returned. This is a small example:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>

#define mydevice __device__

class FP_union5
{
public:
    // member variables
    int rows;
    int cols;

    // destructor
    mydevice ~FP_union5();

    // default constructor
    mydevice inline FP_union5(void) : rows(0), cols(0)
    {
    }

    // other constructors
    mydevice FP_union5(int r, int c);
    mydevice FP_union5(const FP_union5& other); // Copy constructor
    mydevice FP_union5& operator=(const FP_union5& other); // Assignment operator
};

mydevice FP_union5::~FP_union5() {}

mydevice FP_union5::FP_union5(int r, int c)
{
    rows = r;
    cols = c;
    if (rows * cols <= 0)
    {
        rows = cols = 0;
        return;
    }
}

mydevice FP_union5::FP_union5(const FP_union5& other) {
    rows = other.rows;
    cols = other.cols;
}

mydevice FP_union5& FP_union5::operator=(const FP_union5& other) {
    if (this != &other) {
        rows = other.rows;
        cols = other.cols;
    }
    return *this;
}

mydevice FP_union5 callfpu5(int i)
{
    FP_union5 myfp2(i, i);
    FP_union5 myfp3(i, i);
    return(myfp3);
}

__global__ void addKernel(int j)
{
    int i = threadIdx.x;
    FP_union5 myfp5 = callfpu5(1);
}

int main()
{
    const int arraySize = 5;
    addKernel<<<1, arraySize>>>(arraySize);
}
1 Answer
CUDA's nvcc compiler supports C++, so the normal language rules apply.
However, like many C++ compilers, it will optimize away code (including assignments) that is never executed or that has no observable effect on the final outcome.
Disabling optimizations does not guarantee that all unreachable, unexecuted, or no-effect code is preserved.
The CUDA compilation model first compiles to PTX, an intermediate representation (similar in spirit to Java bytecode or LLVM IR), and then a second compilation step turns the PTX into SASS machine code.
This second step can happen either ahead of time (if a specific target has been given to the compiler) or on the fly using the just-in-time compiler.
Further optimization can be applied at this second stage.
The most reliable way to ensure something is not optimized out is to print it.
#include <cuda.h>
#include <stdio.h> // for printf

__global__ void do_stuff() {
    auto keepme = int(clock64());
    printf("keepme = %i\n", keepme);
}
Because printf has a side effect, nothing used by the print statement (directly or indirectly) can be optimized away.
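Applied to the question's situation, the same idea might look like the sketch below. It is only an illustration: Dims, make_dims, print_kernel and run_print_kernel are hypothetical names standing in for FP_union5, callfpu5 and addKernel, and the class is cut down to the two members that matter here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical, simplified stand-in for the question's FP_union5.
struct Dims {
    int rows;
    int cols;
    __device__ Dims(int r, int c) : rows(r), cols(c) {
        // Same guard as the original constructor.
        if (rows * cols <= 0) rows = cols = 0;
    }
};

// Mirrors callfpu5: build a local object and return it by value.
__device__ Dims make_dims(int i) {
    Dims d(i, i);
    return d;
}

__global__ void print_kernel() {
    Dims d = make_dims(1);
    // printf is an observable side effect, so the values of d.rows and
    // d.cols have to be produced; they cannot simply be discarded.
    printf("thread %d: rows = %d, cols = %d\n", threadIdx.x, d.rows, d.cols);
}

// Hypothetical host helper: launch the kernel and wait for it, so the
// device-side printf buffer is flushed before the host continues.
cudaError_t run_print_kernel(int threads) {
    print_kernel<<<1, threads>>>();
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();
}
```

Calling run_print_kernel(5) from main should print one line per thread. Note that this guarantees the printed values are correct; it does not guarantee that every intermediate variable is inspectable in the debugger at every step.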
TL;DR
CUDA/nvcc never optimizes away anything that cannot safely be done away with.
i, j variables in addKernel: what is their purpose? They look unused. – PiotrNycz Commented 22 hours ago
[...] -G) should avoid optimizations, but in my experience it does not avoid all of them. – paleonix Commented 21 hours ago
[...] return(myfp3); then myfp3 will be in scope. In other settings (e.g. outside that constructor) it is not in scope, and you won't be able to inspect it. And, as indicated in the answer, you should make sure (even in a debug setting) that all desired visibility is enforced by a dependency chain to something visible at global scope. myfp5 is not at global scope. – Robert Crovella Commented 12 hours ago
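One way to read the last comment's "dependency chain to something visible at global scope" is to store the returned object's members in global memory and copy them back to the host. The following is a minimal sketch under that assumption; Dims, make_dims, store_kernel and probe_dims are hypothetical names, not part of the question's code.

```cuda
#include <cuda_runtime.h>

// Hypothetical, simplified stand-in for the question's FP_union5.
struct Dims {
    int rows;
    int cols;
    __device__ Dims(int r, int c) : rows(r), cols(c) {
        if (rows * cols <= 0) rows = cols = 0;
    }
};

// Mirrors callfpu5: build a local object and return it by value.
__device__ Dims make_dims(int i) {
    Dims d(i, i);
    return d;
}

// The stores to out[] reach global memory, so the returned object's
// members feed into something observable outside the kernel.
__global__ void store_kernel(int i, int* out) {
    Dims d = make_dims(i);
    out[0] = d.rows;
    out[1] = d.cols;
}

// Hypothetical host helper: run one thread and fetch rows/cols.
cudaError_t probe_dims(int i, int* rows, int* cols) {
    int* d_out = nullptr;
    cudaError_t err = cudaMalloc((void**)&d_out, 2 * sizeof(int));
    if (err != cudaSuccess) return err;

    store_kernel<<<1, 1>>>(i, d_out);
    err = cudaGetLastError();

    int h_out[2] = {0, 0};
    if (err == cudaSuccess) {
        // cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);

    *rows = h_out[0];
    *cols = h_out[1];
    return err;
}
```

With this in place, probe_dims(1, &r, &c) should leave r and c equal to 1, matching what the kernel in the question eventually shows for myfp5.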