I am trying to downscale an image using a kernel launched through CuPy (Python). When I use a square image (height equal to width), my kernel works very well and downscales the image correctly according to the given scaling factor.
The problem comes when I use rectangular images, where the aspect ratio has to be maintained. With this kind of image, the image returned after executing the kernel is completely distorted.
I am sure the problem is in the kernel: I ran the same resize on the CPU with standard OpenCV functions, and with the same rectangular image the downscaling works correctly, unlike with my kernel. The kernel that does the resize is as follows:
__global__
void nn_resize(const float *src, int srcWidth, int srcHeight,
               float *dst, int dstWidth, int dstHeight){
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int total = dstHeight * dstWidth;
    if (idx >= total) return;
    int x = idx % dstWidth;
    int y = idx / dstWidth;
    float src_x = (x * ((float)srcWidth / dstWidth));
    float src_y = (y * ((float)srcHeight / dstHeight));
    int src_x_int = min((int)round(src_x) , srcWidth-1);
    int src_y_int = min((int)round(src_y) , srcHeight-1);
    dst[y * dstWidth + x] = src[src_y_int * srcWidth + src_x_int];
}
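To check the kernel against something on the CPU, the same index arithmetic can be sketched in NumPy. This is only a reference under assumptions: the name `nn_resize_reference` is hypothetical, and `floor(x + 0.5)` stands in for CUDA's `round()` (which rounds halves away from zero, unlike `np.round`), so a bit-for-bit match with the GPU output is likely but not guaranteed.

```python
import numpy as np

def nn_resize_reference(src_flat, src_width, src_height, dst_width, dst_height):
    """CPU sketch of the nn_resize kernel: one output element per flat index."""
    idx = np.arange(dst_width * dst_height)
    x = idx % dst_width    # column in the destination image
    y = idx // dst_width   # row in the destination image

    # Same float32 scale factors the kernel computes.
    scale_x = np.float32(src_width) / np.float32(dst_width)
    scale_y = np.float32(src_height) / np.float32(dst_height)
    src_x = x.astype(np.float32) * scale_x
    src_y = y.astype(np.float32) * scale_y

    # Round half up (equivalent to CUDA round() for non-negative values),
    # then clamp to the last valid column / row.
    src_x_int = np.minimum(np.floor(src_x + np.float32(0.5)).astype(np.int64), src_width - 1)
    src_y_int = np.minimum(np.floor(src_y + np.float32(0.5)).astype(np.int64), src_height - 1)

    return src_flat[src_y_int * src_width + src_x_int]

# Usage: a 3-row by 4-column image, halved in width only.
src = np.arange(12, dtype=np.float32).reshape(3, 4)   # shape is (height, width)
out = nn_resize_reference(src.flatten(), 4, 3, 2, 3)  # arguments are width, then height
resized = out.reshape(3, 2)                           # reshape as (height, width)
```

Note that the function takes width before height, as the kernel does, while the NumPy array is reshaped as (height, width).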
The main Python script is as follows:
import cv2
import numpy as np
import cupy as cp

factor = 0.4
resize_kernel = load_cuda_kernel('resize.cu', 'nn_resize')
srcImg=cv2.imread('./images/lena.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)
dev_srcImg , _ = resize(factor,srcImg,resize_kernel) #This is the function which launches the kernel
hst_srcImg=cp.asnumpy(dev_srcImg).reshape(newShape) #Here is where I check that the resized (non-square) image is distorted
The function that does the resizing is as follows:
def resize(factor, srcImg, resize_kernel):
    resizedShape = (int(srcImg.shape[0] * factor), int(srcImg.shape[1] * factor))
    print(f"Factor={factor}. Resized Shape={resizedShape}")
    dev_srcImg = cp.array(srcImg.flatten())  # Flatten the image to 1D
    dev_resizedImg = cp.zeros(resizedShape[0] * resizedShape[1], dtype=cp.float32)
    threads_per_block = 1024
    blocks_per_grid = (resizedShape[0] * resizedShape[1] + threads_per_block - 1) // threads_per_block
    resize_kernel((blocks_per_grid,), (threads_per_block,), (dev_srcImg, srcImg.shape[0], srcImg.shape[1], dev_resizedImg, resizedShape[0], resizedShape[1]))
    return dev_resizedImg, dev_resizedImg.size
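One thing that may be worth checking in this function: a NumPy image has shape (height, width), so `srcImg.shape[0]` is the height, while the kernel signature takes `srcWidth` before `srcHeight`. For a square image the two orders are indistinguishable, which would fit the symptom. The sketch below only shows the unpacking; the helper name `kernel_dims` is hypothetical and not part of my script.

```python
import numpy as np

def kernel_dims(src_shape, factor):
    """Return ((src_w, src_h), (dst_w, dst_h)) in the width-first order
    that the nn_resize signature expects, from a NumPy (height, width) shape."""
    src_h, src_w = src_shape
    dst_h, dst_w = int(src_h * factor), int(src_w * factor)
    return (src_w, src_h), (dst_w, dst_h)

# Usage with a 512-row by 256-column image and a factor of 0.5.
(src_w, src_h), (dst_w, dst_h) = kernel_dims((512, 256), 0.5)

# The launch would then pass width before height (shown as a comment because it needs a GPU):
# resize_kernel((blocks_per_grid,), (threads_per_block,),
#               (dev_srcImg, src_w, src_h, dev_resizedImg, dst_w, dst_h))
# and the flat result would be reshaped as (dst_h, dst_w).
```

Whether this accounts for all of the distortion is an assumption until it is tested on the GPU.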
I hope you can help me because I have been racking my brains on this issue for several days and I have not been able to make a simple resize for rectangular images using the GPU.