
c++ - OpenMP does not offload when compiled with clang from inside docker image - Stack Overflow


I have a toy program that I use just to check whether OpenMP offloading works from inside my Docker containers. The Dockerfile is the following:

FROM nvidia/cuda:12.6.0-devel-ubuntu24.04

# Set environment variables
ENV CC=gcc
ENV CXX=g++

# Install dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    gcc-offload-nvptx \
    gfortran \
    libopenmpi-dev \
    openmpi-bin \
    cppcheck \
    clang-tidy-18 \
    clang-format-18 \
    fftw2 \
    fftw-dev \
    pkg-config \
    valgrind \
    wget \
    cmake \
    python3 \
    python3-pip \
    git \
    libomp-dev \
    libc++-18-dev \
    libc++abi-18-dev

ENV LD_LIBRARY_PATH="/usr/lib/llvm-18/lib:$LD_LIBRARY_PATH"

# Clean up
RUN apt-get clean
RUN rm -rf /var/lib/apt/lists/*

# Install Miniconda
RUN wget .sh \
  && bash Miniconda3-latest-Linux-x86_64.sh -b

# Set the working directory
WORKDIR /app

# Copy application code
COPY . /app

# Set the PATH to include Miniconda
ENV PATH="/root/miniconda3/bin:$PATH"

# Initialize conda
RUN conda init

# Install Python dependencies
RUN conda install -c conda-forge --yes --file scripts/requirements.txt
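
For completeness, this is roughly how an image like this is built and run with GPU access (the image tag and binary name below are placeholders, not from my actual setup; GPU visibility requires the NVIDIA Container Toolkit on the host):

```shell
# Build the image from the Dockerfile above (tag is a placeholder)
docker build -t omp-offload-test .

# Run with all host GPUs exposed to the container
docker run --rm --gpus all omp-offload-test ./toy_program
```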

So, I use the image provided by NVIDIA. My personal computer has an RTX 4060, which has compute capability 8.9, and the toy code builds and runs normally with both gcc and clang, as expected. Here are some snippets:

void ipsum::computation(lorem &lor) {
    int *lx = lor.x;
    int *ly = lor.y;
    int *lz = z;       // z and sz are members of ipsum
    size_t _sz = sz;
    #pragma omp target teams distribute parallel for simd
    for (size_t i = 0; i < _sz; i++) {
        ly[i] = lx[i] + lz[i];
    }

    std::swap(lor.x, lor.y);
}

It's simple code that updates one vector by summing it with another; as I said, it's a toy.

The problem arises when I push this image to Docker Hub and then pull it on another machine, which has an RTX A4500 and a GTX 980 Ti. That machine has CUDA version 12.8, but I have found here that this is not an issue: nvidia-smi reports driver version 12.8, while the runtime is 12.6 because of the Docker image. On this machine the code compiles and runs fine with gcc, but when I compile with clang and try to run it, it does not run on any GPU at all. With OMP_TARGET_OFFLOAD=MANDATORY set, this error comes up:

omptarget error: Consult .html for debugging options.
omptarget error: No images found compatible with the installed hardware. Found 1 image(s): (sm_86)
ipsum.cpp:13:5: omptarget fatal error 1: failure of target construct while offloading is mandatory
Aborted (core dumped)

I compiled with these flags in clang:

-Wall -Wextra -fopenmp -fopenmp-targets=nvptx64 -g -stdlib=libc++

This error is pretty strange to me because the RTX A4500 has compute capability 8.6, and gcc offloads to it just fine. What is wrong with clang? I have checked that libomptarget-nvptx-sm_86.bc is present in the container. Additionally, omp_get_num_devices() returns zero.
