What is the typical way to use multiple OpenCL devices across multiple compute nodes when jobs are launched through a scheduler such as SLURM or PBS? Say I request 64 GPUs in total with a SLURM or PBS script, and each compute node contains 8 GPUs (so the job spans 8 nodes).
What I want to know is: do I need to use something like MPI to reach all the GPUs (and create a separate context on each node)? Or would all 64 GPUs appear when I call clGetDeviceIDs()?