I have a kernel with highly divergent control flow, which makes it unsuitable for parallelization by mapping work items onto individual SIMD lanes. How many independent threads, each with its own program counter, can run on a single XVE of an Intel Xe2 core? And which compiler flags or pragmas prevent work items from being mapped to the SIMD16 lanes of an XVE?
I browsed the "oneAPI GPU Optimization Guide" but did not find an answer.