The following code works as expected on my Mac:
```python
from joblib import Parallel, delayed

with Parallel(n_jobs=num_workers) as parallel:
    for _ in range(0, dataset_size, batch_size):
        batch = parallel(
            delayed(sampler)()
            for i in range(batch_size)
        )
        save_batch(batch)
```
and in `htop` I can see `num_workers` processes, each using close to 100% CPU time.
On an AWS EC2 machine running Amazon Linux, the same code uses at most two CPUs' worth of compute: with `num_workers=8`, I still get 8 processes, but each uses only about 25% of CPU time.
All of these processes have different PIDs, and testing with `taskset -cp PID` I get

```
pid xxxxx's current affinity list: 0,8
```

for every process, indicating that the workers are pinned to CPUs 0 and 8.
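I can confirm the same restriction from Python with the standard library (Linux-only, as far as I can tell):

```python
import os

# On Linux, this returns the set of CPUs the calling process is
# allowed to run on; it matches what taskset reports.
print(os.sched_getaffinity(0))  # {0, 8} on the affected instance
```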
I can remove this CPU pinning by defining (thanks, ChatGPT!):
```python
import os
import psutil

def remove_cpu_affinity():
    """Allow joblib workers to use all CPU cores (fix AWS CPU pinning)."""
    try:
        p = psutil.Process(os.getpid())
        # An empty list should remove all restrictions according to the docs,
        # but I used p.cpu_affinity(range(16)) on a 16-core machine.
        p.cpu_affinity([])
    except AttributeError:
        pass  # Some systems don't support cpu_affinity()
```
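A `psutil`-free sketch of the same idea would use `os.sched_setaffinity`, although that function only exists on Linux (the function name here is mine):

```python
import os

def remove_cpu_affinity_stdlib():
    """Same fix without psutil; os.sched_setaffinity is Linux-only."""
    try:
        # Let the current process run on every CPU the machine has.
        os.sched_setaffinity(0, range(os.cpu_count()))
    except AttributeError:
        pass  # Not on Linux, so there is no affinity to reset
```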
I call `remove_cpu_affinity()` in the snippet above for every dispatched task, using this hideous one-liner:
```python
delayed(lambda: (remove_cpu_affinity(), sampler())[-1])()
```
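Spelled out as a named wrapper (the name is mine), the lambda does nothing more than run the fix inside the worker before each sample:

```python
def sample_with_affinity_fix():
    # Runs in the worker process: lift the CPU restriction, then sample.
    remove_cpu_affinity()
    return sampler()

batch = parallel(
    delayed(sample_with_affinity_fix)()
    for i in range(batch_size)
)
```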
First question: Is there a portable way to turn off CPU pinning system-wide on an AWS EC2 machine in a managed service? I'm not provisioning the machine myself.
Second question: Is there a more elegant and portable way to remove the restriction in Python code than what I have above? I'm currently removing the restriction on every call, but the context manager reuses the worker processes, so in principle I should only have to do it once per worker.
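The closest I've come to "once per worker" is a module-level guard, so the syscall runs at most once per worker process even though the wrapper still executes on every call (the names are mine, and this assumes the function lives in an importable module rather than `__main__`, so the flag actually persists between tasks in each worker):

```python
_affinity_fixed = False

def sampler_once_fixed():
    global _affinity_fixed
    if not _affinity_fixed:
        remove_cpu_affinity()  # runs for the first task in each worker only
        _affinity_fixed = True
    return sampler()
```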
Optional: Is this a rare problem, or is it common to have CPU pinning enabled? I can only speculate about the reasons.
Thanks for any help!