
amazon ec2 - Python Joblib Parallel restricted by CPU pinning on AWS EC2 - Stack Overflow


The following code works as expected on my Mac:

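from joblib import Parallel, delayed

# num_workers, dataset_size, batch_size, sampler and save_batch are defined elsewhere
with Parallel(n_jobs=num_workers) as parallel:
    for _ in range(0, dataset_size, batch_size):
        # draw one batch of samples across the worker processes
        batch = parallel(
            delayed(sampler)()
            for _ in range(batch_size)
        )
        save_batch(batch)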

and with htop I can see num_workers processes using close to 100% cpu time.

On an AWS EC2 machine running Amazon Linux, the same code uses at most 2 CPUs: with num_workers=8 I get 8 processes, but each uses only about 25% of a CPU.

All of these processes have different PIDs, and taskset -cp PID reports "pid xxxxx's current affinity list: 0,8" for every one of them, indicating that the processes are pinned to CPUs 0 and 8.
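The same check can be made from inside Python rather than with taskset. This is only a minimal sketch, assuming Linux: os.sched_getaffinity is not available on macOS, so the code falls back to None there. The helper name affinity_report is hypothetical and chosen only for illustration.

```python
import os


def affinity_report():
    """Return the CPUs this process may run on next to the total CPU count."""
    total = os.cpu_count()
    if hasattr(os, "sched_getaffinity"):
        # Linux: the set of CPU ids the current process is allowed to use
        allowed = sorted(os.sched_getaffinity(0))
    else:
        # e.g. macOS: no affinity API in the os module
        allowed = None
    return {"pid": os.getpid(), "total_cpus": total, "allowed_cpus": allowed}


if __name__ == "__main__":
    print(affinity_report())
```

If allowed_cpus is much shorter than total_cpus, the process has a restricted affinity mask, which matches what taskset shows for the workers.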

I can remove this cpu pinning by defining (thanks ChatGPT!):

import os

import psutil


def remove_cpu_affinity():
    """Allow Joblib workers to use all CPU cores (fix AWS CPU pinning)."""
    try:
        p = psutil.Process(os.getpid())
        # According to the docs, an empty list should remove all restrictions,
        # but I used p.cpu_affinity(range(16)) on a 16-core machine.
        p.cpu_affinity([])
    except AttributeError:
        pass  # Some systems don't support cpu_affinity()

and calling it from every task submitted in the snippet above, using this hideous one-liner:

delayed(lambda: (remove_cpu_affinity(), sampler())[-1])()
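For readability, the one-liner can be spelled out as a named wrapper. The sketch below is an illustration, not a tested fix for the EC2 setup: it swaps psutil for os.sched_setaffinity from the standard library, assumes Linux, and the names widen_affinity and run_unpinned are hypothetical. It also assumes the restriction is an inherited affinity mask; a cgroup cpuset limit would not be lifted this way.

```python
import os


def widen_affinity():
    """Let the current process run on every CPU the OS reports (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        return  # e.g. macOS: no affinity API, nothing to do
    wanted = set(range(os.cpu_count() or 1))
    if os.sched_getaffinity(0) == wanted:
        return  # already unrestricted, so repeated calls do no extra work
    try:
        os.sched_setaffinity(0, wanted)
    except OSError:
        pass  # not permitted in this environment; keep the current mask


def run_unpinned(func, *args, **kwargs):
    """Widen the affinity of the calling worker, then run func."""
    widen_affinity()
    return func(*args, **kwargs)
```

With this in place, the task in the loop would be submitted as delayed(run_unpinned)(sampler) instead of the lambda. The affinity is still checked on every call, but the early return makes the repeated check cheap.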

First Question: Is there a portable way to turn off cpu-pinning system wide on an AWS EC2 machine in a managed service? I'm not provisioning the machine.

Second Question: Is there a more elegant and portable way of removing the restriction in Python code than what I have above? I'm removing the restriction with every call, but the Parallel context manager keeps the worker processes alive, so in principle I would only have to do it once per worker.

Optional: Is this a rare problem or is it common to have CPU pinning enabled? I can only speculate on the reasons.

Thanks for any help!
