My goal is to compare two different states of a Rubik's cube class, which on its own is simple. The problem is scale: covering every pairing requires something on the order of 900 million comparisons. Concretely, I am checking whether two cube states, one from each of two lists, are exactly the same.
The solutions I've tried so far are as follows:
- Brute-force comparisons in a single process
- A Pool object from multiprocess (the fork of multiprocessing)
- Variations of compiling a list of simpler operations for the multiprocess Pool (this caused a memory issue)
- Setting the pool's process count and/or the chunksize for pool.imap_unordered (see the sketch just below this list)
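For reference, this is where those last two knobs live (a minimal sketch with a toy worker and toy data, not my real code):

from multiprocess import Pool

def worker(pair):
    a, b = pair
    return a == b  # stand-in for the real cube comparison

if __name__ == "__main__":
    pairs = [(i, i % 2) for i in range(100000)]  # toy input
    # processes= caps the number of worker processes; chunksize= controls how
    # many items each worker takes per round trip (bigger = less IPC overhead).
    with Pool(processes=4) as pool:
        for result in pool.imap_unordered(worker, pairs, chunksize=4096):
            pass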
I desperately want to make this work using multiprocessing, and my current code looks like this:
print("Beginning pooling, with a new pool for each start pair. This may take a while...")
final_algs=[]
startcomparisons=0
for pairA in startcube_pairs:
comparisonlist=[]
startcomparisons+=1
print(f"Starting comparison block {startcomparisons}/{len(startcube_pairs)}")
for pairB in solvedcube_pairs:
comparisonlist.append([pairA, pairB])
# now we have a small list of comparisons to make, and we will start comparing what we currently have. If we just put all possible pairs in a list, we encounter a memory problem.
with Pool() as pool:
results = pool.imap_unordered(comparecubes_returnsolution_ifexists, comparisonlist, chunksize=4096)
# imap_unordered means the processes will not necessarily start in order, which is fine since we don't care what order we get algorithms out
for result in results: # this portion existing shows all the subprocesses in the task manager so we will keep it
if result != None:
print(f"Found: [{alg_as_str(result)}] length: {len(result)}")
final_algs.append(result)
else: pass
print(f"Finished comparison block {startcomparisons}/{len(startcube_pairs)}\n\n")
With this version of the code, the Python process and its subprocesses collectively use only 7-9% of my CPU, and most of the pool workers sit below 0.5% each. My goal is that, if I could utilize multiprocessing properly, I could sustain something like 40-60% CPU while running these comparisons.
Is multiprocessing even the correct solution to this problem? If it is, how can I let Python use more processing power without nuking my system's memory?
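To make the question concrete, here is one restructuring I've been considering: create the pool once and feed it pairs lazily from a generator, so the full cross product never exists in memory at once. A minimal sketch with toy data (compare_pair stands in for comparecubes_returnsolution_ifexists); I don't know whether this actually fixes the utilization problem:

from itertools import product
from multiprocess import Pool

def compare_pair(pair):
    pairA, pairB = pair
    return pairA if pairA == pairB else None  # stand-in for the real comparison

if __name__ == "__main__":
    startcube_pairs = list(range(1000))   # toy data
    solvedcube_pairs = list(range(1000))
    final_algs = []
    with Pool() as pool:  # one pool for the whole run, not one per block
        # product() yields pairs lazily; imap_unordered pulls from it as workers
        # free up, so the full pair list is never materialized.
        pair_stream = product(startcube_pairs, solvedcube_pairs)
        for result in pool.imap_unordered(compare_pair, pair_stream, chunksize=4096):
            if result is not None:
                final_algs.append(result)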
I've spent a while searching, both in the similar-questions dropdown and on Google, but please point me to an answer if I have missed one.
EDIT: added reproducible code below.
from multiprocess import Pool  # fork of multiprocessing (dill-based pickling)
import random

class basicdata:
    def __init__(self):
        # 6 faces of 3x3 values in 1..7, roughly the shape of a cube state
        self.datavalues = [[[random.randint(1, 7) for i in range(3)] for j in range(3)] for k in range(6)]

# Note: this module-level setup runs again in every worker on spawn-based
# platforms (e.g. Windows), since each worker re-imports the module.
dataset_1 = [basicdata() for i in range(155000)]
dataset_2 = [basicdata() for i in range(155000)]

def isdataequal(args):
    data_1 = args[0]
    data_2 = args[1]
    isequal = True
    for i in range(len(data_1)):
        for y in range(3):
            for x in range(3):
                if data_1[i][y][x] != data_2[i][y][x]:
                    if data_1[i][y][x] != 7:  # 7 represents a value whose comparison is unimportant to us
                        isequal = False
    if isequal:
        return data_1
    return None

if __name__ == "__main__":
    final_results = []
    print("Beginning comparisons")
    for data_1 in dataset_1:
        comparisonlist = []
        for data_2 in dataset_2:
            comparisonlist.append([data_1.datavalues, data_2.datavalues])
        with Pool() as pool:
            results = pool.imap_unordered(isdataequal, comparisonlist, chunksize=4096)
            for result in results:
                if result is not None:
                    final_results.append(result)
        print("Finished a comparison block")
    print(final_results)
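As an aside on semantics: the rule isdataequal implements (elementwise equality, except that positions where the first dataset holds 7 are ignored) can be written more compactly, with all() short-circuiting on the first real mismatch instead of always scanning all 54 cells. A sketch of the same rule:

def isdataequal_compact(args):
    data_1, data_2 = args
    ok = all(
        a == b or a == 7  # 7 on the data_1 side is a "don't care" value
        for face_1, face_2 in zip(data_1, data_2)
        for row_1, row_2 in zip(face_1, face_2)
        for a, b in zip(row_1, row_2)
    )
    return data_1 if ok else None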
This should be reproducible as-is. Interestingly, this cut-down version of the program hits 99% CPU in Task Manager in short bursts, dropping back to the previously stated sub-0.5% usage in between; I assume the bursts are the pool chewing through a block, and the quiet gaps are the serial, single-process work of building the next comparisonlist and spinning up a fresh pool. Either way, the time to complete each comparison block is not significantly different.
EDIT 2: Responded to some comments. Also, to set expectations: all I want out of this project is for the comparisons not to take literal days to compute.
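For scale, a rough back-of-envelope (the per-call cost is a guess, not a measurement): the reproducible example is 155,000 × 155,000 ≈ 2.4 × 10^10 comparisons. At 1 µs per isdataequal call that is about 24,000 s ≈ 6.7 hours on one core; at a more realistic few µs per pure-Python call it lands in the multi-day range, which is exactly the runtime I'm trying to avoid.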