My goal is to compare two different states of a Rubik's cube class, which on its own is simple. The problem is scale: covering every pairing requires something on the order of 900 million comparisons. Concretely, I am checking whether two cube states, one from each of two lists, are exactly the same.
The solutions I've tried so far are as follows:
- Brute-force comparisons in a single process
- A Pool object from multiprocess (the fork of multiprocessing)
- Variations of compiling a list of simpler operations for the multiprocess Pool (this caused a memory issue)
- Setting the pool's process count and/or the chunksize for pool.imap_unordered (see the sketch just below this list)
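For reference, this is where those last two knobs live (a minimal sketch with a toy worker and toy data, not my real code):

from multiprocess import Pool

def worker(pair):
    a, b = pair
    return a == b  # stand-in for the real cube comparison

if __name__ == "__main__":
    pairs = [(i, i % 2) for i in range(100000)]  # toy input
    # processes= caps the number of worker processes; chunksize= controls how
    # many items each worker takes per round trip (bigger = less IPC overhead).
    with Pool(processes=4) as pool:
        for result in pool.imap_unordered(worker, pairs, chunksize=4096):
            pass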
I desperately want to make this work using multiprocessing, and my current code looks like this:
print("Beginning pooling, with a new pool for each start pair. This may take a while...")
final_algs=[]
startcomparisons=0
for pairA in startcube_pairs:
comparisonlist=[]
startcomparisons+=1
print(f"Starting comparison block {startcomparisons}/{len(startcube_pairs)}")
for pairB in solvedcube_pairs:
comparisonlist.append([pairA, pairB])
# now we have a small list of comparisons to make, and we will start comparing what we currently have. If we just put all possible pairs in a list, we encounter a memory problem.
with Pool() as pool:
results = pool.imap_unordered(comparecubes_returnsolution_ifexists, comparisonlist, chunksize=4096)
# imap_unordered means the processes will not necessarily start in order, which is fine since we don't care what order we get algorithms out
for result in results: # this portion existing shows all the subprocesses in the task manager so we will keep it
if result != None:
print(f"Found: [{alg_as_str(result)}] length: {len(result)}")
final_algs.append(result)
else: pass
print(f"Finished comparison block {startcomparisons}/{len(startcube_pairs)}\n\n")
With this version of the code, the Python process and its subprocesses collectively use only 7-9% of my CPU, and most of the pool workers sit below 0.5% each. My goal is that, if I could utilize multiprocessing properly, I could sustain something like 40-60% CPU while running these comparisons.
Is multiprocessing even the correct solution to this problem? If it is, how can I let Python use more processing power without nuking my system's memory?
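To make the question concrete, here is one restructuring I've been considering: create the pool once and feed it pairs lazily from a generator, so the full cross product never exists in memory at once. A minimal sketch with toy data (compare_pair stands in for comparecubes_returnsolution_ifexists); I don't know whether this actually fixes the utilization problem:

from itertools import product
from multiprocess import Pool

def compare_pair(pair):
    pairA, pairB = pair
    return pairA if pairA == pairB else None  # stand-in for the real comparison

if __name__ == "__main__":
    startcube_pairs = list(range(1000))   # toy data
    solvedcube_pairs = list(range(1000))
    final_algs = []
    with Pool() as pool:  # one pool for the whole run, not one per block
        # product() yields pairs lazily; imap_unordered pulls from it as workers
        # free up, so the full pair list is never materialized.
        pair_stream = product(startcube_pairs, solvedcube_pairs)
        for result in pool.imap_unordered(compare_pair, pair_stream, chunksize=4096):
            if result is not None:
                final_algs.append(result)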
I've spent a while searching, both in the similar-questions dropdown and on Google, but please point me to an answer if I have missed one.
EDIT: added reproducible code below.
from multiprocess import Pool  # fork of multiprocessing (dill-based pickling)
import random

class basicdata:
    def __init__(self):
        # 6 faces of 3x3 values in 1..7, roughly the shape of a cube state
        self.datavalues = [[[random.randint(1, 7) for i in range(3)] for j in range(3)] for k in range(6)]

# Note: this module-level setup runs again in every worker on spawn-based
# platforms (e.g. Windows), since each worker re-imports the module.
dataset_1 = [basicdata() for i in range(155000)]
dataset_2 = [basicdata() for i in range(155000)]

def isdataequal(args):
    data_1 = args[0]
    data_2 = args[1]
    isequal = True
    for i in range(len(data_1)):
        for y in range(3):
            for x in range(3):
                if data_1[i][y][x] != data_2[i][y][x]:
                    if data_1[i][y][x] != 7:  # 7 represents a value whose comparison is unimportant to us
                        isequal = False
    if isequal:
        return data_1
    return None

if __name__ == "__main__":
    final_results = []
    print("Beginning comparisons")
    for data_1 in dataset_1:
        comparisonlist = []
        for data_2 in dataset_2:
            comparisonlist.append([data_1.datavalues, data_2.datavalues])
        with Pool() as pool:
            results = pool.imap_unordered(isdataequal, comparisonlist, chunksize=4096)
            for result in results:
                if result is not None:
                    final_results.append(result)
        print("Finished a comparison block")
    print(final_results)
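As an aside on semantics: the rule isdataequal implements (elementwise equality, except that positions where the first dataset holds 7 are ignored) can be written more compactly, with all() short-circuiting on the first real mismatch instead of always scanning all 54 cells. A sketch of the same rule:

def isdataequal_compact(args):
    data_1, data_2 = args
    ok = all(
        a == b or a == 7  # 7 on the data_1 side is a "don't care" value
        for face_1, face_2 in zip(data_1, data_2)
        for row_1, row_2 in zip(face_1, face_2)
        for a, b in zip(row_1, row_2)
    )
    return data_1 if ok else None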
This should be reproducible as-is. Interestingly, this cut-down version of the program hits 99% CPU in Task Manager in short bursts, dropping back to the previously stated sub-0.5% usage in between; I assume the bursts are the pool chewing through a block, and the quiet gaps are the serial, single-process work of building the next comparisonlist and spinning up a fresh pool. Either way, the time to complete each comparison block is not significantly different.
EDIT 2: Responded to some comments. Also, to set expectations: all I want out of this project is for the comparisons not to take literal days to compute.
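For scale, a rough back-of-envelope (the per-call cost is a guess, not a measurement): the reproducible example is 155,000 × 155,000 ≈ 2.4 × 10^10 comparisons. At 1 µs per isdataequal call that is about 24,000 s ≈ 6.7 hours on one core; at a more realistic few µs per pure-Python call it lands in the multi-day range, which is exactly the runtime I'm trying to avoid.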