I'm working on a tool to create inputs for array simulations, where various inputs are iterated over multiple series to create all combinations of inputs. Some values need to iterate coupled together. As a toy example, 'day':['Sun','Mon','Tue','Wed','Thu','Fri','Sat'] and 'class':[False,False,True,False,True,False,False] would be used in pairs, but 'week':[1,2,3,4,5] would be iterated in another dimension. I'm trying to fit all input lists and settings into a simple text format that is then interpreted by Python, which outputs all relevant combinations. If zip() is similar to an inner product in linear algebra, the function I'm trying to make is an outer product.
What I've tried is a dictionary of key:[iterable] pairs, and a list of "dimensions" to iterate through.
inputs:
format_array = {
    'year':['f2024','s2025'],
    'week':[1,2,3,4,5],
    'day':['Sun','Mon','Tue','Wed','Thu','Fri','Sat'],
    'class':[False,False,True,False,True,False,False],
    'course':['SciComp'],
    'students':[['tmuzzy','jsmith'],['Alice','Bob','Charlie']]
}
format_dimensions = [1,2,3,3,0,1]
All keys that share a dimension number are iterated together (i.e. year with the list of students, and class with day) and must have the same length, while the different dimensions are combined in every possible way. To do this I used itertools.product over a range object of the length of each dimension to get all index combinations:
for dim in it.product(*[range(d) for d in dim_sizes]):
    ...  # dim is one tuple of indexes, one index per dimension
Then the indexes are used to look up format_array[key][index] for each key; the values are stored in a dictionary, and each dictionary is appended to a list:
out_list = [
    {'course': 'SciComp', 'year': 'f2024', 'week': 1, 'day': 'Sun', 'class': False, 'students': ['tmuzzy','jsmith']},
    {'course': 'SciComp', 'year': 'f2024', 'week': 1, 'day': 'Mon', 'class': False, 'students': ['tmuzzy','jsmith']},
    ...
    {'course': 'SciComp', 'year': 's2025', 'week': 5, 'day': 'Fri', 'class': False, 'students': ['Alice','Bob','Charlie']},
    {'course': 'SciComp', 'year': 's2025', 'week': 5, 'day': 'Sat', 'class': False, 'students': ['Alice','Bob','Charlie']},
]
This list can then be easily consumed by a different function to create my array simulation:
for day in out_list:
    print("In {course} during {year}, in week {week} on {day} is there class: {class} ".format(**day))
This works, but it does not allow iterables of different sizes within one dimension: the 'students' list has a different length for every 'year', so it cannot be iterated independently and is passed through as a whole list. This is a reasonably complicated problem, but a great solution would really improve my workflows. I would like suggestions for a better way to set up the inputs to handle "uneven dimensions"; I can figure out the implementation details myself. I can provide the full current implementation if needed.
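For readers who want to reproduce the behaviour described above, here is a minimal sketch of the index-product approach. It is not the asker's actual implementation (which was not posted); the function name index_product and the length check are assumptions made for illustration.

```python
import itertools as it

def index_product(format_array, format_dimensions):
    """Return one dict per combination; keys sharing a dimension number advance together.

    Hypothetical helper: a sketch of the approach described in the question.
    """
    keys = list(format_array)
    # Length of each dimension, taken from the first key assigned to it.
    dim_sizes = {}
    for key, dim in zip(keys, format_dimensions):
        size = len(format_array[key])
        if dim_sizes.setdefault(dim, size) != size:
            raise ValueError(f"key {key!r} does not match the length of dimension {dim}")
    dims = sorted(dim_sizes)
    out_list = []
    # One index per dimension; the last dimension varies fastest.
    for idx in it.product(*(range(dim_sizes[d]) for d in dims)):
        position = dict(zip(dims, idx))  # dimension number -> current index
        out_list.append({key: format_array[key][position[dim]]
                         for key, dim in zip(keys, format_dimensions)})
    return out_list

format_array = {
    'year': ['f2024', 's2025'],
    'week': [1, 2, 3, 4, 5],
    'day': ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'],
    'class': [False, False, True, False, True, False, False],
    'course': ['SciComp'],
    'students': [['tmuzzy', 'jsmith'], ['Alice', 'Bob', 'Charlie']],
}
format_dimensions = [1, 2, 3, 3, 0, 1]
out_list = index_product(format_array, format_dimensions)
```

Because 'students' shares dimension 1 with 'year', each output dict carries the whole student list for that year, which is exactly the limitation the question is about.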
Asked Feb 5 at 21:53 by Tristan Muzzy, edited Feb 5 at 21:56

2 Answers

Using pandas to take advantage of its indexing capabilities.
- Extend lists to the size of the longest
- Make a dataframe with that.
- Iterate expected dimensions
The last entry of each selection in fm is a tuple (row, (start, stop)) so that the students list can be sliced.
import pandas as pd

fa = {
    'year':['f2024','s2025'],
    'week':[1,2,3,4,5],
    'day':['Sun','Mon','Tue','Wed','Thu','Fri','Sat'],
    'class':[False,False,True,False,True,False,False],
    'course':['SciComp'],
    'students':[['tmuzzy','jsmith'],['Alice','Bob','Charlie']]
}
# One row per requested output: a row index for each key, in the order of fa's keys.
# The 'students' entry is (row, (start, stop)) so the list can be sliced.
fm = [[0,0,4,4,0, (0, (0,1))],
      [1,2,3,3,0, (1, (1,3))]]

# Pad every list with None up to the length of the longest one (modifies fa in place)
m = max([len(v) for v in fa.values()])
for k,v in fa.items():
    if len(v) < m:
        v.extend([None] * (m - len(v)))

data = pd.DataFrame.from_dict(fa)
print(data)
print(f"{'-' * 80}")

out = []
for d in range(len(fm)):
    out.append({})
    for i, k in enumerate(fa.keys()):
        dim = fm[d][i]
        if k == 'students':
            # pick the row, then slice the list stored in that cell
            out[-1][k]= data[k].iloc[dim[0]][dim[1][0]:dim[1][1]]
        else:
            out[-1][k]= data[k].iloc[dim]

for s in out:
    print("In {course} during {year}, in week {week} on {day} is there class: {class} ({students})".format(**s))
    #print(s)
Result
    year  week  day  class   course               students
0  f2024   1.0  Sun  False  SciComp       [tmuzzy, jsmith]
1  s2025   2.0  Mon  False     None  [Alice, Bob, Charlie]
2   None   3.0  Tue   True     None                   None
3   None   4.0  Wed  False     None                   None
4   None   5.0  Thu   True     None                   None
5   None   NaN  Fri  False     None                   None
6   None   NaN  Sat  False     None                   None
--------------------------------------------------------------------------------
In SciComp during f2024, in week 1.0 on Thu is there class: True (['tmuzzy'])
In SciComp during s2025, in week 3.0 on Wed is there class: False (['Bob', 'Charlie'])
After some more work, I figured out most of what I wanted, and Barmar's comment did help. The solution uses similar functions to what I had before, but with a new input syntax. I came up with a better way to define the "dimensions" and iterate through those before calling itertools.product to create a cartesian product. The example in the docstring shows most of the input setup.
Here's the docstring I wrote:
'''
Generate list of dictionaries for multidimensional arrays of inputs
INPUTS:
    setkey_iterable: dictionary of format outlined below
    delim: delimiter to split setting & name (default '.')
    print_array: prints all output dicts
    print_debug: prints dimension information
OUTPUTS:
    out_list: list of dictionaries, one for each permutation
        [{key0:iteration0,key1:iteration0},
         {key0:iteration1,key1:iteration1}]
The values, called "iterables", must be an iterable object (ie: range, list,
    tuple, string) or they will be wrapped in a list (ie: 42 => [42])
    the cartesian product of the iterables is generated, so the output
    list will have values of one item from each list (see example)
keys, called "setkeys", are split on a delimiter (default '.')
    rightmost [-1] after split is the "name", which is the key of the output
        dicts. name(s) are converted to strings before output.
        ie: 1:'abc' => '1':'a' in output iterations
        this is for convenience, mixing 1 and '1' is a bad idea.
    leftmost [0] after the split is the "dimension" of the cartesian product;
        if two setkeys share a dimension, the outputs will be taken from the
        same index of each iterable. ie 'same.let' and 'same.num'
    if there is a middle value [1], that iterable is looped through only when
        the dimension's index is equal to that value. ie: when '1' is at
        index 0 (value = 'a'), the name 'greek' loops through 'alpha' and 'α'
    if there are more than 2 delimiters, [2:-1] is ignored. With the current
        implementation, only 1-deep "sub-iterations" are possible;
        the syntax allows more complicated inputs, but code changes
        would be needed to make this N-deep
output order is determined by sorting dimension strings, not load order
Input example:
setkey_iterable = {1:'abc','1.0.greek':['alpha','α'],'1.1.greek':['beta'],
                   'n.N':42, 'R':range(3),
                   'same.let':['i','j','k'],'same.num':['e1','e2','e3']}
Output example (36 total)
{'1': 'a', 'greek': 'alpha', 'R': 0, 'N': 42, 'let': 'i', 'num': 'e1'}
{'1': 'a', 'greek': 'alpha', 'R': 0, 'N': 42, 'let': 'j', 'num': 'e2'}
...
{'1': 'a', 'greek': 'alpha', 'R': 2, 'N': 42, 'let': 'k', 'num': 'e3'}
{'1': 'a', 'greek': 'α', 'R': 0, 'N': 42, 'let': 'i', 'num': 'e1'}
...
{'1': 'b', 'greek': 'beta', 'R': 2, 'N': 42, 'let': 'k', 'num': 'e3'}
{'1': 'c', 'R': 0, 'N': 42, 'let': 'i', 'num': 'e1'}
Similar to:
https://stackoverflow.com/questions/5228158/cartesian-product-of-a-dictionary-of-lists
'''
If this sounds useful, go ahead and use this code, and cite this StackOverflow page. The link above is a much simpler problem, but Tarrasch's answer was the basis for this implementation.
import itertools as it
import sys

def gen_array(setkey_iterable, delim = '.', print_array = False, print_debug = False):
    if type(setkey_iterable) == str:
        ## could use eval() but no no
        sys.exit('Convert this to a dict first')
    dim_key = {}
    if print_debug:
        print('input parse:')
        print('setting | key | iterable')
    in_copy = setkey_iterable.copy() # can't modify during loop, so edit original
    for sk, iterable in in_copy.items():
        if type(sk) != str: # convert setkey type
            setkey_iterable[str(sk)] = setkey_iterable.pop(sk)
            sk = str(sk)
            ## ie {1:['a']} => {'1':['a']}
        if not hasattr(iterable, '__iter__'): # convert iterable type
            iterable = [iterable] # convert single non iterable to iterable
            setkey_iterable[sk] = iterable
            ## ie 1 => [1]
        s_k = str(sk).split(delim) # this is repeated later
        key = s_k[-1] # last section is name
        if len(s_k) == 1:
            setting = s_k # same name and dim
        elif len(s_k) == 2:
            setting = [s_k[0]] # 1st is dim
        else:
            setting = [s_k[0], int(s_k[1])] # dim, index be called on
        if print_debug:
            print(setting, key, iterable)
        if setting[0] in dim_key:
            dim_key[setting[0]].append((setting, sk))
        else:
            dim_key[setting[0]] = [(setting, sk)]
    dim_iterations = {}
    if print_debug:
        print()
        print('dimensions:')
        print('dim | setkey(s)')
        print('main_keys | ind_key')
        print()
    for dim, setkeys in sorted(dim_key.items()):
        if print_debug:
            print(dim, setkeys)
        main_keys = [] # list of non call sk
        ind_key = {} # dict of call:sk
        for setkey in setkeys:
            if len(setkey[0]) == 1: # setting
                main_keys.append(setkey[1]) # sk
            else:
                if not setkey[0][1] in ind_key: # index
                    ind_key[setkey[0][1]] = []
                ind_key[setkey[0][1]].append(setkey[1])
        dim_iterations[dim] = []
        if print_debug:
            print(main_keys, ind_key)
            print()
        main_iterations = zip(*(setkey_iterable[key] for key in main_keys))
        ## non single iteration values
        for ii, iteration in enumerate(main_iterations): # loop through those
            keys = [key.split(delim)[-1] for key in main_keys] # save name
            iteration = dict(zip(keys,iteration)) # make name:iteration dict
            if ii in ind_key: # if index
                single_iterations = zip(*(setkey_iterable[key] for key in ind_key[ii]))
                # get single iterations
                for jj, single in enumerate(single_iterations):
                    keys = [key.split(delim)[-1] for key in ind_key[ii]] # save name
                    sub_iter = dict(zip(keys, single)) # make name:iteration dict for call index
                    dim_iterations[dim].append({**iteration, **sub_iter})
                    # merge and save dicts
            else:
                dim_iterations[dim].append(iteration) # save dict
    out_list = []
    prod = it.product(*dim_iterations.values()) # cartesian product of each dimension's iterations
    ## tuple of dict of each dim
    if print_array:
        print('Output cartesian product')
    for dicts in prod: # Merge tuple into single dict
        iteration = {}
        for dir_dict in dicts:
            iteration.update(dir_dict)
        if print_array:
            print(iteration)
        out_list.append(iteration)
    return out_list
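The per-dimension step of gen_array can also be looked at in isolation. The following is a simplified sketch, not a replacement for the function above: the helper name expand_dimension is hypothetical, and index-specific sub-iterables are passed as a nested dict instead of being parsed from delimiter-separated setkeys.

```python
import itertools as it

def expand_dimension(main, sub_by_index):
    """Expand one dimension into a list of partial output dicts.

    main: name -> iterable; all iterables are zipped together.
    sub_by_index: index -> {name: iterable}; looped through only at that index.
    (Hypothetical helper, for illustration only.)
    """
    names = list(main)
    rows = []
    for i, values in enumerate(zip(*main.values())):
        base = dict(zip(names, values))
        subs = sub_by_index.get(i)
        if subs is None:
            rows.append(base)  # no sub-iteration at this index
            continue
        sub_names = list(subs)
        for sub_values in zip(*subs.values()):
            rows.append({**base, **dict(zip(sub_names, sub_values))})
    return rows

# Same data as the docstring's input example, one call per dimension
dim_1 = expand_dimension({'1': 'abc'},
                         {0: {'greek': ['alpha', 'α']}, 1: {'greek': ['beta']}})
dim_R = expand_dimension({'R': range(3)}, {})
dim_n = expand_dimension({'N': [42]}, {})
dim_same = expand_dimension({'let': ['i', 'j', 'k'], 'num': ['e1', 'e2', 'e3']}, {})

# Cartesian product across dimensions, merging the partial dicts
combos = [{k: v for part in parts for k, v in part.items()}
          for parts in it.product(dim_1, dim_R, dim_n, dim_same)]
```

Here dimension '1' expands to four partial dicts (a/alpha, a/α, b/beta, and c with no 'greek' key), which is how a dimension can have a different number of sub-values at each index.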
… itertools.product() from the ones that should be combined with zip(). Then you can figure out how to apply these functions dynamically. – Barmar, commented Feb 5 at 23:17
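One possible reading of this comment (its beginning is cut off, so this is an assumption) is to keep coupled lists in explicit groups, zip() within each group, and apply itertools.product() across groups. The sketch below uses the question's toy data; zip_group, nested_group and product_of_groups are hypothetical names, and nested_group is an added illustration of how a list whose length differs per year could be flattened into (year, student) pairs before the product.

```python
import itertools as it

def zip_group(group):
    """Turn {'a': [1, 2], 'b': [3, 4]} into [{'a': 1, 'b': 3}, {'a': 2, 'b': 4}]."""
    names = list(group)
    return [dict(zip(names, values)) for values in zip(*group.values())]

def nested_group(outer_name, inner_name, mapping):
    """One row per (outer, inner) pair; the inner lists may differ in length."""
    return [{outer_name: outer, inner_name: inner}
            for outer, inners in mapping.items()
            for inner in inners]

def product_of_groups(row_lists):
    """Cartesian product across groups, merging one row from each into a single dict."""
    out = []
    for rows in it.product(*row_lists):
        merged = {}
        for row in rows:
            merged.update(row)
        out.append(merged)
    return out

groups = [
    zip_group({'course': ['SciComp']}),
    nested_group('year', 'student', {'f2024': ['tmuzzy', 'jsmith'],
                                     's2025': ['Alice', 'Bob', 'Charlie']}),
    zip_group({'week': [1, 2, 3, 4, 5]}),
    zip_group({'day': ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'],
               'class': [False, False, True, False, True, False, False]}),
]
combos = product_of_groups(groups)
```

With this layout each student is iterated individually within their own year, so the uneven lengths never have to line up with any other list.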