I have a DataFrame df_items and want to create combinations of its rows of size i using itertools.combinations. Each combination should maintain all columns from the original DataFrame.
Current approach (works, but loses the column names):
import numpy as np
from itertools import combinations
# use a separate name so the itertools function is not shadowed
idx = np.array(list(combinations(range(len(df_items)), i)))
selected_items = df_items.values[idx]
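For reference, a minimal sketch of how the labels dropped by .values could be reattached, assuming a small hypothetical df_items and i = 2 (the names idx and frames are illustrative). The fancy-indexed array has shape (number of combinations, i, number of columns), so each 2-D slice can be wrapped back into a DataFrame with the original column names and row labels. Note that converting to a NumPy array also casts mixed column dtypes to a common type (often object), which selecting with iloc avoids.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Hypothetical example data; replace with your own df_items.
df_items = pd.DataFrame({"A": range(3), "B": range(3)})
i = 2

# Row-position combinations, shape (n_combinations, i).
idx = np.array(list(combinations(range(len(df_items)), i)))

# Fancy indexing gives shape (n_combinations, i, n_columns) but drops labels.
selected_items = df_items.to_numpy()[idx]

# Reattach the column names and the original row labels to each combination.
frames = [
    pd.DataFrame(block, columns=df_items.columns, index=df_items.index[rows])
    for block, rows in zip(selected_items, idx)
]

for f in frames:
    print(f)
```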
asked Feb 7 at 18:24 by A A
1 Answer
If you want an independent DataFrame for each combination of rows, the best approach is to use iloc in a loop:
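from itertools import combinations
for c in combinations(range(len(df_items)), 2):
    # select the rows at positions c, keeping all columns and the original index labels
    print(df_items.iloc[list(c)])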
Example output:
   A  B
0  0  0
1  1  1
   A  B
0  0  0
2  2  2
   A  B
1  1  1
2  2  2
Used input:
import pandas as pd
df_items = pd.DataFrame({'A': range(3),
                         'B': range(3)})
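If the combinations need to be kept rather than printed, one option (a sketch, not part of the original answer) is to collect the iloc slices in a list, or to stack them with pd.concat into a single DataFrame whose outer index level numbers the combination. The names frames, all_combos, "combo" and "row" are illustrative.

```python
from itertools import combinations

import pandas as pd

df_items = pd.DataFrame({"A": range(3), "B": range(3)})
i = 2

# One DataFrame per combination; iloc keeps column names and dtypes.
frames = [df_items.iloc[list(c)] for c in combinations(range(len(df_items)), i)]

# Alternatively, stack them: the outer index level is the combination number,
# the inner level is the original row label.
all_combos = pd.concat(frames, keys=list(range(len(frames))), names=["combo", "row"])
print(all_combos)

# Retrieve a single combination back as a regular DataFrame.
print(all_combos.loc[1])
```

The stacked form is convenient when you want to apply one vectorized operation to every combination at once, for example all_combos.groupby(level="combo").sum().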
You could also use groupby, but this will be less efficient:
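import numpy as np
from itertools import combinations, chain
i = 2
# stack the rows of every combination back to back: (0, 1), (0, 2), (1, 2), ...
tmp = df_items.iloc[list(chain.from_iterable(combinations(range(len(df_items)), i)))]
# each consecutive block of i rows forms one combination
groups = tmp.groupby(np.arange(len(tmp)) // i)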
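A groupby object does nothing until it is iterated or aggregated. A minimal sketch, assuming the same example df_items and i = 2, of pulling the individual combinations out of the grouped frame; each group matches the corresponding iloc selection from the loop version.

```python
from itertools import chain, combinations

import numpy as np
import pandas as pd

df_items = pd.DataFrame({"A": range(3), "B": range(3)})
i = 2

# Concatenate every combination's rows back to back (index labels repeat).
tmp = df_items.iloc[list(chain.from_iterable(combinations(range(len(df_items)), i)))]

# Key 0, 0, 1, 1, 2, 2, ... marks which combination each row belongs to.
groups = [group for _, group in tmp.groupby(np.arange(len(tmp)) // i)]

for n, group in enumerate(groups):
    print(f"combination {n}")
    print(group)
```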
groupby, as this is the shortest way I often use to find the list of all possible combinations. For example, if my df has columns employee and customer_id and I want to find all the combinations of those two factors, I just do df.groupby(['employee', 'customer_id'])['var'].size()
Hope this helps – PTQuoc Commented Feb 7 at 18:27
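The approach in this comment lists the distinct value pairs that actually occur in two columns, together with how often each occurs, which is a different task from forming combinations of rows. A small sketch with hypothetical employee and customer_id data (the 'var' column from the comment is omitted here; calling .size() on the grouped frame counts rows per pair):

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "employee": ["ann", "ann", "bob", "bob", "bob"],
    "customer_id": [1, 1, 1, 2, 2],
})

# One entry per distinct (employee, customer_id) pair, with its number of rows.
counts = df.groupby(["employee", "customer_id"]).size()
print(counts)
```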