dataframe - How to select rows based on combinations while preserving column names in pandas?

I have a DataFrame df_items and want to create combinations of its rows of size i using itertools.combinations. Each combination should keep all columns from the original DataFrame.

Current approach (works, but loses the column names):

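import numpy as np
from itertools import combinations
# Row positions for every combination of size i (named idx so the imported function is not shadowed)
idx = np.array(list(combinations(range(len(df_items)), i)))
selected_items = df_items.values[idx]  # plain NumPy array, so the column names are lost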

Asked Feb 7 at 18:24 by A A

A few lines of sample code would make this easier to visualize. Anyway, try groupby, as this is the shortest way I often use to list all possible combinations. For example, if my df has columns employee and customer_id and I want every combination of those two factors, I just run df.groupby(['employee', 'customer_id'])['var'].size(). Hope this helps – PTQuoc Commented Feb 7 at 18:27
Please add a minimal reproducible example together with the exact desired output based on the small sample to be provided. – ouroboros1 Commented Feb 7 at 18:30
please provide samples for context – Fred Alisson Commented Feb 7 at 18:30

1 Answer

If you want an independent DataFrame for each combination of rows, the simplest approach is to use iloc in a loop:

from itertools import combinations

for c in combinations(range(len(df_items)), 2):
    print(df_items.iloc[list(c)])

Example output:

   A  B
0  0  0
1  1  1
   A  B
0  0  0
2  2  2
   A  B
1  1  1
2  2  2

Used input:

import pandas as pd

df_items = pd.DataFrame({'A': range(3),
                         'B': range(3)})
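
If you need to keep the sub-DataFrames rather than just print them, one option is to collect them in a dict keyed by the tuple of row positions. This is only a sketch built on the sample input above; the names subsets, positions and sub are my own and not part of pandas:

```python
from itertools import combinations

import pandas as pd

df_items = pd.DataFrame({'A': range(3), 'B': range(3)})
i = 2  # size of each combination

# Map each tuple of row positions to the sub-DataFrame holding those rows.
# iloc selects by position and keeps the original column names and index labels.
subsets = {
    c: df_items.iloc[list(c)]
    for c in combinations(range(len(df_items)), i)
}

for positions, sub in subsets.items():
    print(positions, list(sub.columns), sub.index.tolist())
```

Each value is an ordinary DataFrame, so subsets[(0, 2)]['A'] works as usual. Keep in mind that the number of combinations grows quickly with len(df_items), so this is only practical for small inputs.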

You could also use groupby, but this will be less efficient:

from itertools import combinations, chain
import numpy as np

i = 2

tmp = df_items.iloc[list(chain.from_iterable(combinations(range(len(df_items)), i)))]

tmp.groupby(np.arange(len(tmp))//i)
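
The groupby call on its own only returns a GroupBy object, so you still have to iterate over it to get the pieces. A minimal self-contained sketch, assuming the same sample input as above (the variable names positions and groups are just for illustration):

```python
from itertools import chain, combinations

import numpy as np
import pandas as pd

df_items = pd.DataFrame({'A': range(3), 'B': range(3)})
i = 2  # size of each combination

# Flatten all combinations into one long list of row positions and select once.
positions = list(chain.from_iterable(combinations(range(len(df_items)), i)))
tmp = df_items.iloc[positions]

# Every consecutive block of i rows in tmp is one combination, so integer
# division of the row counter by i gives one group label per combination.
groups = [g for _, g in tmp.groupby(np.arange(len(tmp)) // i)]

for g in groups:
    print(g)
```

Each element of groups is a DataFrame with the original columns. The intermediate tmp holds every selected row at once, which is part of why this tends to be heavier than the plain iloc loop.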
