dataframe - How to select rows based on combinations while preserving column names in pandas? - Stack Overflow

I have a DataFrame df_items and want to create combinations of its rows of size i using itertools.combinations. Each combination should maintain all columns from the original DataFrame.

Current approach: works but loses column names

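import numpy as np
from itertools import combinations
# Index array of shape (n_combinations, i); named combo_idx so it does not shadow itertools.combinations
combo_idx = np.array(list(combinations(range(len(df_items)), i)))
# Plain ndarray of shape (n_combinations, i, n_columns), so the column names are lost here
selected_items = df_items.values[combo_idx]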
asked Feb 7 at 18:24 by A A (new contributor)
  • A few lines of sample code would make this easier to visualize. In any case, consider groupby, which is the shortest way I know to list all possible combinations. For example, if my df has columns employee and customer_id and I want every combination of those two factors, I just run df.groupby(['employee', 'customer_id'])['var'].size(). Hope this helps. – PTQuoc Commented Feb 7 at 18:27
  • Please add a minimal reproducible example together with the exact desired output based on the small sample to be provided. – ouroboros1 Commented Feb 7 at 18:30
  • Please provide samples for context. – Fred Alisson Commented Feb 7 at 18:30
1 Answer

If you want an independent DataFrame for each combination of rows, the simplest approach is to use iloc in a loop. Each result is a regular DataFrame, so the column names are preserved:

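from itertools import combinations

# c is a tuple of row positions; iloc needs a list to select several rows
for c in combinations(range(len(df_items)), 2):
    print(df_items.iloc[list(c)])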

Example output:

   A  B
0  0  0
1  1  1
   A  B
0  0  0
2  2  2
   A  B
1  1  1
2  2  2

Used input:

df_items = pd.DataFrame({'A': range(3),
                         'B': range(3)})
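
If a single object is more convenient than printing inside a loop, one possible variation (a sketch, not part of the original answer) is to stack the per-combination frames with pd.concat and label each one through keys. The level names 'combo' and 'row' and the variable names combos and stacked are assumptions chosen for illustration.

```python
from itertools import combinations

import pandas as pd

df_items = pd.DataFrame({'A': range(3), 'B': range(3)})
i = 2  # size of each combination

# Row positions for every combination, e.g. (0, 1), (0, 2), (1, 2)
combos = list(combinations(range(len(df_items)), i))

# One sub-frame per combination, stacked under an outer index level.
# keys= labels each sub-frame; names= labels the two index levels.
stacked = pd.concat(
    [df_items.iloc[list(c)] for c in combos],
    keys=list(range(len(combos))),
    names=['combo', 'row'],
)

print(stacked)
# The columns are still A and B; stacked.loc[k] returns the k-th combination.
```

The trade-off is memory: every row is copied once per combination it appears in, so this only suits small inputs or small i.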

You could also use groupby, but this will be less efficient:

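from itertools import combinations, chain
import numpy as np

i = 2

# Flatten all combinations into one list of row positions and select them at once
tmp = df_items.iloc[list(chain.from_iterable(combinations(range(len(df_items)), i)))]

# Every consecutive block of i rows is one combination
tmp.groupby(np.arange(len(tmp))//i)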
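
The groupby call above only builds a lazy GroupBy object. As a minimal sketch of how it might be consumed (the names flat_idx and groups are assumptions for illustration), iterating over it yields one DataFrame per combination, each with the original columns:

```python
from itertools import chain, combinations

import numpy as np
import pandas as pd

df_items = pd.DataFrame({'A': range(3), 'B': range(3)})
i = 2  # size of each combination

# Flattened row positions: [0, 1, 0, 2, 1, 2] for three rows and i = 2
flat_idx = list(chain.from_iterable(combinations(range(len(df_items)), i)))
tmp = df_items.iloc[flat_idx]

# Integer division assigns the same group label to each block of i rows
groups = {k: g for k, g in tmp.groupby(np.arange(len(tmp)) // i)}

for k, g in groups.items():
    print(f"combination {k}")
    print(g)
```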