python - pandas apply multiple columns - Stack Overflow

Starting from this DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(3*4).reshape((4, 3)),
    index=['a', 'b', 'c', 'd'],
    columns=['A', 'B', 'C']
)
print(df)
   A   B   C
a  0   1   2
b  3   4   5
c  6   7   8
d  9  10  11

I want to apply two functions to each column so that every original column yields two new columns, nested under the original column name in a MultiIndex:

    A        B        C     
    x    y   x    y   x    y
a  10  100  11  101  12  102
b  13  103  14  104  15  105
c  16  106  17  107  18  108
d  19  109  20  110  21  111

However, something like this doesn't work:

df.apply(lambda series:
    series.transform([lambda x: x+10, lambda x: x+100])
)

It raises ValueError: If using all scalar values, you must pass an index

Note that I do not want to use agg like in this answer, since this is not an aggregation. I also want to avoid referring to column names directly.

asked Nov 20, 2024 at 19:56 by goweon, edited Nov 20, 2024 at 20:10 by wjandrea

2 Answers

You just need to use df.transform() and give your functions names; the function names become the inner level of the column MultiIndex.

def x(k):
    return k + 10

def y(k):
    return k + 100

df.transform([x, y])

Output:

    A        B        C     
    x    y   x    y   x    y
a  10  100  11  101  12  102
b  13  103  14  104  15  105
c  16  106  17  107  18  108
d  19  109  20  110  21  111
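
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The functions x and y are the ones from the answer; the names result and x_only, and the xs() lookup at the end, are additions for illustration and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame(
    np.arange(3 * 4).reshape((4, 3)),
    index=['a', 'b', 'c', 'd'],
    columns=['A', 'B', 'C'],
)

def x(k):
    return k + 10

def y(k):
    return k + 100

# Each function is applied to every column; no column name is spelled out.
result = df.transform([x, y])

# Outer level: original column names. Inner level: function names.
print(result.columns.tolist())

# Illustration only: select every 'x' sub-column, dropping the inner level.
x_only = result.xs('x', axis=1, level=1)
print(x_only)
```

Because the inner labels are taken from the function names, renaming a function is enough to rename its sub-column.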

SOLUTION 1

A possible solution, whose steps are:

  • First, it creates two new dataframes: one that adds 10 to each element, and another one that adds 100 to each element.

  • Then, it concatenates these dataframes along the columns using pd.concat with axis=1 and assigns keys ['x', 'y'] to create a hierarchical column index.

  • The method swaplevel is applied to swap the levels of the column MultiIndex, followed by sort_index to sort the columns.

(pd.concat([df + 10, df + 100], axis=1, keys=['x', 'y'])
 .swaplevel(axis=1).sort_index(axis=1))
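
To see what each step of Solution 1 does to the column index, the chain can be split up as in the sketch below. The names step1, step2, step3 and ordered are hypothetical and only serve the illustration; the reindex variant at the end is an alternative suggested here, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame(
    np.arange(3 * 4).reshape((4, 3)),
    index=['a', 'b', 'c', 'd'],
    columns=['A', 'B', 'C'],
)

# Step 1: concatenate side by side; the keys become the OUTER column level.
step1 = pd.concat([df + 10, df + 100], axis=1, keys=['x', 'y'])
print(step1.columns.tolist())
# [('x', 'A'), ('x', 'B'), ('x', 'C'), ('y', 'A'), ('y', 'B'), ('y', 'C')]

# Step 2: swap the two levels; the columns stay in the same positions.
step2 = step1.swaplevel(axis=1)
print(step2.columns.tolist())
# [('A', 'x'), ('B', 'x'), ('C', 'x'), ('A', 'y'), ('B', 'y'), ('C', 'y')]

# Step 3: sort the column labels so x and y sit together under each letter.
step3 = step2.sort_index(axis=1)
print(step3.columns.tolist())
# [('A', 'x'), ('A', 'y'), ('B', 'x'), ('B', 'y'), ('C', 'x'), ('C', 'y')]

# Alternative to step 3 (an assumption of this sketch): reindex against an
# explicit product, which keeps the original column order even when the
# labels are not already sorted.
ordered = step2.reindex(columns=pd.MultiIndex.from_product([df.columns, ['x', 'y']]))
```

Note that sort_index orders the labels lexicographically. Here that reproduces the original column order only because A, B, C and x, y are already sorted.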

SOLUTION 2

Another possible solution, whose steps are:

  • It first creates two new dataframes: one where 10 is added to each element (df + 10) and another where 100 is added (df + 100).

  • These two dataframes are combined into a 3D numpy array using stack with axis=2, resulting in an array where the third dimension stacks the two transformations.

  • The array is then reshaped into a two-dimensional array with the same number of rows as df.

  • A new dataframe is created from this reshaped array, with columns assigned a hierarchical index using pd.MultiIndex.from_product.

pd.DataFrame(
    np.stack([df + 10, df + 100], axis=2).reshape(df.shape[0], -1),
    index=df.index,  # keep the original row labels
    columns=pd.MultiIndex.from_product([df.columns, ['x', 'y']]))

Output:

    A        B        C     
    x    y   x    y   x    y
a  10  100  11  101  12  102
b  13  103  14  104  15  105
c  16  106  17  107  18  108
d  19  109  20  110  21  111
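
The reshaping in Solution 2 is easier to follow when the intermediate array shapes are printed. The sketch below is one way to trace it; the names stacked, flat, columns and result are hypothetical, and it calls to_numpy() explicitly and passes index=df.index so that the row labels a to d are kept, which are choices made for this illustration.

```python
import numpy as np
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame(
    np.arange(3 * 4).reshape((4, 3)),
    index=['a', 'b', 'c', 'd'],
    columns=['A', 'B', 'C'],
)

# Stack the two transformed frames along a new last axis:
# axis 0 = rows, axis 1 = original columns, axis 2 = which function.
stacked = np.stack([(df + 10).to_numpy(), (df + 100).to_numpy()], axis=2)
print(stacked.shape)  # (4, 3, 2)

# Flatten the last two axes. In C order the function axis varies fastest,
# so each row reads A_x, A_y, B_x, B_y, C_x, C_y.
flat = stacked.reshape(df.shape[0], -1)
print(flat.shape)  # (4, 6)

# from_product yields the pairs in the same order as the flattened columns.
columns = pd.MultiIndex.from_product([df.columns, ['x', 'y']])
result = pd.DataFrame(flat, index=df.index, columns=columns)
print(result)
```

The column order works out because from_product iterates its second argument fastest, matching the layout produced by the reshape.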