I have a dataframe with a column multi-index, df1
, with a datetime index and 2 levels: level 0, called Capitals, has columns A, B, C, and level 1, called Smalls, has columns a, b, c, d, e.
Capitals | A | B | C | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Smalls | a | b | c | d | e | a | b | c | d | e | a | b | c | d | e |
Date | |||||||||||||||
01-01-25 | |||||||||||||||
01-02-25 | |||||||||||||||
01-03-25 | |||||||||||||||
01-04-25 |
I have a dataframe with a column multi-index, df1
, with a datetime index and 2 levels: level 0, called Capitals, has columns A, B, C, and level 1, called Smalls, has columns a, b, c, d, e.
Capitals | A | B | C | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Smalls | a | b | c | d | e | a | b | c | d | e | a | b | c | d | e |
Date | |||||||||||||||
01-01-25 | |||||||||||||||
01-02-25 | |||||||||||||||
01-03-25 | |||||||||||||||
01-04-25 |
I have a second dataframe, df2
, with the same datetime index and three columns, X, Y and Z.
X | Y | Z | |
---|---|---|---|
Date | |||
01-01-25 | |||
01-02-25 | |||
01-03-25 | |||
01-04-25 |
Is there a way to:
i) multiply B of df1 by Z of df2 (Ba * Z, Bb * Z, Bc * Z, Bd * Z Be * Z) and
ii) add the 5 new (Smalls: a, b, c, d ,e) columns to a new Capitals column called D in df1
?
Capitals | A | B | C | D | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Smalls | a | b | c | d | e | a | b | c | d | e | a | b | c | d | e | a | b | c | d | e |
Date | ||||||||||||||||||||
01-01-25 | ||||||||||||||||||||
01-02-25 | ||||||||||||||||||||
01-03-25 | ||||||||||||||||||||
01-04-25 |
The method I'm using first creates an empty multi-index data frame with a similar structure to df1, with the new columns I want to add, which is connected to the original multi-index data frame.
Then it iterates through the level 1 values of B, multiplying by the value of Z in the second df.
# Extract level 1 tickers from df1.columns
smalls = df1.columns.get_level_values(1).unique()
# Create new MultiIndex for the empty columns
new_columns = pd.MultiIndex.from_product(['D', smalls],names=df1.columns.names)
# Create an empty DataFrame with the new columns
empty_df = pd.DataFrame(0, index=df1.index, columns=new_columns)
# Concatenate with the original DataFrame
df1 = pd.concat([df1, empty_df], axis=1)
# Multiply dfs and populate D
for small in smalls:
df1[('D', small)] = df1[('B', small)] / df2['Z']
Is there a more streamlined way to do this, using vectors rather than iterating?
Share Improve this question edited Mar 25 at 1:56 Shaido 28.4k25 gold badges75 silver badges81 bronze badges asked Mar 25 at 1:32 AndysPythonStuffAndysPythonStuff 1671 silver badge10 bronze badges2 Answers
Reset to default 1You can create the new D columns by multiplying the B columns in df1
by the Z column in df2
then concatinating them by concat
. The column name can be renamed using MultiIndex.from_product
.
B_columns = df1.loc[:, 'B']
D_columns = B_columns.mul(df2['Z'], axis=0)
# Rename the top level from 'B' to 'D'
smalls = df1.columns.get_level_values(1).unique()
D_columns.columns = pd.MultiIndex.from_product([['D'], smalls], names=df1.columns.names)
df = pd.concat([df1, D_columns], axis=1)
Using two testing dataframes, df1
:
Capitals A B C
Smalls a b c d e a b c d e a b c d e
Date
01-01-25 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
01-02-25 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
01-03-25 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
01-04-25 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
and df2
:
X Y Z
Date
01-01-25 10 11 12
01-02-25 10 11 12
01-03-25 10 11 12
01-04-25 10 11 12
then above code gives:
Capitals A B C D
Smalls a b c d e a b c d e a b c d e a b c d e
Date
01-01-25 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 12 24 36 48 60
01-02-25 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 12 24 36 48 60
01-03-25 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 12 24 36 48 60
01-04-25 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 12 24 36 48 60
You can use concat
for add D
level to B
and another concat
or DataFrame.join
for append to original:
np.random.seed(2)
mux = pd.MultiIndex.from_product([list('ABC'), list('abcde')])
df1 = pd.DataFrame(np.random.randint(10, size=(4, 15)),
index=pd.date_range('01-01-25','01-04-25'), columns=mux)
print (df1)
A B C
a b c d e a b c d e a b c d e
2025-01-01 8 8 6 2 8 7 2 1 5 4 4 5 7 3 6
2025-01-02 4 3 7 6 1 3 5 8 4 6 3 9 2 0 4
2025-01-03 2 4 1 7 8 2 9 8 7 1 6 8 5 9 9
2025-01-04 9 3 0 0 2 8 8 2 9 6 5 6 6 6 3
df2 = pd.DataFrame(np.random.randint(10, size=(4, 3)),
index=pd.date_range('01-01-25','01-04-25'),
columns=list('XYZ'))
print (df2)
X Y Z
2025-01-01 8 2 1
2025-01-02 4 8 1
2025-01-03 6 9 5
2025-01-04 1 2 4
df3 = pd.concat({'D': df1.loc[:, 'B'].mul(df2['X'], axis=0)}, axis=1)
df = pd.concat([df1, df3], axis=1)
Or:
df3 = pd.concat({'D': df1.loc[:, 'B'].mul(df2['X'], axis=0)}, axis=1)
df = df1.join(df3)
print (df)
A B C D
a b c d e a b c d e a b c d e a b c d e
2025-01-01 8 8 6 2 8 7 2 1 5 4 4 5 7 3 6 7 2 1 5 4
2025-01-02 4 3 7 6 1 3 5 8 4 6 3 9 2 0 4 3 5 8 4 6
2025-01-03 2 4 1 7 8 2 9 8 7 1 6 8 5 9 9 10 45 40 35 5
2025-01-04 9 3 0 0 2 8 8 2 9 6 5 6 6 6 3 32 32 8 36 24