I have a dataframe to which I would like to add a column of values taken from an array of tuples. Each tuple holds a pair (position, value). An example:
import pandas as pd
import numpy as np
# Letters 'A' through 'K', one per row
alpha = [chr(i) for i in range(ord('A'), ord('K') + 1)]
# Hourly timestamps, one per letter
dt = pd.date_range(start='2025-01-01', freq='1h', periods=len(alpha))
df = pd.DataFrame(alpha, index=dt)
df.index.name = 'timestamp'
df.columns = ['item']
# (position, value) pairs to place into a new column
c = np.array([(1, 100), (2, 202), (6, 772)])
which gives:
timestamp | item |
---|---|
2025-01-01 00:00:00 | A |
2025-01-01 01:00:00 | B |
2025-01-01 02:00:00 | C |
2025-01-01 03:00:00 | D |
2025-01-01 04:00:00 | E |
2025-01-01 05:00:00 | F |
2025-01-01 06:00:00 | G |
2025-01-01 07:00:00 | H |
2025-01-01 08:00:00 | I |
2025-01-01 09:00:00 | J |
2025-01-01 10:00:00 | K |
I am trying to join c as a new column so that the row at position 1 contains B and 100, and so on for the other tuples.
I have accomplished what I want with the following:
# Temporarily swap the timestamp index for a positional one
df.reset_index(inplace=True)
df.index.name = 'pos'
for x, y in c:
    df.loc[int(x), 'price'] = y
# Restore the timestamp index
df.set_index("timestamp", inplace=True)
This gave me the desired results:
timestamp | item | price |
---|---|---|
2025-01-01 00:00:00 | A | nan |
2025-01-01 01:00:00 | B | 100 |
2025-01-01 02:00:00 | C | 202 |
2025-01-01 03:00:00 | D | nan |
2025-01-01 04:00:00 | E | nan |
2025-01-01 05:00:00 | F | nan |
2025-01-01 06:00:00 | G | 772 |
2025-01-01 07:00:00 | H | nan |
2025-01-01 08:00:00 | I | nan |
2025-01-01 09:00:00 | J | nan |
2025-01-01 10:00:00 | K | nan |
However, dropping and recreating the index for this feels a bit awkward, especially if I have multiple index levels.
My question: is there a better way than dropping and recreating the index to add a column with missing values, by position?
Asked Mar 13 at 1:38 by Mansour
2 Answers
Index.take returns the index labels of your dataframe at the given positions, so we can use the first column of your array to look up the labels and then assign by label.
df.loc[df.index.take(c[:, 0]), 'price'] = c[:, 1]
You can also use a combination of loc and iloc.
df.loc[df.iloc[c[:, 0]].index, 'price'] = c[:, 1]
End result:
item price
2025-01-01 00:00:00 A NaN
2025-01-01 01:00:00 B 100.0
2025-01-01 02:00:00 C 202.0
2025-01-01 03:00:00 D NaN
2025-01-01 04:00:00 E NaN
2025-01-01 05:00:00 F NaN
2025-01-01 06:00:00 G 772.0
2025-01-01 07:00:00 H NaN
2025-01-01 08:00:00 I NaN
2025-01-01 09:00:00 J NaN
2025-01-01 10:00:00 K NaN
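For convenience, here is the Index.take variant as one self-contained script built from the question's example data. It is only a sketch; the names positions, values and labels are introduced here for readability and are not part of the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
alpha = [chr(i) for i in range(ord('A'), ord('K') + 1)]
dt = pd.date_range(start='2025-01-01', freq='1h', periods=len(alpha))
df = pd.DataFrame({'item': alpha}, index=dt)
df.index.name = 'timestamp'
c = np.array([(1, 100), (2, 202), (6, 772)])

# Split the (position, value) pairs into two arrays.
positions, values = c[:, 0], c[:, 1]

# Translate row positions into index labels, then assign by label.
# Rows that are not listed keep NaN in the new column.
labels = df.index.take(positions)
df.loc[labels, 'price'] = values

print(df)
```

Because the assignment goes through labels, this presumably behaves as intended only when the index labels are unique; with duplicated timestamps a label could match more than one row.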
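You can also do it by creating a key from the hour in the dataframe index and merging with a dataframe created from the list of tuples. Note that this relies on the hour of each timestamp matching its row position, which holds for this example.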
df.reset_index().assign(key=df.index.hour).merge(pd.DataFrame(c, columns=['key', 'price']), how='left')
Output:
timestamp item key price
0 2025-01-01 00:00:00 A 0 NaN
1 2025-01-01 01:00:00 B 1 100.0
2 2025-01-01 02:00:00 C 2 202.0
3 2025-01-01 03:00:00 D 3 NaN
4 2025-01-01 04:00:00 E 4 NaN
5 2025-01-01 05:00:00 F 5 NaN
6 2025-01-01 06:00:00 G 6 772.0
7 2025-01-01 07:00:00 H 7 NaN
8 2025-01-01 08:00:00 I 8 NaN
9 2025-01-01 09:00:00 J 9 NaN
10 2025-01-01 10:00:00 K 10 NaN
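If the timestamps do not happen to line up with row positions, the same merge idea can be keyed on an explicit position column instead of the hour. The following is a sketch under that assumption; the column name pos and the variables right and out are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
alpha = [chr(i) for i in range(ord('A'), ord('K') + 1)]
dt = pd.date_range(start='2025-01-01', freq='1h', periods=len(alpha))
df = pd.DataFrame({'item': alpha}, index=dt)
df.index.name = 'timestamp'
c = np.array([(1, 100), (2, 202), (6, 772)])

# Right-hand table: one row per (position, value) pair.
right = pd.DataFrame(c, columns=['pos', 'price'])

out = (
    df.reset_index()
      .assign(pos=np.arange(len(df)))     # explicit 0-based row position
      .merge(right, on='pos', how='left') # unmatched rows get NaN
      .drop(columns='pos')                # the helper key is no longer needed
      .set_index('timestamp')             # restore the original index
)

print(out)
```

This still resets and restores the index inside the chain, so it mainly helps when a merge is wanted for other reasons.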
Or even shorter, mapping the hour of each timestamp through a Series keyed by position:
df.assign(price=df.index.hour.map(pd.Series(c[:,1], index=c[:,0])))
Output:
item price
timestamp
2025-01-01 00:00:00 A NaN
2025-01-01 01:00:00 B 100.0
2025-01-01 02:00:00 C 202.0
2025-01-01 03:00:00 D NaN
2025-01-01 04:00:00 E NaN
2025-01-01 05:00:00 F NaN
2025-01-01 06:00:00 G 772.0
2025-01-01 07:00:00 H NaN
2025-01-01 08:00:00 I NaN
2025-01-01 09:00:00 J NaN
2025-01-01 10:00:00 K NaN
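Since the question asks specifically about position, another option is to skip labels entirely: build a NaN-filled NumPy array, fill it by position, and assign it as a column. This is a minimal sketch, not taken from either answer; the variable price is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
alpha = [chr(i) for i in range(ord('A'), ord('K') + 1)]
dt = pd.date_range(start='2025-01-01', freq='1h', periods=len(alpha))
df = pd.DataFrame({'item': alpha}, index=dt)
df.index.name = 'timestamp'
c = np.array([(1, 100), (2, 202), (6, 772)])

# One float slot per row, all missing to begin with.
price = np.full(len(df), np.nan)

# Fill the slots named by the first column of c with the second column.
price[c[:, 0]] = c[:, 1]

# A plain NumPy array is assigned row by row in order, without label
# alignment, so the index is never touched.
df['price'] = price

print(df)
```

Because no label lookup is involved, this approach does not depend on the index being unique, hourly, or single-level.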