I'm trying to calculate a 2D variable z = x + y, where x and y are 1D arrays of unequal length (say, x- and y-coordinate points on a spatial grid). I'd like to display the result row by row, with the values of x and y in the first two columns and the corresponding value of z, calculated from those x and y values, in the third. For example, for x = [1, 2] and y = [3, 4, 5]:
x y z
1 3 4
1 4 5
1 5 6
2 3 5
2 4 6
2 5 7
The code below works (using lists here, but will probably need numpy arrays later):
import pandas as pd
x = [1, 2]
y = [3, 4, 5]
col1 = []
col2 = []
z = []
for i in range(len(x)):
    for j in range(len(y)):
        col1.append(x[i])
        col2.append(y[j])
        z.append(x[i] + y[j])
df = pd.DataFrame(zip(col1, col2, z), columns=["x", "y", "z"])
print(df)
Is there a better way to do this without the loop, using some combination of meshgrid, indices, flatten, v/hstack, and reshape? x and y will typically have around 100 elements each.
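As a minimal sketch of how the functions named in the question fit together, assuming x and y are 1-D NumPy arrays: np.indices builds row and column index grids, fancy indexing turns them into coordinate grids, and ravel (which returns a view where possible, whereas flatten always copies) plus np.vstack produce the three columns.

```python
import numpy as np

# Example inputs from the question
x = np.array([1, 2])
y = np.array([3, 4, 5])

# np.indices((2, 3)) returns index grids i and j, each of shape (2, 3):
# i[a, b] == a and j[a, b] == b
i, j = np.indices((len(x), len(y)))

# Look up coordinate values through the index grids, then flatten row-major
xs = x[i].ravel()   # [1 1 1 2 2 2]
ys = y[j].ravel()   # [3 4 5 3 4 5]
zs = xs + ys

# Stack the three flat columns vertically, then transpose to one row per (x, y) pair
table = np.vstack((xs, ys, zs)).T
print(table)
```

The row order matches the question's nested loop: x stays fixed while y cycles.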
Asked Mar 17 at 6:25 by Schat17.
Comment from Julien (Mar 17 at 6:31): Yes there is; read the docs and try? See "How to Ask".
4 Answers
Here is one way:
import numpy as np
import pandas as pd
x = np.asarray([1, 2])[:, np.newaxis]
y = np.asarray([3, 4, 5])
x, y = np.broadcast_arrays(x, y)
z = x + y
df = pd.DataFrame(zip(x.ravel(), y.ravel(), z.ravel()), columns=["x", "y", "z"])
print(df)
# x y z
# 0 1 3 4
# 1 1 4 5
# 2 1 5 6
# 3 2 3 5
# 4 2 4 6
# 5 2 5 7
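To see what the broadcasting step does, the sketch below prints the broadcast shapes and checks the sum against np.add.outer, a NumPy ufunc method that computes the same 2-D table of x[i] + y[j] directly from the 1-D inputs. np.add.outer is offered here as an alternative, not as part of the answer above.

```python
import numpy as np

x = np.asarray([1, 2])[:, np.newaxis]   # shape (2, 1)
y = np.asarray([3, 4, 5])               # shape (3,)

# Broadcasting (2, 1) against (3,) yields two arrays of shape (2, 3)
xb, yb = np.broadcast_arrays(x, y)
print(xb.shape, yb.shape)   # (2, 3) (2, 3)

# The same z computed with the ufunc's outer method on the 1-D inputs
z_outer = np.add.outer(x.ravel(), y)
print(z_outer)
# [[4 5 6]
#  [5 6 7]]

# Both routes give identical results
assert (z_outer == xb + yb).all()
```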
But yes, you can also use meshgrid instead of orthogonal arrays plus explicit broadcasting, and you can use NumPy alone instead of pandas:
x = np.asarray([1, 2])
y = np.asarray([3, 4, 5])
x, y = np.meshgrid(x, y, indexing='ij')
z = x + y
print(np.stack((x.ravel(), y.ravel(), z.ravel())).T)
# [[1 3 4]
#  [1 4 5]
#  [1 5 6]
#  [2 3 5]
#  [2 4 6]
#  [2 5 7]]
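The indexing='ij' argument is what gives the question's row order. A short sketch comparing it with meshgrid's default indexing='xy':

```python
import numpy as np

x = np.asarray([1, 2])
y = np.asarray([3, 4, 5])

# indexing='ij': grids have shape (len(x), len(y)), so x varies along rows
xi, yi = np.meshgrid(x, y, indexing='ij')
print(xi.shape)   # (2, 3)

# indexing='xy' (the default): grids have shape (len(y), len(x))
xx, yx = np.meshgrid(x, y)
print(xx.shape)   # (3, 2)

# Raveling the 'ij' grids keeps x fixed while y cycles (the question's order);
# raveling the 'xy' grids keeps y fixed while x cycles instead
ij_pairs = np.column_stack((xi.ravel(), yi.ravel())).tolist()
xy_pairs = np.column_stack((xx.ravel(), yx.ravel())).tolist()
print(ij_pairs)   # [[1, 3], [1, 4], [1, 5], [2, 3], [2, 4], [2, 5]]
print(xy_pairs)   # [[1, 3], [2, 3], [1, 4], [2, 4], [1, 5], [2, 5]]
```

Both orderings contain the same pairs; only the row order differs.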
Not as efficient as low-level NumPy broadcasting, but you could use a cross merge:
x = [1, 2]
y = [3, 4, 5]
df = (pd.DataFrame({'x': x})
.merge(pd.DataFrame({'y': y}), how='cross')
.eval('z = x+y') # or .assign(z=lambda d: d['x']+d['y'])
)
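merge(how='cross') requires pandas 1.2 or later. On older versions, a commonly used workaround is an inner merge on a temporary constant column. A sketch of that, where the column name _key is an arbitrary, hypothetical choice:

```python
import pandas as pd

x = [1, 2]
y = [3, 4, 5]

# Give both frames the same constant in a temporary '_key' column;
# an inner merge on it pairs every left row with every right row
left = pd.DataFrame({'x': x, '_key': 0})
right = pd.DataFrame({'y': y, '_key': 0})

df = (left.merge(right, on='_key')
          .drop(columns='_key')                 # remove the helper column
          .assign(z=lambda d: d['x'] + d['y']))
print(df)
```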
Alternative with MultiIndex.from_product, if you have many combinations of arrays/lists:
df = (pd.MultiIndex.from_product([x, y], names=['x', 'y'])
.to_frame(index=False)
.eval('z = x+y')
)
# or with itertools.product from the standard library
from itertools import product
df = (pd.DataFrame(product(x, y), columns=['x', 'y'])
      .eval('z = x+y')
      )
Output:
x y z
0 1 3 4
1 1 4 5
2 1 5 6
3 2 3 5
4 2 4 6
5 2 5 7
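The from_product variant extends to more than two inputs by adding lists to the product. A sketch with a hypothetical third array w, used only for illustration:

```python
import pandas as pd

x = [1, 2]
y = [3, 4, 5]
w = [10, 20]   # hypothetical third coordinate, for illustration only

# from_product enumerates every (x, y, w) combination, last level varying fastest
df = (pd.MultiIndex.from_product([x, y, w], names=['x', 'y', 'w'])
        .to_frame(index=False)
        .eval('z = x + y + w'))
print(len(df))   # 12 rows = 2 * 3 * 2
print(df.head(3))
```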
Here's one approach:
df = (
pd.DataFrame(
np.array(np.meshgrid(x, y)).T.reshape(-1, 2),
columns=['x', 'y']
)
.assign(z=lambda df: df.sum(axis=1))
)
Output:
x y z
0 1 3 4
1 1 4 5
2 1 5 6
3 2 3 5
4 2 4 6
5 2 5 7
Explanation / Intermediate
- Use np.meshgrid with the default indexing='xy' (Cartesian).
- Pass the result to np.array + np.transpose (.T) + ndarray.reshape:
np.array(np.meshgrid(x, y)).T.reshape(-1, 2)
array([[1, 3],
       [1, 4],
       [1, 5],
       [2, 3],
       [2, 4],
       [2, 5]])
- Use it inside pd.DataFrame and add df.sum on axis=1 via df.assign.
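The shapes involved in the transpose step can be checked directly. In this sketch the second construction, using indexing='ij' with np.stack on the last axis, is shown as an equivalent alternative rather than as part of the answer:

```python
import numpy as np

x = [1, 2]
y = [3, 4, 5]

grids = np.array(np.meshgrid(x, y))   # default indexing='xy'
print(grids.shape)     # (2, 3, 2): (coordinate, len(y), len(x))
print(grids.T.shape)   # (2, 3, 2): .T reverses axes to (len(x), len(y), coordinate)

pairs = grids.T.reshape(-1, 2)

# Equivalent without the transpose: 'ij' indexing, coordinates stacked last
alt = np.stack(np.meshgrid(x, y, indexing='ij'), axis=-1).reshape(-1, 2)
print((pairs == alt).all())   # True
```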
Use einops (a third-party package, installable with pip install einops):
import numpy as np
import pandas as pd
from einops import repeat
# shape (2,)
x = np.array([1, 2])
# shape (3,)
y = np.array([3, 4, 5])
xGrid = repeat(x, 'i -> i j', j=len(y))
'''
[[1 1 1]
 [2 2 2]]
'''
yGrid = repeat(y, 'j -> i j', i=len(x))
'''
[[3 4 5]
 [3 4 5]]
'''
zGrid = xGrid + yGrid
res = np.stack([xGrid.ravel(), yGrid.ravel(), zGrid.ravel()], axis=1)
print(res)
df = pd.DataFrame(res, columns=['x', 'y', 'z'])
print(df)
'''
x y z
0 1 3 4
1 1 4 5
2 1 5 6
3 2 3 5
4 2 4 6
5 2 5 7
'''
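If adding einops as a dependency is not desirable, the same flattened columns can be produced with NumPy's own np.repeat and np.tile. A sketch:

```python
import numpy as np
import pandas as pd

x = np.array([1, 2])
y = np.array([3, 4, 5])

# Flat equivalents of xGrid.ravel() and yGrid.ravel():
# np.repeat repeats each x value len(y) times; np.tile repeats all of y len(x) times
xs = np.repeat(x, len(y))   # [1 1 1 2 2 2]
ys = np.tile(y, len(x))     # [3 4 5 3 4 5]

df = pd.DataFrame({'x': xs, 'y': ys, 'z': xs + ys})
print(df)
```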