python - How to list a 2d array in a tabular form along with two 1d arrays from which it was generated?

I'm trying to calculate a 2d variable z = x + y where x and y are 1d arrays of unequal length (say, x- and y-coordinate points on a spatial grid). I'd like to display the result row by row, with the values of x and y in the first two columns and the corresponding value of z, calculated from those x and y values, in the third. Something like the following for x = [1, 2] and y = [3, 4, 5]:

x  y  z
1  3  4
1  4  5
1  5  6
2  3  5
2  4  6
2  5  7

The code below works (using lists here, but will probably need numpy arrays later):

import pandas as pd

x = [1, 2]
y = [3, 4, 5]
col1 = []
col2 = []
z = []
for i in range(len(x)):
    for j in range(len(y)):
        col1.append(x[i])
        col2.append(y[j])
        z.append(x[i]+y[j])

df = pd.DataFrame(zip(col1, col2, z), columns=["x", "y", "z"])
print(df)

Just wondering, is there a better way of doing this without using the loop by some combination of meshgrid, indices, flatten, v/hstack, and reshape? The size of x and y will typically be around 100.

asked Mar 17 at 6:25 by Schat17

Yes there is, read the docs and try? See How to Ask. – Julien, commented Mar 17 at 6:31

4 Answers

Here is one way:

import numpy as np
import pandas as pd
x = np.asarray([1, 2])[:, np.newaxis]
y = np.asarray([3, 4, 5])
x, y = np.broadcast_arrays(x, y)
z = x + y
df = pd.DataFrame(zip(x.ravel(), y.ravel(), z.ravel()), columns=["x", "y", "z"])
print(df)
#    x  y  z
# 0  1  3  4
# 1  1  4  5
# 2  1  5  6
# 3  2  3  5
# 4  2  4  6
# 5  2  5  7

But yes, you can also use meshgrid instead of orthogonal arrays + explicit broadcasting. You can also use NumPy instead of Pandas.

x = np.asarray([1, 2])
y = np.asarray([3, 4, 5])
x, y = np.meshgrid(x, y, indexing='ij')
z = x + y
print(np.stack((x.ravel(), y.ravel(), z.ravel())).T)
# [[1 3 4]
#  [1 4 5]
#  [1 5 6]
#  [2 3 5]
#  [2 4 6]
#  [2 5 7]]
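
Another loop-free possibility, sketched here rather than taken from the answer above, is to build the two coordinate columns with np.repeat and np.tile and the sums with np.add.outer. The helper name tabulate_sum is hypothetical and only used for illustration; it assumes x and y are 1d and that the rows should come out in the same order as the nested loop in the question.

```python
import numpy as np

def tabulate_sum(x, y):
    """Return a (len(x) * len(y), 3) array with rows (x_i, y_j, x_i + y_j).

    Hypothetical helper for illustration; assumes 1d inputs.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    col_x = np.repeat(x, y.size)         # each x value repeated len(y) times
    col_y = np.tile(y, x.size)           # the whole y array repeated len(x) times
    col_z = np.add.outer(x, y).ravel()   # z[i, j] = x[i] + y[j], flattened row by row
    return np.column_stack((col_x, col_y, col_z))

table = tabulate_sum([1, 2], [3, 4, 5])
print(table)
```

For inputs of around 100 points each, as mentioned in the question, this yields a 10000 x 3 array, which can be passed to pd.DataFrame with columns=["x", "y", "z"] if a labelled table is wanted.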

Not as efficient as low level numpy broadcasting, but you could use a cross-merge:

import pandas as pd

x = [1, 2]
y = [3, 4, 5]

df = (pd.DataFrame({'x': x})
        .merge(pd.DataFrame({'y': y}), how='cross')
        .eval('z = x+y') # or .assign(z=lambda d: d['x']+d['y'])
     )

Alternative with MultiIndex.from_product if you have many combinations of arrays/lists:

df = (pd.MultiIndex.from_product([x, y], names=['x', 'y'])
        .to_frame(index=False)
        .eval('z = x+y')
     )

# or in pure Python, with itertools.product
from itertools import product

df = (pd.DataFrame(product(x, y), columns=['x', 'y'])
        .eval('z = x+y')
     )

Output (the same for each variant above):

   x  y  z
0  1  3  4
1  1  4  5
2  1  5  6
3  2  3  5
4  2  4  6
5  2  5  7
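
To illustrate the "many combinations of arrays/lists" case, here is a minimal sketch with three inputs. The third list w and the three-term sum are assumptions added for illustration only; they are not part of the original question.

```python
import pandas as pd

x = [1, 2]
y = [3, 4, 5]
w = [10, 20]  # hypothetical third axis, not in the original question

# from_product enumerates every (x, y, w) combination, last list varying fastest
df3 = (pd.MultiIndex.from_product([x, y, w], names=['x', 'y', 'w'])
         .to_frame(index=False)
         .assign(z=lambda d: d['x'] + d['y'] + d['w'])
      )
print(df3.head())
```

The same pattern should extend to more inputs by adding lists and names to from_product, at the cost of a table whose length is the product of all the input lengths.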

Here's one approach:

import numpy as np
import pandas as pd

x = [1, 2]
y = [3, 4, 5]

df = (
  pd.DataFrame(
      np.array(np.meshgrid(x, y)).T.reshape(-1, 2),
      columns=['x', 'y']
      )
  .assign(z=lambda df: df.sum(axis=1))
  )

Output:

   x  y  z
0  1  3  4
1  1  4  5
2  1  5  6
3  2  3  5
4  2  4  6
5  2  5  7

Explanation / Intermediate

Use np.meshgrid with default indexing='xy' (cartesian).
Pass the result to np.array, then apply a transpose (.T) and ndarray.reshape:

np.array(np.meshgrid(x, y)).T.reshape(-1, 2)

array([[1, 3],
       [1, 4],
       [1, 5],
       [2, 3],
       [2, 4],
       [2, 5]])

Use the result inside pd.DataFrame and add the row sums (df.sum on axis=1) as column z via df.assign.
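
The intermediate steps can be made visible by printing the shapes along the way. This is only a sketch of what the one-liner does; the variable names grids, flipped, pairs and alt are mine, and the indexing='ij' variant at the end is an alternative I am adding for comparison, not part of the answer.

```python
import numpy as np

x = [1, 2]
y = [3, 4, 5]

grids = np.array(np.meshgrid(x, y))  # stack the X and Y grids
print(grids.shape)                   # (2, 3, 2): (grid, len(y), len(x))

flipped = grids.T                    # reverse the axis order
print(flipped.shape)                 # (2, 3, 2): (len(x), len(y), grid)

pairs = flipped.reshape(-1, 2)       # one (x, y) pair per row
print(pairs.shape)                   # (6, 2)

# Alternative without the transpose: matrix indexing, grids stacked on the last axis
alt = np.stack(np.meshgrid(x, y, indexing='ij'), axis=-1).reshape(-1, 2)
print(np.array_equal(pairs, alt))    # True
```

Note that the two (2, 3, 2) shapes only look identical because len(x) happens to equal the number of grids here; the axes they describe are different.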

Use einops:

import numpy as np
import pandas as pd
from einops import repeat

# shape (2,)
x = np.array([1, 2])

# shape (3,)
y = np.array([3, 4, 5])

xGrid = repeat(x, 'i -> i j', j=len(y))
'''
[[1 1 1]
 [2 2 2]]
'''
yGrid = repeat(y, 'j -> i j', i = len(x))
''' 
[[3 4 5]
 [3 4 5]]
'''
zGrid = xGrid + yGrid

res = np.stack([xGrid.ravel(),yGrid.ravel(),zGrid.ravel()],axis=1)
print(res)
df = pd.DataFrame(res, columns=['x', 'y', 'z'])
print(df)
'''
   x  y  z
0  1  3  4
1  1  4  5
2  1  5  6
3  2  3  5
4  2  4  6
5  2  5  7
'''
