Given two numpy arrays of equal shape, I want to track how elements from the first array have moved in the second array. Specifically, for each element in the second array, I want to find its original position in the first array. The arrays are not sorted.
Example 1: All elements present
a1 = np.array([1, 2, 3, 4]) # original array
a2 = np.array([2, 1, 3, 4]) # new array
# Result: [1, 0, 2, 3]
# Explanation:
# - 2 was originally at index 1
# - 1 was originally at index 0
# - 3 was originally at index 2
# - 4 was originally at index 3
Example 2: With new elements
a1 = np.array([1, 2, 3, 4])
a2 = np.array([2, 1, 33, 4])
# Result: [1, 0, -1, 3]
# The value 33 wasn't in the original array, so it gets -1
My solution is:
[a1.tolist().index(v) if v in a1 else -1 for v in a2]
or np.where(a2[:, None] == a1)[1]
but this does not work in Example 2: values with no match are dropped from the result instead of being reported as -1.
Is there a better way to do this? In real life my arrays have millions of rows, but not many columns (fewer than 10).
Asked Feb 28 at 18:42 by Aenaon, edited Mar 3 at 11:46 by mkrieger1.
4 Answers
You could combine np.argmax with np.any to check whether there was no match at all. Here is a minimal example:
import numpy as np
a1 = np.array([1, 2, 3, 4])
a2 = np.array([2, 1, 33, 4])
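has_match = a2[:, None] == a1  # has_match[i, j] is True when a2[i] == a1[j]
idx = np.argmax(has_match, axis=1)  # first matching position in a1 for each element of a2
idx[~np.any(has_match, axis=1)] = -1  # element of a2 has no match in a1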
This gives:
array([ 1, 0, -1, 3])
Equivalent to your pure Python solution. The advantage here is that both argmax and any let you specify the axis they operate along.
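For arrays with millions of elements, the broadcast comparison builds a len(a2) x len(a1) boolean matrix, which may not fit in memory. Below is a minimal sketch of an alternative based on np.argsort and np.searchsorted. It assumes non-empty 1-D arrays and unique values in a1; the function name positions_in_original is hypothetical and not part of the answer above.

```python
import numpy as np

def positions_in_original(a1, a2):
    """For each value in a2, return its index in a1, or -1 if absent.

    Assumes non-empty 1-D inputs and unique values in a1.
    """
    a1 = np.asarray(a1)
    a2 = np.asarray(a2)
    order = np.argsort(a1, kind="stable")   # indices that sort a1
    sorted_a1 = a1[order]
    pos = np.searchsorted(sorted_a1, a2)    # candidate slot for each a2 value
    pos = np.clip(pos, 0, len(a1) - 1)      # keep candidate slots in bounds
    found = sorted_a1[pos] == a2            # True where the value really exists in a1
    return np.where(found, order[pos], -1)

print(positions_in_original([1, 2, 3, 4], [2, 1, 33, 4]))  # [ 1  0 -1  3]
print(positions_in_original([1, 2, 3], [3, 1, 2]))         # [2 0 1]
```

This needs memory proportional to the array lengths rather than their product, at the cost of one sort of a1.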
You can try pandas.merge to keep track of the index mapping:
import pandas as pd
df = pd.DataFrame({'a1': a1, 'a2': a2})
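# lookup table: each value of a1 together with its position in a1
lut = pd.DataFrame({'a2': a1, 'idx': df.index})
# the left merge on 'a2' keeps the order of a2; values missing from a1 become NaN, then -1
idx = pd.merge(df, lut, how='left', on='a2')['idx'].fillna(-1).values.astype(int)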
- Given a1 = np.array([1, 2, 3, 4]) and a2 = np.array([2, 1, 3, 4]), you will obtain array([1, 0, 2, 3])
- Given a1 = np.array([1, 2, 3, 4]) and a2 = np.array([2, 1, 33, 4]), you will obtain array([ 1, 0, -1, 3])
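A related pandas route that may be worth trying is pd.Index.get_indexer, which returns -1 for targets that are not in the index. This is only a sketch and assumes the values in a1 are unique, since get_indexer raises an error on an index with duplicates.

```python
import numpy as np
import pandas as pd

a1 = np.array([1, 2, 3, 4])
a2 = np.array([2, 1, 33, 4])

# Build an index from a1, then look up every a2 value in it;
# values that are not present are reported as -1.
idx = pd.Index(a1).get_indexer(a2)
print(idx)  # [ 1  0 -1  3]
```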
You can use a dict for this.
a1 = np.array([1, 2, 3, 4])
a2 = np.array([2, 1, 33, 4])
old_indices = {element: index for index, element in enumerate(a1.tolist())}
result = [old_indices.get(element, -1) for element in a2.tolist()]
Another possible solution, which uses np.where to identify the indices and np.full to set the missing indices to -1:
i, j = np.where(a1 == a2[:, None])
out = np.full(len(a2), -1, dtype=int)
out[i] = j
Output:
array([ 1, 0, -1, 3])
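The question mentions arrays with millions of rows and fewer than 10 columns. If the lookup is meant to happen row by row (an assumption, since the question does not spell this out), the broadcast comparison can be applied per row. The sketch below uses a hypothetical helper rowwise_positions and assumes both arrays are 2-D with the same shape.

```python
import numpy as np

def rowwise_positions(a1, a2):
    """Row by row, find where each value of a2 sits in the same row of a1 (-1 if absent)."""
    a1 = np.asarray(a1)
    a2 = np.asarray(a2)
    # match[r, i, j] is True when a2[r, i] == a1[r, j]
    match = a2[:, :, None] == a1[:, None, :]
    idx = match.argmax(axis=2)        # first matching column of a1 for each a2 entry
    idx[~match.any(axis=2)] = -1      # no match anywhere in that row of a1
    return idx

a1 = np.array([[1, 2, 3, 4],
               [5, 6, 7, 8]])
a2 = np.array([[2, 1, 33, 4],
               [8, 5, 6, 7]])
print(rowwise_positions(a1, a2).tolist())  # [[1, 0, -1, 3], [3, 0, 1, 2]]
```

The intermediate boolean array has rows x columns x columns entries, so with fewer than 10 columns it stays below 100 booleans per row.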