Problem
I have a NumPy array and need to identify repeated elements, marking the second occurrence and beyond as True, while keeping the first occurrence as False.
For example, given the following array:
import numpy as np

np.random.seed(100)
a = np.random.randint(0, 5, 10)
# Output: [0 0 3 0 2 4 2 2 2 2]
I want to get the following output:
[False True False True False False True True True True]
How can I achieve this using NumPy functions only, without using any loops or extra libraries?
What did you try and what were you expecting?
I was able to get it working with a loop, but I wanted to solve it using only NumPy functions. I tried implementing np.cumsum with masks, but I couldn’t make much progress.
Here's the solution I came up with using one loop:
import numpy as np

np.random.seed(100)
a = np.random.randint(0, 5, 10)
print(a)
uniques, first_indices = np.unique(a, return_index=True)
all_occurrences = np.zeros_like(a, dtype=bool)
for i in range(len(a)):
    # True if a[i] already appeared earlier in the array
    all_occurrences[i] = np.any(a[:i] == a[i])
all_occurrences[first_indices] = False
print(all_occurrences)
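As a sanity check that does not depend on the random generator's state, the loop above can be wrapped in a function and fed the printed example values directly. The name find_repeated_loop is hypothetical, and the array literal is simply the printed output from the example.

```python
import numpy as np

def find_repeated_loop(a):
    # Hypothetical wrapper around the loop-based approach from the question
    a = np.asarray(a)
    _, first_indices = np.unique(a, return_index=True)
    all_occurrences = np.zeros_like(a, dtype=bool)
    for i in range(len(a)):
        # True if a[i] already appeared somewhere in a[:i]
        all_occurrences[i] = np.any(a[:i] == a[i])
    # First occurrences are already False at this point; kept to mirror the original
    all_occurrences[first_indices] = False
    return all_occurrences

a = np.array([0, 0, 3, 0, 2, 4, 2, 2, 2, 2])  # example values, hard-coded
print(find_repeated_loop(a))
```

Because a first occurrence never finds an equal value in a[:i], np.any(...) is already False at those positions, so the final assignment to first_indices does not change the result.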
asked Nov 20, 2024 at 9:59
Dos_Santos
2 Answers
Vectorized Operations for Finding Repeated Elements in a NumPy Array
- Finding Unique Elements and Their First Indices: np.unique(a, return_index=True) finds all unique elements in the array a and returns the indices of their first occurrences.
- Using np.isin with invert=True: this checks, for each index of the array, whether it does not belong to the list of first-occurrence indices.
- Assigning True to Repeated Indices: indices that are not the first occurrence of an element are marked as True.
- Handling Edge Cases: for empty arrays, the function directly returns an empty boolean array to prevent errors.
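The intermediate values for the example array from the question make these steps concrete. This is an illustrative sketch with the example values hard-coded rather than regenerated from the random seed.

```python
import numpy as np

a = np.array([0, 0, 3, 0, 2, 4, 2, 2, 2, 2])  # example values, hard-coded

uniques, first_indices = np.unique(a, return_index=True)
print(uniques)        # [0 2 3 4]
print(first_indices)  # [0 4 2 5]  (aligned with uniques: 0 first at 0, 2 first at 4, ...)

# For every position, ask: is this position NOT one of the first-occurrence indices?
not_first = np.isin(np.arange(a.size), first_indices, invert=True)
print(not_first)
```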
Function Implementation:
import numpy as np

def find_repeated(a):
    if a.size == 0:
        return np.array([], dtype=bool)  # Handle empty array edge case
    _, first_indices = np.unique(a, return_index=True)
    repeated_mask = np.zeros_like(a, dtype=bool)  # Initialize mask with `False`
    # Mark every position that is not a first occurrence
    repeated_mask[np.isin(np.arange(a.size), first_indices, invert=True)] = True
    return repeated_mask
To reveal that the problem is actually less complicated than it may seem at first glance, the question could be rephrased as follows: mark all first occurrences of values with False (and everything else with True).
This leads to a bit of a simplified version of EuanG's answer¹:
import numpy as np

def find_repeated(a):
    mask = np.ones_like(a, dtype=bool)  # start with everything marked as repeated
    mask[np.unique(a, return_index=True)[-1]] = False  # un-mark first occurrences
    return mask
Steps: (1) Initialize the result mask as an all-True array of appropriate shape. (2) Find the indices of the first occurrences in the given array a. (3) Set only these indices to False in the result mask.
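The three steps can be traced one at a time on the same example values (hard-coded here instead of drawn from the seeded generator). The last lines apply the same steps to an empty array, which goes through without a special case because np.unique then returns no indices.

```python
import numpy as np

a = np.array([0, 0, 3, 0, 2, 4, 2, 2, 2, 2])  # example values, hard-coded

# (1) Start from an all-True mask with the same shape as a
mask = np.ones_like(a, dtype=bool)

# (2) Indices of the first occurrence of each distinct value;
#     [-1] picks the last array of the returned tuple, i.e. the indices
first_indices = np.unique(a, return_index=True)[-1]

# (3) Flip only those positions to False; everything else stays True
mask[first_indices] = False
print(mask)

# Empty input: np.unique yields no indices, so nothing is flipped
empty = np.array([], dtype=int)
empty_mask = np.ones_like(empty, dtype=bool)
empty_mask[np.unique(empty, return_index=True)[-1]] = False
print(empty_mask.shape)  # (0,)
```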
To make the code also work with n-dimensional arrays, we need to add an extra step of unraveling the result of np.unique(), as it returns indices into the flattened given array a:
import numpy as np

def find_repeated(a):
    mask = np.ones_like(a, dtype=bool)
    # np.unique returns flat indices; convert them to one index array per axis
    mask[np.unravel_index(np.unique(a, return_index=True)[-1], a.shape)] = False
    return mask
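A small two-dimensional example shows why the unraveling step is needed: np.unique() reports positions in the flattened array, and np.unravel_index() converts them back to (row, column) pairs. The array b is made up for illustration.

```python
import numpy as np

def find_repeated(a):
    # n-dimensional version from the answer, repeated so this block runs on its own
    mask = np.ones_like(a, dtype=bool)
    flat_first = np.unique(a, return_index=True)[-1]    # indices into the flattened a
    mask[np.unravel_index(flat_first, a.shape)] = False  # per-axis (row, col) indices
    return mask

b = np.array([[1, 2, 1],
              [3, 2, 1]])
print(np.unique(b, return_index=True)[-1])  # [0 1 3]
print(find_repeated(b))
```

Without np.unravel_index, the flat indices [0 1 3] would be interpreted as row indices of the 2-D mask, which here would fail because index 3 is out of range for two rows.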
In either case:
- We can directly use the indices (np.unique(…, return_index=True)[-1]) for indexing the mask array.
- There is no need to catch the empty-array case here, as it is handled implicitly.
¹) Yes, I find EuanG's answer perfectly acceptable as well. No, I did not downvote it.
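For completeness, the np.cumsum-with-masks idea mentioned in the question can also be made loop-free. The sketch below (find_repeated_cumsum is a hypothetical name, not taken from either answer) assumes a 1-D input and builds a one-hot mask of shape (len(a), number of unique values), so its memory use grows with both; the np.unique-based answers above avoid that cost.

```python
import numpy as np

def find_repeated_cumsum(a):
    # Hypothetical helper; assumes a 1-D input
    a = np.asarray(a)
    # inverse[i] is the position of a[i] among the sorted unique values
    uniques, inverse = np.unique(a, return_inverse=True)
    # one_hot[i, j] is True when a[i] == uniques[j]
    one_hot = inverse[:, None] == np.arange(uniques.size)
    # running_counts[i, j] = number of times uniques[j] occurs in a[:i + 1]
    running_counts = np.cumsum(one_hot, axis=0)
    # a[i] is a repeat if its own value has been seen more than once up to i
    return running_counts[np.arange(a.size), inverse] > 1

a = np.array([0, 0, 3, 0, 2, 4, 2, 2, 2, 2])  # example values, hard-coded
print(find_repeated_cumsum(a))
```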
for loop by simply initializing your all_occurrences as an all-True array, then continuing as you did by setting all_occurrences[first_indices] = False. See the answer that I added. – simon Commented Nov 20, 2024 at 14:29