python - How to Mark Repeated Entries as True Starting from the Second Occurrence Using NumPy?

Problem

I have a NumPy array and need to identify repeated elements, marking the second occurrence and beyond as True, while keeping the first occurrence as False.

For example, given the following array:

import numpy as np

np.random.seed(100)
a = np.random.randint(0, 5, 10)
# a is [0 0 3 0 2 4 2 2 2 2]

I want to get the following output:

[False True False True False False True True True True]

How can I achieve this using NumPy functions only, without using any loops or extra libraries?

What did you try and what were you expecting?

I was able to get it working with a loop, but I wanted to solve it using only NumPy functions. I tried implementing np.cumsum with masks, but I couldn’t make much progress.

Here's the solution I came up with using one loop:

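import numpy as np

np.random.seed(100)
a = np.random.randint(0, 5, 10)
print(a)
uniques, first_indices = np.unique(a, return_index=True)
all_occurrences = np.zeros_like(a, dtype=bool)
# Mark a[i] as repeated if the same value appears anywhere before index i
for i in range(len(a)):
    all_occurrences[i] = np.any(a[:i] == a[i])

# First occurrences must stay False (the loop above already leaves them False)
all_occurrences[first_indices] = False
print(all_occurrences)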

asked Nov 20, 2024 at 9:59 by Dos_Santos

You were almost there. All you had to do was realize that you can replace your for loop by simply initializing all_occurrences as an all-True array, then continue as you did by setting all_occurrences[first_indices] = False. See the answer that I added. – simon, commented Nov 20, 2024 at 14:29

2 Answers

Vectorized Operations for Finding Repeated Elements in a NumPy Array

Finding Unique Elements and Their First Indices:
```
np.unique(a, return_index=True)
```
This call finds all unique elements in the array a and, because of return_index=True, also returns the index of each element's first occurrence.
Using np.isin with invert=True: This checks, for each index of the array, whether it is absent from the list of first-occurrence indices.
Assigning True to Repeated Indices: Indices that are not the first occurrence of an element are marked as True.
Handling Edge Cases: For empty arrays, the function returns an empty boolean array right away, as an explicit guard.

Function Implementation:

def find_repeated(a):
    if a.size == 0:
        return np.array([], dtype=bool)  # Handle empty array edge case

    _, first_indices = np.unique(a, return_index=True)

    repeated_mask = np.zeros_like(a, dtype=bool)  # Initialize mask with `False`

    repeated_mask[np.isin(np.arange(a.size), first_indices, invert=True)] = True

    return repeated_mask
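As a hedged illustration of the np.isin step, the sketch below runs the same calls on the array from the question, written out literally so it does not depend on the random seed. The variable names not_first and expected are introduced here for illustration only. It also shows that the boolean array returned by np.isin(..., invert=True) already has the desired form.

```python
import numpy as np

# The example array from the question, written out explicitly
a = np.array([0, 0, 3, 0, 2, 4, 2, 2, 2, 2])

# Index of the first occurrence of each unique value
_, first_indices = np.unique(a, return_index=True)

# True wherever a position is NOT one of the first-occurrence indices
not_first = np.isin(np.arange(a.size), first_indices, invert=True)

expected = np.array([False, True, False, True, False,
                     False, True, True, True, True])
print(np.array_equal(not_first, expected))
```

Since not_first is already a boolean array of the right length, the zero-initialized mask in the function above is a readability choice rather than a necessity.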

To reveal that the problem is actually less complicated than it may seem at first glance, the question could be rephrased as follows: Mark all first occurrences of values with False.

This leads to a slightly simplified version of EuanG's answer¹:

def find_repeated(a):
    mask = np.ones_like(a, dtype=bool)
    mask[np.unique(a, return_index=True)[-1]] = False
    return mask

Steps: (1) Initialize the result mask as an all-True array of appropriate shape. (2) Find the indices of the first occurrences in the given array a. (3) Only set these indices to False in the result mask.
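The three steps can be traced one at a time on the question's example array. This is a minimal sketch; the intermediate names uniques and first_indices are chosen here for readability and are not part of the function above.

```python
import numpy as np

a = np.array([0, 0, 3, 0, 2, 4, 2, 2, 2, 2])

# (1) Start with an all-True mask of the same shape as a
mask = np.ones_like(a, dtype=bool)

# (2) Find the index of the first occurrence of each unique value
uniques, first_indices = np.unique(a, return_index=True)
print(uniques)        # sorted unique values: 0, 2, 3, 4
print(first_indices)  # their first positions in a: 0, 4, 2, 5

# (3) Only the first occurrences are reset to False
mask[first_indices] = False
print(mask)
```

Note that first_indices follows the sorted order of the unique values, not the order of appearance in a; for the assignment in step (3) the order does not matter.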

To make the code also work with n-dimensional arrays, we need to add an extra step of unraveling the result of np.unique(), as it returns the indices into the flattened given array a:

def find_repeated(a):
    mask = np.ones_like(a, dtype=bool)
    mask[np.unravel_index(np.unique(a, return_index=True)[-1], a.shape)] = False
    return mask
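A small 2-D example may help to see what the n-dimensional version does. The array a2d is made up for illustration; "first occurrence" here means first in row-major (flattened) order, since that is how np.unique sees the input. The alternative at the end, assigning through mask.flat, is an assumption of this sketch and not part of the answer.

```python
import numpy as np

def find_repeated(a):
    mask = np.ones_like(a, dtype=bool)
    # np.unique returns indices into the flattened array ...
    flat_first = np.unique(a, return_index=True)[-1]
    # ... so they are converted to one index array per dimension
    mask[np.unravel_index(flat_first, a.shape)] = False
    return mask

# Hypothetical 2-D input; flattened it reads 1, 2, 1, 3, 2, 4
a2d = np.array([[1, 2, 1],
                [3, 2, 4]])
mask2d = find_repeated(a2d)
print(mask2d)

# Possible alternative: index the mask through its flat iterator
alt = np.ones_like(a2d, dtype=bool)
alt.flat[np.unique(a2d, return_index=True)[-1]] = False
print(np.array_equal(mask2d, alt))
```

In this example only the second 1 (row 0, column 2) and the second 2 (row 1, column 1) are marked as repeated.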

In either case:

We can directly use the indices (np.unique(…, return_index=True)[-1]) for indexing the mask array.
No need for catching the empty-array case here, as it is handled implicitly.
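Both points can be checked with a short sketch, using the 1-D version of the function. The helper find_repeated_loop is a hypothetical name wrapping the loop from the question, and the random test inputs are generated only for this comparison.

```python
import numpy as np

def find_repeated(a):
    mask = np.ones_like(a, dtype=bool)
    mask[np.unique(a, return_index=True)[-1]] = False
    return mask

def find_repeated_loop(a):
    # Reference implementation: the loop from the question
    return np.array([np.any(a[:i] == a[i]) for i in range(len(a))], dtype=bool)

# Empty input: no special case needed, the result is an empty boolean array
empty_result = find_repeated(np.array([], dtype=int))
print(empty_result.shape, empty_result.dtype)

# Compare against the loop on random 1-D arrays of length 0 to 19
rng = np.random.default_rng(0)
all_match = all(
    np.array_equal(find_repeated(x), find_repeated_loop(x))
    for x in (rng.integers(0, 5, size=n) for n in range(20))
)
print(all_match)
```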

¹) Yes, I find EuanG's answer perfectly acceptable as well. No, I did not downvote it.
