All,
The following function, get_frequency_of_events, finds runs of consecutive equal numbers and reports the start index and length of each run. For example,
import numpy as np
aa=np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
get_frequency_of_events(aa)
This yields the following:
list of indices @ the beginning of each group [1, 3, 6, 10]
frequency of each group [2, 3, 4, 5]
Another example:
aa=np.array([1,1,1,np.nan,np.nan,1,1,np.nan])
idx, feq= get_frequency_of_events(aa)
list of indices @ the beginning of each group [0, 5]
frequency of each group [3, 2]
However, this function is slow, especially when I apply it repeatedly over 3D data. How can I vectorize it to achieve faster processing?
Here is the function:
def get_frequency_of_events(mydata):
    """
    Author  : Shaaban
    Date    : Jan 22, 2025
    Purpose : Get the frequency of repeated consecutive numbers and their
              indices. This is important when finding the frequency of
              heatwaves, etc. All we have to do is build a matrix of ones
              (or any other number) and NaN. One marks the existence of the
              EVENT, and NaN marks its absence. This function then gives a
              summary of the frequency of the events and their associated
              indices.
    tests :
        aa=np.array([1,1,0,0,0,1,0,1,1,1,1,0,1,1])
        get_frequency_of_events(aa)
        aa=np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
        get_frequency_of_events(aa)
        aa=np.array([1,1,1,1,0,0,1,1,1])
        get_frequency_of_events(aa)
        aa=np.arange(10)
        get_frequency_of_events(aa)
        aa=np.ones(10)
        get_frequency_of_events(aa)
        # CAUTION CAUTION CAUTION
        # For heatwave numbers, etc., make your array consist of a fixed
        # number (any number) for days/hours/months associated with an event
        # and NaN for those that are not. The trick here is that no NaN can
        # ever be equal to another NaN.
        aa=np.array([1,1,1,np.nan,np.nan,1,1,np.nan])
        idx, feq= get_frequency_of_events(aa)
    """
    index_list=[]
    events_frequency_list=[]
    idx_last_num=len(mydata)-1
    counter=0
    ii=0
    while(ii <= idx_last_num-1):
        #print( '@ index = '+str(ii) )
        counter=0
        while(mydata[ii] == mydata[ii+1]):
            print(' Find match @ '+str(ii)+' & '+str(ii+1)+\
                  ' data are '+str(mydata[ii])+' & '+str(mydata[ii+1]))
            # store the index of the first match of each group.
            if counter == 0:
                index_list.append(ii)
            ii=ii+1
            counter=counter+1
            # break from while if this is the last element in the array.
            if ii==idx_last_num:
                break
        # if we were just inside the inner loop, store the number of events
        if counter != 0:
            no_events=counter+1
            events_frequency_list.append(no_events)
        # advance the index when there was no match at all in the outer while.
        ii=ii+1
    print('list of indices @ the beginning of each group ')
    print(index_list)
    print(' frequency of each group.')
    print(events_frequency_list)
    return index_list, events_frequency_list
asked Mar 4 at 12:08 by Kernel
1 Answer
A possible solution:
boundaries = np.where(np.diff(aa) != 0)[0] + 1 #group boundaries
get_idx_freqs = lambda i, d: (np.concatenate(([0], i))[d >= 2], d[d >= 2])
idx, freqs = get_idx_freqs(boundaries, np.diff(np.r_[0, boundaries, len(aa)]))
The process begins by detecting group boundaries with np.where(np.diff(aa)!=0)[0]+1, which locates the indices where the value changes, each marking the start of a new group. Next, group lengths are computed by concatenating the starting index, the change indices, and the array's length using np.r_[0, boundaries, len(aa)], and then applying np.diff to obtain the lengths of these groups. Finally, a lambda function applies a mask (d>=2) to both the start indices and the group lengths, filtering out any groups of only one element. (See np.diff and np.where.)
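For readability, the same steps can also be written as a named function. This is only a sketch: the name run_starts_and_lengths and the min_length parameter are illustrative and not part of the original answer. The comments also note why the NaN example works with np.diff.

```python
import numpy as np

def run_starts_and_lengths(values, min_length=2):
    """Hypothetical helper: start index and length of each run of equal values."""
    values = np.asarray(values)
    # Positions where a value differs from its predecessor, i.e. where a new run starts.
    # NaN - NaN is NaN and NaN != 0 is True, so every NaN becomes its own run of length 1.
    boundaries = np.flatnonzero(np.diff(values) != 0) + 1
    starts = np.concatenate(([0], boundaries))
    # Distance between consecutive run starts (plus the array end) gives the run lengths.
    lengths = np.diff(np.concatenate((starts, [values.size])))
    keep = lengths >= min_length  # drop runs shorter than min_length
    return starts[keep], lengths[keep]

aa = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5])
idx, freqs = run_starts_and_lengths(aa)
print(idx.tolist(), freqs.tolist())  # [1, 3, 6, 10] [2, 3, 4, 5]
```

One difference from the loop in the question: np.diff requires a numeric array, and inf - inf is NaN, so runs of infinities would be split here although they compare equal with ==.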
Output:
# aa=np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
(array([ 1, 3, 6, 10]), array([2, 3, 4, 5]))
# aa=np.array([1,1,1,np.nan,np.nan,1,1,np.nan])
(array([0, 5]), array([3, 2]))
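For the 3D case mentioned in the question, the start indices differ in number from cell to cell, so they do not fit a rectangular array. If only the number of events per grid cell is needed, a loop-free sketch is possible under some assumptions: time is on axis 0, event steps hold any number, non-event steps hold NaN (the encoding described in the question's docstring), and an event means at least two consecutive steps. The function name count_events_along_time is hypothetical.

```python
import numpy as np

def count_events_along_time(data):
    """Hypothetical helper: per-cell count of event runs of length >= 2 along axis 0.

    Assumes non-event steps are NaN and every non-NaN step belongs to an event.
    """
    event = ~np.isnan(data)        # True where an event is present
    starts = event.copy()
    starts[1:] &= ~event[:-1]      # an event step whose previous step is not an event
    # Keep only starts that are followed by another event step (run length >= 2).
    long_starts = starts[:-1] & event[1:]
    return long_starts.sum(axis=0)

# Two grid cells, eight time steps each (shape: time, y, x).
cell_a = [1, 1, 1, np.nan, np.nan, 1, 1, np.nan]
cell_b = [np.nan, 1, np.nan, 1, 1, 1, 1, np.nan]
data = np.stack([cell_a, cell_b], axis=1).reshape(8, 1, 2)
print(count_events_along_time(data).tolist())  # [[2, 1]]
```

Unlike the 1D solution, this treats all non-NaN values as the same event, so it matches the original function only for the ones-and-NaN encoding.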