python - Vectorizing a nested while-loop block that detects the frequency of consecutive numbers

The following function, get_frequency_of_events, finds groups of consecutive repeated numbers and reports the index at which each group starts and how many elements it contains. For example,

import numpy as np
aa=np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
get_frequency_of_events(aa)

This yields the following:

list of indices @ the beginning of each group [1, 3, 6, 10]

frequency of each group [2, 3, 4, 5]

Another example:

aa=np.array([1,1,1,np.nan,np.nan,1,1,np.nan])
idx, feq= get_frequency_of_events(aa)

list of indices @ the beginning of each group [0, 5]

frequency of each group [3, 2]

However, this function is slow, especially when it is called repeatedly over 3D data. How can I vectorize it to achieve faster processing?

Here is the function:

def get_frequency_of_events(mydata):
    """ 
    Author  : Shaaban
    Date    : Jan 22, 2025
    Purpose : Get the frequency of repeated consecutive numbers and their indices. This is useful for finding the frequency of heatwaves and similar events. Build an array of ones (or any other fixed number) and NaN, where the number marks the presence of the event and NaN marks its absence. The function then summarizes the frequency of the events and the index at which each one starts.
    Tests :

    aa=np.array([1,1,0,0,0,1,0,1,1,1,1,0,1,1])
    get_frequency_of_events(aa)

    aa=np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
    get_frequency_of_events(aa)

    aa=np.array([1,1,1,1,0,0,1,1,1])
    get_frequency_of_events(aa)

    aa=np.arange(10)
    get_frequency_of_events(aa)

    aa=np.ones(10)
    get_frequency_of_events(aa)

    # CAUTION
    # For heatwave counts and similar, make your array consist of a fixed number (any number) for entries associated with an event and NaN for days/hours/months not associated with an event. The trick is that NaN never compares equal to another NaN, so consecutive NaNs are never grouped.

    aa=np.array([1,1,1,np.nan,np.nan,1,1,np.nan])
    idx, feq= get_frequency_of_events(aa)
    """
    
    index_list=[]
    events_frequency_list=[]
    
    idx_last_num=len(mydata)-1
    
    counter=0
    ii=0
    while(ii <= idx_last_num-1):
        #print( '@ index = '+str(ii) )
        counter=0
        while(mydata[ii] == mydata[ii+1]):
            print(' Find match @ '+str(ii)+' & '+str(ii+1)+\
                  ' data are '+str(mydata[ii])+' & '+str(mydata[ii+1]))
            # store the index of the first match of each group.
            if counter == 0:
                index_list.append(ii)
            ii=ii+1
            counter=counter+1
            # break from while if this is the last element in the array.
            if ii==idx_last_num:
                break
        # if we were just inside the inner loop, store the number of events
        if counter != 0:
            no_events=counter+1
            events_frequency_list.append(no_events)
        
        # advance the outer index (this also covers the case with no match at all)
        ii=ii+1
    print('list of indices @ the beginning of each group  ')
    print(index_list)
    print(' frequency of each group.')
    print(events_frequency_list)
    return index_list, events_frequency_list

asked Mar 4 at 12:08 by Kernel

1 Answer

A possible solution:

boundaries = np.where(np.diff(aa) != 0)[0] + 1  # group boundaries: index at which each new group starts

get_idx_freqs = lambda i, d: (np.concatenate(([0], i))[d >= 2], d[d >= 2])
idx, freqs = get_idx_freqs(boundaries, np.diff(np.r_[0, boundaries, len(aa)]))

The process begins by detecting group boundaries with np.where(np.diff(aa) != 0)[0] + 1, which finds the indices where the value changes, i.e. where each new group starts. Group lengths are then computed by joining the first index, the boundaries, and the array length with np.r_[0, boundaries, len(aa)] and applying np.diff to the result. Finally, the lambda applies the mask d >= 2 to both the start indices and the group lengths, discarding groups of a single element. NaN entries end up in single-element groups, because any difference involving NaN is NaN and NaN != 0 is true, so they are discarded as well. (See np.diff and np.where.)
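To make the individual steps easier to follow, here is the same computation spelled out with its intermediate arrays for the first example. The variable names starts, lengths and mask are only illustrative and do not appear in the snippet above.

```python
import numpy as np

aa = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5])

# Indices where the value differs from the previous element,
# i.e. where a new group starts (index 0 is not included yet).
boundaries = np.where(np.diff(aa) != 0)[0] + 1
print(boundaries)   # [ 1  3  6 10]

# Start index of every group, including the first group at index 0.
starts = np.concatenate(([0], boundaries))
print(starts)       # [ 0  1  3  6 10]

# Length of every group: distance between consecutive group edges.
lengths = np.diff(np.r_[0, boundaries, len(aa)])
print(lengths)      # [1 2 3 4 5]

# Keep only the groups that contain at least two elements.
mask = lengths >= 2
print(starts[mask], lengths[mask])   # [ 1  3  6 10] [2 3 4 5]
```

The single leading 1 forms a group of length one, so the mask drops it.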

Output:

# aa=np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
(array([ 1,  3,  6, 10]), array([2, 3, 4, 5]))

# aa=np.array([1,1,1,np.nan,np.nan,1,1,np.nan])
(array([0, 5]), array([3, 2]))
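If you need this for many series, the same idea can be wrapped in a function. The following is only a sketch under a few assumptions: the name consecutive_runs and the min_length parameter are made up for illustration, the input is assumed to be one-dimensional, and the change detection uses an element-wise != comparison instead of np.diff (the result is the same for these examples, and NaN still never matches NaN). The 3D part assumes the first axis is time; it still loops over grid cells in Python, so only the work per series is vectorized.

```python
import numpy as np

def consecutive_runs(values, min_length=2):
    """Return (start_indices, run_lengths) of runs of equal consecutive values.

    Hypothetical helper: runs shorter than min_length are dropped.
    NaN never equals NaN, so NaN entries always form runs of length 1.
    """
    values = np.asarray(values)
    if values.ndim != 1:
        raise ValueError("expected a 1-D array")
    # Positions where an element differs from its predecessor start a new run.
    boundaries = np.flatnonzero(values[1:] != values[:-1]) + 1
    starts = np.concatenate(([0], boundaries))
    # Run length = distance from each start to the next start (or to the end).
    lengths = np.diff(np.concatenate((starts, [values.size])))
    keep = lengths >= min_length
    return starts[keep], lengths[keep]

# The two examples from the question.
idx1, freq1 = consecutive_runs(np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5]))
idx2, freq2 = consecutive_runs(np.array([1, 1, 1, np.nan, np.nan, 1, 1, np.nan]))
print(idx1, freq1)   # [ 1  3  6 10] [2 3 4 5]
print(idx2, freq2)   # [0 5] [3 2]

# Assumed 3D layout (time, lat, lon): count the events in each grid cell.
cube = np.full((6, 2, 2), np.nan)
cube[0:3, 0, 0] = 1.0   # one 3-step event at cell (0, 0)
cube[1:3, 1, 1] = 1.0   # first 2-step event at cell (1, 1)
cube[4:6, 1, 1] = 1.0   # second 2-step event at cell (1, 1)

counts = np.zeros(cube.shape[1:], dtype=int)
for j, k in np.ndindex(*cube.shape[1:]):
    starts, lengths = consecutive_runs(cube[:, j, k])
    counts[j, k] = starts.size
print(counts.tolist())   # [[1, 0], [0, 2]]
```

Because each cell can contain a different number of events, the per-cell results are ragged, which is why this sketch keeps the outer loop and only stores a count per cell.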
