最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

data structures - What is the most efficient algorithm for merging sorted lists with a carry-forward mechanism for missing keys?

programmeradmin0浏览0评论

I'm working with multiple ordered lists, each containing tuples of the form (key, value), where each list is sorted in ascending order by key. I want to merge these lists into a single ordered list that includes every distinct key from the input lists. The merged output should be a list of tuples in the form:

(key, value_from_list1, value_from_list2, ..., value_from_listK)

with the following behavior:

  • For each key present in the union of all keys, the output tuple should contain the "current" value from each list.
  • If an input list does not have an entry at that key, it should carry forward the most recent value from that list (i.e., use the value from the latest key less than or equal to the current key).

Here is a complete and minimal example: Given two lists:

A = [(1, a1), (2, a2), (10, a3)]
B = [(1, b1), (8, b2), (10, b3)]

the desired output is:

C = [
  (1, a1, b1),  // At key 1, both lists provide values.
  (2, a2, b1),  // At key 2, list A updates to a2; list B still uses b1.
  (8, a2, b2),  // At key 8, list B updates to b2; list A remains at a2.
  (10, a3, b3)  // At key 10, both lists update.
]

This merging behavior should extend to more than two lists as well (e.g., merging 5 lists into tuples with 6 elements, where the first element is the key).

All input lists are assumed to contain the element (0, null), but we omit this element (i.e. (0, null, null, ..., null)) from the final output. This ensures that, when merging, if a list doesn't have an entry for a given key, it carries forward the most recent value (initially null). For example, merging A = [(1, a1)] and B = [(2, b1)] yields [(1, a1, null), (2, a1, b1)].

My Questions:

  1. Algorithm Efficiency: What is the most efficient algorithm for merging sorted lists with a carry-forward mechanism for missing keys?

My current solution is using a sweeping line algorithm with multiple pointers—one for each list—to keep track of the most recent value as I iterate through the union of keys. If I am not mistaken, this runs in O(N log(N))

Edit 2025-03-13

Pseudocode

I´ve created pseudocode based on the suggestion from user Smylic made in this post. As he says:

If you need to output k values for every key, the most efficient algorithm takes Θ(k⋅u) of time, where u is the number of unique keys.

function mergeLists(lists):
    k = number of lists
    pointers = array of size k, all initialized to 0
    lastValues = array of size k, initially set to None (or some default)
    output = empty list

    while true:
        minKey = None

        // Find the minimum key among the current pointers in all lists.
        for i from 0 to k-1:
            if pointers[i] < length(lists[i]):
                currentKey = lists[i][pointers[i]].key
                if minKey is None or currentKey < minKey:
                    minKey = currentKey

        // If no minimum key was found, all lists are exhausted.
        if minKey is None:
            break

        // Update lastValues for each list that has the current minKey
        for i from 0 to k-1:
            if pointers[i] < length(lists[i]) and lists[i][pointers[i]].key == minKey:
                lastValues[i] = lists[i][pointers[i]].value
                pointers[i] = pointers[i] + 1

        // Append the merged tuple (minKey, lastValues[0], lastValues[1], ..., lastValues[k-1])
        output.append( (minKey, lastValues[0], lastValues[1], ..., lastValues[k-1]) )

    return output

I'm working with multiple ordered lists, each containing tuples of the form (key, value), where each list is sorted in ascending order by key. I want to merge these lists into a single ordered list that includes every distinct key from the input lists. The merged output should be a list of tuples in the form:

(key, value_from_list1, value_from_list2, ..., value_from_listK)

with the following behavior:

  • For each key present in the union of all keys, the output tuple should contain the "current" value from each list.
  • If an input list does not have an entry at that key, it should carry forward the most recent value from that list (i.e., use the value from the latest key less than or equal to the current key).

Here is a complete and minimal example: Given two lists:

A = [(1, a1), (2, a2), (10, a3)]
B = [(1, b1), (8, b2), (10, b3)]

the desired output is:

C = [
  (1, a1, b1),  // At key 1, both lists provide values.
  (2, a2, b1),  // At key 2, list A updates to a2; list B still uses b1.
  (8, a2, b2),  // At key 8, list B updates to b2; list A remains at a2.
  (10, a3, b3)  // At key 10, both lists update.
]

This merging behavior should extend to more than two lists as well (e.g., merging 5 lists into tuples with 6 elements, where the first element is the key).

All input lists are assumed to contain the element (0, null), but we omit this element (i.e. (0, null, null, ..., null)) from the final output. This ensures that, when merging, if a list doesn't have an entry for a given key, it carries forward the most recent value (initially null). For example, merging A = [(1, a1)] and B = [(2, b1)] yields [(1, a1, null), (2, a1, b1)].

My Questions:

  1. Algorithm Efficiency: What is the most efficient algorithm for merging sorted lists with a carry-forward mechanism for missing keys?

My current solution is using a sweeping line algorithm with multiple pointers—one for each list—to keep track of the most recent value as I iterate through the union of keys. If I am not mistaken, this runs in O(N log(N))

Edit 2025-03-13

Pseudocode

I´ve created pseudocode based on the suggestion from user Smylic made in this post. As he says:

If you need to output k values for every key, the most efficient algorithm takes Θ(k⋅u) of time, where u is the number of unique keys.

function mergeLists(lists):
    k = number of lists
    pointers = array of size k, all initialized to 0
    lastValues = array of size k, initially set to None (or some default)
    output = empty list

    while true:
        minKey = None

        // Find the minimum key among the current pointers in all lists.
        for i from 0 to k-1:
            if pointers[i] < length(lists[i]):
                currentKey = lists[i][pointers[i]].key
                if minKey is None or currentKey < minKey:
                    minKey = currentKey

        // If no minimum key was found, all lists are exhausted.
        if minKey is None:
            break

        // Update lastValues for each list that has the current minKey
        for i from 0 to k-1:
            if pointers[i] < length(lists[i]) and lists[i][pointers[i]].key == minKey:
                lastValues[i] = lists[i][pointers[i]].value
                pointers[i] = pointers[i] + 1

        // Append the merged tuple (minKey, lastValues[0], lastValues[1], ..., lastValues[k-1])
        output.append( (minKey, lastValues[0], lastValues[1], ..., lastValues[k-1]) )

    return output
Share Improve this question edited Mar 28 at 12:45 Isak asked Mar 12 at 21:35 IsakIsak 857 bronze badges 4
  • 1 Please show your implementation. If you have good reason to believe that it is O(n log(n)), it may already be optimal. – btilly Commented Mar 13 at 0:30
  • @btilly Ive added psuedocode for a better approach. – Isak Commented Mar 13 at 11:42
  • Your specification is broken. What should be the output for two input lists, [(1,a)] and [(2,b)]? There is no "previous" value at 1 for the second list. – Will Ness Commented Mar 13 at 22:09
  • True. All input lists are assumed to contain the element (0, null), but we omit this element (i.e. (0, null, null, ..., null)) from the final output. This ensures that, when merging, if a list doesn't have an entry for a given key, it carries forward the most recent value (initially null). For example, merging A = [(1, a1)] and B = [(2, b1)] yields [(1, a1, null), (2, a1, b1)]. Is this a good description? – Isak Commented Mar 14 at 5:15
Add a comment  | 

1 Answer 1

Reset to default 7

If there are n distinct keys and m lists, the total output will be O(n*m). Therefore you cannot be faster than that. Your current algorithm is already that speed, so the most that you can improve is a constant factor.

It is possible to find an improvement when most keys do not appear in most lists. But it will add an extra log(m) factor if most keys appear in most lists.

So your algorithm seems good enough to me.

发布评论

评论列表(0)

  1. 暂无评论