performance - Performant comparison of two hashmaps

I'm performing in Groovy a compare of two hashmaps as in example below

 h1 = [key1 : [att1 : 1, att2 : 2], key2 : [att1 : 10, att2 : 20], key3 : [att1 : 3, att2 : 4]]
 h2 = [key1 : [att1 : 1, att2 : 3], key2 : [att1 : 10, att2 : 20]]

and I need the result as folows

 [key1:[att1:true, att2:false], key2:[att1:true, att2:true], key3:[att1:false, att2:false]]

I.e. if the attributes of a given key match they should be marked as true, if they do not match or the key is missing in the second source they should be set to false

I tried this nested closures,

 diff = h1.collectEntries {key, value -> [key,  (value.collectEntries {k, v -> [k, (v == (h2[key] ? h2[key][k] : null) ) ] } ) ] }

which works, but for large maps scale not well, is there a better performant solution in Groovy?

Note if this may be relavent, the attributes are same in all occurencies and known in advance.

I'm performing in Groovy a compare of two hashmaps as in example below

 h1 = [key1 : [att1 : 1, att2 : 2], key2 : [att1 : 10, att2 : 20], key3 : [att1 : 3, att2 : 4]]
 h2 = [key1 : [att1 : 1, att2 : 3], key2 : [att1 : 10, att2 : 20]]

and I need the result as folows

 [key1:[att1:true, att2:false], key2:[att1:true, att2:true], key3:[att1:false, att2:false]]

I.e. if the attributes of a given key match they should be marked as true, if they do not match or the key is missing in the second source they should be set to false

I tried this nested closures,

 diff = h1.collectEntries {key, value -> [key,  (value.collectEntries {k, v -> [k, (v == (h2[key] ? h2[key][k] : null) ) ] } ) ] }

which works, but for large maps scale not well, is there a better performant solution in Groovy?

Note if this may be relavent, the attributes are same in all occurencies and known in advance.

Share Improve this question edited yesterday asked 2 days ago Marmite Bomber 21.1k4 gold badges29 silver badges57 bronze badges

Add a comment |

1 Answer 1

Sorted by: Reset to default 1

I don't think there is much more you can do other than "compare the maps"; at best this is for testing and short-circuiting is an option. Or you can utilize a library that can build up a diff.

That said, there are improvements to your code.

Starting from h1 means, that you will not find differences in keys only in h2 (e.g. swap h1 and h2 and keys3 will vanish from the result). So finding the unions of all involved keys sets will solve that (see compareMaps, def keys).

Next you can be very aggressive with the comparison for the final boolean, if you treat the following equally:

not having a map
not having the key
having a value of null

(see the closure in compareMapValues)

E.g.

def compareMapValues(ks, m1, m2) {
    ks.collectEntries{ k -> [k, m1?.get(k)==m2?.get(k)] }
}

def compareMaps(m1, m2) {
    // according to OP known and could as well be passed in
    // so this is just to make the demo work
    def nestedKeys = [m1,m2].collectMany{ it.values().collectMany{ it.keySet() }}.toSet()

    def keys = [m1,m2].collectMany{ it.keySet() }.toSet()
    keys.collectEntries{ k -> [k, compareMapValues(nestedKeys, m1[k], m2[k])] }
}


def h1 = [key1 : [att1 : 1, att2 : 2], key2 : [att1 : 10, att2 : 20], key3 : [att1 : 3, att2 : 4]]
def h2 = [key1 : [att1 : 1, att2 : 3], key2 : [att1 : 10, att2 : 20]]

println(compareMaps(h1,h2))
// → [key1:[att2:false, att1:true], key2:[att2:true, att1:true], key3:[att2:false, att1:false]]

科技改变生活-雨落星辰 - 所有的伟大,都源于一个勇敢的开始

performance - Performant comparison of two hashmaps - Stack Overflow

1 Answer 1

与本文相关的文章

评论列表(0)