xml - How to find element with most common attribute value using only XPath 3.1? - Stack Overflow

I have an XML file of a play, from which this extract will serve:

<play>
    <speech>
        <spkr name="CAL">CAL.</spkr>Et tu mecastor salve, Lysistrata. Sed quid conturbata es?
        expe frontem, carissima: non enim te decent contracta supercilia.</speech>
    <speech>
        <spkr name="LYS">LYS.</spkr>Sed, ô Calonice, uritur mihi cor, et valde me piget sexus
        nostri, quoniam viri existimant<endnote orig="transcriber" n="1"/> nos esse nequam.</speech>
    <speech>
        <spkr name="CAL">CAL.</spkr>Quippe tales pol sumus.</speech>
</play>

I'm trying to find an XPath (3.1, run in oXygen) solution to the question:

Which character speaks most frequently in this play? (I.e. to return, for this sample, the attribute value CAL.)

I've tried various ways of combining the functions distinct-values(), count(), and max(), and have worked through the articles here on Stack Overflow about saxon:highest(), but I can't get it to work when the numbers I want the max() of are themselves counts: how many times each of the distinct-values() of the attribute occurs.

I could find an XQuery answer, where I run a for-loop and order it by the count and then tell it to return only the first one on the list, but surely there must be a reasonably elegant XPath answer. This would also enable me to transfer the answer to XSLT when needed.

Asked Feb 17 at 16:22 by haggis78; edited Feb 17 at 16:52 by Yitzhak Khabinsky.

Comments:
  • While asking a question, you need to provide a minimal reproducible example. Please edit your original question and provide the following: (1) Well-formed XML file sample with all relevant namespaces. (2) What you need to do, i.e. logic, and your code attempt trying to implement it. (3) Desired output based on the sample data in #1 above. – Yitzhak Khabinsky, Feb 17 at 16:37
  • stackoverflow/questions/76543312/… – Yitzhak Khabinsky, Feb 17 at 16:50

2 Answers

With pure XPath 3.1 (example: Saxon 12 HE fiddle):

map:merge(//spkr ! map:entry(string(@name), .), map { 'duplicates' : 'combine' })
=> map:for-each(function($k, $v) { map:entry($k, -count($v)) })
=> sort((), function($e) { $e?* })
=> head()
=> map:keys()

With XPath 4 (available with Saxon 12 EE or PE in oXygen, though I'm not sure how to force XPath version 4 in the settings; see also the BaseX fiddle):

map:merge(//spkr ! map:entry(string(@name), .), map { 'duplicates' : 'combine' })
=> map:for-each(function($k, $v) { map:entry($k, count($v)) })
=> highest((), function($e) { $e?* })
=> map:keys()

Grouping queries are generally easier in XSLT or XQuery than in XPath. But in 4.0 you can build a map from each speaker to their number of speeches with

let $freq := map:build(//spkr, fn{@name}, fn{1}, op('+'))

and then get the highest with

return highest(map:pairs($freq), (), fn{?value})?key

Not tested. And yes, I know, you wanted an XPath 3.1 solution, presumably without using any Saxon extensions. In 3.1 we can build the histogram with

let $freq := map:merge(
               for $n in distinct-values(//spkr/@name)
               return map:entry($n, count(//spkr[@name = $n])))

then we can find the highest count with

let $max := max($freq?*)

and then we can find the name having that count with

return map:keys($freq)[map:get($freq, .) = $max]

Not pretty, but should work. Certainly justifies some of the new 4.0 functionality!
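
Assembled into one expression, the three XPath 3.1 steps above might look like the following sketch. It assumes the map prefix is bound to the standard http://www.w3.org/2005/xpath-functions/map namespace in the static context, and that the histogram is built as a map (via map:merge and map:entry) so that $freq?* and map:keys($freq) apply. Note that in XPath, unlike XQuery, several bindings in one let expression are separated by commas and followed by a single return.

```xpath
let $freq := map:merge(
               (: one entry per distinct speaker: name -> number of speeches :)
               for $n in distinct-values(//spkr/@name)
               return map:entry($n, count(//spkr[@name = $n]))),
    (: the largest count among all map values :)
    $max := max($freq?*)
(: keep every key whose count equals the maximum :)
return map:keys($freq)[$freq(.) = $max]
```

For the sample play, CAL has two speeches and LYS one, so this should select CAL. If several speakers tie for the highest count, all of them are returned.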
