I have an XML file of a play, from which this extract will serve:
<play>
<speech>
<spkr name="CAL">CAL.</spkr>Et tu mecastor salve, Lysistrata. Sed quid conturbata es?
expe frontem, carissima: non enim te decent contracta supercilia.</speech>
<speech>
<spkr name="LYS">LYS.</spkr>Sed, ô Calonice, uritur mihi cor, et valde me piget sexus
nostri, quoniam viri existimant<endnote orig="transcriber" n="1"/> nos esse nequam.</speech>
<speech>
<spkr name="CAL">CAL.</spkr>Quippe tales pol sumus.</speech>
</play>
I'm trying to find an XPath (3.1, run in oXygen) solution to the question:
Which character speaks most frequently in this play? (That is, for this sample, it should return the attribute value CAL.)
I've tried various ways of combining distinct-values(), count(), and max(), and have worked through the Stack Overflow posts about saxon:highest(), but I can't get it to work when the numbers I want the max() of are the counts of how often each value from distinct-values() occurs among the attribute values.
I could find an XQuery answer, where I run a for-loop and order it by the count and then tell it to return only the first one on the list, but surely there must be a reasonably elegant XPath answer. This would also enable me to transfer the answer to XSLT when needed.
Asked Feb 17 at 16:22 by haggis78; edited Feb 17 at 16:52 by Yitzhak Khabinsky.
Comments:
- While asking a question, you need to provide a minimal reproducible example. Please edit your original question and provide the following: (1) a well-formed XML file sample with all relevant namespaces; (2) what you need to do, i.e. the logic, and your code attempt at implementing it; (3) the desired output based on the sample data in #1 above. – Yitzhak Khabinsky, Feb 17 at 16:37
- stackoverflow/questions/76543312/… – Yitzhak Khabinsky Commented Feb 17 at 16:50
2 Answers
With pure XPath 3.1 (example Saxon 12 HE fiddle):
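map:merge(//spkr ! map:entry(string(@name), .), map { 'duplicates' : 'combine' })  (: group by name :)
=> map:for-each(function($k, $v) { map:entry($k, -count($v)) })  (: negated count per speaker :)
=> sort((), function($e) { $e?* })  (: ascending sort puts the largest count first :)
=> head()
=> map:keys()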
With XPath 4 (available with Saxon 12 EE or PE in oXygen, though I'm not sure how to force XPath version 4 in the settings; there is also a BaseX fiddle):
map:merge(//spkr ! map:entry(string(@name), .), map { 'duplicates' : 'combine' })
=> map:for-each(function($k, $v) { map:entry($k, count($v)) })
=> highest((), function($e) { $e?* })
=> map:keys()
Grouping queries are generally easier in XSLT or XQuery rather than in XPath. But in 4.0 you can build a map of speakers / number of speeches with
let $freq := map:build(//spkr, fn{@name}, fn{1}, op('+'))
and then get the highest with
return highest(map:pairs($freq), (), fn{?value})?key
Not tested. And yes, I know, you wanted an XPath 3.1 solution, presumably without using any Saxon extensions. In 3.1 we can build the histogram with
let $freq := map:merge(for $n in distinct-values(//spkr/@name)
                       return map:entry($n, count(//spkr[@name = $n])))
then we can find the highest count with
let $max := max($freq?*)
and then we can find the name having that count with
return map:keys($freq)[map:get($freq, .) = $max]
Not pretty, but should work. Certainly justifies some of the new 4.0 functionality!
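Put together as a single expression, the XPath 3.1 steps above might look like the following sketch. It assumes that the map prefix is already bound to the standard map function namespace (as it apparently is in the fiddle from the other answer) and that the expression is evaluated with the document as the context item. Note that plain XPath joins the two bindings with a comma; writing a second let directly after the first is XQuery-only syntax.

```xpath
(: Sketch only: assumes the "map" prefix is bound and the context item is the play document. :)
let $freq := map:merge(
               (: one entry per distinct speaker: name -> number of speeches :)
               for $n in distinct-values(//spkr/@name)
               return map:entry($n, count(//spkr[@name = $n]))
             ),
    $max  := max($freq?*)   (: the highest speech count :)
return map:keys($freq)[$freq(.) = $max]   (: every name that reaches that count :)
```

For the sample in the question this selects CAL (two speeches against one for LYS). Unlike the sort-and-head() approach, which keeps exactly one speaker, this version returns every speaker tied for the highest count.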