character encoding - Why does fn:encode-for-uri('§') result in %C2%A7 rather than just %A7?

In Oxygen XML Editor 27.0, using the "XPath/XQuery Builder" (which, as far as I know, makes use of Saxon as XPath/XQuery processor), when I execute the XPath 2.0 query encode-for-uri('§'), I get %C2%A7 as a result. Where does the %C2 come from?

Encoding other "special characters" like $, (, | and so on, I get only the respective hexadecimal ASCII code (i.e. just one, not two - for instance: | => %7C).

Why is this different with §?

asked Mar 3 at 18:06 by Philipp Koch
  • The spec is w3./TR/xpath-functions-31/#func-encode-for-uri, which refers to ietf./rfc/rfc3986.txt, which I think says to first take the UTF-8 encoding (for § that is the two bytes C2 and A7). Does any other XPath 2 or 3 implementation give you a different result? Which result do you expect? – Martin Honnen, Mar 3 at 18:22
  • Thank you, @MartinHonnen - I wasn't aware! No, I have only seen this with the implementation described; I mentioned it in case it helped in finding an explanation. But you have already pointed out the answer. :) – Philipp Koch, Mar 3 at 18:27
1 Answer

From fn:encode-for-uri:


Like the fn:escape-html-uri and fn:iri-to-uri functions, this function replaces each special character with an escape sequence in the form %xx, where xx is two hexadecimal digits (in uppercase) that represent the character in UTF-8. For example, édition.html is changed to %C3%A9dition.html, with the é escaped as %C3%A9.

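Hence, § (U+00A7, Section Sign) is encoded as %C2%A7: it lies outside the ASCII range, so its UTF-8 form is the two bytes C2 A7, and each byte gets its own %xx escape. ASCII characters such as | occupy a single byte in UTF-8, which is why they produce a single escape (%7C).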
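
The same behaviour can be reproduced outside XPath. The following is a minimal Python sketch, not Saxon's implementation: `encode_for_uri` is a hypothetical helper name, and it assumes that `urllib.parse.quote` with `safe=""` is a close enough stand-in for fn:encode-for-uri (both leave only letters, digits, `-`, `_`, `.` and `~` unescaped). The Latin-1 call at the end shows where a bare %A7 would come from if a single-byte encoding were used instead.

```python
from urllib.parse import quote


def encode_for_uri(text: str) -> str:
    """Hypothetical stand-in for fn:encode-for-uri.

    Percent-encodes the UTF-8 bytes of every character except
    letters, digits and the unreserved marks - _ . ~
    """
    return quote(text, safe="", encoding="utf-8")


# The section sign is U+00A7; UTF-8 needs two bytes for it.
section_sign = "\u00a7"
utf8_bytes = [f"{b:02X}" for b in section_sign.encode("utf-8")]
print(utf8_bytes)                      # the individual UTF-8 bytes

# Each UTF-8 byte becomes its own %XX escape.
print(encode_for_uri(section_sign))

# ASCII characters are one byte in UTF-8, so one escape each.
print(encode_for_uri("|"), encode_for_uri("$"), encode_for_uri("("))

# The example from the spec text quoted above.
print(encode_for_uri("\u00e9dition.html"))

# For comparison only: a single-byte encoding such as Latin-1
# would give the bare %A7 the question expected.
print(quote(section_sign, safe="", encoding="latin-1"))
```

Under these assumptions the section sign comes out as two escapes and the ASCII characters as one each, matching what the question observed in Oxygen.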