
xml - Escaping negative lookbehind in regex inside XSLT w Saxon - Stack Overflow


I am trying to do string replacement in an XSLT 2.0 stylesheet. I am doing this via Saxon, in oXygen or in my own Java code. The string replacement I am trying to do is:

  • remove all open parentheses except when followed by a period or a hyphen
  • remove all close parentheses except when preceded by a period or a hyphen

This translates into two regular expressions, one with a negative lookahead and one with a negative lookbehind:

\((?![\.-])

(?<![\.-])\)
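For reference, both patterns can be exercised directly with java.util.regex, the engine that the answers below say Saxon switches to under the ";j" flag. This is a minimal sketch with hypothetical class and method names; it applies the lookahead pattern first and the lookbehind pattern second, and uses the sample input from the second answer.

```java
import java.util.regex.Pattern;

class ParenCleanup {
    // Matches "(" unless it is followed by "." or "-" (negative lookahead).
    static final Pattern OPENING = Pattern.compile("\\((?![\\.-])");
    // Matches ")" unless it is preceded by "." or "-" (negative lookbehind).
    static final Pattern CLOSING = Pattern.compile("(?<![\\.-])\\)");

    // Remove the unwanted parentheses: opening ones first, then closing ones.
    static String clean(String input) {
        String withoutOpening = OPENING.matcher(input).replaceAll("");
        return CLOSING.matcher(withoutOpening).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("(.(-((ab(cd(.ef(-gh(ij.)kl-)mn)op)"));
        // prints (.(-abcd(.ef(-ghij.)kl-)mnop
    }
}
```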

Now this needs to go into an <xsl:variable> statement:

<xsl:variable name="result" select=replace($input, '(?<![\.-])\)', '')/>

My question is: How do I escape the regex string so that it remains valid and the variable statement is valid XML?

I tried different combinations of &lt; and Unicode code points, but either the replacement does not happen or I get

[Saxon-PE 12.3] Syntax error at char 11 in regular expression: Unmatched close paren

or

The value of attribute "regex" associated with an element type "xsl:analyze-string" must not contain the '<' character.

The first error occurs during transformation, the second while checking the well-formedness of the XML.
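The two messages come from two separate stages: the XML parser rejects a raw < inside an attribute value before Saxon ever sees the expression, and only the decoded attribute value reaches the regex engine. A minimal sketch using the JDK's built-in DOM parser (standing in for Saxon's parser here; the class and method names are hypothetical) shows both sides of that boundary.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

class AttributeEscapeCheck {
    // Parse an XML snippet and return the root element's "select" attribute,
    // or null if the snippet is not well-formed XML.
    static String selectValue(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                          .getDocumentElement()
                          .getAttribute("select");
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Raw "<" inside the attribute value: rejected by the XML parser.
        String raw = "<xsl:variable name=\"result\" select=\"replace($input, '(?<![\\.-])\\)', '')\"/>";
        // "&lt;" inside the attribute value: well-formed, decoded back to "<" when parsed.
        String escaped = "<xsl:variable name=\"result\" select=\"replace($input, '(?&lt;![\\.-])\\)', '')\"/>";
        System.out.println(selectValue(raw));     // null
        System.out.println(selectValue(escaped)); // replace($input, '(?<![\.-])\)', '')
    }
}
```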

What is the right way to do escaping in this string?

asked Jan 29 at 20:24 by Bernd Moos
  • The < should have been &lt; (Bernd Moos, commented Jan 29 at 20:25)

2 Answers


Firstly, the XPath regular expression dialect does not support look-ahead and look-behind. A workaround in Saxon is to invoke the Java regex engine rather than the XPath regex engine, which you can do by setting the flags argument to ";j".

Secondly, a < appearing anywhere in XML content (including a regular expression in an attribute value) must be written as &lt;.
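When the stylesheet text is generated from Java code (the question mentions running Saxon from Java), this escaping can be done mechanically. A hypothetical helper, assuming the attribute value is delimited by double quotes; "&" is escaped first so that the entities produced for "<" and '"' are not escaped a second time.

```java
class XmlAttributeEscaper {
    // Escape the characters that may not appear literally in a
    // double-quoted XML attribute value. Order matters: "&" goes first.
    static String escapeAttribute(String raw) {
        return raw.replace("&", "&amp;")
                  .replace("<", "&lt;")
                  .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String regex = "(?<![\\.-])\\)";
        System.out.println("select=\"replace($input, '" + escapeAttribute(regex) + "', '', ';j')\"");
        // prints select="replace($input, '(?&lt;![\.-])\)', '', ';j')"
    }
}
```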

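Thirdly, make sure the escaped \) sits outside the lookbehind group, after its closing ). Written as (?<![\.-]\)), the whole pattern becomes a zero-width assertion that never consumes a ) character.

So

<xsl:variable name="result" select=replace($input, '(?<![\.-])\)', '')/>

should be

<xsl:variable name="result" select="replace($input, '(?&lt;![\.-])\)', '', ';j')"/>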
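Where the escaped \) sits relative to the lookbehind group changes what the pattern matches. A quick java.util.regex check (hypothetical class name) compares the two placements on the input "ab)": with \) outside the group the match is the ")" itself, while with \) inside the group the first match is an empty string at position 0, so a replace with '' removes nothing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindPlacement {
    // Returns the text of the first match, or null if there is no match.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // "\)" outside the lookbehind: consumes a literal ")" not preceded by "." or "-".
        System.out.println("[" + firstMatch("(?<![\\.-])\\)", "ab)") + "]"); // [)]
        // "\)" inside the lookbehind: the pattern is only a zero-width assertion.
        System.out.println("[" + firstMatch("(?<![\\.-]\\))", "ab)") + "]"); // []
    }
}
```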

Saxon 12 is an XSLT 3 processor, not an XSLT 2 one. As for negative lookahead and lookbehind, I don't think they are part of the regular expression syntax that XPath/XSLT supports. However, with Saxon Java, you can use the flag ';j' to switch to Java regular expressions.

I tried

  <xsl:param name="opening-parentesis" as="xs:string" expand-text="no"><![CDATA[\((?![\.-])]]></xsl:param>

with e.g.

<p>(.(-((ab(cd(.ef(-gh(</p>

and code like

  <xsl:template match="p/text()">
    <xsl:value-of select=". => replace($opening-parentesis, '', ';j')"/>
  </xsl:template>

and it outputs

<p>(.(-abcd(.ef(-gh</p>

Putting the regular expression into a parameter as a CDATA section is usually easier than having to escape it inside an XML attribute.
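The CDATA approach works because the XML parser passes the section's content through untouched, so the < needs no escaping. A minimal check with the JDK's DOM parser (used here as a stand-in for Saxon's parser, which is an assumption; the class name is hypothetical) reads back the closing-parenthesis parameter exactly as written.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

class CdataParamCheck {
    // Parse the snippet and return the text content of its root element.
    static String paramText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                          .getDocumentElement()
                          .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse snippet", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<xsl:param name=\"closing-parentesis\" as=\"xs:string\" expand-text=\"no\">"
                   + "<![CDATA[(?<![\\.-])\\)]]></xsl:param>";
        System.out.println(paramText(xml)); // (?<![\.-])\)
    }
}
```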

Extending this to both regular expressions, I get

  <xsl:param name="opening-parentesis" as="xs:string" expand-text="no"><![CDATA[\((?![\.-])]]></xsl:param>
  <xsl:param name="closing-parentesis" as="xs:string" expand-text="no"><![CDATA[(?<![\.-])\)]]></xsl:param>
  
  <xsl:template match="p/text()">
    <xsl:value-of select=". => replace($opening-parentesis, '', ';j') => replace($closing-parentesis, '', ';j')"/>
  </xsl:template>

and

<p>(.(-((ab(cd(.ef(-gh(ij.)kl-)mn)op)</p>

is transformed into

<p>(.(-abcd(.ef(-ghij.)kl-)mnop</p>
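The chained expression applies the opening-parenthesis pattern first and then runs the closing one on the result, so the second pass sees the output of the first. For the sample input above both orders give the same result, but the order can matter when a removed parenthesis sits next to one being tested. A small java.util.regex sketch (plain Java rather than Saxon; the class name is hypothetical) illustrates this with the input ".()".

```java
import java.util.regex.Pattern;

class ReplaceOrder {
    static final Pattern OPENING = Pattern.compile("\\((?![\\.-])");
    static final Pattern CLOSING = Pattern.compile("(?<![\\.-])\\)");

    // Same order as the chained XPath expression: opening pattern, then closing.
    static String openingThenClosing(String s) {
        return CLOSING.matcher(OPENING.matcher(s).replaceAll("")).replaceAll("");
    }

    // Reversed order, for comparison.
    static String closingThenOpening(String s) {
        return OPENING.matcher(CLOSING.matcher(s).replaceAll("")).replaceAll("");
    }

    public static void main(String[] args) {
        String input = ".()";
        System.out.println(openingThenClosing(input)); // .)
        System.out.println(closingThenOpening(input)); // .
    }
}
```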