I am trying to do string replacement in an XSLT 2.0 stylesheet. I am doing this via Saxon, in oXygen or in my own Java code. The string replacement I am trying to do is:
- remove all open parentheses except when followed by a period or a hyphen
- remove all close parentheses except when preceded by a period or a hyphen
This translates into regular expressions with negative lookahead and negative lookbehind:
\((?![\.-])
(?<![\.-])\)
Now this needs to go into an <xsl:variable> statement:
<xsl:variable name="result" select=replace($input, '(?<![\.-])\)', '')/>
My question is: How do I escape the regex string so that it remains valid and the variable statement is valid XML?
I tried different combinations of &lt; and Unicode codepoints, but either the replacement does not happen or I get
[Saxon-PE 12.3] Syntax error at char 11 in regular expression: Unmatched close paren
or
The value of attribute "regex" associated with an element type "xsl:analyze-string" must not contain the '<' character.
The first error occurs during transformation, the second while checking the well-formedness of the XML.
What is the right way to do escaping in this string?
asked Jan 29 at 20:24 by Bernd Moos

2 Answers
Firstly, the XPath regular expression dialect does not support look-ahead and look-behind. A workaround in Saxon is to invoke the Java regex engine rather than the XPath regex engine, which you can do by setting the flags argument to ";j".
Secondly, a < appearing in an XML attribute value or in element text (including inside a regular expression) must be written &lt;.
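Thirdly, the parentheses in your regex are fine as written: the unescaped ) closes the lookbehind group and \) matches the literal closing parenthesis. The value of the select attribute does need to be quoted, though.
So
<xsl:variable name="result" select=replace($input, '(?<![\.-])\)', '')/>
should be
<xsl:variable name="result" select="replace($input, '(?&lt;![\.-])\)', '', ';j')"/>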
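To see why writing &lt; does not change the regex, the following minimal Java sketch parses an element with the escaped attribute and prints the value the XML parser hands on. The class name AttributeEscapeDemo and the bare <variable> element are hypothetical stand-ins, not part of the stylesheet; only the JDK's built-in DOM parser is assumed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: shows what an XML parser reports for an escaped attribute.
class AttributeEscapeDemo {
    /** Parses the XML and returns the root element's "select" attribute as the parser reports it. */
    static String selectAttribute(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("select");
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the xsl:variable element; the "<" of the lookbehind is written as &lt;
        String xml = "<variable select=\"replace($input, '(?&lt;![\\.-])\\)', '', ';j')\"/>";
        System.out.println(selectAttribute(xml));
        // prints: replace($input, '(?<![\.-])\)', '', ';j')
    }
}
```

The parser resolves the entity before the attribute value reaches the XPath processor, so the regex engine receives the pattern with a literal <.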
Saxon 12 is an XSLT 3 processor, not an XSLT 2 one. As for negative lookahead and lookbehind, I don't think they are part of the regular expression syntax that XPath/XSLT supports. However, with Saxon Java, you can use the flag ';j' to switch to Java regular expressions.
I tried
<xsl:param name="opening-parentesis" as="xs:string" expand-text="no"><![CDATA[\((?![\.-])]]></xsl:param>
with e.g.
<p>(.(-((ab(cd(.ef(-gh(</p>
and code like
<xsl:template match="p/text()">
<xsl:value-of select=". => replace($opening-parentesis, '', ';j')"/>
</xsl:template>
and it outputs
<p>(.(-abcd(.ef(-gh</p>
Using a CDATA section in a parameter is usually easier than having to escape your regular expression inside an XML attribute.
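As a rough illustration of why CDATA avoids the escaping, this Java sketch parses an element whose content is a CDATA section and prints the text the parser reports. The class name CdataDemo and the bare <param> element are hypothetical stand-ins for the xsl:param above; only the JDK's built-in DOM parser is assumed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class: shows that CDATA content reaches the application unchanged.
class CdataDemo {
    /** Parses the XML and returns the text content of the root element. */
    static String textContent(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // The lookbehind regex is written with a raw "<" inside the CDATA section.
        String xml = "<param><![CDATA[(?<![\\.-])\\)]]></param>";
        System.out.println(textContent(xml));
        // prints: (?<![\.-])\)
    }
}
```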
Expanded to both regular expressions I get
<xsl:param name="opening-parentesis" as="xs:string" expand-text="no"><![CDATA[\((?![\.-])]]></xsl:param>
<xsl:param name="closing-parentesis" as="xs:string" expand-text="no"><![CDATA[(?<![\.-])\)]]></xsl:param>
<xsl:template match="p/text()">
<xsl:value-of select=". => replace($opening-parentesis, '', ';j') => replace($closing-parentesis, '', ';j')"/>
</xsl:template>
and
<p>(.(-((ab(cd(.ef(-gh(ij.)kl-)mn)op)</p>
is transformed into
<p>(.(-abcd(.ef(-ghij.)kl-)mnop</p>
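Since the ';j' flag hands the patterns to the Java regex engine, the same two replacements can be checked directly with java.util.regex. This is only a sketch of the equivalent logic, not what Saxon does internally; the class name ParenthesisFilter and the method strip are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper mirroring the two chained replace() calls of the stylesheet.
class ParenthesisFilter {
    // "(" that is not followed by "." or "-"
    private static final Pattern OPENING = Pattern.compile("\\((?![\\.-])");
    // ")" that is not preceded by "." or "-"
    private static final Pattern CLOSING = Pattern.compile("(?<![\\.-])\\)");

    /** Removes the unwanted opening parentheses first, then the unwanted closing ones. */
    static String strip(String input) {
        String withoutOpening = OPENING.matcher(input).replaceAll("");
        return CLOSING.matcher(withoutOpening).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("(.(-((ab(cd(.ef(-gh(ij.)kl-)mn)op)"));
        // prints: (.(-abcd(.ef(-ghij.)kl-)mnop
    }
}
```

The result matches the output of the transformation above for the same sample text.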
&lt;
– Bernd Moos Commented Jan 29 at 20:25