javascript - Regex To Match &amp;entity; or &amp;#0-9; And Capture &amp;

I'm trying to do a replace on the following string prototype: "I&lsquo;m singing & dancing in the rain." The following regular expression matches the instance properly, but also captures the character following the instance of &amp. "(&)[#?a-zA-Z0-9;]" captures the following string from the above prototype: "&l".

How can I limit it to only capture the &?

Edit: I should add that I don't want to match "&" by itself.

Asked Nov 19, 2009 at 16:12 by sholsinger; edited Nov 19, 2009 at 16:22.

5 Answers

Look for (this copes with named, decimal, and hexadecimal entities):

&amp;([A-Za-z]+|#x[\dA-Fa-f]+|#\d+);

replace with

&$1;
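As a rough sketch of how that search-and-replace pair could be applied in JavaScript: the function name `undoDoubleEncoding` and the sample input are illustrative assumptions, not part of the original answer.

```javascript
// Matches "&amp;" followed by a named, hexadecimal, or decimal entity body and ";".
const DOUBLE_ENCODED_ENTITY = /&amp;([A-Za-z]+|#x[\dA-Fa-f]+|#\d+);/g;

// Hypothetical helper: strips one level of encoding from entity references only.
function undoDoubleEncoding(text) {
  // "$1" is the captured entity body, so "&amp;lsquo;" becomes "&lsquo;".
  return text.replace(DOUBLE_ENCODED_ENTITY, "&$1;");
}

const fixedOnce = undoDoubleEncoding(
  "I&amp;lsquo;m singing &amp;amp; dancing in the rain."
);
console.log(fixedOnce); // "I&lsquo;m singing &amp; dancing in the rain."

// An "&amp;" that is not the start of a double-encoded entity is left alone.
const untouched = undoDoubleEncoding("salt &amp; pepper");
console.log(untouched); // "salt &amp; pepper"
```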

Be warned: there is a real chance of this going wrong. I recommend using an HTML parser to decode the text instead. You can decode it twice if it was double-encoded. HTML and regex don't play well together, even on a small scale.

Since you are in JavaScript, I expect you are in a browser. If you are, you have a nice DOM parser at hand. Create a new element, assign the string to its innerHTML property, and read out the text value. Done.
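A minimal sketch of that DOM approach, assuming a browser environment. The helper names `decodeHtmlEntities` and `decodeTwice` are made up for illustration, and a textarea is used as the scratch element (a choice made here, not something the answer specifies) so that any markup in the input is treated as text rather than parsed into elements.

```javascript
// Hypothetical helper: lets the browser's HTML parser decode entity references.
// "doc" defaults to the page's document; it is a parameter only so the function
// can be exercised with another DOM implementation.
function decodeHtmlEntities(encoded, doc = globalThis.document) {
  if (!doc) {
    throw new Error("decodeHtmlEntities needs a DOM document");
  }
  const scratch = doc.createElement("textarea");
  scratch.innerHTML = encoded; // the parser decodes entities here
  return scratch.value;        // read the decoded text back out
}

// Hypothetical helper for double-encoded input: decode, then decode again.
function decodeTwice(encoded, doc) {
  return decodeHtmlEntities(decodeHtmlEntities(encoded, doc), doc);
}

// In a browser console:
//   decodeHtmlEntities("I&lsquo;m singing &amp; dancing")  // "I‘m singing & dancing"
```

Outside a browser there is no global document, so the function throws unless a DOM implementation is passed in explicitly.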

I gather that you want to match &, but only if it is followed by an alphanumeric character or certain punctuation. That calls for lookahead. This regular expression should match what you want without capturing or consuming any additional characters.

(&)(?=[#?a-zA-Z0-9;])
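To illustrate the lookahead, here is a small sketch using the string from the question. The "[AMP]" replacement text is only a visible placeholder chosen for this example.

```javascript
const lookaheadSample = "I&lsquo;m singing & dancing in the rain.";

// The lookahead (?=...) checks the next character without consuming it.
const ampBeforeEntityChar = /(&)(?=[#?a-zA-Z0-9;])/g;

// Every match is exactly "&"; the character after it is not part of the match.
const lookaheadMatches = [...lookaheadSample.matchAll(ampBeforeEntityChar)].map(
  (m) => m[0]
);
console.log(lookaheadMatches); // [ "&" ]

// Only the "&" in front of "lsquo;" is replaced; the "&" followed by a space is skipped.
const lookaheadReplaced = lookaheadSample.replace(ampBeforeEntityChar, "[AMP]");
console.log(lookaheadReplaced); // "I[AMP]lsquo;m singing & dancing in the rain."
```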

Actually, you're matching the string &l, but only the & is captured. This is because the character class after the capture group matches one additional character.
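The difference between the overall match and the captured group can be seen with a short sketch. The variable names and the "[AMP]" placeholder are illustrative only.

```javascript
const captureSample = "I&lsquo;m singing & dancing in the rain.";
const captureResult = /(&)[#?a-zA-Z0-9;]/.exec(captureSample);

console.log(captureResult[0]); // "&l"  (the whole match, including the class character)
console.log(captureResult[1]); // "&"   (group 1 only)

// replace() swaps out the whole match, so the "l" is lost along with the "&".
const lostLetter = captureSample.replace(/(&)[#?a-zA-Z0-9;]/g, "[AMP]");
console.log(lostLetter); // "I[AMP]squo;m singing & dancing in the rain."

// One way around it without lookahead: capture the extra character and put it back.
const keptLetter = captureSample.replace(/(&)([#?a-zA-Z0-9;])/g, "[AMP]$2");
console.log(keptLetter); // "I[AMP]lsquo;m singing & dancing in the rain."
```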

But your original regex is a little flawed to begin with anyway. A (not optimal) replacement might be:

&amp;(#[0-9]+|#x[0-9a-zA-Z]+|[a-zA-Z]+);

which will match the complete entity or character reference. As written here, the group captures the entity name or number that follows the ampersand, not the ampersand itself.

If you only want to match &, why did you include the character class [#?a-zA-Z0-9;] as well?

In English, your expression would be "Match & followed by a character that is #, ?, a lowercase letter, an uppercase letter, a digit, or ;".

Just use (&)

You probably meant:

"&amp;([#a-zA-Z0-9]+;)"
