javascript - Regex To Match &entity; or &#0-9; And Capture & - Stack Overflow

I'm trying to do a replace on the following string prototype: "I&lsquo;m singing & dancing in the rain." The following regular expression finds the entity, but it also captures the character that follows the &: "(&)[#?a-zA-Z0-9;]" captures "&l" from the prototype above.

How can I limit it to only capture the &?

Edit: I should add that I don't want to match "&" by itself.

Asked Nov 19, 2009 at 16:12 by sholsinger; edited Nov 19, 2009 at 16:22.

5 Answers

Look for (this copes with named, decimal and hexadecimal entities):

&([A-Za-z]+|#x[\dA-Fa-f]+|#\d+);

Replace with

&$1;
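A quick way to see what that pattern accepts is to run it over a sample string. This is a minimal illustration, not part of the original answer: the sample text, the `entityBodies` helper and the `[$1]` marker are hypothetical. It shows that the capture group holds the part between the & and the ;, and that `$1` in a replacement string refers to that group.

```javascript
// Pattern from the answer: named, hexadecimal and decimal references.
const ENTITY = /&([A-Za-z]+|#x[\dA-Fa-f]+|#\d+);/g;

// Hypothetical sample text; the bare "&" before "in" is deliberately included.
const entitySample = "I&lsquo;m singing &amp; dancing &#8216; &#x2018; & in the rain.";

// Hypothetical helper: collect capture group 1 of every match.
// matchAll works on a copy of the regex, so ENTITY's lastIndex is left alone.
function entityBodies(text) {
  return Array.from(text.matchAll(ENTITY), (m) => m[1]);
}

console.log(entityBodies(entitySample));
// [ 'lsquo', 'amp', '#8216', '#x2018' ]

// In a replacement string, $1 is the captured group; "[$1]" is only a visible marker.
console.log(entitySample.replace(ENTITY, "[$1]"));
// I[lsquo]m singing [amp] dancing [#8216] [#x2018] & in the rain.
```

The bare & is left alone because it is not followed by a name and a semicolon.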

Be warned: this has a real chance of going wrong. I recommend using an HTML parser to decode the text instead; you can decode it twice if it was double-encoded. HTML and regex don't play well together, even on a small scale.

Since you are in JavaScript, I expect you are in a browser. If you are, you have a good DOM parser at hand: create a new element, assign the string to its innerHTML property and read out its text content. Done.
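A browser-only sketch of that idea follows. It swaps in `DOMParser` for the "create an element and set its innerHTML" step: a document produced by `DOMParser` is inert (scripts do not run and images are not fetched), whereas assigning untrusted markup to the innerHTML of an element owned by the page can trigger handlers such as `onerror`. The function name `decodeHtmlEntities` is hypothetical, and outside a browser (for example in Node.js) `DOMParser` is not defined, so the sketch throws there.

```javascript
// Browser-only: relies on the DOMParser global that web browsers provide.
function decodeHtmlEntities(text) {
  if (typeof DOMParser === "undefined") {
    throw new Error("DOMParser is not available; this sketch needs a browser");
  }
  // Parse as HTML in a separate, inert document, then read back the plain text.
  const doc = new DOMParser().parseFromString(text, "text/html");
  return doc.documentElement.textContent;
}

// In a browser:
// decodeHtmlEntities("I&lsquo;m singing &amp; dancing") -> "I‘m singing & dancing"
```

Calling it twice decodes double-encoded text. Markup in the input is parsed rather than preserved, so `<b>x</b>` comes back as `x`.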

I gather that you want to match &, but only if it is followed by an alphanumeric character or certain punctuation. That calls for lookahead. This regular expression should match what you want without capturing or consuming any additional characters.

(&)(?=[#?a-zA-Z0-9;])
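A minimal sketch of the lookahead in use, assuming a sample string with two entities and one bare ampersand (the sample and the `&amp;` replacement are hypothetical, chosen only to make the effect visible). Each match is exactly the single & character, so a replacement touches nothing after it.

```javascript
// The answer's pattern: the & is captured, and (?=...) only checks the next
// character without consuming it.
const AMP_LOOKAHEAD = /(&)(?=[#?a-zA-Z0-9;])/g;

// Hypothetical sample text with two entities and one bare ampersand.
const lookaheadSample = "I&lsquo;m singing &amp; dancing & in the rain.";

// Every match is exactly the single character "&".
const found = Array.from(lookaheadSample.matchAll(AMP_LOOKAHEAD), (m) => m[0]);
console.log(found); // [ '&', '&' ]

// Hypothetical replacement: escape only the ampersands that pass the lookahead.
console.log(lookaheadSample.replace(AMP_LOOKAHEAD, "&amp;"));
// I&amp;lsquo;m singing &amp;amp; dancing & in the rain.
```

Because the lookahead inspects just one character, it is not a full entity check: the & in a string such as `AT&T` matches too.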

Actually, you're matching the string "&l", but only the & is captured. The character class after the capture group matches one additional character, so that character is part of the overall match even though it is not part of the group.
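A minimal illustration of the difference between the overall match and the capture group, using `RegExp.prototype.exec` on a string like the question's prototype (the variable names are hypothetical):

```javascript
// The question's pattern: one captured "&" plus one more character from the class.
const QUESTION_PATTERN = /(&)[#?a-zA-Z0-9;]/;

const questionMatch = QUESTION_PATTERN.exec("I&lsquo;m singing & dancing in the rain.");
console.log(questionMatch[0]);    // &l  (whole match: the & plus the next character)
console.log(questionMatch[1]);    // &   (group 1 holds only the ampersand)
console.log(questionMatch.index); // 1   (position of the & in the string)
```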

But your original regex is a little flawed to begin with anyway. A (not optimal) replacement might be:

&(#[0-9]+|#x[0-9a-zA-Z]+|[a-zA-Z]+);

which will match the complete entity or character reference; the group captures the part between the & and the ; (for example lsquo or #8216).
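A small sketch that runs this pattern over a few inputs; the `inspectEntity` helper and the inputs are hypothetical. The last two inputs show the trade-off behind "not optimal": `#x[0-9a-zA-Z]+` also accepts non-hex letters such as `&#xZZ;`, while a bare & followed by a space is not matched at all.

```javascript
// The answer's pattern: decimal, "hex-like" and named references.
const ENTITY_LOOSE = /&(#[0-9]+|#x[0-9a-zA-Z]+|[a-zA-Z]+);/;

// Hypothetical helper: return [whole match, group 1], or null if nothing matches.
function inspectEntity(text) {
  const match = ENTITY_LOOSE.exec(text);
  return match ? [match[0], match[1]] : null;
}

for (const input of ["&lsquo;", "&#8216;", "&#x2018;", "&#xZZ;", "rock & roll"]) {
  console.log(input, "->", inspectEntity(input));
}
```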

If you only want to match &, why did you include the character class [#?a-zA-Z0-9;] as well?

In English, your expression reads: "Match & followed by one character that is #, ?, a lowercase letter, an uppercase letter, a digit or ;".

Just use (&)
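For comparison with the question's edit (a bare & should not match), here is a sketch showing that `(&)` on its own matches every ampersand, the bare one included; the sample string is hypothetical.

```javascript
// The suggested pattern: just the ampersand, captured.
const JUST_AMP = /(&)/g;

// Hypothetical sample text: two entities and one bare ampersand.
const ampSample = "I&lsquo;m singing &amp; dancing & in the rain.";

// Every "&" matches, including the bare one before "in".
const ampCount = (ampSample.match(JUST_AMP) || []).length;
console.log(ampCount); // 3
```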

You probably meant:

"&([#a-zA-Z0-9]+;)"
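A short sketch of this pattern, built with `new RegExp` from the string form given above; the sample text is hypothetical. Unlike the original attempt, the group here captures what follows the & (including the ;), not the & itself.

```javascript
// The suggested pattern, built from the string form given in the answer.
const ENTITY_WITH_SEMI = new RegExp("&([#a-zA-Z0-9]+;)", "g");

// Hypothetical sample text: two entities and one bare ampersand.
const semiSample = "I&lsquo;m singing &amp; dancing & in the rain.";

// Group 1 holds the name together with its trailing semicolon.
const captured = Array.from(semiSample.matchAll(ENTITY_WITH_SEMI), (m) => m[1]);
console.log(captured); // [ 'lsquo;', 'amp;' ]
```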