Referencing nested groups in JavaScript using string replace using regex - Stack Overflow

Because of the way that jQuery deals with script tags, I've found it necessary to do some HTML manipulation using regular expressions (yes, I know... not the ideal tool for the job). Unfortunately, it seems like my understanding of how captured groups work in JavaScript is flawed, because when I try this:

var scriptTagFormat = /<script .*?(src="(.*?)")?.*?>(.*?)<\/script>/ig;

html = html.replace(
    scriptTagFormat, 
    '<span class="script-placeholder" style="display:none;" title="$2">$3</span>');

The script tags get replaced with the spans, but the resulting title attribute is blank. Shouldn't $2 match the content of the src attribute of a script tag?

Asked May 5, 2011 at 20:08 by Jacob; edited Nov 24, 2017 at 13:14 by ekad
5 Answers

Nesting of groups is irrelevant; their numbering is determined strictly by the positions of their opening parentheses within the regex. In your case, that means it's group #1 that captures the whole src="value" sequence, and group #2 that captures just the value part.
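The numbering rule can be checked in isolation. The following is a minimal sketch using only the src part of the pattern; the sample tag and the variable names are made up for illustration.

```javascript
// Groups are numbered by the position of their opening parenthesis,
// counted left to right, regardless of nesting.
const attr = /(src="(.*?)")/;
const m = attr.exec('<script src="app.js"></script>');

console.log(m[1]); // src="app.js"  (outer group, its parenthesis opens first)
console.log(m[2]); // app.js        (inner group, its parenthesis opens second)
```

So the numbering in the question is what the asker expected; the blank title has to come from somewhere else.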

Try this:

/<script (?:(?!src).)*(?:src="(.*?)")?.*?>(.*?)<\/script>/ig

See here: rubular

As stema wrote, the .*? matches too much. With the negative lookahead (?:(?!src).)* you will match only until a src attribute.
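To see what the lookahead part does on its own, here is a small sketch that applies just (?:(?!src).)* to an attribute string. The sample input and the names upToSrc and prefix are illustrative assumptions, not part of the answer.

```javascript
// Consume one character at a time, but only while "src" does not
// start at the current position.
const upToSrc = /^(?:(?!src).)*/;
const prefix = upToSrc.exec('type="text/javascript" src="a.js"')[0];

// prefix is everything before the src attribute, including the trailing space:
// 'type="text/javascript" '
console.log(JSON.stringify(prefix));
```

Because the match stops right before src, the optional src group that follows gets a chance to match instead of being skipped.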

But actually in this case you could also just move the .*? into the optional part:

/<script (?:.*?src="(.*?)")?.*?>(.*?)<\/script>/ig

See here: rubular
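As a quick check, the second pattern can be dropped into the replace call from the question. This is a sketch with a made-up input string; note that the src value is now in $1 and the script body in $2, since the outer group is non-capturing.

```javascript
const scriptTagFormat = /<script (?:.*?src="(.*?)")?.*?>(.*?)<\/script>/ig;

// Sample input where src is not the first attribute.
const html = '<script type="text/javascript" src="a.js">var x = 1;</script>';

const result = html.replace(
    scriptTagFormat,
    '<span class="script-placeholder" style="display:none;" title="$1">$2</span>');

console.log(result);
// <span class="script-placeholder" style="display:none;" title="a.js">var x = 1;</span>
```

One thing to watch, as far as I can tell: .*? inside the optional group can run past the end of a tag that has no src and pick up the src of a later tag on the same line, so it may be worth testing with several script tags in one string.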

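The .*? matches too much because the group that follows it is optional: the regex can succeed with that group matching nothing, and your src is then consumed by one of the surrounding .*? instead. If you remove the ? after your first group, it works (for tags that do have a src attribute).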
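The effect is easy to reproduce with the pattern from the question. In this sketch (sample tags and variable names are made up), the only difference between the two inputs is whether src comes directly after "<script ".

```javascript
// The pattern from the question, without the g flag so exec() is stateless.
const original = /<script .*?(src="(.*?)")?.*?>(.*?)<\/script>/i;

// src is not the first attribute: the lazy .*? matches nothing, the optional
// group fails at "type" and is skipped, and the second .*? swallows the src.
const late = original.exec('<script type="text/javascript" src="a.js"></script>');
console.log(late[2]); // undefined

// src comes first: the optional group gets to match before anything else does.
const first = original.exec('<script src="a.js" type="text/javascript"></script>');
console.log(first[2]); // a.js
```

In a replacement string, a group that did not participate is substituted as an empty string, which matches the blank title described in the question.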

Update: As @morja pointed out, the solution is to move the first .*? into the optional src part.

Just for completeness: /<script (?:.*?(src="(.*?)"))?.*?>(.*?)<\/script>/ig

You can see it here on rubular (corrected my link also)

If you don't want to use the content of the first capturing group, make it a non-capturing group using (?:...)

/<script (?:.*?(?:src="(.*?)"))?.*?>(.*?)<\/script>/ig

Then the values you want are in $1 and $2.

Could you post the html you are retrieving? Your code works fine in a simple example: jsfiddle (warning: alert box)

My first guess is that one of your script tags does not have a src, meaning only one capture group (the script contents) actually takes part in the match.
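If tags without a src do occur, a replacement function makes the missing group explicit, because it receives undefined for a group that did not take part in the match. The sketch below is one possible approach, not the asker's code: the helper name toPlaceholder is hypothetical, and the pattern is the two-group variant with the .*? moved into the optional part.

```javascript
const scriptTag = /<script (?:.*?src="(.*?)")?.*?>(.*?)<\/script>/ig;

// Hypothetical helper: emits a title attribute only when a src was captured.
function toPlaceholder(html) {
  return html.replace(scriptTag, (match, src, body) => {
    const title = src === undefined ? '' : ` title="${src}"`;
    return `<span class="script-placeholder" style="display:none;"${title}>${body}</span>`;
  });
}

console.log(toPlaceholder('<script type="text/javascript" src="a.js"></script>'));
// <span class="script-placeholder" style="display:none;" title="a.js"></span>

console.log(toPlaceholder('<script type="text/javascript">alert(1);</script>'));
// <span class="script-placeholder" style="display:none;">alert(1);</span>
```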

I'm thinking that regular expressions by themselves can't do exactly what I'm looking for, so here's my modification to work around the problem:

var scriptTagFormat = /<script\s+((.*?)="(.*?)")*\s*>(.*?)<\/script>/ig;

html = html.replace(
    scriptTagFormat, 
    '<span class="script-placeholder" style="display:none;" $1>$4</span>');

Before, I wanted to avoid setting non-standard attributes on the replacement span. This code blindly copies all attributes instead. Luckily, the non-standard attributes aren't stripped out of the DOM when I insert the HTML, so it will work for my purposes.
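One caveat that may be worth checking: in JavaScript, a capture group inside a repetition keeps only its last iteration, so $1 in the pattern above appears to carry just the final attribute rather than all of them. The sketch below shows this on a made-up tag and then tries an alternative that captures the whole attribute string in a single group. The pattern named whole is an assumption of mine, not part of the answer, and it assumes no attribute value contains a > character.

```javascript
const tag = '<script type="text/javascript" src="a.js">run();</script>';

// Repeated group: group 1 holds only the last attribute that matched.
const repeated = /<script\s+((.*?)="(.*?)")*\s*>(.*?)<\/script>/i;
const lastOnly = repeated.exec(tag)[1];
console.log(JSON.stringify(lastOnly)); // " src=\"a.js\""

// Alternative sketch: capture everything between "<script" and ">" at once.
// [\s\S]*? also lets the script body span several lines.
const whole = /<script\b([^>]*)>([\s\S]*?)<\/script>/ig;
const out = tag.replace(
    whole,
    '<span class="script-placeholder" style="display:none;"$1>$2</span>');

console.log(out);
// <span class="script-placeholder" style="display:none;" type="text/javascript" src="a.js">run();</span>
```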
