
javascript - Getting all subgroups with a regex match - Stack Overflow


Given the string:

 © 2010 Women’s Flat Track Derby Association (WFTDA) 

I want:

2010 -- Women's -- Flat
Women's -- Flat -- Track
Track -- Derby -- Association

I'm using regex:

([a-zA-Z]+)\s([A-Z][a-z]*)\s([a-zA-Z]+)

It's only returning:

s -- Flat -- Track

asked Nov 16, 2010 at 22:09 by Caveatrob; edited Nov 16, 2010 at 22:11 by Bart Kiers
  • Sorry - it's UltraEdit JS, so JavaScript would probably work. – Caveatrob, Nov 16, 2010 at 22:10

2 Answers


This problem isn't straightforward, but to understand why, you need to understand how the regular expression engine operates on your string.

Let's consider the pattern [a-z]{3} (match 3 successive characters between a and z) on the target string abcdef. The engine starts from the left side of the string (before the a), and sees that a matches [a-z], so it advances one position. Then, it sees that b matches [a-z] and advances again. Finally, it sees that c matches, advances again (to before d) and returns abc as a match.

If the engine is set up to return multiple matches, it will now try to match again, but it keeps its positional information (so, like above, it'll match and return def).

Because the engine has already moved past the b while matching abc, bcd will never be considered as a match. For this same reason, in your expression, once a group of words is matched, the engine will never consider words within the first match to be a part of the next one.
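As a minimal sketch of that behaviour (the variable names are only illustrative), a global match over abcdef picks up again where the previous match ended, so bcd and cde are never reported:

```javascript
// Global matching resumes at the end of the previous match,
// so the reported matches never overlap.
const letters = "abcdef";
const consumed = letters.match(/[a-z]{3}/g);

console.log(consumed);                 // the two non-overlapping runs: abc and def
console.log(consumed.includes("bcd")); // false: "b" was already consumed by "abc"
```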


In order to get around this, you need to use capturing groups inside of lookaheads to collect matching words that appear later in the string:

var str = "2010 Women's Flat Track Derby Association",
    regex = /([a-z0-9']+)(?=\s+([a-z0-9']+)\s+([a-z0-9']+))/ig,
    match;

while (match = regex.exec(str))
{
    var group1 = match[1], group2 = match[2], group3 = match[3];
    console.log("Found match: " + group1 + " -- " + group2 + " -- " + group3);
}

This results in:

2010 -- Women's -- Flat
Women's -- Flat -- Track
Flat -- Track -- Derby
Track -- Derby -- Association

See this in action at http://jsfiddle.net/jRgXm/.

The regular expression matches what you seem to be defining as a word, ([a-z0-9']+), and captures it into subgroup 1. It then uses a lookahead (a zero-width assertion, so it doesn't advance the engine's cursor) to capture the next two words into subgroups 2 and 3.
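To see the zero-width behaviour in isolation, here is a small sketch that goes back to the abcdef example. It assumes an ES2020 engine (for matchAll), and the names are illustrative. Because the whole pattern sits inside a lookahead, each match consumes nothing and the engine can try again one character further along:

```javascript
// The lookahead captures three letters without consuming them,
// so every starting position gets its own chance to match.
const alphabet = "abcdef";
const overlapping = Array.from(
  alphabet.matchAll(/(?=([a-z]{3}))/g),
  (m) => m[1] // m[0] is the empty string; the letters live in group 1
);

console.log(overlapping); // abc, bcd, cde, def
```

The word-based pattern above does the same thing, except that it consumes the first word and only peeks at the next two.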

However, if you are using the actual JavaScript engine, you must use RegExp.exec and loop over the results (see this question for a discussion of why) or use the newer matchAll method (ES2020). I don't know how UltraEdit's engine is implemented, but hopefully it can do a global search and also collect subgroups.

Just for completeness, here's the example above using ES2020's matchAll (the first element in each returned array is the total match, then the subsequent elements are the capture groups):

const str = "2010 Women's Flat Track Derby Association";
const regex = /([a-z0-9']+)(?=\s+([a-z0-9']+)\s+([a-z0-9']+))/ig;

console.log([...str.matchAll(regex)]);
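If you want the "a -- b -- c" lines directly, something like the following sketch should work. The helper name wordTriples is made up for illustration, and adding the curly apostrophe (\u2019) to the word class is my own assumption, so that the string exactly as posted in the question (with Women’s and the © sign) is handled too:

```javascript
// Hypothetical helper: returns each run of three consecutive words
// formatted as "first -- second -- third".
function wordTriples(text) {
  // Assumption: a "word" is letters, digits, and straight or curly (\u2019) apostrophes.
  const word = "[a-z0-9'\\u2019]+";
  const regex = new RegExp(`(${word})(?=\\s+(${word})\\s+(${word}))`, "ig");
  // m[0] is the consumed first word; m.slice(1) holds the three capture groups.
  return Array.from(text.matchAll(regex), (m) => m.slice(1).join(" -- "));
}

const original = "\u00A9 2010 Women\u2019s Flat Track Derby Association (WFTDA)";
const triples = wordTriples(original);
console.log(triples.join("\n"));
```

On that input this should print the same four lines as the loop above, with the curly apostrophe preserved. "(WFTDA)" is left out because parentheses are not part of the word class.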

I'm using some generic regex tester, so I can't guarantee it will work for you but...

([A-Z0-9][\w’]+)\s([A-Z][\w]+)\s([A-Z][\w]+)

Three words starting with a number or capital letter followed by letters/numbers or that funky apostrophe, separated by spaces. Works for me.

Edit: I assume you can loop through, repeating the matcher in JS; I've never used it.
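For reference, a rough sketch of what that loop could look like in JavaScript. The g flag and the \u2019 escape for the curly apostrophe are additions for this sketch, and the variable names are illustrative:

```javascript
const text = "\u00A9 2010 Women\u2019s Flat Track Derby Association (WFTDA)";
// Same pattern as above, with the curly apostrophe written as \u2019
// and the g flag added so exec() can be called repeatedly.
const pattern = /([A-Z0-9][\w\u2019]+)\s([A-Z][\w]+)\s([A-Z][\w]+)/g;

const found = [];
let m;
while ((m = pattern.exec(text)) !== null) {
  found.push(m.slice(1).join(" -- "));
}

console.log(found);
```

Note that on the string from the question this loop should log only a single triple (Women’s -- Flat -- Track). The apostrophe is allowed only in the first group, so "2010 Women’s Flat" fails, and the words consumed by one match are not reused by the next, so overlapping triples are not produced.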
