javascript - Efficient Regexp matching starting from given index within string - Stack Overflow

I have already parsed a string up to index idx. My next parse step uses a Regexp. It needs to match the next part of the string, i.e. starting from position idx. How do I do this efficiently?

For example:

let myString = "<p>ONE</p><p>TWO</p>"
let idx

// some code not shown here parses the first paragraph
// and updates idx
idx = 10

// next parse step must continue from idx 
let myRegex = /<p>[^<]*<\/p>/
let subbed = myString.substring(idx)
let result = myRegex.exec(subbed)
console.log(result) // "<p>TWO</p>", not "<p>ONE</p>"

But myString.substring(idx) seems like quite an expensive operation.

Are there no regex operations like this: result = myRegex.execFromIndex(idx, myString);?

In general, I want to start regex matching from different indexes so I can exclude parts of the string that are already parsed and avoid matching them again. One time the start may be myString[0], another time myString[51], and so on.

Is there a way to do this efficiently? I'm parsing hundreds of thousands of lines and want to do this as cheaply as possible.

asked Jan 18, 2017 at 15:43 by mottosson, edited Feb 9, 2022 at 6:25 by Inigo
  • 1 typo myString.length – Pranav C Balan Commented Jan 18, 2017 at 15:45
  • 1 if you really are concerned about efficiency, try not to use regex. – Faibbus Commented Jan 18, 2017 at 15:47
  • 4 Construct the regex instance and then set its .lastIndex property. Read the documentation. – Pointy Commented Jan 18, 2017 at 15:49
  • @Faibbus What do you suggest instead? I'm quite sure I won't be able to write a more efficient search than regex on my own. – mottosson Commented Jan 18, 2017 at 15:54
  • This question might get better answers if you would provide the logic by which you decide you need to search from a certain index only. – trincot Commented Jan 18, 2017 at 15:57

2 Answers

Use Regexp.exec and lastIndex

  1. Create a Regexp with the y (sticky) or g (global) flag
    • with the y flag, the match must start exactly at the specified start index
    • with the g flag, the match can start anywhere at or after the specified index
  2. Set its lastIndex property to the start index
  3. Call exec
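
The difference between the two flags can be seen in a small sketch. The input string and variable names here are illustrative and not part of the original example; lastIndex is set explicitly before every call because exec rewrites it after each attempt.

```javascript
const text = "xx<p>ONE</p>";

// Sticky (y): the match must begin exactly at lastIndex.
const stickyRe = /<p>[^<]*<\/p>/y;
stickyRe.lastIndex = 0;
const stickyAt0 = stickyRe.exec(text); // null, because index 0 holds "x"
stickyRe.lastIndex = 2;
const stickyAt2 = stickyRe.exec(text); // matches, because "<p>" begins at index 2

// Global (g): the search begins at lastIndex and scans forward.
const globalRe = /<p>[^<]*<\/p>/g;
globalRe.lastIndex = 0;
const globalAt0 = globalRe.exec(text); // skips "xx" and matches at index 2

console.log(stickyAt0, stickyAt2.index, globalAt0.index);
```

For a parser that must continue exactly where the previous step stopped, the y flag is usually the better fit, since a failed match returns immediately instead of scanning the rest of the string.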

I've applied the above steps to your example code:

let myString = "<p>ONE</p><p>TWO</p>"
let idx

// some code not shown here parses the first paragraph
// and updates idx
idx = 10

// next parse step must continue from idx 
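let myRegex = /<p>[^<]*<\/p>/y  // y flag: the match must start exactly at lastIndex
myRegex.lastIndex = idx         // start at idx, no substring copy needed
let result = myRegex.exec(myString)
console.log(result) // "<p>TWO</p>", not "<p>ONE</p>"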
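
If this pattern is needed in many places, it could be wrapped in a small helper. The sketch below is one possible shape, not a built-in API: execFromIndex is a hypothetical function named after the method the question wished for, and the capture group and loop are added for illustration only.

```javascript
// Hypothetical helper: run `regex` against `str`, starting at `idx`.
// The regex must carry the y or g flag; without one, exec ignores lastIndex.
function execFromIndex(regex, idx, str) {
  regex.lastIndex = idx;
  return regex.exec(str);
}

const paragraph = /<p>([^<]*)<\/p>/y;
const input = "<p>ONE</p><p>TWO</p>";

// Walk the string one paragraph at a time without ever copying it.
const contents = [];
let pos = 0;
let match;
while ((match = execFromIndex(paragraph, pos, input)) !== null) {
  contents.push(match[1]);
  pos = paragraph.lastIndex; // after a successful exec, lastIndex sits just past the match
}

console.log(contents.join(", ")); // ONE, TWO
```

Note that a pattern able to match the empty string would need an extra guard in the loop, since pos would then never advance.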