javascript - Bible Verse Regex - Stack Overflow

I am trying to match bible verses that can be any of these formats:

1 John 4:5 - 6
2 john 4:5 - 4:6
3 john 4:5 - 3 John 4:6
John 4:5 - 6
john 4:5 - 4:6
John 4:5 - 1 John 4:6
1john4:6
john 4
john 4-5
1 john 4-5

- any spaces in the above examples should be ignored when matching
- any of the above can appear anywhere in a string of text:

text this is text John 4:5 - 1 John 4:6 text text john 4-5 more text

This is what I have, but it barely works and doesn't match correctly in a long string of text:

\b[a-zA-Z]+(?:\s+\d+)?(?::\d+(?:–\d+)?(?:,\s*\d+(?:–\d+)?)*)?

asked Mar 7, 2014 at 15:55 by user3071933; edited Mar 7, 2014 at 15:57 by Liam
  • 3 What does it match, being that it 'barely works'? What doesn't it match? What should it match and should it not match? – George Stocker Commented Mar 7, 2014 at 15:59
  • 6 So a regular expression to match an irregular pattern? Good luck! – David Thomas Commented Mar 7, 2014 at 16:00
  • Writing something that organises your data is likely the best start; there is no point in letting your application code see the data until it is nice and tidy – Rob Sedgwick Commented Mar 7, 2014 at 16:02
  • 2 I was thinking of meeting up with John 4:10-4:15 - what do you think? :-D – Code Jockey Commented Mar 7, 2014 at 16:07

4 Answers

Let's break down your format.

First of all, the main thing I see is that "there can be a dash followed by stuff" so let's split this problem up into two parts: first deal with the start bit, then the optional dash and end bit.

Your first bit is focussed around the name, and there may be a number before it. After it there is a number, which may be followed by a colon then another number. So we have:

(\d*)\s*([a-z]+)\s*(\d+)(?::(\d+))?

Now for the bit after the dash. It's a number, which may be followed by the name and another number. The whole thing may then be followed by a colon and another number. And remember the whole thing is optional:

(?:\s*-\s*(\d+)(?:\s*([a-z]+)\s*(\d+))?(?::(\d+))?)?

Put the two together and wrap it in a literal with case-insensitivity and you get:

/(\d*)\s*([a-z]+)\s*(\d+)(?::(\d+))?(?:\s*-\s*(\d+)(?:\s*([a-z]+)\s*(\d+))?(?::(\d+))?)?/i

Which, depending on how devout you are, may be described by any variety of colourful language.

But since when were Regexes pretty?

Anyway, in your result match, you will have:

  1. Initial number
  2. Name
  3. Second number
  4. Number after the colon
  5. Number after the dash
  6. Second name
  7. Number after the name
  8. Final number after the second colon

Of course, any of these can be empty, except for 2 and 3.
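
As a rough sketch of how these groups can be read in JavaScript, the snippet below runs the combined pattern over the sample sentence from the question using `String.prototype.matchAll`. The pattern is written here with the optional range part in a non-capturing group `(?:...)`, so the captures are numbered 1 to 8 as in the list above. The `g` flag, the `findVerses` helper and its property names are assumptions added for illustration, not part of the original answer.

```javascript
// Combined pattern; the optional "dash and end bit" is non-capturing so that
// groups 1-8 line up with the list above. The g flag is required by matchAll.
const VERSE = /(\d*)\s*([a-z]+)\s*(\d+)(?::(\d+))?(?:\s*-\s*(\d+)(?:\s*([a-z]+)\s*(\d+))?(?::(\d+))?)?/gi;

// Hypothetical helper: returns one object per reference found in `text`.
// Groups that did not take part in a match are undefined.
function findVerses(text) {
  return Array.from(text.matchAll(VERSE), (m) => ({
    match: m[0].trim(),      // the pattern can pick up a leading space
    initialNumber: m[1],     // 1. "" when there is no leading number
    name: m[2],              // 2. always present
    secondNumber: m[3],      // 3. always present
    afterColon: m[4],        // 4.
    afterDash: m[5],         // 5.
    secondName: m[6],        // 6.
    afterSecondName: m[7],   // 7.
    finalNumber: m[8],       // 8.
  }));
}

const sample = "text this is text John 4:5 - 1 John 4:6 text text john 4-5 more text";
for (const v of findVerses(sample)) {
  console.log(v.match, "->", v.name, v.secondNumber, v.afterColon);
}
```

Because the name is just `[a-z]+`, this also matches any ordinary word followed by a number (for example `room 12`), so checking group 2 against a list of book names afterwards may be worthwhile.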

This is as specific as one could get, utilizing stuff like an optional capital letter at the start so things like "jOhn" don't match.

(?:\d\s*)?[A-Z]?[a-z]+\s*\d+(?:[:-]\d+)?(?:\s*-\s*\d+)?(?::\d+|(?:\s*[A-Z]?[a-z]+\s*\d+:\d+))?
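
A small sketch of how this pattern might be used in JavaScript. One thing worth noting: when searching inside text without anchors, the pattern can still latch onto part of a word, so `jOhn 4` yields the match `Ohn 4`. Rejecting such input outright needs the pattern to be anchored. The `isReference` helper and the `^(?:...)$` wrapper are assumptions added here for illustration.

```javascript
// Pattern from above, unchanged (case-sensitive, so no i flag).
const SOURCE = String.raw`(?:\d\s*)?[A-Z]?[a-z]+\s*\d+(?:[:-]\d+)?(?:\s*-\s*\d+)?(?::\d+|(?:\s*[A-Z]?[a-z]+\s*\d+:\d+))?`;

// Unanchored: finds a reference somewhere inside a longer string.
const search = new RegExp(SOURCE);

// Anchored (hypothetical wrapper): the whole string must be one reference.
const whole = new RegExp(`^(?:${SOURCE})$`);

// Hypothetical helper: true when `s` is a reference and nothing else.
function isReference(s) {
  return whole.test(s.trim());
}

console.log(isReference("John 4:5 - 1 John 4:6")); // true
console.log(isReference("jOhn 4"));                // false
console.log(search.exec("jOhn 4")[0]);             // "Ohn 4"
```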

You can try this:

/(?:\d+ ?)?[a-z]+ ?\d+(?:(?::\d+)?(?: ?- ?(?:\d+ [a-z]+ )?\d+(?::\d+)?)?)?/i
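
To pull every reference out of a longer string with this pattern, one option is to add the `g` flag and call `String.prototype.match`. The sketch below uses the sample sentence from the question; the `g` flag and the variable names are additions made here for illustration.

```javascript
// Pattern from above with the g flag added, so match() returns every hit.
const VERSES = /(?:\d+ ?)?[a-z]+ ?\d+(?:(?::\d+)?(?: ?- ?(?:\d+ [a-z]+ )?\d+(?::\d+)?)?)?/gi;

const text = "text this is text John 4:5 - 1 John 4:6 text text john 4-5 more text";

// match() returns null when nothing is found, hence the fallback to [].
const found = text.match(VERSES) || [];

console.log(found); // [ 'John 4:5 - 1 John 4:6', 'john 4-5' ]
```

Note that ` ?` allows at most one space, so a reference typed with two spaces after the book name would be missed; `\s*` is one possible substitute.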

FWIW, I've found RegexPal to be a huge help in these cases. Here's what I ended up with:

([\d ]*[a-zA-Z]+( \d*:\d*)?)(( - )| )?(((\d* )?[a-zA-Z]+ )?\d*([:-]+\d*)?)

Which breaks down as:

// zero or more digits or spaces
[\d ]*

// one or more upper/lowercase letters
[a-zA-Z]+

// optionally: a space, any number of digits, a colon,
// and any number of digits again
( \d*:\d*)?

// an optional hyphen with a space on either side, or a single space
(( - )| )?

Repeat for the other side of the optional hyphen except for this difference:

// one or more of either a colon or a hyphen
[:-]+