I need to create a regular expression that verifies that the user's input is either:
- 4 digits, OR
- a value like XXXXXX-YY, where XXXXXX is a Roman numeral from I to XXXIII and YY is two Latin letters (A-Z)
- See this other question: stackoverflow.com/questions/267399/… – satoshi Commented Feb 18, 2012 at 11:43
- @RobW, it can be from 1 to 6 characters since expected value is from I to XXXIII (i.e. from 1 to 33). – LA_ Commented Feb 18, 2012 at 12:09
3 Answers
According to the requirements, these are the possible Roman numeral formats. For readability, only the maximum number of X is shown.
XXX III (or: <empty>, I or II instead of III)
XX V (or: IV, IX and X instead of V)
I suggest this compact pattern:
/^(\d{4}|(?=[IVX])(X{0,3}I{0,3}|X{0,2}VI{0,3}|X{0,2}I?[VX])-[A-Z]{2})$/i
Explanation:
^ Begin of string
( Begin of group 1.
\d{4} 4 digits
| OR
(?=[IVX]) Look-ahead: the next character must be I, V or X
( Begin of group 2.
X{0,3}I{0,3} = 0 1 2 3 + { 0 ; 10 ; 20 ; 30} (roman)
| OR
X{0,2}VI{0,3} = 5 6 7 8 + { 0 ; 10 ; 20 } (roman)
| OR
X{0,2}I?[VX] = 4 9 + { 0 ; 10 ; 20 } (roman)
) End of group 2
-[A-Z]{2} Postfixed by a hyphen and two letters
) End of group 1.
$ End of string
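As a quick sanity check of the breakdown above, here is a minimal sketch that runs the pattern against every numeral from 1 to 50. The `toRoman` helper and the `-AB` suffix are assumptions made for this illustration; they are not part of the answer.

```javascript
// Pattern copied verbatim from the answer above.
const pattern = /^(\d{4}|(?=[IVX])(X{0,3}I{0,3}|X{0,2}VI{0,3}|X{0,2}I?[VX])-[A-Z]{2})$/i;

// Hypothetical helper (not from the answer): converts 1..89 to a Roman numeral.
function toRoman(n) {
  const table = [[50, "L"], [40, "XL"], [10, "X"], [9, "IX"], [5, "V"], [4, "IV"], [1, "I"]];
  let out = "";
  for (const [value, symbol] of table) {
    while (n >= value) {
      out += symbol;
      n -= value;
    }
  }
  return out;
}

// Collect every n in 1..50 whose numeral, suffixed with "-AB", is accepted.
const accepted = [];
for (let n = 1; n <= 50; n++) {
  if (pattern.test(toRoman(n) + "-AB")) accepted.push(n);
}
console.log(accepted.length, accepted[0], accepted[accepted.length - 1]);

console.log(pattern.test("1234"));    // four digits
console.log(pattern.test("xii-ab"));  // lower case, allowed by the i flag
console.log(pattern.test("-AB"));     // empty numeral, blocked by the look-ahead
console.log(pattern.test("IIII-AB")); // not a valid numeral
```

The first line printed should be `33 1 33`, meaning exactly the numerals I to XXXIII pass. The four checks that follow should print `true`, `true`, `false` and `false`.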
Well, the part that matches a Roman numeral between I and XXXIII is:
(?:X(?:X(?:V(?:I(?:I?I)?)?|X(?:I(?:I?I)?)?|I(?:[VX]|I?I)?)?|V(?:I(?:I?I)?)?|I(?:[VX]|I?I)?)?|V(?:I(?:I?I)?)?|I(?:[VX]|I?I)?)
As revealed by this:
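#!/usr/bin/env perl
use strict;
use warnings;
use Regexp::Assemble;   # merges many literal strings into one compact pattern
use Roman;              # provides Roman($n): Arabic number -> upper-case Roman numeral

my $ra = Regexp::Assemble->new;
for my $num (1 .. 33) {
    $ra->add(Roman($num));   # add I, II, ..., XXXIII
}
print $ra->re, "\n";         # print the assembled regex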
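For readers without Perl at hand, a rough JavaScript counterpart is sketched below. It does not reproduce the prefix merging that Regexp::Assemble performs. It simply joins the 33 numerals with `|`, which should accept the same strings in a longer, unoptimized pattern. The `toRoman` helper is a hypothetical stand-in for the `Roman` module.

```javascript
// Hypothetical helper (stand-in for Perl's Roman module): converts 1..39.
function toRoman(n) {
  const table = [[10, "X"], [9, "IX"], [5, "V"], [4, "IV"], [1, "I"]];
  let out = "";
  for (const [value, symbol] of table) {
    while (n >= value) {
      out += symbol;
      n -= value;
    }
  }
  return out;
}

// Enumerate I..XXXIII, as the Perl loop does.
const numerals = [];
for (let n = 1; n <= 33; n++) {
  numerals.push(toRoman(n));
}

// Plain alternation; no attempt is made to merge common prefixes.
const romanSource = "(?:" + numerals.join("|") + ")";
const romanOnly = new RegExp("^" + romanSource + "$");

console.log(romanOnly.test("XXXIII")); // inside the range
console.log(romanOnly.test("XXXIV"));  // outside the range
```

`romanSource` can then be embedded in a larger pattern in the same way as the assembled Perl output.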
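function inputIsValid(value) {
  // The "g" flag is dropped: it is not needed to validate a single anchored input.
  var r = /(^[0-9]{4}$)|(^(?:(?:[X]{0,2}(?:[I](?:[XV]?|[I]{0,2})?|(?:[V][I]{0,3})?))|(?:[X]{3}[I]{0,3}))\-[A-Z]{2}$)/i;
  return r.test(value); // true or false, rather than a match array or null
}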
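That will match either a 4-digit input, or a Roman numeral (in the range 1-33) followed by a dash and two letters. One caveat: every piece of the numeral part is optional, so an input with an empty numeral, such as -AB, is accepted as well; a look-ahead such as (?=[IVX]) in front of the numeral part would rule that out.
To explain the regex, below is an expanded source with comments: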
// Test for a 4-digit number
(                         // Start required capturing group
  ^                       // Start of string
  [0-9]{4}                // Test for 0-9, exactly 4 times
  $                       // End of string
)                         // End required capturing group
|                         // OR
// Test for Roman Numerals, 1 - 33, followed by a dash and two letters
(                         // Start required capturing group
  ^                       // Start of string
  (?:                     // Start required non-capturing group
    // Test for 1 - 29
    (?:                   // Start required non-capturing group
      // Test for 10, 20, (and implied 0, although the Romans did not have a digit, or mathematical concept, for 0)
      [X]{0,2}            // X, optionally up to 2 times
      (?:                 // Start required non-capturing group
        // Test for 1 - 4, and 9
        [I]               // I, exactly once (I = 1)
        (?:               // Start optional non-capturing group
          // IV = 4, IX = 9
          [XV]?           // Optional X or V, exactly once
          |               // OR
          // II = 2, III = 3
          [I]{0,2}        // Optional I, up to 2 times
        )?                // End optional non-capturing group
        |                 // OR
        // Test for 5 - 8
        (?:               // Start optional non-capturing group
          [V][I]{0,3}     // Required V, followed by optional I, up to 3 times
        )?                // End optional non-capturing group
      )                   // End required non-capturing group
    )                     // End required non-capturing group
    |                     // OR
    // Test for 30 - 33
    (?:                   // Start required non-capturing group
      // Test for 30
      [X]{3}              // X exactly 3 times
      // Test for 1 - 3
      [I]{0,3}            // Optional I, up to 3 times
    )                     // End required non-capturing group
  )                       // End required non-capturing group
  // Test for dash and two letters
  \-                      // Literal -, exactly 1 time
  [A-Z]{2}                // Alphabetic character, exactly 2 times
  $                       // End of string
)                         // End required capturing group
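JavaScript regex literals have no free-spacing mode, so the expanded source above cannot be pasted into a script as-is. One possible workaround, sketched here as an assumption rather than as the answer's own approach, is to assemble the pattern from commented string pieces and confirm that the result has the same source as the one-line version.

```javascript
// One-line pattern from the answer (the g flag is left out here).
const oneLine = /(^[0-9]{4}$)|(^(?:(?:[X]{0,2}(?:[I](?:[XV]?|[I]{0,2})?|(?:[V][I]{0,3})?))|(?:[X]{3}[I]{0,3}))\-[A-Z]{2}$)/i;

// The same pattern assembled from commented pieces.
// String.raw keeps the backslash in "\-" without double escaping.
const parts = [
  String.raw`(^[0-9]{4}$)`,           // exactly four digits
  String.raw`|`,                      // OR
  String.raw`(^`,                     // start of the numeral alternative
  String.raw`(?:`,                    // group: 1-29 or 30-33
  String.raw`(?:[X]{0,2}`,            // tens: 0, 10 or 20
  String.raw`(?:`,                    // units
  String.raw`[I](?:[XV]?|[I]{0,2})?`, // 1-4 and 9
  String.raw`|`,                      // OR
  String.raw`(?:[V][I]{0,3})?`,       // 5-8
  String.raw`)`,                      // end of units
  String.raw`)`,                      // end of 1-29
  String.raw`|`,                      // OR
  String.raw`(?:[X]{3}[I]{0,3})`,     // 30-33
  String.raw`)`,                      // end of group
  String.raw`\-[A-Z]{2}$`,            // dash, two letters, end of string
  String.raw`)`,                      // end of the numeral alternative
];
const assembled = new RegExp(parts.join(""), "i");

console.log(assembled.source === oneLine.source);
console.log(assembled.test("XXIX-QZ"), assembled.test("XXXIV-QZ"));
```

If the pieces are transcribed correctly, the first check prints `true`, so the commented form and the one-liner can be kept in sync with a single comparison.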
The 4-digit number and trailing \-[A-Z]{2} were (to me) self-evident. My method for the Roman Numerals was to:
- Open Excel.
- Populate a column with 1-33.
- Convert that column to Roman Numerals (in all 7 different varieties).
- Check to see if any of the varieties were different from 1-33 (they weren't).
- Fiddled with moving the Roman Numerals into the minimum number of unique patterns that limited them to 33 (i.e, "then shalt thou count to thirty-three, no more, no less. Thirty-three shall be the number thou shalt count, and the number of the counting shall be thirty-three. Thirty-four shalt thou not count, neither count thou thirty-two, excepting that thou then proceed to thirty-three. Thirty-five is right out.")
- Realized that up to thirty-nine is a single pattern (^(([X]{0,3}([I]([XV]?|[I]{0,2})?|([V][I]{0,3})?)))$, changed to capturing groups for better clarity).
- Changed pattern to allow up to twenty-nine.
- Added another to allow thirty to thirty-three.
- Construct the whole pattern and test in RegexBuddy (an invaluable tool for this stuff) against digits 0 - 20,000 and Roman Numerals 1 - 150 followed by "-AA".
- The pattern worked, so I posted it (then grabbed another cup o' coffee and self-administered an 'atta-boy' for completing what I thought was a lovely Saturday morning challenge).
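The testing step in the list above can be approximated without RegexBuddy. The sketch below is an assumption about how such a run might look in plain JavaScript. `toRoman` is a hypothetical helper, and the `g` flag is omitted because `test()` on a global regex keeps state in `lastIndex` between calls, which would skew a loop like this.

```javascript
// Pattern from the answer, without the g flag (see the note above).
const pattern = /(^[0-9]{4}$)|(^(?:(?:[X]{0,2}(?:[I](?:[XV]?|[I]{0,2})?|(?:[V][I]{0,3})?))|(?:[X]{3}[I]{0,3}))\-[A-Z]{2}$)/i;

// Hypothetical helper (not from the answer): converts 1..399 to a Roman numeral.
function toRoman(n) {
  const table = [
    [100, "C"], [90, "XC"], [50, "L"], [40, "XL"],
    [10, "X"], [9, "IX"], [5, "V"], [4, "IV"], [1, "I"],
  ];
  let out = "";
  for (const [value, symbol] of table) {
    while (n >= value) {
      out += symbol;
      n -= value;
    }
  }
  return out;
}

// Digits 0 - 20,000: only the four-digit values (1000..9999) should pass.
let digitHits = 0;
for (let n = 0; n <= 20000; n++) {
  if (pattern.test(String(n))) digitHits++;
}

// Roman numerals 1 - 150 followed by "-AA": only 1 - 33 should pass.
const numeralHits = [];
for (let n = 1; n <= 150; n++) {
  if (pattern.test(toRoman(n) + "-AA")) numeralHits.push(n);
}

console.log(digitHits);
console.log(numeralHits.length, Math.max(...numeralHits));
```

A passing run prints `9000` and then `33 33`.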
By extraneous brackets, I assume you mean the non-capturing groups (?: ... ). I use those a lot to group things (and the grouping is quite necessary here). I made them non-capturing because I do not need to capture the sub-groups, only the parent groups (and in this use case I don't think they need to actually be captured either, but it doesn't hurt to do so). By making them non-capturing, they won't create backreferences, which speeds up processing (though for a single input, the time gained is negligible).
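To make the difference concrete, here is a small sketch with two simplified patterns. They are invented for illustration and are not the full regex from this answer. It shows which groups end up in the match array.

```javascript
// Simplified patterns for illustration only; not the answer's full regex.
const capturing = /^(X{0,3})(I{0,3})-([A-Z]{2})$/;
const nonCapturing = /^(?:X{0,3})(?:I{0,3})-([A-Z]{2})$/;

const a = "XII-AB".match(capturing);
const b = "XII-AB".match(nonCapturing);

// Capturing groups each get a slot after the full match at index 0.
console.log(a.length, a[1], a[2], a[3]);

// Non-capturing groups still group, but leave no slot behind.
console.log(b.length, b[1]);
```

The first line prints `4 X II AB` and the second prints `2 AB`. Both patterns accept the same input, but only the capturing version stores the sub-matches.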