I want to parse certain character combinations which represent musical pitches from the beginning of a string. For that I am using regular expressions (NSRegularExpression in Swift, if that matters). Here is the regex I have been using so far:
^[A-G][b#]?
Possible matches are A, Bb, D#, but not b alone.
Now I want to make it possible to detect the pitch names from lowercase letters as well. And here the problem begins.
So the new regex so far is:
^[A-Ga-g][b#]?
Now a single lowercase b will match, which is desired, as b is a musical pitch.
Furthermore I also want to recognise pitches described another way, i.e. as musical scale degrees in roman numeral notation. These look like this: V, #II, bVII.
My code first tries to get a match for an absolute pitch with the regex given before, and if that gives no result tries another regex for the roman numeral notation.
The problem is that the optional characters b and # are used in both notations but in different places: AFTER the letter in absolute pitch notation (like Gb) but BEFORE the roman numeral (like bV).
By allowing the absolute pitch letter name to be lowercase, my first regex returns b as a match even when it is not the lowercase note letter b but part of the roman numeral bV.
So what I want to do is to accept b as a match only if it is not followed by V or I. With the help of several tutorials I found out that a negative lookahead should be the solution, but I am struggling with the exact syntax for the regex.
ChatGPT is suggesting the following:
^[A-Ga-g](?!b[V|I])[b#]?
but this does not give the desired result. From bV I still get b as a match.
Another suggestion, also provided by ChatGPT is
^(?!.*b[V|I])[A-Ga-g][b#x]?
with the explanation that the (?!.*b[V|I]) ensures that bV or bI occurs nowhere in the string. This regex does indeed give the desired result, but I am not sure about the explanation. I only want to test for these occurrences at the beginning of the string.
My own idea would be:
^[A-Gab(?![V|I])c-g][b#x]?
but this does not pass the online regex tester.
Is it possible to write a regex in which a single character b from a character list [a-g] is accepted only when not followed by a character from another list [VI]?
Asked 2 days ago by MassMover; edited 23 hours ago by DarkBee.
2 Answers
You described two mutually exclusive alternatives for notating notes:
- Roman letters followed by sharp/flat
- Roman numerals (I through VII) preceded by sharp/flat
That suggests developing two separate regex scans, one for letters and one for numerals, and tying them together with an alternation.
You have already got the letter scanner complete, and the Roman numeral scanner is a tiny bit tricky if you want to reject malformed numbers. For the scan of I through VII you can use (I{1,3}|IV|VI{0,2}). Now prepend [b#]?, and put them together with an alternation (vertical bar).
^(?:([b#]?(I{1,3}|IV|VI{0,2}))|([A-Ga-g][b#]?))
It is important to put the Roman numeral scanner first so a bV is scanned correctly.
Here is a sample using the Regex101 site:
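The ordering claim can also be checked without an online tester. Below is a minimal sketch in Python's `re` module, used here only for illustration; the constructs involved (character classes, counted repetition, alternation) should behave the same way in NSRegularExpression. The helper `leading_pitch` and the constant names are hypothetical, and the alternation is wrapped in a non-capturing group as an adjustment for this sketch, because in `^A|B` the anchor applies only to `A`.

```python
import re

# Numeral branch first, letter branch second. The outer (?:...) is an
# adjustment made for this sketch so that ^ anchors both branches.
NUMERAL_FIRST = re.compile(r"^(?:[b#]?(?:I{1,3}|IV|VI{0,2})|[A-Ga-g][b#]?)")

# The same two branches in the opposite order, for comparison only.
LETTER_FIRST = re.compile(r"^(?:[A-Ga-g][b#]?|[b#]?(?:I{1,3}|IV|VI{0,2}))")


def leading_pitch(pattern, text):
    """Return the pitch token at the start of text, or None if there is none."""
    m = pattern.match(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for sample in ["A", "Bb", "D#", "b", "V", "#II", "bVII", "bV"]:
        print(
            sample,
            "numeral-first:", leading_pitch(NUMERAL_FIRST, sample),
            "letter-first:", leading_pitch(LETTER_FIRST, sample),
        )
```

With the letter branch first, bV and bVII are cut down to b, which is the problem described in the question. One caveat that this sketch also exposes: for the input IV the match is only I, because I{1,3} succeeds before the IV branch is tried and nothing after the group forces backtracking. Listing IV before I{1,3}, or adding an assertion after the group, avoids that.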
You'd have to deal with each of the two formats separately with an alternation (|). Also, I guess you don't want to match anything in strings like "Able", even though "Ab" on its own would be fine. So you'll need to add an assertion at the end of your regex to reject matches based on what follows right after.
I'd propose this one:
^(?:[A-Ga-g](?:[#b])?|[#b]?(?:I{1,3}|IV|VI{0,2}))(?![#\w])
Here are some test cases on regex101
However, there are some cases of ambiguity:
- "I" is a valid pitch, but it is also an English word, such as in "I like cheese".
- "A" is a valid pitch, but it is also an English word, such as in "A nice view".
...and probably we can come up with some other ambiguities. Depending on what you expect as actual input you might need to deal with those in a different way.
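The behaviour of the trailing assertion, including the ambiguous cases listed above, can be reproduced with a short script. This is a sketch in Python's `re` module, chosen only for illustration, with the pattern copied unchanged from this answer; the helper name `match_pitch` is hypothetical. Note that `\w` is Unicode-aware in Python, which should not matter for ASCII input like this.

```python
import re

# Pattern from the answer, copied unchanged. The final (?![#\w]) rejects a
# match that is immediately followed by '#' or a word character.
PITCH = re.compile(r"^(?:[A-Ga-g](?:[#b])?|[#b]?(?:I{1,3}|IV|VI{0,2}))(?![#\w])")


def match_pitch(text):
    """Return the pitch token at the start of text, or None if there is none."""
    m = PITCH.match(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    samples = ["Ab", "Able", "b", "bV", "IV", "VIII", "C##",
               "I like cheese", "A nice view"]
    for s in samples:
        print(repr(s), "->", match_pitch(s))
```

Here the letter branch can safely come first: for bV it consumes b, the assertion then fails on the following V, and the engine falls back to the numeral branch. The same backtracking makes IV match as a whole rather than as I.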
(?!b[VI])[a-g] ? This matches a, b not followed by V or I, c, d, e, f, or g. Please provide some test cases. Also, note that [V|I] is wrong, you need [VI] since | is a literal pipe char in the character class. – Wiktor Stribiżew Commented 2 days ago
^(?!b[VI])[A-Ga-g][#bx]? – bobble bubble Commented 2 days ago
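To see why the position of the lookahead matters, the three variants from this thread can be compared side by side. The following is a minimal sketch in Python's `re` module, used only for illustration; the patterns are copied unchanged from the question and the comments, while the constant names and the helper `first_match` are hypothetical.

```python
import re

# First ChatGPT suggestion: the lookahead is tested AFTER the letter has been
# consumed, so for "bV" it inspects "V", which is not "b" + [V|I], and passes.
LOOKAHEAD_AFTER = re.compile(r"^[A-Ga-g](?!b[V|I])[b#]?")

# Second ChatGPT suggestion: the leading .* lets the lookahead search the
# rest of the line, so a "bV" anywhere later also blocks the match.
LOOKAHEAD_ANYWHERE = re.compile(r"^(?!.*b[V|I])[A-Ga-g][b#x]?")

# Pattern from the comments: the lookahead sits right after ^ and inspects
# only the first two characters.
LOOKAHEAD_AT_START = re.compile(r"^(?!b[VI])[A-Ga-g][#bx]?")


def first_match(pattern, text):
    """Return the matched prefix of text, or None if the pattern fails."""
    m = pattern.match(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for sample in ["b", "bb", "Bb", "bV", "bI", "C bVII"]:
        print(
            repr(sample),
            first_match(LOOKAHEAD_AFTER, sample),
            first_match(LOOKAHEAD_ANYWHERE, sample),
            first_match(LOOKAHEAD_AT_START, sample),
        )
    # Inside a character class, | is a literal character, not alternation.
    print(bool(re.fullmatch(r"[V|I]", "|")))
```

For the input "C bVII", the `.*` variant finds nothing because of the bV later in the string, while the variant anchored at the start still returns C. That difference is the "nowhere in the string" behaviour the question was unsure about.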