I am parsing C source files.
I want to match all the variables (in snake-case format) that end in _VALUE and don't begin with CANA_, CANB_, ..., CANF_. I need to match the whole variable name for later substitution.
This is my current setup in Python:

    import re

    def signal_ending_VALUE_updater(match: re.Match) -> str:
        # groupdict() maps each named group to the text it matched
        groups = match.groupdict()
        return some_operation_on(groups["SIGNAL_NAME"])

    REGEX = r"(?<!CAN[A-F]_)\b(?P<SIGNAL_NAME>\w+_VALUE)\b"

    with open(file_path, 'r') as f:
        content = f.read()
    content_new = re.sub(REGEX, signal_ending_VALUE_updater, content)
Unfortunately, this regex doesn't work every time. For example, with this test case:

    test = " shared->option.mem = ((canAGetScuHmiVehReqLiftModBtnSt() == CANA_SCU_HMI_VEH_REQ_LIFT_MOD_BTN_ST_PRESSED_VALUE) ||"
    re.search(REGEX, test)

it returns the variable (CANA_SCU_HMI...) that I don't want to match.
What am I not considering in the regex?
The idea behind the regex is:
(?<!CAN[A-F]_): a negative lookbehind, to ensure the match does not start with CAN followed by one of the letters A, B, C, D, E, or F, and an underscore (_).
\b: a word boundary, ensuring that we match whole words and not just part of a word.
(?P<SIGNAL_NAME>\w+_VALUE):
    (?P<SIGNAL_NAME>...): a group match with the name SIGNAL_NAME.
    \w: same as [a-zA-Z0-9_]; it will match snake-case variable names.
    +: ensures one or more of the preceding \w before _VALUE.
    _VALUE: matches the literal string _VALUE at the end of the variable name.
\b: again a word boundary, ensuring that the match ends right after the variable name.
1 Answer
This part of your regex, (?<!CAN[A-F]_)\b, asserts that the pattern CAN[A-F]_ does not occur directly to the left of the current position, and then that the current position is a word boundary.
You get a match for the text CANA_SCU_HMI_VEH_REQ_LIFT_MOD_BTN_ST_PRESSED_VALUE because at the beginning of that text the assertion is true: the lookbehind inspects the characters before the name, not the name itself.
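To see which text the lookbehind actually inspects, here is a minimal sketch. The test line is a shortened, hypothetical version of the one in the question, and the 5-character slice simply mirrors the length of CAN[A-F]_.

```python
import re

# Original pattern from the question: lookbehind first, then the word boundary.
LOOKBEHIND = re.compile(r"(?<!CAN[A-F]_)\b(?P<SIGNAL_NAME>\w+_VALUE)\b")

# Shortened, hypothetical stand-in for the test line in the question.
line = "x = (f() == CANA_SCU_PRESSED_VALUE) ||"

m = LOOKBEHIND.search(line)
start = m.start("SIGNAL_NAME")

# CAN[A-F]_ is 5 characters long, so the lookbehind examines the
# 5 characters that end where the match begins.
preceding = line[max(0, start - 5):start]

print(repr(preceding))         # ') == '  (not CANA_, so the assertion passes)
print(m.group("SIGNAL_NAME"))  # CANA_SCU_PRESSED_VALUE (the unwanted match)
```

The prefix CANA_ lies to the right of the position where the lookbehind is evaluated, so the lookbehind never sees it.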
What you can do instead is start with a word boundary, and then assert that what is directly to the right does not match the pattern CAN[A-F]_:

    \b(?!CAN[A-F]_)(?P<SIGNAL_NAME>\w+_VALUE)\b
See a regex 101 demo
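A minimal sketch of the lookahead version used with re.sub. The function rename_signal is a hypothetical stand-in for some_operation_on from the question (here it just lower-cases the name), and the sample source string is made up for illustration.

```python
import re

# Word boundary first, then a negative lookahead on the prefix.
FIXED = re.compile(r"\b(?!CAN[A-F]_)(?P<SIGNAL_NAME>\w+_VALUE)\b")

def rename_signal(match: re.Match) -> str:
    # Hypothetical replacement: lower-case the matched name.
    return match.group("SIGNAL_NAME").lower()

# Made-up C-like line with one excluded and two included names.
source = "a = CANA_BTN_PRESSED_VALUE; b = MOTOR_SPEED_VALUE; c = CANG_X_VALUE;"
result = FIXED.sub(rename_signal, source)

print(result)
# a = CANA_BTN_PRESSED_VALUE; b = motor_speed_value; c = cang_x_value;
```

CANA_BTN_PRESSED_VALUE is left untouched, while CANG_X_VALUE is still rewritten because G is outside the A-F range.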
\b(?! like \b(?!CAN[A-F]_)(?P<SIGNAL_NAME>\w+_VALUE)\b regex101/r/ZLVEEn/1 – The fourth bird, commented 2 days ago
\b at the beginning that would make incorrect matches (like this one: regex101/r/ZLVEEn/3). Why does that \b fix it, and why do I instead get the strange behaviour shown in the link I provided? – Jhonathan Asimov, commented 2 days ago
\b then the engine moves one character forward, and from that position there is a match, because the assertion is true and there is no other "rule" such as a word boundary. – The fourth bird, commented 2 days ago
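The behaviour described in the last comment can be reproduced with a short sketch. The name used here is a shortened, hypothetical one. Without the leading \b, the lookahead fails at the C, the engine advances one character, and the match succeeds from the A onward.

```python
import re

# Same pattern with and without the leading word boundary.
NO_BOUNDARY = re.compile(r"(?!CAN[A-F]_)(?P<SIGNAL_NAME>\w+_VALUE)\b")
WITH_BOUNDARY = re.compile(r"\b(?!CAN[A-F]_)(?P<SIGNAL_NAME>\w+_VALUE)\b")

name = "CANA_BTN_PRESSED_VALUE"  # hypothetical, shortened signal name

partial = NO_BOUNDARY.search(name)
print(partial.start(), partial.group("SIGNAL_NAME"))  # 1 ANA_BTN_PRESSED_VALUE

# With \b, a match can only start at position 0, where the lookahead rejects it.
print(WITH_BOUNDARY.search(name))  # None
```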