
regex - Python - Iterate once over string to find all substrings and their positions - Stack Overflow


I have the following Python code, which uses regex to find the substrings "¬[", "[", "¬(", "(", ")", "]" and get their positions (I record "¬[" and "¬(" as "[" and "("):

import re

expression = "¬[P∧¬(¬T∧R)]∧(T→¬P)"
# [[0 "¬["], [4 "¬("], [13 "("], [10 ")"], [18 ")"], [11 "]"]]

lsqb = [[match.start(), "["] for match in re.finditer("\¬\[|\[", expression)]
lpar = [[match.start(), "("] for match in re.finditer("\¬\(|\(", expression)]
rpar = [[match.start(), ")"] for  match in re.finditer("\)", expression)]
rsqb = [[match.start(), "]"] for match in re.finditer("\]", expression)]
all = lsqb + lpar + rpar + rsqb

print(lsqb) # [[0, '[']]
print(lpar) # [[4, '('], [13, '(']]
print(rpar) # [[10, ')'], [18, ')']]
print(rsqb) # [[11, ']']]

print(all) # [[0, '['], [4, '('], [13, '('], [10, ')'], [18, ')'], [11, ']']]

The issue is that I'm iterating over the string four times, once for each type of bracket whose positions I want. I'd like to get rid of the per-bracket variables and keep only the "all" list, iterating over the string just once and still getting [[0, '['], [4, '('], [13, '('], [10, ')'], [18, ')'], [11, ']']] as a result.


asked Mar 6 at 16:29 by user29917130
  • You need to use raw strings for the regular expressions. – Barmar Commented Mar 6 at 16:33
  • all is a built-in function that you are clobbering at the moment. – JonSG Commented Mar 6 at 16:48
  • Is there a reason you want the output to be in that order? Why not the matches in order from left to right? – trincot Commented Mar 6 at 16:48
  • "I transformed the "¬[" and "¬(" into "[" and "("": then why do you even match the ¬ symbol? Just to have the starting index pointing at it? – trincot Commented Mar 6 at 16:52
  • I'm not going to use "all" as my actual variable name, but thanks for pointing it out. Yes, I matched the "¬" just for the index; I had code that checked whether a parenthesis had a "¬" before it, but that felt really clunky. Also, the output doesn't have to be in any particular order; sorry for not specifying. – user29917130 Commented Mar 6 at 17:51
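
To illustrate the raw-string point raised in the comments, here is a minimal sketch. It relies only on standard Python string-literal behavior; the variable name closing_positions is just illustrative and not from the original post.

```python
import re

expression = "¬[P∧¬(¬T∧R)]∧(T→¬P)"

# In a normal string literal, "\b" is a single backspace character,
# so the regex engine would never see a backslash.
assert len("\b") == 1
# In a raw string the backslash is preserved, giving two characters.
assert len(r"\b") == 2

# Writing patterns as raw strings keeps every backslash intact
# and avoids invalid-escape warnings for sequences such as "\)".
closing_positions = [m.start() for m in re.finditer(r"\)", expression)]
print(closing_positions)  # [10, 18]
```

The patterns in the question happen to work because Python leaves unrecognized escapes such as "\)" untouched, but newer Python versions warn about them, so the r prefix is the safer habit.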

1 Answer

Use a single regular expression that matches all the patterns. You can use a capture group to extract the parenthesis after ¬.

Then loop over all the matches, generating the appropriate string in the result based on what was matched.

import re

expression = "¬[P∧¬(¬T∧R)]∧(T→¬P)"
# Optional ¬ followed by an opening bracket (group 1), or a closing bracket (group 2)
pattern = r'¬?([\[(])|([\])])'
all_matches = [(match.start(), match.group(1) or match.group(2))
                for match in re.finditer(pattern, expression)]
print(all_matches)
# [(0, '['), (4, '('), (10, ')'), (11, ']'), (13, '('), (18, ')')]

Each match comes from only one side of the pipe, so the group on the other side is None, and match.group(1) or match.group(2) selects the bracket that was actually matched.
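
If it is also useful to know whether an opening bracket was preceded by ¬, the same single-pass idea can be sketched with named groups. This is only one possible variant, not part of the original answer: the names BRACKET_PATTERN, find_brackets, and the group names neg, open, and close are made up for illustration.

```python
import re

# Hypothetical names for illustration; the pattern is the answer's regex
# with named groups and a separate group for the optional ¬.
BRACKET_PATTERN = re.compile(r"(?P<neg>¬)?(?P<open>[\[(])|(?P<close>[\])])")


def find_brackets(expression):
    """Return (start, bracket, negated) tuples from one left-to-right scan."""
    results = []
    for match in BRACKET_PATTERN.finditer(expression):
        # Only one alternative matched, so the other group is None.
        bracket = match.group("open") or match.group("close")
        # The neg group is None unless a ¬ directly precedes an opening bracket.
        negated = match.group("neg") is not None
        results.append((match.start(), bracket, negated))
    return results


brackets = find_brackets("¬[P∧¬(¬T∧R)]∧(T→¬P)")
print(brackets)
# [(0, '[', True), (4, '(', True), (10, ')', False),
#  (11, ']', False), (13, '(', False), (18, ')', False)]
```

As in the answer, the start index of a negated bracket points at the ¬ rather than at the bracket itself, and the results come out in left-to-right order.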
