python – Matching multiple regex patterns with the alternation operator?
From the documentation of re.findall:
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
While your regexp matches the string three times, the (.*?) group is empty for the last two matches. If you want the output of the other half of the regexp, you can add a second group:
>>> re.findall(r'\((.*?)\)|(\w)', '(zyx)bc')
[('zyx', ''), ('', 'b'), ('', 'c')]
Alternatively, you could remove all the groups to get a simple list of strings again:
>>> re.findall(r'\(.*?\)|\w', '(zyx)bc')
['(zyx)', 'b', 'c']
You would need to manually remove the parentheses though.
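As a minimal sketch of that manual step, str.strip can drop the surrounding parentheses from each match. The helper name strip_parens is made up for illustration and is not part of the original answer:

```python
import re

def strip_parens(text):
    # Match either a parenthesised chunk or a single word character.
    matches = re.findall(r'\(.*?\)|\w', text)
    # str.strip('()') removes leading and trailing '(' or ')' characters.
    return [m.strip('()') for m in matches]

print(strip_parens('(zyx)bc'))  # ['zyx', 'b', 'c']
```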
Other answers have shown you how to get the result you need, but with the extra step of manually removing the parentheses. If you use lookarounds in your regex, you won't need to strip the parentheses manually:
>>> import re
>>> s = '(zyx)bc'
>>> print(re.findall(r'(?<=\()\w+(?=\))|\w', s))
['zyx', 'b', 'c']
Explained:
(?<=\() // lookbehind for left parenthesis
\w+ // one or more word characters, up to:
(?=\)) // lookahead for right parenthesis
| // OR
\w // any single word character
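The same breakdown can be kept next to the pattern itself with re.VERBOSE, which ignores whitespace and allows # comments inside the regex. This is only a sketch; the name PATTERN is chosen here for illustration:

```python
import re

PATTERN = re.compile(r"""
    (?<=\()   # lookbehind: preceded by a left parenthesis
    \w+       # one or more word characters
    (?=\))    # lookahead: followed by a right parenthesis
    |         # OR
    \w        # a single word character
""", re.VERBOSE)

print(PATTERN.findall('(zyx)bc'))  # ['zyx', 'b', 'c']
```

Note that \w+ only covers word characters, so if the parentheses contain a space or punctuation, the first alternative fails and the characters are matched one at a time: PATTERN.findall('(a b)c') gives ['a', 'b', 'c'].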
Let's take a look at our output using re.DEBUG.
branch
  literal 40
  subpattern 1
    min_repeat 0 65535
      any None
  literal 41
or
  in
    category category_word
Ouch, there's only one subpattern in there, but re.findall only pulls out subpatterns if one exists! The \w branch has no group of its own, so its matches come back as empty strings.
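To see that behaviour in isolation, here is a small sketch comparing a pattern with one capturing group against the same pattern with a non-capturing group. The variable names are illustrative only:

```python
import re

# One capturing group: findall returns that group's text for every match,
# and an empty string whenever the group did not take part in the match.
one_group = re.findall(r'\((.*?)\)|\w', '(zyx)bc')
print(one_group)  # ['zyx', '', '']

# A non-capturing group (?:...) keeps the grouping but returns whole matches.
no_group = re.findall(r'\((?:.*?)\)|\w', '(zyx)bc')
print(no_group)  # ['(zyx)', 'b', 'c']
```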
a = re.findall(r'\((.*?)\)|(.)', '(zyx)bc', re.DEBUG); a
[('zyx', ''), ('', 'b'), ('', 'c')]
branch
  literal 40
  subpattern 1
    min_repeat 0 65535
      any None
  literal 41
or
  subpattern 2
    any None
Better. 🙂
Now we just have to make this into the format you want.
[i[0] if i[0] != '' else i[1] for i in a]
['zyx', 'b', 'c']
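An alternative sketch that skips the tuple post-processing uses re.finditer and asks each match which group actually took part, via the match object's lastindex attribute. The function name matched_groups is hypothetical:

```python
import re

def matched_groups(text):
    # Group 1: contents of a parenthesised chunk; group 2: any single character.
    pattern = re.compile(r'\((.*?)\)|(.)')
    # Exactly one of the two groups takes part in each match, and
    # m.lastindex is the number of that group, so this returns its text.
    return [m.group(m.lastindex) for m in pattern.finditer(text)]

print(matched_groups('(zyx)bc'))  # ['zyx', 'b', 'c']
```

Using lastindex rather than testing for an empty string means that empty parentheses still yield an empty string for that match, as the list comprehension does.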