python – raw strings in Python3, regular expressions
According to the rules of escape processing for string literals:
- `"\n"` becomes the ASCII byte 0x0A; this applies to your first string to match.
- `r"\n"` becomes the literal `\n`, that is, a backslash followed by `n`; this applies to the second string to match.
- `"\\n"` becomes the literal `\n` (a backslash followed by `n`); this applies to your first pattern string.
- `r"\\n"` becomes the literal `\\n` (two backslashes followed by `n`); this applies to the second pattern string.
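To make the contents of each literal visible, here is a small sketch that prints the length and character codes of the four literals discussed above. The variable names are only illustrative; they are not taken from the question.

```python
# Four literals from the discussion, shown as the characters they actually contain.
# The variable names are hypothetical, chosen for this illustration only.
newline_str = "\n"        # one character: ASCII 0x0A
raw_str = r"\n"           # two characters: backslash, n
escaped_pat = "\\n"       # two characters: backslash, n (same content as raw_str)
raw_escaped_pat = r"\\n"  # three characters: backslash, backslash, n

literals = [
    ("newline_str", newline_str),
    ("raw_str", raw_str),
    ("escaped_pat", escaped_pat),
    ("raw_escaped_pat", raw_escaped_pat),
]

for name, value in literals:
    codes = [hex(ord(ch)) for ch in value]
    print(f"{name:16} len={len(value)} codes={codes}")
```

Note that `r"\n"` and `"\\n"` are two spellings of the same two-character string.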
When you perform the matching, `re.search` applies another round of escape processing to the pattern:
- the literal `\n` turns into the ASCII byte 0x0A (first pattern)
- the literal `\\n` turns into a literal backslash followed by `n` (second pattern)
So in the end your first string matches the first pattern, as both contain ASCII 0x0A, and the second string matches the second pattern, as both contain a literal backslash followed by `n`.
That's it, no mystery here.
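The two rounds of processing can be checked with a short sketch. It assumes the strings and patterns are the ones described above; the variable names are mine, not the asker's.

```python
import re

# Assumed from the discussion above; the names are hypothetical.
first_string = "\n"      # a real newline (0x0A)
second_string = r"\n"    # backslash followed by n

first_pattern = "\\n"    # regex engine receives \n  -> matches a newline
second_pattern = r"\\n"  # regex engine receives \\n -> matches backslash + n

first_on_first = re.search(first_pattern, first_string) is not None
first_on_second = re.search(first_pattern, second_string) is not None
second_on_first = re.search(second_pattern, first_string) is not None
second_on_second = re.search(second_pattern, second_string) is not None

print("first pattern  vs first string: ", first_on_first)
print("first pattern  vs second string:", first_on_second)
print("second pattern vs first string: ", second_on_first)
print("second pattern vs second string:", second_on_second)
```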
A raw string essentially tells Python to read the backslashes in the string as what they are: backslashes. So,
print(r"hi\nhi")
prints out hi\nhi.
However, Python treats a backslash in a non-raw string as a way to escape the following character. Hence,
print("hi\nhi")
prints:
hi
hi
So, the `\n` in non-raw strings becomes a newline.
In your code, `pattern` contains a string with a newline, not a backslash and `n`. Were you to use `pattern = r"\n"`, `pattern` would contain a backslash and `n`, but not a newline.
Hence, searching for `\\n` in the string essentially tells the regex engine to escape the backslash (thus, it searches for a literal backslash) followed by `n`.
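If you would rather not count backslashes by hand, one option is to let `re.escape` build the pattern for a literal backslash followed by `n`. This is only a sketch; the sample text and variable names are made up for illustration.

```python
import re

target = r"\n"               # the text to find: backslash followed by n
pattern = re.escape(target)  # escapes the backslash for the regex engine

# The escaped pattern is backslash, backslash, n, the same as r"\\n".
print(pattern == r"\\n")

text_with_backslash_n = r"line one\nline two"  # contains backslash + n, no newline
text_with_newline = "line one\nline two"       # contains a real newline

found = re.search(pattern, text_with_backslash_n)
missing = re.search(pattern, text_with_newline)

print(found is not None)  # the literal backslash + n is found
print(missing is None)    # a real newline does not match this pattern
```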
First of all, let's clarify: `c` contains a newline, and `d` contains `\n`, literally (a backslash followed by `n`). This can be verified by printing the strings.
- When you search for `"\\n"`, the regex pattern searches for a newline. So, `c` matches, but `d` does not match.
- When `pattern = "\n"`, then `c` matches, but `d` does not.
- When `pattern = r"\\n"`, then `d` matches, but `c` does not.
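The three cases above can be reproduced in one loop. This sketch assumes `c = "\n"` and `d = r"\n"`, as described in this answer; the `results` dictionary is just a helper for the illustration.

```python
import re

# Assumed definitions, based on the description above.
c = "\n"    # a real newline
d = r"\n"   # backslash followed by n

# (label as you would type it in source, actual pattern string)
patterns = [
    (r'"\\n"', "\\n"),     # regex sees \n: matches a newline
    (r'"\n"', "\n"),       # regex sees a literal newline character
    (r'r"\\n"', r"\\n"),   # regex sees \\n: matches backslash + n
]

results = {}
for label, pattern in patterns:
    matches_c = re.search(pattern, c) is not None
    matches_d = re.search(pattern, d) is not None
    results[label] = (matches_c, matches_d)
    print(f"pattern = {label:8} c matches: {matches_c!s:5} d matches: {matches_d}")
```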