python – raw strings in Python3, regular expressions

According to the rules for escape sequences in string literals:

"\n" becomes the ASCII byte 0x0A; this applies to your first string to match.
r"\n" becomes the literal \n, that is, \ followed by n; this applies to the second string to match.

"\\n" becomes the literal \n; this applies to your first pattern string.
r"\\n" becomes the literal \\n; this applies to the second pattern string.

When you perform the matching, re.search does another round of escape processing on the pattern:
the literal \n turns into the ASCII byte 0x0A (first pattern)
the literal \\n turns into the literal \n (second pattern)

So in the end your first string matches the first pattern, as both contain ASCII 0x0A,
and the second string matches the second pattern, as both contain the literal \n.

That's it, no mystery here.
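The two rounds of escape processing described above can be checked directly. This is a minimal sketch; the variable names are made up for illustration and the one-escape strings stand in for the strings and patterns from the question, which may have been longer.

```python
import re

# Round 1: the Python parser processes escapes in string literals.
first_string = "\n"      # one character: newline (0x0A)
second_string = r"\n"    # two characters: backslash, n
first_pattern = "\\n"    # two characters: backslash, n
second_pattern = r"\\n"  # three characters: backslash, backslash, n

lengths = [len(s) for s in (first_string, second_string, first_pattern, second_pattern)]
print(lengths)  # [1, 2, 2, 3]

# Round 2: the regex engine processes escapes in the pattern it receives.
first_match = re.search(first_pattern, first_string)     # regex \n vs. a newline
second_match = re.search(second_pattern, second_string)  # regex \\n vs. backslash + n
cross_match = re.search(first_pattern, second_string)    # regex \n vs. backslash + n

print(first_match is not None, second_match is not None, cross_match is not None)
# True True False
```

Checking len() is a quick way to see what a literal really contains, because print() shows the characters themselves rather than how they were written.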

A raw string essentially tells the system to read the backslashes in the following string as what they are – backslashes. So,

print(r"hi\nhi")

Prints out hi\nhi.
However, Python treats backslashes in non-raw strings as a way to escape the following character. Hence,

print("hi\nhi")

Prints:

hi
hi

So, the \n in non-raw strings becomes a newline.
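A small sketch of the same difference, using repr() and len() so the contents are visible without relying on how the terminal renders a newline. The names raw and cooked are just illustrative.

```python
raw = r"hi\nhi"    # backslash and n are kept as two separate characters
cooked = "hi\nhi"  # \n is converted to a single newline character

print(repr(raw))
print(repr(cooked))
print(len(raw), len(cooked))  # 6 5

# A raw string is only a different way of writing the literal;
# the same value can be spelled with a doubled backslash.
print(raw == "hi\\nhi")  # True
```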


In your code, pattern contains a string with a newline, not a backslash and n. Were you to use pattern = r"\n", pattern would contain a backslash and n, but not a newline.

Hence, searching for r"\\n" in the string essentially tells the regex engine to escape the \ (thus, it searches for a literal backslash) followed by n.

First of all, let's clarify: c contains a newline, and d contains \n, literally. This can be verified by printing the strings.

  • When you search for "\\n", the regex engine receives \n and searches for a newline. So, c matches, but d does not match.

  • When pattern = "\n", then c matches, but d does not.

  • When pattern = r"\\n", then d matches, but c does not.
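The three cases above can be reproduced with a short script. The exact values of c and d are not shown here, so the definitions below are assumptions that fit the description (c holds a real newline, d holds a backslash followed by n); matches is a hypothetical helper.

```python
import re

# Assumed stand-ins for the question's variables.
c = "a\nb"   # contains a real newline
d = r"a\nb"  # contains a backslash followed by n

def matches(pattern, text):
    """Return True if the regex pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

results = [
    (matches("\\n", c), matches("\\n", d)),    # regex \n: a newline
    (matches("\n", c), matches("\n", d)),      # a literal newline character
    (matches(r"\\n", c), matches(r"\\n", d)),  # regex \\n: backslash, then n
]
print(results)  # [(True, False), (True, False), (False, True)]
```

Writing regex patterns as raw strings throughout keeps the Python-level escaping out of the way, so only the regex engine's own escape rules remain to reason about.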
