python – What does the b character do in front of a string literal?


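Python 3.x makes a clear distinction between the types:

  • str = '...' literals = a sequence of Unicode characters
  • bytes = b'...' literals = a sequence of octets (integers between 0 and 255)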

If you're familiar with:

  • Java or C#, think of str as String and bytes as byte[];
  • SQL, think of str as NVARCHAR and bytes as BINARY or BLOB;
  • Windows registry, think of str as REG_SZ and bytes as REG_BINARY.

If you're familiar with C(++), then forget everything you've learned about char and strings, because a character is not a byte. That idea is long obsolete.

You use str when you want to represent text.

print('שלום עולם')

You use bytes when you want to represent low-level binary data like structs.

import struct
NaN = struct.unpack('>d', b'\xff\xf8\x00\x00\x00\x00\x00\x00')[0]
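To make the struct case a bit more concrete, here is a minimal sketch of packing a float into bytes and back. The variable names are illustrative only, and the format codes '>d' and '<d' (big-endian and little-endian 8-byte double) come from the standard struct module.

```python
import math
import struct

# Pack a float into its 8-byte big-endian IEEE 754 representation.
packed = struct.pack(">d", 1.5)

# The same value in little-endian order is the same bytes reversed.
packed_le = struct.pack("<d", 1.5)

# Unpacking always returns a tuple, hence the [0].
value = struct.unpack(">d", packed)[0]

# The byte pattern from the answer above decodes to a NaN.
nan = struct.unpack(">d", b"\xff\xf8\x00\x00\x00\x00\x00\x00")[0]

print(type(packed), len(packed))
print(value, math.isnan(nan))
```

The result of struct.pack is a bytes object, not text, so none of it needs (or has) an encoding.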

You can encode a str to a bytes object.

>>> '\uFEFF'.encode('UTF-8')
b'\xef\xbb\xbf'

And you can decode a bytes into a str.

>>> b'\xE2\x82\xAC'.decode('UTF-8')
'€'
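As a small sketch of the round trip (the sample string and variable names are my own, not from the question), encoding and then decoding with the same codec gives the original text back, while decoding with a different codec silently produces something else:

```python
text = "€uro"

# str -> bytes: the euro sign takes three bytes in UTF-8.
data = text.encode("utf-8")

# bytes -> str: decoding with the same codec restores the text.
restored = data.decode("utf-8")

# Decoding with the wrong codec does not fail here, it just yields mojibake.
wrong = data.decode("latin-1")

print(len(text), len(data))
print(restored == text, wrong == text)
```

Note that the lengths differ: len() counts characters for a str and bytes for a bytes object.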

But you can't freely mix the two types.

>>> b'\xEF\xBB\xBF' + 'Text with a UTF-8 BOM'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str
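The way around that error is to convert one side explicitly so both operands have the same type. A rough sketch, with variable names chosen just for illustration (the exact wording of the TypeError message varies between Python versions, so it is only captured here, not compared):

```python
bom = b"\xef\xbb\xbf"
text = "Text with a UTF-8 BOM"

# Option 1: stay in bytes by encoding the text.
as_bytes = bom + text.encode("utf-8")

# Option 2: stay in str by decoding the bytes.
as_text = bom.decode("utf-8") + text

# Mixing the two directly raises TypeError.
try:
    bom + text
except TypeError as exc:
    error_message = str(exc)

# The standard 'utf-8-sig' codec strips a leading BOM while decoding.
without_bom = as_bytes.decode("utf-8-sig")

print(error_message)
print(without_bom)
```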

The b'...' notation is somewhat confusing in that it allows the bytes 0x01-0x7F to be specified with ASCII characters instead of hex numbers.

>>> b'A' == b'\x41'
True

But I must emphasize, a character is not a byte.

>>> 'A' == b'A'
False
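Another way to see the difference, sketched here with made-up sample values: indexing a bytes object in Python 3 gives you an int, whereas indexing a str gives you a one-character str.

```python
data = b"A\x42C"   # the same as b"ABC"
text = "ABC"

first_byte = data[0]    # an int, not a one-byte bytes object
first_char = text[0]    # a str of length 1
one_byte = data[0:1]    # slicing keeps the bytes type

# A bytes object behaves like a sequence of small integers.
as_ints = list(data)
rebuilt = bytes([65, 66, 67])

print(first_byte, first_char, one_byte)
print(as_ints, rebuilt == data)
```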

In Python 2.x

Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there was:

  • unicode = u'...' literals = sequence of Unicode characters = 3.x str
  • str = '...' literals = sequences of confounded bytes/characters
    • Usually text, encoded in some unspecified encoding.
    • But also used to represent binary data like struct.pack output.

To ease the 2.x-to-3.x transition, the b'...' literal syntax was backported to Python 2.6 to allow distinguishing binary strings (which should be bytes in 3.x) from text strings (which should be str in 3.x). The b prefix does nothing in 2.x, but tells the 2to3 script not to convert it to a Unicode string in 3.x.

So yes, b'...' literals in Python have the same purpose that they do in PHP.

Also, just out of curiosity, are there more symbols than the b and u that do other things?

The r prefix creates a raw string (e.g., r'\t' is a backslash + t instead of a tab), and triple quotes '''...''' or """...""" allow multi-line string literals.
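A quick sketch of those prefixes in Python 3 (the variable names are just for illustration). The r prefix can also be combined with b, in which case backslash escapes are left alone in a bytes literal too:

```python
raw = r"\t"       # two characters: a backslash and the letter t
cooked = "\t"     # one character: a tab

# r and b can be combined; the escape \x41 is NOT interpreted here.
raw_bytes = rb"\x41"
plain_bytes = b"\x41"

# Triple quotes allow a literal to span several lines.
multi = """line one
line two"""

print(len(raw), len(cooked))
print(len(raw_bytes), len(plain_bytes))
print(multi.count("\n"))
```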

To quote the Python 2.x documentation:

A prefix of 'b' or 'B' is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A 'u' or 'b' prefix may be followed by an 'r' prefix.

The Python 3 documentation states:

Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.
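To illustrate the ASCII-only rule from that quote, here is a small sketch. It uses the built-in compile() to feed a bytes literal containing a non-ASCII character to the parser at run time, so the rest of the example still runs; the "<example>" filename and the variable names are arbitrary.

```python
# Non-ASCII bytes have to be written as escapes.
escaped = b"caf\xc3\xa9"
decoded = escaped.decode("utf-8")

# Source text for a bytes literal with a raw non-ASCII character in it.
bad_source = "b'caf\u00e9'"

# The parser rejects it, so compiling raises SyntaxError.
try:
    compile(bad_source, "<example>", "eval")
except SyntaxError:
    rejected = True
else:
    rejected = False

print(decoded, rejected)
```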


The b denotes a byte string.

Bytes are the actual data. Strings are an abstraction.

If you had a multi-character string object and you took a single character, it would be a string, and it might be more than 1 byte in size depending on the encoding.

If you took 1 byte from a byte string, you'd get a single 8-bit value from 0-255, and it might not represent a complete character if the encoding uses more than 1 byte for that character.

TBH I'd use strings unless I had some specific low-level reason to use bytes.
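A short sketch of that point, assuming UTF-8 as the encoding and a sample string of my own: one character can occupy several bytes, and a single byte taken out of the middle does not decode on its own.

```python
text = "naïve €"
data = text.encode("utf-8")

# One character from the str is still a complete character.
char = text[2]

# In UTF-8 that character is stored as two bytes.
char_bytes = data[2:4]

# Taking only the first of those two bytes gives an incomplete sequence.
try:
    data[2:3].decode("utf-8")
except UnicodeDecodeError:
    incomplete = True
else:
    incomplete = False

print(len(text), len(data))
print(char, char_bytes, incomplete)
```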
