How is split(n) method in Python implemented?

How is split(n) method in Python implemented?

The implementation of str.split() internally calls list.append(), which in turn calls the internal function list_resize(). From a comment in the source code of this function:

This over-allocates proportional to the list size, making room
for additional growth. The over-allocation is mild, but is
enough to give linear-time amortized behavior over a long
sequence of appends() in the presence of a poorly-performing
system realloc().

The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, …

If youre looking for the actual code implementing it, try this:
http://svn.python.org/view/python/trunk/Objects/stringlib/split.h?view=markup

For the basic split start looking around line 148.

Short summary: They cycle through the string looking for the defined split character, then add to the output tuple the string between the last find and the current one (or start of string for the 1st case) using PyList_Append. At the end they add the remainder of the string to the tuple.

They have placeholders in there to allocate more space to the results tuple when it reaches the current max size, as well as separate functions for checking a single split character versus another split string (i.e. if you wanted to split on /t as two characters you could, via a separate function).

How is split(n) method in Python implemented?

I think (though I havent re-checked the code) that the split() method counts the number of newline in the string and then just allocates a list of the correct size.

However, all Python lists overallocate so repeatedly appending to them is amortized linear time.

Leave a Reply

Your email address will not be published. Required fields are marked *