python – Any reason not to use + to concatenate two strings?

python – Any reason not to use + to concatenate two strings?

There is nothing wrong in concatenating two strings with +. Indeed its easier to read than .join([a, b]).

You are right though that concatenating more than 2 strings with + is an O(n^2) operation (compared to O(n) for join) and thus becomes inefficient. However this has not to do with using a loop. Even a + b + c + ... is O(n^2), the reason being that each concatenation produces a new string.

CPython2.4 and above try to mitigate that, but its still advisable to use join when concatenating more than 2 strings.

Plus operator is perfectly fine solution to concatenate two Python strings. But if you keep adding more than two strings (n > 25) , you might want to think something else.

.join([a, b, c]) trick is a performance optimization.

python – Any reason not to use + to concatenate two strings?

The assumption that one should never, ever use + for string concatenation, but instead always use .join may be a myth. It is true that using + creates unnecessary temporary copies of immutable string object but the other not oft quoted fact is that calling join in a loop would generally add the overhead of function call. Lets take your example.

Create two lists, one from the linked SO question and another a bigger fabricated

>>> myl1 = [A,B,C,D,E,F]
>>> myl2=[chr(random.randint(65,90)) for i in range(0,10000)]

Lets create two functions, UseJoin and UsePlus to use the respective join and + functionality.

>>> def UsePlus():
    return [myl[i] + myl[i + 1] for i in range(0,len(myl), 2)]

>>> def UseJoin():
    [.join((myl[i],myl[i + 1])) for i in range(0,len(myl), 2)]

Lets run timeit with the first list

>>> myl=myl1
>>> t1=timeit.Timer(UsePlus(),from __main__ import UsePlus)
>>> t2=timeit.Timer(UseJoin(),from __main__ import UseJoin)
>>> print %.2f usec/pass % (1000000 * t1.timeit(number=100000)/100000)
2.48 usec/pass
>>> print %.2f usec/pass % (1000000 * t2.timeit(number=100000)/100000)
2.61 usec/pass
>>> 

They have almost the same runtime.

Lets use cProfile

>>> myl=myl2
>>> cProfile.run(UsePlus())
         5 function calls in 0.001 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.001    0.001 <pyshell#1376>:1(UsePlus)
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method disable of _lsprof.Profiler objects}
        1    0.000    0.000    0.000    0.000 {range}


>>> cProfile.run(UseJoin())
         5005 function calls in 0.029 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.015    0.015    0.029    0.029 <pyshell#1388>:1(UseJoin)
        1    0.000    0.000    0.029    0.029 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method disable of _lsprof.Profiler objects}
     5000    0.014    0.000    0.014    0.000 {method join of str objects}
        1    0.000    0.000    0.000    0.000 {range}

And it looks that using Join, results in unnecessary function calls which could add to the overhead.

Now coming back to the question. Should one discourage the use of + over join in all cases?

I believe no, things should be taken into consideration

  1. Length of the String in Question
  2. No of Concatenation Operation.

And off-course in a development pre-mature optimization is evil.

Leave a Reply

Your email address will not be published. Required fields are marked *