Does Python have a string contains substring method?
You can use the in operator:
if "blah" not in somestring:
    continue
If it's just a substring search, you can use string.find("substring").
You do have to be a little careful with find, index, and in though, as they are substring searches. In other words, this:
s = "This be a string"
if s.find("is") == -1:
    print("No 'is' here!")
else:
    print("Found 'is' in the string.")
It would print Found 'is' in the string. because "is" occurs inside the word "This".
Similarly, if "is" in s: would evaluate to True. This may or may not be what you want.
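If a whole-word match is what you want instead, one option is a regular expression with word boundaries. This is only a minimal sketch: the helper name contains_word is hypothetical and not part of the answer above, and it assumes the word being searched for starts and ends with a letter, digit, or underscore.

```python
import re

def contains_word(text, word):
    """Return True if `word` occurs in `text` as a whole word (hypothetical helper)."""
    # \b matches at a word boundary; re.escape protects any regex metacharacters in `word`.
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None

s = "This be a string"
print("is" in s)               # True: "is" is a substring of "This"
print(contains_word(s, "is"))  # False: "is" never appears as its own word
print(contains_word(s, "be"))  # True
```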
Does Python have a string contains substring method?
99% of use cases will be covered using the keyword in, which returns True or False:
substring in any_string
For the use case of getting the index, use str.find (which returns -1 on failure, and has optional positional arguments):
start = 0
stop = len(any_string)
any_string.find(substring, start, stop)
or str.index (like find but raises ValueError on failure):
start = 100
end = 1000
any_string.index(substring, start, end)
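To make the three options concrete, here is a small sketch that runs them side by side. The sample values assigned to any_string and substring are made up for illustration.

```python
any_string = "a string with a substring inside"  # illustrative sample value
substring = "substring"

# Membership test: returns a bool.
print(substring in any_string)            # True

# find returns the lowest index of a match, or -1 if there is none.
print(any_string.find(substring))         # 16
print(any_string.find("missing"))         # -1

# The optional start/stop arguments restrict the search window.
print(any_string.find("string"))          # 2
print(any_string.find("string", 3))       # 19
print(any_string.find("string", 3, 10))   # -1

# index behaves like find, but raises ValueError when nothing is found.
try:
    any_string.index("missing")
except ValueError as exc:
    print(f"index raised: {exc}")
```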
Explanation
Use the in comparison operator because
- the language intends its usage, and
- other Python programmers will expect you to use it.
>>> 'foo' in '**foo**'
True
The opposite (complement), which the original question asked for, is not in:
>>> 'foo' not in '**foo**' # returns False
False
This is semantically the same as not 'foo' in '**foo**', but it's much more readable and explicitly provided for in the language as a readability improvement.
Avoid using __contains__
The __contains__ method implements the behavior for in. This example,
str.__contains__('**foo**', 'foo')
returns True. You could also call this function from the instance of the superstring:
'**foo**'.__contains__('foo')
But don't. Methods that start with underscores are considered semantically non-public. The only reason to use this is when implementing or extending the in and not in functionality (e.g. if subclassing str):
class NoisyString(str):
    def __contains__(self, other):
        print(f'testing if "{other}" in "{self}"')
        return super(NoisyString, self).__contains__(other)

ns = NoisyString('a string with a substring inside')
and now:
>>> 'substring' in ns
testing if "substring" in "a string with a substring inside"
True
Don't use find and index to test for "contains"
Don't use the following string methods to test for "contains":
>>> '**foo**'.index('foo')
2
>>> '**foo**'.find('foo')
2
>>> '**oo**'.find('foo')
-1
>>> '**oo**'.index('foo')
Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    '**oo**'.index('foo')
ValueError: substring not found
Other languages may have no methods to directly test for substrings, and so you would have to use these types of methods, but with Python, it is much more efficient to use the in comparison operator.
Also, these are not drop-in replacements for in. You may have to handle the exception or -1 cases, and if they return 0 (because they found the substring at the beginning) the boolean interpretation is False instead of True.
If you really mean not any_string.startswith(substring) then say it.
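The truthiness pitfall described above can be seen in a short sketch. The sample string and the variable names are made up for illustration.

```python
text = "foo bar"  # illustrative sample value

# find returns 0 here because the match starts at the beginning of the string.
# 0 is falsy, so this test wrongly reports "not found".
if text.find("foo"):
    buggy = "found"
else:
    buggy = "not found"

# find returns -1 for a missing substring, and -1 is truthy,
# so the same kind of test wrongly reports "found".
also_buggy = "found" if text.find("xyz") else "not found"

# The in operator gives the intended answer directly.
correct = "found" if "foo" in text else "not found"

print(buggy, also_buggy, correct)  # not found found found
```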
Performance comparisons
We can compare various ways of accomplishing the same goal.
import timeit

def in_(s, other):
    return other in s

def contains(s, other):
    return s.__contains__(other)

def find(s, other):
    return s.find(other) != -1

def index(s, other):
    try:
        s.index(other)
    except ValueError:
        return False
    else:
        return True

perf_dict = {
    'in:True': min(timeit.repeat(lambda: in_('superstring', 'str'))),
    'in:False': min(timeit.repeat(lambda: in_('superstring', 'not'))),
    '__contains__:True': min(timeit.repeat(lambda: contains('superstring', 'str'))),
    '__contains__:False': min(timeit.repeat(lambda: contains('superstring', 'not'))),
    'find:True': min(timeit.repeat(lambda: find('superstring', 'str'))),
    'find:False': min(timeit.repeat(lambda: find('superstring', 'not'))),
    'index:True': min(timeit.repeat(lambda: index('superstring', 'str'))),
    'index:False': min(timeit.repeat(lambda: index('superstring', 'not'))),
}
And now we see that using in is much faster than the others. Less time to do an equivalent operation is better:
>>> perf_dict
{'in:True': 0.16450627865128808,
 'in:False': 0.1609668098178645,
 '__contains__:True': 0.24355481654697542,
 '__contains__:False': 0.24382793854783813,
 'find:True': 0.3067379407923454,
 'find:False': 0.29860888058124146,
 'index:True': 0.29647137792585454,
 'index:False': 0.5502287584545229}
How can in be faster than __contains__ if in uses __contains__?
This is a fine follow-on question.
Let's disassemble functions with the methods of interest:
>>> from dis import dis
>>> dis(lambda: 'a' in 'b')
  1           0 LOAD_CONST               1 ('a')
              2 LOAD_CONST               2 ('b')
              4 COMPARE_OP               6 (in)
              6 RETURN_VALUE
>>> dis(lambda: 'b'.__contains__('a'))
  1           0 LOAD_CONST               1 ('b')
              2 LOAD_METHOD              0 (__contains__)
              4 LOAD_CONST               2 ('a')
              6 CALL_METHOD              1
              8 RETURN_VALUE
So we see that the .__contains__ method has to be separately looked up and then called from the Python virtual machine. This should adequately explain the difference.
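The exact opcode names depend on the CPython version (for example, CPython 3.9 and later emit CONTAINS_OP rather than COMPARE_OP for in), so the listing on your interpreter may look different. A minimal sketch for checking this yourself follows. The helper opnames is hypothetical, and the lambdas take arguments instead of using string constants, which is a small departure from the listing above.

```python
import dis

def opnames(func):
    """Return the bytecode instruction names for func (hypothetical helper)."""
    return [ins.opname for ins in dis.get_instructions(func)]

in_ops = opnames(lambda a, b: a in b)
method_ops = opnames(lambda a, b: b.__contains__(a))

# The operator form compiles to a single dedicated instruction,
# while the explicit form needs an attribute lookup plus a call instruction.
print(in_ops)
print(method_ops)
```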