python – Find the similarity metric between two strings
There is a built-in for this: SequenceMatcher from the standard library's difflib module.
from difflib import SequenceMatcher
def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()
Using it:
>>> similar("Apple", "Appel")
0.8
>>> similar("Apple", "Mango")
0.0
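In case you wonder where the 0.8 comes from: per the difflib docs, ratio() is 2.0*M/T, where M is the number of matched characters and T is the combined length of both strings. Here is a minimal sketch that recomputes it from the matching blocks. The name ratio_by_hand is made up for illustration and is not part of difflib.

```python
from difflib import SequenceMatcher

def ratio_by_hand(a, b):
    """Recompute SequenceMatcher.ratio() as 2.0 * M / T (hypothetical helper)."""
    matcher = SequenceMatcher(None, a, b)
    # Each matching block has a .size; the final dummy block has size 0.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = len(a) + len(b)
    # Two empty strings are treated as identical, as ratio() does.
    return 2.0 * matched / total if total else 1.0

print(ratio_by_hand("Apple", "Appel"))
```

For "Apple" and "Appel" the matcher finds "App" plus one more character, so M is 4 and T is 10.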
You may be looking for an algorithm that measures the distance between two strings. Here are some options you can refer to:
Solution #1: Python builtin
Use SequenceMatcher from difflib.
pros:
part of the Python standard library, so no extra package is needed.
cons:
limited to a single similarity measure; there are many other good string-similarity algorithms out there.
example:
>>> from difflib import SequenceMatcher
>>> s = SequenceMatcher(None, "abcd", "bcde")
>>> s.ratio()
0.75
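The None passed as the first argument is the optional isjunk function; None means no element is treated as junk. A small sketch, assuming you want blanks ignored as match anchors. The helper name similar_ignoring_spaces is hypothetical.

```python
from difflib import SequenceMatcher

def similar_ignoring_spaces(a, b):
    # isjunk returns True for elements the matcher should not start a match on;
    # here a single blank counts as junk (an assumption for this example).
    return SequenceMatcher(lambda ch: ch == " ", a, b).ratio()

score = similar_ignoring_spaces(
    "private Thread currentThread;",
    "private volatile Thread currentThread;",
)
print(round(score, 3))
```

Whether junk filtering helps depends on your data, so compare the result against the plain None version before relying on it.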
Solution #2: jellyfish library
It's a very good library with good coverage and few issues.
It supports:
– Levenshtein Distance
– Damerau-Levenshtein Distance
– Jaro Distance
– Jaro-Winkler Distance
– Match Rating Approach Comparison
– Hamming Distance
pros:
easy to use, wide range of supported algorithms, well tested.
cons:
not part of the standard library, so it has to be installed separately.
example:
>>> import jellyfish
>>> jellyfish.levenshtein_distance(u'jellyfish', u'smellyfish')
2
>>> # newer jellyfish releases name this function jaro_similarity
>>> jellyfish.jaro_distance(u'jellyfish', u'smellyfish')
0.89629629629629637
>>> jellyfish.damerau_levenshtein_distance(u'jellyfish', u'jellyfihs')
1
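If you cannot install jellyfish, the first metric in the list is easy to write yourself. This is a minimal pure-Python sketch of the textbook Levenshtein dynamic-programming algorithm, not jellyfish's own implementation. The function name levenshtein is mine.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # previous[j] is the distance between the prefix of a seen so far and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]

print(levenshtein("jellyfish", "smellyfish"))
```

Note that plain Levenshtein counts the swapped letters in "jellyfish" and "jellyfihs" as two edits, while Damerau-Levenshtein counts a transposition as one, which is why the jellyfish example above returns 1.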