How to use SequenceMatcher to find similarity between two strings?

Developer https://www.devze.com 2023-02-07 06:10 Source: Internet
import difflib

a='abcd'
b='ab123'
seq=difflib.SequenceMatcher(a=a.lower(),b=b.lower())
seq=difflib.SequenceMatcher(a,b)
d=seq.ratio()*100
print d

I used the code above, but the output is 0.0. How can I get a valid result?


You forgot the first parameter to SequenceMatcher.

>>> import difflib
>>> 
>>> a='abcd'
>>> b='ab123'
>>> seq=difflib.SequenceMatcher(None, a,b)
>>> d=seq.ratio()*100
>>> print d
44.4444444444

http://docs.python.org/library/difflib.html
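
For reference, the 44.44 figure follows from how the difflib documentation defines ratio(): 2.0*M/T, where M is the number of matched elements and T is the total number of elements in both sequences. Here the only match is 'ab', so M = 2 and T = 4 + 5 = 9. A minimal sketch, assuming Python 3 (hence print() instead of the print statement used above); the names matches and total are only illustrative:

```python
import difflib

a = 'abcd'
b = 'ab123'
seq = difflib.SequenceMatcher(None, a, b)

# get_matching_blocks() returns (a_start, b_start, size) triples;
# the final triple is a zero-length sentinel, so it adds nothing to the sum.
matches = sum(block.size for block in seq.get_matching_blocks())
total = len(a) + len(b)

print(matches)                 # number of matched characters ('ab')
print(2.0 * matches / total)   # same value that seq.ratio() returns
print(seq.ratio() * 100)       # the percentage computed in the answer
```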


From the docs:

The SequenceMatcher class has this constructor:

class difflib.SequenceMatcher(isjunk=None, a='', b='', autojunk=True)

The problem in your code is that by doing

seq=difflib.SequenceMatcher(a,b)

you are passing a as the value for isjunk and b as the value for a, leaving b at its default value of ''. With an empty second sequence there is nothing to match, so the ratio is 0.0.
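
The misassignment can be checked by inspecting the matcher's a and b attributes. The sketch below assumes Python 3; the names wrong and right are only illustrative. It also suggests that the first call in the question, which used keyword arguments, would have worked on its own had it not been overwritten by the positional call on the next line:

```python
import difflib

a = 'abcd'
b = 'ab123'

# Positional call from the question: a lands in isjunk, b lands in a.
wrong = difflib.SequenceMatcher(a, b)
print(repr(wrong.a), repr(wrong.b))   # the second sequence is still ''
print(wrong.ratio())

# Keyword arguments skip isjunk, so both sequences are set as intended.
right = difflib.SequenceMatcher(a=a.lower(), b=b.lower())
print(right.ratio() * 100)
```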

One way to fix this (already mentioned by Lennart) is to pass None explicitly as the first argument, so that a and b are assigned to the correct parameters.

However, I just found another solution worth mentioning: it doesn't touch the isjunk argument at all, but uses the set_seqs() method to specify the two sequences.

>>> import difflib
>>> a = 'abcd'
>>> b = 'ab123'
>>> seq = difflib.SequenceMatcher()
>>> seq.set_seqs(a.lower(), b.lower())
>>> d = seq.ratio()*100
>>> print d
44.44444444444444
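
The setter approach can also help when one string is compared against several others. The difflib documentation notes that SequenceMatcher caches information about the second sequence, so it suggests calling set_seq2() once and set_seq1() repeatedly. A rough sketch, assuming Python 3; the helper name similarity_percent and the sample candidates are hypothetical, not part of difflib:

```python
import difflib

def similarity_percent(candidates, target):
    """Return {candidate: similarity to target as a percentage}."""
    matcher = difflib.SequenceMatcher()
    matcher.set_seq2(target.lower())      # cached once for all comparisons
    scores = {}
    for candidate in candidates:
        matcher.set_seq1(candidate.lower())
        scores[candidate] = matcher.ratio() * 100
    return scores

scores = similarity_percent(['abcd', 'ab123', 'xyz'], 'ab123')
for name, score in scores.items():
    print(name, score)
```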