Is there a hash function that is stable to small changes in text? I'm looking for the opposite of a cryptographic hash, where small changes in the source lead to huge changes in the result.
Something like a perceptual hash for text. Is there such a thing?
Edited: by "small changes i开发者_StackOverflow中文版n text" I mean changes in punctuation, correction of ortographic / grammatical mistakes, etc. The text itself is an article, like a wikipedia entry (but it can be much smaller, like 2 or 3 paragraphs).
Bonus points if somebody can point to a Python implementation.
You're looking for locality sensitive hashing.
精彩评论