I have an application that lets users publish unstructured keywords. Simultaneously, other users can publish items that must be matched to one or more specified keywords. There is no restriction on the keywords either set of users may use, so simply hoping for an exact collision is likely to produce very few matches, when in reality users might have used different keywords for the same thing, or keywords that are close enough (e.g., 'bicycles' and 'cycling', or 'meat' and 'food').
I need this to work on mobile devices (Android), so I'm happy to sacrifice matching accuracy for efficiency and a small footprint. I know about s-match but this relies on a backing dictionary of 15MB, so it isn't ideal.
What other ideas/approaches/frameworks might help with this?
Your example of 'bicycles' and 'cycling' could be addressed by a take on the Levenshtein edit-distance algorithm, since the two words are related in spelling. But your example of 'meat' and 'food' would indeed require a sizable backing dictionary, unless of course the concept set or target audience is limited to, say, foodies.
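As a rough illustration of the edit-distance idea, here is a minimal Java sketch. The class name `EditDistance`, the helper methods, and the similarity threshold are all my own assumptions for the example, not part of any library. It uses the standard two-row dynamic-programming form, so memory use stays proportional to the shorter dimension of the table, which should suit a mobile device.

```java
/** Illustrative sketch only; names and threshold are assumptions. */
final class EditDistance {
    private EditDistance() {}

    /** Number of single-character inserts, deletes, or substitutions needed to turn a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty prefix of a into the first j characters of b takes j inserts.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Reuse the two rows instead of allocating a full table.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Distance scaled to the range 0..1, where 1 means identical strings. */
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0;
        }
        return 1.0 - (double) levenshtein(a, b) / longest;
    }

    /** Case-insensitive check against a caller-chosen threshold. */
    static boolean isLikelyMatch(String a, String b, double threshold) {
        return similarity(a.toLowerCase(), b.toLowerCase()) >= threshold;
    }
}
```

The threshold would need tuning against real keywords. Note that the raw distance between 'bicycles' and 'cycling' is fairly large relative to their length, so a variant that first strips common prefixes and suffixes (a crude stemmer) and then compares the stems may work better than plain edit distance on the whole words.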
Have you considered hosting the dictionary as a web service and accessing the data as needed? The drawback of course is that your app would only work while in network coverage.
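One way to soften the coverage drawback might be to bundle a very small, domain-specific table on the device and consult the hosted dictionary only when the network is available. The sketch below assumes a hypothetical `RemoteDictionary` interface standing in for whatever web service you would build; it is not a real API, and `RelatedTerms` and its methods are likewise invented for illustration.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; all names here are assumptions. */
final class RelatedTerms {

    /** Hypothetical stand-in for the hosted dictionary service. */
    interface RemoteDictionary {
        Set<String> broaderTerms(String keyword) throws IOException;
    }

    // Small on-device table: keyword -> broader or related terms.
    private final Map<String, Set<String>> bundled = new HashMap<>();
    private final RemoteDictionary remote;

    RelatedTerms(RemoteDictionary remote) {
        this.remote = remote;
    }

    /** Registers terms shipped with the app, e.g. addBundled("meat", "food"). */
    void addBundled(String keyword, String... broader) {
        Set<String> terms = bundled.get(keyword);
        if (terms == null) {
            terms = new HashSet<>();
            bundled.put(keyword, terms);
        }
        Collections.addAll(terms, broader);
    }

    /** Asks the service first; falls back to the bundled table when the call fails. */
    Set<String> lookup(String keyword) {
        try {
            Set<String> fromServer = remote.broaderTerms(keyword);
            // Keep a copy so the answer is still available offline later.
            addBundled(keyword, fromServer.toArray(new String[0]));
            return fromServer;
        } catch (IOException offline) {
            Set<String> local = bundled.get(keyword);
            return local != null ? local : Collections.<String>emptySet();
        }
    }

    /** Two keywords are treated as related if either lists the other as a broader term. */
    boolean related(String a, String b) {
        return a.equals(b) || lookup(a).contains(b) || lookup(b).contains(a);
    }
}
```

With a table this small the app still gives some matches out of coverage, and the server-side dictionary can be as large as needed without affecting the install size. A real implementation would also want timeouts and a bound on the cache size, which are left out here.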