I'm building a service where a user can submit a list of links they recommend or like (articles, sites, etc.), and the system will show them a list of other links they may like as well.
The only approach I can think of is:
- User A submits a list of links (e.g. 10 links)
- The system looks for another user (say User B) whose list shares 80% of its links with User A's list
- The system shows User A the remaining 20% of links from User B's list
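For reference, the steps above could be sketched roughly as follows. This is only a minimal illustration: the function name `recommend_by_user_overlap` is made up, and "80% same links" is assumed here to mean "at least 80% of User A's links also appear in User B's list", which is just one possible reading.

```python
def recommend_by_user_overlap(my_links, other_lists, threshold=0.8):
    """Suggest links from other users whose lists overlap enough with mine.

    Assumption: overlap = (links shared with the other list) / (size of my list).
    """
    mine = set(my_links)
    if not mine:
        return []
    suggestions = []
    for other in other_lists:
        theirs = set(other)
        overlap = len(mine & theirs) / len(mine)
        if overlap >= threshold:
            # Keep only the links I have not submitted myself, without duplicates.
            for link in other:
                if link not in mine and link not in suggestions:
                    suggestions.append(link)
    return suggestions


if __name__ == "__main__":
    user_a = ["a", "b", "c", "d", "e"]
    others = [["a", "b", "c", "d", "x"], ["a", "b", "y", "z", "w"]]
    print(recommend_by_user_overlap(user_a, others))
```

With a strict threshold like this, the result is empty whenever no other list is close enough to User A's.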
Is there a better way to do this? Or is there an open source project that already does the same thing?
Any language is OK, but I'm more into Perl, PHP, Java, and SQL.
Your approach is simple to implement. However, as I read it, you are pairing users, not links. What if you don't find any user whose list matches to within 80%?
I think a better approach would be to build a graph with links as nodes and a "similarity" score on each connection. You compute the score for a pair of links from the number of times the two links appear together in the same list.
When you want to make a recommendation for user A, you take, for each of his links, the highest-scoring connected link(s) that aren't already in his list.
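Here is a rough sketch of that idea, not a finished implementation. The names `build_cooccurrence` and `recommend` are made up for illustration, and summing the scores over all of the user's links (instead of taking the single best neighbour per link) is my own simplification.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(lists):
    """Count how often each pair of links appears together in one list."""
    scores = defaultdict(lambda: defaultdict(int))
    for links in lists:
        # set() so a link repeated inside one list is only counted once.
        for a, b in combinations(sorted(set(links)), 2):
            scores[a][b] += 1
            scores[b][a] += 1
    return scores


def recommend(scores, my_links, top_n=5):
    """Rank links the user does not have by total co-occurrence with their links."""
    mine = set(my_links)
    totals = defaultdict(int)
    for link in mine:
        for neighbor, score in scores.get(link, {}).items():
            if neighbor not in mine:
                totals[neighbor] += score
    # Highest score first; ties broken alphabetically so the output is stable.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [link for link, _ in ranked[:top_n]]


if __name__ == "__main__":
    lists = [["a", "b", "c"], ["a", "b", "d"], ["b", "c", "d"]]
    graph = build_cooccurrence(lists)
    print(recommend(graph, ["a", "c"]))
```

Unlike the 80% rule, this still returns something as long as at least one of the user's links has ever shared a list with a link he doesn't have.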
I think two users having exactly the same links is very unlikely. A better approach would be to download each link, build a word index from the page content, and match on content rather than on the links themselves. Much like web search :)
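A very small sketch of content matching, assuming the pages have already been downloaded and reduced to plain text (fetching and HTML stripping are left out). The helper names are made up, the tokenizer is naive, and bag-of-words cosine similarity is just one simple way to compare content; a real search-style index would be more involved.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Naive word index: lowercase alphanumeric tokens and their counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(liked_texts, candidate_texts, top_n=3):
    """Rank candidate pages (url -> text) by similarity to the liked pages."""
    profile = Counter()
    for text in liked_texts:
        profile.update(word_counts(text))
    scored = [(cosine(profile, word_counts(text)), url)
              for url, text in candidate_texts.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    # Drop candidates that share no words at all with the liked pages.
    return [url for score, url in scored[:top_n] if score > 0]


if __name__ == "__main__":
    liked = ["perl regex tutorial", "perl modules guide"]
    candidates = {
        "http://example.com/perl": "advanced perl regex tricks",
        "http://example.com/garden": "gardening tips for spring",
    }
    print(most_similar(liked, candidates))
```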