开发者

Algorithm for filling a matrix of item, item pairs

开发者 https://www.devze.com 2023-01-21 05:21 出处:网络
Hey guys, I have a sort of speed dating type application (not used for dating, just a similar concept) that compares users and matches them in a round based event.

Hey guys, I have a sort of speed dating type application (not used for dating, just a similar concept) that compares users and matches them in a round based event.

Currently I am storing each user to user comparison (using cosine similarity) and then finding a round in which both users are available. My current set up works fine for smaller scale but I seem to be missing a few matchings in larger data sets.

For example with a setup like so (assuming 6 users, 3 from each group)

Round (User1, User2)
----------------------------
1  (x1,y1)  (x2,y2)  (x3,y3)
2  (x1,y2)  (x2,y3)  (x3,y1)
3  (x1,y3)  (x2,y1)  (x3,y2)

My approach works well right now to ensure I have each user meeting the appropriate user without having overlaps so a user is left out, just not with larger data sets.

My current algorithm

I store a comparison of each user from x to each user from y like so

Round, user1, user2, similarity

And to build the event schedule I simply sort the comparisons by similarity and then iterate over the results, finding an open round for both users, like so:

event.user_maps.all(:order => 'similarity desc').each do |map|
  (1..event.rounds).each do |round|
    if user_free_in_round?(map.user1) and user_free_in_round?(map.user2)
      #creates the pairing and breaks from the loop
    end
  end
end

This isn't exact code but the general algorithm to build the schedule. Does anyone know a better way of filling in a matrix of item pairings where no one item can be in more than one place in the same slot?

EDIT

For some clarification, the issue I am having is that in larger sets my algorithm of placing highest similarity matches first can sometimes result in collisions. What I mean by that is that the users are paired in such a way that they have no other user to meet with.

Like so:

Round (User1, User2)
----------------------------
1  (x1,y1)  (x2,y2)  (x3,y3)
2  (x1,y3)  (x2,nil)  (x3,y1)
3  开发者_开发百科(x1,y2)  (x2,y1)  (x3,y2)

I want to be able to prevent this from happening while preserving the need for higher similar users given higher priority in scheduling.

In real scenarios there are far more matches than there are available rounds and an uneven number of x users to y users and in my test cases instead of getting every round full I will only have about 90% or so of them filled while collisions like the above are causing problems.


I think the question still needs clarification even after edit, but I could be missing something.

As far as I can tell, what you want is that each new round should start with the best possible matching (defined as sum of the cosine similarities of all the matched pairs). After any pair (x_i,y_j) have been matched in a round, they are not eligible for the next round.

You could do this by building a bipartite graph where your Xs are nodes in one side and Ys are nodes in another side, and the edge weight is cosine similarity. Then you find the max weighted match in this graph. For the next rounds, you eliminate the edges that have already been used in previous round and run the matching algorithm again. For details on how to code max weight matching in bipartite graph, see here.

BTW, this solution is not optimum since we are proceeding from one round to next in a greedy fashion. I have a feeling that getting the optimum solution would be NP hard, but I don't have a proof so can't be sure.


I agree that the question still needs clarification. As Amit expressed, I have a gut feeling that this is an NP hard problem, so I am assuming that you are looking for an approximate solution.

That said, I would need more information on the tradeoffs you would be willing to make (and perhaps I'm just missing something in your question). What are the explicit goals of the algorithm?

Is there a lower threshold for similarity below which you don't want a pairing to happen? I'm still a bit confused as to why there would be individuals which could not be paired up at all during a given round...

Essentially, you are performing a search over the space of possible pairings, correct? Maybe you could use backtracking or some form of constraint-based algorithm to make sure that you can obtain a complete solution for a given round...?

0

精彩评论

暂无评论...
验证码 换一张
取 消