I have a set of r reviewers who are rating a set of n objects. Each reviewer independently produces an ordered list of the objects he or she chooses to rank. The goal is to produce one list that is the collation of the various ordered lists. We can assume that each reviewer's point of view is equally weighted.
This differs from most merging and ordered list questions in that there is no global ordering. One reviewer can rate A > B while another can rate B > A. As mentioned, each object is not necessarily rated by each reviewer.
My current thought is to decompose each reviewer's list into a set o开发者_开发技巧f ordered tuples for each of the m * (m-1) * .5 unique pairs of entries in the list, where m is the number of objects rated. Now take all the tuples from all reviewers. For a given combination (a,b) find all such tuples and take the majority vote (of those voting) as the determinator of whether a < b.
Now I have a set of ordered tuples that represents the wisdom of all. But how do I turn these into one ordered list? I can start with a randomly chosen pair of objects, and order them, then add another one in the right order, but the output will depend on which one I choose to start with. Also there may be loops.
I'd appreciate any ideas.
The problem space you're navigating is essentially (isomorphic to) a subset of voting theory, that part where both ballots and outcomes are ordered lists of candidates.
You might benefit from reading the following:
- Condorcet Method -- gives a useful "correctness" criterion.
- Schulze Method -- gives an O(n3) algorithm for getting there.
- Voting System -- gives you other desirable criteria and compares different approaches.
- Borda Count -- essentially the same as the currently accepted answer.
- Arrow's Impossibility Theorem -- shows why you can't both have your cake and eat it too.
Based on my knowledge of voting theory, I'll give you a recommendation: if you reasonably believe that an O(n3) algorithm is workable for your particular data set, try out the Schulze Method. Otherwise, Borda Count is the only listed method which runs in O(n) time and takes a ranking as ballot inputs, so stick with that.
A solution that seems elegant and still does what it needs to do would be to convert each ordering into a score from 1 to 0, where 1 is the first (top) ranked item in a given reviewer's list and 0 is their last (bottom item) and all items in between get a linearly scaled score. So if reviewer 1 ranks just 3 items, they would get scores for that list of 1, 0.5, and 0. Then you simply take the mean score of every item to generate a collated list. Ties could be broken by the number of "reviews" for an item (so that an item unanimously marked as best by 3 reviewers would appear higher in the final list than an item unanimously marked as best by 2 reviewers, etc.)
Your requirement "The goal is to produce one list that is the collation of the various ordered lists. We can assume that each reviewer's point of view is equally weighted." is definitely met by this simple algorithm, but often problems like this have a bit more complex requirements once you dig into them.
Merging rankings is somewhat non-trivial. I think maybe you need to have a clearer idea of what you mean by "the collation of the ordered lists" - i.e. what properties do you want it to have?
See for example these CS course notes from Cornell. Given some reasonably sensible properties that the global ranking should have, it's actually impossible to create an algorithm that will definitely satisfy those properties. You may need to compromise in what you will accept as reasonable properties of your global ranking.
Items: Duke forever Quake UT3 Duke3d Halo
REV1:
- Duke nukem forever
- Duke3d
REV2:
- Duke3d
- UT3
- Quake
REV3:
- Duke3d
- Duke nukem forever
Logical deduction:
- REV1: Duke forever > duke3d #
- REV1: Duke3d < Duke forever ##
- REV2: Duke3d > UT3 %
- REV2: UT3 < Duke3d %
- REV2: Duke3d > Quake %%
- REV2: Quake < Duke3d %%
- REV2: UT3 > Quake %%%
- REV2: Quake < UT3 %%%
- REV3: Duke3d > Duke forever #
- REV3: Duke forever < Duke3d ##
Most voted:
- Duke3d 3
- Duke forever 2
- UT3, Quake 1
Remove # that contradict by creating contradicting pairs and convert to = Accept % that match in pairs. Group with count everything. If there are < and = contradictions the true is the one with more count.
Filtered then sorted by logic:
- Duke3d, Duke forever
- UT3
- Quake
But sorting should be influenced by the voting count (Bayesian sort). That way Duke3d would win.
Halo cannot be placed anywhere because it is not voted...
精彩评论