I have a set of numbers to compare. Let's say that I have to get this comparison from a user. I have a choice of either asking him a question consisting of 2 numbers or 3 numbers of 4 numbers. For instance, I can ask any of the following questions:
- 开发者_Python百科
- Which of the numbers is greater? 2 OR 4
- Which of the numbers is greater? 2 OR 3 OR 4
- Which of the numbers if greater? 2 OR 3 OR 4 OR 5
My goal is to minimize the number of questions I ask the user in some combination that will ultimately give me an ordering of all the numbers in the set... For instance, if there were just 4 numbers {2,3,4,5}, I could've asked him the third type of question where I give him 4 numbers to compare. But in the ecosystem I am designing this for, the user gets annoyed with long questions so I would want to minimize the number of questions of this type that I would ask. So if each of the questions has a particular weight, I a trying to figure out a way to ultimately get the ordering of all numbers but at the same time pose the user with minimum trouble.
Is there a good way of solving this problem? Does anyone see this as belonging to a general class of problems or am I just making it too complex? Any suggestions?
Let's experiment, would you.
1. Example
Let's look at {A,B,C,D}
and sort it.
Solution 1: By sets
- Greater of
{A,B,C,D}
->B
(thusB>A
,B>C
andB>D
) - Greater of
{A,C,D}
->D
(thusD>A
andD>C
) - Greater of
{A,C}
->A
(thusA>C
)
Total order
[B,D,A,C]
Solution 2: By pairs
- Greater between
A
andC
->A
(thusA>C
) - Greater between
B
andD
->B
(thusB>D
) - Greater between
D
andA
->D
Total order
[B,D,A,C]
What's the catch ? Obviously it's more difficult by pairs, here I chose them so that the merge would be easy (none).
2. Remarks
a) Total ordering
The >
only work so well with a total ordering: ie for two given elements A
and B
of a set either A>B
or B>A
. If none of those two relations hold, then A
and B
are equivalent.
The problem with the Solution 1 approach is that if you present the user with {A,B,C,D}
and A
and B
are equivalent and greater than C
and D
... what's the answer supposed to be ?
b) Transitivity
The >
relationship is transitive, meaning that if A>B
and B>C
then A>C
. It's important to use this fact since you can therefore deduce relationships between two elements without ever asking the user.
c) What's the goal ?
Is the goal supposed to minimize the number of questions to the user or supposed to minimize the user's work ? Because obviously it's more difficult for a user to answer questions from the first solution...
3. Modeling
One can model the problem as a "graph" problem.
You begin with a set of nodes {A,B,C,D}
which represent the values you want to test.
Sorting the set is equivalent to computing the minimum set of oriented edges that link these nodes so that given any two nodes, a path leads from one to the other. I do insist of minimum.
For example, if I have 2 edges: B>A
and B>C
, then if I discover than A>C
I shall remove B>C
because I can deduce it by transitivity. The important properties is that if no two nodes are equivalent, the cardinal of the resulting set of edges is the cardinal of the set of nodes minus 1. In my example (given 4 nodes) it would be 3.
An oracle (or an extremely lucky guy) would thus be able to only ask 3 questions, and yet build this graph... that's what we should strive for.
4. How to solve this problem ?
Okay, let's suppose that we have no 2 equivalent elements. This means that if A>B
is false, then I can deduce that B>A
...
For the representation of our little graph, let's take an array:
A B C D
D . > . # . represent the unknown
C . >
B > # < and > have their usual meaning...
A
Because of the symmetry we only need a triangle.
By counting the number .
we can see the number of unknown relationships, the ideal solution is to get rid of all of those .
while asking as few questions as possible to the user.
A good question is therefore one than remove as much .
as possible in one swoop.
5. My question
And thus, I have another type of question:
Select the elements lower than "D" in the following {A,B,C}: _
This question feels better that the What is the greater... because I explicitly target those relationships I wish to know (D?A
, D?B
and D?C
) while the greater guarantees that I will obtain as much relationships but I can't know which in advance.
I try to maximize the usefulness of the question
Of course, it's no leeway for laziness: it's still important to take transitivity into account while exploiting the results of the questions.
6. Exploring larger sets
With larger sets, you can group the elements, and then merge them later on, but the merge is always a messy operation: you will probably get answers you already knew. However it's a great practice for my little question:
Given 2 ordered (disjoint) sets: [A,B,C,D,...]
and [R,S,T,U,...]
let's see the 3 questions of the toolbox:
- Which is greater:
A
orR
? _ - Which is the greatest element of
{A,B,R,S}
? _ Which elements are greater than
A
in{R,S,T}
? _Gives 1 relationship
- Gives 2 relationships: 3 of which 1 is already known
- Gives 3 relationships
The third question asks for a more elaborate answer, but it is much more suitable for merge situations. In the merge case, it's as efficient as asking to sort the elements, since it asks precisely for the only 3 relationships we don't know in the board.
7. Conclusion
You now have 4 questions in your box:
- Sort: Sort the following elements from greater to lower {A,B,C,D} ? _
- Pivot: Which elements are greater than A in this set {B,C,D} ? _
- Greater: Which element is the greater in this set {A,B,C,D} ? _
- Pair: Which is the greater element among A and B ? _
(Pair can be regarded as a degenerated case of either Greater or Pivot)
Given that each question gives n-1
relationships for n
nodes, you could try to gauge how much time it takes for a user to answer a question T(n)
and then find the n
that maximize T(n)/n
;)
If you are sorting the array then you generally sort in O(n log(n)) time. That means that you sometimes compare pairs of numbers more than once. If you ask 4 questions constantly then you get extra information that you can store.
When you get to a point in the sort where you ask a>b, then you may as well ask is a the largest of {b,c,d,e,...}. That way you never have to ask a>c, a>d, etc. If you store these known comparisons in a table then you never have to do them again. As for picking exactly what caparison sets to ask the user I'm not sure, and it's certainly a good question.
My intuition about which sets to put in the question to user user would be to look at how quick-sort works. It splits the data into two parts, and numbers that aren't in the same group are never compared, so that way you know what NOT to put in the set to ask the user. Use 4 numbers as long as possible until the quicksort gets down into groups that have fewer than 4 numbers.
You could just do a merge sort or quick sort algorithm where the comparison function is asking the user the two number version of the question, and then the most questions they'd have to answer is n log n.
Then the question is just, does the ability to ask which is the greatest of 3 or 4 reduce the number of questions you have to ask. My intuition says no but I'd be hard pressed to prove it.
You could present the questions through the algorithms of quicksort or mergesort. Minimize the number of comparisons and guarantee the correct result.
精彩评论