I have a dataset of users' answers to a predefined list of true/false questions. The data looks like this:
+---------+-------------+--------+----+
| user_id | question_id | answer | id |
+-------------------------------------+
| 4 | 110 | 0 | 1 |
| 4 | 111 | 1 | 2 |
| 4 | 112 | 1 | 3 |
| 4 | 113 | 0 | 4 |
|---------+-------------+--------+----|
| 6 | 110 | 0 | 5 |
| 6 | 111 | 1 | 6 |
| 6 | 112 | 0 | 7 |
| 6 | 113 | 0 | 8 |
+---------+-------------+--------+----|
What I need to find are the top 10 best m开发者_运维问答atches for each user (run once for every user in the system). So to be able to find 10 other users in descending order of best match based on answers (i.e. from the example above, user 4 and user 6 are 75% compatible based on their answers).
A couple of constraints on this to hopefully make it easier are:
- Each user will have a minimum of 10 answers to be considered
- Everyone has answered the same first 10 questions
Ideally this should be able to handle people who have answered many various questions that might not be the same for everyone (i.e. they skip questions they don't want to answer.
Thanks for any help on this! I'm really at a loss for what to do.
My first though is to use an IF
. Something like:
SELECT SUM(IF(a.answer=b.answer,1,0)) AS match, b.user_id
FROM data_table AS a
JOIN data_table AS b ON a.question_id = b.question_id
WHERE a.user_id = n
AND b.user_id <> n
GROUP BY b.user_id
ORDER BY match DESC
LIMIT 10
Where n is the user_id you wish to test
精彩评论