What's the best way to find percentage match among users' answers to questions?_问答_开发者

What's the best way to find percentage match among users' answers to questions?

开发者 https://www.devze.com 2023-03-02 22:14 出处：网络

I have a dataset of users\' answers to a predefined list of true/false questions. The data looks like this:

I have a dataset of users' answers to a predefined list of true/false questions. The data looks like this:

+---------+-------------+--------+----+
| user_id | question_id | answer | id |
+-------------------------------------+
|    4    |     110     |    0   | 1  |
|    4    |     111     |    1   | 2  |
|    4    |     112     |    1   | 3  |
|    4    |     113     |    0   | 4  |
|---------+-------------+--------+----|
|    6    |     110     |    0   | 5  |
|    6    |     111     |    1   | 6  |
|    6    |     112     |    0   | 7  |
|    6    |     113     |    0   | 8  |
+---------+-------------+--------+----|

What I need to find are the top 10 best m开发者_运维问答atches for each user (run once for every user in the system). So to be able to find 10 other users in descending order of best match based on answers (i.e. from the example above, user 4 and user 6 are 75% compatible based on their answers).

A couple of constraints on this to hopefully make it easier are:

Each user will have a minimum of 10 answers to be considered
Everyone has answered the same first 10 questions

Ideally this should be able to handle people who have answered many various questions that might not be the same for everyone (i.e. they skip questions they don't want to answer.

Thanks for any help on this! I'm really at a loss for what to do.

My first though is to use an IF. Something like:

SELECT SUM(IF(a.answer=b.answer,1,0)) AS match, b.user_id 
FROM data_table AS a
JOIN data_table AS b ON a.question_id = b.question_id
WHERE a.user_id = n
AND b.user_id <> n
GROUP BY b.user_id
ORDER BY match DESC
LIMIT 10

Where n is the user_id you wish to test