I have an n:n set of data (for example, 'programmers' and 'languages': programmers write code in many languages, and a language can be used by many programmers). This data is in a table programmers_languages.
How do I quickly select programmers who code in all of a set of languages?
More information if this is confusing:
Jon codes in C++, Pascal, and Ruby. Joe codes in C++ and Ruby. Moe codes in Ruby and Pascal. Steve codes in C++ and Pascal.
If the set of languages in question is C++ and Pascal, I would want Jon and Steve from this list.
Note the size of this set can get pretty large, so I don't want to join the table to itself n times.
Any way you shake it, there's going to be a join for each language. You're looking for a value (programmer) for which there exists at least one row for each of another value (language). That means that you need to think about N different perspectives of the same table.
In most cases, it's probably most efficient for you to just do the joins. If the result set is sufficiently dense (that is, most programmers really do speak Python and C++), you could resort to some cleverness. First query the disjunction, but uniquely, then group the resulting relation by programmer and filter out the ones that speak too few languages:
SELECT programmer
FROM ( SELECT DISTINCT programmer, language
       FROM speaks_table
       WHERE language IN ('C++', 'python') ) AS disjunction
GROUP BY disjunction.programmer
HAVING COUNT(disjunction.language) = 2
But whether this outperforms a regular ol' multiway join is going to depend on the exact data in question. This at least has the advantage of not requiring a differently shaped, generated query for each number of languages in question.
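To make the group-and-count idea concrete, here is a minimal sketch that runs it against the sample data from the question. It assumes SQLite via Python's standard sqlite3 module purely for illustration; the helper names programmers_knowing_all and build_sample_db are hypothetical, and speaks_table with its two columns is the table assumed in the query above.

```python
import sqlite3


def programmers_knowing_all(conn, languages):
    """Return programmers who have a row for every language in `languages`."""
    wanted = sorted(set(languages))  # de-duplicate so the expected count is right
    if not wanted:
        return []
    # One "?" per language; the values themselves are bound as parameters.
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT programmer
        FROM (SELECT DISTINCT programmer, language
              FROM speaks_table
              WHERE language IN ({placeholders})) AS disjunction
        GROUP BY programmer
        HAVING COUNT(language) = ?
        ORDER BY programmer
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


def build_sample_db():
    """In-memory table holding the example from the question (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE speaks_table (programmer TEXT, language TEXT)")
    rows = [
        ("Jon", "C++"), ("Jon", "Pascal"), ("Jon", "Ruby"),
        ("Joe", "C++"), ("Joe", "Ruby"),
        ("Moe", "Ruby"), ("Moe", "Pascal"),
        ("Steve", "C++"), ("Steve", "Pascal"),
    ]
    conn.executemany("INSERT INTO speaks_table VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = build_sample_db()
    print(programmers_knowing_all(conn, ["C++", "Pascal"]))  # ['Jon', 'Steve']
```

Only the list of placeholders grows with the number of languages; the shape of the query stays the same, which is the advantage mentioned above.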
edit: this was my first answer and doesn't work for the question: it returns programmers who code in any of the listed languages, not all of them.
Assuming the table Programmers_Languages has two VARCHAR columns, one called Programmer and the other called Languages:
SELECT DISTINCT Programmer
FROM Programmers_Languages
WHERE Languages IN ('C++', 'Pascal')
ORDER BY Programmer
DISTINCT so that you only get each result once. ORDER BY if you want it sorted alphabetically.
edit: different query, this works.
SELECT Programmer
FROM Programmers_Languages
WHERE Languages IN ('C++', 'Pascal')
GROUP BY Programmer
HAVING COUNT(DISTINCT Languages) = 2
ORDER BY Programmer
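One caveat with counting: if the table can contain the same programmer/language pair more than once, counting rows can overcount, while counting distinct languages cannot. A small sketch of the difference, assuming SQLite through Python's sqlite3 module and a deliberately duplicated row; the matches helper is hypothetical and only exists for this comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Programmers_Languages (Programmer VARCHAR, Languages VARCHAR)"
)
conn.executemany(
    "INSERT INTO Programmers_Languages VALUES (?, ?)",
    [
        ("Jon", "C++"), ("Jon", "Pascal"),
        ("Joe", "C++"), ("Joe", "C++"),  # duplicate row, inserted on purpose
        ("Joe", "Ruby"),
    ],
)


def matches(count_expr):
    """Run the grouped query with the given HAVING expression.

    `count_expr` is a fixed string from this script, never user input.
    """
    sql = f"""
        SELECT Programmer
        FROM Programmers_Languages
        WHERE Languages IN ('C++', 'Pascal')
        GROUP BY Programmer
        HAVING {count_expr}
        ORDER BY Programmer
    """
    return [row[0] for row in conn.execute(sql)]


rows_counted = matches("COUNT(*) >= 2")
distinct_counted = matches("COUNT(DISTINCT Languages) = 2")

print(rows_counted)      # ['Joe', 'Jon']  (Joe matches only because of the duplicate)
print(distinct_counted)  # ['Jon']
```

If the table has a unique constraint on the pair, both forms give the same answer.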
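Looks like TokenMacGuy came up with something very similar. I'm assuming that the list of languages and the count of languages will be inserted into this query by some other code. If you're building the query dynamically, you could instead generate one condition per language. A single row can never satisfy Languages = 'C++' AND Languages = 'Pascal' at the same time, so each condition has to be its own EXISTS subquery; whether this is quicker will depend on your database and indexes:
SELECT DISTINCT p.Programmer
FROM Programmers_Languages p
WHERE EXISTS (SELECT 1 FROM Programmers_Languages x WHERE x.Programmer = p.Programmer AND x.Languages = 'C++')
AND EXISTS (SELECT 1 FROM Programmers_Languages x WHERE x.Programmer = p.Programmer AND x.Languages = 'Pascal')
AND <...>
ORDER BY p.Programmer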
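For the dynamically built variant, here is a rough sketch of how the generating code might look. It emits one EXISTS subquery per language, since no single row can match two different languages at once. SQLite via Python's sqlite3 module is an assumption for illustration, and build_all_languages_query, find_programmers and make_sample_db are hypothetical names.

```python
import sqlite3


def build_all_languages_query(n_languages):
    """Build SQL with one EXISTS clause per language; values are bound later."""
    if n_languages < 1:
        raise ValueError("need at least one language")
    exists_clause = (
        "EXISTS (SELECT 1 FROM Programmers_Languages AS x "
        "WHERE x.Programmer = p.Programmer AND x.Languages = ?)"
    )
    where = "\n  AND ".join([exists_clause] * n_languages)
    return (
        "SELECT DISTINCT p.Programmer\n"
        "FROM Programmers_Languages AS p\n"
        f"WHERE {where}\n"
        "ORDER BY p.Programmer"
    )


def find_programmers(conn, languages):
    """Return programmers who code in every language in `languages`."""
    sql = build_all_languages_query(len(languages))
    return [row[0] for row in conn.execute(sql, list(languages))]


def make_sample_db():
    """In-memory copy of the example data from the question (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Programmers_Languages (Programmer VARCHAR, Languages VARCHAR)"
    )
    conn.executemany(
        "INSERT INTO Programmers_Languages VALUES (?, ?)",
        [
            ("Jon", "C++"), ("Jon", "Pascal"), ("Jon", "Ruby"),
            ("Joe", "C++"), ("Joe", "Ruby"),
            ("Moe", "Ruby"), ("Moe", "Pascal"),
            ("Steve", "C++"), ("Steve", "Pascal"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(find_programmers(make_sample_db(), ["C++", "Pascal"]))  # ['Jon', 'Steve']
```

The language values are passed as bound parameters rather than pasted into the SQL text, so only the number of clauses is generated.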