I have a mysql table which stores maintenance logs for sensors. I'd like to design a query that finds instances where a given sensor was repaired/maintained for the same reason. (Recurring problem finder.)
My table (simplified) looks like this:
id name mask
== ==== ====
11 alpha 0011
12 alpha 0010
13 alpha 0100
14 beta 0001
The mask field is a bitmask where each position represents a particular type of repair. I was able to successfully figure out how to compare the bitmask (per this question) but trying to incorporate it into a query is proving more difficult than I thought.
Given the above sample records, only id's 11 and 12 apply, since they both have a 1
in the third mask position.
Here's what I've tried and why it didn't work:
1. Never finishes... This query seems to run forever, I don't think it is working the way I want.
SELECT t1.id, t1.name
FROM data t1
LEFT OUTER JOIN dat开发者_运维知识库a t2
ON (CONV(t1.mask,2,10) & CONV(t2.mask,2,10) > 0)
GROUP BY t1.name
HAVING COUNT(*) >1;
2. Incomplete query...
I was thinking of creating a view, to only parse the sensors that actually have more than one entry in the table. I wasn't sure where to go from here.SELECT COUNT(t1.name) AS times, t1.name, t1.id, t1.mask
FROM data AS t1
GROUP BY t1.name ASC
HAVING times > 1;
Any suggestions on this?
Since the database structure was not designed with the realities of RDBMs in mind (probably not your doing, I just have to make the point anyway…), the performance will always be poor, though it is possible to write a query that will finish.
Jim is correct in that the query results in a cartesian product. If that query were to be returned ungrouped and unfiltered, you could expect (SELECT POW(COUNT(*), 2) FROM data)
results. Also, any form of outer join is unnecessary, so a standard inner join is what you want here (not that it ought to make a difference in terms of performance, it's just more appropriate).
Also another condition of the join, t1.id != t2.id
is necessary, lest each record match itself.
SELECT t1.id, t1.name
FROM data t1
JOIN data t2
ON t1.name = t2.name
AND t1.id != t2.id //
WHERE CONV(t1.mask, 2, 10) & CONV(t2.mask, 2, 10) > 0
GROUP BY t1.name
HAVING COUNT(*) > 1;
Your incomplete query:
SELECT t1.id, t1.name, t1.mask
FROM data t1
WHERE t1.name IN (SELECT t2.name FROM data t2 GROUP BY t2.name HAVING COUNT(*) > 1);
SELECT t1.id, t1.name, t1.mask
FROM data t1
WHERE EXISTS (SELECT 1 FROM data t2 WHERE t2.name = t1.name GROUP BY t2.name HAVING COUNT(*) > 1);
Off the top of my head I can't tell you which of those would perform best. If data.name
is indexed (and I would hope it is), the cost for either query ought to be rather low. The former will cache a copy of the subselect, whereas the latter will perform multiple queries against the index.
One very basic optimization (while leaving the table structure as a whole untouched) would be to convert the mask field to an unsigned integer data type, thereby saving many calls to CONV()
.
WHERE CONV(t1.mask, 2, 10) & CONV(t2.mask, 2, 10) > 0
becomes
WHERE t1.mask & t2.mask > 0
Of course, breaking the data down further does make more sense. Instead of storing a bitmask in one record, break out all the ones bits into separate records
id name mask
== ==== ====
11 alpha 1101
would become
id name value
== ==== =====
11 alpha 1
12 alpha 4
13 alpha 8
Now, a strategically placed index on name
and value
makes the query a piece of cake
SELECT name, value
FROM data
GROUP BY name, value
HAVING COUNT(*) > 1;
I hope that this helps.
Break the mask bits out in real columns. RDMBs don't like bit fields.
Your join results in a cartesian product of the table with itself. Add `t1.name=t2.name' to the join, giving a bunch of (much) smaller cartesian products, one per unique name, which will speed things up considerably.
精彩评论