开发者

GROUP BY with aggregate and an INNER JOIN

开发者 https://www.devze.com 2023-03-18 11:44 出处:网络
I tried to narrow down the problem as much as possible, it is still quite something. This is the query that doesn\'t work the way I want it:

I tried to narrow down the problem as much as possible, it is still quite something. This is the query that doesn't work the way I want it:

SELECT *, MAX(tbl_stopover.dist)
FROM tbl_stopover
INNER JOIN
  (SELECT edges1.id id1, edges2.id id2, COUNT(edges1.id) numConn
  FROM tbl_edges edges1
  INNER JOIN tbl_edges edges2
  ON edges1.nodeB = edges2.nodeA
  GROUP BY edges1.id HAVING numConn = 1) AS tbl_conn
ON tbl_stopover.id_edge = tbl_conn.id1
GROUP BY id_edge

Here is what I get:

|id | edge | dist | id1 | id2 | numConn | MAX(tbl_stopover.dist) |
------------------------------------------------------------------
|2  | 23   | 2    | 23  | 35  | 1       | 9                      |
|4  | 24   | 5    | 24  | 46  | 1    开发者_如何学运维   | 9                      |
------------------------------------------------------------------

and this is what I would want:

|id | edge | dist | id1 | id2 | numConn | MAX(tbl_stopover.dist) |
------------------------------------------------------------------
|3  | 23   | 9    | 23  | 35  | 1       | 9                      |
|5  | 24   | 9    | 24  | 46  | 1       | 9                      |
------------------------------------------------------------------

But let me elaborate a bit...

I have a graph, let's say as such:

    node1
      |
    node2
   /     \
node3    node4
  |       |
node5    node6

Therefore I have a table I call tbl_edges like this:

| id  | nodeA | node B |
------------------------
| 12  |   1   |    2   |
| 23  |   2   |    3   |
| 24  |   2   |    4   |
| 35  |   3   |    5   |
| 46  |   4   |    6   |
------------------------

Now each edge has "stop_overs" at a certain distance (to nodeA). Therefore I have a table tbl_stopover like this:

| id  | edge  |  dist  |
------------------------
|  1  |  12   |    5   |
|  2  |  23   |    2   |
|  3  |  23   |    9   |
|  4  |  24   |    5   |
|  5  |  24   |    9   |
|  6  |  35   |    5   |
|  7  |  46   |    5   |
------------------------

Why this query?

Let's assume I want to calculate the distance between the stop_overs. Within one edge that is no problem. Across edges it gets more difficult. But if I have two edges that are connected and there is no other connection I can also calculate the distance. Here an example assuming all edges have a length of 10. :

edge23 has a stop_over(id=3) at dist=9, edge35 has a stop_over(id=6) at dist=5. Therefore the distance between these two stop_overs is:

dist = (length - dist_id3) + dist_id5 = (10-9) + 5

I am not sure if I made my self clear. If this is not understandable, feel free to ask question and I will do my best to make this more understandable.


MySQL allows you to do something silly - display fields in an aggregate query that are not a part of the GROUP BY or an aggregate function like MAX. When you do this, you get random (as you said) results for the remaining fields.

In your query you are doing this twice - once in your inner query (id2 is not part of a GROUP BY or aggregate) and once in the outer.

Prepare for random results!

To fix it, try something like this:

SELECT tbl_stopover.id,
       tbl_stopover.dist,
       tbl_conn.id1,
       tbl_conn.id2,
       tbl_conn.numConn,
       MAX(tbl_stopover.dist)
FROM tbl_stopover
INNER JOIN
  (SELECT edges1.id id1, edges2.id id2, COUNT(edges1.id) numConn
  FROM tbl_edges edges1
  INNER JOIN tbl_edges edges2
  ON edges1.nodeB = edges2.nodeA
  GROUP BY edges1.id, edges2.id
  HAVING numConn = 1) AS tbl_conn
ON tbl_stopover.id_edge = tbl_conn.id1
GROUP BY tbl_stopover.id,
         tbl_stopover.dist,
         tbl_conn.id1,
         tbl_conn.id2,
         tbl_conn.numConn

The major changes are the explicit field list (note that I removed the id_edge since you are joining on id1 and already have that field), and addition of additional fields to both the inner and outer GROUP BY clauses.

If this gives you more rows than you want then you may need to explain more about your desired result set. Something like this is the only way to ensure you get appropriate groupings.


Okay. This seems to be the answer to my question. I will do some further "investigation" though, because I'm not sure if this is reliable. If anybody has some though on this, please leave a comment.

SELECT tbl.id, tbl.dist, tbl.id1, tbl.id2, MAX(dist) maxDist
FROM
(
  SELECT tbl_stopover.id,
         tbl_stopover.dist,
         tbl_conn.id1,
         tbl_conn.id2,
         tbl_conn.numConn
  FROM tbl_stopover
  INNER JOIN
    (SELECT edges1.id id1, edges2.id id2, COUNT(edges1.id) numConn
    FROM tbl_edges edges1
    INNER JOIN tbl_edges edges2
    ON edges1.nodeB = edges2.nodeA
    GROUP BY edges1.id
    HAVING numConn = 1) AS tbl_conn
  ON tbl_stopover.id_edge = tbl_conn.id1
  GROUP BY tbl_stopover.dist, tbl_conn.id1
  ORDER BY dist DESC) AS tbl
GROUP BY tbl.id1, tbl.id2

Thanks to JNK (my colleague at work) without whom I wouldn't have gotten this far.

0

精彩评论

暂无评论...
验证码 换一张
取 消