SQL to join the first record in a custom sort order_问答_开发者

Sometimes I want to join a record that I can easily identify as "sort by this and that and pick the first record". I wonder what the best practice here.

For example in my data model I have Services and Implementations. A Service may have multiple Implementations. Implementations have version numbers; zero or more implementations may be installed and one of them may also be enabled. Now I want to select the "current implementation" for each service using the following rules:

If nothing is installed, then the current implementation is the most recent one.
If more than one is installed, then the current is either the one that is enabled or, if nothing is enabled, the most recent one again.

Some examples:

Service  Version  Installed  Enabled
=======  =======  =========  =======
A        1        False      False
A      开发者_如何学JAVA  2        False      False   <- current for service A because most recent

B        1        True       False   <- current for service B because installed
B        2        False      False

C        1        True       False
C        2        True       False   <- current for service C because most recent
C        3        False      False      among installed

D        1        True       True    <- current for service D because enabled
D        2        True       False

I figured that if I sort by these fields and pick the first record, it will do the trick. I.e. I do something like that:

SELECT s.service, <other service fields>, i.version, <other impl. fields>
  FROM service s, implementation i
 WHERE i.rowid == (
       SELECT rowid 
         FROM implementation o
        WHERE o.service == s.service
     ORDER BY installed DESC, enabled DESC, version ASC)

Update. This one is rather specific to SQLite, I guess. First, rowid is the internal record ID that is used by SQLite. Second, in SQLite the SELECT expression in this context returns a single value. I guess it could be rewritten into a more generic SQL like this:

  ...
  FROM service s, implementation i
 WHERE i.service == s.service
   AND i.version == (
       SELECT version 
         FROM ...
     ORDER BY ...
        LIMIT 1)

It works here, but sometimes there's no rowid or another single field that can serve as an identifier. Are there any other ways to get the same result? Something more generic, perhaps? (And for me something more specific is OK too, as long as it's SQLite-specific :)

You can use a NOT EXISTS condition to exclude all but the desired match. Join rows of [service] and [implementation] that match on the [service] column so long as there does not exist another row of [implementation] that also matches on the [service] column but is preferred because it would rank higher in the desired ordering. Here is the idea.

SELECT s.service, <other service fields>, i.version, <other impl. fields>
FROM service s JOIN implementation i
ON i.service = s.service
AND NOT EXISTS ( -- where there's no "better" row from i to use
SELECT * FROM implementation AS i2
   WHERE i2.service = i.service
   AND (
     i2.installed > i.installed
     OR (i2.installed = i.installed AND i2.enabled > i.enabled)
     OR (i2.installed = i.installed AND i2.enabled = i.enabled AND i2.version < i.version)
   )
)

Using Microsoft SQL Server or another SQL dialect that supports the CROSS APPLY operator, this is much, much simpler:

SELECT s.service, <other service fields>, i.version, <other impl. fields>
FROM service s
CROSS APPLY (
  SELECT TOP 1 * FROM implementation
  WHERE implementation.service = s.service
  ORDER BY installed DESC, enabled DESC, version ASC
) AS i

(Neither solution has been tested with sample data, as CREATE TABLE and INSERT statements were not posted.)

Steve Kass

I couldn't think of anything better than this:

SELECT
  i.Service,
  i.Installed,
  i.Enabled,
  MAX(i.Version) AS Version
FROM implementation i
  INNER JOIN (
    SELECT
      i.Service,
      i.Installed,
      MAX(i.Enabled) AS Enabled
    FROM implementation i
      INNER JOIN (
        SELECT
          Service,
          MAX(Installed) AS Installed
        FROM implementation
        GROUP BY
          Service
      ) m ON i.Service = m.Service
    GROUP BY
      i.Service,
      i.Installed
  ) m ON i.Service = m.Service AND i.Installed = m.Installed
GROUP BY
  i.Service,
  i.Installed
  i.Enabled

Basically this script finds maximum values of Installed for every Service, then joins the obtained list back to the original table (to filter on the found values) and finds maximums of Enabled, then again joins the obtained table back to implementation to finally retrieve maximum values of Version for the filtered rows.