SQL to get separate counts of two groups across multiple tables?_问答_开发者

We have a small randomized study that we're trying to report numbers for. In this database, we have eight tables that contain different randomization groups (treatment vs. control) for each table that are designed like so:

+--------+-------+----------------------+-----------------+
| caseID | patID | randomizedDate       | randomizedGroup |
+--------+-------+----------------------+-----------------+
|  1     | 5000  | 2/17/2010 5:12:00 PM |  T              |
|  2     | 5005  | 3/11/2010 1:45:00 PM |  C              |
|  3     | 5007  | 3/22/2010 7:16:00 AM |  C              |
|  4     | 5011  | 4/10/2010 3:34:00 PM |  T              |
|  5     | 5015  | 4/19/2010 5:41:00 PM |  C              |
|  6     | 5018  | 5/23/2010 4:06:00 PM |  T              |
|  7     | 5021  | 6/27/2010 5:28:00 PM |  T              |
|  8     | NULL  | NULL                 |  C              |
|  9     | NULL  | NULL                 |  T              |
|  10    | NULL  | NULL                 |  T              |
|  11    | NULL  | NULL                 |  C              |
|  12    | NULL  | NULL                 |  C              |

The tables were already pre-generated with randomized Ts & Cs using a stats program beforehand. So, we have eight groups of these that are waiting to be filled based on preset criteria for our project. No PatID will exist in more than one of these tables.

What we need is a breakdown of counts across these tables based on the randomizedGroup column. For example:

+--------------------+--------+--------+--------+----------+
| randomizationGroup | Table1 | Table2 | Table3 | So on... |
+--------------------+--------+--------+--------+----------+
|  C                 | 10     | 24     |  14    |          |
|  T                 | 11     | 16     |  21    |          |
+--------------------+--------+--------+--------+----------+

As of now I'm getting these numbers using the following query, but I wanted to find out if this is optimal or if I should be doing it another way. The more I use SQL, the more I like it so I'm always wanting to refine my skills and learn.

SELECT randomizationGroup, SUM(count1) AS Table1, SUM(count2) AS Table2, SUM(count3) AS Table3, SUM(count4) AS Table4, SUM(count5) AS Table5, SUM(count6) AS Table6, SUM(count7) AS Table7, SUM(count8) AS Table8
FROM (
    SELECT randomizationGroup, COUNT(*) AS count1, 0 AS count2, 0 AS count3, 0 AS count4, 0 AS count5, 0 AS count6, 0 AS count7, 0 AS count8 FROM Table1 WHERE patid IS NOT NULL GROUP BY randomizationGroup
    UNION ALL
    SELECT randomizationGroup, 0 AS count1, COUNT(*) AS count2, 0 AS count3, 0 AS count4, 0 AS count5, 0 AS count6, 0 AS count7, 0 AS count8 FROM Table2 WHERE patid IS NOT NULL GROUP BY randomizationGroup
    UNION ALL
    SELECT randomizationGroup, 0 AS count1, 0 AS count2, COUNT(*) AS count3, 0 AS count4, 0 AS count5, 0 AS count6, 0 AS count7, 0 AS count8 FROM Table3 WHERE patid IS NOT NULL GROUP BY randomizationGroup
    UNION ALL
    SELECT randomizationGroup, 0 AS count1, 0 AS count2, 0 AS count3, COUNT(*) AS count4, 0 AS count5, 0 AS count6, 0 AS count7, 0 AS count8 FROM Table4 WHERE patid IS NOT NULL GROUP BY randomizationGroup
    UNION ALL
    SELECT randomizationGroup, 0 AS count1, 0 AS count2, 0 AS count3, 0 AS count4, COUNT(*) AS count5, 0 AS count6, 0 AS count7, 0 AS count8 FROM Table5 WHERE patid IS NOT NULL GROUP BY randomizationGroup
    UNION ALL
    SELECT randomizationGroup, 0 AS count1, 0 AS count2, 0 AS count3, 0 AS count4, 0 AS count5, COUNT(*) AS count6, 0 AS count7, 0 AS count8 FROM Table6 WHERE patid IS NOT NULL GROUP BY randomizationGroup
    UNION ALL
    SELECT randomizationGroup, 0 AS count1, 0 AS count2, 0 AS count3, 0 AS count4, 0 AS count5, 0 AS count6, COUNT(*) AS count7, 0 AS count8 FROM Table7 WHERE patid IS NOT NULL GROUP BY randomizationGroup
    UNION ALL
    SELECT randomizationGroup, 0 AS count1, 0 AS count2, 0 AS count3, 0 AS count4, 0 AS count5, 0 AS count6, 0 AS count7, COUNT(*) AS cou开发者_如何学编程nt8 FROM Table8 WHERE patid IS NOT NULL GROUP BY randomizationGroup) all_groups
GROUP BY randGroup

Thank you!

I would create a view over all the tables, which in the future could be the structure for a single table should you decide to consolidate the data.

CREATE VIEW AllTables as
SELECT randomizationGroup, 'Table1' Source, COUNT(*) C FROM Table1 WHERE patid IS NOT NULL GROUP BY randomizationGroup
UNION ALL
SELECT randomizationGroup, 'Table2', COUNT(*) C FROM Table2 WHERE patid IS NOT NULL GROUP BY randomizationGroup
UNION ALL
SELECT randomizationGroup, 'Table3', COUNT(*) C FROM Table3 WHERE patid IS NOT NULL GROUP BY randomizationGroup
UNION ALL
SELECT randomizationGroup, 'Table4', COUNT(*) C FROM Table4 WHERE patid IS NOT NULL GROUP BY randomizationGroup
UNION ALL
SELECT randomizationGroup, 'Table5', COUNT(*) C FROM Table5 WHERE patid IS NOT NULL GROUP BY randomizationGroup
UNION ALL
SELECT randomizationGroup, 'Table6', COUNT(*) C FROM Table6 WHERE patid IS NOT NULL GROUP BY randomizationGroup
UNION ALL
SELECT randomizationGroup, 'Table7', COUNT(*) C FROM Table7 WHERE patid IS NOT NULL GROUP BY randomizationGroup
UNION ALL
SELECT randomizationGroup, 'Table8', COUNT(*) C FROM Table8 WHERE patid IS NOT NULL GROUP BY randomizationGroup
GO

Then, use the PIVOT operator in SQL Server 2005.

SELECT randomizationGroup, Table1,Table2,Table3,Table4,Table5,Table6,Table7,Table8
FROM AllTables P
pivot (sum(C) for Source in (Table1,Table2,Table3,Table4,Table5,Table6,Table7,Table8)) V

I won't say it is faster, but it is certainly an alternative to what you have.

What you've got is basically as good as it gets. This is largely due to the fact that you've got the same type of data in several tables. If you had them in the same table with some sort of type field then you'd be in better shape.

Well not really because then you'd need to pivot the data which ends up with a sum on a case statements unless you're using a DB that has a PIVOT built in.

Of course if you really wanted to you could make a view that does the UNION of the eight tables this if you found yourself needing it but this seems a little over-engineered, since you've got a solution that works (unless there are some requirements that I'm missing).

The below query should give you different counts for randomized groups from a single table but I guess that is not what you want but maybe it helps-

SELECT randomizationGroup, COUNT(case
     when randomizedGroup='C' then 1 end)
     AS countforC, Count(case when
     randomizedGroup='T' then 1 end) AS
     countforT from Table1 group by
     randomizationGroup

A good old JOIN should to the trick.

SELECT
  randomizationGroup = g.Grp,
  Table1 = t1.Cnt,
  Table2 = t2.Cnt,
  Table3 = t3.Cnt,
  Table4 = t4.Cnt,
  Table5 = t5.Cnt,
  Table6 = t6.Cnt,
  Table7 = t7.Cnt,
  Table8 = t8.Cnt
FROM (SELECT 'C' AS Grp UNION ALL SELECT 'T') g
  INNER JOIN (
    SELECT randomizationGroup, Cnt = COUNT(*)
    FROM Table1
    GROUP BY randomizationGroup) t1 ON g.Grp = t1.randomizationGroup
  INNER JOIN (
    SELECT randomizationGroup, Cnt = COUNT(*)
    FROM Table2
    GROUP BY randomizationGroup) t2 ON g.Grp = t2.randomizationGroup
  INNER JOIN (
    SELECT randomizationGroup, Cnt = COUNT(*)
    FROM Table3
    GROUP BY randomizationGroup) t3 ON g.Grp = t3.randomizationGroup
  INNER JOIN (
    SELECT randomizationGroup, Cnt = COUNT(*)
    FROM Table4
    GROUP BY randomizationGroup) t4 ON g.Grp = t4.randomizationGroup
  INNER JOIN (
    SELECT randomizationGroup, Cnt = COUNT(*)
    FROM Table5
    GROUP BY randomizationGroup) t5 ON g.Grp = t5.randomizationGroup
  INNER JOIN (
    SELECT randomizationGroup, Cnt = COUNT(*)
    FROM Table6
    GROUP BY randomizationGroup) t6 ON g.Grp = t6.randomizationGroup
  INNER JOIN (
    SELECT randomizationGroup, Cnt = COUNT(*)
    FROM Table7
    GROUP BY randomizationGroup) t7 ON g.Grp = t7.randomizationGroup
  INNER JOIN (
    SELECT randomizationGroup, Cnt = COUNT(*)
    FROM Table8
    GROUP BY randomizationGroup) t8 ON g.Grp = t8.randomizationGroup

This solution is not as universal as, for example, yours, or the one using PIVOT, because, as you can see, the group identifiers have to be hard coded. But if that will work for you, great. It could be helped, though, by replacing the hardcoded subselect with the one retrieving distinct randomizationGroup from all the tables.