开发者

SQL: max(count((x))

开发者 https://www.devze.com 2023-02-16 06:37 出处:网络
I have a table of baseball fielding statistics for a project. There are many fields on this table, but the ones I care about for this are playerID, pos (position), G (games).

I have a table of baseball fielding statistics for a project. There are many fields on this table, but the ones I care about for this are playerID, pos (position), G (games).

This table is historical so it contains multiple rows per playerID (one for each year/pos). What I want to be able to do is return the position that a player played the most for his career.

First what I imaging I have to do is count the games per position per play开发者_开发问答erID, then return the max of it. How can this be done in SQL? I am using SQL Server. On a side note, there may be a situation where there are ties, what would max do then?


If the player played in the same position over multiple teams over multiple games, I'd be more apt to use the sum() function, instead of count, in addition to using a group by statement, as a sub-query. See code for explanation.

SELECT playerID, pos, MAX( g_sum )
FROM (
    SELECT DISTINCT playerID, pos, SUM( G ) as g_sum
    FROM player_stats
    GROUP BY id, pos
    ORDER BY 3 DESC
) game_sums
GROUP BY playerID

It may not be the exact answer, at least it's a decent starting point and it worked on my lame testbed that I whipped up in 10 minutes.


As far as how max() acts with ties: It doesn't (as far as I can tell, at least). It's up to the actual GROUP BY statement itself, and where and how that max value shows up within the query or sub query.

If we were to include pos in the outer GROUP BY statement, in the event of a tie, it would show you both positions and the amount of games the player has played at said positions (which would be the same number). With it not in that GROUP BY statement, the query will go with the last given value for that column. So if position 2 showed up before position 3 in the sub query, the full query will show position 3 as the position that the player has played the most games in.


In SQL, I believe this will do it. Given that the same subquery is needed twice, I expect that doing this as a stored procedure would be more efficient.

    SELECT MaxGamesInAnyPosition.playerID, GamesPerPosition.pos
    FROM (
        SELECT playerID, Max(totalGames) As maxGames
        FROM (
            SELECT playerID, pos, SUM(G) As totalGames
            FROM tblStats
            GROUP BY playerId, pos) Tallies
        GROUP BY playerID) MaxGamesInAnyPosition 

    INNER JOIN (
        SELECT playerID, pos, SUM(g) As totalGames
        FROM tblStats
        GROUP BY playerID, pos) GamesPerPosition
    ON (MaxGamesInAnyPosition.playerID=GamesPerPosition.playerId 
        AND MaxGamesInAnyPosition.maxGames=GamesPerPosition.totalGames)


does not look pretty, but it is direct translation of what I built in linq to sql, give it a try and see if that's what you want:

SELECT [t2].[playerID], (
    SELECT TOP (1) [t7].[pos]
    FROM (
        SELECT [t4].[playerID], [t4].[pos], (
            SELECT COUNT(*)
            FROM (
                SELECT DISTINCT [t5].[G]
                FROM [players] AS [t5]
                WHERE ([t4].[playerID] = [t5].[playerID]) AND ([t4].[pos] = [t5].[pos])
                ) AS [t6]
            ) AS [value]
        FROM (
            SELECT [t3].[playerID], [t3].[pos]
            FROM [players] AS [t3]
            GROUP BY [t3].[playerID], [t3].[pos]
            ) AS [t4]
        ) AS [t7]
    WHERE [t2].[playerID] = [t7].[playerID]
    ORDER BY [t7].[value] DESC
    ) AS [pos]
FROM (
    SELECT [t1].[playerID]
    FROM (
        SELECT [t0].[playerID]
        FROM [players] AS [t0]
        GROUP BY [t0].[playerID], [t0].[pos]
        ) AS [t1]
    GROUP BY [t1].[playerID]
    ) AS [t2]


Here is a second answer, much better (I think) than my first kick at the can last night. Certainly much easier to read and understand.

SELECT playerID, pos
FROM (
    SELECT playerID, pos, SUM(G) As totGames
    FROM tblStats
    GROUP BY playerID, pos) Totals
WHERE NOT (Totals.totGames < ANY(
    SELECT SUM(G) 
    FROM tblStats 
    WHERE Totals.playerID=tblStats.playerID
    GROUP BY playerID, pos))

The subquery ensures that all rows will be thrown out if its games total at that given position is smaller than the number of games that player played at any other position.

In case of ties, the player in question will have all tied rows appear, as none of the tied records will be thrown out.

0

精彩评论

暂无评论...
验证码 换一张
取 消