Most recent entry from two tables

Developer https://www.devze.com 2023-01-25 18:38 Source: Internet
I have a SQL 2000 DB with an old table and a new table that together hold over 20,000,000 records. The two tables are exactly the same, but were split due to performance issues. I am not the DB admin; I just need data out of it and have been given DBReader rights on it.

OldTable: ClientID, AppID, ModTime, Event

NewTable: ClientID, AppID, ModTime, Event

I need to retrieve the most recent record for each combination of ClientID, AppID and Event, from whichever table has the most recent entry for it. Does anyone have any ideas about the best method for this? I have tried using a union, but the query takes over two hours to complete. I was thinking of using a join instead, but I'm not sure of the best approach.

Thanks!


You will have to use a UNION, but if the two tables have no rows in common, consider using UNION ALL, which will be faster.

Also ensure that you have the correct indexes on the tables for this kind of query.


Why not perform the query on each table, union the results, and repeat the query on the union?


If you're using a plain "UNION", then that could cause issues. UNION ensures that its output contains no duplicates, which generally requires sorting or hashing the entire dataset.

UNION ALL, on the other hand, just returns all rows from both sides.
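The difference is easy to see on a tiny dataset. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server 2000, and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the real SQL Server 2000 DB (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OldTable (ClientID INT, AppID INT, ModTime TEXT, Event TEXT);
    CREATE TABLE NewTable (ClientID INT, AppID INT, ModTime TEXT, Event TEXT);
    -- Invented sample data: the first row exists in both tables.
    INSERT INTO OldTable VALUES (1, 10, '2009-01-01', 'login');
    INSERT INTO NewTable VALUES (1, 10, '2009-01-01', 'login');
    INSERT INTO NewTable VALUES (2, 20, '2009-02-01', 'logout');
""")

cols = "ClientID, AppID, ModTime, Event"

# UNION has to compare rows in order to drop the duplicate.
union_rows = conn.execute(
    f"SELECT {cols} FROM OldTable UNION SELECT {cols} FROM NewTable"
).fetchall()

# UNION ALL simply concatenates both inputs.
union_all_rows = conn.execute(
    f"SELECT {cols} FROM OldTable UNION ALL SELECT {cols} FROM NewTable"
).fetchall()

print(len(union_rows), len(union_all_rows))
```

With 20,000,000 rows, the duplicate check is the expensive part, so it is worth skipping whenever duplicates either cannot occur or do not matter for the final result.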


If this is just a one-off job and you only have two tables, just run a 'most-recent-entry' query on each table separately. Then do a UNION ALL of the two result sets and use GROUP BY and MAX to leave only the most recent. In SQL:

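-- Outer query: pick the later of the two per-table maxima
SELECT ClientID, AppID, Event, MAX(MaxModTime) AS MaxModTime FROM (
    -- Latest entry per group in the old table
    SELECT ClientID, AppID, Event, MAX(ModTime) AS MaxModTime FROM OldTable
    GROUP BY ClientID, AppID, Event
    UNION ALL
    -- Latest entry per group in the new table
    SELECT ClientID, AppID, Event, MAX(ModTime) AS MaxModTime FROM NewTable
    GROUP BY ClientID, AppID, Event
) Q
GROUP BY ClientID, AppID, Event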

You can improve the speed of such a query with a composite index on (ClientID, AppID, Event) on both tables or, where possible, a clustered index on (ClientID, AppID, Event, ModTime).
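To check that the two-stage query returns what is wanted, here is a small sketch that runs it against invented sample rows. It uses Python's sqlite3 module rather than SQL Server 2000, the index names IX_Old and IX_New are made up, and SQLite's ordinary indexes only approximate the clustered index mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OldTable (ClientID INT, AppID INT, ModTime TEXT, Event TEXT);
    CREATE TABLE NewTable (ClientID INT, AppID INT, ModTime TEXT, Event TEXT);
    -- Hypothetical index names; the column order follows the GROUP BY.
    CREATE INDEX IX_Old ON OldTable (ClientID, AppID, Event, ModTime);
    CREATE INDEX IX_New ON NewTable (ClientID, AppID, Event, ModTime);
    -- Invented sample data.
    INSERT INTO OldTable VALUES (1, 10, '2008-05-01 09:00', 'login');
    INSERT INTO OldTable VALUES (1, 10, '2008-06-01 09:00', 'login');
    INSERT INTO OldTable VALUES (2, 20, '2008-07-01 09:00', 'error');
    INSERT INTO NewTable VALUES (1, 10, '2009-01-01 09:00', 'login');
    INSERT INTO NewTable VALUES (3, 30, '2009-02-01 09:00', 'logout');
""")

query = """
    SELECT ClientID, AppID, Event, MAX(MaxModTime) AS MaxModTime FROM (
        SELECT ClientID, AppID, Event, MAX(ModTime) AS MaxModTime FROM OldTable
        GROUP BY ClientID, AppID, Event
        UNION ALL
        SELECT ClientID, AppID, Event, MAX(ModTime) AS MaxModTime FROM NewTable
        GROUP BY ClientID, AppID, Event
    ) Q
    GROUP BY ClientID, AppID, Event
    ORDER BY ClientID, AppID, Event
"""

# One row per (ClientID, AppID, Event), carrying the latest ModTime from either table.
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

Each inner query collapses its table to one row per group before the UNION ALL, so the outer GROUP BY only has to handle at most two rows per group rather than the full 20,000,000.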


For performance, I suggest inserting ClientID, AppID, Event, and MAX(ModTime) from the old table into a temporary table, appending ClientID, AppID, Event, and MAX(ModTime) from the new table to the same temporary table, and then querying ClientID, AppID, Event, and MAX(ModTime) from the temporary table.
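A rough sketch of the temporary-table route, again using Python's sqlite3 module in place of SQL Server 2000. The table name Latest and the sample rows are invented; on SQL Server the temporary table would be named something like #Latest. Event is grouped on here as well, since the question asks for the latest record per client, application and event.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OldTable (ClientID INT, AppID INT, ModTime TEXT, Event TEXT);
    CREATE TABLE NewTable (ClientID INT, AppID INT, ModTime TEXT, Event TEXT);
    -- Invented sample data.
    INSERT INTO OldTable VALUES (1, 10, '2008-06-01 09:00', 'login');
    INSERT INTO OldTable VALUES (2, 20, '2008-07-01 09:00', 'error');
    INSERT INTO NewTable VALUES (1, 10, '2009-01-01 09:00', 'login');

    -- Hypothetical temporary table holding the per-table maxima.
    CREATE TEMP TABLE Latest (ClientID INT, AppID INT, Event TEXT, ModTime TEXT);

    -- Step 1: latest entry per group from the old table.
    INSERT INTO Latest
    SELECT ClientID, AppID, Event, MAX(ModTime) FROM OldTable
    GROUP BY ClientID, AppID, Event;

    -- Step 2: append the latest entry per group from the new table.
    INSERT INTO Latest
    SELECT ClientID, AppID, Event, MAX(ModTime) FROM NewTable
    GROUP BY ClientID, AppID, Event;
""")

# Step 3: reduce the temporary table to one row per group.
rows = conn.execute("""
    SELECT ClientID, AppID, Event, MAX(ModTime) FROM Latest
    GROUP BY ClientID, AppID, Event
    ORDER BY ClientID, AppID, Event
""").fetchall()
print(rows)
```

This does the same work as the single UNION ALL query, but in three separate statements, which makes it easier to time each step and see which table is slow.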
