开发者

Building a table that summarizes data about another table - Optimization suggestions?

开发者 https://www.devze.com 2023-03-25 13:03 出处:网络
I am trying to build a table that summarizes how many times pages in our Wiki references certain macros.

I am trying to build a table that summarizes how many times pages in our Wiki references certain macros.

I have built a temporary table called @currentpages that with 55k rows that looks like this:

DECLARE @currentpages table(
ContentID       NUMERIC(19,0) NOT NULL PRIMARY KEY,
PageTitle       VARCHAR(255) NULL,
SpaceKey            VARCHAR(255) NULL,
OriginalAuthor  VARCHAR(255) NULL,
LastChangedBy   VARCHAR(255) NULL,
LastChangedDt   VARCHAR(10) NULL,
ContentBody         TEXT NULL); 

and another table that looks like this:

DECLARE @usage table(
SpaceKey        VARCHAR(255) NOT NULL PRIMARY KEY,
Macro1      NUMERIC(19,0) NULL,
Macro2      NUMERIC(19,0) NULL,
Macro3              NUMERIC(19,0) NULL);

(I've simplified the above; it actually has about 40 columns like Macro1, Macro2, etc.) I'm trying to get counts of how many references there are (in @currentpages) to these various macros.

I initialize the @usage table by creating a row for each possible SpaceKey value, zeroing out all of the Macro"x" counters:

INSERT INTO @usage (SpaceKey, Macro1, Macro2, Macro3)
SELECT S.spacekey, 0, 0, 0
FROM spaces     S
ORDER BY S.spacekey

Then I run the first of several UPDATE statements to identify each macro reference:

UPDATE @usage
SET U.AdvancedSearch = C.Counter
FROM @usage                 U
INNER JOIN (SELECT SpaceKey, COUNT(*) AS Counter 
    FROM @currentpages 
    WHERE Conte开发者_运维技巧ntBody LIKE '%{search-%' GROUP BY SpaceKey) C 
        ON U.SpaceKey = C.SpaceKey

This appears to work fine, but it runs a very long time. Is there a more efficient way to do what I'm trying to do?

This database is on SQL Server 2005.

Many thanks, Betsy


  1. Investigate full text search. (Full Text Search @ MSDN) I don't have any experience with it, so can not offer advice.

  2. Instead of querying @CurrentPages once per macro, get all the macro counts at once. Something like:

    INSERT INTO @usage (SpaceKey, Macro1, Macro2, Macro3)
    SELECT S.spacekey
        , count(case when ContentBody LIKE '%Search Macro 1%' 
            then 1 else null end) as Macro1_Count
        , count(case when ContentBody LIKE '%Search Macro 2%' 
            then 1 else null end) as Macro2_Count
        , count(case when ContentBody LIKE '%Search Macro 3%' 
            then 1 else null end) as Macro=3Count
    FROM spaces S
    LEFT OUTER JOIN @CurrentPages C ON C.SpaceKey = S.SpaceKey
    GROUP BY S.spacekey
    


Don't make a column per macro, but make a table like this.

DECLARE @macrotype table(
    type      int          NOT NULL PRIMARY KEY
  , MacroName varchar(100) NOT NULL 
  , mask      varchar(100) NOT NULL 
)

DECLARE @usage table(
  SpaceKey    VARCHAR(255) NOT NULL PRIMARY KEY,
, MacroType   int NOT NULL 
, MacroCount  int NOT NULL 
)

Once you have that you can populate it by running

  INSERT INTO @macrotype VALUES (1, 'search', '%{search-%')
                              , (2, 'yadayada', '%{yadayada-%')
                              , ...

  DECLARE @c int, @m varchar(100)
  DECLARE c CURSOR READ_ONLY FOR SELECT type, mask FROM @macrotype
  OPEN c

  FETCH NEXT FROM c INTO @t, @m

  WHILE @@FETCH_STATUS = 0 
    BEGIN
      INSERT INTO @usage (SpaceKey, MacroType, MacroCount)
      SELECT SpaceKey
           , @t
           , COUNT(*) AS Counter 
        FROM @currentpages
       WHERE ContentBody LIKE @m
    GROUP BY SpaceKey
    FETCH NEXT FROM c INTO @t, @m
  END
  CLOSE c
  DEALLOCATE c

At the end of your story you can query the @usage table to make counts in whatever form you wish.

  SELECT MacroName, count = count(*)
    FROM @usage u
    JOIN @macrotype m
      ON u.MacroType - m.type
GROUP BY MacroName

On 55k records this will run for a few minutes.


I would imagine any performance hits are going to come from the subquery containing the LIKE clause since it's going to be doing a full table scan. I don't really see what else would slow this down too much.

You could test that by running just the sub-query and comparing the time it takes to return to the time the entire update takes for one of your columns.

0

精彩评论

暂无评论...
验证码 换一张
取 消