开发者

T-SQL for finding Redundant Indexes

开发者 https://www.devze.com 2023-01-08 03:36 出处:网络
Is anyone aware of a T-SQL script that can detect redundant indexes across an entire database? An example of a redundant index in a table would be as follows:

Is anyone aware of a T-SQL script that can detect redundant indexes across an entire database? An example of a redundant index in a table would be as follows:

Index 1: 'ColumnA', 开发者_运维百科'ColumnB', 'ColumnC'
Index 2: 'ColumnA', 'ColumnB'

Ignoring other considerations, such as the width of columns and covering indexes, Index 2 would be redundant.

Thanks.


There are situations where the redundancy doesn't hold. For example, say ColumnC was a huuge field, but you'd sometimes have to retrieve it quickly. Your index 1 would not require a key lookup for:

select ColumnC from YourTable where ColumnnA = 12

On the other hand index 2 is much smaller, so it can be read in memory for queries that require an index scan:

select * from YourTable where ColumnnA like '%hello%'

So they're not really redundant.

If you're not convinced by my above argument, you can find "redundant" indexes like:

;with ind as (
    select  a.object_id
    ,       a.index_id
    ,       cast(col_list.list as varchar(max)) as list
    from    (
            select  distinct object_id
            ,       index_id
            from    sys.index_columns
            ) a
    cross apply
            (
            select  cast(column_id as varchar(16)) + ',' as [text()]
            from    sys.index_columns b
            where   a.object_id = b.object_id
                    and a.index_id = b.index_id
            for xml path(''), type
            ) col_list (list)
)
select  object_name(a.object_id) as TableName
,       asi.name as FatherIndex
,       bsi.name as RedundantIndex
from    ind a
join    sys.sysindexes asi
on      asi.id = a.object_id
        and asi.indid = a.index_id
join    ind b
on      a.object_id = b.object_id
        and a.object_id = b.object_id
        and len(a.list) > len(b.list)
        and left(a.list, LEN(b.list)) = b.list
join    sys.sysindexes bsi
on      bsi.id = b.object_id
        and bsi.indid = b.index_id

Bring cake for your users in case performance decreases "unexpectedly" :-)


Inspired by Paul Nielsen, I wrote this query to find/distinguish:

  • Duplicates (ignoring include order)
  • Redundant (different include columns)
  • Overlapping (different index columns)

And also record their usage (One might also want to use is_descending_key, but I don't need it.)

WITH IndexColumns AS
(
    SELECT I.object_id AS TableObjectId, OBJECT_SCHEMA_NAME(I.object_id) + '.' + OBJECT_NAME(I.object_id) AS TableName, I.index_id AS IndexId, I.name AS IndexName
        , (IndexUsage.user_seeks + IndexUsage.user_scans + IndexUsage.user_lookups) AS IndexUsage
        , IndexUsage.user_updates AS IndexUpdates

       , (SELECT CASE is_included_column WHEN 1 THEN NULL ELSE column_id END AS [data()]
        FROM sys.index_columns AS IndexColumns
        WHERE IndexColumns.object_id = I.object_id
          AND IndexColumns.index_id = I.index_id
        ORDER BY index_column_id, column_id
        FOR XML PATH('')
       ) AS ConcIndexColumnNrs

       ,(SELECT CASE is_included_column WHEN 1 THEN NULL ELSE COL_NAME(I.object_id, column_id) END AS [data()]
        FROM sys.index_columns AS IndexColumns
        WHERE IndexColumns.object_id = I.object_id
          AND IndexColumns.index_id = I.index_id
        ORDER BY index_column_id, column_id
        FOR XML PATH('')
       ) AS ConcIndexColumnNames

       ,(SELECT CASE is_included_column WHEN 1 THEN column_id ELSE NULL END AS [data()]
        FROM sys.index_columns AS IndexColumns
        WHERE IndexColumns.object_id = I.object_id
        AND IndexColumns.index_id = I.index_id
        ORDER BY column_id
        FOR XML PATH('')
       ) AS ConcIncludeColumnNrs

       ,(SELECT CASE is_included_column WHEN 1 THEN COL_NAME(I.object_id, column_id) ELSE NULL END AS [data()]
        FROM sys.index_columns AS IndexColumns
        WHERE IndexColumns.object_id = I.object_id
          AND IndexColumns.index_id = I.index_id
        ORDER BY column_id
        FOR XML PATH('')
       ) AS ConcIncludeColumnNames
    FROM sys.indexes AS I
       LEFT OUTER JOIN sys.dm_db_index_usage_stats AS IndexUsage
        ON IndexUsage.object_id = I.object_id
          AND IndexUsage.index_id = I.index_id
          AND IndexUsage.Database_id = db_id() 
)
SELECT
  C1.TableName
  , C1.IndexName AS 'Index1'
  , C2.IndexName AS 'Index2'
  , CASE WHEN (C1.ConcIndexColumnNrs = C2.ConcIndexColumnNrs) AND (C1.ConcIncludeColumnNrs = C2.ConcIncludeColumnNrs) THEN 'Exact duplicate'
        WHEN (C1.ConcIndexColumnNrs = C2.ConcIndexColumnNrs) THEN 'Different includes'
        ELSE 'Overlapping columns' END
--  , C1.ConcIndexColumnNrs
--  , C2.ConcIndexColumnNrs
  , C1.ConcIndexColumnNames
  , C2.ConcIndexColumnNames
--  , C1.ConcIncludeColumnNrs
--  , C2.ConcIncludeColumnNrs
  , C1.ConcIncludeColumnNames
  , C2.ConcIncludeColumnNames
  , C1.IndexUsage
  , C2.IndexUsage
  , C1.IndexUpdates
  , C2.IndexUpdates
  , 'DROP INDEX ' + C2.IndexName + ' ON ' + C2.TableName AS Drop2
  , 'DROP INDEX ' + C1.IndexName + ' ON ' + C1.TableName AS Drop1
FROM IndexColumns AS C1
  INNER JOIN IndexColumns AS C2 
    ON (C1.TableObjectId = C2.TableObjectId)
    AND (
         -- exact: show lower IndexId as 1
            (C1.IndexId < C2.IndexId
            AND C1.ConcIndexColumnNrs = C2.ConcIndexColumnNrs
            AND C1.ConcIncludeColumnNrs = C2.ConcIncludeColumnNrs)
         -- different includes: show longer include as 1
         OR (C1.ConcIndexColumnNrs = C2.ConcIndexColumnNrs
            AND LEN(C1.ConcIncludeColumnNrs) > LEN(C2.ConcIncludeColumnNrs))
         -- overlapping: show longer index as 1
         OR (C1.IndexId <> C2.IndexId
            AND C1.ConcIndexColumnNrs <> C2.ConcIndexColumnNrs
            AND C1.ConcIndexColumnNrs like C2.ConcIndexColumnNrs + ' %')
    )
ORDER BY C1.TableName, C1.ConcIndexColumnNrs


I created the following query that gives me a lot of good information to identify duplicate and near-duplicate indexes. It also includes other information like how many pages of memory an index takes, which allows me to give a higher priority to larger indexes. It shows what columns are indexed and what columns are included, so I can see if there are two indexes that are almost identical with only slight variations in the included columns.

WITH IndexSummary AS
(

SELECT DISTINCT sys.objects.name AS [Table Name],
    sys.indexes.name AS [Index Name],
    SUBSTRING((SELECT ', ' +  sys.columns.Name as [text()]
        FROM sys.columns
            INNER JOIN sys.index_columns
                ON sys.index_columns.column_id = sys.columns.column_id
                AND sys.index_columns.object_id = sys.columns.object_id
        WHERE sys.index_columns.index_id = sys.indexes.index_id
            AND sys.index_columns.object_id = sys.indexes.object_id
            AND sys.index_columns.is_included_column = 0
        ORDER BY sys.columns.name
    FOR XML Path('')), 2, 10000) AS [Indexed Column Names],
    ISNULL(SUBSTRING((SELECT ', ' +  sys.columns.Name as [text()]
        FROM sys.columns
            INNER JOIN sys.index_columns
            ON sys.index_columns.column_id = sys.columns.column_id
            AND sys.index_columns.object_id = sys.columns.object_id
        WHERE sys.index_columns.index_id = sys.indexes.index_id
            AND sys.index_columns.object_id = sys.indexes.object_id
            AND sys.index_columns.is_included_column = 1
        ORDER BY sys.columns.name
        FOR XML Path('')), 2, 10000), '') AS [Included Column Names],
    sys.indexes.index_id, sys.indexes.object_id
FROM sys.indexes
    INNER JOIN SYS.index_columns
        ON sys.indexes.index_id = SYS.index_columns.index_id
            AND sys.indexes.object_id = sys.index_columns.object_id
    INNER JOIN sys.objects
        ON sys.OBJECTS.object_id = SYS.indexES.object_id
WHERE sys.objects.type = 'U'
)

SELECT IndexSummary.[Table Name],
    IndexSummary.[Index Name],
    IndexSummary.[Indexed Column Names],
    IndexSummary.[Included Column Names],
    PhysicalStats.page_count as [Page Count],
    CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Size (MB)],
    CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Fragment %]
FROM IndexSummary
    INNER JOIN sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL)
       AS PhysicalStats
        ON PhysicalStats.index_id = IndexSummary.index_id
            AND PhysicalStats.object_id = IndexSummary.object_id
WHERE (SELECT COUNT(*) as Computed
        FROM IndexSummary Summary2
        WHERE Summary2.[Table Name] = IndexSummary.[Table Name]
            AND Summary2.[Indexed Column Names] = IndexSummary.[Indexed Column Names]) > 1
ORDER BY [Table Name], [Index Name], [Indexed Column Names], [Included Column Names]

Results of the query look like this:

Table Name  Index   Indexed Cols    Included Cols   Pages   Size (MB)   Frag %
My_Table    Indx_1     Col1         Col2, Col3       123      0.96       8.94
My_Table    Indx_2     Col1         Col2, Col3       123      0.96       8.94

Complete Description

For the complete explanation see Identifying Duplicate or Redundant Indexes in SQL Server.


Try the script below to show Unused Indexes, hope it helps

/****************************************************************
Description: Script to show Unused Indexes using DMVs
****************************************************************/

SELECT TOP 100
o.name AS ObjectName
, i.name AS IndexName
, i.index_id AS IndexID  
, dm_ius.user_seeks AS UserSeek
, dm_ius.user_scans AS UserScans
, dm_ius.user_lookups AS UserLookups
, dm_ius.user_updates AS UserUpdates
, p.TableRows
, 'DROP INDEX ' + QUOTENAME(i.name) 
+ ' ON ' + QUOTENAME(s.name) + '.' + QUOTENAME(OBJECT_NAME(dm_ius.object_id)) as 'drop statement'
FROM sys.dm_db_index_usage_stats dm_ius  
INNER JOIN sys.indexes i ON i.index_id = dm_ius.index_id AND dm_ius.object_id = i.object_id   
INNER JOIN sys.objects o on dm_ius.object_id = o.object_id
INNER JOIN sys.schemas s on o.schema_id = s.schema_id
INNER JOIN (SELECT SUM(p.rows) TableRows, p.index_id, p.object_id 
                FROM sys.partitions p GROUP BY p.index_id, p.object_id) p 
        ON p.index_id = dm_ius.index_id AND dm_ius.object_id = p.object_id
WHERE OBJECTPROPERTY(dm_ius.object_id,'IsUserTable') = 1
AND dm_ius.database_id = DB_ID()   
AND i.type_desc = 'nonclustered'
AND i.is_primary_key = 0
AND i.is_unique_constraint = 0
ORDER BY (dm_ius.user_seeks + dm_ius.user_scans + dm_ius.user_lookups) ASC
GO


I was just reading some MSDN blogs, noticed a script to do this and remembered this question.

I haven't bothered testing it side by side with Andomar's to see if one has any particular benefit over the other.

One amendment I would likely make to both though would be to take into account the size of both indexes when assessing redundancy.

Edit:

Also see Kimberley Tripp's post on Removing duplicate indexes

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号