How should tables be indexed to optimise this Oracle SELECT query?

I've got the following query in Oracle10g:

select * 
  from DATA_TABLE DT, 
       LOOKUP_TABLE_A LTA, 
       LOOKUP_TABLE_B LTB
 where DT.COL_A = LTA.COL_A (+) 
   and DT.COL_B = LTA.COL_B (+) 
   and LTA.COL_C = LTB.COL_C
   and LTA.COL_B = LTB.COL_B
   and ( DT.REF_TXT = :refTxt or DT.ALT_REF_TXT = :refTxt )
   and DT.CREATED_DATE between :startDate and :endDate

I was wondering whether you have any hints for optimising the query.

Currently I've got the following indices:

IDX1 on DATA_TABLE (REF_TXT, CREATED_DATE)
IDX2 on DATA_TABLE (ALT_REF_TXT, CREATED_DATE)
LOOKUP_A_PK on LOOKUP_TABLE_A (COL_A, COL_B)
LOOKUP_A_IDX1 on LOOKUP_TABLE_A (COL_C, COL_B)
LOOKUP_B_PK on LOOKUP_TABLE_B (COL_C, COL_B)

Note, the LOOKUP tables are very small (<200 rows).

EDIT:

Explain plan:

Query Plan
SELECT STATEMENT   Cost = 8
  FILTER
    NESTED LOOPS
      NESTED LOOPS
        TABLE ACCESS BY INDEX ROWID DATA_TABLE
          BITMAP CONVERSION TO ROWIDS
            BITMAP OR
              BITMAP CONVERSION FROM ROWIDS
                SORT ORDER BY
                  INDEX RANGE SCAN IDX1
              BITMAP CONVERSION FROM ROWIDS
                SORT ORDER BY
                  INDEX RANGE SCAN IDX2
        TABLE ACCESS BY INDEX ROWID LOOKUP_TABLE_A
          INDEX UNIQUE SCAN LOOKUP_A_PK
      TABLE ACCESS BY INDEX ROWID LOOKUP_TABLE_B
        INDEX UNIQUE SCAN LOOKUP_B_PK

EDIT2:

The data looks like this:

There will be 10,000s of distinct REF_TXT values, with 10s-100s of CREATED_DATE values for each. ALT_REF_TXT will mostly be NULL, but there will be 100s-1000s of rows in which it differs from REF_TXT.

EDIT3: Fixed what ALT_REF_TXT actually contains.

The execution plan you're currently getting looks pretty good. There's no obvious improvement to be made.

As others have noted, you have some outer join indicators, but you then essentially prevent the outer join by requiring equality on other columns in the two outer tables. As you can see from the execution plan, no outer join is happening. If you don't want an outer join, remove the (+) operators; they're just confusing the issue. If you do want an outer join, rewrite the query as shown by @Dems.
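For reference, a minimal sketch of the query with the (+) operators removed. It uses only the tables and columns from the question and simply states the inner join that the posted plan is already performing.

```sql
-- Same predicates as the original, minus the (+) markers.
-- DATA_TABLE rows with no match in LOOKUP_TABLE_A are excluded, which is
-- already the effective behaviour because of the LTA-to-LTB equality tests.
select *
  from DATA_TABLE DT,
       LOOKUP_TABLE_A LTA,
       LOOKUP_TABLE_B LTB
 where DT.COL_A = LTA.COL_A
   and DT.COL_B = LTA.COL_B
   and LTA.COL_C = LTB.COL_C
   and LTA.COL_B = LTB.COL_B
   and ( DT.REF_TXT = :refTxt or DT.ALT_REF_TXT = :refTxt )
   and DT.CREATED_DATE between :startDate and :endDate
```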

If you're unhappy with the current performance, I would suggest running the query with the gather_plan_statistics hint, then using DBMS_XPLAN.DISPLAY_CURSOR(?,?,'ALLSTATS LAST') to view the actual execution statistics. This will show the elapsed time attributed to each step in the execution plan.
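A rough sketch of that procedure in SQL*Plus, assuming the bind variables are already defined and the session is allowed to read the V$ views that DISPLAY_CURSOR relies on. Passing NULL for the first two arguments asks for the last statement executed in the session, so serveroutput is switched off first to keep a DBMS_OUTPUT call from becoming that "last statement".

```sql
set serveroutput off

-- The hint makes Oracle collect actual row counts and timings per plan step.
select /*+ gather_plan_statistics */ *
  from DATA_TABLE DT,
       LOOKUP_TABLE_A LTA,
       LOOKUP_TABLE_B LTB
 where DT.COL_A = LTA.COL_A (+)
   and DT.COL_B = LTA.COL_B (+)
   and LTA.COL_C = LTB.COL_C
   and LTA.COL_B = LTB.COL_B
   and ( DT.REF_TXT = :refTxt or DT.ALT_REF_TXT = :refTxt )
   and DT.CREATED_DATE between :startDate and :endDate;

-- NULL, NULL = most recent cursor of this session;
-- 'ALLSTATS LAST' = statistics of its last execution only.
select *
  from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
```

In the output, comparing the E-Rows (estimated) and A-Rows (actual) columns for each step is usually the quickest way to spot where the optimizer's assumptions diverge from reality.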

You might get some benefit from converting one or both of the lookup tables into index-organized tables.
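As an illustration only, an index-organized version of LOOKUP_TABLE_B might be declared as below. The names LOOKUP_TABLE_B_IOT and LOOKUP_B_IOT_PK, the datatypes, and the DESCRIPTION column are hypothetical, since the question does not show the table definitions; only COL_C, COL_B and the primary key column order come from the question.

```sql
create table LOOKUP_TABLE_B_IOT (
  COL_C       number        not null,   -- hypothetical datatype
  COL_B       number        not null,   -- hypothetical datatype
  DESCRIPTION varchar2(100),            -- hypothetical non-key column
  constraint LOOKUP_B_IOT_PK primary key (COL_C, COL_B)
)
organization index;   -- rows are stored inside the primary key index itself
```

Because the whole row lives in the primary key structure, a lookup by (COL_C, COL_B) no longer needs the separate TABLE ACCESS BY INDEX ROWID step that appears in the posted plan.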

Your 2 index range scans on IDX1 and IDX2 will produce at most 100 rows, so your BITMAP CONVERSION TO ROWIDS will produce at most 200 rows. And from there on, it's only indexed access by rowids, leading to a likely sub-second execution. So are you really experiencing performance problems? If so, how long does it take exactly?

If you are experiencing performance problems, then please follow Dave Costa's advice and get the real plan, because in that case it's likely that you are getting a different plan at runtime, possibly due to certain bind variable values or different optimizer environment settings.

Regards,
Rob.

This is one of those cases where it makes very little sense to try to optimize the DBMS performance without knowing what your data means.

Do you have many, many distinct CREATED_DATE values and only a few rows in DATA_TABLE for each date? If so, you want an index on CREATED_DATE, as it will be the primary way for the DBMS to reject rows it doesn't want to process.

On the other hand, do you have only a handful of dates, and many distinct values of REF_TXT or ALT_REF_TXT? In that case you probably have the correct compound index choices.

The presence of OR in your query complicates things greatly, and throws most guesswork out the window. You must look at EXPLAIN PLAN to see what's going on.

If you have tens of millions of distinct REF_TXT and ALT_REF_TXT values, you may want to consider denormalizing this schema.

Edit: Thanks for the additional info. Your explain plan contains no smoking guns that I can see. Here are some things to try next if you're still not happy with the performance.

Flip the order of the columns in your compound indexes on your data tables. Maybe that will get you simpler index range scans instead of all the bitmap monkey business.

Exchange your SELECT * for the names of the columns you actually need in the query resultset. That's good programming practice in any case, and it MAY allow the optimizer to avoid some work.

If things are still too slow, try recasting this as a UNION of two queries rather than using OR. That MAY allow the alt_ref_txt part of your query, which is made a little more complex by all the NULL values in that column, to be optimized separately.

This may be the query you want, using a more up-to-date syntax.

(And without inner joins breaking outer joins)

select
  * 
from
  DATA_TABLE DT
left outer join
  (
    LOOKUP_TABLE_A LTA
  inner join
    LOOKUP_TABLE_B LTB
      on  LTA.COL_C = LTB.COL_C
      and LTA.COL_B = LTB.COL_B
  )
    on  DT.COL_A = LTA.COL_A
    and DT.COL_B = LTA.COL_B
where
   ( DT.REF_TXT = :refTxt or DT.ALT_REF_TXT = :refTxt )
   and DT.CREATED_DATE between :startDate and :endDate

The indexes I'd have are...

LOOKUP_TABLE_A (COL_A, COL_B)
LOOKUP_TABLE_B (COL_B, COL_C)
DATA_TABLE (REF_TXT, CREATED_DATE)
DATA_TABLE (ALT_REF_TXT, CREATED_DATE)

Note: The first condition in the WHERE clause above contains an OR that will likely prevent efficient use of the indexes. In such cases I have seen performance benefits from UNIONing two queries together...

  <your query>
where
   DT.REF_TXT = :refTxt
   and DT.CREATED_DATE between :startDate and :endDate

UNION

  <your query>
where
   DT.ALT_REF_TXT = :refTxt
   and DT.CREATED_DATE between :startDate and :endDate
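Written out in full against the outer-join form of the query, the rewrite might look like the sketch below. Note one deliberate swap: it uses UNION ALL with an LNNVL guard instead of a plain UNION. A plain UNION also removes rows that are legitimately identical, and it needs a sort to do so; the LNNVL guard only stops a row from being returned by both branches, which keeps the result the same as the original OR.

```sql
-- Branch 1: rows matched on REF_TXT (a candidate for IDX1).
select *
from
  DATA_TABLE DT
left outer join
  (
    LOOKUP_TABLE_A LTA
  inner join
    LOOKUP_TABLE_B LTB
      on  LTA.COL_C = LTB.COL_C
      and LTA.COL_B = LTB.COL_B
  )
    on  DT.COL_A = LTA.COL_A
    and DT.COL_B = LTA.COL_B
where
   DT.REF_TXT = :refTxt
   and DT.CREATED_DATE between :startDate and :endDate

UNION ALL

-- Branch 2: rows matched on ALT_REF_TXT only (a candidate for IDX2).
-- LNNVL(cond) is true when cond is false or unknown, so rows already
-- returned by branch 1 are skipped, including when REF_TXT is NULL.
select *
from
  DATA_TABLE DT
left outer join
  (
    LOOKUP_TABLE_A LTA
  inner join
    LOOKUP_TABLE_B LTB
      on  LTA.COL_C = LTB.COL_C
      and LTA.COL_B = LTB.COL_B
  )
    on  DT.COL_A = LTA.COL_A
    and DT.COL_B = LTA.COL_B
where
   DT.ALT_REF_TXT = :refTxt
   and lnnvl(DT.REF_TXT = :refTxt)
   and DT.CREATED_DATE between :startDate and :endDate
```

Whether this beats the BITMAP OR plan shown in the question depends on the data, so it is worth comparing the two with actual execution statistics rather than assuming.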

Provide the output of this query with "set autot trace" so we can see how many blocks it is pulling. The explain plan looks good; it should be very fast. If you need more, denormalize the lookup table info into DT. That violates third normal form, but it will make your query faster by eliminating the joins. In a situation where milliseconds count, everything is in buffers, and you need that query to run 1000 times/second, it can help by driving down the number of blocks looked at per row. It is the ultimate way to boost read performance, but it complicates your app (and ruins your lovely ER diagram).
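A sketch of such a SQL*Plus session, assuming the account has been granted what autotrace statistics need (typically the PLUSTRACE role). SQL*Plus has no DATE bind type, so here the dates are bound as strings and converted explicitly; the bind values and the format mask are made up for illustration.

```sql
-- Hypothetical bind values; replace with real ones.
variable refTxt    varchar2(100)
variable startDate varchar2(10)
variable endDate   varchar2(10)

exec :refTxt    := 'SOME_REF'
exec :startDate := '2010-01-01'
exec :endDate   := '2010-01-31'

-- "set autot trace" is the abbreviated form of this command.
-- traceonly suppresses the result rows; statistics prints the counters.
set autotrace traceonly statistics

select *
  from DATA_TABLE DT,
       LOOKUP_TABLE_A LTA,
       LOOKUP_TABLE_B LTB
 where DT.COL_A = LTA.COL_A (+)
   and DT.COL_B = LTA.COL_B (+)
   and LTA.COL_C = LTB.COL_C
   and LTA.COL_B = LTB.COL_B
   and ( DT.REF_TXT = :refTxt or DT.ALT_REF_TXT = :refTxt )
   and DT.CREATED_DATE between to_date(:startDate, 'YYYY-MM-DD')
                           and to_date(:endDate,   'YYYY-MM-DD');

set autotrace off
```

The figures to look at are "consistent gets" (blocks read from the buffer cache) and "physical reads" (blocks read from disk).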