Now I have a really tricky thing with Oracle execution plans running havoc, when I use a DETERMINISTIC
function on the right hand side of the LIKE
operator. This is my situation:
The Situation
I thought it to be wise to execute a query like this (simplified):
SELECT [...]
FROM customers cust
JOIN addresses addr ON addr.cust_id = cust.id
WHERE special_char_filter(cust.surname) like special_char_filter(?)
And I would bind ?
to something like 'Eder%'
. Now customers
and addresses
are very large tables. That's why it's important to use indexes. Of course, there is a regular index on addresses.cust_id
. But I have also created a function-based index on special_char_filter(customers.surname)
, which works quite nicely.
The Trouble
The trouble is, the above query involving a like
clause creates execution plans with FULL TABLE SCANS on addresses
. It looks like something in this query keeps Oracle from using indexes on addresses.cust_id
.
The Workaround
开发者_高级运维I found out, that the solution to my problem is this:
SELECT [...]
FROM customers cust
JOIN addresses addr ON addr.cust_id = cust.id
WHERE special_char_filter(cust.surname) like ?
I removed the (DETERMINISTIC
!) function from the like operator's right hand side and pre-calculated the bind variable in Java. Now this query is hyper-fast, without any FULL TABLE SCANS. This, too, is very fast (although not equivalent):
SELECT [...]
FROM customers cust
JOIN addresses addr ON addr.cust_id = cust.id
WHERE special_char_filter(cust.surname) = special_char_filter(?)
The Confusion
I don't understand this. What's wrong with having deterministic functions on the right hand side of the like
operator? I have observed this in Oracle 11.2.0.1.0
It might be nothing in the query at all. The cost-based optimizer might just be confused and think that a FULL TABLE SCAN is faster. Have you tried using a HINT on the query, forcing Oracle to use your index?
The problem is that Oracle doesn't know what "special_char_filter(?)" will return. If it returns a '%', then it will be very slow to use an index as everything would match. If it returns a 'A%' it will probably also be slow as (assuming an equal distribution across all the letters) about 4% of the rows would match. If it returns a '%FRED%', it wouldn't return a lot of rows, but a using an index range scan would perform poorly because the rows may be at the start, middle or end of the index, so it has to do the entire index.
If you know that special_char_filter will always return a string that has at least three 'solid' characters at the start, then you will probably have better luck with
SELECT [...] FROM customers cust JOIN addresses addr ON addr.cust_id = cust.id WHERE special_char_filter(cust.surname) like special_char_filter(?) AND substr(special_char_filter(cust.surname),1,3) = substr(special_char_filter(?),1,3)
with a FBI on substr(special_char_filter(cust.surname),1,3)
Though if pre-calculating the result in java works, then stick with it.
Apart from that, I'd probably look at Oracle Text for matches.
The script below shows the steps I used to get an index range scan on the ADDRESSES index. Before you look at the details, you may want to just run the whole thing. If you don't get two index range scans for the last two queries then it's probably a difference in our versions, settings, etc. I'm using 10.2.0.1.0.
If you do see the desired plan, then you may want to gradually modify my script to make it more accurately reflect the real data, and try to find the exact change that makes it break. Hopefully my setup is at least close to the real thing, and isn't missing any details that would make it irrelevant to your exact problem.
This is a weird issue, and I don't understand everything that's going on here. For example, I don't know why use_nl works but index hints don't.
(Note that my execution times are based on repeated executions. The first time you run this some queries may be slower because the data isn't cached.)
--create tables
create table customers (id number, surname varchar2(100), other varchar2(100));
create table addresses (cust_id number, other varchar2(100));
--create data and indexes
insert into customers select level, 'ASDF'||level, level from dual connect by level <= 1000000;
insert into addresses select level, level from dual connect by level <= 1000000;
create index customers_id on customers(id);
create index addresses_cust_id on addresses(cust_id);
create index customers_special_char_filter on customers(special_char_filter(surname));
--create function
create or replace function special_char_filter(surname in varchar) return varchar2 deterministic is
begin
return replace(surname, 'bad value!', null);
end;
/
--gather stats
begin
dbms_stats.gather_table_stats(ownname => user, tabname => 'CUSTOMERS', cascade => true);
dbms_stats.gather_table_stats(ownname => user, tabname => 'ADDRESSES', cascade => true);
end;
/
set autotrace on;
--Index range scan on CUSTOMERS_SPECIAL_CHAR_FILTER, but full table scan on ADDRESSES
--(0.2 seconds)
SELECT *
FROM customers cust
JOIN addresses addr ON addr.cust_id = cust.id
WHERE special_char_filter(cust.surname) like special_char_filter('ASDF100000bad value!%');
--This uses the addresses index but it does an index full scan. Not really what we want.
--I'm not sure why I can't get an index range scan here.
--Various other index hints also failed here. For example, no_index_ffs won't stop an index full scan.
--(1 second)
SELECT /*+ index(addr addresses_cust_id) */ *
FROM customers cust
JOIN addresses addr ON addr.cust_id = cust.id
WHERE special_char_filter(cust.surname) like special_char_filter('ASDF100000bad value!%');
--Success! With this hint both indexes are used and it's super-fast.
--(0.02 seconds)
SELECT /*+ use_nl(cust addr) */ *
FROM customers cust
JOIN addresses addr ON addr.cust_id = cust.id
WHERE special_char_filter(cust.surname) like special_char_filter('ASDF100000bad value!%');
--But forcing the index won't always be a good idea, for example when the value starts with '%'.
--(1.2 seconds)
SELECT /*+ use_nl(cust addr) */ *
FROM customers cust
JOIN addresses addr ON addr.cust_id = cust.id
WHERE special_char_filter(cust.surname) like special_char_filter('%ASDF100000bad value!%');
精彩评论