If I execute a query like this:
SELECT eid
FROM entidades e
WHERE distrito IN ( SELECT id FROM distritos WHERE distrito_t LIKE '%lisboa%' )
or
SELECT eid
FROM entidades e
WHERE concelho IN ( SELECT id FROM concelho WHERE concelho_t LIKE '%lisboa%' )
my indexes on distrito
or concelho
are being used.
For any of the queries stated above, the output of explain analyze will be something like this:
----------------------------------------------------------------------
Nested Loop (cost=239.36..23453.18 rows=12605 width=4) (actual time=29.995..790.191 rows=100602 loops=1)
-> HashAggregate (cost=1.38..1.39 rows=1 width=12) (actual time=0.081..0.085 rows=1 loops=1)
-> Seq Scan on distritos (cost=0.00..1.38 rows=1 width=12) (actual time=0.058..0.068 rows=1 loops=1)
Filter: ((distrito_t)::text ~~ '%lisboa%'::text)
-> Bitmap Heap Scan on entidades e (cost=237.98..23294.23 rows=12605 width=7) (actual time=29.892..389.767 rows=100602 loops=1)
Recheck Cond: (e.distrito = distritos.id)
-> Bitmap Index Scan on idx_t_ent_dis (cost=0.00..234.83 rows=12605 width=0) (actual time=26.787..26.7开发者_如何学C87 rows=100602 loops=1)
Index Cond: (e.distrito = distritos.id)
However, for the following query, indexes are not used at all...
SELECT eid
FROM entidades e
WHERE concelho IN ( SELECT id FROM concelho WHERE concelho_t LIKE '%lisboa%' )
OR distrito IN ( SELECT id FROM distritos WHERE distrito_t LIKE '%lisboa%' )
----------------------------------------------------------------------
Seq Scan on entidades e (cost=10.25..34862.71 rows=283623 width=4) (actual time=0.600..761.876 rows=100604 loops=1)
Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
SubPlan 1
-> Seq Scan on distritos (cost=0.00..1.38 rows=1 width=12) (actual time=0.083..0.093 rows=1 loops=1)
Filter: ((distrito_t)::text ~~ '%lisboa%'::text)
SubPlan 2
-> Seq Scan on concelhos (cost=0.00..8.86 rows=3 width=5) (actual time=0.173..0.258 rows=1 loops=1)
Filter: ((concelho_t)::text ~~ '%lisboa%'::text)
How can I create an index that will be used by the previous query?
According to this documentation it is possible... But I'm probably not searching for the right thing since I can't find any example at all...
update: added explain output for both query types...
The documentation you link to states: "the planner will sometimes choose to use a simple index scan even though additional indexes are available that could have been used as well".
I wasn't able to get postgres (8.4) to behave the way you want when the where
condition contained subqueries - only with simple conditions, so it might just be a limitation of the feature.
But the point really is that although the optimizer will try to choose the fastest execution path, it might not succeed, and in some cases you might need to 'encourage' another path, as I've done in the 'modified' query below with a union
:
create table distritos(id serial primary key, distrito_t text);
insert into distritos(distrito_t) select 'distrito'||generate_series(1, 10000);
create table concelho(id serial primary key, concelho_t text);
insert into concelho(concelho_t) select 'concelho'||generate_series(1, 10000);
create table entidades( eid serial primary key,
distrito integer not null references distritos,
concelho integer not null references concelho );
insert into entidades(distrito, concelho)
select generate_series(1, 10000), generate_series(1, 10000);
original:
explain analyze
select eid from entidades
where concelho in (select id from concelho where concelho_t like '%lisboa%')
or distrito in (select id from distritos where distrito_t like '%lisboa%');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Seq Scan on entidades (cost=299.44..494.94 rows=7275 width=4) (actual time=8.978..8.978 rows=0 loops=1)
Filter: ((hashed SubPlan 1) OR (hashed SubPlan 2))
SubPlan 1
-> Seq Scan on concelho (cost=0.00..149.71 rows=2 width=4) (actual time=3.922..3.922 rows=0 loops=1)
Filter: (concelho_t ~~ '%lisboa%'::text)
SubPlan 2
-> Seq Scan on distritos (cost=0.00..149.71 rows=2 width=4) (actual time=3.363..3.363 rows=0 loops=1)
Filter: (distrito_t ~~ '%lisboa%'::text)
modified:
explain analyze
select eid from entidades
where concelho in (select id from concelho where concelho_t like '%lisboa%')
union
select eid from entidades
where distrito in (select id from distritos where distrito_t like '%lisboa%');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=648.98..650.92 rows=194 width=4) (actual time=5.409..5.409 rows=0 loops=1)
-> Append (cost=149.74..648.50 rows=194 width=4) (actual time=5.399..5.399 rows=0 loops=1)
-> Hash Semi Join (cost=149.74..323.28 rows=97 width=4) (actual time=2.743..2.743 rows=0 loops=1)
Hash Cond: (stack.entidades.concelho = concelho.id)
-> Seq Scan on entidades (cost=0.00..147.00 rows=9700 width=8) (actual time=0.013..0.013 rows=1 loops=1)
-> Hash (cost=149.71..149.71 rows=2 width=4) (actual time=2.723..2.723 rows=0 loops=1)
-> Seq Scan on concelho (cost=0.00..149.71 rows=2 width=4) (actual time=2.716..2.716 rows=0 loops=1)
Filter: (concelho_t ~~ '%lisboa%'::text)
-> Hash Semi Join (cost=149.74..323.28 rows=97 width=4) (actual time=2.655..2.655 rows=0 loops=1)
Hash Cond: (stack.entidades.distrito = distritos.id)
-> Seq Scan on entidades (cost=0.00..147.00 rows=9700 width=8) (actual time=0.006..0.006 rows=1 loops=1)
-> Hash (cost=149.71..149.71 rows=2 width=4) (actual time=2.642..2.642 rows=0 loops=1)
-> Seq Scan on distritos (cost=0.00..149.71 rows=2 width=4) (actual time=2.642..2.642 rows=0 loops=1)
Filter: (distrito_t ~~ '%lisboa%'::text)
I don't have an answer for you, only a question, why use a subquery ?
SELECT eid
FROM entidades e
LEFT JOIN concelho c ON e.concelho = c.id
LEFT JOIN distritos d ON e.distrito = d.id
WHERE
concelho_t LIKE '%lisboa%' OR
distrito_t LIKE '%lisboa%';
Try:
SELECT eid
FROM entidades e
WHERE concelho IN ( SELECT id FROM concelho WHERE concelho_t LIKE '%lisboa%' )
UNION
SELECT eid
FROM entidades e
OR distrito IN ( SELECT id FROM distritos WHERE distrito_t LIKE '%lisboa%' )
But the real problem is lack of Normalisation in the database, the absence of a hierarchy for Country, Territory, county (condado, parroquia), town, suburb. If you had that, then organisation could belong to the structure in the right place, and you would not have "Lisboa" occurring at both council at district levels.
精彩评论