I'm fairly new to SQL, and I'm trying to work out how to speed up a complex SQL query in postgres, perhaps by improving my use of indexes. This is the query:
SELECT
(SELECT ev.code FROM classification_item ci, enumeration_value ev
WHERE ev.key_id = :ak_0
AND ci.entry_id = t.id AND ci.value_id = ev.id) AS axis_0,
(SELECT ev.code FROM classification_item ci, enumeration_value ev
WHERE ev.key_id = :ak_1
AND ci.entry_id = t.id AND ci.value_id = ev.id) AS axis_1,
SU开发者_Go百科M(t.amount) as amount,
(SELECT ev.code FROM classification_item ci, enumeration_value ev
WHERE ev.key_id = :key_time_id
AND ci.entry_id = t.id AND ci.value_id = ev.id) AS time
FROM "entry" t
WHERE t.dataset_id = :dataset_id
AND t.id IN (SELECT ci.entry_id FROM classification_item ci, enumeration_value ev
WHERE ev.key_id = :k_0
AND ev.code = :v_0 AND ci.value_id = ev.id)
GROUP BY time, axis_0, axis_1
This is basically the database schema (as defined in Pylons):
table_dataset = Table('dataset', meta.metadata,
Column('id', Integer, primary_key=True),
)
table_entry = Table('entry', meta.metadata,
Column('id', Integer, primary_key=True),
Column('dataset_id', Integer, ForeignKey('dataset.id')),
Column('amount', Float()),
)
table_classification_item = Table('classification_item', meta.metadata,
Column('id', Integer, primary_key=True),
Column('entry_id', Integer, ForeignKey('entry.id'), index=True),
Column('value_id', Integer, ForeignKey('enumeration_value.id'), index=True)
)
table_enumeration_value = Table('enumeration_value', meta.metadata,
Column('id', Integer, primary_key=True),
Column('key_id', Integer, ForeignKey('key.id'), index=True),
Column('code', UnicodeText(), index=True),
)
And it has indexes as follows:
"dataset_pkey" PRIMARY KEY, btree (id)
"entry_pkey" PRIMARY KEY, btree (id)
"classification_item_pkey" PRIMARY KEY, btree (id)
"ix_classification_item_entry_id" btree (entry_id)
"ix_classification_item_value_id" btree (value_id)
"enumeration_value_pkey" PRIMARY KEY, btree (id)
"ix_enumeration_value_code" btree (code)
"ix_enumeration_value_key_id" btree (key_id)
Am I missing any obvious index that would speed up the query? In particular:
- should I be using 'clustered' indexes?
- should I also be indexing
amount
onentry
, or would that make no difference toSUM(t.amount) as amount
?
Thanks for your help. I know this is a pretty complex question, so please tell me if I can do anything to improve it.
------ UPDATE --------------
The output from EXPLAIN ANALYZE on the above query.
If the enumeration_value
table is small, I guess you can get some improvment by making axis_1
and axis_0
as join and add an extra index.
something like this (not tested)
CREATE INDEX idx_ci_vi_ei ON classification_item(value_id, entry_id);
CREATE INDEX idx_id_ki ON enumeration_value(id, key_id);
SELECT
ci_0.code AS axis_0,
ci_1.code AS axis_1,
SUM(t.amount) as amount,
ci_t.code AS time
FROM
"entry" t,
(SELECT ev.code FROM classification_item ci, enumeration_value ev
WHERE ev.key_id = :ak_0 AND ci.value_id = ev.id) ci_0,
(SELECT ev.code FROM classification_item ci, enumeration_value ev
WHERE ev.key_id = :ak_1 AND ci.value_id = ev.id) ci_1,
(SELECT ev.code FROM classification_item ci, enumeration_value ev
WHERE ev.key_id = :key_time_id AND ci.value_id = ev.id) ci_t
WHERE t.dataset_id = :dataset_id
AND t.id IN (SELECT ci.entry_id FROM classification_item ci, enumeration_value ev
WHERE ev.key_id = :k_0
AND ev.code = :v_0 AND ci.value_id = ev.id)
AND t.id = ci_0.entry_id AND t.id = ci_1.entry_id AND t.id = ci_t.entry_id
GROUP BY time, axis_0, axis_1
What has EXPLAIN ANALYZE to say about the queryplan?
精彩评论