开发者

SQLite join optimisation

开发者 https://www.devze.com 2023-01-24 22:21 出处:网络
If you have a query such as: select a.Name, a.Description from a inner join b on a.id1 = b.id1 inner join c on b.id2 = c.id2

If you have a query such as:

select a.Name, a.Description from a
inner join b on a.id1 = b.id1
inner join c on b.id2 = c.id2
group by a.Name, a.Description

What would be the most optimal columns to index for this query in SQLite if you consider that there are over 100,000 rows in开发者_如何学Python each of the tables?

The reason that I ask is that I do not get the performance with the query with the group by that I would expect from another RDBMS (SQL Server) when I apply the same optimisation.

Would I be right in thinking that all columns referenced on a single table in a query in SQLite need to be included in a single composite index for best performance?


The problem is that you're expecting SQLite to have the same performance characteristics as a full RDBMS. It won't. SQLLite doesn't have the luxury of getting to cache quite as much in memory, has to rebuild the cache every time you run the application, is probably limited to set number of cores, etc, etc, etc. Tradeoffs for using an embedded RDBMS over a full one.

As far as optimizations go, try indexing the lookup columns and test. Then try creating a covering index. Be sure to test both selects and code paths that update the database, you're speeding up one at the expense of the other. Find the indexing that gives the best balance between the two for your needs and go with it.


From the SQLite query optimization overview:

When doing an indexed lookup of a row, the usual procedure is to do a binary search on the index to find the index entry, then extract the rowid from the index and use that rowid to do a binary search on the original table. Thus a typical indexed lookup involves two binary searches. If, however, all columns that were to be fetched from the table are already available in the index itself, SQLite will use the values contained in the index and will never look up the original table row. This saves one binary search for each row and can make many queries run twice as fast.

For any other RDBMS, I'd say to put a clustered index on b.id1 and c.id2. For SQLite, you might be better off including any columns from b and c that you want to lookup in those indexes too.


Beware: I know nothing of possible intricacies of SQLite and its execution plans.

You definitely need indexes on a.id1, b.id1, b.id2 and c.id2. I think a composite index (b.id1, b.id2) could yield a small performance increase. The same goes for (a.id1, a.Name, a.Description).


Since you're not using the other tables for your return columns, perhaps this will be faster:

SELECT DISTINCT a.Name, a.Description
FROM a, b, c
WHERE a.id1 = b.id1
AND b.id2 = c.id2

Looking at the returned columns, since the criteria seems to be only that they must be linked from a to b to c, you could look for all unique a.Name and a.Description pairs.

SELECT DISTINCT a.Name, a.Description
FROM a
WHERE a.id1 IN (
 SELECT b.id1
 FROM b
 WHERE b.id2 IN (
  SELECT c.id2
  FROM c
  )
 )

Or, depending on if every pair of a.Name and a.Description is already unique, there should be some gain in finding out first the unique id's then fetching the other columns.

SELECT a.Name, a.Description
FROM a 
WHERE a.id1 IN (
 SELECT DISTINCT a.id1
 FROM a
 WHERE a.id1 IN (
  SELECT b.id1
  FROM b
  WHERE b.id2 IN (
   SELECT c.id2
   FROM c
   )
  )
 )


I think indexes on a.id1 and b.id2 would give you about as much benefit as you could get in terms of the JOINs. But SQLite offers EXPLAIN, and it might help you determine if there's an avoidable in efficiency in the current execution plan.

0

精彩评论

暂无评论...
验证码 换一张
取 消