I'm trying to make开发者_StackOverflow社区 a database of positions with a start and stop: basically lines on a 1D axis. I want to efficiently query all positions that overlap a given interval. In a traditional table, the query would require two inequalities, so it cannot be indexed. You can also use an R-Tree index, but they seem designed for multidimensional range queries. Is there a more efficient way to store lines on an axis?
If anybody curious, the database is to store genome intervals. Here's an example table:
CREATE TABLE lines (id INTEGER PRIMARY KEY, start INTEGER, stop INTEGER);
The basic way to do this is:
SELECT * FROM lines WHERE start <= <end of interval> AND stop >= <start of interval>;
Again, that's really slow and can't be indexed. The R-Tree would work like this:
CREATE VIRTUAL TABLE lines_index USING RTREE (id, start, stop);
SELECT * from lines_index WHERE start <= <end of interval> AND stop >= <start of interval>;
R-Trees aren't ideal for our implementation, so I'm wondering if there are any alternatives...
First of all, allthough you can't fully index it you could index by just the start interval. If 90% of the intervals have start=stop, that should make a big improvement. The only slowdown would be with very long intervals.
精彩评论