开发者

Why are my spatial searches slower in SQL Server than PostGIS?

开发者 https://www.devze.com 2023-01-11 02:34 出处:网络
I\'m working on moving some spatial searching capabilities from Postgres with PostGIS to SQL Server and I\'m seeing some pretty terrible performance, even with indexes.

I'm working on moving some spatial searching capabilities from Postgres with PostGIS to SQL Server and I'm seeing some pretty terrible performance, even with indexes.

My data is around a million points, and I want to find out which of those points are within given shapes, so the query looks something like this:

DECLARE @Shape GEOMETRY = ...
SELECT * FROM PointsTable WHERE Point.STWithin(@Shape) = 1

If I select a fairly small shape, I can sometimes get sub-second times, but if my shape is fairly large (which they sometimes are), I can get times over 5 minutes. If I run the same searches in Postgres, they're always under a second (in fact, almost all are under 200 ms).

I've tried several different grid sizes on my indexes (all high, all medium, all low), different cells per object (16, 64, 256), and no matter what I do the times stay fairly constant. I'd like to try more combinations but I don't even know what direction to go. More cells per object? Less? Some strange combination of grid sizes?

I've looked at my query plans and they're always using the index, it's just not helping at all. I've even tried without the index, and it's not much worse.

Is there any advice anyone can give on this? Everything I can find suggests "we can't give you any advice on indexes, just try everything and maybe one will work", but with it taking 10 minutes to create an index, doing this blindly is a massive waste of time.

EDIT: I also posted this on a Microsoft forum. Here's some information they asked for on there:

The best working index I could get was this one:

CREATE SPATIAL INDEX MapTesting_Location_Medium_Medium_Medium_Medium_16_NDX
    ON MapTesting (Location)
 USING GEOMETRY_GRID
  WITH (
    BOUNDING_BOX = ( -- The extent of our data, data is clustered in cities, but this is about as small as the index can be without missing thousands of points
        XMIN = -12135832,
        YMIN = 4433884,
        XMAX = -11296439,
        YMAX = 5443645),
    GRIDS = (
        LEVEL_1 = MEDIUM,
        LEVEL_2 = MEDIUM,
        LEVEL_3 = MEDIUM,
        LEVEL_4 = MEDIUM),
     CELLS_PER_OBJECT = 256 -- This was set to 16 but it was much slower
  )

I had some issues getting the index used, but this is different.

For these tests I ran a test search (the one listed in my origi开发者_Go百科nal post) with a WITH(INDEX(...)) clause for each of my indexes (testing various settings for grid size and cells per object), and one without any hint. I also ran sp_help_spatial_geometry_index using each index and the same search shape. The index listed above ran fastest and also was listed as most efficient in sp_help_spatial_geometry_index.

When running the search I get these statistics:

(1 row(s) affected)
Table 'MapTesting'. Scan count 0, logical reads 361142, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'extended_index_592590491_384009'. Scan count 1827, logical reads 8041, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 6735 ms,  elapsed time = 13499 ms.

I also tried using random points as data (since I can't give out our real data), but it turns out that this search is really fast with random data. This lead us to believe that our problem is how the grid system works with our data.

Our data is addresses across the entire state, so there are a few very high density regions, but mostly sparse data. I think the problem is that no setting for the grid sizes works well for both. With grids set to HIGH, the index returns too many cells in low-density areas, and with grids set to LOW, the grids are useless in high density areas (at MEDIUM, it's not as bad, but still not good at either).

I am able to get the index used, it's just not helping. Every test was run with "show actual execution plan" turned on, and it always shows the index.


I've just spent the day on a similar problem. In particular, we are doing a point-in-polygon type of query, where there was a relatively small set of polygons, but each polygon was large and complex.

Solution turned out to be as follows, for the spatial index on the polygon table:

  1. Use "geometry auto grid" instead of the old MMLL etc. This gives 8 levels of indexing instead of the old 4, and the settings are automatic. AND...
  2. Set 'cells per object' to 2000 or 4000. (Not an easy thing to guess, given that the default is 16!)

This made a huge difference. It was 10 times faster than a spatial index in the default configuration, and 60 times faster than no index at all.


Here are some remarks about SQL-Server's spatial extensions and how to ensure that the index is efficiently used:

  • http://sqlskills.com/BLOGS/BOBB/post/How-to-ensure-your-spatial-index-is-being-used.aspx

Apparently, the planner has difficulties to build a good plan if he does not know the actual geometry during parse time. The autor suggest to insert exec sp_executesql:

Replace:

-- does not use the spatial index without a hint
declare @latlonPoint geometry = geometry::Parse('POINT (45.518066 -122.767464)')
select a.id, a.shape.STAsText() 
from zipcodes a 
where a.shape.STIntersects(@latlonPoint)=1
go

with:

-- this does use the spatial index without using a hint
declare @latlonPoint geometry = geometry::Parse('POINT (45.518066 -122.767464)')
exec sp_executesql 
N'select a.id, a.shape.STAsText() 
from zipcodes a 
where a.shape.STIntersects(@latlonPoint)=1', N'@latlonPoint geometry', @latlonPoint
go


My gut reaction is "because Microsoft hasn't bothered to make it fast, because it's not an Enterprise Feature". Maybe I'm being cynical.

I'm not sure why you're migrating away from Postgres either.


I believe STIntersects is better optimized for using the index would have better performance than STWithin, especially for larger shapes.


Implementation efficency issues aside SQL server uses Quadtree index whereas PostGIS uses R-tree index.

R-tree is for most cases the better algorithm, especially for large datasets with varying geometry size.


Have you set up your spatial index correctly? Is your bounding box correct? Are all points inside? In your case probably HHMM for GRIDS would work the best (depending again on a bouding box).

Can you try to use sp_help_spatial_geometry_index, to see what's wrong? http://msdn.microsoft.com/en-us/library/cc627426.aspx

Try using Filter operation instead and tell us what perf numbers you get? (it executes only primary filter (use index) without going through secondary filter (true spatial operation))

Something is wrong with your setup. Spatial is indeed new feature but it's not that bad.


You might try breaking it down into two passes:

  1. select candidates into a temp table w/ .Filter().
  2. query candidates w/ .STWithin().

eg:

SELECT * INTO #this FROM PointsTable WHERE Point.Filter(@Shape) = 1
SELECT * FROM #this WHERE Point.STWithin(@Shape) = 1

(replacing SELECT * with only the actual columns you need to reduce I/O)

This kind of micro-optimization shouldn't be necessary, but I have seen decent performance improvements before. Also, you will be able to gauge how selective your index is by the ratio of (1) to (2).


I'm not familiar with spatial queries, but it could be a parameterized query problem

try writing a query (without using parameters) with a fixed value (use a value that performs slow with the parameterized query) and run it. Compare the times with the parameterized version. If its much faster, then your problem is parameterized queries.

If the above is much faster, then I would dynamically build your sql string with the parameter values embedded in the string, that way you can remove parameters from causing problems.

0

精彩评论

暂无评论...
验证码 换一张
取 消