最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

Using GIST index with PostGIS makes query slower than without it - Stack Overflow

programmeradmin0浏览0评论

I have 2 sample tables with 1m rows:

create table deliveries
(
    id                 serial
        primary key,
    order_id           integer               not null
        references orders,
    point              geometry(Point, 4326) not null,
    delivery_timestamp timestamp             not null,
    year               integer               not null
);

One of them has index:

create index idx_deliveries_geom
    on deliveries using gist (point);

Sample query:

SELECT ST_AsText(point), order_id, delivery_timestamp
FROM deliveries
WHERE ST_Intersects(
              point,
              ST_MakeEnvelope(32.085065712094298, 54.87186750531864, 32.238874305844298, 54.895072980740217, 4326)
      )
ORDER BY delivery_timestamp
LIMIT 100;

Query with index works about 2.5 times slower than without index. Plan with index:

Limit  (cost=20647022.44..20647086.19 rows=100 width=44)
  ->  Result  (cost=20647022.44..21672876.60 rows=1609183 width=44)
        ->  Sort  (cost=20647022.44..20651045.39 rows=1609183 width=44)
              Sort Key: delivery_timestamp
              ->  Index Scan using idx_deliveries_geom on deliveries  (cost=0.42..20585520.62 rows=1609183 width=44)
                    Index Cond: (point && '0103000020E610000001000000050000005DBAE96EE30A4040F3E4BA5A996F4B405DBAE96EE30A4040FFE75DC091724B405DBAE96E931E4040FFE75DC091724B405DBAE96E931E4040F3E4BA5A996F4B405DBAE96EE30A4040F3E4BA5A996F4B40'::geometry)
"                    Filter: st_intersects(point, '0103000020E610000001000000050000005DBAE96EE30A4040F3E4BA5A996F4B405DBAE96EE30A4040FFE75DC091724B405DBAE96E931E4040FFE75DC091724B405DBAE96E931E4040F3E4BA5A996F4B405DBAE96EE30A4040F3E4BA5A996F4B40'::geometry)"

Plan without:

Limit  (cost=51975155.40..51975229.56 rows=100 width=44)
  ->  Gather Merge  (cost=51975155.40..53282318.69 rows=1762448 width=44)
        Workers Planned: 2
        ->  Result  (cost=51974155.37..52535935.67 rows=881224 width=44)
              ->  Sort  (cost=51974155.37..51976358.43 rows=881224 width=44)
                    Sort Key: delivery_timestamp
                    ->  Parallel Seq Scan on deliveries  (cost=0.00..51940475.62 rows=881224 width=44)
"                          Filter: st_intersects(point, '0103000020E610000001000000050000005DBAE96EE30A4040F3E4BA5A996F4B405DBAE96EE30A4040FFE75DC091724B405DBAE96E931E4040FFE75DC091724B405DBAE96E931E4040F3E4BA5A996F4B405DBAE96EE30A4040F3E4BA5A996F4B40'::geometry)"
JIT:
  Functions: 7
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"

I dont understand 2 things:

  • why postgres uses same filter twice? Index Cond and st_intersects? Looks like it just doubles the work, checking itself one more time.
  • parallel scan is faster than index work, but with deliveries table growth index should become faster?

I have another bigger query that only seconds this question, parallel scan works faster than index at least on my sample data. Data is generated randomly around points 32.0401, 54.7818 (the logic for generation is slightly harder, but all points are in about 10x10 km square).

发布评论

评论列表(0)

  1. 暂无评论