I have 2 sample tables with 1m rows:
create table deliveries
(
id serial
primary key,
order_id integer not null
references orders,
point geometry(Point, 4326) not null,
delivery_timestamp timestamp not null,
year integer not null
);
One of them has index:
create index idx_deliveries_geom
on deliveries using gist (point);
Sample query:
SELECT ST_AsText(point), order_id, delivery_timestamp
FROM deliveries
WHERE ST_Intersects(
point,
ST_MakeEnvelope(32.085065712094298, 54.87186750531864, 32.238874305844298, 54.895072980740217, 4326)
)
ORDER BY delivery_timestamp
LIMIT 100;
Query with index works about 2.5 times slower than without index. Plan with index:
Limit (cost=20647022.44..20647086.19 rows=100 width=44)
-> Result (cost=20647022.44..21672876.60 rows=1609183 width=44)
-> Sort (cost=20647022.44..20651045.39 rows=1609183 width=44)
Sort Key: delivery_timestamp
-> Index Scan using idx_deliveries_geom on deliveries (cost=0.42..20585520.62 rows=1609183 width=44)
Index Cond: (point && '0103000020E610000001000000050000005DBAE96EE30A4040F3E4BA5A996F4B405DBAE96EE30A4040FFE75DC091724B405DBAE96E931E4040FFE75DC091724B405DBAE96E931E4040F3E4BA5A996F4B405DBAE96EE30A4040F3E4BA5A996F4B40'::geometry)
" Filter: st_intersects(point, '0103000020E610000001000000050000005DBAE96EE30A4040F3E4BA5A996F4B405DBAE96EE30A4040FFE75DC091724B405DBAE96E931E4040FFE75DC091724B405DBAE96E931E4040F3E4BA5A996F4B405DBAE96EE30A4040F3E4BA5A996F4B40'::geometry)"
Plan without:
Limit (cost=51975155.40..51975229.56 rows=100 width=44)
-> Gather Merge (cost=51975155.40..53282318.69 rows=1762448 width=44)
Workers Planned: 2
-> Result (cost=51974155.37..52535935.67 rows=881224 width=44)
-> Sort (cost=51974155.37..51976358.43 rows=881224 width=44)
Sort Key: delivery_timestamp
-> Parallel Seq Scan on deliveries (cost=0.00..51940475.62 rows=881224 width=44)
" Filter: st_intersects(point, '0103000020E610000001000000050000005DBAE96EE30A4040F3E4BA5A996F4B405DBAE96EE30A4040FFE75DC091724B405DBAE96E931E4040FFE75DC091724B405DBAE96E931E4040F3E4BA5A996F4B405DBAE96EE30A4040F3E4BA5A996F4B40'::geometry)"
JIT:
Functions: 7
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
I dont understand 2 things:
- why postgres uses same filter twice? Index Cond and st_intersects? Looks like it just doubles the work, checking itself one more time.
- parallel scan is faster than index work, but with deliveries table growth index should become faster?
I have another bigger query that only seconds this question, parallel scan works faster than index at least on my sample data. Data is generated randomly around points 32.0401, 54.7818 (the logic for generation is slightly harder, but all points are in about 10x10 km square).