Thread: ORDER BY operator index scans and filtering
Hi,
Currently, index scans that order by an operator (for instance, `location <-> POINT(0, 0)`) and have a filter for the same expression (`location <-> POINT(0, 0) < 2`) can end up scanning much more of the index than is necessary.
Here's a complete example:
CREATE TABLE stores (location point);
INSERT INTO stores SELECT POINT(0, i) FROM generate_series(1, 100000) i;
CREATE INDEX ON stores USING gist (location);
EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM stores WHERE location <-> POINT(0, 0) < 2 ORDER BY location <-> POINT(0, 0) LIMIT 10;
Once the second tuple returned from the index has a distance >= 2, the scan should be able to end (as it's an ascending order scan). Instead, it scans the entire index, filtering out the next 99,998 rows.
Limit (actual time=0.166..32.573 rows=1 loops=1)
-> Index Only Scan using stores_location_idx on stores (actual time=0.165..32.570 rows=1 loops=1)
Order By: (location <-> '(0,0)'::point)
Filter: ((location <-> '(0,0)'::point) < '2'::double precision)
Rows Removed by Filter: 99999
This can be especially costly for vector index scans (this was found while working on an upcoming feature for pgvector [1]).
- Andrew
[1] https://github.com/pgvector/pgvector/issues/678
Currently, index scans that order by an operator (for instance, `location <-> POINT(0, 0)`) and have a filter for the same expression (`location <-> POINT(0, 0) < 2`) can end up scanning much more of the index than is necessary.
Here's a complete example:
CREATE TABLE stores (location point);
INSERT INTO stores SELECT POINT(0, i) FROM generate_series(1, 100000) i;
CREATE INDEX ON stores USING gist (location);
EXPLAIN (ANALYZE, COSTS OFF) SELECT * FROM stores WHERE location <-> POINT(0, 0) < 2 ORDER BY location <-> POINT(0, 0) LIMIT 10;
Once the second tuple returned from the index has a distance >= 2, the scan should be able to end (as it's an ascending order scan). Instead, it scans the entire index, filtering out the next 99,998 rows.
Limit (actual time=0.166..32.573 rows=1 loops=1)
-> Index Only Scan using stores_location_idx on stores (actual time=0.165..32.570 rows=1 loops=1)
Order By: (location <-> '(0,0)'::point)
Filter: ((location <-> '(0,0)'::point) < '2'::double precision)
Rows Removed by Filter: 99999
This can be especially costly for vector index scans (this was found while working on an upcoming feature for pgvector [1]).
- Andrew
[1] https://github.com/pgvector/pgvector/issues/678