Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers
From | Alena Rybakina |
---|---|
Subject | Re: POC, WIP: OR-clause support for indexes |
Date | |
Msg-id | 33cfe17d-5462-4c82-9b5e-2865cf870bcf@postgrespro.ru Whole thread Raw |
In response to | Re: POC, WIP: OR-clause support for indexes (Peter Geoghegan <pg@bowt.ie>) |
List | pgsql-hackers |
Hi!
On 03.10.2024 23:15, Peter Geoghegan wrote:
I do think that this patch got a lot better, and simpler, but I'm a little worried about it not covering cases that are only very slightly different to the ones that you're targeting. It's easiest to see what I mean using an example. After the standard regression tests have run, the following tests can be run from psql (this uses the recent v40 revision): pg@regression:5432 =# create index on tenk1(four, ten); -- setup CREATE INDEX Very fast INT_MAX query, since we successful use the transformation added by the patch: pg@regression:5432 =# explain (analyze,buffers) select * from tenk1 where four = 1 or four = 2_147_483_647 order by four, ten limit 5; ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Limit (cost=0.29..1.73 rows=5 width=244) (actual time=0.011..0.014 rows=5 loops=1) │ │ Buffers: shared hit=4 │ │ -> Index Scan using tenk1_four_ten_idx on tenk1 (cost=0.29..721.25 rows=2500 width=244) (actual time=0.011..0.012 rows=5 loops=1) │ │ Index Cond: (four = ANY ('{1,2147483647}'::integer[])) │ │ Index Searches: 1 │ │ Buffers: shared hit=4 │ │ Planning Time: 0.067 ms │ │ Execution Time: 0.022 ms │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ (8 rows) Much slower query, which is not capable of applying the transformation due only to the fact that I've "inadvertently" mixed together multiple types (int4 and int8): pg@regression:5432 =# explain (analyze,buffers) select * from tenk1 where four = 1 or four = 2_147_483_648 order by four, ten limit 5; ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Limit (cost=0.29..2.08 rows=5 width=244) (actual time=0.586..0.588 rows=5 loops=1) │ │ Buffers: shared hit=1368 │ │ -> Index Scan using tenk1_four_ten_idx on tenk1 (cost=0.29..900.25 rows=2500 width=244) (actual time=0.586..0.587 rows=5 loops=1) │ │ Index Searches: 1 │ │ Filter: ((four = 1) OR (four = '2147483648'::bigint)) │ │ Rows Removed by Filter: 2500 │ │ Buffers: shared hit=1368 │ │ Planning Time: 0.050 ms │ │ Execution Time: 0.595 ms │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ (9 rows) Do you think this problem can be fixed easily? This behavior seems surprising, and is best avoided. Performance cliffs that happen when we tweak one detail of a query just seem worth avoiding on general principle. Now that you're explicitly creating RestrictInfos for a particular index, I suppose that it might be easier to do this kind of thing -- you have more context. Perhaps the patch can be made to recognize a mix of constants like this as all being associated with the same B-Tree operator family (the opfamily that the input opclass belongs to)? Perhaps the constants could all be normalized to the same type via casts/coercions into the underlying B-Tree input opclass -- that extra step should be correct ("64.1.2. Behavior of B-Tree Operator Classes" describes certain existing guarantees that this step would need to rely on). Note that the patch already works in cross-type scenarios, with cross-type operators. The issue I've highlighted is caused by the use of a mixture of types among the constants themselves -- the patch wants an array with elements that are all of the same type, which it can't quite manage. And so I can come up with a cross-type variant query that *can* still use a SAOP as expected with v40, despite involving a cross-type = btree operator: pg@regression:5432 [2181876]=# explain (analyze,buffers) select * from tenk1 where four = 2_147_483_648 or four = 2_147_483_649 order by four, ten limit 5; ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Limit (cost=0.29..6.53 rows=1 width=244) (actual time=0.004..0.005 rows=0 loops=1) │ │ Buffers: shared hit=2 │ │ -> Index Scan using tenk1_four_ten_idx on tenk1 (cost=0.29..6.53 rows=1 width=244) (actual time=0.004..0.004 rows=0 loops=1) │ │ Index Cond: (four = ANY ('{2147483648,2147483649}'::bigint[])) │ │ Index Searches: 1 │ │ Buffers: shared hit=2 │ │ Planning Time: 0.044 ms │ │ Execution Time: 0.011 ms │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ (8 rows) The fact that this third and final example works as expected makes me even more convinced that the second example should behave similarly.
Yes, I agree with you that it should be added in the feature but in the future thread.
The patch does not solve all the problems we planned for, as the previous patch did (discussed here [0]), but it also does not cause the performance problems that
were associated with building a suboptimal plan.
Furthermore I think this issue, like the one noted here [0], can be fixed in a way I proposed before [1], but I assume it is better resolved in the next thread related to the patch.
[1] https://www.postgresql.org/message-id/985f2924-9769-4927-ad6e-d430c394054d%40postgrespro.ru
-- Regards, Alena Rybakina Postgres Professional
pgsql-hackers by date: