Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: POC, WIP: OR-clause support for indexes |
Date | |
Msg-id | CAH2-WznLiw20U_Tp8Otm=g7hGcfWnodM4EOGzV=F3W-EvfQDTw@mail.gmail.com Whole thread Raw |
In response to | Re: POC, WIP: OR-clause support for indexes (Alexander Korotkov <aekorotkov@gmail.com>) |
Responses |
Re: POC, WIP: OR-clause support for indexes
Re: POC, WIP: OR-clause support for indexes Re: POC, WIP: OR-clause support for indexes |
List | pgsql-hackers |
On Tue, Oct 1, 2024 at 6:25 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > I think this patchset got much better, and it could possible be > committed after another round of cleanup and comment/docs improvement. > It would be very kind if you share your view on the decisions made in > this patchset. I do think that this patch got a lot better, and simpler, but I'm a little worried about it not covering cases that are only very slightly different to the ones that you're targeting. It's easiest to see what I mean using an example. After the standard regression tests have run, the following tests can be run from psql (this uses the recent v40 revision): pg@regression:5432 =# create index on tenk1(four, ten); -- setup CREATE INDEX Very fast INT_MAX query, since we successful use the transformation added by the patch: pg@regression:5432 =# explain (analyze,buffers) select * from tenk1 where four = 1 or four = 2_147_483_647 order by four, ten limit 5; ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Limit (cost=0.29..1.73 rows=5 width=244) (actual time=0.011..0.014 rows=5 loops=1) │ │ Buffers: shared hit=4 │ │ -> Index Scan using tenk1_four_ten_idx on tenk1 (cost=0.29..721.25 rows=2500 width=244) (actual time=0.011..0.012 rows=5 loops=1) │ │ Index Cond: (four = ANY ('{1,2147483647}'::integer[])) │ │ Index Searches: 1 │ │ Buffers: shared hit=4 │ │ Planning Time: 0.067 ms │ │ Execution Time: 0.022 ms │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ (8 rows) Much slower query, which is not capable of applying the transformation due only to the fact that I've "inadvertently" mixed together multiple types (int4 and int8): pg@regression:5432 =# explain (analyze,buffers) select * from tenk1 where four = 1 or four = 2_147_483_648 order by four, ten limit 5; ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Limit (cost=0.29..2.08 rows=5 width=244) (actual time=0.586..0.588 rows=5 loops=1) │ │ Buffers: shared hit=1368 │ │ -> Index Scan using tenk1_four_ten_idx on tenk1 (cost=0.29..900.25 rows=2500 width=244) (actual time=0.586..0.587 rows=5 loops=1) │ │ Index Searches: 1 │ │ Filter: ((four = 1) OR (four = '2147483648'::bigint)) │ │ Rows Removed by Filter: 2500 │ │ Buffers: shared hit=1368 │ │ Planning Time: 0.050 ms │ │ Execution Time: 0.595 ms │ └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ (9 rows) Do you think this problem can be fixed easily? This behavior seems surprising, and is best avoided. Performance cliffs that happen when we tweak one detail of a query just seem worth avoiding on general principle. Now that you're explicitly creating RestrictInfos for a particular index, I suppose that it might be easier to do this kind of thing -- you have more context. Perhaps the patch can be made to recognize a mix of constants like this as all being associated with the same B-Tree operator family (the opfamily that the input opclass belongs to)? Perhaps the constants could all be normalized to the same type via casts/coercions into the underlying B-Tree input opclass -- that extra step should be correct ("64.1.2. Behavior of B-Tree Operator Classes" describes certain existing guarantees that this step would need to rely on). Note that the patch already works in cross-type scenarios, with cross-type operators. The issue I've highlighted is caused by the use of a mixture of types among the constants themselves -- the patch wants an array with elements that are all of the same type, which it can't quite manage. And so I can come up with a cross-type variant query that *can* still use a SAOP as expected with v40, despite involving a cross-type = btree operator: pg@regression:5432 [2181876]=# explain (analyze,buffers) select * from tenk1 where four = 2_147_483_648 or four = 2_147_483_649 order by four, ten limit 5; ┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Limit (cost=0.29..6.53 rows=1 width=244) (actual time=0.004..0.005 rows=0 loops=1) │ │ Buffers: shared hit=2 │ │ -> Index Scan using tenk1_four_ten_idx on tenk1 (cost=0.29..6.53 rows=1 width=244) (actual time=0.004..0.004 rows=0 loops=1) │ │ Index Cond: (four = ANY ('{2147483648,2147483649}'::bigint[])) │ │ Index Searches: 1 │ │ Buffers: shared hit=2 │ │ Planning Time: 0.044 ms │ │ Execution Time: 0.011 ms │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ (8 rows) The fact that this third and final example works as expected makes me even more convinced that the second example should behave similarly. -- Peter Geoghegan
pgsql-hackers by date: