Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: POC, WIP: OR-clause support for indexes
Date
Msg-id CAH2-WznLiw20U_Tp8Otm=g7hGcfWnodM4EOGzV=F3W-EvfQDTw@mail.gmail.com
Whole thread Raw
In response to Re: POC, WIP: OR-clause support for indexes  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: POC, WIP: OR-clause support for indexes
Re: POC, WIP: OR-clause support for indexes
Re: POC, WIP: OR-clause support for indexes
List pgsql-hackers
On Tue, Oct 1, 2024 at 6:25 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> I think this patchset got much better, and it could possible be
> committed after another round of cleanup and comment/docs improvement.
> It would be very kind if you share your view on the decisions made in
> this patchset.

I do think that this patch got a lot better, and simpler, but I'm a
little worried about it not covering cases that are only very slightly
different to the ones that you're targeting. It's easiest to see what
I mean using an example.

After the standard regression tests have run, the following tests can
be run from psql (this uses the recent v40 revision):

pg@regression:5432 =# create index on tenk1(four, ten); -- setup
CREATE INDEX

Very fast INT_MAX query, since we successful use the transformation
added by the patch:

pg@regression:5432 =# explain (analyze,buffers) select * from tenk1
where four = 1 or four = 2_147_483_647 order by four, ten limit 5;

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                              QUERY
PLAN                                                               │

├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=0.29..1.73 rows=5 width=244) (actual time=0.011..0.014
rows=5 loops=1)                                                   │
│   Buffers: shared hit=4
                                                                 │
│   ->  Index Scan using tenk1_four_ten_idx on tenk1
(cost=0.29..721.25 rows=2500 width=244) (actual time=0.011..0.012
rows=5 loops=1) │
│         Index Cond: (four = ANY ('{1,2147483647}'::integer[]))
                                                                 │
│         Index Searches: 1
                                                                 │
│         Buffers: shared hit=4
                                                                 │
│ Planning Time: 0.067 ms
                                                                 │
│ Execution Time: 0.022 ms
                                                                 │

└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)

Much slower query, which is not capable of applying the transformation
due only to
the fact that I've "inadvertently" mixed together multiple types (int4
and int8):

pg@regression:5432 =# explain (analyze,buffers) select * from tenk1
where four = 1 or four = 2_147_483_648 order by four, ten limit 5;

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                              QUERY
PLAN                                                               │

├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=0.29..2.08 rows=5 width=244) (actual time=0.586..0.588
rows=5 loops=1)                                                   │
│   Buffers: shared hit=1368
                                                                 │
│   ->  Index Scan using tenk1_four_ten_idx on tenk1
(cost=0.29..900.25 rows=2500 width=244) (actual time=0.586..0.587
rows=5 loops=1) │
│         Index Searches: 1
                                                                 │
│         Filter: ((four = 1) OR (four = '2147483648'::bigint))
                                                                 │
│         Rows Removed by Filter: 2500
                                                                 │
│         Buffers: shared hit=1368
                                                                 │
│ Planning Time: 0.050 ms
                                                                 │
│ Execution Time: 0.595 ms
                                                                 │

└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)

Do you think this problem can be fixed easily? This behavior seems
surprising, and is best avoided. Performance cliffs that happen when
we tweak one detail of a query just seem worth avoiding on general
principle.

Now that you're explicitly creating RestrictInfos for a particular
index, I suppose that it might be easier to do this kind of thing --
you have more context. Perhaps the patch can be made to recognize
a mix of constants like this as all being associated with the same
B-Tree operator family (the opfamily that the input opclass belongs
to)? Perhaps the constants could all be normalized to the same type via
casts/coercions into the underlying B-Tree input opclass -- that
extra step should be correct ("64.1.2. Behavior of B-Tree Operator Classes"
describes certain existing guarantees that this step would need to rely
on).

Note that the patch already works in cross-type scenarios, with
cross-type operators. The issue I've highlighted is caused by the use
of a mixture of types among the constants themselves -- the patch
wants an array with elements that are all of the same type, which it
can't quite manage. And so I can come up with a cross-type variant
query that *can* still use a SAOP as expected with v40, despite
involving a cross-type = btree operator:

pg@regression:5432 [2181876]=# explain (analyze,buffers) select * from
tenk1 where four = 2_147_483_648 or four = 2_147_483_649 order by
four, ten limit 5;

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                            QUERY
PLAN                                                            │

├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=0.29..6.53 rows=1 width=244) (actual time=0.004..0.005
rows=0 loops=1)                                              │
│   Buffers: shared hit=2
                                                            │
│   ->  Index Scan using tenk1_four_ten_idx on tenk1  (cost=0.29..6.53
rows=1 width=244) (actual time=0.004..0.004 rows=0 loops=1) │
│         Index Cond: (four = ANY
('{2147483648,2147483649}'::bigint[]))
                          │
│         Index Searches: 1
                                                            │
│         Buffers: shared hit=2
                                                            │
│ Planning Time: 0.044 ms
                                                            │
│ Execution Time: 0.011 ms
                                                            │

└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)

The fact that this third and final example works as expected makes me
even more convinced that the second example should behave similarly.

--
Peter Geoghegan

pgsql-hackers by date:

Previous
From: Alena Rybakina
Date:
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Next
From: Laurenz Albe
Date:
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization