Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

From Alena Rybakina
Subject Re: POC, WIP: OR-clause support for indexes
Date
Msg-id 33cfe17d-5462-4c82-9b5e-2865cf870bcf@postgrespro.ru
Whole thread Raw
In response to Re: POC, WIP: OR-clause support for indexes  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers

Hi!

On 03.10.2024 23:15, Peter Geoghegan wrote:
I do think that this patch got a lot better, and simpler, but I'm a
little worried about it not covering cases that are only very slightly
different to the ones that you're targeting. It's easiest to see what
I mean using an example.

After the standard regression tests have run, the following tests can
be run from psql (this uses the recent v40 revision):

pg@regression:5432 =# create index on tenk1(four, ten); -- setup
CREATE INDEX

Very fast INT_MAX query, since we successful use the transformation
added by the patch:

pg@regression:5432 =# explain (analyze,buffers) select * from tenk1
where four = 1 or four = 2_147_483_647 order by four, ten limit 5;
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                              QUERY
PLAN                                                               │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=0.29..1.73 rows=5 width=244) (actual time=0.011..0.014
rows=5 loops=1)                                                   │
│   Buffers: shared hit=4                                                                 │
│   ->  Index Scan using tenk1_four_ten_idx on tenk1
(cost=0.29..721.25 rows=2500 width=244) (actual time=0.011..0.012
rows=5 loops=1) │
│         Index Cond: (four = ANY ('{1,2147483647}'::integer[]))                                                                 │
│         Index Searches: 1                                                                 │
│         Buffers: shared hit=4                                                                 │
│ Planning Time: 0.067 ms                                                                 │
│ Execution Time: 0.022 ms                                                                 │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)

Much slower query, which is not capable of applying the transformation
due only to
the fact that I've "inadvertently" mixed together multiple types (int4
and int8):

pg@regression:5432 =# explain (analyze,buffers) select * from tenk1
where four = 1 or four = 2_147_483_648 order by four, ten limit 5;
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                              QUERY
PLAN                                                               │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=0.29..2.08 rows=5 width=244) (actual time=0.586..0.588
rows=5 loops=1)                                                   │
│   Buffers: shared hit=1368                                                                 │
│   ->  Index Scan using tenk1_four_ten_idx on tenk1
(cost=0.29..900.25 rows=2500 width=244) (actual time=0.586..0.587
rows=5 loops=1) │
│         Index Searches: 1                                                                 │
│         Filter: ((four = 1) OR (four = '2147483648'::bigint))                                                                 │
│         Rows Removed by Filter: 2500                                                                 │
│         Buffers: shared hit=1368                                                                 │
│ Planning Time: 0.050 ms                                                                 │
│ Execution Time: 0.595 ms                                                                 │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(9 rows)

Do you think this problem can be fixed easily? This behavior seems
surprising, and is best avoided. Performance cliffs that happen when
we tweak one detail of a query just seem worth avoiding on general
principle.

Now that you're explicitly creating RestrictInfos for a particular
index, I suppose that it might be easier to do this kind of thing --
you have more context. Perhaps the patch can be made to recognize
a mix of constants like this as all being associated with the same
B-Tree operator family (the opfamily that the input opclass belongs
to)? Perhaps the constants could all be normalized to the same type via
casts/coercions into the underlying B-Tree input opclass -- that
extra step should be correct ("64.1.2. Behavior of B-Tree Operator Classes"
describes certain existing guarantees that this step would need to rely
on).

Note that the patch already works in cross-type scenarios, with
cross-type operators. The issue I've highlighted is caused by the use
of a mixture of types among the constants themselves -- the patch
wants an array with elements that are all of the same type, which it
can't quite manage. And so I can come up with a cross-type variant
query that *can* still use a SAOP as expected with v40, despite
involving a cross-type = btree operator:

pg@regression:5432 [2181876]=# explain (analyze,buffers) select * from
tenk1 where four = 2_147_483_648 or four = 2_147_483_649 order by
four, ten limit 5;
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                            QUERY
PLAN                                                            │
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=0.29..6.53 rows=1 width=244) (actual time=0.004..0.005
rows=0 loops=1)                                              │
│   Buffers: shared hit=2                                                            │
│   ->  Index Scan using tenk1_four_ten_idx on tenk1  (cost=0.29..6.53
rows=1 width=244) (actual time=0.004..0.004 rows=0 loops=1) │
│         Index Cond: (four = ANY
('{2147483648,2147483649}'::bigint[]))                          │
│         Index Searches: 1                                                            │
│         Buffers: shared hit=2                                                            │
│ Planning Time: 0.044 ms                                                            │
│ Execution Time: 0.011 ms                                                            │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)

The fact that this third and final example works as expected makes me
even more convinced that the second example should behave similarly.

Yes, I agree with you that it should be added in the feature but in the future thread.

The patch does not solve all the problems we planned for, as the previous patch did (discussed here [0]), but it also does not cause the performance problems that
were associated with building a suboptimal plan.

Furthermore I think this issue, like the one noted here [0], can be fixed in a way I proposed before [1], but I assume it is better resolved in the next thread related to the patch.


[0] https://www.postgresql.org/message-id/CAPpHfdvF864n%3DLzmjd2XBi9TwboZvrhRtLSt2hCP%2BJVUv6XKzg%40mail.gmail.com

[1] https://www.postgresql.org/message-id/985f2924-9769-4927-ad6e-d430c394054d%40postgrespro.ru

-- 
Regards,
Alena Rybakina
Postgres Professional

pgsql-hackers by date:

Previous
From: Alena Rybakina
Date:
Subject: Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Next
From: shveta malik
Date:
Subject: Re: Logical Replication of sequences