Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

From Alena Rybakina
Subject Re: POC, WIP: OR-clause support for indexes
Date
Msg-id 7d5aed92-d4cc-4b76-8ae0-051d182c9eec@postgrespro.ru
Whole thread Raw
In response to Re: POC, WIP: OR-clause support for indexes  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: POC, WIP: OR-clause support for indexes
List pgsql-hackers

Hi, thank you for your work with this subject!

On 14.06.2024 15:00, Alexander Korotkov wrote:
On Mon, Apr 8, 2024 at 1:34 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I've revised the patch.  Did some beautification, improvements for
documentation, commit messages etc.

I've pushed the 0001 patch without 0002.  I think 0001 is good by
itself given that there is the or_to_any_transform_limit GUC option.
The more similar OR clauses are here the more likely grouping them
into SOAP will be a win.  But I've changed the default value to 5.
This will make it less invasive and affect only queries with obvious
repeating patterns.  That also reduced the changes in the regression
tests expected outputs.

Regarding 0002, it seems questionable since it could cause a planning
slowdown for SAOP's with large arrays.  Also, it might reduce the win
of transformation made by 0001.  So, I think we should skip it for
now.
The patch has been reverted from pg17.  Let me propose a new version
for pg18 based on the valuable feedback from Tom Lane [1][2].
 * The transformation is moved to the stage of adding restrictinfos to
the base relation (in particular add_base_clause_to_rel()).  This
leads to interesting consequences.  While this allows IndexScans to
use transformed clauses, BitmapScans and SeqScans seem unaffected.
Therefore, I wasn't able to find a planning regression. * As soon as there is no planning regression anymore, I've removed
or_to_any_transform_limit GUC, which was a source of critics. * Now, not only Consts allowed in the SAOP's list, but also Params. * The criticized hash based on expression jumbling has been removed.
Now, the plain list is used instead. * OrClauseGroup now gets a legal node tag.  That allows to mix it in
the list with other nodes without hacks.

I think this patch shouldn't be as good as before for optimizing
performance of large OR lists, given that BitmapScans and SeqScans
still deal with ORs.  However, it allows IndexScans to handle more,
doesn't seem to cause planning regression and therefore introduce no
extra GUC.  Overall, this seems like a good compromise.

This patch could use some polishing, but I'd like to first hear some
feedback on general design.

Links
1. https://www.postgresql.org/message-id/3604469.1712628736%40sss.pgh.pa.us
2. https://www.postgresql.org/message-id/3649287.1712642139%40sss.pgh.pa.us

I noticed that 7 libraries have been added to src/backend/optimizer/plan/initsplan.c, and as far as I remember, Tom Lane has already expressed doubts about the approach that requires adding a large number of libraries [0], but I'm afraid I'm out of ideas about alternative approach. In addition, I checked the fix in the previous cases that you wrote earlier [1] and noticed that SeqScan continues to generate, unfortunately, without converting expressions:

with patch:

create table test as (select (random()*10)::int x, (random()*1000) y from generate_series(1,1000000) i); create index test_x_1_y on test (y) where x = 1; create index test_x_2_y on test (y) where x = 2; vacuum analyze test; SELECT 1000000 CREATE INDEX CREATE INDEX VACUUM alena@postgres=# explain select * from test where (x = 1 or x = 2) and y = 100; QUERY PLAN -------------------------------------------------------------------------- Gather (cost=1000.00..12690.10 rows=1 width=12) Workers Planned: 2 -> Parallel Seq Scan on test (cost=0.00..11690.00 rows=1 width=12) Filter: (((x = 1) OR (x = 2)) AND (y = '100'::double precision)) (4 rows) alena@postgres=# set enable_seqscan =off; SET alena@postgres=# explain select * from test where (x = 1 or x = 2) and y = 100; QUERY PLAN ------------------------------------------------------------------------- Seq Scan on test (cost=10000000000.00..10000020440.00 rows=1 width=12) Filter: (((x = 1) OR (x = 2)) AND (y = '100'::double precision)) (2 rows)

without patch:

-------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on test (cost=8.60..12.62 rows=1 width=12) Recheck Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y = '100'::double precision) AND (x = 2))) -> BitmapOr (cost=8.60..8.60 rows=1 width=0) -> Bitmap Index Scan on test_x_1_y (cost=0.00..4.30 rows=1 width=0) Index Cond: (y = '100'::double precision) -> Bitmap Index Scan on test_x_2_y (cost=0.00..4.30 rows=1 width=0) Index Cond: (y = '100'::double precision) (7 rows)

[0] https://www.postgresql.org/message-id/3604469.1712628736%40sss.pgh.pa.us

[1] https://www.postgresql.org/message-id/CAPpHfduJtO0s9E%3DSHUTzrCD88BH0eik0UNog1_q3XBF2wLmH6g%40mail.gmail.com

-- 
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

pgsql-hackers by date:

Previous
From: Alena Rybakina
Date:
Subject: Re: Vacuum statistics
Next
From: Bertrand Drouvot
Date:
Subject: Re: Avoid orphaned objects dependencies, take 3