Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: POC, WIP: OR-clause support for indexes
Date
Msg-id CAPpHfdtGmUHv=E4QWGWHk2QUuJ-x0mTNBssW_Ehv2dcOQ2L66g@mail.gmail.com
Whole thread Raw
In response to Re: POC, WIP: OR-clause support for indexes  (Andrei Lepikhov <a.lepikhov@postgrespro.ru>)
Responses Re: POC, WIP: OR-clause support for indexes
Re: POC, WIP: OR-clause support for indexes
List pgsql-hackers
Hi!

On Mon, Apr 1, 2024 at 9:38 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> On 28/3/2024 16:54, Alexander Korotkov wrote:
> > The current patch has a boolean guc enable_or_transformation.
> > However, when we have just a few ORs to be transformated, then we
> > should get less performance gain from the transformation and higher
> > chances to lose a good bitmap scan plan from that.  When there is a
> > huge list of ORs to be transformed, then the performance gain is
> > greater and it is less likely we could lose a good bitmap scan plan.
> >
> > What do you think about introducing a GUC threshold value: the minimum
> > size of list to do OR-to-ANY transformation?
> > min_list_or_transformation or something.
> I labelled it or_transformation_limit (see in attachment). Feel free to
> rename it.
> It's important to note that the limiting GUC doesn't operate
> symmetrically for forward, OR -> SAOP, and backward SAOP -> OR
> operations. In the forward case, it functions as you've proposed.
> However, in the backward case, we only check whether the feature is
> enabled or not. This is due to our existing limitation,
> MAX_SAOP_ARRAY_SIZE, and the fact that we can't match the length of the
> original OR list with the sizes of the resulting SAOPs. For instance, a
> lengthy OR list with 100 elements can be transformed into 3 SAOPs, each
> with a size of around 30 elements.
> One aspect that requires attention is the potential inefficiency of our
> OR -> ANY transformation when we have a number of elements less than
> MAX_SAOP_ARRAY_SIZE. This is because we perform a reverse transformation
> ANY -> OR at the stage of generating bitmap scans. If the BitmapScan
> path dominates, we may have done unnecessary work. Is this an occurrence
> that we should address?
> But the concern above may just be a point of improvement later: We can
> add one more strategy to the optimizer: testing each array element as an
> OR clause; we can also provide a BitmapOr path, where SAOP is covered
> with a minimal number of partial indexes (likewise, previous version).

I've revised the patch.  Did some beautification, improvements for
documentation, commit messages etc.

I've pushed the 0001 patch without 0002.  I think 0001 is good by
itself given that there is the or_to_any_transform_limit GUC option.
The more similar OR clauses are here the more likely grouping them
into SOAP will be a win.  But I've changed the default value to 5.
This will make it less invasive and affect only queries with obvious
repeating patterns.  That also reduced the changes in the regression
tests expected outputs.

Regarding 0002, it seems questionable since it could cause a planning
slowdown for SAOP's with large arrays.  Also, it might reduce the win
of transformation made by 0001.  So, I think we should skip it for
now.

------
Regards,
Alexander Korotkov



pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Cluster::restart dumping logs when stop fails
Next
From: Tom Lane
Date:
Subject: Re: Add bump memory context type and use it for tuplesorts