Home > mailing lists

Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

From	Alena Rybakina
Subject	Re: POC, WIP: OR-clause support for indexes
Date	July 31, 2023 16:38:23
Msg-id	8d714510-af73-a908-99c8-fc14536f2669@yandex.ru Whole thread Raw
In response to	Re: POC, WIP: OR-clause support for indexes (Peter Geoghegan <pg@bowt.ie>)
Responses	Re: POC, WIP: OR-clause support for indexes
List	pgsql-hackers

Tree view

Hi!

>> I think it really helps to speed up the search with similar deep
>> filtering compared to cluster indexes, but do we have cases where we
>> don't use this algorithm because it takes longer than an usual index?
>> I thought about the situation with wide indexes (with a lot of multiple
>> columns) and having a lot of filtering predicates for them.
> I think that it should be possible for the optimizer to only use
> multi-column SAOP index paths when there is at least likely to be some
> small advantage -- that's definitely my goal. Importantly, we may not
> really need to accurately model the costs where the new approach turns
> out to be much faster. The only essential thing is that we avoid cases
> where the new approach is much slower than the old approach. Which is
> possible (in at least some cases) by making the runtime behavior
> adaptive.
>
> The best decision that the planner can make may be no decision at all.
> Better to wait until runtime where at all possible, since that gives
> us the latest and most accurate picture of things.
>
>> But I'm not sure about this, so it seems to me that this is a problem of
>> improper use of indexes rather.
> It's hard to say how true that is.
>
> Certainly, workloads similar to the TPC-DS benchmark kinda need
> something like MDAM. It's just not practical to have enough indexes to
> support every possible query -- the benchmark is deliberately designed
> to have unpredictable, hard-to-optimize access paths. It seems to
> require having fewer, more general indexes that can support
> multi-dimensional access reasonably efficiently.
>
> Of course, with OLTP it's much more likely that the workload will have
> predictable access patterns. That makes having exactly the right
> indexes much more practical. So maybe you're right there. But, I still
> see a lot of value in a design that is as forgiving as possible. Users
> really like that kind of thing in my experience.
I tend to agree with you, but a runtime estimate cannot give us an 
accurate picture when using indexes correctly or
any other optimizations due to the unstable state of the environment in 
which the query is executed.
I believe that a more complex analysis is needed here.
>>> Currently, the optimizer doesn't recognize multi-column indexes with
>>> SAOPs on every column as having a valid sort order, except on the
>>> first column. It seems possible that that has consequences for your
>>> patch. (I'm really only guessing, though; don't trust anything that I
>>> say about the optimizer too much.)
>>>
>> Honestly, I couldn't understand your concerns very well, could you
>> describe it in more detail?
> Well, I'm not sure if there is any possible scenario where the
> transformation from your patch makes it possible to go from an access
> path that has a valid sort order (usually because there is an
> underlying index scan) into an access path that doesn't. In fact, the
> opposite situation seems more likely (which is good news) --
> especially if you assume that my own patch is also present.
>
> Going from a bitmap OR (which cannot return sorted output) to a
> multi-column SAOP index scan (which now can) may have significant
> value in some specific circumstances. Most obviously, it's really
> useful when it enables us to feed tuples into a GroupAggregate without
> a separate sort step, and without a hash aggregate (that's why I see
> value in combining your patch with my own patch). You just need to be
> careful about allowing the opposite situation to take place.
>
> More generally, there is a need to think about strange second order
> effects. We want to be open to useful second order effects that make
> query execution much faster in some specific context, while avoiding
> harmful second order effects. Intuitively, I think that it should be
> possible to do this with the transformations performed by your patch.
>
> In other words, "helpful serendipity" is an important advantage, while
> "harmful anti-serendipity" is what we really want to avoid. Ideally by
> making the harmful cases impossible "by construction".
>
I noticed only one thing there: when we have unsorted array values in 
SOAP, the query takes longer than
when it has a sorted array. I'll double-check it just in case and write 
about the results later.

I am also testing some experience with multi-column indexes using SAOPs.

-- 
Regards,
Alena Rybakina
Postgres Professional

pgsql-hackers by date:

From: Peter Geoghegan
Date: 31 July 2023, 16:34:47
Subject: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

From: Robert Haas
Date: 31 July 2023, 16:53:37
Subject: Re: pgsql: Fix search_path to a safe value during maintenance operations.

Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

Previous

Next