Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: POC, WIP: OR-clause support for indexes
Date
Msg-id CAH2-WznHNTXQ=bX4Kc9hHebyXvR0a0CxH9OnirkP0ahgQAX7Aw@mail.gmail.com
Whole thread Raw
In response to Re: POC, WIP: OR-clause support for indexes  (Alena Rybakina <lena.ribackina@yandex.ru>)
Responses Re: POC, WIP: OR-clause support for indexes
List pgsql-hackers
On Wed, Jul 26, 2023 at 6:30 PM Alena Rybakina <lena.ribackina@yandex.ru> wrote:
> > I didn't intend to imply that you might have the same problem here. I
> > just meant that OR optimizations can have problems with duplicate
> > elimination, in general. I suspect that your patch doesn't have that
> > problem, because you are performing a transformation of one kind of OR
> > into another kind of OR.
>
> Yes, you are right, but I studied this topic and two other sources to
> accumulate my knowledge. It was an exciting experience for me)

Cool! Yeah, a lot of the value with these sorts of things comes from
the way that they can interact with each other. This is hard to
describe exactly, but still important.

> I was especially inspired after studying the interview with Goetz Graf
> [2], his life experience is the most inspiring, and from this article I
> was able to get a broad understanding of the field of databases:
> current problems, future development, how it works... Thank you for the
> recommendation.

I also think that his perspective is very interesting.

> I think it really helps to speed up the search with similar deep
> filtering compared to cluster indexes, but do we have cases where we
> don't use this algorithm because it takes longer than an usual index?
> I thought about the situation with wide indexes (with a lot of multiple
> columns) and having a lot of filtering predicates for them.

I think that it should be possible for the optimizer to only use
multi-column SAOP index paths when there is at least likely to be some
small advantage -- that's definitely my goal. Importantly, we may not
really need to accurately model the costs where the new approach turns
out to be much faster. The only essential thing is that we avoid cases
where the new approach is much slower than the old approach. Which is
possible (in at least some cases) by making the runtime behavior
adaptive.

The best decision that the planner can make may be no decision at all.
Better to wait until runtime where at all possible, since that gives
us the latest and most accurate picture of things.

> But I'm not sure about this, so it seems to me that this is a problem of
> improper use of indexes rather.

It's hard to say how true that is.

Certainly, workloads similar to the TPC-DS benchmark kinda need
something like MDAM. It's just not practical to have enough indexes to
support every possible query -- the benchmark is deliberately designed
to have unpredictable, hard-to-optimize access paths. It seems to
require having fewer, more general indexes that can support
multi-dimensional access reasonably efficiently.

Of course, with OLTP it's much more likely that the workload will have
predictable access patterns. That makes having exactly the right
indexes much more practical. So maybe you're right there. But, I still
see a lot of value in a design that is as forgiving as possible. Users
really like that kind of thing in my experience.

> > Currently, the optimizer doesn't recognize multi-column indexes with
> > SAOPs on every column as having a valid sort order, except on the
> > first column. It seems possible that that has consequences for your
> > patch. (I'm really only guessing, though; don't trust anything that I
> > say about the optimizer too much.)
> >
> Honestly, I couldn't understand your concerns very well, could you
> describe it in more detail?

Well, I'm not sure if there is any possible scenario where the
transformation from your patch makes it possible to go from an access
path that has a valid sort order (usually because there is an
underlying index scan) into an access path that doesn't. In fact, the
opposite situation seems more likely (which is good news) --
especially if you assume that my own patch is also present.

Going from a bitmap OR (which cannot return sorted output) to a
multi-column SAOP index scan (which now can) may have significant
value in some specific circumstances. Most obviously, it's really
useful when it enables us to feed tuples into a GroupAggregate without
a separate sort step, and without a hash aggregate (that's why I see
value in combining your patch with my own patch). You just need to be
careful about allowing the opposite situation to take place.

More generally, there is a need to think about strange second order
effects. We want to be open to useful second order effects that make
query execution much faster in some specific context, while avoiding
harmful second order effects. Intuitively, I think that it should be
possible to do this with the transformations performed by your patch.

In other words, "helpful serendipity" is an important advantage, while
"harmful anti-serendipity" is what we really want to avoid. Ideally by
making the harmful cases impossible "by construction".

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Faster "SET search_path"
Next
From: Soumyadeep Chakraborty
Date:
Subject: Re: brininsert optimization opportunity