Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: POC, WIP: OR-clause support for indexes |
Date | |
Msg-id | CAH2-WznHNTXQ=bX4Kc9hHebyXvR0a0CxH9OnirkP0ahgQAX7Aw@mail.gmail.com Whole thread Raw |
In response to | Re: POC, WIP: OR-clause support for indexes (Alena Rybakina <lena.ribackina@yandex.ru>) |
Responses |
Re: POC, WIP: OR-clause support for indexes
|
List | pgsql-hackers |
On Wed, Jul 26, 2023 at 6:30 PM Alena Rybakina <lena.ribackina@yandex.ru> wrote: > > I didn't intend to imply that you might have the same problem here. I > > just meant that OR optimizations can have problems with duplicate > > elimination, in general. I suspect that your patch doesn't have that > > problem, because you are performing a transformation of one kind of OR > > into another kind of OR. > > Yes, you are right, but I studied this topic and two other sources to > accumulate my knowledge. It was an exciting experience for me) Cool! Yeah, a lot of the value with these sorts of things comes from the way that they can interact with each other. This is hard to describe exactly, but still important. > I was especially inspired after studying the interview with Goetz Graf > [2], his life experience is the most inspiring, and from this article I > was able to get a broad understanding of the field of databases: > current problems, future development, how it works... Thank you for the > recommendation. I also think that his perspective is very interesting. > I think it really helps to speed up the search with similar deep > filtering compared to cluster indexes, but do we have cases where we > don't use this algorithm because it takes longer than an usual index? > I thought about the situation with wide indexes (with a lot of multiple > columns) and having a lot of filtering predicates for them. I think that it should be possible for the optimizer to only use multi-column SAOP index paths when there is at least likely to be some small advantage -- that's definitely my goal. Importantly, we may not really need to accurately model the costs where the new approach turns out to be much faster. The only essential thing is that we avoid cases where the new approach is much slower than the old approach. Which is possible (in at least some cases) by making the runtime behavior adaptive. The best decision that the planner can make may be no decision at all. Better to wait until runtime where at all possible, since that gives us the latest and most accurate picture of things. > But I'm not sure about this, so it seems to me that this is a problem of > improper use of indexes rather. It's hard to say how true that is. Certainly, workloads similar to the TPC-DS benchmark kinda need something like MDAM. It's just not practical to have enough indexes to support every possible query -- the benchmark is deliberately designed to have unpredictable, hard-to-optimize access paths. It seems to require having fewer, more general indexes that can support multi-dimensional access reasonably efficiently. Of course, with OLTP it's much more likely that the workload will have predictable access patterns. That makes having exactly the right indexes much more practical. So maybe you're right there. But, I still see a lot of value in a design that is as forgiving as possible. Users really like that kind of thing in my experience. > > Currently, the optimizer doesn't recognize multi-column indexes with > > SAOPs on every column as having a valid sort order, except on the > > first column. It seems possible that that has consequences for your > > patch. (I'm really only guessing, though; don't trust anything that I > > say about the optimizer too much.) > > > Honestly, I couldn't understand your concerns very well, could you > describe it in more detail? Well, I'm not sure if there is any possible scenario where the transformation from your patch makes it possible to go from an access path that has a valid sort order (usually because there is an underlying index scan) into an access path that doesn't. In fact, the opposite situation seems more likely (which is good news) -- especially if you assume that my own patch is also present. Going from a bitmap OR (which cannot return sorted output) to a multi-column SAOP index scan (which now can) may have significant value in some specific circumstances. Most obviously, it's really useful when it enables us to feed tuples into a GroupAggregate without a separate sort step, and without a hash aggregate (that's why I see value in combining your patch with my own patch). You just need to be careful about allowing the opposite situation to take place. More generally, there is a need to think about strange second order effects. We want to be open to useful second order effects that make query execution much faster in some specific context, while avoiding harmful second order effects. Intuitively, I think that it should be possible to do this with the transformations performed by your patch. In other words, "helpful serendipity" is an important advantage, while "harmful anti-serendipity" is what we really want to avoid. Ideally by making the harmful cases impossible "by construction". -- Peter Geoghegan
pgsql-hackers by date: