Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers
From | Alena Rybakina |
---|---|
Subject | Re: POC, WIP: OR-clause support for indexes |
Date | |
Msg-id | 8d714510-af73-a908-99c8-fc14536f2669@yandex.ru Whole thread Raw |
In response to | Re: POC, WIP: OR-clause support for indexes (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: POC, WIP: OR-clause support for indexes
|
List | pgsql-hackers |
Hi! >> I think it really helps to speed up the search with similar deep >> filtering compared to cluster indexes, but do we have cases where we >> don't use this algorithm because it takes longer than an usual index? >> I thought about the situation with wide indexes (with a lot of multiple >> columns) and having a lot of filtering predicates for them. > I think that it should be possible for the optimizer to only use > multi-column SAOP index paths when there is at least likely to be some > small advantage -- that's definitely my goal. Importantly, we may not > really need to accurately model the costs where the new approach turns > out to be much faster. The only essential thing is that we avoid cases > where the new approach is much slower than the old approach. Which is > possible (in at least some cases) by making the runtime behavior > adaptive. > > The best decision that the planner can make may be no decision at all. > Better to wait until runtime where at all possible, since that gives > us the latest and most accurate picture of things. > >> But I'm not sure about this, so it seems to me that this is a problem of >> improper use of indexes rather. > It's hard to say how true that is. > > Certainly, workloads similar to the TPC-DS benchmark kinda need > something like MDAM. It's just not practical to have enough indexes to > support every possible query -- the benchmark is deliberately designed > to have unpredictable, hard-to-optimize access paths. It seems to > require having fewer, more general indexes that can support > multi-dimensional access reasonably efficiently. > > Of course, with OLTP it's much more likely that the workload will have > predictable access patterns. That makes having exactly the right > indexes much more practical. So maybe you're right there. But, I still > see a lot of value in a design that is as forgiving as possible. Users > really like that kind of thing in my experience. I tend to agree with you, but a runtime estimate cannot give us an accurate picture when using indexes correctly or any other optimizations due to the unstable state of the environment in which the query is executed. I believe that a more complex analysis is needed here. >>> Currently, the optimizer doesn't recognize multi-column indexes with >>> SAOPs on every column as having a valid sort order, except on the >>> first column. It seems possible that that has consequences for your >>> patch. (I'm really only guessing, though; don't trust anything that I >>> say about the optimizer too much.) >>> >> Honestly, I couldn't understand your concerns very well, could you >> describe it in more detail? > Well, I'm not sure if there is any possible scenario where the > transformation from your patch makes it possible to go from an access > path that has a valid sort order (usually because there is an > underlying index scan) into an access path that doesn't. In fact, the > opposite situation seems more likely (which is good news) -- > especially if you assume that my own patch is also present. > > Going from a bitmap OR (which cannot return sorted output) to a > multi-column SAOP index scan (which now can) may have significant > value in some specific circumstances. Most obviously, it's really > useful when it enables us to feed tuples into a GroupAggregate without > a separate sort step, and without a hash aggregate (that's why I see > value in combining your patch with my own patch). You just need to be > careful about allowing the opposite situation to take place. > > More generally, there is a need to think about strange second order > effects. We want to be open to useful second order effects that make > query execution much faster in some specific context, while avoiding > harmful second order effects. Intuitively, I think that it should be > possible to do this with the transformations performed by your patch. > > In other words, "helpful serendipity" is an important advantage, while > "harmful anti-serendipity" is what we really want to avoid. Ideally by > making the harmful cases impossible "by construction". > I noticed only one thing there: when we have unsorted array values in SOAP, the query takes longer than when it has a sorted array. I'll double-check it just in case and write about the results later. I am also testing some experience with multi-column indexes using SAOPs. -- Regards, Alena Rybakina Postgres Professional
pgsql-hackers by date: