Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: POC, WIP: OR-clause support for indexes
Date
Msg-id CAH2-WzkzzDK7FPEv7M5GL_jo0E81DDG41t7wHU9SOAwnqwX=eQ@mail.gmail.com
Whole thread Raw
In response to Re: POC, WIP: OR-clause support for indexes  (Alena Rybakina <lena.ribackina@yandex.ru>)
Responses Re: POC, WIP: OR-clause support for indexes
List pgsql-hackers
On Tue, Jun 27, 2023 at 6:19 AM Alena Rybakina <lena.ribackina@yandex.ru> wrote:
> I learned something new from your letter, thank you very much for that!

Cool. The MDAM paper is also worth a read:

https://vldb.org/conf/1995/P710.PDF

Some of the techniques it describes are already in Postgres. With
varying degrees of maturity.

The paper actually mentions OR optimization at one point, under
"Duplicate Elimination". The general idea is that ScalarArrayOpExpr
execution can "eliminate duplicates before the data is read". The
important underlying principle is that it can be really useful to give
the B-Tree code the context it requires to be clever about stuff like
that. We can do this by (say) using one ScalarArrayOpExpr, rather than
using two or more index scans that the B-Tree code will treat as
independent things. So a lot of the value in your patch comes from the
way that it can enable other optimizations (the immediate benefits are
 also nice).

In the past, OR optimizations have been prototyped that were later
withdrawn/rejected because the duplicate elimination aspect was...too
scary [1]. It's very easy to see that ScalarArrayOpExpr index scans
don't really have the same problem. "Giving the B-Tree code the
required context" helps here too.

> I analyzed the buffer consumption when I ran control regression tests using my patch. diff shows me that there is no
differencebetween the number of buffer block scans without and using my patch, as far as I have seen.
(regression.diffs)

To be clear, I wasn't expecting that there'd be any regressions from
your patch. Intuitively, it seems like this optimization should make
the query plan do almost the same thing at execution time -- just
slightly more efficiently on average, and much more efficiently in
some individual cases.

It would probably be very hard for the optimizer to model/predict how
much work it can save by using a ScalarArrayOpExpr instead of an
"equivalent" set of bitmap index scans, OR'd together. But it doesn't
necessarily matter -- the only truly critical detail is understanding
the worst case for the transformation optimization. It cannot be too
bad (maybe it's ~zero added runtime overhead relative to not doing the
transformation, even?). At the same time, nbtree can be clever about
ScalarArrayOpExpr execution at runtime (once that's implemented),
without ever needing to make any kind of up-front commitment to
navigating through the index in any particular way. It's all dynamic,
and can be driven by the actual observed characteristics of the index
structure.

In other words, we don't really need to gamble (in the planner, or at
execution time). We're just keeping our options open in more cases.
(My thinking on these topics was influenced by Goetz Graefe -- "choice
is confusion" [2]).

[1] https://www.postgresql.org/message-id/flat/1397.1486598083%40sss.pgh.pa.us#310f974a8dc84478d6d3c70f336807bb
[2] https://sigmodrecord.org/publications/sigmodRecord/2009/pdfs/05_Profiles_Graefe.pdf
--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Ranier Vilela
Date:
Subject: Re: POC, WIP: OR-clause support for indexes
Next
From: "Joel Jacobson"
Date:
Subject: Re: Do we want a hashset type?