Home > mailing lists

Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: POC, WIP: OR-clause support for indexes
Date	November 28, 2023 01:15:48
Msg-id	CAH2-WznKiUwPtzLOVa7SQVSfurttZ+ULSRqhyseG_XbkYeyiag@mail.gmail.com Whole thread Raw
In response to	Re: POC, WIP: OR-clause support for indexes (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: POC, WIP: OR-clause support for indexes Re: POC, WIP: OR-clause support for indexes
List	pgsql-hackers

Tree view

On Mon, Nov 27, 2023 at 1:04 PM Robert Haas <robertmhaas@gmail.com> wrote:
> The use of op_mergejoinable() seems pretty random to me. Why should we
> care about that? If somebody writes a<1 or a<2 or a<3 or a<4, you can
> transform that to a<any(array[1,2,3,4]) if you want. It might not be a
> good idea, but I think it's a legal transformation.

That kind of transformation is likely to be a very good idea, because
nbtree's _bt_preprocess_array_keys() function knows how to perform
preprocessing that makes the final index qual "a < 1". Obviously that
could be far more efficient.

Further suppose you have a machine generated query  "a<1 or a<2 or a<3
or a<4 AND a = 2" -- same as before, except that I added "AND a = 2"
to the end. Now _bt_preprocess_array_keys() will be able to do the
aforementioned inequality preprocessing, just as before. But this time
_bt_preprocess_keys() (a different function with a similar name) can
see that the quals are contradictory. That makes the entire index scan
end, before it ever really began.

Obviously, this is an example of a more general principle: a great
deal of the benefit of these transformations is indirect, in the sense
that they come from enabling further transformations/optimizations,
that apply in the context of some particular query. So you have to
think holistically.

It's perhaps a little unfortunate that all of this nbtree
preprocessing stuff is totally opaque to the planner. Tom has
expressed concerns about that in the past, FWIW:

https://www.postgresql.org/message-id/2587523.1647982549@sss.pgh.pa.us
(see the final paragraph for the reference)

There might be some bigger picture to doing all of these
transformations, in a way that maximizes opportunities to apply
further transformations/optimizations. You know much more about the
optimizer than I do, so maybe this is already very obvious. Just
pointing it out.

> Honestly, it seems very hard to avoid the conclusion that this
> transformation is being done at too early a stage. Parse analysis is
> not the time to try to do query optimization. I can't really believe
> that there's a way to produce a committable patch along these lines.
> Ideally, a transformation like this should be done after we know what
> plan shape we're using (or considering using), so that we can make
> cost-based decisions about whether to transform or not. But at the
> very least it should happen somewhere in the planner. There's really
> no justification for parse analysis rewriting the SQL that the user
> entered.

I am sure that there is a great deal of truth to this. The general
conclusion about parse analysis being the wrong place for this seems
very hard to argue with. But I'm much less sure that there needs to be
a conventional cost model.

The planner's cost model is supposed to have some basis in physical
runtime costs, which is not the case for any of these transformations.
Not in any general sense; they're just transformations that enable
finding a cheaper way to execute the query. While they have to pay for
themselves, in some sense, I think that that's purely a matter of
managing the added planner cycles. In principle they shouldn't have
any direct impact on the physical costs incurred by physical
operators. No?

As I keep pointing out, there is a sound theoretical basis to the idea
of normalizing to conjunctive normal form as its own standard step in
query processing. To some extent we do this already, but it's all
rather ad-hoc. Even if (say) the nbtree preprocessing transformations
that I described were something that the planner already knew about
directly, they still wouldn't really need to be costed. They're pretty
much strictly better at runtime (at most you only have to worry about
the fixed cost of determining if they apply at all).

--
Peter Geoghegan

pgsql-hackers by date:

From: Peter Smith
Date: 28 November 2023, 01:06:07
Subject: Re: logical decoding and replication of sequences, take 2

From: Nathan Bossart
Date: 28 November 2023, 01:16:25
Subject: Re: common signal handler protection

Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

Previous

Next