Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?
Date
Msg-id CA+TgmoZreVw+JomzAOz7h0XdPDJZLS7As648f9MjM3wMm8OCUw@mail.gmail.com
Whole thread Raw
In response to Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Condition pushdown: why (=) is pushed down into join, but BETWEEN or >= is not?  (Andy Fan <zhihui.fan1213@gmail.com>)
List pgsql-hackers
On Tue, Mar 1, 2022 at 5:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > This topic has been discussed a number of times, and Tom has basically
> > always said that he thinks this would be expensive to plan (which I
> > think is true) and that we wouldn't get much benefit (which I think is
> > false).
>
> I think the trick here, as in so many other places, is to not impose
> significant extra planning cost on queries that don't end up getting
> any benefit.

I agree. My question is: why shouldn't every case where we can deduce
an implied inequality be reasonably likely to show a benefit? If the
query specifies that a.x = b.x and also that a.x < 42, the only reason
to suppose that evaluating a.x < 42 rather than b.x < 42 or in
addition to b.x < 42 is likely to be better is if we assume the user
knows how the query optimizer works and has employed that knowledge in
crafting the query. And admittedly, sophisticated users are probably
likely to do that, and even unsophisticated users may do it more
likely than chance would dictate. But it still feels like we have a
good chance of landing of coming out ahead pretty often unless the
user really knows what they are doing. And even then, any mechanism we
add here can have an off switch.

> I'm not in favor of complicating the EquivalenceClass
> mechanism for this, because (a) I don't think that such an approach
> will lead to success on that metric, and (b) what it definitely will do
> is make ECs harder to understand and reason about.  If we develop a
> separate mechanism that can infer things from inequalities, and it only
> kicks in when there are some inequalities, that might work out okay.
> But because of that, I don't even like the 0001 patch in this series.
> I've not looked at the subsequent ones.

I don't think 0001 is right either, although maybe for somewhat
different reasons. First, I think it only considers VAR OP CONST style
clauses, but that is leaving money on the table, because given a.x =
b.x AND mumble(a.x), we can decide to instead test mumble(b.x) if the
equality operator in question has is-binary-identical semantics. It
does not seem necessary for a first patch to deal with both that and
the somewhat more pleasing case where we're making deductions based on
operator families ... but we shouldn't commit to a design for the VAR
OP CONST case without understanding how it could be generalized.
Second, it looks to me like the patch takes the rather naive strategy
of enforcing the derived clauses everywhere that they can legally be
put, which seems certain not to be optimal.

I don't know whether attaching something to the equivalence class data
structure is the right idea or not. Presumably, we don't want to make
an extra pass over the query tree to gather the information needed for
this kind of optimization, and it feels like we need to know which
vars are EMs before we try to derive alternate/additional quals. So I
guess we'd want to study clauses for possible use by this kind of
mechanism after we've derived ECs but before we do any costing stuff,
yet without introducing a whole new pass. Once we do derive that
information, where are we going to put it? We have to be able to tell
efficiently when looking at a baserel whether there are any implied
inequalities that we should be thinking about ... and there's nothing
obvious tying all of the relevant places together other than the EM.
But I'm kind of blathering here: I feel like there are a lot of
complexities I haven't thought hard enough about to have an
intelligent opinion.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
Next
From: "osumi.takamichi@fujitsu.com"
Date:
Subject: RE: Failed transaction statistics to measure the logical replication progress