Re: NOT IN subquery optimization - Mailing list pgsql-hackers

From Tom Lane
Subject Re: NOT IN subquery optimization
Date
Msg-id 18203.1551543939@sss.pgh.pa.us
Whole thread Raw
In response to Re: NOT IN subquery optimization  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: NOT IN subquery optimization
List pgsql-hackers
David Rowley <david.rowley@2ndquadrant.com> writes:
> On Sat, 2 Mar 2019 at 13:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah --- that has a nontrivial risk of making things significantly worse,
>> which makes it a hard sell.  I think the most reasonable bet here is
>> simply to not perform the transformation if we can't prove the inner side
>> NOT NULL.  That's going to catch most of the useful cases anyway IMO.

> Did you mean outer side NOT NULL?

Sorry, sloppy thinking.

>   Of course, the inner side needs to not produce NULLs either, but
> that's due to the fact that if a NULL exists in the inner side then
> the anti-join should not produce any records.

Right.  So the path of least resistance is to transform to antijoin
only if we can prove both of those things (and maybe we need to check
that the join operator is strict, too?  -ENOCAFFEINE).  The question
before us is what is the cost-benefit ratio of trying to cope with
additional cases.  I'm skeptical that it's attractive: the cost
certainly seems high, and I don't know that there are very many
real-world cases where we'd get a win.

Hmm ... thinking about the strictness angle some more: what we really
need to optimize NOT IN, IIUC, is an assumption that the join operator
will never return NULL.  While not having NULL inputs is certainly a
*necessary* condition for that (assuming a strict operator) it's not a
*sufficient* condition.  Any Postgres function/operator is capable
of returning NULL whenever it feels like it.  So checking strictness
does not lead to a mathematically correct optimization.

My initial thought about plugging that admittedly-academic point is
to insist that the join operator be both strict and a member of a
btree opclass (hash might be OK too; less sure about other index types).
The system already contains assumptions that btree comparators never
return NULL.  I doubt that this costs us any real-world applicability,
because if the join operator can neither merge nor hash, we're screwed
anyway for finding a join plan that's better than nested-loop.

            regards, tom lane


pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Online verification of checksums
Next
From: Stephen Frost
Date:
Subject: Re: GSoC 2019