Re: Fast path for empty relids in check_outerjoin_delay() - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Fast path for empty relids in check_outerjoin_delay()
Date
Msg-id 24942.1547408080@sss.pgh.pa.us
Whole thread Raw
In response to Re: Fast path for empty relids in check_outerjoin_delay()  (Richard Guo <riguo@pivotal.io>)
Responses Re: Fast path for empty relids in check_outerjoin_delay()  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Richard Guo <riguo@pivotal.io> writes:
> On Tue, Jan 8, 2019 at 11:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The bigger picture here, of course, is that check_outerjoin_delay's
>> API is not at all well matched to what check_equivalence_delay needs:
>> it has to make the extra bms_copy shown above, and it has no use
>> for either of the relid sets that check_outerjoin_delay wants to
>> return.  I believe it's like that because check_outerjoin_delay is
>> much older than the EquivalenceClass logic, and I didn't want to
>> redesign it when I put in ECs, and I also didn't want to just
>> duplicate all that logic.  But maybe it's time to put some more thought
>> into that, and try to find a refactoring of check_outerjoin_delay that
>> suits both callers better.

> I am thinking about whether we can quickly tell a qual is
> outerjoin_delayed or not by some bms_overlap operations.

> A qual is regarded as being outerjoin_delayed if:

> 1) It references any nullable rels of some OJ below this qual, and
> 2) We haven't included all the rels of that OJ in relids.

> For 1), we can collect all the nullable rels of OJs below this qual in
> deconstruct_recurse() (like nullable_baserels, initsplan.c:976) and
> bms_overlap this qual's relids with the collected nullable_relids.

Yeah, although I'm not sure that there's a good way to produce that
info in check_equivalence_delay?

> For 2), I haven't figured out a good way.

I'm not sure that there's any realistic way to do better given just
the existing SpecialJoinInfo data structures.  In particular, the
fact that we consider the join_info_list to be unordered is a
problem for this code --- it means we have to make multiple passes
to ensure we've accounted for everything.

I deliberately made the SpecialJoinInfo data structure as lean as
possible when I first designed it, reasoning that that would minimize
hazards from incorrect data structure contents that might force
rewriting lots of code.  But now that it's been stable a long while,
maybe it's time to try to be smarter.  For example, maybe it'd be sane
to convert the list into a tree and include fields that represent the
effects of not only each outer join but all its children.  Then you
could (in principle) do what check_outerjoin_delay needs to do just
by looking at the current tree element.  I don't have any concrete
thoughts here, though, it's just handwaving.  One issue that would
need careful consideration is how to not break outer-join reordering
rules; probably, you'd need to be sure that whatever derived data
you add is defined in a way whereby it doesn't change in the face
of legal join reorderings.

            regards, tom lane


pgsql-hackers by date:

Previous
From: James Coleman
Date:
Subject: Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's
Next
From: Tom Lane
Date:
Subject: Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's