Re: EquivalenceClasses and subqueries and PlaceHolderVars, oh my - Mailing list pgsql-hackers

From Tom Lane
Subject Re: EquivalenceClasses and subqueries and PlaceHolderVars, oh my
Date
Msg-id 23034.1331834451@sss.pgh.pa.us
In response to Re: EquivalenceClasses and subqueries and PlaceHolderVars, oh my (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: EquivalenceClasses and subqueries and PlaceHolderVars, oh my
List pgsql-hackers
I wrote:
> Yeb Havinga <yebhavinga@gmail.com> writes:
>> I'm having a hard time imagining that add_child_rel_equivalences is not 
>> just plain wrong. Even though it will only add child equivalence members 
>> to a parent eq class when certain conditions are met, isn't it the case 
>> that since a union (all) is addition of tuples and not joining, any kind 
>> of propagating restrictions on a append rel child member to other areas 
>> of the plan can cause unwanted results, like the ones currently seen?

> None of the known problems are the fault of that, really.  The child
> entries don't cause merging of ECs, which would be the only way that
> they'd affect the semantics of the query at large.  So in that sense
> they are not really members of the EC but just some auxiliary entries
> that ease figuring out whether a child expression matches an EC.

After further thought about that, I've concluded that indeed my patch
57664ed25e5dea117158a2e663c29e60b3546e1c was just plain wrong, and
Teodor was more nearly on the right track than I was in the original
discussion.  If child EC members aren't full-fledged members then
there's no a priori reason why they need to be distinct from each other.
There are only a few functions that actually match anything to child
members (although there are some others that could use Asserts or tests
to make it clearer that they aren't paying attention to child members).
AFAICT, if we don't try to enforce uniqueness of child members, the only
consequences will be:

(1) It'll be order-dependent which EquivalenceClass a child index column
is thought to match.  As I explained earlier, this is not really the
fault of this representational detail, but is a basic shortcoming of the
whole current concept of ECs.  Taking the first match is fine for now.

(2) It'll be unclear which of several identical subplan output columns
should be sorted by in prepare_sort_from_pathkeys.  Now ordinarily that
does not particularly matter --- if you have multiple identical
nonvolatile expressions, you can take any one (and we already have a
hack in there for the volatile case).  I think it *only* matters for
MergeAppend, where we need to be sure that the sort column locations
match across all the children.  However, we can fix that in some
localized way instead of screwing up ECs generally.  The idea I have in
mind at the moment, since create_merge_append_plan already starts by
determining the sort column locations for the MergeAppend itself, is to
pass down that info to the calls for the child plans and insist that we
match to the same column locations we found for the parent MergeAppend.
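To make point (1) concrete, here is a toy C model. It is not PostgreSQL source: `ToyMember`, `ToyEC` and `toy_match_first_ec` are hypothetical names, and real EquivalenceClass matching involves much more (expression comparison, relids, operator families). The sketch only shows that once child entries are not required to be unique across classes, a first-match scan returns whichever class happens to be listed first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for planner structures.
 * These are NOT PostgreSQL's EquivalenceClass / EquivalenceMember. */
typedef struct ToyMember {
    int  expr_id;   /* identifies an expression, e.g. a child column */
    bool is_child;  /* auxiliary child entry, not a full-fledged member */
} ToyMember;

typedef struct ToyEC {
    int              ec_id;     /* label for the class (assumed >= 0) */
    const ToyMember *members;
    size_t           nmembers;
} ToyEC;

/* Return the ec_id of the first class containing expr_id (child
 * entries are matched too), or -1 if no class contains it.
 * Because child entries may repeat across classes, the result
 * depends on the order of ecs[]: the first match wins. */
static int toy_match_first_ec(const ToyEC *ecs, size_t necs, int expr_id)
{
    assert(ecs != NULL || necs == 0);
    for (size_t i = 0; i < necs; i++)
        for (size_t j = 0; j < ecs[i].nmembers; j++)
            if (ecs[i].members[j].expr_id == expr_id)
                return ecs[i].ec_id;
    return -1;
}
```

With two classes that both carry a child entry for the same expression, listing them as {A, B} yields A and listing them as {B, A} yields B. Per the message, simply taking the first match is acceptable for now.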

So I now propose reverting the two earlier patches (but not their
regression test cases, of course) and instead hacking MergeAppend plan
building as per (2).
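The MergeAppend fix proposed in (2) can be sketched the same way, again as a hypothetical toy rather than the actual `create_merge_append_plan` / `prepare_sort_from_pathkeys` code. A child's output columns are reduced to expression ids (1-based column numbers, as in a target list), and the sort column already chosen for the parent MergeAppend is passed down as `required_col`. The names `toy_pick_sort_column` and `required_col` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not PostgreSQL code: tlist[i] is the expression
 * id in output column i + 1.
 * If required_col > 0 (the location already chosen for the parent
 * MergeAppend), accept only that column, and only if it holds expr_id.
 * Otherwise take the first matching column, since any one of several
 * identical nonvolatile expressions would do.
 * Returns the 1-based column number, or 0 if no acceptable column exists. */
static int toy_pick_sort_column(const int *tlist, size_t ncols,
                                int expr_id, int required_col)
{
    assert(tlist != NULL || ncols == 0);
    if (required_col > 0) {
        if ((size_t) required_col <= ncols &&
            tlist[required_col - 1] == expr_id)
            return required_col;
        return 0;   /* child cannot match the parent's column layout */
    }
    for (size_t i = 0; i < ncols; i++)
        if (tlist[i] == expr_id)
            return (int) i + 1;
    return 0;
}
```

For a child whose target list is {7, 7}, an unconstrained search picks column 1, while passing down the parent's choice of column 2 forces column 2, so the sort column locations line up across all children. Returning 0 on a mismatch is just one way to signal the problem in this toy; a real implementation could instead treat it as an error.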

        regards, tom lane

