Re: Problem with CVS HEAD's handling of mergejoins - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Problem with CVS HEAD's handling of mergejoins
Date
Msg-id 11594.1199906002@sss.pgh.pa.us
Whole thread Raw
In response to Problem with CVS HEAD's handling of mergejoins  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
> A perhaps less invasive idea is to discard any proposed mergeclauses
> that are redundant in this sense.  This would still require some
> reshuffling of responsibility between select_mergejoin_clauses and
> the code in pathkeys.c, since right now select_mergejoin_clauses
> takes no account of that.  However, I'm worried that that might result
> in planner failure on some FULL JOIN cases that work today, since we
> require all the join clauses to be mergejoinable for a FULL JOIN.
> People seem to complain when the planner fails, even for really
> stupid queries ;-).  I think this would only be acceptable if we can
> prove that ignoring clauses that are redundant in this sense doesn't
> change the result --- which might be the case, but I'm not sure.

On further study, this doesn't seem nearly as bad as it looked.  I had
hastily misread some of the existing code as assuming one-for-one
correspondence of mergeclauses and pathkeys, but actually it just
assumes that the pathkeys list contains at least one pathkey matching
each mergeclause.  Redundancy between two pathkeys in a list is
eliminated by allowing adjacent mergeclauses to share the same pathkey.
So that part actually all works, and the only problem is when an
equivalence class is redundant-for-sorting in itself --- that is,
when one of the eclass members is a constant.  So that explains why
we'd not seen it before; except in very odd corner cases, the old code
would have eliminated the mergejoinable clause anyway.

If we simply make select_mergejoin_clauses() reject a clause as
not-mergejoinable if either side is equated to a constant, then
the problems go away.  We'd have a new problem if this meant that
we fail to do a FULL JOIN, but AFAICS that can't happen: a variable
coming from a full join can't be equivalenced to a constant, except
perhaps in a below_outer_join eclass, which isn't considered redundant
for sorting anyway.

Bottom line is that it just takes another half dozen lines of code
to fix this; attached is the core of the fix (there are some
uninteresting prototype changes and macro-moving as well).

> I think I can fix this in a day or so, but I now definitely feel that
> we'll need an RC2 :-(

I no longer think that about this problem, but we've still got some
nasty-looking issues in xml.c, plus Hannes Dorbath's unsolved reports
of GIST/GIN problems ...
        regards, tom lane
 
+         /*
+          * Insist that each side have a non-redundant eclass.  This
+          * restriction is needed because various bits of the planner expect
+          * that each clause in a merge be associatable with some pathkey in a
+          * canonical pathkey list, but redundant eclasses can't appear in
+          * canonical sort orderings.  (XXX it might be worth relaxing this,
+          * but not enough time to address it for 8.3.)
+          *
+          * Note: it would be bad if this condition failed for an otherwise
+          * mergejoinable FULL JOIN clause, since that would result in
+          * undesirable planner failure.  I believe that is not possible
+          * however; a variable involved in a full join could only appear
+          * in below_outer_join eclasses, which aren't considered redundant.
+          *
+          * This *can* happen for left/right join clauses, however: the
+          * outer-side variable could be equated to a constant.  (Because we
+          * will propagate that constant across the join clause, the loss of
+          * ability to do a mergejoin is not really all that big a deal, and
+          * so it's not clear that improving this is important.)
+          */
+         cache_mergeclause_eclasses(root, restrictinfo);
+ 
+         if (EC_MUST_BE_REDUNDANT(restrictinfo->left_ec) ||
+             EC_MUST_BE_REDUNDANT(restrictinfo->right_ec))
+         {
+             have_nonmergeable_joinclause = true;
+             continue;            /* can't handle redundant eclasses */
+         }
+          result_list = lappend(result_list, restrictinfo);     } 


pgsql-hackers by date:

Previous
From: Gregory Stark
Date:
Subject: Re: OUTER JOIN performance regression remains in 8.3beta4
Next
From: "Tim Yardley"
Date:
Subject: tzdata issue on cross-compiled postgresql