Re: ERROR: left and right pathkeys do not match in mergejoin - Mailing list pgsql-hackers

From Tom Lane
Subject Re: ERROR: left and right pathkeys do not match in mergejoin
Date
Msg-id 30597.1519329124@sss.pgh.pa.us
Whole thread Raw
In response to Re: ERROR: left and right pathkeys do not match in mergejoin  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
> The third possibility is to decide that create_mergejoin_plan is being
> overly paranoid and it's okay to extract merge details from a "redundant"
> path key even though it specifies the opposite sort order from what the
> current merge clause seems to need.  This is scary at first glance, but
> it seems like it should work.

BTW, while working through this, I realized that there's an additional
problem in the same area, which can be demonstrated thus in the
regression database:

explain select * from
  (select * from tenk1 a order by a.unique2) a
  right join
  (select * from tenk1 b order by b.thousand, b.twothousand, b.fivethous) b
  on a.unique2 = b.thousand and a.hundred = b.twothousand and a.unique2 = b.fivethous;
ERROR:  outer pathkeys do not match mergeclauses

The problem here is that find_mergeclauses_for_pathkeys has an API
designed on the assumption that the outer and inner pathkeys have
identical relationships to the mergeclauses, ie, if you want to find
the mergeclauses to use given a particular available sort ordering,
it works basically the same way for outer or inner pathkeys.  But per
this discussion, tain't so.  What really happens is that initially
we choose a list of mergeclauses that line up with the outer pathkeys,
and then we identify inner pathkeys that make sense given that list
of mergeclauses; but, in the presence of redundancy, those inner pathkeys
are not necessarily one-for-one with the mergeclauses.  All fine so
far.  But then, generate_mergejoin_paths looks at inner paths that have
available sort keys that are truncations of the original inner pathkey
list, and it uses find_mergeclauses_for_pathkeys to decide which
mergeclauses still make sense for that truncated pathkey list.  That
doesn't work.  In my example, we consider the pre-ordered sub-select
output for B as the outer side, and we choose all three mergeclauses
in the order the example has them, and we arrive at the initial inner
pathkey list of ({a.unique2}, {a.hundred}), dropping the redundant
second appearance of {a.unique2}.  We can successfully make a plan
from that.  But then we consider the truncated inner pathkey list
({a.unique2}), which this example is set up to ensure will win since
it avoids an extra sort step.  As the code stands, the mergeclause list
that's extracted to use with that pair of input paths is
    a.unique2 = b.thousand and a.unique2 = b.fivethous
(from the find_mergeclauses_for_pathkeys call with outer_keys = false).
That's just wrong, because it doesn't match the already-determined
outer path order, and create_mergejoin_plan is quite right to whine.

So it seems like find_mergeclauses_for_pathkeys should be renamed to
find_mergeclauses_for_outer_pathkeys, and then we need a second
function for the task of truncating the outer mergeclauses --- not
selecting them from scratch --- given a truncated inner pathkey list.
In this example, we could keep a.unique2 = b.thousand, but we'd have
to drop a.hundred = b.twothousand and then everything after it, since
the inner path doesn't have the sort order needed for that mergeclause.

            regards, tom lane


pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Allow workers to override datallowconn
Next
From: Andres Freund
Date:
Subject: Re: Online enabling of checksums