Re: ERROR: left and right pathkeys do not match in mergejoin - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: ERROR: left and right pathkeys do not match in mergejoin |
Date | |
Msg-id | 30597.1519329124@sss.pgh.pa.us Whole thread Raw |
In response to | Re: ERROR: left and right pathkeys do not match in mergejoin (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
I wrote: > The third possibility is to decide that create_mergejoin_plan is being > overly paranoid and it's okay to extract merge details from a "redundant" > path key even though it specifies the opposite sort order from what the > current merge clause seems to need. This is scary at first glance, but > it seems like it should work. BTW, while working through this, I realized that there's an additional problem in the same area, which can be demonstrated thus in the regression database: explain select * from (select * from tenk1 a order by a.unique2) a right join (select * from tenk1 b order by b.thousand, b.twothousand, b.fivethous) b on a.unique2 = b.thousand and a.hundred = b.twothousand and a.unique2 = b.fivethous; ERROR: outer pathkeys do not match mergeclauses The problem here is that find_mergeclauses_for_pathkeys has an API designed on the assumption that the outer and inner pathkeys have identical relationships to the mergeclauses, ie, if you want to find the mergeclauses to use given a particular available sort ordering, it works basically the same way for outer or inner pathkeys. But per this discussion, tain't so. What really happens is that initially we choose a list of mergeclauses that line up with the outer pathkeys, and then we identify inner pathkeys that make sense given that list of mergeclauses; but, in the presence of redundancy, those inner pathkeys are not necessarily one-for-one with the mergeclauses. All fine so far. But then, generate_mergejoin_paths looks at inner paths that have available sort keys that are truncations of the original inner pathkey list, and it uses find_mergeclauses_for_pathkeys to decide which mergeclauses still make sense for that truncated pathkey list. That doesn't work. In my example, we consider the pre-ordered sub-select output for B as the outer side, and we choose all three mergeclauses in the order the example has them, and we arrive at the initial inner pathkey list of ({a.unique2}, {a.hundred}), dropping the redundant second appearance of {a.unique2}. We can successfully make a plan from that. But then we consider the truncated inner pathkey list ({a.unique2}), which this example is set up to ensure will win since it avoids an extra sort step. As the code stands, the mergeclause list that's extracted to use with that pair of input paths is a.unique2 = b.thousand and a.unique2 = b.fivethous (from the find_mergeclauses_for_pathkeys call with outer_keys = false). That's just wrong, because it doesn't match the already-determined outer path order, and create_mergejoin_plan is quite right to whine. So it seems like find_mergeclauses_for_pathkeys should be renamed to find_mergeclauses_for_outer_pathkeys, and then we need a second function for the task of truncating the outer mergeclauses --- not selecting them from scratch --- given a truncated inner pathkey list. In this example, we could keep a.unique2 = b.thousand, but we'd have to drop a.hundred = b.twothousand and then everything after it, since the inner path doesn't have the sort order needed for that mergeclause. regards, tom lane
pgsql-hackers by date: