pgsql: Fix planner failures with overlapping mergejoin clauses in anou - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Fix planner failures with overlapping mergejoin clauses in anou
Date
Msg-id E1epINu-0000U0-I8@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Fix planner failures with overlapping mergejoin clauses in an outer join.

Given overlapping or partially redundant join clauses, for example
        t1 JOIN t2 ON t1.a = t2.x AND t1.b = t2.x
the planner's EquivalenceClass machinery will ordinarily refactor the
clauses as "t1.a = t1.b AND t1.a = t2.x", so that join processing doesn't
see multiple references to the same EquivalenceClass in a list of join
equality clauses.  However, if the join is outer, it's incorrect to derive
a restriction clause on the outer side from the join conditions, so the
clause refactoring does not happen and we end up with overlapping join
conditions.  The code that attempted to deal with such cases had several
subtle bugs, which could result in "left and right pathkeys do not match in
mergejoin" or "outer pathkeys do not match mergeclauses" planner errors,
if the selected join plan type was a mergejoin.  (It does not appear that
any actually incorrect plan could have been emitted.)

The core of the problem really was failure to recognize that the outer and
inner relations' pathkeys have different relationships to the mergeclause
list.  A join's mergeclause list is constructed by reference to the outer
pathkeys, so it will always be ordered the same as the outer pathkeys, but
this cannot be presumed true for the inner pathkeys.  If the inner sides of
the mergeclauses contain multiple references to the same EquivalenceClass
({t2.x} in the above example) then a simplistic rendering of the required
inner sort order is like "ORDER BY t2.x, t2.x", but the pathkey machinery
recognizes that the second sort column is redundant and throws it away.
The mergejoin planning code failed to account for that behavior properly.
One error was to try to generate cut-down versions of the mergeclause list
from cut-down versions of the inner pathkeys in the same way as the initial
construction of the mergeclause list from the outer pathkeys was done; this
could lead to choosing a mergeclause list that fails to match the outer
pathkeys.  The other problem was that the pathkey cross-checking code in
create_mergejoin_plan treated the inner and outer pathkey lists
identically, whereas actually the expectations for them must be different.
That led to false "pathkeys do not match" failures in some cases, and in
principle could have led to failure to detect bogus plans in other cases,
though there is no indication that such bogus plans could be generated.

Reported by Alexander Kuzmenkov, who also reviewed this patch.  This has
been broken for years (back to around 8.3 according to my testing), so
back-patch to all supported branches.

Discussion: https://postgr.es/m/5dad9160-4632-0e47-e120-8e2082000c01@postgrespro.ru

Branch
------
REL9_3_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/71a0d0c5aad26642b58dcd8d5d2438411e08640e

Modified Files
--------------
src/backend/optimizer/path/joinpath.c   |  30 +++----
src/backend/optimizer/path/pathkeys.c   | 150 ++++++++++++++++++++++++++------
src/backend/optimizer/plan/createplan.c | 145 +++++++++++++++---------------
src/include/optimizer/paths.h           |  10 ++-
src/test/regress/expected/join.out      |  80 +++++++++++++++++
src/test/regress/sql/join.sql           |  31 +++++++
6 files changed, 322 insertions(+), 124 deletions(-)


pgsql-committers by date:

Previous
From: Robert Haas
Date:
Subject: pgsql: Revise API for partition bound search functions.
Next
From: Noah Misch
Date:
Subject: pgsql: Synchronize doc/ copies of src/test/examples/.