pgsql: Fix foreign-key selectivity estimation in the presence of consta - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Fix foreign-key selectivity estimation in the presence of consta
Msg-id E1kXnB5-0001oj-O0@gemulon.postgresql.org
List pgsql-committers
Fix foreign-key selectivity estimation in the presence of constants.

get_foreign_key_join_selectivity() looks for join clauses that equate
the two sides of the FK constraint.  However, if we have a query like
"WHERE fktab.a = pktab.a and fktab.a = 1", it won't find any such join
clause, because equivclass.c replaces the given clauses with "fktab.a
= 1 and pktab.a = 1", which can be enforced at the scan level, leaving
nothing to be done for column "a" at the join level.

We can fix that expectation without much trouble, but then a new problem
arises: applying the foreign-key-based selectivity rule produces a
rowcount underestimate, because we're effectively double-counting the
selectivity of the "fktab.a = 1" clause.  So we have to cancel that
selectivity out of the estimate.

To fix, refactor process_implied_equality() so that it can pass back the
new RestrictInfo to its callers in equivclass.c, allowing the generated
"fktab.a = 1" clause to be saved in the EquivalenceClass's ec_derives
list.  Then it's not much trouble to dig out the relevant RestrictInfo
when we need to adjust an FK selectivity estimate.  (While at it, we
can also remove the expensive use of initialize_mergeclause_eclasses()
to set up the new RestrictInfo's left_ec and right_ec pointers.
The equivclass.c code can set those basically for free.)

This clearly seems to be a bug fix, but I'm hesitant to back-patch it,
first because there's some API/ABI risk for extensions and second because
we're usually loath to destabilize plan choices in stable branches.

Per report from Sigrid Ehrenreich.

Discussion: https://postgr.es/m/1019549.1603770457@sss.pgh.pa.us
Discussion: https://postgr.es/m/AM6PR02MB5287A0ADD936C1FA80973E72AB190@AM6PR02MB5287.eurprd02.prod.outlook.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/ad1c36b0709e47cdb3cc4abd6c939fe64279b63f

Modified Files
--------------
src/backend/nodes/outfuncs.c            |   1 +
src/backend/optimizer/path/costsize.c   |  57 +++++++++-
src/backend/optimizer/path/equivclass.c | 121 ++++++++++++++++-----
src/backend/optimizer/plan/initsplan.c  | 184 +++++++++++++++++++++-----------
src/backend/optimizer/util/plancat.c    |   2 +
src/include/nodes/pathnodes.h           |   3 +
src/include/optimizer/paths.h           |   2 +
src/include/optimizer/planmain.h        |  20 ++--
src/test/regress/expected/join.out      |  50 +++++++++
src/test/regress/sql/join.sql           |  29 +++++
10 files changed, 366 insertions(+), 103 deletions(-)
