Re: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join - Mailing list pgsql-performance

From David Rowley
Subject Re: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join
Date
Msg-id CAApHDvqh4wSDOD+UVZ2xdaymu_-6rvkxjdcTQHOVAT-RTCdzog@mail.gmail.com
Whole thread Raw
In response to Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join  ("Ehrenreich, Sigrid" <Ehrenreich@consist.de>)
Responses Re: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Tue, 27 Oct 2020 at 06:54, Ehrenreich, Sigrid <Ehrenreich@consist.de> wrote:
>   ->  Hash Join  (cost=226.27..423.82 rows=115 width=0) (actual time=3.150..7.511 rows=3344 loops=1)   <===========
Withthe FK, the estimation should be 3344, but it is 115 rows
 

I'd have expected this to find the foreign key and have the join
selectivity of 1.0, but I see it does not due to the fact that one of
the EquivalenceClass has a constant due to the fact.low_card = 1 qual.

In build_join_rel() we call build_joinrel_restrictlist() to get the
join quals that need to be evaluated at the join level, but we only
get the fact.anydata1=dim.anydata1 and fact.anydata2=dim.anydata2
quals there.  The low_card qual gets pushed down to the scan level on
each side of the join, so no need for it to get evaluated at the join
level. Later in build_join_rel() we do set_joinrel_size_estimates().
The restrictlist with just the two quals is what we pass to
get_foreign_key_join_selectivity().  Only two of the foreign key
columns are matched there, therefore we don't class that as a match
and just leave it up to the normal selectivity functions.

I feel like we could probably do better there and perhaps somehow
count ECs with ec_has_const as matched, but there seems to be some
assumptions later in get_foreign_key_join_selectivity() where we
determine the selectivity based on the base rel's tuple count.  We'd
need to account for how many rows remainder after filtering the ECs
with ec_has_const == true, else we'd be doing the wrong thing.  That
needs more thought than I have time for right now.

Your case would work if the foreign key had been on just anydata1 and
anydata2, but there's not much chance of that working without a unique
index on those two columns.

Extended statistics won't help you here either since they're currently
not used for join estimations.

David



pgsql-performance by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: Postgres Optimizer ignores information about foreign key relationship, severely misestimating number of returned rows in join
Next
From: Tom Lane
Date:
Subject: Re: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join