On Tue, 27 Oct 2020 at 06:54, Ehrenreich, Sigrid <Ehrenreich@consist.de> wrote:
> -> Hash Join (cost=226.27..423.82 rows=115 width=0) (actual time=3.150..7.511 rows=3344 loops=1) <===========
Withthe FK, the estimation should be 3344, but it is 115 rows
I'd have expected this to find the foreign key and have the join
selectivity of 1.0, but I see it does not due to the fact that one of
the EquivalenceClass has a constant due to the fact.low_card = 1 qual.
In build_join_rel() we call build_joinrel_restrictlist() to get the
join quals that need to be evaluated at the join level, but we only
get the fact.anydata1=dim.anydata1 and fact.anydata2=dim.anydata2
quals there. The low_card qual gets pushed down to the scan level on
each side of the join, so no need for it to get evaluated at the join
level. Later in build_join_rel() we do set_joinrel_size_estimates().
The restrictlist with just the two quals is what we pass to
get_foreign_key_join_selectivity(). Only two of the foreign key
columns are matched there, therefore we don't class that as a match
and just leave it up to the normal selectivity functions.
I feel like we could probably do better there and perhaps somehow
count ECs with ec_has_const as matched, but there seems to be some
assumptions later in get_foreign_key_join_selectivity() where we
determine the selectivity based on the base rel's tuple count. We'd
need to account for how many rows remainder after filtering the ECs
with ec_has_const == true, else we'd be doing the wrong thing. That
needs more thought than I have time for right now.
Your case would work if the foreign key had been on just anydata1 and
anydata2, but there's not much chance of that working without a unique
index on those two columns.
Extended statistics won't help you here either since they're currently
not used for join estimations.
David