RE: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join - Mailing list pgsql-performance

From Ehrenreich, Sigrid
Subject RE: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join
Date
Msg-id AM0PR02MB5283D15DE7274926B3B212E8AB160@AM0PR02MB5283.eurprd02.prod.outlook.com
Whole thread Raw
In response to Re: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Hi Tom,

A patch would be very much appreciated.
We are currently running on Version 12, but could upgrade to 13, if necessary.

Could you send me a notification if you managed to program a patch for that?

Regards,
Sigrid

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: Monday, October 26, 2020 11:54 PM
To: David Rowley <dgrowleyml@gmail.com>
Cc: Ehrenreich, Sigrid <Ehrenreich@consist.de>; pgsql-performance@lists.postgresql.org
Subject: Re: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of
returnedrows in join 

David Rowley <dgrowleyml@gmail.com> writes:
> On Tue, 27 Oct 2020 at 06:54, Ehrenreich, Sigrid <Ehrenreich@consist.de> wrote:
>> ->  Hash Join  (cost=226.27..423.82 rows=115 width=0) (actual time=3.150..7.511 rows=3344 loops=1)   <===========
Withthe FK, the estimation should be 3344, but it is 115 rows 

> I'd have expected this to find the foreign key and have the join
> selectivity of 1.0, but I see it does not due to the fact that one of
> the EquivalenceClass has a constant due to the fact.low_card = 1 qual.

Right.

> I feel like we could probably do better there and perhaps somehow
> count ECs with ec_has_const as matched, but there seems to be some
> assumptions later in get_foreign_key_join_selectivity() where we
> determine the selectivity based on the base rel's tuple count.  We'd
> need to account for how many rows remainder after filtering the ECs
> with ec_has_const == true, else we'd be doing the wrong thing.  That
> needs more thought than I have time for right now.

Yeah, I'm fooling with a patch for that now.  The basic problem is
that the selectivity of the x = constant clauses has already been
factored into the sizes of both join input relations, so we're
double-counting it if we just apply the existing FK-based
selectivity estimate.  I think though that we can recover the
selectivity associated with that qual on the FK side (or should
it be the PK side?) and cancel it out of the FK selectivity.

            regards, tom lane



pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Postgres Optimizer ignores information about foreign key relationship, severly misestimating number of returned rows in join
Next
From: Mats Olsen
Date:
Subject: Re: Query Performance / Planner estimate off