Re: Understanding bad estimate (related to FKs?) - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: Understanding bad estimate (related to FKs?)
Date
Msg-id 20201029224845.ptteqhzq65bh6jsh@development
In response to Re: Understanding bad estimate (related to FKs?)  (Philip Semanchuk <philip@americanefficient.com>)
Responses Re: Understanding bad estimate (related to FKs?)  (Philip Semanchuk <philip@americanefficient.com>)
List pgsql-performance
On Thu, Oct 29, 2020 at 11:25:48AM -0400, Philip Semanchuk wrote:
>
>
>> On Oct 28, 2020, at 9:13 PM, Justin Pryzby <pryzby@telsasoft.com>
>> wrote:
>>
>> On Mon, Oct 26, 2020 at 11:20:01AM -0600, Michael Lewis wrote:
>>> On Mon, Oct 26, 2020 at 11:14 AM Philip Semanchuk
>>> <philip@americanefficient.com> wrote:
>>>
>>>> The item I'm focused on is node 23. The estimate is for 7 rows, actual
>>>> is 896 (multiplied by 1062 loops). I'm confused about two things in
>>>> this node.
>>>>
>>>> The first is Postgres' estimate. The condition for this index scan
>>>> contains three expressions --
>>>>
>>>> (five_uniform = zulu_five.five_uniform) AND (whiskey_mike =
>>>> juliet_india.whiskey_mike) AND (bravo = 'mike'::text)
>>>>
>>>
>>> Are the columns correlated?
>>
>> I guess it shouldn't matter, since the FKs should remove all but one
>> of the conditions.
>
>Yes, I had the same expectation. I thought Postgres would calculate the
>selectivity as 1.0 * 1.0 * whatever estimate it has for the frequency
>of ‘mike’, but since the frequency estimate is very accurate while the
>planner’s row estimate is not, there must be something else going on.
>
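
As an aside, the frequency estimate for 'mike' itself can be checked
directly - the table name below is a hypothetical stand-in for the
obfuscated one, so this is just a sketch:

    SELECT n_distinct, most_common_vals, most_common_freqs
      FROM pg_stats
     WHERE tablename = 'some_table'
       AND attname = 'bravo';

If 'mike' shows up in most_common_vals, the planner uses the matching
frequency directly; otherwise it falls back to roughly
(1 - sum of MCV frequencies) / (n_distinct - number of MCVs).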

Well, this is quite a bit more complicated, I'm afraid :-( The clauses
include parameters passed from the nodes above the index scan. So even
if we had extended stats on the table, we couldn't use them, as that
requires (Var op Const) conditions. So this likely ends up as a product
of per-clause estimates, and since we can't use any particular parameter
value, each of those probably ends up as something like 1/ndistinct. So
if the values actually passed to the index scan are more common than
average and/or the columns are somehow correlated, it's not surprising
we end up with an underestimate.

>> Maybe you saw this other thread, which I tentatively think also
>> affects your case (equijoin with nonjoin condition)
>>
>> https://www.postgresql.org/message-id/AM6PR02MB5287A0ADD936C1FA80973E72AB190%40AM6PR02MB5287.eurprd02.prod.outlook.com
>
>Yes, thank you, I read that thread with interest. I tried your clever
>trick using BETWEEN, but it didn’t change the plan. Does that suggest
>there’s some other cause for the planner’s poor estimate?
>

I don't think that's related - to hit that bug, there would have to be
implied conditions pushed down to the scan level, and there's nothing
like that in this case.
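
If I'm reading that other thread right, the situation it deals with
(and the BETWEEN workaround) looks roughly like this, with hypothetical
tables t1/t2:

    -- t1.a = t2.a AND t1.a = 5 puts {t1.a, t2.a, 5} into one
    -- equivalence class, so the planner derives t2.a = 5 and pushes it
    -- down to the scan on t2:
    SELECT * FROM t1 JOIN t2 ON t1.a = t2.a WHERE t1.a = 5;

    -- writing the constant qual as a range keeps it out of the
    -- equivalence class, so nothing gets derived for t2:
    SELECT * FROM t1 JOIN t2 ON t1.a = t2.a WHERE t1.a BETWEEN 5 AND 5;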

FWIW I don't think this has anything to do with join cardinality
estimation - at least not for node 23.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


