Thread: Understanding bad estimate (related to FKs?)

Understanding bad estimate (related to FKs?)

From: Philip Semanchuk
I'm trying to understand a bad estimate by the planner, and what I can do about it. The anonymized plan is here:
https://explain.depesz.com/s/0MDz

The item I'm focused on is node 23. The estimate is for 7 rows, actual is 896 (multiplied by 1062 loops). I'm confused about two things in this node.

The first is Postgres' estimate. The condition for this index scan contains three expressions --

(five_uniform = zulu_five.five_uniform) AND
(whiskey_mike = juliet_india.whiskey_mike) AND
(bravo = 'mike'::text)

The columns in the first two expressions (five_uniform and whiskey_mike) are NOT NULL, and have foreign key constraints to their respective tables (zulu_five.five_uniform and juliet_india.whiskey_mike). The planner can know in advance that 100% of the rows in the table will satisfy those criteria.

For the third expression (bravo = 'mike'), Postgres has excellent statistics. The estimated frequency of 'mike' is 2.228%, actual frequency is 2.242%, so Postgres' estimate is only off by a tiny amount (0.014%).
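
For reference, I'm reading that frequency from the MCV stats in pg_stats; the lookup is roughly this, with the anonymized names:

    -- most_common_vals / most_common_freqs are the MCV list that ANALYZE built
    SELECT most_common_vals, most_common_freqs
    FROM pg_stats
    WHERE tablename = 'xray_india'
      AND attname = 'bravo';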

From what I understand, the planner has all the information it needs to make a very accurate estimate here, but it's off by quite a lot. What information am I failing to give to the planner?

My second point of confusion is related. There are 564,071 rows in the source table (xray_india, aliased as papa) that satisfy the condition bravo = 'mike'. EXPLAIN reports the actual number of rows returned as 896*1062 ~= 951,552. I understand that the number reported by EXPLAIN is often a bit bigger, but this discrepancy is much larger than I'm expecting. What am I missing here?

Thanks,
Philip


Re: Understanding bad estimate (related to FKs?)

From: Justin Pryzby
On Mon, Oct 26, 2020 at 12:50:38PM -0400, Philip Semanchuk wrote:
> I'm trying to understand a bad estimate by the planner, and what I can do about it. The anonymized plan is here:
> https://explain.depesz.com/s/0MDz

What postgres version ?
Since 9.6(?) FKs affect estimates.

> The item I'm focused on is node 23. The estimate is for 7 rows, actual is 896 (multiplied by 1062 loops). I'm confused about two things in this node.
> 
> The first is Postgres' estimate. The condition for this index scan contains three expressions --
> 
> (five_uniform = zulu_five.five_uniform) AND
> (whiskey_mike = juliet_india.whiskey_mike) AND
> (bravo = 'mike'::text)
> 
> The columns in the first two expressions (five_uniform and whiskey_mike) are NOT NULL, and have foreign key constraints to their respective tables (zulu_five.five_uniform and juliet_india.whiskey_mike). The planner can know in advance that 100% of the rows in the table will satisfy those criteria.
> 
> For the third expression (bravo = 'mike'), Postgres has excellent statistics. The estimated frequency of 'mike' is 2.228%, actual frequency is 2.242%, so Postgres' estimate is only off by a tiny amount (0.014%).
> 
> From what I understand, the planner has all the information it needs to make a very accurate estimate here, but it's off by quite a lot. What information am I failing to give to the planner?
> 
> My second point of confusion is related. There are 564,071 rows in the source table (xray_india, aliased as papa) that satisfy the condition bravo = 'mike'. EXPLAIN reports the actual number of rows returned as 896*1062 ~= 951,552. I understand that the number reported by EXPLAIN is often a bit bigger, but this discrepancy is much larger than I'm expecting. What am I missing here?



Re: Understanding bad estimate (related to FKs?)

From: Philip Semanchuk

> On Oct 26, 2020, at 1:04 PM, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Mon, Oct 26, 2020 at 12:50:38PM -0400, Philip Semanchuk wrote:
>> I'm trying to understand a bad estimate by the planner, and what I can do about it. The anonymized plan is here:
>> https://explain.depesz.com/s/0MDz
>
> What postgres version ?
> Since 9.6(?) FKs affect estimates.

We’re using 11.6 (under AWS Aurora).


>
>> The item I'm focused on is node 23. The estimate is for 7 rows, actual is 896 (multiplied by 1062 loops). I'm confused about two things in this node.
>>
>> The first is Postgres' estimate. The condition for this index scan contains three expressions --
>>
>> (five_uniform = zulu_five.five_uniform) AND
>> (whiskey_mike = juliet_india.whiskey_mike) AND
>> (bravo = 'mike'::text)
>>
>> The columns in the first two expressions (five_uniform and whiskey_mike) are NOT NULL, and have foreign key constraints to their respective tables (zulu_five.five_uniform and juliet_india.whiskey_mike). The planner can know in advance that 100% of the rows in the table will satisfy those criteria.
>>
>> For the third expression (bravo = 'mike'), Postgres has excellent statistics. The estimated frequency of 'mike' is 2.228%, actual frequency is 2.242%, so Postgres' estimate is only off by a tiny amount (0.014%).
>>
>> From what I understand, the planner has all the information it needs to make a very accurate estimate here, but it's off by quite a lot. What information am I failing to give to the planner?
>>
>> My second point of confusion is related. There are 564,071 rows in the source table (xray_india, aliased as papa) that satisfy the condition bravo = 'mike'. EXPLAIN reports the actual number of rows returned as 896*1062 ~= 951,552. I understand that the number reported by EXPLAIN is often a bit bigger, but this discrepancy is much larger than I'm expecting. What am I missing here?




Re: Understanding bad estimate (related to FKs?)

From: Michael Lewis
On Mon, Oct 26, 2020 at 11:14 AM Philip Semanchuk <philip@americanefficient.com> wrote:
>> The item I'm focused on is node 23. The estimate is for 7 rows, actual is 896 (multiplied by 1062 loops). I'm confused about two things in this node.
>>
>> The first is Postgres' estimate. The condition for this index scan contains three expressions --
>>
>> (five_uniform = zulu_five.five_uniform) AND
>> (whiskey_mike = juliet_india.whiskey_mike) AND
>> (bravo = 'mike'::text)

Are the columns correlated? Have you tried to create extended statistics and see if the estimate changes? I believe that extended stats will not directly help with joins though, only group bys and perhaps choosing an index scan vs table scan when comparing the correlated columns to static values rather than joining up tables. Wouldn't be much effort to try it though.
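
A minimal sketch, using the anonymized names from your plan (the statistics object name here is made up):

    CREATE STATISTICS xray_india_wm_bravo (ndistinct, dependencies)
        ON whiskey_mike, bravo FROM xray_india;
    ANALYZE xray_india;  -- extended stats are only populated by ANALYZE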

Re: Understanding bad estimate (related to FKs?)

From: Philip Semanchuk

> On Oct 26, 2020, at 1:20 PM, Michael Lewis <mlewis@entrata.com> wrote:
>
> On Mon, Oct 26, 2020 at 11:14 AM Philip Semanchuk <philip@americanefficient.com> wrote:
> >> The item I'm focused on is node 23. The estimate is for 7 rows, actual is 896 (multiplied by 1062 loops). I'm confused about two things in this node.
> >>
> >> The first is Postgres' estimate. The condition for this index scan contains three expressions --
> >>
> >> (five_uniform = zulu_five.five_uniform) AND
> >> (whiskey_mike = juliet_india.whiskey_mike) AND
> >> (bravo = 'mike'::text)
>
> Are the columns correlated? Have you tried to create extended statistics and see if the estimate changes? I believe that extended stats will not directly help with joins though, only group bys and perhaps choosing an index scan vs table scan when comparing the correlated columns to static values rather than joining up tables. Wouldn't be much effort to try it though.


There’s not a lot of correlation between whiskey_mike and bravo --
stxkind     stxndistinct    stxdependencies
['d', 'f']  {"7, 12": 42}   {"12 => 7": 0.000274}

Those stats didn’t help the planner.
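
Those numbers came from pg_statistic_ext; on 11.x the built stats live right in that catalog (12+ moves them to pg_statistic_ext_data), so the lookup was roughly:

    SELECT stxkind, stxndistinct, stxdependencies
    FROM pg_statistic_ext
    WHERE stxrelid = 'xray_india'::regclass;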

I should have mentioned that five_uniform has ~63k unique values, whereas whiskey_mike has only 3, and bravo only 19.

Cheers
Philip


Re: Understanding bad estimate (related to FKs?)

From: Philip Semanchuk

> On Oct 28, 2020, at 9:13 PM, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Mon, Oct 26, 2020 at 11:20:01AM -0600, Michael Lewis wrote:
>> On Mon, Oct 26, 2020 at 11:14 AM Philip Semanchuk <philip@americanefficient.com> wrote:
>>
>>>>> The item I'm focused on is node 23. The estimate is for 7 rows, actual
>>> is 896 (multiplied by 1062 loops). I'm confused about two things in this
>>> node.
>>>>>
>>>>> The first is Postgres' estimate. The condition for this index scan
>>> contains three expressions --
>>>>>
>>>>> (five_uniform = zulu_five.five_uniform) AND
>>>>> (whiskey_mike = juliet_india.whiskey_mike) AND
>>>>> (bravo = 'mike'::text)
>>>
>>
>> Are the columns correlated?
>
> I guess it shouldn't matter, since the FKs should remove all but one of the
> conditions.

Yes, I had the same expectation. I thought Postgres would calculate the selectivity as 1.0 * 1.0 * whatever estimate it has for the frequency of ‘mike’, but since the frequency estimate is very accurate but the planner’s estimate is not, there’s something else going on.

> Maybe you saw this other thread, which I tentatively think also affects your
> case (equijoin with nonjoin condition)
>
> https://www.postgresql.org/message-id/AM6PR02MB5287A0ADD936C1FA80973E72AB190%40AM6PR02MB5287.eurprd02.prod.outlook.com

Yes, thank you, I read that thread with interest. I tried your clever trick using BETWEEN, but it didn’t change the plan. Does that suggest there’s some other cause for the planner’s poor estimate?

Cheers
Philip







Re: Understanding bad estimate (related to FKs?)

From: Tomas Vondra
On Thu, Oct 29, 2020 at 11:25:48AM -0400, Philip Semanchuk wrote:
>
>
>> On Oct 28, 2020, at 9:13 PM, Justin Pryzby <pryzby@telsasoft.com>
>> wrote:
>>
>> On Mon, Oct 26, 2020 at 11:20:01AM -0600, Michael Lewis wrote:
>>> On Mon, Oct 26, 2020 at 11:14 AM Philip Semanchuk
>>> <philip@americanefficient.com> wrote:
>>>
>>>>>> The item I'm focused on is node 23. The estimate is for 7 rows,
>>>>>> actual
>>>> is 896 (multiplied by 1062 loops). I'm confused about two things in
>>>> this node.
>>>>>>
>>>>>> The first is Postgres' estimate. The condition for this index
>>>>>> scan
>>>> contains three expressions --
>>>>>>
>>>>>> (five_uniform = zulu_five.five_uniform) AND (whiskey_mike =
>>>>>> juliet_india.whiskey_mike) AND (bravo = 'mike'::text)
>>>>
>>>
>>> Are the columns correlated?
>>
>> I guess it shouldn't matter, since the FKs should remove all but one
>> of the conditions.
>
>Yes, I had the same expectation. I thought Postgres would calculate the
>selectivity as 1.0 * 1.0 * whatever estimate it has for the frequency
>of ‘mike’, but since the frequency estimate is very accurate but the
>planner’s estimate is not, there’s something else going on.
>

Well, this is quite a bit more complicated, I'm afraid :-( The clauses
include parameters passed from the nodes above the index scan. So even
if we had extended stats on the table, we couldn't use them as that
requires (Var op Const) conditions. So this likely ends up with a
product of estimates for each clause, and even then we can't use any
particular value so we probably end up with something like 1/ndistinct
or something like that. So if the values actually passed to the index
scan are more common and/or if the columns are somehow correlated, it's
not surprising we end up with an overestimate.
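
If you want to check what it falls back on, the per-column ndistinct estimates are visible in pg_stats; a sketch with the anonymized names:

    -- negative n_distinct values mean a fraction of the table's row count
    SELECT attname, n_distinct
    FROM pg_stats
    WHERE tablename = 'xray_india'
      AND attname IN ('five_uniform', 'whiskey_mike');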

>> Maybe you saw this other thread, which I tentatively think also
>> affects your case (equijoin with nonjoin condition)
>>
>> https://www.postgresql.org/message-id/AM6PR02MB5287A0ADD936C1FA80973E72AB190%40AM6PR02MB5287.eurprd02.prod.outlook.com
>
>Yes, thank you, I read that thread with interest. I tried your clever
>trick using BETWEEN, but it didn’t change the plan. Does that suggest
>there’s some other cause for the planner’s poor estimate?
>

I don't think that's related - to hit that bug, there would have to be
implied conditions pushed-down to the scan level. And there's nothing
like that in this case.

FWIW I don't think this has anything to do with join cardinality
estimation - at least not for the node 23.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Understanding bad estimate (related to FKs?)

From: Philip Semanchuk

> On Oct 29, 2020, at 6:48 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On Thu, Oct 29, 2020 at 11:25:48AM -0400, Philip Semanchuk wrote:
>>
>>
>>> On Oct 28, 2020, at 9:13 PM, Justin Pryzby <pryzby@telsasoft.com>
>>> wrote:
>>>
>>> On Mon, Oct 26, 2020 at 11:20:01AM -0600, Michael Lewis wrote:
>>>> On Mon, Oct 26, 2020 at 11:14 AM Philip Semanchuk
>>>> <philip@americanefficient.com> wrote:
>>>>
>>>>>>> The item I'm focused on is node 23. The estimate is for 7 rows,
>>>>>>> actual
>>>>> is 896 (multiplied by 1062 loops). I'm confused about two things in
>>>>> this node.
>>>>>>>
>>>>>>> The first is Postgres' estimate. The condition for this index
>>>>>>> scan
>>>>> contains three expressions --
>>>>>>>
>>>>>>> (five_uniform = zulu_five.five_uniform) AND (whiskey_mike =
>>>>>>> juliet_india.whiskey_mike) AND (bravo = 'mike'::text)
>>>>>
>>>>
>>>> Are the columns correlated?
>>>
>>> I guess it shouldn't matter, since the FKs should remove all but one
>>> of the conditions.
>>
>> Yes, I had the same expectation. I thought Postgres would calculate the
>> selectivity as 1.0 * 1.0 * whatever estimate it has for the frequency
>> of ‘mike’, but since the frequency estimate is very accurate but the
>> planner’s estimate is not, there’s something else going on.
>>
>
> Well, this is quite a bit more complicated, I'm afraid :-( The clauses
> include parameters passed from the nodes above the index scan. So even
> if we had extended stats on the table, we couldn't use them as that
> requires (Var op Const) conditions. So this likely ends up with a
> product of estimates for each clause, and even then we can't use any
> particular value so we probably end up with something like 1/ndistinct
> or something like that. So if the values actually passed to the index
> scan are more common and/or if the columns are somehow correlated, it's
> not surprising we end up with an overestimate.

I appreciate the insight. 1/ndistinct is exactly right. In pg_stats, five_uniform’s ndistinct = 26326, and whiskey_mike’s ndistinct = 3. The estimated frequency of bravo = ‘mike’ is .02228. There are 25156157 rows in the source table, so we have:

25156157 * (1/26326.0) * (1/3.0) * .02228 = 7.0966494209

Hence the estimate of 7 rows returned.

It's interesting that five_uniform’s estimated ndistinct is low by > 50% (actual = 62958). Paradoxically, if I manually set ndistinct to the correct value of 62958, the estimate gets worse (3 rows instead of 7).
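
For reference, I set the override with the per-column storage parameter, something like this (names anonymized):

    ALTER TABLE xray_india ALTER COLUMN five_uniform SET (n_distinct = 62958);
    ANALYZE xray_india;  -- the override only takes effect at the next ANALYZE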

Suggestions for fixing this are of course welcome. :-)

On a related topic, are there any in depth guides to the planner that I could read? I can (and have) read the source code and it’s been informative, but something higher level than the source code would help.

Thanks so much
Philip









Re: Understanding bad estimate (related to FKs?)

From: Guillaume Lelarge
Hi,

Le ven. 30 oct. 2020 à 15:57, Philip Semanchuk <philip@americanefficient.com> a écrit :


> On Oct 29, 2020, at 6:48 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On Thu, Oct 29, 2020 at 11:25:48AM -0400, Philip Semanchuk wrote:
>>
>>
>>> On Oct 28, 2020, at 9:13 PM, Justin Pryzby <pryzby@telsasoft.com>
>>> wrote:
>>>
>>> On Mon, Oct 26, 2020 at 11:20:01AM -0600, Michael Lewis wrote:
>>>> On Mon, Oct 26, 2020 at 11:14 AM Philip Semanchuk
>>>> <philip@americanefficient.com> wrote:
>>>>
>>>>>>> The item I'm focused on is node 23. The estimate is for 7 rows,
>>>>>>> actual
>>>>> is 896 (multiplied by 1062 loops). I'm confused about two things in
>>>>> this node.
>>>>>>>
>>>>>>> The first is Postgres' estimate. The condition for this index
>>>>>>> scan
>>>>> contains three expressions --
>>>>>>>
>>>>>>> (five_uniform = zulu_five.five_uniform) AND (whiskey_mike =
>>>>>>> juliet_india.whiskey_mike) AND (bravo = 'mike'::text)
>>>>>
>>>>
>>>> Are the columns correlated?
>>>
>>> I guess it shouldn't matter, since the FKs should remove all but one
>>> of the conditions.
>>
>> Yes, I had the same expectation. I thought Postgres would calculate the
>> selectivity as 1.0 * 1.0 * whatever estimate it has for the frequency
>> of ‘mike’, but since the frequency estimate is very accurate but the
>> planner’s estimate is not, there’s something else going on.
>>
>
> Well, this is quite a bit more complicated, I'm afraid :-( The clauses
> include parameters passed from the nodes above the index scan. So even
> if we had extended stats on the table, we couldn't use them as that
> requires (Var op Const) conditions. So this likely ends up with a
> product of estimates for each clause, and even then we can't use any
> particular value so we probably end up with something like 1/ndistinct
> or something like that. So if the values actually passed to the index
> scan are more common and/or if the columns are somehow correlated, it's
> not surprising we end up with an overestimate.

I appreciate the insight. 1/ndistinct is exactly right. In pg_stats, five_uniform’s ndistinct = 26326, and whiskey_mike’s ndistinct = 3. The estimated frequency of bravo = ‘mike’ is .02228. There are 25156157 rows in the source table, so we have:

25156157 * (1/26326.0) * (1/3.0) * .02228 = 7.0966494209

Hence the estimate of 7 rows returned.

It's interesting that five_uniform’s estimated ndistinct is low by > 50% (actual = 62958). Paradoxically, if I manually set ndistinct to the correct value of 62958, the estimate gets worse (3 rows instead of 7).

Suggestions for fixing this are of course welcome. :-)

On a related topic, are there any in depth guides to the planner that I could read? I can (and have) read the source code and it’s been informative, but something higher level than the source code would help.


You may already know this, but there's a bunch of documents up there: https://wiki.postgresql.org/wiki/Using_EXPLAIN

I'm also working on a project to better document this. I'm just at the beginning, writing it all in English (which isn't my native language), so it takes time. I already have most of it in French in various documents/formats, but it takes time to go through all of these, summarize them, and translate them. Anyway, work in progress as they say. You can have a look at it there: https://pgplanner.readthedocs.io/en/latest/index.html. Any comment/help is very welcome.


--
Guillaume.

Re: Understanding bad estimate (related to FKs?)

From: Philip Semanchuk

> On Oct 31, 2020, at 9:53 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:
>
> Hi,
>
> Le ven. 30 oct. 2020 à 15:57, Philip Semanchuk <philip@americanefficient.com> a écrit :
>
>
> > On Oct 29, 2020, at 6:48 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> >
> > On Thu, Oct 29, 2020 at 11:25:48AM -0400, Philip Semanchuk wrote:
> >>
> >>
> >>> On Oct 28, 2020, at 9:13 PM, Justin Pryzby <pryzby@telsasoft.com>
> >>> wrote:
> >>>
> >>> On Mon, Oct 26, 2020 at 11:20:01AM -0600, Michael Lewis wrote:
> >>>> On Mon, Oct 26, 2020 at 11:14 AM Philip Semanchuk
> >>>> <philip@americanefficient.com> wrote:
> >>>>
> >>>>>>> The item I'm focused on is node 23. The estimate is for 7 rows,
> >>>>>>> actual
> >>>>> is 896 (multiplied by 1062 loops). I'm confused about two things in
> >>>>> this node.
> >>>>>>>
> >>>>>>> The first is Postgres' estimate. The condition for this index
> >>>>>>> scan
> >>>>> contains three expressions --
> >>>>>>>
> >>>>>>> (five_uniform = zulu_five.five_uniform) AND (whiskey_mike =
> >>>>>>> juliet_india.whiskey_mike) AND (bravo = 'mike'::text)
> >>>>>
> >>>>
> >>>> Are the columns correlated?
> >>>
> >>> I guess it shouldn't matter, since the FKs should remove all but one
> >>> of the conditions.
> >>
> >> Yes, I had the same expectation. I thought Postgres would calculate the
> >> selectivity as 1.0 * 1.0 * whatever estimate it has for the frequency
> >> of ‘mike’, but since the frequency estimate is very accurate but the
> >> planner’s estimate is not, there’s something else going on.
> >>
> >
> > Well, this is quite a bit more complicated, I'm afraid :-( The clauses
> > include parameters passed from the nodes above the index scan. So even
> > if we had extended stats on the table, we couldn't use them as that
> > requires (Var op Const) conditions. So this likely ends up with a
> > product of estimates for each clause, and even then we can't use any
> > particular value so we probably end up with something like 1/ndistinct
> > or something like that. So if the values actually passed to the index
> > scan are more common and/or if the columns are somehow correlated, it's
> > not surprising we end up with an overestimate.
>
> I appreciate the insight. 1/ndistinct is exactly right. In pg_stats, five_uniform’s ndistinct = 26326, and whiskey_mike’s ndistinct = 3. The estimated frequency of bravo = ‘mike’ is .02228. There are 25156157 rows in the source table, so we have:
>
> 25156157 * (1/26326.0) * (1/3.0) * .02228 = 7.0966494209
>
> Hence the estimate of 7 rows returned.
>
> It's interesting that five_uniform’s estimated ndistinct is low by > 50% (actual = 62958). Paradoxically, if I manually set ndistinct to the correct value of 62958, the estimate gets worse (3 rows instead of 7).
>
> Suggestions for fixing this are of course welcome. :-)
>
> On a related topic, are there any in depth guides to the planner that I could read? I can (and have) read the source code and it’s been informative, but something higher level than the source code would help.
>
>
> You may already know this, but there's a bunch of documents up there: https://wiki.postgresql.org/wiki/Using_EXPLAIN


Bien merci, yes, I've visited most of those links and learned an enormous amount from them. I've downloaded many of them for re-reading, including yours. :-) It's helpful to be reminded of them again.

EXPLAIN ANALYZE tells me what choices the planner made, but it doesn't tell me why the planner made those choices. For instance, Tomas Vondra's post enabled me to calculate how the planner arrived at its estimate of 7 rows for one node of my query. I would prefer not to reverse engineer the planner's calculation, but instead have the planner just tell me.

If I was able to combine that information with a summary of the planner's algorithm (a lot to ask for!), then I could understand how the planner chose its plan.

The query I asked about in the original post of this thread has 13 relations in it. IIUC, that's 13! or > 6 billion possible plans. How did the planner pick one plan out of 6 billion? I'm curious, both for practical purposes (I want my query to run well) and also because it's fascinating.

Cheers
Philip




Re: Understanding bad estimate (related to FKs?)

From: Tom Lane
Philip Semanchuk <philip@americanefficient.com> writes:
> The query I asked about in the original post of this thread has 13 relations in it. IIUC, that's 13! or > 6 billion possible plans. How did the planner pick one plan out of 6 billion? I'm curious, both for practical purposes (I want my query to run well) and also because it's fascinating.

The twenty-thousand-foot overview is

https://www.postgresql.org/docs/devel/planner-optimizer.html

and then the ten-thousand-foot level is the planner README file,

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/optimizer/README;hb=HEAD

and then you pretty much gotta start reading code.  You could also dig
into various planner expository talks that people have given at PG
conferences.  I don't have links at hand, but there have been several.

            regards, tom lane



Re: Understanding bad estimate (related to FKs?)

From: Guillaume Lelarge
Hi,

Le lun. 2 nov. 2020 à 20:09, Philip Semanchuk <philip@americanefficient.com> a écrit :


> On Oct 31, 2020, at 9:53 AM, Guillaume Lelarge <guillaume@lelarge.info> wrote:
>
> Hi,
>
> Le ven. 30 oct. 2020 à 15:57, Philip Semanchuk <philip@americanefficient.com> a écrit :
>
>
> > On Oct 29, 2020, at 6:48 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> >
> > On Thu, Oct 29, 2020 at 11:25:48AM -0400, Philip Semanchuk wrote:
> >>
> >>
> >>> On Oct 28, 2020, at 9:13 PM, Justin Pryzby <pryzby@telsasoft.com>
> >>> wrote:
> >>>
> >>> On Mon, Oct 26, 2020 at 11:20:01AM -0600, Michael Lewis wrote:
> >>>> On Mon, Oct 26, 2020 at 11:14 AM Philip Semanchuk
> >>>> <philip@americanefficient.com> wrote:
> >>>>
> >>>>>>> The item I'm focused on is node 23. The estimate is for 7 rows,
> >>>>>>> actual
> >>>>> is 896 (multiplied by 1062 loops). I'm confused about two things in
> >>>>> this node.
> >>>>>>>
> >>>>>>> The first is Postgres' estimate. The condition for this index
> >>>>>>> scan
> >>>>> contains three expressions --
> >>>>>>>
> >>>>>>> (five_uniform = zulu_five.five_uniform) AND (whiskey_mike =
> >>>>>>> juliet_india.whiskey_mike) AND (bravo = 'mike'::text)
> >>>>>
> >>>>
> >>>> Are the columns correlated?
> >>>
> >>> I guess it shouldn't matter, since the FKs should remove all but one
> >>> of the conditions.
> >>
> >> Yes, I had the same expectation. I thought Postgres would calculate the
> >> selectivity as 1.0 * 1.0 * whatever estimate it has for the frequency
> >> of ‘mike’, but since the frequency estimate is very accurate but the
> >> planner’s estimate is not, there’s something else going on.
> >>
> >
> > Well, this is quite a bit more complicated, I'm afraid :-( The clauses
> > include parameters passed from the nodes above the index scan. So even
> > if we had extended stats on the table, we couldn't use them as that
> > requires (Var op Const) conditions. So this likely ends up with a
> > product of estimates for each clause, and even then we can't use any
> > particular value so we probably end up with something like 1/ndistinct
> > or something like that. So if the values actually passed to the index
> > scan are more common and/or if the columns are somehow correlated, it's
> > not surprising we end up with an overestimate.
>
> I appreciate the insight. 1/ndistinct is exactly right. In pg_stats, five_uniform’s ndistinct = 26326, and whiskey_mike’s ndistinct = 3. The estimated frequency of bravo = ‘mike’ is .02228. There are 25156157 rows in the source table, so we have:
>
> 25156157 * (1/26326.0) * (1/3.0) * .02228 = 7.0966494209
>
> Hence the estimate of 7 rows returned.
>
> It's interesting that five_uniform’s estimated ndistinct is low by > 50% (actual = 62958). Paradoxically, if I manually set ndistinct to the correct value of 62958, the estimate gets worse (3 rows instead of 7).
>
> Suggestions for fixing this are of course welcome. :-)
>
> On a related topic, are there any in depth guides to the planner that I could read? I can (and have) read the source code and it’s been informative, but something higher level than the source code would help.
>
>
> You may already know this, but there's a bunch of documents up there: https://wiki.postgresql.org/wiki/Using_EXPLAIN


Bien merci, yes, I've visited most of those links and learned an enormous amount from them. I've downloaded many of them for re-reading, including yours. :-) It's helpful to be reminded of them again.

EXPLAIN ANALYZE tells me what choices the planner made, but it doesn't tell me why the planner made those choices. For instance, Tomas Vondra's post enabled me to calculate how the planner arrived at its estimate of 7 rows for one node of my query. I would prefer not to reverse engineer the planner's calculation, but instead have the planner just tell me.

If I was able to combine that information with a summary of the planner's algorithm (a lot to ask for!), then I could understand how the planner chose its plan.

The query I asked about in the original post of this thread has 13 relations in it. IIUC, that's 13! or > 6 billion possible plans. How did the planner pick one plan out of 6 billion? I'm curious, both for practical purposes (I want my query to run well) and also because it's fascinating.


I understand, and I mostly agree, especially on the fascinating side of the planner. 

Anyway, that's what I'm working on right now. It will take a lot of time, and it will probably contain a lot of errors at the beginning, but I'll be happy to fix them.


--
Guillaume.

Re: Understanding bad estimate (related to FKs?)

From: Michael Lewis
> The query I asked about in the original post of this thread has 13 relations in it. IIUC, that's 13! or > 6 billion possible plans. How did the planner pick one plan out of 6 billion? I'm curious, both for practical purposes (I want my query to run well) and also because it's fascinating.

Have you increased geqo_threshold and join_collapse_limit from the defaults?
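
For example, something along these lines (values illustrative; the defaults are geqo_threshold = 12 and join_collapse_limit = 8):

    SET geqo_threshold = 25;       -- exhaustive join search for up to 24 relations
    SET join_collapse_limit = 24;  -- allow reordering across up to 24 joins
    SET from_collapse_limit = 24;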

Re: Understanding bad estimate (related to FKs?)

From: Tomas Vondra
On Mon, Nov 02, 2020 at 03:08:12PM -0500, Tom Lane wrote:
>Philip Semanchuk <philip@americanefficient.com> writes:
>> The query I asked about in the original post of this thread has 13 relations in it. IIUC, that's 13! or > 6 billion possible plans. How did the planner pick one plan out of 6 billion? I'm curious, both for practical purposes (I want my query to run well) and also because it's fascinating.
>
>The twenty-thousand-foot overview is
>
>https://www.postgresql.org/docs/devel/planner-optimizer.html
>
>and then ten-thousand-foot level is the planner README file,
>
>https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/optimizer/README;hb=HEAD
>
>and then you pretty much gotta start reading code.  You could also dig
>into various planner expository talks that people have given at PG
>conferences.  I don't have links at hand, but there have been several.
>

Yeah. The jump from high-level overviews to reading source code is a bit
brutal, though ...


FWIW a short list of relevant talks I'm aware of & would recommend:

* Explaining the Postgres Query Optimizer [Bruce Momjian]
   https://www.postgresql.org/files/developer/tour.pdf

* Intro to Postgres Planner Hacking [Melanie Plageman]
   https://www.pgcon.org/2019/schedule/events/1379.en.html

* Learning to Hack on Postgres Planner [Melanie Plageman]
   https://www.pgcon.org/2019/schedule/attachments/540_debugging_planner_pgcon2019_v4.pdf

* What’s in a Plan? [Robert Haas]
   https://www.postgresql.eu/events/pgconfeu2019/schedule/session/2741-whats-in-a-plan/
   
* A Tour of PostgreSQL Internals [Tom Lane]
   https://www.postgresql.org/files/developer/tour.pdf

* Inside the PostgreSQL Query Optimizer [Neil Conway]
   http://www.neilconway.org/talks/optimizer/optimizer.pdf

Some are a bit dated, but the overall principles don't change much.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: Understanding bad estimate (related to FKs?)

From: Philip Semanchuk

> On Nov 2, 2020, at 6:09 PM, Michael Lewis <mlewis@entrata.com> wrote:
>
> The query I asked about in the original post of this thread has 13 relations in it. IIUC, that's 13! or > 6 billion possible plans. How did the planner pick one plan out of 6 billion? I'm curious, both for practical purposes (I want my query to run well) and also because it's fascinating.
>
> Have you increased geqo_threshold and join_collapse_limit from the defaults?


Yes, thank you, I should have said that. We avoid the GEQO, so geqo_threshold=25, and join_collapse_limit=from_collapse_limit=24. We tend to have long running queries, so we’re happy to pay a few seconds of extra planner cost to increase the likelihood of getting a better plan.

Cheers
Philip


Re: Understanding bad estimate (related to FKs?)

From: Philip Semanchuk

> On Nov 2, 2020, at 10:17 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On Mon, Nov 02, 2020 at 03:08:12PM -0500, Tom Lane wrote:
>> Philip Semanchuk <philip@americanefficient.com> writes:
>>> The query I asked about in the original post of this thread has 13 relations in it. IIUC, that's 13! or > 6 billion possible plans. How did the planner pick one plan out of 6 billion? I'm curious, both for practical purposes (I want my query to run well) and also because it's fascinating.
>>
>> The twenty-thousand-foot overview is
>>
>> https://www.postgresql.org/docs/devel/planner-optimizer.html
>>
>> and then ten-thousand-foot level is the planner README file,
>>
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/optimizer/README;hb=HEAD
>>
>> and then you pretty much gotta start reading code.  You could also dig
>> into various planner expository talks that people have given at PG
>> conferences.  I don't have links at hand, but there have been several.
>>
>
> Yeah. The jump from high-level overviews to reading source code is a bit
> brutal, though ...
>
>
> FWIW a short list of relevant talks I'm aware of & would recommend:
>
> * Explaining the Postgres Query Optimizer [Bruce Momjian]
>  https://www.postgresql.org/files/developer/tour.pdf
>
> * Intro to Postgres Planner Hacking [Melanie Plageman]
>  https://www.pgcon.org/2019/schedule/events/1379.en.html
>
> * Learning to Hack on Postgres Planner [Melanie Plageman]
>  https://www.pgcon.org/2019/schedule/attachments/540_debugging_planner_pgcon2019_v4.pdf
>
> * What’s in a Plan? [Robert Haas]
>  https://www.postgresql.eu/events/pgconfeu2019/schedule/session/2741-whats-in-a-plan/
>
> * A Tour of PostgreSQL Internals [Tom Lane]
>  https://www.postgresql.org/files/developer/tour.pdf
>
> * Inside the PostgreSQL Query Optimizer [Neil Conway]
>  http://www.neilconway.org/talks/optimizer/optimizer.pdf
>
> Some are a bit dated, but the overall principles don't change much.


Thank you so much to Tomas V, Tom L, Guillaume, Justin, and Michael for all the suggestions and direction. I really appreciate your time & wisdom (not to mention your contributions to Postgres!)

Cheers
Philip