Re: [HACKERS] 200 = 199 + 1? - Mailing list pgsql-hackers

From Marko Tiikkaja
Subject Re: [HACKERS] 200 = 199 + 1?
Date
Msg-id CAL9smLD9x1tccjtrLXvytH-LXzot-pctPKXgsH75vyyj4RLDoQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] 200 = 199 + 1?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] 200 = 199 + 1?
List pgsql-hackers
On Wed, Sep 27, 2017 at 5:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marko Tiikkaja <marko@joh.to> writes:
> I wonder if the nested loop shouldn't have some kind of a cap on its own
> estimate if it's wildly off of what you'd get by multiplying the child
> nodes' estimates with each other?

Nonstarter I'm afraid.  The join relation's size estimate is determined
long before we get to a point where we could multiply the sizes of these
particular child paths to arrive at the conclusion that it should be
something different than what we decided originally.

Ah hah.  Thanks for the explanation, that makes sense.
 
Adjusting the size
of the nestloop result at that point would merely give it an unfair
advantage over other paths for the same join relation.  (I think it would
also break some assumptions about paths for the same relation all giving
the same number of rows, unless parameterized.)

With the previous paragraph in mind, I would agree; it's not a very good idea.
 
Looking at it another way, the main thing that the combination of hashagg
outer path + indexscan inner path knows that eqjoinsel_semi didn't account
for is that there's a unique index on foo.id.  But that info is available
to eqjoinsel_semi, in the sense that it's been given a nondefault estimate
that nd1 is equal to the outer relation size.  So the mistake that it's
making is to throw up its hands and use an 0.5 selectivity estimate just
because it has no info about the inner relation.  I think if we'd pushed
through the nd2/nd1 calculation after setting nd2 = size of inner rel,
we'd end up with an estimate matching the product of these path sizes.
(Caution: inadequate caffeine absorbed yet, this might be all wrong.)

This sounds very reasonable to me.


.m

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Binary search in fmgr_isbuiltin() is a bottleneck.
Next
From: Pavel Stehule
Date:
Subject: Re: [HACKERS] proposal - Default namespaces for XPath expressions(PostgreSQL 11)