Re: nested loop semijoin estimates - Mailing list pgsql-hackers

From Tom Lane
Subject Re: nested loop semijoin estimates
Date
Msg-id 9224.1433297422@sss.pgh.pa.us
Whole thread Raw
In response to Re: nested loop semijoin estimates  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: nested loop semijoin estimates  (Robert Haas <robertmhaas@gmail.com>)
Re: nested loop semijoin estimates  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 06/02/15 23:27, Tom Lane wrote:
>> Do we have instructions around here anyplace on how to set up/use
>>> TPC-DS? I couldn't find anything about it on the wiki ...

> Not that I'm aware of, but it's not really all that difficult.
> [ instructions... ]

Thanks.  I added some debugging printouts to trace the logic, and found
that once in awhile we get cases like this:

patch would change precheck result to fail for path cost 133752.12..136070.23 vs old path 135378.72..135550.22
phantom path causes rejection of old path with cost 135378.72..135550.22
phantom path has final cost 133752.12..136161.64, accepted

That is, we have an unparameterized path whose (initial estimate of) total
cost is just slightly more than the total cost of some old path, but whose
startup cost is less.  The existing logic in add_path_precheck allows
construction of this path to proceed.  If its finished total cost is still
within 1% of the old path's cost, it is able to supplant the old path on
the grounds that the total costs are fuzzily the same while its startup
cost is less.  The patch's change to add_path_precheck causes the startup
cost to be disregarded so that the new path is rejected immediately, and
the path replacement doesn't happen.

Now, what this demonstrates is that add_path_precheck is not operating as
intended --- as I said earlier, it's supposed to just save path
construction cycles, not change the outcome of any path replacement tests.
What we ought to do is fix it to do the cost comparisons fuzzily, and then
it should be possible for it to disregard startup cost when appropriate
without changing the results beyond that.  However making the comparisons
fuzzy will certainly result in some plan changes at the margin, because it
will now let through some paths that it should have let through and
didn't.  Some of those changes will be for the better and others for the
worse.  (I don't have any faith in your conclusion that the patch
as-posted always results in better plans.  There's no reason to believe
that sub-one-percent differences in planner cost estimates will reliably
predict real-world wins.)

What it seems like we should do, if we want to back-patch this, is apply
it without the add_path_precheck changes.  Then as an independent
HEAD-only patch, change add_path_precheck so that it's behaving as
designed.  It looks to me like that will save some planning time in any
case --- changing add_path_precheck to disregard startup cost when
appropriate seems to let it reject a lot more paths than it used to.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Next
From: Fujii Masao
Date:
Subject: Re: why does txid_current() assign new transaction-id?