Re: nested loop semijoin estimates - Mailing list pgsql-hackers

From Tom Lane
Subject Re: nested loop semijoin estimates
Date
Msg-id 3847.1433274898@sss.pgh.pa.us
Whole thread Raw
In response to Re: nested loop semijoin estimates  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: nested loop semijoin estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> So, if required_outer=false and both p->_startup=false, we get 
> consider_startup=false irrespectedly of the required_outer value, so
>      (!consider_startupe) != required_outer
> so that the conditions are not equivalent. And indeed, by reverting the 
> if condition to the previous form, we get the same plans as on master.

Ah, I see we arrived at the same conclusions independently.

> I don't know whether this is correct behavior or not, but in all three 
> cases I've observed on TPC-DS, the new plans performed better

Hm.  In principle, add_path_precheck shouldn't be doing anything except
rejecting paths that would get rejected anyway later on.  However it
strikes me that there's a logic error in it; specifically, this
simplifying assumption is wrong:
        * For speed, we make exact rather than fuzzy cost comparisons. If an        * old path dominates the new path
exactlyon both costs, it will        * surely do so fuzzily.
 

The old path's cost might be fuzzily the same, not fuzzily better.  This
is less likely to be an issue for total cost (since even if they're
fuzzily the same at this point, they probably won't be after we get done
adding more components to the new path's cost estimate).  But it could
well be an issue for startup cost since that's often exactly zero.  So
I guess this must be rejecting or letting through a slightly different set
of non-parameterized paths than it did before.  I'm not prepared to assume
the results are always superior, because we haven't fixed the actual logic
oversight.

> 1) Shouldn't the CONSIDER_PATH_STARTUP_COST(p) be defined rather like
>     this? I mean, if it's a parametric query, use any of the flags?

No.  The paths are parametric, not the query as a whole.

> 2) Is it really possible to get different values for the startup
>     flags on path1 and path2 in compare_path_costs_fuzzily()?

If we were comparing a parameterized and an unparameterized path,
it might be that one's startup cost is of interest while the other
one's isn't.  This is written to let a path win on startup cost
as long as its startup cost is actually of interest.

>     If not
>     (if I get it right, path1->parent == path2->parent anyway), then
>     maybe passing that as a function parameter (just like before)
>     would be better. The macro probably won't be as nice, but I find
>     it easier to understand.

Well, we could pass the parent RelOptInfo explicitly instead of passing
the two flags, but I don't find that to be an improvement particularly.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: nested loop semijoin estimates
Next
From: Andres Freund
Date:
Subject: Re: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1