Re: match_unsorted_outer() vs. cost_nestloop() - Mailing list pgsql-hackers

From Tom Lane
Subject Re: match_unsorted_outer() vs. cost_nestloop()
Msg-id 10464.1252196359@sss.pgh.pa.us
In response to Re: match_unsorted_outer() vs. cost_nestloop()  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: match_unsorted_outer() vs. cost_nestloop()
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> I guess my point is that for node types that dump their output into a
> tuplestore anyway, it doesn't seem like cost_nestloop() should charge
> n * the startup cost.  I believe that at least function, CTE, and
> worktable scans fall into this category.  No?

Yeah, probably.  The comment is correct as is:
     * their sum.  What's not so clear is whether the inner path's
     * startup_cost must be paid again on each rescan of the inner path.
     * This is not true if the inner path is materialized or is a hashjoin,
     * but probably is true otherwise.

What's not correct is the code's expansion of "is materialized" as
"is a MaterialPath".  However, I'm not sure it's worth just adding
these other tuplestore-using types to the list.  We really ought
to think a bit harder about representing the difference between
initial scan cost and rescan cost.

It might be sufficient to have cost_nestloop just hardwire the knowledge
that certain inner path types have a different behavior here --- that
is, for a rescan there is zero start cost and some very low per-tuple
cost, independent of the path's nominal cost values (which would now
be defined as always the costs for the first scan).  And maybe the same
in cost_mergejoin.  Offhand I don't think anyplace else really needs to
think about rescan costs.
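
A minimal sketch of that idea, under stated assumptions: the names
SketchPath, sketch_rescan_cost, sketch_nestloop_total and the constant
SKETCH_RESCAN_TUPLE_COST are hypothetical and are not taken from the
PostgreSQL source. The path's nominal costs describe the first scan only;
a rescan of a tuplestore-backed inner path is charged zero startup cost
plus a small per-tuple cost.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a planner path (not PostgreSQL's Path). */
typedef struct SketchPath
{
    double startup_cost;    /* cost before the first tuple, first scan only */
    double total_cost;      /* cost to fetch all tuples, first scan only */
    double rows;            /* estimated number of tuples returned */
    bool   uses_tuplestore; /* e.g. material, function, CTE, worktable scans */
} SketchPath;

/* Startup and total cost of one rescan of an inner path. */
typedef struct SketchRescanCost
{
    double startup;
    double total;
} SketchRescanCost;

/* Assumed per-tuple charge for rereading a tuplestore; illustrative value only. */
#define SKETCH_RESCAN_TUPLE_COST 0.005

static SketchRescanCost
sketch_rescan_cost(const SketchPath *inner)
{
    SketchRescanCost c;

    if (inner->uses_tuplestore)
    {
        /* Output is already stored: no startup, cheap per-tuple reread. */
        c.startup = 0.0;
        c.total = SKETCH_RESCAN_TUPLE_COST * inner->rows;
    }
    else
    {
        /* Otherwise assume a rescan costs as much as the first scan. */
        c.startup = inner->startup_cost;
        c.total = inner->total_cost;
    }
    return c;
}

/* Total nestloop cost: one full inner scan, then (outer_rows - 1) rescans. */
static double
sketch_nestloop_total(double outer_total, double outer_rows,
                      const SketchPath *inner)
{
    double cost = outer_total + inner->total_cost;

    if (outer_rows > 1.0)
        cost += (outer_rows - 1.0) * sketch_rescan_cost(inner).total;
    return cost;
}
```

With these made-up numbers, an inner path costing 10 (startup) and 110
(total) for 1000 rows, joined under 4 outer rows with an outer cost of 50,
comes to 50 + 110 + 3 * 110 = 490 if every rescan pays full price, but
about 50 + 110 + 3 * 5 = 175 if the inner path reads from a tuplestore.
Join-level per-tuple CPU charges are left out of the sketch.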

I think this would be enough to deal with the issue for those plan types
that materialize their output, because they all have about the same
runtime behavior in this regard.  What gets more exciting is if you'd
like to model other effects this way --- for example, the one that
rescanning an indexscan is probably a lot cheaper than the original
fetch because of caching effects.  But we already have that sort of
thing accounted for (to some extent anyway) elsewhere, so I think we
can probably ignore it here.
        regards, tom lane

