match_unsorted_outer() vs. cost_nestloop() - Mailing list pgsql-hackers

From Robert Haas
Subject match_unsorted_outer() vs. cost_nestloop()
Msg-id 603c8f070909041802p18ed2fb1v91245ccfb5c2a24a@mail.gmail.com
List pgsql-hackers
In joinpath.c, match_unsorted_outer() considers materializing the
inner side of each nested loop if the inner path is not an index scan,
bitmap heap scan, tid scan, material path, function scan, CTE scan, or
worktable scan.  In costsize.c, cost_nestloop() charges the startup
cost only once if the inner path is a hash path or material path;
otherwise, it charges it for every anticipated rescan.
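
To make the comparison concrete, here is a rough sketch of the two
lists as described above, written as a pair of predicates.  It is not
the code in joinpath.c or costsize.c: the PathKind enum and every
function name below are made up for illustration, and the sketch only
reports where the two lists differ, not which differences are wrong.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical path kinds for illustration; NOT PostgreSQL's node tags. */
typedef enum PathKind
{
    PK_SEQ_SCAN,
    PK_INDEX_SCAN,
    PK_BITMAP_HEAP_SCAN,
    PK_TID_SCAN,
    PK_MATERIAL,
    PK_FUNCTION_SCAN,
    PK_CTE_SCAN,
    PK_WORKTABLE_SCAN,
    PK_HASH_JOIN,
    PK_COUNT                    /* number of kinds, not a real kind */
} PathKind;

/* Human-readable name for a path kind. */
const char *
path_kind_name(PathKind k)
{
    static const char *const names[PK_COUNT] = {
        "seq scan", "index scan", "bitmap heap scan", "tid scan",
        "material", "function scan", "cte scan", "worktable scan",
        "hash join"
    };

    assert((int) k >= 0 && (int) k < PK_COUNT);
    return names[k];
}

/*
 * List 1, as described for match_unsorted_outer(): a materialize node
 * is NOT considered over these inner path kinds.
 */
bool
skips_materialize(PathKind k)
{
    switch (k)
    {
        case PK_INDEX_SCAN:
        case PK_BITMAP_HEAP_SCAN:
        case PK_TID_SCAN:
        case PK_MATERIAL:
        case PK_FUNCTION_SCAN:
        case PK_CTE_SCAN:
        case PK_WORKTABLE_SCAN:
            return true;
        default:
            return false;
    }
}

/*
 * List 2, as described for cost_nestloop(): the inner startup cost is
 * charged only once for these inner path kinds.
 */
bool
startup_charged_once(PathKind k)
{
    return k == PK_HASH_JOIN || k == PK_MATERIAL;
}

/* True if a path kind appears on exactly one of the two lists. */
bool
lists_differ(PathKind k)
{
    return skips_materialize(k) != startup_charged_once(k);
}

/* Count the path kinds that appear on exactly one of the two lists. */
int
count_list_mismatches(void)
{
    int         n = 0;

    for (int k = 0; k < PK_COUNT; k++)
    {
        if (lists_differ((PathKind) k))
            n++;
    }
    return n;
}
```

In this toy enumeration only material paths are on both lists and only
sequential scans are on neither; every other kind is on exactly one.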

It seems to me, perhaps naively, that the criteria used in these two
places differ more than they should.  For example,
function scan nodes insert their results into a tuplestore so that
rescans get the same set of tuples, which is why we don't consider
inserting a materialize node over them in match_unsorted_outer() - but
I think that also means that we oughtn't to be counting the startup
cost for every rescan.
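
As a back-of-the-envelope illustration of what is at stake, the sketch
below compares the two ways of charging the inner startup cost.  It is
a deliberately simplified model, not the arithmetic in cost_nestloop():
it ignores the outer path's own cost and any other adjustments, and
the function name and parameters are invented for this example.

```c
#include <stdbool.h>

/*
 * Simplified cost of scanning the inner side of a nested loop once per
 * outer row.  Hypothetical helper, not PostgreSQL code.
 *
 *   startup_cost   inner path's startup cost
 *   run_cost       inner path's per-scan cost excluding startup
 *   outer_rows     anticipated number of outer rows (inner scans)
 *   charge_once    true:  startup is paid only on the first scan
 *                  false: startup is paid again on every rescan
 */
double
inner_rescan_cost(double startup_cost, double run_cost,
                  double outer_rows, bool charge_once)
{
    /* The inner side is always scanned at least once. */
    double      scans = (outer_rows < 1.0) ? 1.0 : outer_rows;
    double      startup_total;

    startup_total = charge_once ? startup_cost : startup_cost * scans;

    return startup_total + run_cost * scans;
}
```

With a startup cost of 100, a per-scan run cost of 10, and 50 outer
rows, this model gives 5500 when startup is charged on every rescan
and 600 when it is charged once, so the choice can easily dominate the
estimate for an inner path with an expensive startup.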

I'm not exactly sure which ones should match or not match.  Hash
paths, maybe, shouldn't.  I believe we don't count the startup cost of
the hash path over again because we're assuming that it's attributable
to the cost of building the hash table, which only needs to be done
once.  I don't think that's 100% accurate
because the hash path could have inherited some of that cost from its
underlying paths.  At any rate, it's conceivable that materializing
could be enough cheaper than repeating the join that a materialize
node makes sense.

Thoughts?

...Robert

