Re: cost_rescan (was: match_unsorted_outer() vs. cost_nestloop()) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: cost_rescan (was: match_unsorted_outer() vs. cost_nestloop())
Date
Msg-id g2t603c8f071004181939h23f75b52u41a10297b36ef942@mail.gmail.com
Responses Re: cost_rescan (was: match_unsorted_outer() vs. cost_nestloop())  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Sat, Sep 12, 2009 at 6:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sep 6, 2009, at 10:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> ... But now that we have a plan for a less obviously broken costing
>>> approach, maybe we should open the floodgates and allow
>>> materialization
>>> to be considered for any inner path that doesn't materialize itself
>>> already
>
>> Maybe.  I think some experimentation will be required.  We also have
>> to be aware of effects on planning time; match_unsorted_outer() is,
>> AIR, a significant part of the CPU cost of planning large join problems.
>
> I've committed some changes pursuant to this discussion.  It may be that
> match_unsorted_outer gets a bit slower, but I'm not too worried about
> that.  My experience is that the code that tries different mergejoin
> options eats way more cycles than the nestloop code does.

One problem with the current implementation of cost_rescan() is that
it ignores caching effects.  It seems to be faster to rescan a
materialize node than it is to rescan a seqscan of a table, even if
there are no restriction clauses, presumably because you get to skip
tuple visibility checks and maybe some other overhead, too.  But
cost_rescan() thinks that rescanning the table will require rereading
the whole thing from disk, which isn't right either; it probably
ought to factor in effective_cache_size, much as the estimates for
iterated index scans do.  I'm not sure how many real problems this is
going to create.
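
To make the caching point concrete, here is a rough sketch of a rescan
estimate that discounts pages assumed to still be cached.  This is a toy
model, not what cost_rescan() or any other planner function does: the
function names are made up for illustration, the "cache holds as much of
the table as fits" rule is an assumption, and seq_page_cost and
cpu_tuple_cost are used only as named knobs with illustrative values.

```python
# Toy model only: none of these functions exist in PostgreSQL.
# seq_page_cost and cpu_tuple_cost borrow the names of planner settings;
# the values used here are illustrative.

def rescan_pages_from_disk(table_pages: float, cache_pages: float) -> float:
    """Pages a repeat scan is assumed to read from disk.

    Assumption: after the first scan the cache holds
    min(table_pages, cache_pages) pages of the table, and those pages
    cost no I/O on a rescan.
    """
    if table_pages <= 0:
        return 0.0
    cached = min(table_pages, max(cache_pages, 0.0))
    return table_pages - cached


def seqscan_rescan_cost(table_pages, tuples, cache_pages,
                        seq_page_cost=1.0, cpu_tuple_cost=0.01):
    """I/O for the uncached pages plus per-tuple CPU for every tuple."""
    io_cost = rescan_pages_from_disk(table_pages, cache_pages) * seq_page_cost
    cpu_cost = tuples * cpu_tuple_cost
    return io_cost + cpu_cost


if __name__ == "__main__":
    # Same table, three hypothetical cache sizes (in pages).
    for cache in (0, 500, 2000):
        print(cache, seqscan_rescan_cost(1000, 100_000, cache))
```

With a cache of zero pages this degenerates to "reread everything from
disk"; once the cache covers the table, only the per-tuple CPU term is
left, which is the part a materialize node could still undercut.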

Another potential problem is that materializing a whole-table seqscan
to avoid repeating the tuple visibility checks may be a win in some
strict sense, but there are externalities: it's also going to use a
lot more memory/disk than just rescanning the table.
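
As a back-of-the-envelope illustration of that externality, the sketch
below estimates how much space a materialized copy of a scan would take
and whether it fits in a given memory budget.  It is hypothetical
throughout: materialize_footprint is not a PostgreSQL function, it
ignores per-tuple overhead, and it assumes the store spills to disk as
soon as the data exceeds the budget (work_mem_kb here is just a
parameter named after the setting).

```python
def materialize_footprint(tuples: int, tuple_width_bytes: int,
                          work_mem_kb: int):
    """Return (total_bytes, spills_to_disk) for a materialized result.

    Assumptions: no per-tuple overhead, and the store spills as soon as
    the data exceeds the memory budget.
    """
    total_bytes = tuples * tuple_width_bytes
    budget_bytes = work_mem_kb * 1024
    return total_bytes, total_bytes > budget_bytes


if __name__ == "__main__":
    # A small inner relation versus a whole-table scan, same budget.
    print(materialize_footprint(1_000, 100, 1024))
    print(materialize_footprint(100_000, 100, 1024))
```

A cost model that only compares rescan time would not see the second
case as any worse than the first, which is the concern here.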

...Robert

