Re: match_unsorted_outer() vs. cost_nestloop() - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: match_unsorted_outer() vs. cost_nestloop()
Msg-id: 603c8f070909051929ubdebb27nfde1ea02453c3b79@mail.gmail.com
In response to: Re: match_unsorted_outer() vs. cost_nestloop() (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: match_unsorted_outer() vs. cost_nestloop()
List: pgsql-hackers
On Sat, Sep 5, 2009 at 8:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> It might be sufficient to have cost_nestloop just hardwire the knowledge
>> that certain inner path types have a different behavior here --- that
>> is, for a rescan there is zero start cost and some very low per-tuple
>> cost, independent of the path's nominal cost values (which would now
>> be defined as always the costs for the first scan). And maybe the same
>> in cost_mergejoin. Offhand I don't think anyplace else really needs to
>> think about rescan costs.
>
> After thinking about that a bit more, I think the best way might be
> to create a "cost_rescan" function that is given a Path and returns
> the startup cost and total cost to be assumed for a rescan of this Path.
> It would know about the special behavior of MaterialPath and the other
> tuplestore-using plan types, and for everything else would just return
> the path's regular costs.
>
> Alternatively we could create a cost_foo_rescan() function paralleling
> each cost_foo() function, but given the small number of distinct
> behaviors I think that would be fairly redundant and hard to maintain.

That sounds very reasonable.

There's another interesting point here, too, relating to dealing with
imperfect statistics. When costing a nested loop join, the number of
times we expect to rescan the inner side is equal to the estimated
number of outer rows. If that estimate is > 1, materialization will
often look like a winner, provided that it's not estimated to overflow
work_mem, because we'll say: oh, look, all these rescans get much
cheaper, and all it costs us is a little bookkeeping overhead. But if
the estimate is <= 1, materialization will certainly look like a loser,
because there are no rescans to make cheaper and we still have the
bookkeeping overhead. But let's say the estimate is off.
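The cost_rescan() idea quoted above could be sketched roughly as follows. This is a minimal illustrative model only, not the actual PostgreSQL implementation: the Path structure, the per-tuple cost fraction, and the set of tuplestore-backed path types are all assumptions for the sake of the example.

```python
from dataclasses import dataclass

# Assumed constants (illustrative, not PostgreSQL's real values): tuplestore
# rescans are charged a small fraction of the normal per-tuple CPU cost.
RESCAN_TUPLE_FRACTION = 0.1
CPU_TUPLE_COST = 0.01

@dataclass
class Path:
    pathtype: str        # e.g. "SeqScan", "Material", "Sort"
    startup_cost: float  # startup cost of the first scan
    total_cost: float    # total cost of the first scan
    rows: float          # estimated row count

def cost_rescan(path):
    """Return (startup_cost, total_cost) to assume for a rescan of `path`."""
    if path.pathtype in ("Material", "Sort", "FunctionScan", "CteScan"):
        # Tuplestore-backed nodes: a rescan just replays stored tuples, so
        # startup is free and the per-tuple cost is very low.
        return (0.0, RESCAN_TUPLE_FRACTION * CPU_TUPLE_COST * path.rows)
    # Everything else: assume a rescan costs the same as the first scan,
    # i.e. the path's nominal cost values.
    return (path.startup_cost, path.total_cost)
```

cost_nestloop and cost_mergejoin would then charge the first inner scan at the path's nominal cost and each of the (estimated outer rows - 1) subsequent scans at the cost_rescan() values, which keeps the single dispatch point Tom describes rather than a parallel cost_foo_rescan() for every node type.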
If the real number of outer rows turns out to be 0, then materialization
costs nothing, because we'll never evaluate the inner side at all. And
if it turns out to be 2 or more, then materialization probably wins, as
long as it doesn't spill to disk. Only if the number of outer rows turns
out to be exactly 1 does materialization lose - and even then it's
pretty cheap, unless, of course, it spills to disk. So maybe we should
consider FORCING materialization to be used in certain cases, something
like this:

1. If the inner path is something that's relatively cheap to rescan
(already materialized or an index scan), then try only the
non-materialized path. This is what we already do now.

2. Otherwise, estimate the memory cost of materializing the inner side.
If it seems that it will fit within work_mem, then try the materialized
path ONLY, on the assumption that there's far more upside than downside.

3. Otherwise, try both plans (?). Likely both of them are going to be
pretty bad...

It's also worth noting that the comment here doesn't really match the
code: "If the cheapest inner path is a join or seqscan, we should
consider materializing it." But that's not what the code actually tests -
instead, it checks that the cheapest inner path is NOT an index scan,
bitmap heap scan, tidscan, material path, function scan, CTE scan, or
worktable scan. Maybe that works out to the same thing, but it's not
obvious.

And, by the way, is the algorithm proposed in the comment sensible
anyway? Under what circumstances would it make sense to materialize a
sequential scan? It seems to me that if the relation is small enough to
fit in memory, then materializing it won't save much - and if it's not,
then materialization is going to be slow, too. I guess if you have
work_mem very large relative to shared_buffers, then you might get the
planner to think that it can cache it with materialization but not
without materialization, but I don't think the real world can ever work
that way.
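The three-case heuristic above could be sketched like this. Again this is only an illustrative model: the function name, the string path-type tags, and passing size estimates as byte counts are assumptions, not planner internals.

```python
# Path types the current code treats as cheap to rescan (per the list in
# the message above); the string tags here are illustrative.
CHEAP_RESCAN_TYPES = {"IndexScan", "BitmapHeapScan", "TidScan",
                      "Material", "FunctionScan", "CteScan",
                      "WorkTableScan"}

def inner_paths_to_try(inner_pathtype, est_inner_bytes, work_mem_bytes):
    """Return which nestloop inner-path variants to consider."""
    # Case 1: rescans are already cheap -- try only the plain path
    # (this is the status quo behavior).
    if inner_pathtype in CHEAP_RESCAN_TYPES:
        return ["plain"]
    # Case 2: the materialized inner side is estimated to fit in work_mem --
    # force materialization, since the upside far outweighs the downside.
    if est_inner_bytes <= work_mem_bytes:
        return ["materialized"]
    # Case 3: materialization would spill to disk -- try both plans,
    # though both are likely to be pretty bad.
    return ["plain", "materialized"]
```

The asymmetry driving case 2 is that a wrong estimate rarely hurts: with 0 outer rows the materialization is never built, with 2+ it wins, and only the exactly-1 case pays a small bookkeeping penalty.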
I don't think I've ever seen the planner choose to materialize a
seqscan, but it might become much more common with this change if we're
not careful, because while cost_index() contains some logic to take into
account the caching effects associated with rescanning an index,
cost_seqscan() does not.

...Robert